Learning Objectives

Following this assignment students should be able to:

  • demonstrate proficiency in and understanding of the course learning objectives
  • confirm their mastery of the course material by completing challenge exercises

Exercises

  1. Vector Review

    Your field crew has counted the number of birds banded at a series of sampling sites. The data are organized in two vectors. The first vector contains the alphanumeric code for each site and the second vector contains the number of birds banded per site. Cut and paste the vectors into your assignment and then answer the following questions by printing the answers to the screen.

    sites <- c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8",
    "B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "C1", "C2",
    "C3", "C4", "D1", "D2", "D3", "D4", "D5", "D6")
    
    counts <- c(28, 32, 1, 0, 10, 22, 30, 19, 145, 27, 36, 25, 9,
    38, 21, 12, 122, 87, 36, 3, 0, 5, 55, 62, 98, 32)
    
    1. How many sites are there?
    2. How many birds were counted at the 7th site?
    3. How many birds were counted at the last site?
    4. What is the total number of birds counted across all sites?
    5. What is the average number of birds seen on a site?
    6. What is the total number of birds counted on sites with codes beginning with C? Don’t just identify the sites by eye; in the real world there could be hundreds or thousands of sites.
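
    The operations these questions call for can be sketched on a small toy example. The vectors plots and tallies below are hypothetical and are not the assignment data; the sketch only illustrates counting elements, indexing by position, summarizing, and subsetting with a logical mask.

```r
# Toy data (hypothetical, not the assignment vectors)
plots <- c("X1", "X2", "Y1", "Y2", "Y3")
tallies <- c(4, 0, 7, 2, 5)

n_plots <- length(plots)            # number of elements in a vector
third <- tallies[3]                 # value at a given position
last <- tallies[length(tallies)]    # last value, without hard-coding the index
total <- sum(tallies)               # sum of all values
average <- mean(tallies)            # arithmetic mean

# Logical mask: TRUE where the code starts with "Y"
is_y <- startsWith(plots, "Y")
y_total <- sum(tallies[is_y])

print(y_total)
```

    Subsetting with a logical mask scales to any number of sites, which is why it is preferable to listing positions by hand.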
  2. Data Management Review

    Dr. Granger is interested in studying the relationship between the length of house-elves’ ears and aspects of their DNA. This research is part of a larger project attempting to understand why house-elves possess such powerful magic. She has obtained DNA samples and ear measurements from a small group of house-elves to conduct a preliminary analysis (prior to submitting a grant application to the Ministry of Magic) and she would like you to conduct the analysis for her (she might know everything there is to know about magic, but she sure doesn’t know much about computers). She has placed the data in a file on the web for you to download.

    Write an R script that:

    • Imports the data
    • For each row in the dataset checks to see if the ear length is "large" (>10 cm) or "small" (<=10 cm) and determines the GC-content of the DNA sequence (i.e., the percentage of bases that are either G or C)
    • Stores this information in a table where the first column has the ID for the individual, the second column contains the string "large" or the string "small" depending on the size of the individual’s ears, and the third column contains the GC-content of the DNA sequence.
    • Exports this table to a csv (comma separated values) file titled grangers_analysis.csv.
    • Prints the average GC-content for large-eared elves and for small-eared elves to the screen.

    As you start to work on more complex problems it’s important to break them down into manageable pieces. One natural way to break this list of things down is: 1) import data; 2) determine size category; 3) determine GC-content; 4) calculate the size category and GC-content for each row of data and store it; 5) export this data to csv; 6) calculate and print the average GC-content for large and small ears.

    Use functions to break the code up into manageable pieces. Remember to document your code well.

    There are several approaches you could take to doing calculations for each row of data. One is to use dplyr with the rowwise() function. Another is to loop over the rows in the data.frame using

    for (row in 1:nrow(data)){...}

    A third is to break the data.frame into vectors and use sapply().
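
    As a rough illustration, the loop and sapply() approaches might look like the sketch below. The toy data frame, its column names (id, earlength, dnaseq), and the helper get_gc_content() are all assumptions made for this example and may not match the real data file.

```r
# Hypothetical toy table; the column names are assumptions
elves <- data.frame(
  id = c("e1", "e2", "e3"),
  earlength = c(12.5, 8.0, 10.0),
  dnaseq = c("GGCC", "ATAT", "ATGC"),
  stringsAsFactors = FALSE
)

# Percentage of bases in one sequence that are G or C
get_gc_content <- function(sequence) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

# Loop approach: visit each row index and fill a preallocated vector
gc_loop <- numeric(nrow(elves))
for (row in seq_len(nrow(elves))) {
  gc_loop[row] <- get_gc_content(elves$dnaseq[row])
}

# sapply() approach: apply the function to every element of the column
gc_sapply <- sapply(elves$dnaseq, get_gc_content, USE.NAMES = FALSE)

print(gc_sapply)
```

    Both approaches produce the same values; sapply() is shorter, while the loop makes each step explicit.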

    Ask your instructor if you have questions about the best choices.

  3. Unit Conversion Challenge

    Measures of the amount of energy used by biological processes are critical to understanding many aspects of biology from cellular physiology to ecosystem ecology. There are many different units for energy use and their utilization varies across methods, research areas, and lab groups. Write a function, convert_energy_units(energy_value, input_unit, output_unit), to convert between the following energy units: Joules (J), Kilojoules (KJ), Calories (CAL), and Kilocalories (KCAL; this is the unit used for labeling the amount of energy contained in food). A Kilojoule is 1000 Joules, a Calorie is 4.1868 Joules, and a Kilocalorie is 4186.8 Joules. An example of a call to this function would look like:

    energy_in_cal <- 200
    energy_in_j <- convert_energy_units(energy_in_cal, "CAL", "J")
    

    Make this function more efficient by linking if else statements. If either the input unit or the output unit does not match one of the four types given above, have the function print "Sorry, I don’t know how to convert " followed by the name of the unit provided. Instead of writing an individual conversion between each pair of units (which would require 12 if statements), you could choose to convert all of the input units to a common scale and then convert from that common scale to the output units. This approach is especially useful because it makes adding new units later much easier.
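
    The common-scale idea can be sketched as follows. This is a deliberately partial illustration that handles only J and KJ; the function name convert_energy_sketch() and the choice to return NA for an unknown unit are assumptions of this example, not requirements of the exercise.

```r
# Partial sketch: handles only J and KJ; other units would add more branches
convert_energy_sketch <- function(energy_value, input_unit, output_unit) {
  # Step 1: convert the input to Joules, the common scale
  if (input_unit == "J") {
    joules <- energy_value
  } else if (input_unit == "KJ") {
    joules <- energy_value * 1000
  } else {
    print(paste("Sorry, I don't know how to convert", input_unit))
    return(NA)  # assumption: signal failure with NA
  }

  # Step 2: convert from Joules to the requested output unit
  if (output_unit == "J") {
    joules
  } else if (output_unit == "KJ") {
    joules / 1000
  } else {
    print(paste("Sorry, I don't know how to convert", output_unit))
    NA  # assumption: signal failure with NA
  }
}

print(convert_energy_sketch(2, "KJ", "J"))
```

    With this structure, each new unit needs one branch in each step rather than a separate conversion to every other unit.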

    Use your function to answer the following questions:

    1. What is the daily metabolic energy used by a human (~2500 KCAL) in Joules?
    2. How many times more energy does a common seal use than a human? The common seal uses ~52,500 KJ/day (Nagy et al. 1999). Use the daily human metabolic cost given above.
    3. How many ergs (ERG) are there in one kilocalorie? Since we didn’t include the erg conversion, this should trigger our ‘don’t know how to convert’ message.
  4. Tree Biomass Challenge

    Understanding the total amount of biomass (the total mass of all individuals) in forests is important for understanding the global carbon budget and how the earth will respond to increases in carbon dioxide emissions.

    We don’t normally measure the mass of a tree, but take a measurement of the diameter or circumference of the trunk and then estimate mass using equations like M = 0.124 * D^2.53.
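
    Because arithmetic in R is vectorized, an equation like this applies to a whole vector of diameters at once. A minimal sketch, in which the function name get_mass_from_diameter() and the example diameters are hypothetical:

```r
# Estimate mass from trunk diameter using M = 0.124 * D^2.53
get_mass_from_diameter <- function(diameters) {
  0.124 * diameters^2.53
}

# One call handles every diameter in the vector
masses <- get_mass_from_diameter(c(1, 10, 25))
print(masses)
```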

    1. Estimate tree biomass for each species in a 96 hectare area of the Western Ghats in India using the following steps.

    • Download the data and load the data into R.
    • Write a function that takes a vector of tree diameters as an argument and
      returns a vector of tree masses. (Thanks to vector math this function is basically just the equation above).
    • Create a dplyr pipeline that
      • Adds a new column (using mutate and your function) that contains masses calculated from the diameters
      • Groups the data frame into species using the SpCode column
      • And then calculates biomass (i.e., the sum of the masses) for each species (using summarize)
      • Stores the result as a data frame
    • Display the resulting data frame
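
    The shape of such a pipeline can be sketched on toy data. The data frame below is hypothetical, and diameter is an assumed column name that may differ in the real file; only SpCode comes from the description above. The sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data; "diameter" is an assumed column name
trees <- data.frame(
  SpCode = c("AAA", "AAA", "BBB"),
  diameter = c(10, 20, 15)
)

# Mass from diameter, using the equation given above
get_mass <- function(diameters) {
  0.124 * diameters^2.53
}

species_biomass <- trees %>%
  mutate(mass = get_mass(diameter)) %>%   # add a mass column
  group_by(SpCode) %>%                    # one group per species
  summarize(biomass = sum(mass)) %>%      # total mass per species
  as.data.frame()                         # store as a plain data frame

print(species_biomass)
```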

    2. Plot a histogram of the species biomass values you just calculated.

    • Use 10 bins in the histogram (using the bins argument)
    • Use a log10 scale for the x axis (using scale_x_log10)
    • Change the x axis label to Biomass and the y axis label to Number of Species (using labs)
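
    A histogram with these settings might be assembled as in the sketch below. The species_biomass values here are made up for illustration, and the sketch assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Made-up biomass values, for illustration only
species_biomass <- data.frame(
  SpCode = c("AAA", "BBB", "CCC", "DDD"),
  biomass = c(12, 150, 1800, 95)
)

biomass_hist <- ggplot(species_biomass, aes(x = biomass)) +
  geom_histogram(bins = 10) +                    # 10 bins
  scale_x_log10() +                              # log10 scale on the x axis
  labs(x = "Biomass", y = "Number of Species")   # axis labels

print(biomass_hist)
```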

Assignment submission & checklist