Learning Objectives

Following this assignment students should be able to:

  • use joins to combine tables in SQL
  • nest the results of one query as the input for another
  • understand the basic rules of tidy data

Reading

Exercises

  1. -- JOIN 1 --

    The Plots table in the Portal database can be joined to the Surveys table by joining plot_id to plot_id and the Species table can be joined to the Surveys table by joining species_id to species_id.

    The Portal mammal data include data from a number of different experimental manipulations. You want to do a time-series analysis of the natural population dynamics of all of the rodent species at the site, so write a query that returns the year, month, day, genus and species of every individual captured on the Control plots. Choose only rodent species (i.e., species for which the taxa field in the species table is Rodent) and exclude all individuals that have not been identified to genus (i.e., species for which the species_id field in the species table is UR). For easier viewing, sort the results alphabetically by genus and species and then chronologically. Save this query as data_from_controls.

    [click here for output]
  2. -- JOIN 2 --

    You are curious about what other kinds of animals get caught in the Sherman traps used to census the rodents. Write a query that returns a list of the genus, species, and taxa (from the species table) for non-rodent individuals that are caught on the Control plots. Non-rodents are indicated in the taxa column of the Species table. You are only interested in which species are captured, so make this list unique (only one line for each species). Please also sort the results by genus and species. Save this query as non_rodents_on_controls.

    [click here for output]
  3. -- JOIN 3 --

    We want to do an analysis comparing the size of individuals on the control plots to the Long-term Krat Exclosures. Write a query that returns the year, genus, species, weight and the plot_type for all cases where the plot type is either Control or Long-term Krat Exclosure. Be sure to choose only rodents and exclude individuals that have not been identified to genus as in Joins 1. Remove any records where the weight is missing. Save this query as size_comparison_controls_vs_krat_exclosures.

    [click here for output]
  4. -- JOIN 4 --

    Write a query that displays the total number of rodent individuals sampled on each plot_type. Save this query as individuals_per_plot_type.

    Note that the number of plots varies among plot types so this number isn’t very informative. We’ll learn how to take this into account soon.

    [click here for output]
  5. -- Nesting Queries --

    Write a query that returns the average number of individuals sampled on each plot type on a per plot basis. In other words, how many individuals are sampled on average on a plot of a given plot type? So, if 500 individuals are sampled on a plot type that has 5 plots, then the result should be 100 individuals.

    To do this you will need to combine multiple queries. It will be easiest for first figure out which queries you will need, then write the first query and make sure it is working, and then add the second query.

    Save this query as per_plot_individuals_per_plot_type.

    [click here for output]
  6. -- Real Portal Data --

    A lot of real data isn’t very tidy, mostly because most scientists aren’t taught about how to structure their data in a way that is easy to analyze.

    Download a messy version of some of the Portal Project data. Note that there are multiple tabs in this spreadsheet.

    Think about what could be improved about this data. In a text file:

    1-5. Describe five things about this data that are not tidy and how you could fix each of those issues.

    6. Could this data easily be imported into a database in its current form?

    7. Do you think it’s a good idea to enter the data like this and clean it up later, or to have a good data structure for analysis by the time data is being entered? Why?