Learning Objectives

Following this assignment students should be able to:

  • use version control to keep track of changes to code
  • collaborate with someone else via a remote repository

Reading

Exercises

  1. -- Set-up Git --

    This exercise and the Version Control Basics assignment reference the Data Management Review problem. You do not need to complete the Data Management Review exercise for this assignment, though we encourage the review and self-evaluation of your problem solving wizardry.

    You’re continuing your analyses of house-elves with Dr. Granger. Unfortunately you weren’t using version control and one day your cat jumped all over your keyboard and managed to replace your analysis code with:

    asd;fljkzbvc;iobv;iojre,nmnmbveaq389320pr9c9cd
    
    ds8
    a
    d8of8pp
    

    before somehow hitting Ctrl-s and overwriting all of your hard work.

    Determined to not let this happen again you’ve committed to using git for version control.

    Install git for your operating system following the setup instructions. Then create a new project for this assignment in RStudio with the following steps:

    1. File -> New Project -> New Directory -> Empty Project
    2. Choose where to put your project
    3. Select Create a git repository
    4. If everything worked, you should see a Git tab in the upper right corner of RStudio.
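
    If you are curious what the Create a git repository checkbox does, the same setup can be sketched from a shell. This is a minimal sketch, assuming git is installed and on your path; the scratch folder name is hypothetical and stands in for your RStudio project folder.

```shell
set -eu

# Confirm that git is installed
git --version

# Hypothetical scratch location; in practice this is your project folder
project_dir="$(mktemp -d)/houseelf-project"
mkdir -p "$project_dir"
cd "$project_dir"

# Turn the folder into a git repository
git init --quiet

# A .git directory and a status report confirm the repository exists
ls -d .git
git status
```

    RStudio handles this step for you, so the sketch is only meant to show what is happening behind the Git tab.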
  2. -- First Commit --

    This is a follow up to Set-up Git.

    Create a new file for your analysis named houseelf-analysis.R and add a comment at the top describing what the analysis is intended to do.

    Commit this file to version control with a good commit message. Then check to see if you can see this commit in the history.
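
    The Git tab in RStudio wraps a stage, commit, and history cycle. A minimal shell sketch of that cycle, assuming a scratch repository and a placeholder identity and comment text (none of which come from the assignment), might look like:

```shell
set -eu

# Scratch repository standing in for your project
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Placeholder identity, set for this scratch repository only
git config user.name "Example Student"
git config user.email "student@example.com"

# Create the analysis file with a descriptive comment at the top
printf '# Analysis of house-elf ear length and DNA data\n' > houseelf-analysis.R

# Stage the file, commit it with a message, then view the history
git add houseelf-analysis.R
git commit --quiet -m "Add analysis script with description"
git log --oneline
```

    A good commit message says what changed and why, briefly enough to scan in the history.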

  3. -- Importing Data --

    This is a follow up to First Commit.

    1. Download a copy of the main data file and save it to a data subdirectory in your project folder.
    2. Commit this file to version control.
    3. Add some code to houseelf-analysis.R that imports the data into R.
    4. Commit these changes to version control.
  4. -- Commit Multiple Files --

    This is a follow up to Importing Data.

    After talking with Dr. Granger you realize that houseelf_earlength_dna_data.csv is only the first of many files to come. To help keep track of the files you’ll need to number them, so rename the current file houseelf_earlength_dna_data_1.csv and change your R code to reflect this name change.

    Git will initially think you’ve deleted houseelf_earlength_dna_data.csv and created a new file houseelf_earlength_dna_data_1.csv. But once you click on both the old and new files to stage them, git will recognize that the file has been renamed and indicate this with an R.

    In a single commit, include both the renaming of the data file and the changes to the R file.
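
    The rename detection described above can be reproduced in a shell. This is a sketch under stated assumptions: the file contents and the identity are placeholders, and `git add -A` is used here to stage every change at once, which is one of several ways to stage both files.

```shell
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"

# Starting point: one data file and a script that reads it (placeholder contents)
mkdir data
printf 'placeholder,data\n1,2\n' > data/houseelf_earlength_dna_data.csv
printf 'elves <- read.csv("data/houseelf_earlength_dna_data.csv")\n' > houseelf-analysis.R
git add -A
git commit --quiet -m "Add data file and import code"

# Rename the data file and update the path in the script
mv data/houseelf_earlength_dna_data.csv data/houseelf_earlength_dna_data_1.csv
printf 'elves <- read.csv("data/houseelf_earlength_dna_data_1.csv")\n' > houseelf-analysis.R

# Before staging, git reports a deleted file and an untracked file
git status --short

# After staging both, the data file should be listed as a rename (R)
git add -A
git status --short

# One commit covers the rename and the script change
git commit --quiet -m "Number the data file and update the import path"
```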

  5. -- Adding a Remote --

    This is a follow up to Commit Multiple Files.

    Dr. Granger contacts you and lets you know that she’d like to be able to see what you’ve been doing and to share some more files with you. She’s been learning version control herself while on sabbatical and so she suggests that you use a shared git repository on the hosting site Github.

    1. Create an account on Github.
      • If you want to work in a public repository you can create one by clicking on the + button in the top right hand corner of the Github website. If you’d rather have a private repository for class, email your username to your professor and they will create a repository for you.
    2. Connect your local git repository to your remote repository on Github.
      • On the Git tab, click on the gear icon with the word More next to it and select Shell.
      • Go to the Github webpage for your repository and copy the two lines of code under "push an existing repository from the command line".
      • Paste them into the Shell.
      • Press enter.
    3. Go back to the Github webpage for your repository and you should see your files.
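
    The lines you copy from Github typically take the form of a `git remote add origin <URL>` command followed by a `git push -u origin <branch>` command. The sketch below shows the same two steps; as an assumption made so that it runs without a network connection, a local bare repository stands in for the Github repository, and the identity is a placeholder.

```shell
set -eu

work="$(mktemp -d)"
remote="$work/remote.git"    # local stand-in for the Github repository
repo="$work/project"         # stand-in for your RStudio project

git init --quiet --bare "$remote"
git init --quiet "$repo"
cd "$repo"
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"

printf '# House-elf analysis\n' > houseelf-analysis.R
git add houseelf-analysis.R
git commit --quiet -m "Add analysis script"

# Step 1: tell the local repository where the remote lives.
# With Github, use the URL shown on your repository page instead.
git remote add origin "$remote"

# Step 2: push the current branch and remember origin as its upstream
git push --quiet -u origin HEAD

# List the configured remotes to confirm the connection
git remote -v
```

    The `-u` flag records the upstream branch, which is what lets later pushes and pulls work without naming the remote each time.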
  6. -- Pushing Changes --

    This is a follow up to Adding a Remote.

    Now that you’ve set up your remote repository for collaborating with Dr. Granger you’d better get to work since she can see everything you’re doing.

    1. Write a function to calculate the GC-content of a sequence, regardless of the capitalization of that sequence. (Hint: using the function str_to_lower or str_to_upper in the stringr package might be useful). This function should also be able to take a vector of sequences and return a vector of GC-contents (it probably does this without any extra work so give it a try).
    2. Commit this change.
    3. Once you’ve committed the change click the Push button in the upper right corner of the window and then click OK when git is done pushing.
    4. You should be able to see the changes you made on Github.
    5. Email your teacher to let them know that you’ve finished this exercise.
  7. -- Pulling and Pushing --

    This is a follow up to Pushing Changes.

    STOP: Wait until your teacher has told you they’ve updated your repository following the last exercise before doing this one.

    While you were working on your vectorized GC-content function, Dr. Granger (who has suddenly developed some pretty impressive computational skills) has been writing a vectorized ear length categorizer. To get it you’ll need to pull the most recent changes from Github.

    1. On the Git tab click on the Pull button with the blue arrow. You should see some text that looks like:

      From github.com:ethanwhite/gryffindorforever
         1e24ac8..815e600  master     -> origin/master
      Updating 1e24ac8..815e600
      Fast-forward
       testme.txt | 1 +
       1 file changed, 1 insertion(+)
      create mode 100644 youareawesome.txt
      
    2. Click OK.
    3. You should see the new function in your repository.

      get_size_class <- function(ear_length){
         # Calculate the size class for one or more ear lengths
         ear_lengths <- ifelse(ear_length > 10, "large", "small")
         return(ear_lengths)
      }
      
    4. Write some new code that creates a data frame with information about the individual ID, the ear length class, and the GC-content for each individual.
    5. Save this data frame as a csv file using write.csv().
    6. Commit the new code and the resulting csv file and push the results to Github.
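
    The Pull and Push buttons map onto `git pull` and `git push`. The following is a rough sketch of the full round trip, assuming a local bare repository in place of Github and a second clone playing the part of your collaborator. All file names other than houseelf-analysis.R, the file contents, and the identities are hypothetical.

```shell
set -eu

work="$(mktemp -d)"
remote="$work/remote.git"    # local stand-in for the Github repository
git init --quiet --bare "$remote"

# Your copy: one commit, pushed to the shared remote
git init --quiet "$work/yours"
cd "$work/yours"
git config user.name "Example Student"
git config user.email "student@example.com"
printf '# House-elf analysis\n' > houseelf-analysis.R
git add houseelf-analysis.R
git commit --quiet -m "Add analysis script"
git remote add origin "$remote"
git push --quiet -u origin HEAD

# Collaborator's copy: clone, add a file, push it back
git clone --quiet "$remote" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Example Collaborator"
git config user.email "collaborator@example.com"
printf '# Placeholder for a collaborator change\n' > collaborator-notes.R
git add collaborator-notes.R
git commit --quiet -m "Add collaborator file"
git push --quiet origin HEAD

# Back in your copy: pull the collaborator's commit (a fast-forward)
cd "$work/yours"
git pull --quiet --ff-only

# Add your own results (placeholder content) and push them
printf 'placeholder results\n' > results.csv
git add results.csv
git commit --quiet -m "Add results file"
git push --quiet

git log --oneline
```

    Pulling before you start new work keeps your history in line with the remote, which is why the exercise asks you to wait for your teacher's update first.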