Programming Fundamentals 2

Learning Objectives

Following this assignment students should be able to:

- understand and use the basic relational operators
- use an if statement to evaluate conditionals
- understand how to decompose complex problems

Reading

Topics
- Conditionals
- Problem decomposition
Readings
- Software Carpentry lesson on making choices

Exercises

-- Choice Operators --

Create the following variables.
```
w <- 10.2
x <- 1.3
y <- 2.8
z <- 17.5
dna1 <- "attattaggaccaca"
dna2 <- "attattaggaacaca"
```
Use them to print whether each of the following statements is TRUE or FALSE.
1. w is greater than 10
2. w + x is less than 15
3. x is greater than y
4. 2 * x + 0.2 is equal to y
5. dna1 is the same as dna2
6. dna1 is not the same as dna2
7. The number of occurrences of the base t is the same in dna1 and dna2
8. w is greater than x, and y is greater than z
9. x times w is between 13.2 and 13.5
10. dna1 is longer than 5 bases, or z is less than w * x
11. The combined length of dna1 and dna2 is greater than or equal to 30
12. (w + x + y) divided by the logarithm (base 10) of 100 is equal to 7.15
13. The GC content (which is always a percentage) of dna1 is not the same as the GC content of dna2
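The statements above combine R's relational operators (`>`, `<`, `>=`, `<=`, `==`, `!=`) with the logical operators `&` (and) and `|` (or). The sketch below shows the same operators on made-up values (not the exercise variables), along with base R ways to get a string's length and to count one character in it. It also shows one pitfall worth knowing: `==` compares the exact binary values of decimal numbers, so results that look equal can be `FALSE`, while `all.equal()` compares within a small tolerance. At the console each line prints its result; in a script, wrap each one in `print()`.

```r
# Made-up values for illustration (not the exercise variables)
a <- 4.5
b <- 2
seq1 <- "gattaca"

a > b                    # greater than: TRUE
a <= b                   # less than or equal to: FALSE
a != b                   # not equal: TRUE
a > 3 & b > 3            # both sides must be TRUE: FALSE
a > 3 | b > 3            # at least one side must be TRUE: TRUE
a * b > 8 & a * b < 10   # "between" is two comparisons joined by &: TRUE

nchar(seq1)              # number of characters (bases): 7
# Count the "a" bases by deleting every character that is not "a"
nchar(gsub("[^a]", "", seq1))   # 3

# Floating-point caution: exact equality can fail for decimals
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```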
-- Modify the Code 2 --

The following function is intended to check if two geographic points are close to one another. If they are, it should return TRUE. If they aren’t, it should return FALSE. Two points are considered near each other if the absolute value of the difference in their latitudes is less than one and the absolute value of the difference in their longitudes is less than one.
1. Fill in the _________ in the function to make it work.
```
near <- function(lat1, long1, lat2, long2){
    # Check if two geographic points are near each other 
    if ((abs(lat1 - lat2) < 1) & (_________){
        near <- TRUE
    } else {
        near <- _________
    }
    return(near)
}
```
2. Improve the documentation for the function so that it is clear what near means and what output the user should expect.
3. Check if Point 1 (latitude = 29.65, longitude = -82.33) is near Point 2 (latitude = 41.74, longitude = -111.83).
4. Check if Point 1 (latitude = 29.65, longitude = -82.33) is near Point 2 (latitude = 30.5, longitude = -82.8).
5. Create a new version of the function that lets the user pass in a parameter that sets what “near” means. To avoid changing the existing behavior of the function (since some of your lab mates are already using it), give the parameter a default value of 1.
6. Improve the documentation for the new function so that it reflects this new behavior.
7. Check if Point 1 (latitude = 48.86, longitude = 2.35) is near Point 2 (latitude = 41.89, longitude = 2.5), when near is set to 7.
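Step 5 relies on default argument values: a parameter declared as `name = value` in `function()` uses that value unless the caller supplies a different one, so existing calls keep working unchanged. The sketch below shows the idea on a hypothetical one-dimensional helper, `close_enough()`, rather than on `near()` itself; the function name and its `tolerance` parameter are illustrative only.

```r
# Hypothetical helper (illustration only): are two numbers within
# `tolerance` of each other? `tolerance` defaults to 1, so calls that
# omit it behave exactly as they would have before it was added.
close_enough <- function(a, b, tolerance = 1) {
  # Returns TRUE if |a - b| < tolerance, FALSE otherwise
  abs(a - b) < tolerance
}

close_enough(3.2, 3.9)                  # uses the default tolerance: TRUE
close_enough(3.2, 5.0)                  # difference of 1.8: FALSE
close_enough(3.2, 5.0, tolerance = 2)   # wider tolerance: TRUE
```

Passing the new argument by name (`tolerance = 2`) makes it clear at the call site which behavior is being changed.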
-- Function with Choices --

Write a function that builds (concatenates) and returns the sentence:

The ultimate answer to the ultimate question of life, the universe, and everything is: XXX.

Here XXX is either a string or a number that is passed to the function as a parameter. Call this function with 42 as the input and print the result, but don’t do the printing from inside the function (think about why printing from outside the function might generally be more useful).

If you don’t understand why this question is fun/funny, you can Google it or, better yet, actually read The Hitchhiker’s Guide to the Galaxy, which is one of the funniest books ever written. There is also a very valuable programming lesson in this story.
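One reason to print outside the function: a function that returns its result lets the caller print it, store it, or pass it on to other code, while a function that only prints leaves nothing to reuse. The sketch below uses a hypothetical `make_label()` function and a different sentence so it does not give away the exercise; `paste0()` converts numbers to text and joins its pieces with no separator.

```r
# Hypothetical example (not the exercise sentence): build a sentence
# from a value and return it, without printing inside the function
make_label <- function(value) {
  # paste0() turns numbers into text and joins the pieces directly
  paste0("The measured value is: ", value, ".")
}

label <- make_label(3.5)    # store the returned string...
print(label)                # ...and print it from outside the function
# [1] "The measured value is: 3.5."

# Because the string is returned, other code can keep working with it
toupper(make_label("unknown"))
```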
-- DNA or RNA --

Write a function, dna_or_rna(sequence), that determines if a sequence of bases is DNA, RNA, or if it is not possible to tell given the sequence provided. Since all the function will know about the material is the sequence, the only way to tell the difference between DNA and RNA is that RNA has the base Uracil ("u") instead of the base Thymine ("t"). Have the function return one of three outputs: "DNA", "RNA", or "UNKNOWN".
1. Use the function and a for loop to print the type of the sequences in the following vector.
2. Use the function and sapply to print the type of the sequences in the following vector.
```
sequences <- c("ttgaatgccttacaactgatcattacacaggcggcatgaagcaaaaatatactgtgaaccaatgcaggcg",
               "gauuauuccccacaaagggagugggauuaggagcugcaucauuuacaagagcagaauguuucaaaugcau",
               "gaaagcaagaaaaggcaggcgaggaagggaagaagggggggaaacc",
               "guuuccuacaguauuugaugagaaugagaguuuacuccuggaagauaauauuagaauguuuacaacugcaccugaucagguggauaaggaagaugaagacu",
               "gauaaggaagaugaagacuuucaggaaucuaauaaaaugcacuccaugaauggauucauguaugggaaucagccggguc")
```
Optional: For a little extra challenge, make your function work with upper case letters, lower case letters, or even strings with mixed capitalization.
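A three-way choice like the one in dna_or_rna() maps naturally onto `if` / `else if` / `else`, and `grepl()` tests whether a string contains a pattern. The sketch below applies that structure to a different, hypothetical question (does a sequence contain the motif "gaa"?) and then runs it with both a `for` loop and `sapply()`. The function name `motif_status()` and its labels are made up for illustration; converting the input with `tolower()` first is one way to handle upper, lower, or mixed case.

```r
# Hypothetical example (not the dna_or_rna() solution): label a sequence
# by whether it contains a motif, with a third outcome for short input
motif_status <- function(sequence, motif = "gaa") {
  sequence <- tolower(sequence)   # accept upper, lower, or mixed case
  if (nchar(sequence) < nchar(motif)) {
    "TOO SHORT"
  } else if (grepl(motif, sequence, fixed = TRUE)) {
    "PRESENT"
  } else {
    "ABSENT"
  }
}

seqs <- c("ttGAAc", "ccc", "ga")

# for loop: call the function on one sequence at a time and print it
for (s in seqs) {
  print(motif_status(s))
}

# sapply: apply the function to every element and collect the results
# (for a character input the result is named by the input strings)
sapply(seqs, motif_status)
```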
-- Data Management Review --

Dr. Granger is interested in studying the relationship between the length of house-elves’ ears and aspects of their DNA. This research is part of a larger project attempting to understand why house-elves possess such powerful magic. She has obtained DNA samples and ear measurements from a small group of house-elves to conduct a preliminary analysis (prior to submitting a grant application to the Ministry of Magic) and she would like you to conduct the analysis for her (she might know everything there is to know about magic, but she sure doesn’t know much about computers). She has placed the data in a file on the web for you to download.

Write an R script that:
- Imports the data
- For each row in the dataset, checks whether the ear length is "large" (>10 cm) or "small" (<=10 cm) and determines the GC-content of the DNA sequence (i.e., the percentage of bases that are either G or C)
- Stores this information in a table where the first column has the ID for the individual, the second column contains the string "large" or the string "small" depending on the size of the individual’s ears, and the third column contains the GC content of the DNA sequence.
- Exports this table to a csv (comma separated values) file titled grangers_analysis.csv.
- Prints the average GC-contents for large-eared elves and small-eared elves to the screen.
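For the last two steps, base R's `write.csv()` writes a data frame to a file and `tapply()` computes a summary for each group. The sketch below uses a small made-up data frame; the column names `id`, `size`, and `gc_content` are assumptions for illustration, not the layout of Dr. Granger's data. It writes to a temporary folder only so the example does not create files in your working directory; for the assignment you would write to grangers_analysis.csv.

```r
# Made-up results table (illustration only; not Dr. Granger's data)
results <- data.frame(
  id = c(1, 2, 3, 4),
  size = c("large", "small", "large", "small"),
  gc_content = c(50, 40, 60, 30)
)

# Write to CSV; row.names = FALSE avoids an extra unnamed index column
out_file <- file.path(tempdir(), "example_results.csv")
write.csv(results, out_file, row.names = FALSE)

# Mean GC content for each size category (groups are sorted by name)
avg_gc <- tapply(results$gc_content, results$size, mean)
print(avg_gc)
```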
As you start to work on more complex problems, it’s important to break them down into manageable pieces. One natural way to break this list of things down is:

1. import the data
2. determine the size category
3. determine the GC-content
4. calculate the size category and GC-content for each row of data and store it
5. export this data to csv
6. calculate and print the average GC-content for large and small ears

Use functions to break the code up into manageable pieces. Remember to document your code well.
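As a sketch of what using functions can look like for steps 2 and 3, here are two small, single-purpose helpers. The names `get_size_class()` and `get_gc_content()` are hypothetical; the 10 cm threshold and the definition of GC content as the percentage of bases that are G or C come from the task description. Each helper can be checked on its own before the pieces are combined.

```r
# Return "large" if ear_length (cm) is greater than 10, otherwise "small"
get_size_class <- function(ear_length) {
  if (ear_length > 10) {
    "large"
  } else {
    "small"
  }
}

# Return the GC content of a DNA sequence as a percentage (0 to 100)
get_gc_content <- function(sequence) {
  # Split into single characters; toupper() makes the check case-insensitive
  bases <- strsplit(toupper(sequence), "")[[1]]
  # Fraction of bases that are G or C, converted to a percentage
  sum(bases %in% c("G", "C")) / length(bases) * 100
}

get_size_class(10)        # "small" (10 is not greater than 10)
get_size_class(12.3)      # "large"
get_gc_content("atgc")    # 50
get_gc_content("GGCCAT")  # about 66.7
```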

There are several approaches you could take to doing calculations for each row of data. One is to use the rowwise() function from dplyr (here’s an example). Another is to loop over the rows in the data.frame using:

```
for (row in 1:nrow(data)){...}
```

A third is to break the data.frame into vectors and use sapply().
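The sketch below compares the for loop and sapply() approaches on a made-up data frame with hypothetical `id` and `ear_length` columns (the real file's column names may differ). The loop fills a preallocated vector one row at a time; `sapply()` applies a function to each element of a column and returns the results as a vector. Both produce the same answer. It uses `seq_len(nrow(data))` in place of `1:nrow(data)` because `seq_len()` also behaves correctly for a data frame with zero rows.

```r
# Made-up data (illustration only)
data <- data.frame(id = c("A", "B", "C"), ear_length = c(8.1, 12.4, 10))

# Helper applied to a single ear length
size_class <- function(ear_length) {
  if (ear_length > 10) "large" else "small"
}

# Approach 1: loop over row numbers, storing each result
sizes_loop <- character(nrow(data))
for (row in seq_len(nrow(data))) {
  sizes_loop[row] <- size_class(data$ear_length[row])
}

# Approach 2: apply the helper to every element of one column
sizes_sapply <- sapply(data$ear_length, size_class)

# Combine the IDs and results into a new table
results <- data.frame(id = data$id, size = sizes_sapply)
print(results)
```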

Ask your instructor if you have questions about the best choices.

Data Science for Agriculture