Learning Objectives
Following this assignment students should be able to:
- Create an R Markdown document
- Read data into R directly from an Excel spreadsheet
- Embed R code for data analysis within a document
- Embed R code to generate and display figures and tables within a document
- Create a BibTeX/BibLaTeX bibliography database
- Generate citations and references within an R Markdown document
Topics
- R Markdown
Exercises
-- R Markdown Basics --
Dr. Raun has asked you to do some preliminary analysis of some data from a study conducted to investigate the interactions between foliar N application and frost. You decide to put some of your new R skills to use and produce the report using R Markdown.
To do so, open a new R Markdown file within RStudio. Use “Efaw Foliar N Freeze Mitigation Report” as the title and your name as the author. The new file will contain an example template for an R Markdown document. Click on the Knit HTML button to compile your R Markdown document. Compare the resulting output with the R Markdown script. Read through the template R Markdown file and the resulting output for an explanation of how R Markdown works.
Next, create a subdirectory named data and download the data you will use (Efaw_Freeze2014.xlsx). Because the data are in an MS Excel spreadsheet, you will want to use the handy read.xlsx() function to read in the data. To do so, you will need to install the xlsx R package first.
- Within your R Markdown file, create a new R code block with the chunk name “read_data” that loads the xlsx package using library() and reads in the data using the read.xlsx() function. (Hint: You will need to use sheetIndex=1. Also, don’t modify the Excel file to read it in. Use the startRow and endRow options within read.xlsx() to read in the header row and data.)
- Add a line to the code block that changes the variable for the change in NDVI to all positive values.
- Add a line to the code block that uses the as.factor() function to convert the treatment column from a numeric column to a factor column.
- Add a line that summarizes your data frame using the summary() function.
- The displayed R code and summary output are useful while you check that the data were read in properly, but you probably don’t want them in your final report. Use the echo=FALSE and include=FALSE options to the code chunk to keep these from being displayed in your document.
- Below your “read_data” code chunk, add a header entitled Introduction to your document using the ## notation and add a sentence below introducing the overall purpose of the report.
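As a rough illustration of the clean-up steps above, the sketch below applies them to a small made-up data frame. The column names (trt, ndvi_change) and the values are assumptions, not the contents of Efaw_Freeze2014.xlsx, and the commented-out read.xlsx() call uses placeholder row numbers. Using abs() is one way to make the NDVI change positive; multiplying by -1 would also work if every value is negative.

```r
# Toy stand-in for the spreadsheet; column names and values are hypothetical.
freeze <- data.frame(
  trt = c(1, 1, 2, 2, 3, 3),
  ndvi_change = c(-0.12, -0.10, -0.08, -0.09, -0.05, -0.04)
)

# With the xlsx package installed, the real data might be read like this
# (startRow and endRow here are placeholders, not the actual values):
# library(xlsx)
# freeze <- read.xlsx("data/Efaw_Freeze2014.xlsx", sheetIndex = 1,
#                     startRow = 2, endRow = 20)

freeze$ndvi_change <- abs(freeze$ndvi_change)  # make the declines positive
freeze$trt <- as.factor(freeze$trt)            # numeric codes become a factor
summary(freeze)                                # quick check of both columns
```

In the report, this code would sit in the “read_data” chunk, with echo=FALSE and include=FALSE added to the chunk header once the summary looks right.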
-- R Markdown Data Analysis --
This is a follow-up to R Markdown Basics.
You now have all the data read into R and you are ready to begin your analysis. You consult with a colleague who has done this kind of work for Dr. Raun before and are delighted to discover that she uses R, too. She shares a function with you that she wrote to automate parts of the analysis:
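    analyze <- function(formula, data) {
      library(agricolae, quietly = TRUE)
      # Fit the linear model and build its ANOVA table
      fit.lm <- lm(formula, data = data)
      fit.anova <- anova(fit.lm)
      # Fisher's LSD: response and treatment are taken from the two sides of
      # the formula; error df and mean square come from the last ANOVA row
      fit.test <- LSD.test(y = data[, as.character(formula[[2]])],
                           trt = data[, as.character(formula[[3]])],
                           DFerror = tail(fit.anova$Df, 1),
                           MSerror = tail(fit.anova$`Mean Sq`, 1))
      # Treatment means with their significance groupings
      return(fit.test$groups)
    }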
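To make sense of the function before running it, it can help to reproduce its intermediate pieces in base R. The sketch below uses a made-up data frame (toy, with hypothetical columns trt and yield) and stops short of calling LSD.test(), so it does not need agricolae.

```r
# Made-up data: three treatments with three plots each.
toy <- data.frame(
  trt = factor(rep(c("A", "B", "C"), each = 3)),
  yield = c(40, 42, 41, 45, 47, 46, 38, 39, 37)
)

f <- yield ~ trt
response <- as.character(f[[2]])   # left side of the formula: "yield"
treatment <- as.character(f[[3]])  # right side of the formula: "trt"

fit <- anova(lm(f, data = toy))
df_error <- tail(fit$Df, 1)         # residual degrees of freedom (last row)
ms_error <- tail(fit$`Mean Sq`, 1)  # residual mean square (last row)

# Treatment means, the quantity that the LSD groupings compare.
trt_means <- tapply(toy[, response], toy[, treatment], mean)
trt_means
```

These are the same kinds of values the function passes to LSD.test() as y, trt, DFerror, and MSerror.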
You notice that the function loads the agricolae package, which you have not used before. You make a note that you’ll need to install it before you can use the function.
- Add a section to your R Markdown file entitled Results and copy the function code into a new R code chunk with the name “define_analysis_function”. Knit your document and inspect the output. Set the code chunk options so that neither R code nor any output are displayed for this code chunk.
- Create a new code chunk named “run_analysis” and add R code to use the analyze() function to calculate treatment means and mean groupings using Fisher’s LSD for post-freeze decline in NDVI and harvested yield. Assign each of these to a separate object. Set the code chunk options so that neither R code nor any output are displayed for this code chunk.
- Use in-line R code chunks to write a sentence for each variable that states the minimum and maximum mean values for NDVI decline, rounded to the second decimal place, and for harvested yield, rounded to the nearest bushel.
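The rounding asked for in the last step might look like the sketch below. The vectors of treatment means are made up for illustration; in the report they would come from the objects created in the “run_analysis” chunk, whose structure is worth checking with str() first. In the document text, each expression would be wrapped in inline code delimiters: a backtick followed by r, then the expression, then a closing backtick.

```r
# Hypothetical treatment means, for illustration only.
ndvi_means <- c(0.113, 0.087, 0.046)
yield_means <- c(41.6, 45.2, 38.9)

ndvi_min <- round(min(ndvi_means), 2)      # two decimal places
ndvi_max <- round(max(ndvi_means), 2)
yield_range <- round(range(yield_means))   # nearest whole bushel

# The same sentences that inline code would build inside the document text.
cat("Mean NDVI decline ranged from", ndvi_min, "to", ndvi_max, "\n")
cat("Mean yield ranged from", yield_range[1], "to", yield_range[2], "bushels\n")
```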
-- R Markdown Figures --
This is a follow-up to R Markdown Data Analysis.
Now that you have completed your analysis, you want to visualize the results for a more user-friendly presentation. You decide to use the ggplot() function to create some bar graphs that will display the data.
- Add an R code chunk named “NDVI_plot” with the code needed to produce a bar plot of the treatment means for the post-freeze NDVI decline. Be sure to label the axes appropriately.
- Use geom_text() to add labels to the bars to indicate significant differences between treatment means. Use the vjust, nudge_y, and size arguments for geom_text() to size and position the labels appropriately.
- Add a sensible caption to the figure (including a figure number) using the fig.cap chunk option.
- Add a new R code chunk named “yield_plot” with R code to produce an equivalent figure for the yield data. Be sure to include appropriate axis labels, bar labels for statistically significant differences, and a suitable figure caption.
- Set the echo chunk option for both code chunks so that no R code is displayed in the final output.
- Add a sentence or two to your Results section describing the figures and referencing them appropriately.
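A bar plot with significance letters could be sketched as follows. The data frame (means, with hypothetical columns trt, mean, and groups) is made up, geom_col() is just one way to draw bars from precomputed means, and the vjust, nudge_y, and size values are starting points to adjust by eye.

```r
library(ggplot2)

# Made-up treatment means and LSD letters, for illustration only.
means <- data.frame(
  trt = c("A", "B", "C"),
  mean = c(0.11, 0.09, 0.05),
  groups = c("a", "ab", "b")
)

ndvi_plot <- ggplot(means, aes(x = trt, y = mean)) +
  geom_col() +
  # Letters sit just above each bar; tune these values for your own data.
  geom_text(aes(label = groups), vjust = 0, nudge_y = 0.003, size = 5) +
  labs(x = "Treatment", y = "Post-freeze NDVI decline")

ndvi_plot
```

In the R Markdown file, the chunk header would carry the caption and hide the code, for example by setting fig.cap to the caption text and echo=FALSE.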
-- R Markdown Tables --
This is a follow-up to R Markdown Figures.
Because you are unsure how Dr. Raun will prefer the data to be presented, you decide to include the data in a table, as well.
- Within a new code chunk named “results_table”, create a data frame that has columns with treatment means for post-freeze NDVI decline and yield. Set the column names to “NDVI Decline” and “Yield”. Set the row names to abbreviations for the corresponding treatments (i.e., not just the treatment number). (Hint: I suggest you round the values appropriately and then convert these to text using paste(). You will also need to add a dash in your NDVI column for the missing treatment level.)
- Add the statistical significance labels to your results table using the paste0() function.
- Modify your code to add Markdown notation to make the significance labels superscript.
- Use the kable() function to generate your table.
- Use the caption argument to the kable() function to add a suitable caption for the results table.
- Set the code chunk options so that your R code is not displayed.
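The table steps above might be combined as in the sketch below. All values, significance letters, and row-name abbreviations are made up; only the column names come from the exercise. The ^...^ wrapping is the Markdown superscript notation, and NA stands in for the treatment level with no NDVI value.

```r
library(knitr)

# Made-up means and LSD letters; NA marks the treatment with no NDVI value.
ndvi_mean <- c(0.113, 0.087, NA)
ndvi_grp <- c("a", "ab", NA)
yield_mean <- c(41.6, 45.2, 38.9)
yield_grp <- c("ab", "a", "b")

# Round, convert to text, and wrap the letters in ^...^ (Markdown superscript).
results <- data.frame(
  "NDVI Decline" = ifelse(is.na(ndvi_mean), "-",
                          paste0(round(ndvi_mean, 2), "^", ndvi_grp, "^")),
  "Yield" = paste0(round(yield_mean), "^", yield_grp, "^"),
  row.names = c("Check", "Early N", "Late N"),  # hypothetical abbreviations
  check.names = FALSE,                          # keep the space in the name
  stringsAsFactors = FALSE
)

kable(results, caption = "Table 1. Treatment means (made-up values).")
```

Note that round() followed by paste0() drops trailing zeros (0.10 becomes "0.1"), so real values may need format() or sprintf() to keep a fixed number of decimals.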
-- R Markdown References --
This is a follow-up to R Markdown Tables.
Your next steps as you write up your report are to document your methods with proper citations and to discuss your results while citing appropriate references. Fortunately, R Markdown provides a useful framework for handling citations. Create a new text file in RStudio named “bibliography.bib” (“.bib” is the file extension for a BibTeX or BibLaTeX bibliography database). This is where you will put the bibliographic information for each citation.
- Add a Methods section to your document and describe the basic methods used for your analysis. Be sure to mention that you used the R statistical programming language and mention the names of the R packages you used to analyze the data and produce the figures.
- In your console, type the command citation(). The resulting output will include a BibTeX entry for R that begins with @Manual and is enclosed by curly brackets {}. Copy the BibTeX entry into your “bibliography.bib” file and add a suitable citation key (such as Rproject) between the first { and the ,. Using the same citation key, add a citation to your R Markdown file in the Methods section immediately after where you mention using R.
- Return to the console and use the citation() function to generate BibTeX entries for each of the packages used in your analysis. Add a unique citation key to each entry and use the key in your Methods section to cite each package appropriately.
- Within your R Markdown file, scroll to the end of the Results section and start a new section entitled Discussion. You recall that Dr. Raun mentioned a paper he published on late-season foliar N application for wheat in 2002 so you decide to compare those results to your current results. Go to Google Scholar and search for the paper using the query author:Raun late-season foliar nitrogen wheat. You are confident you will cite this paper, so click the Cite link under the appropriate entry in the search results. Then click the BibTeX link at the bottom of the pop-out window. Copy the resulting BibTeX entry into your “bibliography.bib” file. Feel free to change or shorten the citation key, if you like.
- Briefly read over the paper and write one or two sentences in the Discussion section comparing your results to those of the paper. Be sure to include a citation for the paper.
- Once you knit your document, you notice that you’ve forgotten to add a header for the References section. Add this as the last line of your R Markdown file.
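The BibTeX entry that citation() prints can also be inspected from code, which makes the role of the citation key easier to see. The sketch below uses toBibtex() from base R and the example key Rproject from the steps above; pasting the entry by hand, as the exercise describes, gives the same result.

```r
# BibTeX lines for the base R citation; the first line has an empty key.
bib <- toBibtex(citation())
bib[1]

# Insert a citation key between the first "{" and the ",".
bib[1] <- sub("{,", "{Rproject,", bib[1], fixed = TRUE)
print(bib)
```

Assuming the YAML header of the R Markdown file has a bibliography field pointing at bibliography.bib, the entry can then be cited in the text as [@Rproject]. Passing a package name, as in citation("agricolae"), gives the entry for an installed package.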