Chapter 9: Reproducible Research

R Markdown and Quarto for literate programming, reports, and presentations.

9.1 Why Reproducibility Matters

A reproducible analysis lets anyone (including your future self) re-run the code and get the same results. R Markdown and Quarto weave narrative text, code, and output into a single document, eliminating copy-paste errors between your analysis and your report.

9.2 R Markdown Basics (.Rmd)

An .Rmd file has three parts: YAML header, Markdown text, and code chunks.

---
title: "Quarterly Sales Report"
author: "Analyst"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float: true
    theme: flatly
---

## Introduction

This report analyzes quarterly sales data.

```{r setup, include=FALSE}
library(tidyverse)
library(knitr)
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
```

```{r summary-stats}
mtcars |>
  group_by(cyl) |>
  summarize(avg_mpg = mean(mpg)) |>
  kable(digits = 1)
```
Rendering: Click "Knit" in RStudio or run rmarkdown::render("report.Rmd") from the console. The output format is controlled by the output: field in the YAML header.
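
As a minimal sketch of rendering from the console, the snippet below writes a throwaway .Rmd file to a temporary directory and renders it only if the rmarkdown package and Pandoc are available. The file name demo-report.Rmd and its contents are made up for illustration.

```r
# Build a tiny R Markdown document in a temporary directory.
# The file name and contents are hypothetical, for illustration only.
fence <- strrep("`", 3)  # three backticks, built in code to keep the example readable
rmd_path <- file.path(tempdir(), "demo-report.Rmd")
writeLines(c(
  "---",
  "title: \"Demo Report\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(mtcars$mpg)",
  fence
), rmd_path)

# Render only when rmarkdown and Pandoc are both available on this machine.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # render() returns the path of the output file (invisibly).
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  message("Rendered: ", html_path)
}
```

Passing output_format = "word_document" (or another format name) to render() overrides the output: field for that one call.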

9.3 Code Chunk Options

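| Option | Default | Purpose |
|---|---|---|
| echo | TRUE | Show the code in the output |
| eval | TRUE | Run the code |
| include | TRUE | Include code AND output (FALSE hides both, but the code still runs) |
| message | TRUE | Show messages (e.g., from library loading) |
| warning | TRUE | Show warnings |
| fig.width | 7 | Figure width in inches |
| fig.height | 5 | Figure height in inches (5 is the html_document default; plain knitr uses 7) |
| cache | FALSE | Cache chunk results for faster re-rendering |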
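
To check the defaults on your own installation, one option is to query knitr directly. The sketch below assumes the knitr package is installed and skips the query otherwise. Run from the console, it reports knitr's own defaults, so fig.height may differ from the value html_document applies at render time.

```r
# Query knitr's current chunk option values for the options in the table above.
if (requireNamespace("knitr", quietly = TRUE)) {
  defaults <- knitr::opts_chunk$get(
    c("echo", "eval", "include", "message", "warning",
      "fig.width", "fig.height", "cache")
  )
  str(defaults)  # a named list, one element per option
}
```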

9.4 Inline R Code

Embed computed values directly in your prose:

The average mpg is `r round(mean(mtcars$mpg), 1)` across
`r nrow(mtcars)` cars in the dataset.

This renders as: "The average mpg is 20.1 across 32 cars in the dataset." Values update automatically when data changes.
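
It can help to prototype an inline expression in the console before placing it in prose. The sketch below computes the same two values and assembles the sentence with sprintf(); the variable names are illustrative.

```r
# The same expressions used inline above, evaluated in the console.
avg_mpg <- round(mean(mtcars$mpg), 1)
n_cars <- nrow(mtcars)

sentence <- sprintf(
  "The average mpg is %s across %d cars in the dataset.",
  avg_mpg, n_cars
)
print(sentence)

# round() drops trailing zeros; format(nsmall = 1) keeps one decimal place.
padded <- format(round(20.04, 1), nsmall = 1)
print(padded)
```

Wrapping a value in format() is worth considering whenever a report should always show a fixed number of decimals.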

9.5 Output Formats

# In the YAML header, change output: to one of:
output: html_document     # interactive HTML
output: pdf_document      # PDF via LaTeX (requires a LaTeX installation such as TinyTeX)
output: word_document     # Microsoft Word
output:
  ioslides_presentation:  # HTML slides
    widescreen: true
Tip: Install tinytex for PDF output. Running install.packages("tinytex") followed by tinytex::install_tinytex() gives you TinyTeX, a lightweight LaTeX distribution that works well with R Markdown.
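
Before requesting PDF output in a script, it can be useful to check whether a LaTeX engine is reachable. The sketch below uses only base R and looks for common engine names on the PATH. This is a heuristic rather than a definitive test, and the fallback to HTML is an assumption made for illustration.

```r
# Look for common LaTeX engines on the PATH (base R only; no packages required).
engines <- Sys.which(c("pdflatex", "xelatex", "lualatex"))
has_latex <- any(nzchar(engines))

# Fall back to HTML when no LaTeX engine is found.
output_format <- if (has_latex) "pdf_document" else "html_document"
message("Rendering target: ", output_format)

# The chosen format could then be passed on, e.g.:
# rmarkdown::render("report.Rmd", output_format = output_format)
```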

9.6 Quarto: The Next Generation (.qmd)

Quarto is the successor to R Markdown and supports R, Python, Julia, and Observable JS. The syntax is nearly identical, but Quarto is a standalone command-line tool rather than an R package; R chunks are still executed by knitr.

---
title: "Sales Analysis"
format: html
execute:
  echo: true
  warning: false
---

```{r}
library(ggplot2)
ggplot(mpg, aes(displ, hwy)) + geom_point()
```
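
Quarto documents are rendered by the quarto command-line tool, so a script may want to confirm the tool is installed first. The sketch below checks the PATH from R and prints the installed version; the file name analysis.qmd in the commented render call is hypothetical.

```r
# Sys.which() returns an empty string when the CLI is not on the PATH.
quarto_bin <- unname(Sys.which("quarto"))
has_quarto <- nzchar(quarto_bin)

if (has_quarto) {
  quarto_version <- system2(quarto_bin, "--version", stdout = TRUE)
  message("Quarto version: ", quarto_version)
  # To render a document (the file name here is hypothetical):
  # system2(quarto_bin, c("render", "analysis.qmd", "--to", "html"))
} else {
  message("Quarto CLI not found; install it before rendering .qmd files.")
}
```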

9.7 Quarto Chunk Options Syntax

```{r}
#| label: fig-scatter
#| fig-cap: "Displacement vs Highway MPG"
#| fig-width: 8
#| fig-height: 5

ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  theme_minimal()
```
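
When migrating an existing .Rmd file, rewriting chunk headers by hand can be tedious. The sketch below assumes the installed knitr version provides convert_chunk_header(), and skips the conversion if it does not. The chunk contents and temporary file names are made up for illustration, and the exact converted text may vary between knitr versions.

```r
# Write a one-chunk .Rmd file that uses the R Markdown header syntax.
fence <- strrep("`", 3)  # three backticks
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  paste0(fence, "{r scatter, echo=FALSE, fig.width=8}"),
  "plot(cars)",
  fence
), rmd_path)

# convert_chunk_header() is assumed to exist in the installed knitr version;
# skip the conversion otherwise.
can_convert <- requireNamespace("knitr", quietly = TRUE) &&
  exists("convert_chunk_header", envir = asNamespace("knitr"))

if (can_convert) {
  qmd_path <- tempfile(fileext = ".qmd")
  # type = "yaml" requests the "#| key: value" style shown above.
  knitr::convert_chunk_header(rmd_path, output = qmd_path, type = "yaml")
  cat(readLines(qmd_path), sep = "\n")
}
```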

9.8 Cross-Referencing Figures and Tables

Quarto has built-in cross-referencing. Label a figure chunk with #| label: fig-name and reference it in text with @fig-name. Tables use #| label: tbl-name and @tbl-name.

As shown in @fig-scatter, there is a negative relationship
between engine displacement and highway MPG.
The summary statistics are reported in @tbl-summary.

```{r}
#| label: fig-scatter
#| fig-cap: "Displacement vs Highway MPG by Drive Type"

ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  theme_minimal()
```

```{r}
#| label: tbl-summary
#| tbl-cap: "Summary Statistics by Drive Type"

mpg |>
  dplyr::group_by(drv) |>
  dplyr::summarize(n = dplyr::n(), mean_hwy = mean(hwy)) |>
  knitr::kable()
```
Cross-reference prefixes: Quarto requires specific prefixes for cross-references to work: fig- for figures, tbl- for tables, eq- for equations, and sec- for sections. Without the correct prefix, Quarto does not recognize the label as a cross-reference target, so the reference will not resolve to a numbered, clickable link.
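
A quick way to catch a missing prefix is to scan a document's chunk labels before rendering. The helper below is a hypothetical sketch in base R, not part of Quarto: it reads "#| label:" lines and flags labels that start with fig- or tbl-, the two prefixes that apply to code chunks.

```r
# Extract "#| label:" values from the lines of a .qmd file and flag which ones
# carry a prefix that marks a figure or table cross-reference target.
check_chunk_labels <- function(lines) {
  pattern <- "^#\\|[[:space:]]*label:"
  label_lines <- grep(pattern, lines, value = TRUE)
  labels <- trimws(sub(pattern, "", label_lines))
  data.frame(
    label = labels,
    referenceable = grepl("^(fig|tbl)-", labels),
    stringsAsFactors = FALSE
  )
}

# Example input; in practice this could come from readLines("analysis.qmd").
qmd_lines <- c(
  "#| label: fig-scatter",
  "#| fig-cap: \"Displacement vs Highway MPG\"",
  "#| label: summary",
  "#| label: tbl-summary"
)
label_report <- check_chunk_labels(qmd_lines)
print(label_report)
```

Here the label summary would be flagged, since it has no prefix and so cannot be the target of an @ reference.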

9.9 Bibliographies with .bib Files

Both R Markdown and Quarto support automatic citation management via BibTeX .bib files. Add references to your YAML header and cite them in text.

---
title: "My Analysis"
bibliography: references.bib
csl: apa.csl                   # optional: citation style
format: html
---

## Literature Review

Previous work has shown that R is effective for
reproducible research [@wickham2023; @xie2024].
As noted by @wickham2023, tidy data principles
simplify the analysis pipeline.

## References

The corresponding references.bib file:

@book{wickham2023,
  author = {Wickham, Hadley and Cetinkaya-Rundel, Mine and Grolemund, Garrett},
  title = {R for Data Science},
  year = {2023},
  publisher = {O'Reilly Media},
  edition = {2nd}
}

@book{xie2024,
  author = {Xie, Yihui and Dervieux, Christophe and Riederer, Emily},
  title = {R Markdown Cookbook},
  year = {2024},
  publisher = {Chapman and Hall/CRC}
}
Getting .bib entries: Export BibTeX entries from Google Scholar (click the cite icon, then "BibTeX"), Zotero, or Mendeley. Many journal websites also provide BibTeX downloads. CSL (Citation Style Language) files control formatting; download styles from zotero.org/styles.
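
R can also produce BibTeX entries for the software you used. The sketch below uses base R's citation() and toBibtex() to write an entry for R itself to a temporary .bib file; the citation key rcore is an arbitrary choice made for this example.

```r
# Get the citation for R itself and give it a citation key
# ("rcore" is an arbitrary key chosen for this example).
r_cite <- citation()
r_cite$key <- "rcore"

# Convert to BibTeX and write it to a .bib file in a temporary directory.
bib_lines <- as.character(toBibtex(r_cite))
bib_path <- file.path(tempdir(), "references.bib")
writeLines(bib_lines, bib_path)

cat(readLines(bib_path), sep = "\n")
```

With that file listed under bibliography:, the entry could be cited in text as [@rcore]. Calling citation("ggplot2") works the same way for an installed package.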

9.10 Parameterized Reports

Parameterized reports let you render the same template with different inputs, producing separate reports for each department, region, or time period.

---
title: "Sales Report"
params:
  region: "West"
  year: 2024
output: html_document
---

```{r}
# Access parameters with params$region, params$year
# (sales_data is assumed to be loaded earlier in the report)
df <- sales_data |>
  dplyr::filter(region == params$region, year == params$year)
cat("Report for", params$region, "in", params$year, "\n")
```

# Render from R with custom parameters:
# rmarkdown::render("report.Rmd",
#                   params = list(region = "East", year = 2023))

# Render multiple reports in a loop:
# for (r in c("West", "East", "South", "North")) {
#   rmarkdown::render("report.Rmd",
#     params = list(region = r),
#     output_file = paste0("report_", r, ".html"))
# }
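
When a report varies over more than one parameter, it may be easier to build a table of render jobs first, so every combination gets a unique output file name. The sketch below is one way to do that, assuming the report.Rmd template from above; the rendering step only runs if that file and the rmarkdown package are present.

```r
# Build one row per report: every region crossed with every year.
jobs <- expand.grid(
  region = c("West", "East", "South", "North"),
  year = 2023:2024,
  stringsAsFactors = FALSE
)
jobs$output_file <- sprintf("report_%s_%d.html", jobs$region, jobs$year)
print(jobs)

# Render each row only if the template exists and rmarkdown is installed.
if (file.exists("report.Rmd") && requireNamespace("rmarkdown", quietly = TRUE)) {
  for (i in seq_len(nrow(jobs))) {
    rmarkdown::render(
      "report.Rmd",
      params = list(region = jobs$region[i], year = jobs$year[i]),
      output_file = jobs$output_file[i],
      envir = new.env()  # fresh environment so reports do not share state
    )
  }
}
```

Rendering each report in a new environment keeps objects created by one report from leaking into the next.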

9.11 Custom CSS in R Markdown

You can customize the appearance of HTML output by including a custom CSS file or inline styles.

---
title: "Styled Report"
output:
  html_document:
    css: custom.css
    theme: null        # disable default theme to use only custom CSS
---

/* custom.css */
body { font-family: "Georgia", serif; max-width: 800px; margin: auto; }
h1, h2 { color: #1a365d; }
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
th { background-color: #276DC3; color: white; }
.highlight { background-color: #eff6ff; padding: 10px; border-radius: 5px; }

9.12 Quarto Presentations (revealjs)

Quarto can produce slide decks with the revealjs framework directly from your analysis code.

---
title: "Quarterly Results"
author: "Data Team"
format:
  revealjs:
    theme: moon
    slide-number: true
    transition: slide
    code-fold: true
    footer: "Confidential"
---

## Key Findings

- Revenue increased 12% YoY
- Customer retention improved

```{r}
#| fig-height: 4
ggplot(economics, aes(date, unemploy)) +
  geom_line(color = "#276DC3") +
  theme_minimal()
```

## Detailed Analysis {.smaller}

Use `{.smaller}` for smaller text on dense slides.

. . .

Content after `. . .` appears incrementally.
Quarto presentation themes: Available revealjs themes include default, moon, dark, simple, solarized, blood, night, serif, league, beige, and sky. You can also create custom themes with SCSS files. Use code-fold: true to let audience members optionally expand code blocks.

9.13 Quarto Websites and Blogs

Quarto can generate multi-page websites and blogs from a collection of .qmd files. This is useful for research group sites, course materials, or project documentation.

# _quarto.yml (project configuration)
project:
  type: website

website:
  title: "My Research Lab"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - href: about.qmd
        text: About
      - href: publications.qmd
        text: Publications

format:
  html:
    theme: cosmo
    css: styles.css

# For a blog, use type: website with a listing page:
# index.qmd with "listing:" in the YAML to auto-list posts
# Posts go in a posts/ subdirectory

# Render the entire site:
# quarto render

# Preview locally:
# quarto preview

9.14 Publication Tables with gt and flextable

Beyond kable(), the gt package creates highly customizable HTML/LaTeX tables, and flextable is the best option for Word output.

library(gt)

# Publication-quality table with gt
mtcars |>
  dplyr::group_by(cyl) |>
  dplyr::summarize(
    n = dplyr::n(),
    avg_mpg = mean(mpg),
    sd_mpg = sd(mpg),
    avg_hp = mean(hp)
  ) |>
  gt() |>
  tab_header(
    title = "Vehicle Performance by Cylinder Count",
    subtitle = "Data from mtcars dataset"
  ) |>
  fmt_number(columns = c(avg_mpg, sd_mpg, avg_hp), decimals = 1) |>
  cols_label(
    cyl = "Cylinders", n = "N",
    avg_mpg = "Mean MPG", sd_mpg = "SD MPG", avg_hp = "Mean HP"
  ) |>
  tab_source_note("Source: Motor Trend, 1974")

# Save as Word-compatible table with flextable
library(flextable)

ft <- mtcars |>
  dplyr::slice_head(n = 5) |>
  dplyr::select(mpg, cyl, hp, wt) |>
  flextable() |>
  set_header_labels(mpg = "MPG", cyl = "Cylinders",
                     hp = "Horsepower", wt = "Weight") |>
  theme_vanilla() |>
  autofit()

# Render in R Markdown/Quarto or save directly:
# save_as_docx(ft, path = "table.docx")
gt vs. flextable vs. kableExtra: gt excels at HTML and LaTeX output with a grammar-of-tables philosophy. flextable is the best choice when your primary output is Word (.docx), as it renders natively in Word documents. kableExtra extends kable() with styling options and works well for quick HTML/LaTeX tables.

9.15 Tables in R Markdown/Quarto

# Simple kable table
library(knitr)
kable(head(mtcars, 5), caption = "First 5 cars")

# Styled table with kableExtra
library(kableExtra)
kable(head(mtcars, 5)) |>
  kable_styling(bootstrap_options = c("striped", "hover"))

Exercises

Exercise 9.1

Create an R Markdown document that loads the palmerpenguins dataset, produces a summary table grouped by species (using kable()), and includes a scatterplot of bill length vs. bill depth. Render it to HTML with a table of contents.

Exercise 9.2

Convert your R Markdown file from Exercise 9.1 to a Quarto .qmd file. Use the #| chunk option syntax and add a figure caption with cross-referencing (@fig-scatter). Render to both HTML and PDF.

Exercise 9.3

Create a parameterized R Markdown report that takes a species parameter (default: "Adelie"). The report should filter the palmerpenguins dataset to that species, show a summary table using gt, and include a histogram of body mass. Render the report three times, once for each species, producing three separate HTML files.

Exercise 9.4

Build a Quarto revealjs presentation about the mtcars dataset. Include at least 4 slides: (1) a title slide, (2) a slide with a summary table, (3) a slide with a ggplot scatter plot using cross-referencing, and (4) a slide with incremental bullet points summarizing key findings. Use the moon theme and add slide numbers.
