Chapter 1: Getting Started with R

Install R and RStudio, manage packages, and write your first R script.

1.1 Installing R and RStudio

R is the language; RStudio is the integrated development environment (IDE) that makes working with R far more productive. Install them in order:

  1. Download and install R from cran.r-project.org
  2. Download RStudio Desktop (free) from posit.co

After installation, open RStudio. You will see four panes: Source (top-left), Console (bottom-left), Environment (top-right), and Files/Plots/Help (bottom-right).

Tip: RStudio Projects. Always work inside an RStudio Project (.Rproj). It sets your working directory automatically and keeps files organized. Create one via File > New Project.

1.2 The Console and Scripts

You can type R commands directly in the Console for quick exploration. For reproducible work, write code in a script (.R file) and run lines with Ctrl+Enter (Windows/Linux) or Cmd+Enter (macOS).

# This is a comment — R ignores everything after #
print("Hello, R!")

# Basic arithmetic
2 + 3       # 5
10 / 3      # 3.333...
2 ^ 10      # 1024
17 %% 5     # 2  (modulo)

1.3 Assignment and Basic Types

Use the <- operator (preferred) or = to assign values to variables. R has several basic data types:

# Assignment
x <- 42              # numeric (double)
name <- "Alice"      # character
flag <- TRUE         # logical
count <- 5L           # integer (note the L suffix)

# Check types
class(x)       # "numeric"
class(name)    # "character"
is.logical(flag)  # TRUE
typeof(count)  # "integer"

Why <- instead of =? Both work for assignment, but <- is the R convention. It avoids confusion with =, which also names arguments inside function calls, e.g., rnorm(n = 10).
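
The difference is easiest to see inside a function call. The following is a minimal sketch in base R; the variable names draws, draws2, and n_draws are illustrative and do not come from the chapter.

```r
# Inside a call, '=' matches a value to the argument named n.
# It does not create a variable called n in your workspace.
draws <- rnorm(n = 3)
length(draws)        # 3

# Inside a call, '<-' first creates the variable n_draws,
# then passes its value as the first positional argument.
draws2 <- rnorm(n_draws <- 3)
n_draws              # 3
```

Writing an assignment inside a call, as in the second example, is legal but hard to read. Assign on its own line and reserve = for naming arguments.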

1.4 Installing and Loading Packages

R's power comes from its package ecosystem. Packages are installed once from CRAN and loaded each session:

# Install a package (run once)
install.packages("tidyverse")

# Load it into the current session
library(tidyverse)

# Check installed packages
installed.packages()[, "Package"] |> head(10)

# Update all packages
update.packages(ask = FALSE)

Common gotcha: install.packages() downloads a package and installs it on your computer. library() loads an already-installed package into the current session. Forgetting library() is one of the most common beginner errors.
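
One way to keep the two steps apart is to check whether a package is installed before deciding what to do. The sketch below uses base R only; is_installed() is a hypothetical helper written for this example, not a built-in function.

```r
# requireNamespace() reports whether a package is installed
# without attaching it to the search path.
# is_installed() is an illustrative helper, not part of base R.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")        # TRUE: stats ships with every R installation

# The pkg::fun() form calls a function from an installed package
# without a prior library() call.
stats::median(c(3, 1, 2))    # 2

# Install only when missing (commented out so the sketch runs offline):
# if (!is_installed("tidyverse")) install.packages("tidyverse")
```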

1.5 RStudio Panes Walkthrough

Understanding the four RStudio panes will speed up your workflow considerably. Each pane serves a distinct purpose and can be resized or rearranged via Tools > Global Options > Pane Layout.

Pane         Location      Purpose                                                     Key Tabs
-----------  ------------  ----------------------------------------------------------  -------------------------------------------
Source       Top-left      Write and edit scripts, R Markdown, and Quarto documents    Script files, data viewer
Console      Bottom-left   Interactive R session: run commands one at a time           Console, Terminal, Background Jobs
Environment  Top-right     View objects in memory (variables, data frames, functions)  Environment, History, Connections, Tutorial
Output       Bottom-right  Browse files, view plots, read help, manage packages        Files, Plots, Packages, Help, Viewer

Tip: Keyboard shortcuts. Press Ctrl+1 to jump to the Source pane and Ctrl+2 for the Console. Ctrl+Shift+M inserts the pipe operator. See all shortcuts with Alt+Shift+K (Windows/Linux) or Option+Shift+K (macOS).

1.6 R vs. Python: A Quick Comparison

Many students wonder whether to learn R or Python. The honest answer: learn both eventually, but R excels in statistical analysis and data visualization. Here is a side-by-side comparison:

Feature               R                                        Python
--------------------  ---------------------------------------  ------------------------------------------
Primary strength      Statistical computing, visualization     General-purpose programming, ML deployment
Indexing              1-based                                  0-based
Data frames           Built-in data.frame; tibble via package  Requires pandas
Visualization         ggplot2 (grammar of graphics)            matplotlib, seaborn, plotly
Statistical modeling  lm(), glm(), 18,000+ CRAN packages       statsmodels, scikit-learn
IDE                   RStudio (purpose-built)                  VS Code, PyCharm, Jupyter
Package manager       install.packages() from CRAN             pip install from PyPI
Assignment            <- (convention)                          =
Pipe operator         |> or %>%                                Method chaining with .

When to use which? For coursework in statistics, econometrics, or research methods, R is typically the better choice. For building web applications, working with APIs, or deploying machine learning models in production, Python is more common. Many researchers use both: R for analysis and visualization, Python for automation and deployment.
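
Three of the R traits from the comparison can be tried in a few lines. This is a minimal sketch in base R with made-up values. It assumes R 4.1 or later for the native pipe, and it shows only the R side of each row.

```r
# 1-based indexing: the first element is at position 1
v <- c(10, 20, 30)
v[1]                    # 10

# Data frames need no extra package
df <- data.frame(id = 1:3, score = c(88, 92, 79))
nrow(df)                # 3

# The native pipe passes the left-hand side as the first argument
top <- df$score |> sort(decreasing = TRUE) |> head(2)
top                     # 92 88
```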

1.7 Project Organization with .Rproj Files

An RStudio Project (.Rproj file) anchors your working directory to the project folder. This means you can use relative paths (e.g., "data/survey.csv") instead of absolute paths (e.g., "C:/Users/Alice/Desktop/project/data/survey.csv"), making your code portable across computers.

# Check your current working directory
getwd()

# Set it manually (avoid this — use .Rproj instead)
setwd("/path/to/my/project")

# Recommended project structure:
# my_project/
#   my_project.Rproj
#   data/           <- raw and processed data
#   scripts/        <- R scripts
#   output/         <- tables, figures
#   docs/           <- notes, reports

Never use setwd() in shared scripts. Hardcoded paths like setwd("C:/Users/Alice/...") will break on anyone else's computer. Use an .Rproj file and relative paths. If you must reference a path, use the here package: here::here("data", "survey.csv").
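
The recommended layout can also be created from R itself. The sketch below builds it inside a temporary folder so it touches nothing on your machine. The folder name my_project and the file survey.csv are taken from the examples above, while the data written to the file is invented. It uses base R's file.path() rather than the here package so that it runs without extra installs.

```r
# Work in a throwaway location; in a real project the .Rproj file
# already anchors the working directory for you.
project_dir <- file.path(tempdir(), "my_project")

# Create the recommended subfolders
for (sub in c("data", "scripts", "output", "docs")) {
  dir.create(file.path(project_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# file.path() joins path components portably across operating systems
csv_path <- file.path(project_dir, "data", "survey.csv")
write.csv(data.frame(id = 1:3), csv_path, row.names = FALSE)

file.exists(csv_path)      # TRUE
list.files(project_dir)    # "data" "docs" "output" "scripts"
```

Inside an actual RStudio Project, the same file would be read with the relative path "data/survey.csv".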

1.8 Saving and Loading Data

R provides several native formats for saving objects. Understanding when to use each one will save you time.

# Save a single object as .rds (recommended)
my_data <- data.frame(x = 1:5, y = rnorm(5))
dir.create("output", showWarnings = FALSE)   # saveRDS() fails if the folder is missing
saveRDS(my_data, file = "output/my_data.rds")

# Load it back (assign to any name)
loaded_data <- readRDS("output/my_data.rds")

# Save the entire workspace as .RData (all objects)
save.image(file = "my_workspace.RData")

# Load workspace (restores all saved objects)
load("my_workspace.RData")

# Save specific objects
save(my_data, file = "selected_objects.RData")

Prefer .rds over .RData: saveRDS() and readRDS() save and restore a single object, and let you assign it to any variable name on load. save() and load() preserve the original variable names, so load() can silently overwrite objects already in your environment. For reproducible workflows, .rds is safer.
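
The overwrite risk is easy to reproduce. The following is a small sketch that uses temporary files; the object name scores and its values are invented for the demonstration.

```r
rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")

scores <- c(1, 2, 3)
saveRDS(scores, rds_file)
save(scores, file = rdata_file)

# Later in the session the same name is reused for something else
scores <- "important new value"

# readRDS() returns the object; you pick the name, nothing is overwritten
old_scores <- readRDS(rds_file)
scores_before_load <- scores     # still "important new value"

# load() restores the saved name and replaces the current object without warning
load(rdata_file)
scores                           # 1 2 3
```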

1.9 CRAN vs. GitHub Packages

Most R packages live on CRAN (the Comprehensive R Archive Network), but some cutting-edge or in-development packages are only available on GitHub. You can install those with the remotes package.

# Standard CRAN install
install.packages("dplyr")

# Install from GitHub (development version)
install.packages("remotes")   # one-time setup
remotes::install_github("tidyverse/dplyr")

# Install from GitHub with a specific branch or tag
remotes::install_github("user/repo@v2.0.0")
remotes::install_github("user/repo", ref = "develop")

# Check package version
packageVersion("dplyr")

When to install from GitHub? Install from GitHub when you need a bug fix or feature that has not yet been released to CRAN. Be aware that GitHub versions may be less stable. For coursework and reproducible research, prefer CRAN versions unless you have a specific reason to use the development build.
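
Before reaching for a development build, it helps to check whether the installed version already contains the fix you need. The sketch below compares version numbers in base R. It uses the stats package only because stats ships with R, and the version strings are arbitrary examples.

```r
# packageVersion() returns a version object, not a plain string
stats_version <- packageVersion("stats")
stats_version >= "3.0.0"      # TRUE on any modern R installation

# Comparing plain strings goes wrong once a component reaches two digits
"1.10.0" > "1.9.0"                                     # FALSE (character comparison)

# Version objects compare component by component
package_version("1.10.0") > package_version("1.9.0")   # TRUE
```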

1.10 Common Errors and Troubleshooting

Every R beginner encounters the same handful of errors. Knowing what they mean will save hours of frustration.

Error Message or Symptom                 Likely Cause                                    Fix
---------------------------------------  ----------------------------------------------  --------------------------------
could not find function "xyz"            Package not loaded                              Run library(package_name)
object 'xyz' not found                   Variable not defined or typo                    Check spelling, run earlier code
unexpected ')' or unexpected '}'         Mismatched parentheses or braces                Count opening/closing brackets
non-numeric argument to binary operator  Math on character data                          Check class() of your variables
there is no package called 'xyz'         Package not installed                           Run install.packages("xyz")
replacement has X rows, data has Y       Vector length mismatch in assignment            Verify lengths with length()
Console shows + instead of >             Incomplete expression (missing quote or paren)  Press Esc and fix the line

The + prompt trap. If the console shows + instead of >, R is waiting for you to finish a command. This usually means you forgot a closing parenthesis, bracket, or quote. Press Esc to cancel and re-examine your code.
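
Several of these errors can be triggered safely and inspected with tryCatch(). The sketch below assumes an English-language R session, since messages are translated in other locales. show_error() is a hypothetical helper written for this example, and the undefined names are deliberately made up.

```r
# Return the error message produced by an expression, or NA if it succeeds.
# show_error() is an illustrative helper, not part of base R.
show_error <- function(expr) {
  tryCatch({
    expr              # forces evaluation of the lazily passed expression
    NA_character_
  }, error = function(e) conditionMessage(e))
}

show_error(undefined_variable_xyz + 1)
# "object 'undefined_variable_xyz' not found"

show_error(not_a_function_xyz())
# "could not find function \"not_a_function_xyz\""

show_error("5" + 1)
# "non-numeric argument to binary operator"

show_error(1 + 1)
# NA (no error)
```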

1.11 Getting Help

# Help on a function
?mean
help("mean")

# Search for a topic
??"linear model"

# See function arguments
args(lm)

# Browse vignettes for a package
vignette(package = "dplyr")
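
Help pages open in the Help tab, but some lookups return ordinary R values you can inspect in the Console. The sketch below shows a few base R functions for this; the search pattern is just an example.

```r
# apropos() lists objects on the search path whose names match a pattern
matches <- apropos("^read\\.csv")
matches                     # includes "read.csv" and "read.csv2"

# formals() returns a function's arguments and their defaults
names(formals(sd))          # "x" "na.rm"

# example() runs the examples from a help page (commented out: prints a lot)
# example(mean)
```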

1.12 Your First Complete Script

# my_first_script.R
# A quick taste of R: generate data, compute stats, plot

# Generate 200 random normal values
set.seed(2024)
data <- rnorm(n = 200, mean = 50, sd = 10)

# Summary statistics
mean(data)
sd(data)
summary(data)

# Quick histogram
hist(data, col = "#276DC3", main = "Simulated Data",
     xlab = "Value", breaks = 20)
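
The set.seed() call in the script makes its random numbers reproducible. The following is a minimal sketch of that behavior; the seed matches the script and the other names are illustrative.

```r
# Same seed, same draws
set.seed(2024)
first_run <- rnorm(5)

set.seed(2024)
second_run <- rnorm(5)

identical(first_run, second_run)   # TRUE

# Without resetting the seed, the generator continues and the draws differ
third_run <- rnorm(5)
identical(first_run, third_run)    # FALSE
```

Setting the seed once at the top of a script is usually enough for anyone rerunning it to get the same results.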

Exercises

Exercise 1.1

Create an RStudio Project called r_guide_practice. Inside it, write a script that:

  1. Assigns your name to a variable my_name
  2. Assigns your birth year to birth_year
  3. Computes your approximate age and prints it with cat("Age:", age, "\n")

Exercise 1.2

Install the palmerpenguins package and load it. Run head(penguins) to preview the data. How many rows and columns does it have? (Use dim(penguins).)

Exercise 1.3

Create an organized project folder called ch1_practice with subdirectories data/, scripts/, and output/. In your script, create a data frame with 5 rows and 3 columns of your choice. Save it as an .rds file in the output/ folder, then read it back and verify the contents match using identical().

Exercise 1.4

Intentionally trigger three different errors from the troubleshooting table above (e.g., call a function from an unloaded package, reference an undefined variable, and leave a parenthesis open). For each, copy the error message and write a one-sentence explanation of what went wrong and how you fixed it.
