Install R and RStudio, manage packages, and write your first R script.
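R is the language; RStudio is the integrated development environment (IDE) that makes working with R far more productive. Install them in order: R first (from CRAN), then RStudio Desktop (from Posit), since RStudio needs an existing R installation to run.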
After installation, open RStudio. You will see four panes: Source (top-left), Console (bottom-left), Environment (top-right), and Files/Plots/Help (bottom-right).
Always work inside an RStudio Project (.Rproj). It sets your working directory automatically and keeps files organized. Create one via File > New Project.
You can type R commands directly in the Console for quick exploration. For reproducible work, write code in a script (.R file) and run lines with Ctrl+Enter (Windows/Linux) or Cmd+Enter (macOS).

    # This is a comment: R ignores everything after #
    print("Hello, R!")

    # Basic arithmetic
    2 + 3     # 5
    10 / 3    # 3.333...
    2 ^ 10    # 1024
    17 %% 5   # 2 (modulo)

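Two further details round out the arithmetic shown here: integer division (%/%) pairs with modulo (%%), and exponentiation binds more tightly than a leading minus sign. The sketch below illustrates both; the variable names are illustrative only.

```r
# Integer division and modulo are complementary:
# a == b * (a %/% b) + (a %% b)
a <- 17
b <- 5
quotient  <- a %/% b              # integer division: 3
remainder <- a %% b               # modulo: 2
b * quotient + remainder == a     # TRUE

# Exponentiation is evaluated before unary minus
neg_sq <- -2 ^ 2                  # parsed as -(2 ^ 2), so -4
pos_sq <- (-2) ^ 2                # 4
```

When in doubt about precedence, add parentheses; they cost nothing and make the intent explicit.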
Use the <- operator (preferred) or = to assign values to variables. R has several basic data types:

    # Assignment
    x <- 42            # numeric (double)
    name <- "Alice"    # character
    flag <- TRUE       # logical
    count <- 5L        # integer (note the L suffix)

    # Check types
    class(x)           # "numeric"
    class(name)        # "character"
    is.logical(flag)   # TRUE
    typeof(count)      # "integer"

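These types interact in ways that can surprise newcomers. A vector holds one type only, so mixing types triggers automatic coercion. The following sketch shows the most common cases; the variable names are illustrative.

```r
# Vectors hold a single type, so mixed values are coerced
mixed <- c(1, "a", TRUE)
class(mixed)                  # "character"

# Integers count as numeric, but their storage type differs
count <- 5L
is.numeric(count)             # TRUE
typeof(count + 1)             # "double"  (1 is a double)
typeof(count + 1L)            # "integer"

# Explicit conversion; text that is not a number becomes NA (with a warning)
parsed_ok  <- as.numeric("3.14")
parsed_bad <- suppressWarnings(as.numeric("abc"))
is.na(parsed_bad)             # TRUE
```

Checking class() or typeof() right after reading data is a cheap way to catch a numeric column that arrived as text.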
<- is the R convention for assignment. It keeps assignment visually distinct from =, which names arguments inside function calls, e.g., rnorm(n = 10).
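The difference is easiest to see inside a function call. In the sketch below (variable names are illustrative), = only names an argument, while <- creates a variable as a side effect. The checks use inherits = FALSE so that only the current environment is inspected.

```r
# Inside a call, = names an argument and creates no variable
result_named <- mean(x = c(1, 2, 3))
exists("x", inherits = FALSE)     # FALSE in a fresh session

# Inside a call, <- assigns first, then passes the value along
result_assigned <- mean(y <- c(4, 5, 6))
exists("y", inherits = FALSE)     # TRUE: y now holds c(4, 5, 6)
```

Assigning inside a call is legal but rarely a good idea; the example is only meant to show why the two operators are not interchangeable.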
R's power comes from its package ecosystem. Packages are installed once from CRAN and loaded each session:

    # Install a package (run once)
    install.packages("tidyverse")

    # Load it into the current session
    library(tidyverse)

    # Check installed packages
    installed.packages()[, "Package"] |> head(10)

    # Update all packages
    update.packages(ask = FALSE)

install.packages() downloads a package from CRAN and installs it on disk; you do this once. library() loads an already-installed package into memory; you do this every session. Forgetting library() is one of the most common beginner errors.
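One way to keep a script from reinstalling packages on every run is to check first whether the package is available. The helper below, ensure_installed(), is a hypothetical name introduced for illustration, not part of R; it relies only on the base functions requireNamespace() and install.packages().

```r
# Hypothetical helper: install a package only if it is missing
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE if the package can now be loaded
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so nothing is downloaded here
ensure_installed("stats")

# pkg::fun() calls a function from an installed package without library()
stats::median(c(3, 1, 2))
```

The pkg::fun() form is handy when you need a single function and do not want to attach the whole package.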
Understanding the four RStudio panes will speed up your workflow considerably. Each pane serves a distinct purpose and can be resized or rearranged via Tools > Global Options > Pane Layout.
| Pane | Location | Purpose | Key Tabs |
|---|---|---|---|
| Source | Top-left | Write and edit scripts, R Markdown, and Quarto documents | Script files, data viewer |
| Console | Bottom-left | Interactive R session: run commands one at a time | Console, Terminal, Background Jobs |
| Environment | Top-right | View objects in memory (variables, data frames, functions) | Environment, History, Connections, Tutorial |
| Output | Bottom-right | Browse files, view plots, read help, manage packages | Files, Plots, Packages, Help, Viewer |
Many students wonder whether to learn R or Python. The honest answer: learn both eventually, but R excels in statistical analysis and data visualization. Here is a side-by-side comparison:
| Feature | R | Python |
|---|---|---|
| Primary strength | Statistical computing, visualization | General-purpose programming, ML deployment |
| Indexing | 1-based | 0-based |
| Data frames | Built-in data.frame; tibble via the tibble package | Requires pandas |
| Visualization | ggplot2 (grammar of graphics) | matplotlib, seaborn, plotly |
| Statistical modeling | lm(), glm(), 18,000+ CRAN packages | statsmodels, scikit-learn |
| IDE | RStudio (purpose-built) | VS Code, PyCharm, Jupyter |
| Package manager | install.packages() from CRAN | pip install from PyPI |
| Assignment | <- (convention) | = |
| Pipe operator | \|> or %>% | Method chaining with . |
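The indexing and pipe rows are the ones most likely to trip up readers coming from Python. The short sketch below, using an illustrative vector, shows how 1-based indexing and the native pipe behave in R.

```r
scores <- c(10, 20, 30, 40)

scores[1]            # first element: 10 (Python would use scores[0])
scores[-1]           # drops the first element: 20 30 40
scores[2:3]          # ranges include both ends: 20 30
length(scores[0])    # 0: index 0 gives an empty vector, not an error

# Native pipe (R >= 4.1): the left side becomes the first argument
piped  <- scores |> rev() |> head(2)
nested <- head(rev(scores), 2)
identical(piped, nested)   # TRUE
```

Note that a negative index in R removes elements, whereas in Python it counts from the end.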
An RStudio Project (.Rproj file) anchors your working directory to the project folder. This means you can use relative paths (e.g., "data/survey.csv") instead of absolute paths (e.g., "C:/Users/Alice/Desktop/project/data/survey.csv"), making your code portable across computers.

    # Check your current working directory
    getwd()

    # Set it manually (avoid this; use .Rproj instead)
    setwd("/path/to/my/project")

    # Recommended project structure:
    # my_project/
    #   my_project.Rproj
    #   data/      <- raw and processed data
    #   scripts/   <- R scripts
    #   output/    <- tables, figures
    #   docs/      <- notes, reports

setwd("C:/Users/Alice/...") will break on anyone else's computer. Use an .Rproj file and relative paths. If you must reference a path, use the here package: here::here("data", "survey.csv").
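The recommended layout can also be created from R itself. The sketch below uses only base functions and builds the folders inside a temporary directory so it is safe to run anywhere; project_root and csv_path are illustrative names, and in a real project the root would be the folder containing your .Rproj file.

```r
# Build the recommended layout inside a throwaway directory
project_root <- file.path(tempdir(), "my_project")
for (sub in c("data", "scripts", "output", "docs")) {
  dir.create(file.path(project_root, sub), recursive = TRUE, showWarnings = FALSE)
}

# file.path() joins path parts with "/" on every platform
csv_path <- file.path("data", "survey.csv")

# A relative path is resolved against the working directory,
# which an .Rproj file sets to the project root for you
file.exists(file.path(project_root, csv_path))   # FALSE until the file is added
list.dirs(project_root, full.names = FALSE, recursive = FALSE)
```

Building paths with file.path() rather than pasting strings avoids hard-coded separators.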
R provides several native formats for saving objects. Understanding when to use each one will save you time.

    # Make sure the output folder exists before writing to it
    dir.create("output", showWarnings = FALSE)

    # Save a single object as .rds (recommended)
    my_data <- data.frame(x = 1:5, y = rnorm(5))
    saveRDS(my_data, file = "output/my_data.rds")

    # Load it back (assign to any name)
    loaded_data <- readRDS("output/my_data.rds")

    # Save the entire workspace as .RData (all objects)
    save.image(file = "my_workspace.RData")

    # Load workspace (restores all saved objects)
    load("my_workspace.RData")

    # Save specific objects
    save(my_data, file = "selected_objects.RData")

saveRDS()/readRDS() save a single object and let you assign it to any variable name on load. save()/load() preserve the original variable names, so load() can silently overwrite objects already in your environment. For reproducible workflows, .rds is safer.
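The overwrite risk is easy to demonstrate. The sketch below writes to temporary files so it leaves nothing behind; the object and file names are illustrative.

```r
rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")

scores <- c(1, 2, 3)
saveRDS(scores, rds_file)
save(scores, file = rdata_file)

# Later in the session, the same name holds something else
scores <- c(100, 200)

# readRDS() returns the object, so you choose the name
old_scores <- readRDS(rds_file)
scores                       # still 100 200

# load() restores the saved name and replaces the current value
restored <- load(rdata_file)
restored                     # "scores": load() returns the names it restored
scores                       # now 1 2 3
```

Capturing the return value of load(), as done here with restored, is one way to see exactly which names were written into your environment.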
Most R packages live on CRAN (the Comprehensive R Archive Network), but some cutting-edge or in-development packages are only available on GitHub. You can install those with the remotes package.

    # Standard CRAN install
    install.packages("dplyr")

    # Install from GitHub (development version)
    install.packages("remotes")   # one-time setup
    remotes::install_github("tidyverse/dplyr")

    # Install from GitHub with a specific branch or tag
    remotes::install_github("user/repo@v2.0.0")
    remotes::install_github("user/repo", ref = "develop")

    # Check package version
    packageVersion("dplyr")

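packageVersion() is more useful than it first appears, because the value it returns can be compared directly. The sketch below uses the stats package, which ships with R, so it runs without installing anything; min_version is an assumed threshold chosen purely for illustration.

```r
# packageVersion() returns a version object that compares correctly
v <- packageVersion("stats")
v >= "3.0.0"                                            # TRUE on any modern R

# Versions compare component by component, not alphabetically
package_version("1.10.0") > package_version("1.9.0")    # TRUE
"1.10.0" > "1.9.0"                                      # FALSE: plain string comparison

# Hypothetical guard: stop early if a dependency is too old
min_version <- "3.0.0"
if (packageVersion("stats") < min_version) {
  stop("Please update the package before running this script")
}
```

A guard like this at the top of a script turns a confusing downstream failure into a clear message.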
Every R beginner encounters the same handful of errors. Knowing what they mean will save hours of frustration.
| Error Message | Likely Cause | Fix |
|---|---|---|
| could not find function "xyz" | Package not loaded | Run library(package_name) |
| object 'xyz' not found | Variable not defined or typo | Check spelling, run earlier code |
| unexpected ')' or unexpected '}' | Mismatched parentheses or braces | Count opening/closing brackets |
| non-numeric argument to binary operator | Math on character data | Check class() of your variables |
| there is no package called 'xyz' | Package not installed | Run install.packages("xyz") |
| replacement has X rows, data has Y | Vector length mismatch in assignment | Verify lengths with length() |
| Console shows + instead of > | Incomplete expression (missing quote or paren) | Press Esc and fix the line |
If the Console shows + instead of >, R is waiting for you to finish a command. This usually means you forgot a closing parenthesis, bracket, or quote. Press Esc to cancel and re-examine your code.
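Several of the errors in the table can be reproduced safely with tryCatch(), which captures an error instead of stopping the script. The helper error_message() below is a hypothetical name introduced for this sketch; the exact wording of the messages may vary with your R version and language settings.

```r
# Hypothetical helper: run an expression and return its error message
error_message <- function(expr) {
  tryCatch(
    {
      expr            # forces evaluation of the argument
      "no error"
    },
    error = function(e) conditionMessage(e)
  )
}

error_message(undefined_variable)         # object ... not found
error_message("5" + 1)                    # non-numeric argument to binary operator
error_message(undefined_function(1))      # could not find function ...
error_message(sqrt(16))                   # "no error"

# Syntax errors occur when code is parsed, so parse the text explicitly
error_message(parse(text = "mean(1))"))   # mentions an unexpected ')'
```

Reading the captured message next to the table row it matches is a quick way to learn what each error looks like in practice.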

    # Help on a function
    ?mean
    help("mean")

    # Search for a topic
    ??"linear model"

    # See function arguments
    args(lm)

    # Browse vignettes for a package
    vignette(package = "dplyr")

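Beyond the help pages, a few base functions let you inspect functions programmatically. The sketch below is one possible way to explore an unfamiliar function; it uses formals(), apropos(), and exists(), all of which come with a standard R installation.

```r
# formals() returns a function's arguments together with their defaults
defaults <- formals(rnorm)
names(defaults)           # "n" "mean" "sd"
defaults$mean             # 0
defaults$sd               # 1

# apropos() lists visible objects whose names match a pattern
apropos("^read\\.csv")    # includes "read.csv" and "read.csv2"

# exists() checks whether a name is defined before you call it
exists("median")          # TRUE
```

apropos() is particularly useful when you remember part of a function name but not the whole thing.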

    # my_first_script.R
    # A quick taste of R: generate data, compute stats, plot

    # Generate 200 random normal values
    set.seed(2024)
    data <- rnorm(n = 200, mean = 50, sd = 10)

    # Summary statistics
    mean(data)
    sd(data)
    summary(data)

    # Quick histogram
    hist(data,
         col = "#276DC3",
         main = "Simulated Data",
         xlab = "Value",
         breaks = 20)

Create an RStudio Project called r_guide_practice. Inside it, write a script that:
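- Stores your name in my_name
- Stores your birth year in birth_year
- Computes age from birth_year and prints it with cat("Age:", age, "\n")

Install the palmerpenguins package and load it. Run head(penguins) to preview the data. How many rows and columns does it have? (Use dim(penguins).)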
Create an organized project folder called ch1_practice with subdirectories data/, scripts/, and output/. In your script, create a data frame with 5 rows and 3 columns of your choice. Save it as an .rds file in the output/ folder, then read it back and verify the contents match using identical().
Intentionally trigger three different errors from the troubleshooting table above (e.g., call a function from an unloaded package, reference an undefined variable, and leave a parenthesis open). For each, copy the error message and write a one-sentence explanation of what went wrong and how you fixed it.
- Use <- for assignment. The basic types introduced here are numeric, character, logical, and integer.
- install.packages() downloads; library() loads. You need both.
- Use ?function_name and vignettes to learn any package.