Build layered, publication-quality graphics using the grammar of graphics.
Every ggplot2 plot follows a consistent template: data + aesthetic mappings + geometric objects. Additional layers control scales, facets, coordinate systems, and themes.
library(ggplot2)

# Template
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +
  <GEOM_FUNCTION>()
# mpg dataset: engine displacement vs. highway mpg
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Map color and shape to categorical variables
ggplot(mpg, aes(x = displ, y = hwy, color = drv, shape = drv)) +
  geom_point(size = 2.5, alpha = 0.7)

# Add a linear trend line (method = "lm") with its confidence band
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE)
# Time series with economics dataset
ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "#276DC3", linewidth = 0.8) +
  labs(title = "US Unemployment Over Time",
       x = "Date", y = "Unemployed (thousands)")
# geom_bar() counts observations
ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "#276DC3")

# geom_col() plots precomputed values
avg_mpg <- mpg |>
  dplyr::group_by(class) |>
  dplyr::summarize(avg_hwy = mean(hwy))

ggplot(avg_mpg, aes(x = reorder(class, avg_hwy), y = avg_hwy)) +
  geom_col(fill = "#276DC3") +
  coord_flip() +
  labs(x = "Vehicle Class", y = "Mean Highway MPG")
# Histogram
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, fill = "#276DC3", color = "white")

# Density overlay by group
ggplot(mpg, aes(x = hwy, fill = drv)) +
  geom_density(alpha = 0.4)
Facets split a plot into panels by one or two categorical variables.
# facet_wrap: one variable
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv, nrow = 1)

# facet_grid: two variables
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
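By default, every panel shares the same axes. As a small sketch of a common variation, the scales argument of facet_wrap() lets each panel pick its own range. The choice of "free_y" and the object name p_free are illustrative assumptions, not part of the examples above.

```r
library(ggplot2)

# Each panel gets its own y-axis range; the x-axis stays shared
p_free <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv, nrow = 1, scales = "free_y")

p_free
```

Free scales make within-panel patterns easier to see but make it harder to compare values across panels, so use them deliberately.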
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point(size = 2)

p +
  labs(
    title = "Engine Size vs. Fuel Efficiency",
    subtitle = "Data from EPA, 1999-2008",
    x = "Displacement (L)",
    y = "Highway MPG",
    color = "Vehicle Class",
    caption = "Source: mpg dataset"
  ) +
  theme_minimal(base_size = 13)
Other built-in themes include theme_classic(), theme_bw(), and theme_light(); install the ggthemes package for more options such as theme_economist().
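As a rough sketch of how a complete theme combines with individual tweaks, the example below applies theme_bw() and then overrides two elements with theme(). The object names base_plot and styled_plot, and the particular elements changed, are illustrative choices.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# Apply a complete theme first, then adjust single elements with theme()
styled_plot <- base_plot +
  theme_bw(base_size = 12) +
  theme(
    legend.position = "bottom",
    panel.grid.minor = element_blank()
  )

styled_plot
```

Order matters here: a complete theme such as theme_bw() replaces all earlier theme settings, so add theme() adjustments after it.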
geom_smooth() and stat_smooth() are effectively aliases that take the same arguments; both add fitted regression lines or loess curves to scatterplots. Controlling the method, formula, and confidence interval gives you publication-ready overlays.
# Linear regression line with 95% confidence band
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  stat_smooth(method = "lm", formula = y ~ x, color = "red")

# Polynomial fit
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), color = "blue")

# Separate regression lines by group
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.5) +
  stat_smooth(method = "lm", se = FALSE)
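The smoothing layer only draws the fit; it does not report coefficients. One way to inspect them, sketched here, is to fit the same model with lm() and compare its predictions with the values ggplot2 computed for the layer. The names p_fit, fit, smooth_data, and model_pred are illustrative, and the sketch assumes the smooth is the second layer of the plot.

```r
library(ggplot2)

p_fit <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  stat_smooth(method = "lm", formula = y ~ x)

# Fit the same model directly to read off the intercept and slope
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)

# Layer 2 is the smooth; layer_data() returns the values ggplot2 computed
smooth_data <- layer_data(p_fit, 2)

# Predict from the lm() fit at the same x positions and compare
model_pred <- predict(fit, newdata = data.frame(displ = smooth_data$x))
all.equal(unname(model_pred), smooth_data$y)
```

If you need the equation or R-squared on the plot itself, compute it from fit and add it with annotate().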
Scale functions control how data values map to visual properties. The naming pattern is scale_{aesthetic}_{type}().
# Continuous axis: custom breaks and labels
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_continuous(
    breaks = seq(1, 7, by = 1),
    limits = c(1, 7)
  ) +
  scale_y_continuous(
    breaks = seq(10, 50, by = 10),
    labels = function(x) paste0(x, " mpg")
  )

# Discrete scale: relabel categories (use the limits argument to reorder them)
ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  scale_x_discrete(labels = c("4" = "4WD", "f" = "Front", "r" = "Rear"))

# Log scale (useful for skewed data like income or counts)
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.1) +
  scale_x_log10() +
  scale_y_log10()
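The scale_{aesthetic}_{type}() pattern applies to non-position aesthetics too. The sketch below pairs color with a manual scale and size with a continuous one. The hex colors, size range, and the name p_scales are illustrative assumptions.

```r
library(ggplot2)

# scale_color_manual(): aesthetic = color, type = manual
# scale_size_continuous(): aesthetic = size, type = continuous
p_scales <- ggplot(mpg, aes(x = displ, y = hwy, color = drv, size = cyl)) +
  geom_point(alpha = 0.6) +
  scale_color_manual(
    values = c("4" = "#1b9e77", "f" = "#d95f02", "r" = "#7570b3"),
    labels = c("4" = "4WD", "f" = "Front", "r" = "Rear")
  ) +
  scale_size_continuous(range = c(1, 4), breaks = c(4, 6, 8))

p_scales
```

Naming the values vector ties each color to a specific level, so the mapping stays stable even if a level is filtered out of the data.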
Box plots show the distribution of a continuous variable by groups. Adding jittered points reveals the raw data underneath.
# Boxplot with jittered raw points
ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(fill = "#dbeafe", outlier.shape = NA) +
  geom_jitter(width = 0.2, alpha = 0.4, color = "#276DC3") +
  coord_flip() +
  labs(title = "Highway MPG by Vehicle Class", x = NULL, y = "Highway MPG") +
  theme_minimal()
In ggplot2 3.3.0 and later, you can map the grouping variable to y directly instead of using coord_flip().
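A minimal sketch of mapping the grouping variable to y instead of calling coord_flip(), assuming ggplot2 3.3.0 or later. The name p_horizontal and the jitter settings are illustrative.

```r
library(ggplot2)

# class on y and hwy on x gives horizontal boxes without coord_flip()
p_horizontal <- ggplot(mpg, aes(x = hwy, y = class)) +
  geom_boxplot(fill = "#dbeafe", outlier.shape = NA) +
  # Jitter only along the categorical axis so hwy values are not shifted
  geom_jitter(height = 0.2, width = 0, alpha = 0.4, color = "#276DC3") +
  labs(title = "Highway MPG by Vehicle Class", x = "Highway MPG", y = NULL) +
  theme_minimal()

p_horizontal
```

With this approach the axis labels refer to the axes as drawn, which avoids the x/y swap that coord_flip() introduces in labs() and theme() calls.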
Heatmaps are excellent for visualizing correlation matrices, cross-tabulations, or any grid of values.
# Correlation heatmap
library(dplyr)
library(tidyr)

# Compute the correlation matrix once, then reshape it to long format
cor_data <- mtcars |>
  select(mpg, hp, wt, qsec, disp) |>
  cor() |>
  as.data.frame() |>
  tibble::rownames_to_column("var1") |>
  pivot_longer(-var1, names_to = "var2", values_to = "r")

ggplot(cor_data, aes(x = var1, y = var2, fill = r)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "#d73027", mid = "white", high = "#276DC3", midpoint = 0) +
  geom_text(aes(label = round(r, 2)), size = 3.5) +
  labs(title = "Correlation Matrix", x = NULL, y = NULL) +
  theme_minimal()
Use annotate() to add text, rectangles, arrows, or other elements that are not mapped to data.
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.5) +
  annotate("text", x = 6, y = 40, label = "Outlier region",
           color = "red", size = 4) +
  annotate("rect", xmin = 5, xmax = 7, ymin = 35, ymax = 45,
           alpha = 0.1, fill = "red") +
  annotate("segment", x = 5.5, xend = 5, y = 44, yend = 44,
           arrow = arrow(length = unit(0.2, "cm")), color = "red")
The patchwork package makes it simple to arrange multiple ggplot objects into a single figure, replacing the need for grid.arrange() or cowplot.
# install.packages("patchwork")
library(patchwork)

p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  ggtitle("Scatter")
p2 <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, fill = "#276DC3") +
  ggtitle("Histogram")
p3 <- ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "#276DC3") +
  ggtitle("Bar")

# Side by side
p1 | p2

# Stacked
p1 / p2

# Complex layout with annotation
(p1 | p2) / p3 +
  plot_annotation(
    title = "MPG Dataset Overview",
    tag_levels = "A"  # auto-labels panels A, B, C
  )
| places plots side by side. / stacks them vertically. + adds them to a layout grid. Use plot_layout(widths = c(2, 1)) to control relative sizes. plot_annotation() adds titles, subtitles, and panel tags.
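As a brief sketch of plot_layout(), the example below makes the left panel twice as wide as the right one. The plot names p_scatter, p_hist, and combined are illustrative.

```r
library(ggplot2)
library(patchwork)

p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, fill = "#276DC3")

# widths are relative: the first column gets 2 parts, the second gets 1
combined <- (p_scatter | p_hist) + plot_layout(widths = c(2, 1))

combined
```

The parentheses around the two plots keep plot_layout() applied to the whole composition rather than to the last plot only.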
Approximately 8% of men have some form of color vision deficiency. The viridis scales are designed to stay distinguishable under the common forms, so using them helps keep your plots interpretable for all readers.
# Continuous viridis scale
ggplot(diamonds |> dplyr::sample_n(2000),
       aes(x = carat, y = price, color = depth)) +
  geom_point(alpha = 0.6) +
  scale_color_viridis_c() +
  theme_minimal()

# Discrete viridis scale for categorical data
ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(size = 2) +
  scale_color_viridis_d(option = "D") +  # options: A-H
  theme_minimal()

# Other color-blind safe palettes
# scale_color_brewer(palette = "Set2")   # ColorBrewer
# scale_color_manual(values = c(...))    # fully custom
Prefer viridis, ColorBrewer qualitative palettes, or adding shape/linetype as a redundant encoding.
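A short sketch of redundant encoding: the same variable is mapped to both color and shape, so groups stay distinguishable even when the colors are hard to tell apart or the figure is printed in grayscale. The name p_redundant and the end = 0.9 setting (which trims the palest yellow) are illustrative choices.

```r
library(ggplot2)

# drv drives both color and shape, so either cue alone identifies the group
p_redundant <- ggplot(mpg, aes(x = displ, y = hwy, color = drv, shape = drv)) +
  geom_point(size = 2.5) +
  scale_color_viridis_d(end = 0.9) +
  theme_minimal()

p_redundant
```

Because both aesthetics map to the same variable, ggplot2 merges them into a single legend.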
The plotly package can convert most ggplot objects into an interactive, zoomable HTML widget with a single function call. This is invaluable for exploratory analysis and presentations.
# install.packages("plotly")
library(plotly)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = class,
                     text = paste("Model:", model))) +
  geom_point(size = 2) +
  theme_minimal()

# Convert to interactive: hover to see tooltips
ggplotly(p, tooltip = c("text", "x", "y"))
Use the text aesthetic in aes() with paste() to build informative tooltips. Then pass tooltip = "text" to ggplotly() to show only that text on hover. This is especially useful when each point represents a specific entity (a company, a patient, a country).
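A minimal sketch of a text-only tooltip, assuming plotly is installed. The tooltip wording and the names p_tip and widget are illustrative; plotly renders the br tag as a line break inside the hover label.

```r
library(ggplot2)
library(plotly)

# Build the tooltip string once per point via the text aesthetic
p_tip <- ggplot(mpg, aes(x = displ, y = hwy,
                         text = paste0("Model: ", model,
                                       "<br>Highway MPG: ", hwy))) +
  geom_point()

# tooltip = "text" hides the default x/y entries and shows only the custom text
widget <- ggplotly(p_tip, tooltip = "text")

widget
```

Keeping the tooltip to a single custom string avoids duplicated values when the same variable is already part of the text.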
# Save the last plot
ggsave("my_plot.png", width = 8, height = 5, dpi = 300)

# Save a specific plot object
ggsave("my_plot.pdf", plot = p, width = 8, height = 5)
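As a hedged sketch of a more explicit call, the example below names the plot object and the units and writes to a temporary file so nothing is left in your project folder. The file name scatter_example.png and the object names are illustrative.

```r
library(ggplot2)

p_save <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Write to the session's temporary directory
out_file <- file.path(tempdir(), "scatter_example.png")

# 8 x 5 inches at 300 dpi corresponds to a 2400 x 1500 pixel image
ggsave(out_file, plot = p_save, width = 8, height = 5, units = "in", dpi = 300)

file.exists(out_file)
```

Passing plot explicitly is safer in scripts than relying on the last plot displayed, since intermediate plots can change what "last" means.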
Using the diamonds dataset, create a scatterplot of carat (x) vs. price (y) colored by cut. Add geom_smooth(method = "lm"), apply theme_minimal(), and save it as a PNG at 300 DPI.
Build a bar chart showing the count of vehicles per manufacturer in the mpg dataset. Sort bars from most to least using fct_infreq() from the forcats package. Use coord_flip() for readability.
Create a correlation heatmap using geom_tile() for the numeric columns of the mtcars dataset. Use scale_fill_gradient2() with a diverging palette centered at zero. Add correlation coefficient labels with geom_text(). Which pair of variables has the strongest positive correlation? The strongest negative?
Using the patchwork package, create a three-panel figure from mpg: (A) a boxplot of hwy by drv with jittered points, (B) a density plot of hwy colored by drv, and (C) a scatterplot of displ vs. hwy with regression lines by drv. Arrange them as two on top, one on bottom, with automatic panel tags and a shared title. Use a color-blind friendly palette throughout. Save the result as a PDF at 10 x 7 inches.
Build plots by adding layers with +.
Map variables to aesthetics inside aes(); set constants outside.
facet_wrap and facet_grid create small multiples for group comparisons.
Use labs() for labels and a theme_*() for consistent styling.
ggsave() exports plots in common formats (PNG, PDF, SVG) at a specified size and resolution.