Matplotlib and Seaborn
Matplotlib is Python's foundational plotting library. Seaborn builds on top of it with higher-level statistical graphics and better default aesthetics. Together they cover most visualization needs in analytics work.
```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Set a clean style
sns.set_theme(style="whitegrid", font_scale=1.1)
```
Every Matplotlib plot is built from two objects: the Figure (the overall window or page) and one or more Axes (individual plots inside the figure). Understanding this distinction is essential because the object-oriented interface gives you far more control than the plt.plot() shortcut.
```python
# The object-oriented approach: explicit fig and ax
fig, ax = plt.subplots(figsize=(8, 4))

# ax is a single Axes object; you call methods on it directly
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], marker="o")
ax.set_xlabel("Quarter")
ax.set_ylabel("Sales ($K)")
ax.set_title("Quarterly Sales")

# fig controls the overall canvas
fig.suptitle("Company Performance Dashboard", fontsize=14, y=1.02)

# Multiple axes: subplots returns an array
fig2, axes = plt.subplots(2, 3, figsize=(14, 8))

# axes is a 2D numpy array: axes[row, col]
print(f"axes shape: {axes.shape}")  # (2, 3)
axes[0, 0].set_title("Top-left plot")
axes[1, 2].set_title("Bottom-right plot")

plt.tight_layout()
plt.show()
```
The plt.plot() function is a convenience wrapper. It draws on the current Axes and, if no Figure or Axes exists yet, creates them behind the scenes. It works fine for quick, single plots. When you need multiple panels, precise layout control, or want to embed plots in a GUI, use the explicit fig, ax = plt.subplots() approach instead. Most production and library code uses the object-oriented API.
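Here is a minimal sketch of the difference. It reuses the monthly revenue numbers from the line-plot example. The pyplot call draws on whatever plt.gca() returns, while the object-oriented version holds explicit handles from the start. The variable names (lines_implicit, implicit_ax, and so on) are only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
revenue = [42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78]

# pyplot style: plt.plot() draws on the "current" Axes of the "current" Figure
plt.figure(figsize=(8, 4))
lines_implicit = plt.plot(months, revenue, marker="o")
implicit_ax = plt.gca()    # the Axes pyplot drew on
implicit_fig = plt.gcf()   # the Figure that owns that Axes

# Object-oriented style: the same chart, with explicit handles from the start
fig, ax = plt.subplots(figsize=(8, 4))
lines_explicit = ax.plot(months, revenue, marker="o")

# Each Axes points back to its Figure, and each Figure lists its Axes
print(ax.figure is fig)    # True
print(fig.axes == [ax])    # True
```

Every plotted line keeps a reference to its Axes (line.axes). So after a pyplot-style plot you can always recover the handles with plt.gca() and plt.gcf() and continue with the object-oriented methods.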
```python
months = np.arange(1, 13)
revenue = [42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78]

plt.figure(figsize=(8, 4))
plt.plot(months, revenue, marker="o", linewidth=2, color="#3776AB")
plt.xlabel("Month")
plt.ylabel("Revenue ($K)")
plt.title("Monthly Revenue Trend")
plt.xticks(months)
plt.tight_layout()
plt.show()
```
```python
np.random.seed(42)
ad_spend = np.random.uniform(10, 100, 50)
sales = 2.5 * ad_spend + np.random.normal(0, 20, 50)

plt.figure(figsize=(6, 5))
plt.scatter(ad_spend, sales, alpha=0.7, edgecolors="w", s=60)
plt.xlabel("Ad Spend ($K)")
plt.ylabel("Sales ($K)")
plt.title("Ad Spend vs. Sales")
plt.tight_layout()
plt.show()
```
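Seaborn's sns.regplot can add a fitted trend line automatically. The sketch below does it by hand with np.polyfit instead, regenerating the same synthetic ad-spend data. Because that data was built with a true slope of 2.5, the fitted slope should land close to that value.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same synthetic data as above: sales = 2.5 * ad_spend + noise
np.random.seed(42)
ad_spend = np.random.uniform(10, 100, 50)
sales = 2.5 * ad_spend + np.random.normal(0, 20, 50)

# Least-squares straight line: for deg=1, polyfit returns [slope, intercept]
slope, intercept = np.polyfit(ad_spend, sales, deg=1)

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(ad_spend, sales, alpha=0.7, edgecolors="w", s=60)

# Evaluate the fitted line across the observed range of ad spend
x_line = np.linspace(ad_spend.min(), ad_spend.max(), 100)
ax.plot(x_line, slope * x_line + intercept, color="red",
        label=f"Fit: y = {slope:.2f}x + {intercept:.1f}")

ax.set_xlabel("Ad Spend ($K)")
ax.set_ylabel("Sales ($K)")
ax.set_title("Ad Spend vs. Sales with Linear Fit")
ax.legend()
fig.tight_layout()
```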
```python
categories = ["Electronics", "Apparel", "Grocery", "Home"]
values = [340, 215, 520, 180]

plt.figure(figsize=(7, 4))
bars = plt.bar(categories, values,
               color=["#3776AB", "#4b8bbe", "#6ba3d6", "#9dc3e6"])
plt.ylabel("Revenue ($K)")
plt.title("Revenue by Category")

# Add value labels on bars
for bar in bars:
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 5,
             f"${bar.get_height()}K", ha="center", fontsize=10)

plt.tight_layout()
plt.show()
```
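Matplotlib 3.4 and later also provide ax.bar_label(), which positions one label per bar. This saves you the x/y arithmetic of the manual plt.text() loop. The sketch below assumes Matplotlib 3.4+. The 3-point padding and the extra top margin are arbitrary choices.

```python
import matplotlib.pyplot as plt

categories = ["Electronics", "Apparel", "Grocery", "Home"]
values = [340, 215, 520, 180]

fig, ax = plt.subplots(figsize=(7, 4))
bars = ax.bar(categories, values,
              color=["#3776AB", "#4b8bbe", "#6ba3d6", "#9dc3e6"])

# One label per bar, placed above its top edge; padding is measured in points
labels = ax.bar_label(bars, labels=[f"${v}K" for v in values], padding=3)

# Leave headroom so the tallest bar's label is not clipped
ax.margins(y=0.1)
ax.set_ylabel("Revenue ($K)")
ax.set_title("Revenue by Category")
fig.tight_layout()
```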
```python
delivery_times = np.random.normal(loc=5.2, scale=1.1, size=500)

plt.figure(figsize=(7, 4))
plt.hist(delivery_times, bins=25, color="#3776AB", edgecolor="white", alpha=0.8)
plt.axvline(delivery_times.mean(), color="red", linestyle="--", label="Mean")
plt.xlabel("Delivery Time (days)")
plt.ylabel("Frequency")
plt.title("Distribution of Delivery Times")
plt.legend()
plt.tight_layout()
plt.show()
```
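plt.hist() and ax.hist() also return the numbers behind the bars, which helps when you want to annotate or sanity-check a distribution. The sketch below uses a seeded NumPy generator so the result is repeatable, which is an assumption rather than part of the original example. It then highlights the most populated bin.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
delivery_times = rng.normal(loc=5.2, scale=1.1, size=500)

fig, ax = plt.subplots(figsize=(7, 4))
# hist returns the bar heights, the bin edges, and the drawn bar patches
counts, bin_edges, patches = ax.hist(delivery_times, bins=25,
                                     color="#3776AB", edgecolor="white")

# 25 bins need 26 edges, and every observation lands in exactly one bin
print(len(counts), len(bin_edges))   # 25 26
print(int(counts.sum()))             # 500

# Shade the fullest bin, i.e. the most common delivery window
peak = np.argmax(counts)
ax.axvspan(bin_edges[peak], bin_edges[peak + 1], color="orange", alpha=0.3,
           label="Most common bin")
ax.axvline(np.median(delivery_times), color="red", linestyle="--", label="Median")
ax.legend()
```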
The difference between a quick exploratory plot and a presentation-ready figure often comes down to how you handle axis ticks, labels, and legends. Matplotlib gives you full control over each of these elements.
```python
fig, ax = plt.subplots(figsize=(9, 5))

# Plot two series
months = np.arange(1, 13)
revenue_2023 = [42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78]
revenue_2024 = [48, 50, 55, 58, 63, 68, 65, 70, 75, 80, 82, 90]
ax.plot(months, revenue_2023, marker="o", label="2023", color="#3776AB")
ax.plot(months, revenue_2024, marker="s", label="2024", color="#e74c3c")

# Customize x-axis ticks: show month abbreviations
month_labels = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
ax.set_xticks(months)
ax.set_xticklabels(month_labels, rotation=45, ha="right")

# Customize y-axis: start at 0 so year-over-year gaps are not exaggerated
ax.set_ylim(bottom=0)
ax.set_ylabel("Revenue ($K)", fontsize=12)
ax.set_xlabel("Month", fontsize=12)
ax.set_title("Year-over-Year Revenue Comparison", fontsize=14, fontweight="bold")

# Legend: control position, frame, and font
ax.legend(loc="upper left", frameon=True, fontsize=11, shadow=True)

# Light major gridlines, plus minor tick marks between them
ax.grid(True, which="major", alpha=0.3)
ax.minorticks_on()

plt.tight_layout()
plt.show()
```
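To put dollar signs on the y-axis tick labels themselves, you can attach a tick formatter from matplotlib.ticker. The minimal sketch below uses StrMethodFormatter. The "${x:,.0f}K" format string and the 20-unit tick spacing are illustrative choices.

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import numpy as np

months = np.arange(1, 13)
revenue_2024 = [48, 50, 55, 58, 63, 68, 65, 70, 75, 80, 82, 90]

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(months, revenue_2024, marker="s", color="#e74c3c")
ax.set_ylim(bottom=0)
ax.set_ylabel("Revenue")

# StrMethodFormatter fills {x} (the tick value) using str.format syntax
dollar_fmt = mtick.StrMethodFormatter("${x:,.0f}K")
ax.yaxis.set_major_formatter(dollar_fmt)

# MultipleLocator places a major y tick every 20 units
ax.yaxis.set_major_locator(mtick.MultipleLocator(20))

print(dollar_fmt(60))   # $60K
```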
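Common loc values include "upper left", "upper right", "lower left", "lower right", "center", and "best". The "best" option automatically picks the position that overlaps the plotted data the least. You can also place the legend outside the plot with ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left"). Here bbox_to_anchor is given in axes coordinates, so this pins the legend's upper-left corner just to the right of the axes' top edge. When saving such a figure, use bbox_inches="tight" so the legend is not cropped.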
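Here is a sketch of the outside-the-axes placement using the two revenue series from the example above. The borderaxespad=0 setting and the output filename are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(months, [42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78], label="2023")
ax.plot(months, [48, 50, 55, 58, 63, 68, 65, 70, 75, 80, 82, 90], label="2024")

# bbox_to_anchor uses axes coordinates: (1.02, 1) sits just right of the top-right corner.
# loc names the corner of the legend box that is pinned to that point.
legend = ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left", borderaxespad=0)

# bbox_inches="tight" grows the saved area so the outside legend is included
fig.savefig("legend_outside.png", bbox_inches="tight")
```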
Color choice matters both for aesthetics and for accessibility. Seaborn provides curated palettes, while Matplotlib offers colormaps for continuous data. Choosing the right one depends on whether your data is categorical, sequential, or diverging.
```python
# Seaborn palettes for categorical data
# "colorblind" is designed to stay distinguishable for people with common
# color vision deficiencies, which affect roughly 8% of men
fig, axes = plt.subplots(1, 3, figsize=(14, 3))
palettes = ["Set2", "husl", "colorblind"]
for ax, pal in zip(axes, palettes):
    colors = sns.color_palette(pal, 6)
    for i, c in enumerate(colors):
        ax.bar(i, 1, color=c, width=0.8)
    ax.set_title(f"Palette: {pal}")
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()

# Colormaps for continuous/heatmap data
# Sequential ("viridis", "Blues", "YlOrRd"): for data with a natural order
# Diverging ("RdBu", "coolwarm"): for data with a meaningful center (e.g., 0)
data = np.random.randn(10, 10)
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, cmap in zip(axes, ["viridis", "Blues", "RdBu"]):
    im = ax.imshow(data, cmap=cmap, aspect="auto")
    ax.set_title(f"cmap='{cmap}'")
    fig.colorbar(im, ax=ax, shrink=0.8)
plt.tight_layout()
plt.show()
```
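By default, imshow() stretches a colormap between the data's minimum and maximum. A diverging map like "RdBu" is therefore only centered on 0 if the data happens to be symmetric. The minimal sketch below forces symmetric color limits. For deliberately asymmetric ranges, matplotlib.colors.TwoSlopeNorm is an alternative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
data = rng.standard_normal((10, 10))

# Symmetric limits put the colormap's neutral midpoint exactly at 0, so
# positive and negative values of equal size get equally strong colors
limit = np.abs(data).max()

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(data, cmap="RdBu", vmin=-limit, vmax=limit)
fig.colorbar(im, ax=ax)
ax.set_title("Diverging colormap centered at 0")

# The norm maps 0 to the middle of the colormap's [0, 1] range
print(round(float(im.norm(0.0)), 3))   # 0.5
```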
Annotations draw the reader's attention to specific data points or regions. The ax.annotate() method lets you place text with an optional arrow connecting it to a point on the plot.
```python
fig, ax = plt.subplots(figsize=(9, 5))
months = np.arange(1, 13)
revenue = [42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78]
ax.plot(months, revenue, marker="o", color="#3776AB", linewidth=2)

# Annotate the peak value
peak_idx = np.argmax(revenue)
ax.annotate(
    f"Peak: ${revenue[peak_idx]}K",
    xy=(months[peak_idx], revenue[peak_idx]),              # point to annotate
    xytext=(months[peak_idx] - 3, revenue[peak_idx] + 5),  # text position
    fontsize=11,
    arrowprops=dict(arrowstyle="->", color="#333", lw=1.5),
    bbox=dict(boxstyle="round,pad=0.3", facecolor="#ffffcc", edgecolor="#999")
)

# Annotate the dip in July
ax.annotate(
    "Summer dip",
    xy=(7, 58), xytext=(8.5, 50),
    fontsize=10,
    arrowprops=dict(arrowstyle="->", color="red"),
    color="red"
)

# Add a shaded region to highlight Q4
ax.axvspan(10, 12, alpha=0.1, color="green", label="Q4 Holiday Season")
ax.legend()
ax.set_xlabel("Month")
ax.set_ylabel("Revenue ($K)")
ax.set_title("Annotated Revenue Trend")
plt.tight_layout()
plt.show()
```
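When you annotate several points, it is often easier to position labels in screen units than in data units. With textcoords="offset points", xytext becomes an offset in points from the annotated point. The sketch below labels the highest and lowest months. The offsets, margin, and label wording are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
revenue = np.array([42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78])

fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(months, revenue, marker="o", color="#3776AB")
ax.margins(y=0.15)   # headroom so labels above/below the extremes stay visible

annotations = []
for idx, label, dy in [(revenue.argmax(), "High", 12), (revenue.argmin(), "Low", -18)]:
    # xytext is an offset in points from xy, so spacing is independent of the data scale
    ann = ax.annotate(
        f"{label}: ${revenue[idx]}K",
        xy=(months[idx], revenue[idx]),
        xytext=(0, dy),
        textcoords="offset points",
        ha="center",
        arrowprops=dict(arrowstyle="-", color="gray"),
    )
    annotations.append(ann)
```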
```python
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Left: line plot
axes[0].plot(months, revenue, marker="s", color="#3776AB")
axes[0].set_title("Revenue Over Time")
axes[0].set_xlabel("Month")

# Right: bar chart
axes[1].bar(categories, values, color="#4b8bbe")
axes[1].set_title("Revenue by Category")

plt.tight_layout()
plt.show()
```
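When panels show the same quantity, sharing an axis makes them directly comparable. The sketch below uses sharey=True and axes.flat with the 2023 and 2024 series from the tick-customization example.

```python
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
revenue_2023 = [42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78]
revenue_2024 = [48, 50, 55, 58, 63, 68, 65, 70, 75, 80, 82, 90]

# sharey=True links the y-axes, so both panels use one scale
fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)

# axes.flat iterates over the Axes array whatever its shape
series = [("2023", revenue_2023), ("2024", revenue_2024)]
for ax, (year, values) in zip(axes.flat, series):
    ax.plot(months, values, marker="o")
    ax.set_title(f"Revenue {year}")
    ax.set_xlabel("Month")

axes[0].set_ylabel("Revenue ($K)")   # only the left panel needs a y label
fig.tight_layout()
```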
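```python
# Correlation heatmap
# The columns are independent random draws, so expect coefficients near 0
df = pd.DataFrame({
    "price": np.random.uniform(5, 50, 100),
    "ads": np.random.uniform(1, 20, 100),
    "sales": np.random.uniform(50, 500, 100),
    "returns": np.random.uniform(0, 50, 100),
})
plt.figure(figsize=(6, 5))
# A diverging colormap pinned to [-1, 1] keeps a correlation of 0 at the neutral midpoint
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1,
            fmt=".2f", linewidths=0.5)
plt.title("Correlation Matrix")
plt.tight_layout()
plt.show()
```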
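Because the columns above are independent, every off-diagonal coefficient will be close to 0. The sketch below instead builds a hypothetical sales column that depends on ads, so that one strong correlation shows up. It draws the heatmap with plain Matplotlib (imshow plus one text label per cell) to show roughly what annot=True adds. This is an approximation, not Seaborn's implementation.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
ads = rng.uniform(1, 20, 100)
df = pd.DataFrame({
    "price": rng.uniform(5, 50, 100),
    "ads": ads,
    # Hypothetical relationship: sales rise with ads, plus noise
    "sales": 20 * ads + rng.normal(0, 15, 100),
    "returns": rng.uniform(0, 50, 100),
})
corr = df.corr()
names = list(corr.columns)

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(names)), labels=names, rotation=45, ha="right")
ax.set_yticks(range(len(names)), labels=names)

# Write each coefficient in its cell (column j is x, row i is y)
for i in range(len(names)):
    for j in range(len(names)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax)
ax.set_title("Correlation Matrix")
fig.tight_layout()
```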
```python
# Pairwise scatter plots colored by a category
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species", palette="Set2", height=2.5)
plt.suptitle("Iris Dataset Pairplot", y=1.02)
plt.show()
```
FacetGrid lets you split a dataset by one or two categorical variables and plot the same chart for each subset. This is extremely useful for comparing patterns across groups without writing repetitive subplot code.
```python
tips = sns.load_dataset("tips")

# Two-variable faceting: one panel per (sex, meal time) combination
g = sns.FacetGrid(tips, col="time", row="sex", height=3.5, aspect=1.2)
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip", alpha=0.6)
g.set_axis_labels("Total Bill ($)", "Tip ($)")
g.figure.suptitle("Tips by Gender and Meal Time", y=1.02)
plt.show()

# Quick alternative using Seaborn's built-in figure-level relplot
sns.relplot(data=tips, x="total_bill", y="tip", col="day", hue="smoker",
            kind="scatter", col_wrap=2, height=3)
plt.show()
```
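For comparison, here is roughly the bookkeeping FacetGrid saves you: looping over every (row, column) combination and filtering the data yourself. The sketch uses a small synthetic, tips-like DataFrame with hypothetical values, so it runs without downloading the dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical tips-like data (not the real Seaborn dataset)
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "time": rng.choice(["Lunch", "Dinner"], n),
    "sex": rng.choice(["Male", "Female"], n),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

row_levels = ["Male", "Female"]
col_levels = ["Lunch", "Dinner"]
fig, axes = plt.subplots(len(row_levels), len(col_levels), figsize=(8, 6),
                         sharex=True, sharey=True, squeeze=False)

# One panel per (sex, time) subset, filtered by hand
for i, sex in enumerate(row_levels):
    for j, time in enumerate(col_levels):
        subset = df[(df["sex"] == sex) & (df["time"] == time)]
        ax = axes[i, j]
        ax.scatter(subset["total_bill"], subset["tip"], alpha=0.6)
        ax.set_title(f"sex = {sex} | time = {time}", fontsize=10)

# Axis labels only on the outer edges of the grid
for ax in axes[-1, :]:
    ax.set_xlabel("Total Bill ($)")
for ax in axes[:, 0]:
    ax.set_ylabel("Tip ($)")
fig.tight_layout()
```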
While Matplotlib and Seaborn produce static images, plotly.express creates interactive HTML charts with hover tooltips, zooming, and panning. Interactive charts are especially useful for dashboards, presentations, and exploratory analysis in Jupyter notebooks.
```python
import plotly.express as px

# Interactive scatter plot
tips = sns.load_dataset("tips")
fig = px.scatter(
    tips, x="total_bill", y="tip", color="day", size="size",
    hover_data=["sex", "smoker"],
    title="Restaurant Tips (hover for details)"
)
fig.show()

# Interactive bar chart with animation
gapminder = px.data.gapminder()
fig = px.bar(
    gapminder.query("continent == 'Americas'"),
    x="country", y="gdpPercap", animation_frame="year",
    title="GDP per Capita Over Time"
)
fig.update_layout(xaxis_tickangle=-45)
fig.show()
```
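Plotly is a separate package and is not installed along with Matplotlib or Seaborn. Install it with pip install plotly.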
```python
fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(categories, values, color="#3776AB")
ax.set_title("Revenue by Category")

# Save as high-resolution PNG and PDF
fig.savefig("revenue_chart.png", dpi=300, bbox_inches="tight")
fig.savefig("revenue_chart.pdf", bbox_inches="tight")
```
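savefig() also accepts a file-like object, which is handy when a web app or report generator needs the image bytes rather than a file on disk. The sketch below saves into an io.BytesIO buffer and reads the pixel size back from the PNG header. This shows that the output size is figsize (in inches) multiplied by dpi. It leaves bbox_inches unset because "tight" would crop the canvas and change the size.

```python
import io
import struct

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(["Electronics", "Apparel", "Grocery", "Home"], [340, 215, 520, 180],
       color="#3776AB")
ax.set_title("Revenue by Category")

# Save into memory instead of a file
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
png_bytes = buf.getvalue()

# In a PNG, width and height are big-endian 4-byte integers at bytes 16-24 (IHDR chunk)
width, height = struct.unpack(">II", png_bytes[16:24])
print(width, height)   # 700 400  (7 x 100, 4 x 100)
```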
For academic papers, conferences, and professional reports, there are several additional settings that produce polished output.
```python
# Publication-quality settings
import matplotlib as mpl

# Set global defaults for clean, professional figures
mpl.rcParams["font.family"] = "serif"
mpl.rcParams["font.size"] = 11
mpl.rcParams["axes.linewidth"] = 0.8
mpl.rcParams["figure.dpi"] = 150    # screen display
mpl.rcParams["savefig.dpi"] = 300   # file output

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3], [10, 20, 15], "o-")
ax.set_xlabel("Time Period")
ax.set_ylabel("Value")

# Save formats: PNG for slides, PDF/SVG for papers
fig.savefig("figure_for_slides.png", dpi=300, bbox_inches="tight",
            facecolor="white")  # white background, not transparent
fig.savefig("figure_for_paper.pdf", bbox_inches="tight")
fig.savefig("figure_for_web.svg", bbox_inches="tight")

# Reset to Matplotlib's defaults when done
# (this also discards the sns.set_theme() styling applied earlier)
mpl.rcdefaults()
```
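Setting rcParams globally affects every later figure until you reset them, and mpl.rcdefaults() also throws away any Seaborn theme. A scoped alternative is mpl.rc_context(), which applies settings only inside a with-block. The sketch below reuses the same settings. The dictionary name publication_style is only for illustration.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

publication_style = {
    "font.family": "serif",
    "font.size": 11,
    "axes.linewidth": 0.8,
    "savefig.dpi": 300,
}

before = mpl.rcParams["font.size"]

# Settings apply only inside the with-block and are restored on exit,
# so earlier styling (such as sns.set_theme) is left intact afterwards
with mpl.rc_context(publication_style):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot([1, 2, 3], [10, 20, 15], "o-")
    ax.set_xlabel("Time Period")
    ax.set_ylabel("Value")
    inside = mpl.rcParams["font.size"]
    fig.savefig("figure_for_paper.pdf", bbox_inches="tight")

after = mpl.rcParams["font.size"]
```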
Always call plt.tight_layout() before plt.show() or savefig(). It adjusts subplot spacing so that axis labels and titles are not clipped or overlapping. For publications, use dpi=300 and bbox_inches="tight"; dpi matters mainly for raster formats such as PNG. Use PDF or SVG for vector graphics that scale cleanly at any size. Pass facecolor="white" to guarantee an opaque white background, since transparent backgrounds can look odd in some presentation software.
Create a 2x2 subplot figure: (top-left) line plot of monthly sales, (top-right) bar chart of sales by region, (bottom-left) histogram of order values, (bottom-right) scatter plot of price vs. quantity. Use a consistent color palette.
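Load the built-in Seaborn "tips" dataset with sns.load_dataset("tips"). Create a heatmap of the correlation matrix of its numeric columns and a boxplot of total bill by day of the week. Use tips.corr(numeric_only=True), because recent pandas versions raise an error when non-numeric columns are included. Save both plots as PNG files.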
Create an annotated time series plot. Generate 24 months of synthetic sales data with a visible seasonal pattern (higher in months 11-12). Use ax.annotate() to mark the highest and lowest months with arrows. Add a shaded region for the holiday season (Nov-Dec). Save as both PNG (300 dpi) and PDF.
Using the Seaborn "tips" dataset, create a FacetGrid that shows the relationship between total_bill and tip, faceted by day (columns) and time (rows). Color the points by smoker status. Add a regression line to each panel using g.map_dataframe(sns.regplot, ...). Compare the tip percentages across the different facets.
Use the object-oriented interface (fig, ax = plt.subplots()) for anything beyond a quick exploratory plot.
Annotations and highlighted regions (ax.annotate, ax.axvspan) turn raw plots into explanatory narratives.