Chapter 6: Data Visualization

Matplotlib and Seaborn

6.1 The Visualization Stack

Matplotlib is Python's foundational plotting library. Seaborn builds on top of it with higher-level statistical graphics and better default aesthetics. Together they cover most visualization needs in analytics work.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Set a clean style
sns.set_theme(style="whitegrid", font_scale=1.1)

6.2 Figure Anatomy: fig, ax, and axes

Every Matplotlib plot is built from two kinds of objects: the Figure (the overall window or page) and one or more Axes (individual plots inside the figure). This distinction matters because the object-oriented interface, which works with these objects directly, gives you far more control than the plt.plot() shortcut.

# The object-oriented approach: explicit fig and ax
fig, ax = plt.subplots(figsize=(8, 4))

# ax is a single Axes object — you call methods on it directly
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], marker="o")
ax.set_xlabel("Quarter")
ax.set_ylabel("Sales ($K)")
ax.set_title("Quarterly Sales")

# fig controls the overall canvas
fig.suptitle("Company Performance Dashboard", fontsize=14, y=1.02)

# Multiple axes: subplots returns an array
fig2, axes = plt.subplots(2, 3, figsize=(14, 8))
# axes is a 2D numpy array: axes[row, col]
print(f"axes shape: {axes.shape}")  # (2, 3)
axes[0, 0].set_title("Top-left plot")
axes[1, 2].set_title("Bottom-right plot")
plt.tight_layout()
plt.show()

plt.plot() vs ax.plot(): plt.plot() is a convenience function that draws on the current Axes, creating a Figure and Axes behind the scenes if none exist. It works fine for quick, single plots. When you need multiple panels, precise layout control, or plots embedded in a GUI, use the explicit fig, ax = plt.subplots() approach instead. Most production code uses the object-oriented API.
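
To make the "behind the scenes" behavior concrete, here is a minimal sketch that inspects what plt.plot() creates. It assumes the non-interactive Agg backend so it runs without a display; the variable names are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean slate

# Implicit (pyplot) style: plt.plot() creates a Figure and Axes if none exist
lines = plt.plot([1, 2, 3], [10, 20, 15])
implicit_ax = plt.gca()  # "get current axes"

# The line returned by plt.plot() lives on the implicitly created Axes
same_axes = lines[0].axes is implicit_ax

# Explicit style: you hold the references yourself, so nothing depends on
# which figure happens to be "current"
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [10, 20, 15])

# Creating the second figure silently changed what pyplot treats as current
current_changed = plt.gcf() is fig

print(same_axes, current_changed, len(plt.get_fignums()))
```

The hidden "current figure" state is what makes the pyplot shortcuts fragile in longer scripts: a later plt.title() call lands on whichever Axes was created last, not necessarily the one you meant.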

6.3 Line Plots

months = np.arange(1, 13)
revenue = [42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78]

plt.figure(figsize=(8, 4))
plt.plot(months, revenue, marker="o", linewidth=2, color="#3776AB")
plt.xlabel("Month")
plt.ylabel("Revenue ($K)")
plt.title("Monthly Revenue Trend")
plt.xticks(months)
plt.tight_layout()
plt.show()

6.3 Scatter Plots

np.random.seed(42)
ad_spend   = np.random.uniform(10, 100, 50)
sales      = 2.5 * ad_spend + np.random.normal(0, 20, 50)

plt.figure(figsize=(6, 5))
plt.scatter(ad_spend, sales, alpha=0.7, edgecolors="w", s=60)
plt.xlabel("Ad Spend ($K)")
plt.ylabel("Sales ($K)")
plt.title("Ad Spend vs. Sales")
plt.tight_layout()
plt.show()

6.4 Bar Charts

categories = ["Electronics", "Apparel", "Grocery", "Home"]
values = [340, 215, 520, 180]

plt.figure(figsize=(7, 4))
bars = plt.bar(categories, values, color=["#3776AB", "#4b8bbe", "#6ba3d6", "#9dc3e6"])
plt.ylabel("Revenue ($K)")
plt.title("Revenue by Category")

# Add value labels on bars
for bar in bars:
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 5,
            f"${bar.get_height()}K", ha="center", fontsize=10)
plt.tight_layout()
plt.show()
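
The manual plt.text() loop above works in any Matplotlib version. As a hedged alternative, Matplotlib 3.4 and later provide Axes.bar_label(), which positions one label per bar for you. The sketch below assumes that version or newer and the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ["Electronics", "Apparel", "Grocery", "Home"]
values = [340, 215, 520, 180]

fig, ax = plt.subplots(figsize=(7, 4))
bars = ax.bar(categories, values, color="#3776AB")

# bar_label places one text label per bar; padding is measured in points
labels = ax.bar_label(bars, labels=[f"${v}K" for v in values],
                      padding=3, fontsize=10)

ax.set_ylabel("Revenue ($K)")
ax.set_title("Revenue by Category")
ax.margins(y=0.1)  # leave headroom so the tallest label stays inside the axes
fig.tight_layout()
```

Because the padding is given in points rather than data units, the labels keep the same gap above the bars even if the y-axis scale changes.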

6.5 Histograms

delivery_times = np.random.normal(loc=5.2, scale=1.1, size=500)

plt.figure(figsize=(7, 4))
plt.hist(delivery_times, bins=25, color="#3776AB", edgecolor="white", alpha=0.8)
plt.axvline(delivery_times.mean(), color="red", linestyle="--", label="Mean")
plt.xlabel("Delivery Time (days)")
plt.ylabel("Frequency")
plt.title("Distribution of Delivery Times")
plt.legend()
plt.tight_layout()
plt.show()

6.6 Customizing Ticks, Labels, and Legends

The difference between a quick exploratory plot and a presentation-ready figure often comes down to how you handle axis ticks, labels, and legends. Matplotlib gives you full control over each of these elements.

fig, ax = plt.subplots(figsize=(9, 5))

# Plot two series
months = np.arange(1, 13)
revenue_2023 = [42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78]
revenue_2024 = [48, 50, 55, 58, 63, 68, 65, 70, 75, 80, 82, 90]

ax.plot(months, revenue_2023, marker="o", label="2023", color="#3776AB")
ax.plot(months, revenue_2024, marker="s", label="2024", color="#e74c3c")

# Customize x-axis ticks: show month abbreviations
month_labels = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
ax.set_xticks(months)
ax.set_xticklabels(month_labels, rotation=45, ha="right")

# Customize y-axis: start at 0 and state the units in the label
ax.set_ylim(bottom=0)
ax.set_ylabel("Revenue ($K)", fontsize=12)
ax.set_xlabel("Month", fontsize=12)
ax.set_title("Year-over-Year Revenue Comparison", fontsize=14, fontweight="bold")

# Legend: control position, frame, and font
ax.legend(loc="upper left", frameon=True, fontsize=11, shadow=True)

# Light major gridlines plus minor tick marks for readability
ax.grid(True, which="major", alpha=0.3)
ax.minorticks_on()

plt.tight_layout()
plt.show()

Legend placement: Common loc values include "upper left", "upper right", "lower left", "lower right", and "best" (which picks the standard location that overlaps the plotted data least). You can also place the legend outside the plot with ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left"), which moves it just to the right of the axes.
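
The outside-legend placement can be sketched as follows. This is a minimal example with made-up series, assuming the Agg backend; it renders the figure once and compares pixel extents to confirm that the legend sits to the right of the axes.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot([1, 2, 3, 4], [42, 45, 48, 52], marker="o", label="2023")
ax.plot([1, 2, 3, 4], [48, 50, 55, 58], marker="s", label="2024")

# bbox_to_anchor is in axes coordinates: (1.02, 1) is just right of the
# top-right corner. loc="upper left" names the corner of the legend box
# that is pinned to that anchor point.
legend = ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left", borderaxespad=0)

# Render once so the legend has a real position, then compare pixel extents
fig.canvas.draw()
legend_left = legend.get_window_extent().x0
axes_right = ax.get_window_extent().x1
print(legend_left > axes_right)

# bbox_inches="tight" grows the saved area so the outside legend is included
buf = io.BytesIO()
fig.savefig(buf, format="png", bbox_inches="tight")
```

Without bbox_inches="tight" (or extra right margin), a legend placed outside the axes can be cut off in the saved file.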

6.7 Color Palettes and Colormaps

Color choice matters both for aesthetics and for accessibility. Seaborn provides curated palettes, while Matplotlib offers colormaps for continuous data. Choosing the right one depends on whether your data is categorical, sequential, or diverging.

# Seaborn palettes for categorical data
fig, axes = plt.subplots(1, 3, figsize=(14, 3))

palettes = ["Set2", "husl", "colorblind"]
for ax, pal in zip(axes, palettes):
    colors = sns.color_palette(pal, 6)
    for i, c in enumerate(colors):
        ax.bar(i, 1, color=c, width=0.8)
    ax.set_title(f"Palette: {pal}")
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()

# Colormaps for continuous/heatmap data
# Sequential: "viridis", "Blues", "YlOrRd" — for data with a natural order
# Diverging:  "RdBu", "coolwarm" — for data with a meaningful center (e.g., 0)
# (The categorical "colorblind" palette shown above is designed for viewers
# with color vision deficiency, which affects roughly 8% of men.)

data = np.random.randn(10, 10)

fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, cmap in zip(axes, ["viridis", "Blues", "RdBu"]):
    im = ax.imshow(data, cmap=cmap, aspect="auto")
    ax.set_title(f"cmap='{cmap}'")
    fig.colorbar(im, ax=ax, shrink=0.8)
plt.tight_layout()
plt.show()

When to use which colormap: Use sequential colormaps (viridis, Blues) when values go from low to high with no special midpoint. Use diverging colormaps (RdBu, coolwarm) when there is a meaningful center value, such as zero in a correlation matrix or a difference from the mean. Avoid rainbow colormaps ("jet"): they are not perceptually uniform, so they create artificial visual boundaries, and they are hard to read for colorblind viewers.
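
A diverging colormap only helps if its neutral middle color actually lines up with the meaningful center. When the data range is not symmetric around that center, one way to pin it is matplotlib.colors.TwoSlopeNorm. The sketch below uses hypothetical data running from -1 to 3 and assumes the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize, TwoSlopeNorm

# Hypothetical data: the range [-1, 3] is not symmetric around zero
data = np.linspace(-1, 3, 100).reshape(10, 10)

# Default linear scaling: zero lands a quarter of the way along the colormap
default_norm = Normalize(vmin=data.min(), vmax=data.max())
default_position = float(default_norm(0.0))

# TwoSlopeNorm maps vcenter to the exact middle of the colormap
centered_norm = TwoSlopeNorm(vmin=data.min(), vcenter=0.0, vmax=data.max())
centered_position = float(centered_norm(0.0))

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(data, cmap="RdBu", norm=centered_norm, aspect="auto")
fig.colorbar(im, ax=ax)
ax.set_title("RdBu centered on zero")

print(default_position, centered_position)
```

A simpler option, when it suits the data, is to pass symmetric limits such as vmin=-3 and vmax=3, which keeps a single linear scale on both sides of zero.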

6.8 Annotating Plots

Annotations draw the reader's attention to specific data points or regions. The ax.annotate() function lets you place text with an optional arrow connecting it to a point on the plot.

fig, ax = plt.subplots(figsize=(9, 5))

months = np.arange(1, 13)
revenue = [42, 45, 48, 52, 55, 60, 58, 62, 65, 70, 72, 78]

ax.plot(months, revenue, marker="o", color="#3776AB", linewidth=2)

# Annotate the peak value
peak_idx = np.argmax(revenue)
ax.annotate(
    f"Peak: ${revenue[peak_idx]}K",
    xy=(months[peak_idx], revenue[peak_idx]),       # point to annotate
    xytext=(months[peak_idx] - 3, revenue[peak_idx] + 5),  # text position
    fontsize=11,
    arrowprops=dict(arrowstyle="->", color="#333", lw=1.5),
    bbox=dict(boxstyle="round,pad=0.3", facecolor="#ffffcc", edgecolor="#999")
)

# Annotate the dip in July
ax.annotate(
    "Summer dip",
    xy=(7, 58),
    xytext=(8.5, 50),
    fontsize=10,
    arrowprops=dict(arrowstyle="->", color="red"),
    color="red"
)

# Add a shaded region to highlight Q4
ax.axvspan(10, 12, alpha=0.1, color="green", label="Q4 Holiday Season")
ax.legend()

ax.set_xlabel("Month")
ax.set_ylabel("Revenue ($K)")
ax.set_title("Annotated Revenue Trend")
plt.tight_layout()
plt.show()

6.9 Subplots

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Left: line plot
axes[0].plot(months, revenue, marker="s", color="#3776AB")
axes[0].set_title("Revenue Over Time")
axes[0].set_xlabel("Month")

# Right: bar chart
axes[1].bar(categories, values, color="#4b8bbe")
axes[1].set_title("Revenue by Category")

plt.tight_layout()
plt.show()

6.10 Seaborn: Heatmap

# Correlation heatmap
df = pd.DataFrame({
    "price": np.random.uniform(5, 50, 100),
    "ads":   np.random.uniform(1, 20, 100),
    "sales": np.random.uniform(50, 500, 100),
    "returns": np.random.uniform(0, 50, 100),
})

plt.figure(figsize=(6, 5))
# Diverging colormap centered at 0, since correlations range from -1 to 1
sns.heatmap(df.corr(), annot=True, cmap="RdBu", center=0, fmt=".2f", linewidths=0.5)
plt.title("Correlation Matrix")
plt.tight_layout()
plt.show()

6.11 Seaborn: Pairplot

# Pairwise scatter plots colored by a category
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species", palette="Set2", height=2.5)
plt.suptitle("Iris Dataset Pairplot", y=1.02)
plt.show()

6.12 Seaborn: FacetGrid

FacetGrid lets you split a dataset by one or two categorical variables and plot the same chart for each subset. This is extremely useful for comparing patterns across groups without writing repetitive subplot code.

tips = sns.load_dataset("tips")

# Two-variable faceting: one column per meal time, one row per sex
g = sns.FacetGrid(tips, col="time", row="sex", height=3.5, aspect=1.2)
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip", alpha=0.6)
g.set_axis_labels("Total Bill ($)", "Tip ($)")
g.figure.suptitle("Tips by Gender and Meal Time", y=1.02)
plt.show()

# Quick alternative using built-in Seaborn relplot
sns.relplot(data=tips, x="total_bill", y="tip",
           col="day", hue="smoker", kind="scatter",
           col_wrap=2, height=3)
plt.show()

6.13 Plotly Express: Interactive Charts

While Matplotlib and Seaborn produce static images, plotly.express creates interactive HTML charts with hover tooltips, zooming, and panning. Interactive charts are especially useful for dashboards, presentations, and exploratory analysis in Jupyter notebooks.

import plotly.express as px

# Interactive scatter plot
tips = sns.load_dataset("tips")
fig = px.scatter(
    tips, x="total_bill", y="tip",
    color="day", size="size",
    hover_data=["sex", "smoker"],
    title="Restaurant Tips (hover for details)"
)
fig.show()

# Interactive bar chart with animation
gapminder = px.data.gapminder()
fig = px.bar(
    gapminder.query("continent == 'Americas'"),
    x="country", y="gdpPercap",
    animation_frame="year",
    title="GDP per Capita Over Time"
)
fig.update_layout(xaxis_tickangle=-45)
fig.show()

When to choose Plotly over Matplotlib: Use Plotly when your audience will interact with the chart (Jupyter notebooks, web dashboards). Use Matplotlib/Seaborn when you need static images for papers, PDFs, or printed reports. You can install Plotly with pip install plotly.

6.14 Saving Publication-Quality Figures

fig, ax = plt.subplots(figsize=(7, 4))
ax.bar(categories, values, color="#3776AB")
ax.set_title("Revenue by Category")

# Save as high-resolution PNG and PDF
fig.savefig("revenue_chart.png", dpi=300, bbox_inches="tight")
fig.savefig("revenue_chart.pdf", bbox_inches="tight")

For academic papers, conferences, and professional reports, there are several additional settings that produce polished output.

# Publication-quality settings
import matplotlib as mpl

# Set global defaults for clean, professional figures
mpl.rcParams["font.family"]     = "serif"
mpl.rcParams["font.size"]       = 11
mpl.rcParams["axes.linewidth"]  = 0.8
mpl.rcParams["figure.dpi"]      = 150  # screen display
mpl.rcParams["savefig.dpi"]     = 300  # file output

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3], [10, 20, 15], "o-")
ax.set_xlabel("Time Period")
ax.set_ylabel("Value")

# Save formats: PNG for slides, PDF/SVG for papers
fig.savefig("figure_for_slides.png", dpi=300, bbox_inches="tight",
           facecolor="white")  # white background, not transparent
fig.savefig("figure_for_paper.pdf", bbox_inches="tight")
fig.savefig("figure_for_web.svg", bbox_inches="tight")

# Reset to defaults when done
mpl.rcdefaults()

Style tip: Call plt.tight_layout() before plt.show() or savefig() so that labels and titles are not clipped or overlapping. For publications, use dpi=300 and bbox_inches="tight". Use PDF or SVG for vector graphics that scale cleanly at any size. Set facecolor="white" to avoid transparent backgrounds, which can look odd in some presentation software.
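
Setting mpl.rcParams globally and resetting with mpl.rcdefaults() works, but it is easy to forget the reset. One hedged alternative is mpl.rc_context(), which applies settings only inside a with-block. The sketch below also checks the saved PNG's pixel dimensions to show how figsize and dpi combine; it assumes the Agg backend and saves to an in-memory buffer rather than a file.

```python
import io
import struct

import matplotlib as mpl
mpl.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

font_before = list(mpl.rcParams["font.family"])

# rc_context applies these settings only inside the with-block
with mpl.rc_context({"font.family": "serif", "font.size": 11,
                     "savefig.dpi": 300}):
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot([1, 2, 3], [10, 20, 15], "o-")
    ax.set_xlabel("Time Period")
    ax.set_ylabel("Value")
    fig.tight_layout()

    # No bbox_inches="tight" here, so the pixel size is figsize * dpi
    buf = io.BytesIO()
    fig.savefig(buf, format="png", facecolor="white")

# A PNG stores width and height as big-endian 32-bit integers at bytes 16-24
width_px, height_px = struct.unpack(">II", buf.getvalue()[16:24])
print(width_px, height_px)

# Outside the block, the previous settings are back in effect
font_restored = list(mpl.rcParams["font.family"]) == font_before
print(font_restored)
```

A 6 x 4 inch figure at 300 dpi yields 1800 x 1200 pixels. With bbox_inches="tight" the saved area is cropped or expanded to fit the content, so the pixel size will differ from this product.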

Exercise 6.1

Create a 2x2 subplot figure: (top-left) line plot of monthly sales, (top-right) bar chart of sales by region, (bottom-left) histogram of order values, (bottom-right) scatter plot of price vs. quantity. Use a consistent color palette.

Exercise 6.2

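Load the built-in Seaborn "tips" dataset with sns.load_dataset("tips"). Create a heatmap of the correlation matrix and a boxplot of total bill by day of the week. Because the dataset includes categorical columns, compute the correlations on the numeric columns only, for example with tips.corr(numeric_only=True). Save both plots as PNG files.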

Exercise 6.3

Create an annotated time series plot. Generate 24 months of synthetic sales data with a visible seasonal pattern (higher in months 11-12). Use ax.annotate() to mark the highest and lowest months with arrows. Add a shaded region for the holiday season (Nov-Dec). Save as both PNG (300 dpi) and PDF.

Exercise 6.4

Using the Seaborn "tips" dataset, create a FacetGrid that shows the relationship between total_bill and tip, faceted by day (columns) and time (rows). Color the points by smoker status. Add a regression line to each panel using g.map_dataframe(sns.regplot, ...). Compare the tip percentages across the different facets.
