Chapter 5: Subqueries & CTEs

Nesting and Structuring Complex Queries

5.1 What Are Subqueries?

A subquery is a query nested inside another query. Subqueries can return a single value (scalar), a single column, or an entire table. They allow you to break complex problems into logical steps without creating temporary tables.

5.2 Subqueries in WHERE

-- Scalar subquery: products priced above average
SELECT product_name, unit_price
FROM products
WHERE unit_price > (
    SELECT AVG(unit_price) FROM products
);

-- IN subquery: customers who placed orders in 2024
SELECT first_name, last_name
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date >= '2024-01-01'
);

-- EXISTS: customers who have at least one order
SELECT c.first_name, c.last_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
);

-- NOT EXISTS: customers with no orders
SELECT c.first_name, c.last_name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
);
EXISTS vs. IN: EXISTS stops scanning as soon as it finds one match, making it efficient for large tables. IN materializes the entire subquery result into a list. For small lists, IN is fine. For large subqueries, EXISTS often performs better.

5.3 Correlated Subqueries

A correlated subquery references a column from the outer query, so it executes once per row of the outer query. They are powerful but can be slow on large tables.

-- Find the most recent order for each customer
SELECT
    c.first_name,
    c.last_name,
    (
        SELECT MAX(o.order_date)
        FROM orders o
        WHERE o.customer_id = c.customer_id
    ) AS last_order_date
FROM customers c;

-- Products priced above their category average
SELECT p.product_name, p.category, p.unit_price
FROM products p
WHERE p.unit_price > (
    SELECT AVG(p2.unit_price)
    FROM products p2
    WHERE p2.category = p.category
);

5.4 Subqueries in SELECT

-- Add order count as a column for each customer
SELECT
    c.first_name,
    c.last_name,
    (
        SELECT COUNT(*)
        FROM orders o
        WHERE o.customer_id = c.customer_id
    ) AS order_count
FROM customers c
ORDER BY order_count DESC;

5.5 Subqueries in FROM (Derived Tables)

-- Average of category averages (two-level aggregation)
SELECT AVG(cat_avg) AS grand_avg
FROM (
    SELECT
        category,
        AVG(unit_price) AS cat_avg
    FROM products
    GROUP BY category
) AS category_averages;

-- Top customer by total spend
SELECT customer_id, total_spend
FROM (
    SELECT
        customer_id,
        SUM(total) AS total_spend
    FROM orders
    GROUP BY customer_id
) AS customer_totals
ORDER BY total_spend DESC
LIMIT 5;

5.6 Common Table Expressions (CTEs)

A CTE uses the WITH clause to define a named temporary result set that you can reference in the main query. CTEs make complex queries far more readable than nested subqueries.

-- Same "top customer" query using a CTE
WITH customer_totals AS (
    SELECT
        customer_id,
        SUM(total) AS total_spend
    FROM orders
    GROUP BY customer_id
)
SELECT
    c.first_name,
    c.last_name,
    ct.total_spend
FROM customer_totals ct
INNER JOIN customers c ON ct.customer_id = c.customer_id
ORDER BY ct.total_spend DESC
LIMIT 5;

-- Multiple CTEs in a single query
WITH
monthly_revenue AS (
    SELECT
        STRFTIME('%Y-%m', order_date) AS month,
        SUM(total) AS revenue
    FROM orders
    GROUP BY month
),
avg_revenue AS (
    SELECT AVG(revenue) AS avg_rev FROM monthly_revenue
)
SELECT
    mr.month,
    mr.revenue,
    ar.avg_rev,
    CASE WHEN mr.revenue > ar.avg_rev THEN 'Above' ELSE 'Below' END AS performance
FROM monthly_revenue mr, avg_revenue ar
ORDER BY mr.month;
CTEs vs. subqueries: CTEs and derived-table subqueries produce identical results. The advantage of CTEs is readability: you read the query top to bottom, each CTE building on the previous one. For one-off calculations, a subquery is fine. For multi-step analysis, CTEs are clearer.

5.7 Recursive CTEs

Recursive CTEs can traverse hierarchical data like organizational charts, bill of materials, or category trees.

-- Org chart: find all reports (direct and indirect) under employee 1
WITH RECURSIVE org_tree AS (
    -- Anchor: start with the root employee
    SELECT emp_id, name, manager_id, 0 AS depth
    FROM employees
    WHERE emp_id = 1

    UNION ALL

    -- Recursive: find employees who report to current level
    SELECT e.emp_id, e.name, e.manager_id, ot.depth + 1
    FROM employees e
    INNER JOIN org_tree ot ON e.manager_id = ot.emp_id
)
SELECT * FROM org_tree
ORDER BY depth, name;

-- Generate a number series (useful for date scaffolding)
WITH RECURSIVE numbers AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 12
)
SELECT n AS month_number FROM numbers;

5.8 CTE vs. Subquery vs. Temp Table

ApproachScopeReusable in Query?Best For
SubqueryInside one clauseNoSimple, one-off calculations
CTEOne statementYes (within that statement)Multi-step analytics, readability
Temp TableEntire sessionYes (across statements)Intermediate results used by multiple queries

Exercise 5.1

Write a query using a subquery in WHERE to find all products whose price is greater than the average price of their category. Use a correlated subquery.

Exercise 5.2

Rewrite the query from Exercise 5.1 using a CTE. First compute the average price per category in the CTE, then join it with the products table.

Exercise 5.3

Write a CTE-based query that computes monthly revenue, then finds the month with the highest revenue and the month with the lowest revenue.

Exercise 5.4

Create an employees table with emp_id, name, and manager_id. Insert a hierarchy of at least 8 employees across 3 levels. Write a recursive CTE that lists each employee, their manager's name, and their depth in the hierarchy.

Official Resources

Chapter 5 Takeaways

← Chapter 4: Aggregation Chapter 6: Window Functions →