Nesting and Structuring Complex Queries
A subquery is a query nested inside another query. Subqueries can return a single value (scalar), a single column, or an entire table. They allow you to break complex problems into logical steps without creating temporary tables.
-- Scalar subquery: products priced above average SELECT product_name, unit_price FROM products WHERE unit_price > ( SELECT AVG(unit_price) FROM products ); -- IN subquery: customers who placed orders in 2024 SELECT first_name, last_name FROM customers WHERE customer_id IN ( SELECT DISTINCT customer_id FROM orders WHERE order_date >= '2024-01-01' ); -- EXISTS: customers who have at least one order SELECT c.first_name, c.last_name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id ); -- NOT EXISTS: customers with no orders SELECT c.first_name, c.last_name FROM customers c WHERE NOT EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id );
A correlated subquery references a column from the outer query, so it executes once per row of the outer query. They are powerful but can be slow on large tables.
-- Find the most recent order for each customer SELECT c.first_name, c.last_name, ( SELECT MAX(o.order_date) FROM orders o WHERE o.customer_id = c.customer_id ) AS last_order_date FROM customers c; -- Products priced above their category average SELECT p.product_name, p.category, p.unit_price FROM products p WHERE p.unit_price > ( SELECT AVG(p2.unit_price) FROM products p2 WHERE p2.category = p.category );
-- Add order count as a column for each customer SELECT c.first_name, c.last_name, ( SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id ) AS order_count FROM customers c ORDER BY order_count DESC;
-- Average of category averages (two-level aggregation) SELECT AVG(cat_avg) AS grand_avg FROM ( SELECT category, AVG(unit_price) AS cat_avg FROM products GROUP BY category ) AS category_averages; -- Top customer by total spend SELECT customer_id, total_spend FROM ( SELECT customer_id, SUM(total) AS total_spend FROM orders GROUP BY customer_id ) AS customer_totals ORDER BY total_spend DESC LIMIT 5;
A CTE uses the WITH clause to define a named temporary result set that you can reference in the main query. CTEs make complex queries far more readable than nested subqueries.
-- Same "top customer" query using a CTE WITH customer_totals AS ( SELECT customer_id, SUM(total) AS total_spend FROM orders GROUP BY customer_id ) SELECT c.first_name, c.last_name, ct.total_spend FROM customer_totals ct INNER JOIN customers c ON ct.customer_id = c.customer_id ORDER BY ct.total_spend DESC LIMIT 5; -- Multiple CTEs in a single query WITH monthly_revenue AS ( SELECT STRFTIME('%Y-%m', order_date) AS month, SUM(total) AS revenue FROM orders GROUP BY month ), avg_revenue AS ( SELECT AVG(revenue) AS avg_rev FROM monthly_revenue ) SELECT mr.month, mr.revenue, ar.avg_rev, CASE WHEN mr.revenue > ar.avg_rev THEN 'Above' ELSE 'Below' END AS performance FROM monthly_revenue mr, avg_revenue ar ORDER BY mr.month;
Recursive CTEs can traverse hierarchical data like organizational charts, bill of materials, or category trees.
-- Org chart: find all reports (direct and indirect) under employee 1 WITH RECURSIVE org_tree AS ( -- Anchor: start with the root employee SELECT emp_id, name, manager_id, 0 AS depth FROM employees WHERE emp_id = 1 UNION ALL -- Recursive: find employees who report to current level SELECT e.emp_id, e.name, e.manager_id, ot.depth + 1 FROM employees e INNER JOIN org_tree ot ON e.manager_id = ot.emp_id ) SELECT * FROM org_tree ORDER BY depth, name; -- Generate a number series (useful for date scaffolding) WITH RECURSIVE numbers AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM numbers WHERE n < 12 ) SELECT n AS month_number FROM numbers;
| Approach | Scope | Reusable in Query? | Best For |
|---|---|---|---|
| Subquery | Inside one clause | No | Simple, one-off calculations |
| CTE | One statement | Yes (within that statement) | Multi-step analytics, readability |
| Temp Table | Entire session | Yes (across statements) | Intermediate results used by multiple queries |
Write a query using a subquery in WHERE to find all products whose price is greater than the average price of their category. Use a correlated subquery.
Rewrite the query from Exercise 5.1 using a CTE. First compute the average price per category in the CTE, then join it with the products table.
Write a CTE-based query that computes monthly revenue, then finds the month with the highest revenue and the month with the lowest revenue.
Create an employees table with emp_id, name, and manager_id. Insert a hierarchy of at least 8 employees across 3 levels. Write a recursive CTE that lists each employee, their manager's name, and their depth in the hierarchy.