Chapter 3: Joins

Combining Data from Multiple Tables

3.1 Why Joins?

Relational databases split data across multiple tables to avoid duplication. An orders table stores order_id and customer_id, but not the customer's name. A join reconnects the tables by matching related columns, letting you answer questions that span multiple tables.

Foreign keys: A foreign key is a column in one table that references the primary key of another table. For example, orders.customer_id references customers.customer_id. Joins use these relationships to combine rows.

3.2 INNER JOIN

INNER JOIN returns only rows where the join condition matches in both tables. If an order references a customer_id that does not exist in the customers table, that order is excluded.

-- Orders with customer details
SELECT
    o.order_id,
    o.order_date,
    c.first_name,
    c.last_name,
    o.total
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
ORDER BY o.order_date DESC;

-- Multiple join conditions
SELECT *
FROM shipments s
INNER JOIN orders o
    ON s.order_id = o.order_id
    AND s.warehouse_id = o.warehouse_id;
Table aliases: Use short aliases (o for orders, c for customers) to keep queries readable. When two tables have columns with the same name, you must qualify them with the alias: o.order_id vs. c.customer_id.

3.3 LEFT JOIN

LEFT JOIN returns all rows from the left table and matching rows from the right table. If there is no match, the right-side columns are filled with NULL.

-- All customers, including those with no orders
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    o.order_id,
    o.total
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;

-- Find customers who have NEVER placed an order
SELECT c.customer_id, c.first_name, c.last_name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
Left vs. Inner in practice: If you want to keep all rows from the "main" table even when the lookup table has no match, use LEFT JOIN. If you only want rows that have a match in both tables, use INNER JOIN. Choosing the wrong join type is a common source of missing or inflated data.

3.4 RIGHT JOIN and FULL OUTER JOIN

-- RIGHT JOIN: all rows from the right table
-- (same as swapping table order in LEFT JOIN)
SELECT
    o.order_id,
    p.product_name
FROM order_items o
RIGHT JOIN products p ON o.product_id = p.product_id;
-- Shows all products, including unsold ones

-- FULL OUTER JOIN: all rows from both tables
-- (Not supported in SQLite; use UNION of LEFT JOINs instead)
SELECT
    c.customer_id,
    c.first_name,
    o.order_id
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;

-- SQLite workaround for FULL OUTER JOIN:
SELECT c.customer_id, c.first_name, o.order_id
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
UNION
SELECT c.customer_id, c.first_name, o.order_id
FROM orders o
LEFT JOIN customers c ON c.customer_id = o.customer_id
WHERE c.customer_id IS NULL;

3.5 CROSS JOIN

A CROSS JOIN produces the Cartesian product: every row from the left table paired with every row from the right table. Useful for generating combinations or scaffolding date/product grids.

-- Generate all product-month combinations for reporting
SELECT
    p.product_name,
    m.month_start
FROM products p
CROSS JOIN (
    SELECT '2024-01-01' AS month_start
    UNION ALL SELECT '2024-02-01'
    UNION ALL SELECT '2024-03-01'
) m
ORDER BY p.product_name, m.month_start;

3.6 Self-Joins

A self-join joins a table to itself. This is useful for hierarchical data (employee-manager relationships) or comparing rows within the same table.

-- Employee-manager hierarchy
SELECT
    e.name    AS employee,
    m.name    AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.emp_id;

-- Find products in the same category with different prices
SELECT
    a.product_name AS product_a,
    b.product_name AS product_b,
    a.category,
    a.unit_price - b.unit_price AS price_diff
FROM products a
INNER JOIN products b
    ON a.category = b.category
    AND a.product_id < b.product_id;  -- avoid duplicates

3.7 Joining Multiple Tables

-- Order details with customer and product info
SELECT
    o.order_id,
    c.first_name || ' ' || c.last_name AS customer,
    p.product_name,
    oi.quantity,
    oi.quantity * p.unit_price AS line_total
FROM orders o
INNER JOIN customers c    ON o.customer_id = c.customer_id
INNER JOIN order_items oi ON o.order_id    = oi.order_id
INNER JOIN products p     ON oi.product_id = p.product_id
ORDER BY o.order_id, p.product_name;

3.8 Join Pitfalls

Duplicate rows from many-to-many: If both sides of a join have multiple matching rows, the result multiplies. An order with 3 items joined to a returns table with 2 returns produces 6 rows. Always verify row counts after a join: SELECT COUNT(*) FROM ....
-- Check for row multiplication
SELECT COUNT(*) AS before_join FROM orders;
SELECT COUNT(*) AS after_join
FROM orders o
INNER JOIN order_items oi ON o.order_id = oi.order_id;
-- If after_join > before_join, the join expanded rows (expected with line items)

3.9 Set Operations: UNION, INTERSECT, EXCEPT

-- UNION: combine results, remove duplicates
SELECT city FROM customers
UNION
SELECT city FROM suppliers;

-- UNION ALL: combine without removing duplicates (faster)
SELECT city FROM customers
UNION ALL
SELECT city FROM suppliers;

-- INTERSECT: rows that appear in both queries
SELECT customer_id FROM orders WHERE order_date >= '2024-01-01'
INTERSECT
SELECT customer_id FROM orders WHERE total > 500;

-- EXCEPT: rows in the first query but not the second
SELECT customer_id FROM customers
EXCEPT
SELECT customer_id FROM orders;
-- Customers who have never ordered (alternative to LEFT JOIN + IS NULL)
UNION vs UNION ALL: UNION removes duplicates (like DISTINCT), which requires sorting. UNION ALL skips deduplication and is faster. Use UNION ALL when you know the sets are already disjoint or when duplicates are acceptable.

Exercise 3.1

Write a query that returns all orders with the customer's full name and order total. Use INNER JOIN between orders and customers. Sort by order_date descending.

Exercise 3.2

Write a query that finds all products that have never been ordered. Use a LEFT JOIN between products and order_items, filtering for NULL order_item_id.

Exercise 3.3

Create an employee-manager report using a self-join on an employees table. Show each employee's name alongside their manager's name. Include employees who have no manager (CEO/top level) by using LEFT JOIN.

Exercise 3.4

Write a query that joins orders, order_items, and products to calculate the total revenue per product. Group by product_name and sort by revenue descending. Show the top 5 products.

Official Resources

Chapter 3 Takeaways

← Chapter 2: SELECT & Filtering Chapter 4: Aggregation →