Chapter 6: Window Functions

Analytics over Partitions

6.1 What Are Window Functions?

Window functions compute a value for each row based on a set of related rows (the "window"), without collapsing the result into a single row like GROUP BY does. You keep all your detail rows and add an analytical column alongside them.

-- Compare: GROUP BY collapses rows
SELECT category, SUM(unit_price) AS total
FROM products
GROUP BY category;
-- Returns one row per category

-- Window function: keeps all rows, adds a column
SELECT
    product_name,
    category,
    unit_price,
    SUM(unit_price) OVER (PARTITION BY category) AS category_total
FROM products;
The OVER clause: Every window function requires OVER(). Inside the parentheses you specify PARTITION BY (which groups to compute over) and ORDER BY (the sequence within each group). An empty OVER() treats the entire result set as one window.

6.2 ROW_NUMBER, RANK, DENSE_RANK

-- ROW_NUMBER: unique sequential number within each partition
SELECT
    product_name,
    category,
    unit_price,
    ROW_NUMBER() OVER (
        PARTITION BY category
        ORDER BY unit_price DESC
    ) AS price_rank
FROM products;

-- RANK: same rank for ties, skips next rank
-- Prices: 100, 100, 80 → ranks: 1, 1, 3
SELECT
    product_name,
    unit_price,
    RANK() OVER (ORDER BY unit_price DESC) AS rnk
FROM products;

-- DENSE_RANK: same rank for ties, no gaps
-- Prices: 100, 100, 80 → ranks: 1, 1, 2
SELECT
    product_name,
    unit_price,
    DENSE_RANK() OVER (ORDER BY unit_price DESC) AS dense_rnk
FROM products;

-- Top 3 products per category
WITH ranked AS (
    SELECT
        product_name,
        category,
        unit_price,
        ROW_NUMBER() OVER (
            PARTITION BY category
            ORDER BY unit_price DESC
        ) AS rn
    FROM products
)
SELECT * FROM ranked WHERE rn <= 3;
Top-N per group: The CTE + ROW_NUMBER() + WHERE rn <= N pattern is the standard way to get the top N rows per group. You cannot put window functions directly in WHERE, so you must wrap them in a CTE or subquery first.

6.3 NTILE

-- Divide products into 4 price quartiles
SELECT
    product_name,
    unit_price,
    NTILE(4) OVER (ORDER BY unit_price) AS price_quartile
FROM products;

-- Deciles for customer spending
SELECT
    customer_id,
    total_spend,
    NTILE(10) OVER (ORDER BY total_spend DESC) AS spend_decile
FROM (
    SELECT customer_id, SUM(total) AS total_spend
    FROM orders GROUP BY customer_id
);

6.4 LAG and LEAD

LAG accesses a value from a previous row; LEAD accesses a value from a following row. Essential for period-over-period comparisons.

-- Month-over-month revenue change
WITH monthly AS (
    SELECT
        STRFTIME('%Y-%m', order_date) AS month,
        SUM(total) AS revenue
    FROM orders
    GROUP BY month
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month,
    revenue - LAG(revenue) OVER (ORDER BY month) AS change,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month))
        * 100.0
        / LAG(revenue) OVER (ORDER BY month),
    1) AS pct_change
FROM monthly;

-- LEAD: look ahead to the next row
SELECT
    order_id,
    order_date,
    LEAD(order_date) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
    ) AS next_order_date
FROM orders;

6.5 Running Totals and Moving Averages

-- Running total of revenue
SELECT
    order_date,
    total,
    SUM(total) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders;

-- 3-month moving average
WITH monthly AS (
    SELECT
        STRFTIME('%Y-%m', order_date) AS month,
        SUM(total) AS revenue
    FROM orders GROUP BY month
)
SELECT
    month,
    revenue,
    AVG(revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3m
FROM monthly;

6.6 FIRST_VALUE and LAST_VALUE

-- Compare each product's price to the cheapest in its category
SELECT
    product_name,
    category,
    unit_price,
    FIRST_VALUE(product_name) OVER (
        PARTITION BY category
        ORDER BY unit_price ASC
    ) AS cheapest_product,
    unit_price - FIRST_VALUE(unit_price) OVER (
        PARTITION BY category
        ORDER BY unit_price ASC
    ) AS price_above_min
FROM products;
LAST_VALUE gotcha: By default, the window frame extends from the start of the partition to the current row, so LAST_VALUE returns the current row's value. To get the actual last value, you must specify the frame: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

6.7 Window Frame Clause

-- ROWS BETWEEN controls which rows the function sees
-- UNBOUNDED PRECEDING = first row of partition
-- N PRECEDING = N rows before current
-- CURRENT ROW = current row
-- N FOLLOWING = N rows after current
-- UNBOUNDED FOLLOWING = last row of partition

-- Running total (default frame for ordered window)
SUM(total) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

-- Centered 5-row moving average
AVG(total) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)

-- Grand total (entire partition)
SUM(total) OVER ()

6.8 Named Windows

-- Define a window once, reuse it (PostgreSQL and newer SQLite)
SELECT
    product_name,
    category,
    unit_price,
    ROW_NUMBER() OVER w AS rn,
    RANK() OVER w AS rnk,
    SUM(unit_price) OVER w AS running_total
FROM products
WINDOW w AS (PARTITION BY category ORDER BY unit_price DESC);

Exercise 6.1

Write a query that ranks all products by price within each category using DENSE_RANK. Return the product_name, category, unit_price, and rank.

Exercise 6.2

Using LAG, write a query that shows month-over-month revenue growth as a percentage. Include the month, revenue, previous month's revenue, and the percentage change.

Exercise 6.3

Write a query that computes a running total of order amounts for each customer, partitioned by customer_id and ordered by order_date.

Exercise 6.4

Use the CTE + ROW_NUMBER pattern to find the single most expensive product in each category. Return the category, product_name, and unit_price.

Official Resources

Chapter 6 Takeaways

← Chapter 5: Subqueries & CTEs Chapter 7: Data Manipulation →