Summarizing Data
Raw detail rows are useful, but business decisions require summaries. How much revenue did we earn last quarter? Which product category sells best? Aggregation functions collapse many rows into a single summary value, and GROUP BY produces those summaries for each group independently.
SQL provides five core aggregate functions. Each takes a set of values from multiple rows and returns one result.
-- Basic aggregate functions on the orders table SELECT COUNT(*) AS total_orders, SUM(total) AS total_revenue, AVG(total) AS avg_order_value, MIN(total) AS smallest_order, MAX(total) AS largest_order FROM orders WHERE order_date >= '2024-01-01';
COUNT behaves differently depending on its argument. Understanding the three forms prevents subtle bugs.
SELECT COUNT(*) AS total_rows, COUNT(email) AS has_email, COUNT(DISTINCT city) AS unique_cities FROM customers;
COUNT(*) counts every row, including those with NULLs. COUNT(column) counts only non-NULL values in that column. COUNT(DISTINCT column) counts unique non-NULL values. If email is NULL for some customers, COUNT(email) excludes them.
GROUP BY splits the result set into groups. Each group gets its own aggregate calculation.
-- Revenue by product category SELECT p.category, COUNT(*) AS num_orders, SUM(oi.quantity * oi.unit_price) AS total_revenue, AVG(oi.quantity * oi.unit_price) AS avg_line_total FROM order_items oi INNER JOIN products p ON oi.product_id = p.product_id GROUP BY p.category ORDER BY total_revenue DESC; -- Grouping by multiple columns: orders per customer per month SELECT customer_id, STRFTIME('%Y-%m', order_date) AS month, COUNT(*) AS order_count, SUM(total) AS monthly_spend FROM orders GROUP BY customer_id, STRFTIME('%Y-%m', order_date) ORDER BY customer_id, month;
Every non-aggregated column in your SELECT list must appear in the GROUP BY clause. Violating this rule produces an error in standard SQL (though MySQL historically allowed it with unpredictable results).
WHERE filters individual rows before grouping. HAVING filters groups after aggregation. This distinction matters because you cannot reference aggregate results in a WHERE clause.
-- Categories with more than $500 in total sales SELECT p.category, SUM(oi.quantity * oi.unit_price) AS total_sales, COUNT(*) AS items_sold FROM order_items oi INNER JOIN products p ON oi.product_id = p.product_id WHERE oi.unit_price > 0 -- row-level filter GROUP BY p.category HAVING SUM(oi.quantity * oi.unit_price) > 500 -- group-level filter ORDER BY total_sales DESC; -- Customers who placed at least 3 orders SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spend FROM orders GROUP BY customer_id HAVING COUNT(*) >= 3;
All aggregate functions except COUNT(*) ignore NULL values. This can produce surprising results if you are not careful.
-- Suppose some customers have NULL email SELECT COUNT(*) AS total_customers, COUNT(email) AS with_email, COUNT(*) - COUNT(email) AS missing_email FROM customers; -- SUM of an all-NULL group returns NULL, not zero -- Wrap in COALESCE to get zero instead SELECT category, COALESCE(SUM(in_stock), 0) AS total_stock FROM products GROUP BY category;
You can group by computed expressions, not just raw columns. This is especially useful for date-based summaries and conditional aggregation.
-- Monthly revenue summary (SQLite date function) SELECT STRFTIME('%Y-%m', order_date) AS month, SUM(total) AS monthly_revenue, COUNT(*) AS num_orders FROM orders GROUP BY STRFTIME('%Y-%m', order_date) ORDER BY month; -- Conditional aggregation with CASE inside SUM SELECT STRFTIME('%Y-%m', order_date) AS month, COUNT(*) AS total_orders, SUM(CASE WHEN total >= 100 THEN 1 ELSE 0 END) AS large_orders, ROUND( 100.0 * SUM(CASE WHEN total >= 100 THEN 1 ELSE 0 END) / COUNT(*), 1 ) AS large_order_pct FROM orders GROUP BY STRFTIME('%Y-%m', order_date);
Standard GROUP BY produces one level of grouping. ROLLUP adds subtotals and a grand total. CUBE generates all possible subtotal combinations. These are supported in PostgreSQL, SQL Server, and Oracle (not SQLite).
-- PostgreSQL: ROLLUP for subtotals by category, then grand total SELECT COALESCE(category, 'ALL CATEGORIES') AS category, SUM(in_stock) AS total_stock, COUNT(*) AS num_products FROM products GROUP BY ROLLUP(category) ORDER BY category;
ROLLUP(A, B) produces grouping sets: (A, B), (A), and (). CUBE(A, B) produces: (A, B), (A), (B), and ().
Sometimes you need to concatenate text values within a group rather than count or sum them.
-- SQLite: list all product names per category SELECT category, COUNT(*) AS num_products, GROUP_CONCAT(product_name, ', ') AS product_list FROM products GROUP BY category; -- PostgreSQL equivalent: STRING_AGG -- SELECT category, STRING_AGG(product_name, ', ' ORDER BY product_name) -- FROM products GROUP BY category;
Write a query that returns, for each category in the products table: the number of products, the average unit price, the most expensive product price, and the total inventory value (SUM of unit_price * in_stock). Order by total inventory value descending.
Using the orders table, find all customers who have placed more than 2 orders and whose total spending exceeds $200. Return the customer_id, order count, and total spend.
Write a single query that shows, for each month: total revenue, number of orders, and the percentage of orders with total above $100. Use CASE inside an aggregate for the percentage calculation.
Use GROUP_CONCAT (SQLite) to create a query that shows each customer_id alongside a comma-separated list of their order dates.