Indexes, EXPLAIN, and Performance Tuning
On small tables, every query runs fast. On tables with millions of rows, the difference between a well-optimized query and a naive one can be seconds versus minutes. Understanding indexes and query plans lets you write queries that scale.
An index is a data structure (typically a B-tree) that lets the database find rows without scanning the entire table. Think of it like the index at the back of a textbook: instead of reading every page, you look up the topic and jump directly to the right page.
-- Create an index on a frequently filtered column CREATE INDEX idx_orders_customer ON orders(customer_id); -- Composite index (for queries that filter on both columns) CREATE INDEX idx_orders_date_customer ON orders(order_date, customer_id); -- Unique index (enforces uniqueness + speeds lookups) CREATE UNIQUE INDEX idx_customers_email ON customers(email); -- List all indexes on a table (SQLite) .indexes orders -- Drop an index DROP INDEX idx_orders_customer;
-- SQLite: EXPLAIN QUERY PLAN shows how the query will execute EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = 42; -- Without index: SCAN orders (reads every row) -- With index: SEARCH orders USING INDEX idx_orders_customer -- PostgreSQL: EXPLAIN ANALYZE (executes and reports actual timing) -- EXPLAIN ANALYZE -- SELECT * FROM orders WHERE customer_id = 42;
| Scenario | Index Recommendation |
|---|---|
| WHERE column = value | Single-column index on that column |
| WHERE a = x AND b = y | Composite index on (a, b) |
| JOIN ON a.id = b.foreign_id | Index on the foreign key column |
| ORDER BY column | Index on that column (avoids sorting) |
| WHERE a = x ORDER BY b | Composite index on (a, b) |
| Rarely queried column | No index (saves write overhead) |
-- BAD: function on indexed column disables the index SELECT * FROM orders WHERE STRFTIME('%Y', order_date) = '2024'; -- Database cannot use index on order_date -- GOOD: rewrite to preserve index usage SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'; -- BAD: SELECT * when you only need two columns SELECT * FROM orders; -- reads all columns from disk -- GOOD: name only the columns you need SELECT order_id, total FROM orders; -- BAD: implicit type conversion SELECT * FROM products WHERE product_id = '42'; -- string compared to integer column -- GOOD: match the column type SELECT * FROM products WHERE product_id = 42;
A view is a saved query that acts like a virtual table. It does not store data; it re-executes the underlying query each time you reference it.
-- Create a view for frequently used report CREATE VIEW monthly_revenue AS SELECT STRFTIME('%Y-%m', order_date) AS month, COUNT(*) AS order_count, SUM(total) AS revenue FROM orders GROUP BY month; -- Query the view like a table SELECT * FROM monthly_revenue WHERE revenue > 10000; -- Drop a view DROP VIEW monthly_revenue; -- PostgreSQL: materialized views (cached, need manual refresh) -- CREATE MATERIALIZED VIEW monthly_revenue AS ... -- REFRESH MATERIALIZED VIEW monthly_revenue;
-- Create a temp table (exists only for the current session) CREATE TEMP TABLE high_value_customers AS SELECT customer_id, SUM(total) AS lifetime_value FROM orders GROUP BY customer_id HAVING SUM(total) > 1000; -- Use it in subsequent queries SELECT c.first_name, c.last_name, hvc.lifetime_value FROM high_value_customers hvc INNER JOIN customers c ON hvc.customer_id = c.customer_id;
-- EXISTS is often faster than IN for large subqueries -- Instead of: SELECT * FROM products WHERE product_id IN (SELECT product_id FROM order_items); -- Use: SELECT * FROM products p WHERE EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id ); -- Avoid correlated subqueries in SELECT when a JOIN works -- Instead of: SELECT c.first_name, (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS cnt FROM customers c; -- Use: SELECT c.first_name, COUNT(o.order_id) AS cnt FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.first_name;
Create a products table with 1000+ rows (use a recursive CTE to generate data). Run EXPLAIN QUERY PLAN SELECT * FROM products WHERE category = 'Electronics' before and after adding an index on category. Compare the output.
Create a view called customer_summary that shows each customer's name, total number of orders, total spend, and average order value. Query the view to find the top 10 customers by total spend.
Identify a query you wrote in a previous chapter that could benefit from an index. Add the index, run EXPLAIN QUERY PLAN before and after, and document the difference.