Chapter 8: Database Design

Normalization, Constraints, and Schema Planning

8.1 The Relational Model

A relational database stores data in tables (relations). Each table has a fixed set of columns with defined data types and a collection of rows. A primary key uniquely identifies each row. The relational model enforces structure, which makes querying predictable and efficient.

8.2 Data Types

CategorySQLitePostgreSQLUse Case
IntegerINTEGERSMALLINT / INTEGER / BIGINTIDs, counts, boolean flags
DecimalREALNUMERIC(p,s) / DOUBLE PRECISIONPrices, rates, measurements
TextTEXTVARCHAR(n) / TEXTNames, descriptions, codes
Date/TimeTEXT (ISO 8601)DATE / TIMESTAMP / INTERVALOrder dates, durations
BooleanINTEGER (0/1)BOOLEANActive flags, feature toggles
BinaryBLOBBYTEAFiles, images (rarely used)
Use NUMERIC for money: REAL/FLOAT can introduce rounding errors (e.g., 0.1 + 0.2 = 0.30000000000000004). In PostgreSQL, use NUMERIC(10,2) for prices and financial calculations. In SQLite, store cents as INTEGER (e.g., $29.99 = 2999) to avoid floating-point issues.

8.3 Primary Keys and Foreign Keys

-- Primary key: uniquely identifies each row
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    email       TEXT UNIQUE
);

-- Foreign key: references another table's primary key
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date  TEXT NOT NULL,
    total       REAL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Composite primary key (for junction/bridge tables)
CREATE TABLE order_items (
    order_id   INTEGER,
    product_id INTEGER,
    quantity   INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id)   REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);
SQLite foreign keys: SQLite does not enforce foreign keys by default. You must enable them for each connection: PRAGMA foreign_keys = ON;. Without this, you can insert an order with a customer_id that does not exist in the customers table.

8.4 Constraints

CREATE TABLE employees (
    emp_id      INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,                      -- cannot be NULL
    email       TEXT UNIQUE,                        -- no duplicates
    department  TEXT DEFAULT 'General',             -- default value
    salary      REAL CHECK(salary > 0),             -- must be positive
    hire_date   TEXT NOT NULL,
    manager_id  INTEGER,
    FOREIGN KEY (manager_id) REFERENCES employees(emp_id)
);

8.5 Normalization

Normalization organizes tables to reduce redundancy and prevent update anomalies. The three normal forms (1NF, 2NF, 3NF) provide a roadmap.

First Normal Form (1NF)

Each cell contains a single value; each row is unique. Violated when a column holds comma-separated lists.

-- BAD: violates 1NF (multiple values in one cell)
-- | order_id | products          |
-- | 1        | Widget A, Cable C  |

-- GOOD: separate table for order items
-- orders: order_id, customer_id, order_date
-- order_items: order_id, product_id, quantity

Second Normal Form (2NF)

Every non-key column depends on the entire primary key (relevant for composite keys).

-- BAD: product_name depends only on product_id, not on order_id
-- | order_id | product_id | product_name | quantity |
-- Splitting into orders, products, and order_items fixes this.

Third Normal Form (3NF)

No transitive dependencies: non-key columns depend only on the primary key, not on other non-key columns.

-- BAD: city depends on zip_code, not on customer_id
-- | customer_id | zip_code | city     |

-- GOOD: separate zip_codes table
CREATE TABLE zip_codes (
    zip_code TEXT PRIMARY KEY,
    city     TEXT,
    state    TEXT
);
When to denormalize: Normalization reduces redundancy but increases the number of joins. For analytics workloads (OLAP), a denormalized star schema with fewer joins is often faster. Normalize for transactional systems (OLTP); denormalize for reporting and analytics.

8.6 Schema Design Patterns

Star Schema (Data Warehousing)

-- Fact table: one row per event/transaction
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    date_key    INTEGER,
    product_key INTEGER,
    store_key   INTEGER,
    quantity    INTEGER,
    revenue     REAL
);

-- Dimension tables: descriptive attributes
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,
    full_date   TEXT,
    year        INTEGER,
    quarter     INTEGER,
    month       INTEGER,
    day_of_week TEXT
);

CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT,
    brand        TEXT
);

8.7 ALTER TABLE

-- Add a column
ALTER TABLE customers ADD COLUMN phone TEXT;

-- Rename a column (SQLite 3.25+)
ALTER TABLE customers RENAME COLUMN phone TO phone_number;

-- Rename a table
ALTER TABLE customers RENAME TO clients;

-- Drop a column (SQLite 3.35+, PostgreSQL)
ALTER TABLE clients DROP COLUMN phone_number;

Exercise 8.1

Design a schema for a university course registration system. Create tables for students, courses, instructors, and enrollments. Define appropriate primary keys, foreign keys, and constraints. Draw the relationships on paper before writing the CREATE TABLE statements.

Exercise 8.2

Given a flat table with columns (order_id, customer_name, customer_email, product_name, product_price, quantity, order_date), normalize it into 3NF. Create the appropriate tables with proper keys and constraints.

Exercise 8.3

Design a star schema for a retail analytics database. Create a fact table (fact_sales) and at least three dimension tables (dim_date, dim_product, dim_store). Populate each with sample data and write a query that answers: "What was the total revenue per product category per quarter?"

Official Resources

Chapter 8 Takeaways

← Chapter 7: Data Manipulation Chapter 9: Query Optimization →