Chapter 2: Data Types & Control Flow

Variables, Collections, Loops, and Conditionals

2.1 Numeric Types: int and float

Python distinguishes integers (whole numbers) from floating-point numbers (decimals). Arithmetic follows standard precedence, and division with / always returns a float.

units   = 500          # int
price   = 24.99        # float
revenue = units * price  # 12495.0 (float)

# Integer division and modulo
boxes   = units // 12   # 41 (floor division)
leftover = units % 12   # 8  (remainder)

print(f"{boxes} full boxes, {leftover} loose units")
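The two rules above (standard precedence, and / always producing a float) can be checked with a short sketch. The variable names here are illustrative and not part of the chapter's running example.

```python
# Standard precedence: multiplication binds tighter than addition
subtotal = 2 + 3 * 4        # 14, not 20
grouped = (2 + 3) * 4       # 20, parentheses override precedence

# / always returns a float, even when the result is a whole number
even_split = 10 / 2         # 5.0
floor_split = 10 // 2       # 5 (an int, because both operands are ints)

print(subtotal, grouped, even_split, floor_split)
print(type(even_split).__name__, type(floor_split).__name__)  # float int
```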

2.2 Strings

Strings are sequences of characters. They support indexing, slicing, and a rich set of methods for cleaning text data.

sku = "  WDG-2024-A  "

# Common string methods
print(sku.strip())        # "WDG-2024-A"
print(sku.strip().lower()) # "wdg-2024-a"
print(sku.strip().split("-"))  # ['WDG', '2024', 'A']

# Slicing: string[start:stop:step]
code = "ABCDEFGH"
print(code[0:3])   # "ABC"
print(code[::2])    # "ACEG"

Essential String Methods

Strings are immutable in Python, so every method returns a new string rather than modifying the original. The examples below cover the methods you will use most often when cleaning real-world data.

raw = "  Hello, World!  "

# Whitespace removal
print(raw.strip())       # "Hello, World!"  — both sides
print(raw.lstrip())      # "Hello, World!  " — left only
print(raw.rstrip())      # "  Hello, World!" — right only

# Search and replace
msg = "Order #1234 shipped"
print(msg.replace("shipped", "delivered"))  # "Order #1234 delivered"
print(msg.find("#"))                       # 6 (index of first match)
print(msg.startswith("Order"))             # True
print(msg.endswith("shipped"))              # True

# Split and join — critical for parsing data
csv_line = "Widget,29.99,500,East"
fields = csv_line.split(",")             # ['Widget', '29.99', '500', 'East']
reconstructed = " | ".join(fields)       # "Widget | 29.99 | 500 | East"

# Case conversion
print("hello".upper())     # "HELLO"
print("HELLO".lower())     # "hello"
print("hello world".title()) # "Hello World"
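Because strings are immutable, a common mistake is to call a method and forget to keep its result. The sketch below uses a hypothetical messy SKU to show that the original string never changes.

```python
raw_sku = "  wdg-2024-a  "   # hypothetical messy input

# Method calls can be chained; each one returns a new string
cleaned = raw_sku.strip().upper()

print(repr(raw_sku))   # '  wdg-2024-a  '  (original is unchanged)
print(repr(cleaned))   # 'WDG-2024-A'

# Calling a method without storing the result has no lasting effect
raw_sku.strip()
print(raw_sku == "  wdg-2024-a  ")   # True
```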

f-string Formatting in Depth

f-strings (formatted string literals, introduced in Python 3.6) are the most readable way to embed expressions inside strings. The format specification mini-language gives you fine control over alignment, padding, and number formatting.

product = "Widget"
price = 1234.5
pct = 0.0873

# Number formatting
print(f"{price:,.2f}")     # "1,234.50"  — commas + 2 decimals
print(f"{pct:.1%}")        # "8.7%"      — percentage format
print(f"{price:.0f}")      # "1234"      — no decimals

# Alignment and padding
print(f"{product:<15}${price:>10,.2f}")  # "Widget         $  1,234.50"
print(f"{product:*^20}")                  # "*******Widget*******"

# Expressions inside f-strings
print(f"Tax: ${price * 0.08:,.2f}")   # "Tax: $98.76"
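Alignment and number formatting are usually combined to print fixed-width report rows. The following is a minimal sketch with made-up rows; the column widths (10, 10, and 7) are arbitrary choices for illustration.

```python
# Hypothetical rows: (name, unit price, share of revenue)
rows = [
    ("Widget", 1234.5, 0.0873),
    ("Gadget", 89.0, 0.4512),
]

lines = []
for name, price, share in rows:
    # <10 left-aligns in 10 characters, >10,.2f right-aligns with commas
    # and 2 decimals, >7.1% right-aligns a percentage with 1 decimal
    lines.append(f"{name:<10}{price:>10,.2f}{share:>7.1%}")

for line in lines:
    print(line)
```

Every row comes out 27 characters wide, so the columns line up regardless of the values.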

2.3 Booleans

Boolean values (True/False) are the backbone of conditional logic. Comparison operators return booleans.

on_time = True
lead_time = 7

print(lead_time > 5)           # True
print(lead_time == 7 and on_time) # True
print(not on_time)              # False

2.4 Lists

Lists are ordered, mutable collections. They are the workhorse data structure for storing sequences of items.

# Create and modify lists
warehouses = ["Newark", "Chicago", "Dallas"]
warehouses.append("Seattle")
warehouses.insert(1, "Atlanta")

print(warehouses)        # ['Newark', 'Atlanta', 'Chicago', 'Dallas', 'Seattle']
print(len(warehouses))   # 5
print(warehouses[-1])    # "Seattle" (last element)

# Slicing returns a new list
first_three = warehouses[:3]
print(first_three)       # ['Newark', 'Atlanta', 'Chicago']
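Since lists are mutable, it matters whether two names refer to the same list or to separate copies. This sketch contrasts plain assignment with slicing; the name alias is introduced here only for illustration.

```python
warehouses = ["Newark", "Atlanta", "Chicago", "Dallas", "Seattle"]

alias = warehouses            # same list object, not a copy
first_three = warehouses[:3]  # slicing builds a new list

warehouses[0] = "Boston"      # modify the original in place

print(alias[0])         # Boston (the alias sees the change)
print(first_three[0])   # Newark (the slice is independent)
```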

2.5 Dictionaries

Dictionaries store key-value pairs. They are ideal for lookup tables, configuration settings, and mapping IDs to records.

# Product catalog as a dictionary
product = {
    "sku":    "WDG-2024",
    "name":   "Widget A",
    "price":  29.99,
    "stock":  1250
}

# Access and update
print(product["name"])          # "Widget A"
product["stock"] -= 100        # Sell 100 units
product["category"] = "Hardware"  # Add new key

# Safe access with .get()
weight = product.get("weight", "N/A")  # "N/A" (key missing)
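To see why .get() is called safe, compare it with plain indexing on a missing key. The sketch below also loops over the key-value pairs with .items(); the dictionary is a trimmed copy of the product example.

```python
product = {"sku": "WDG-2024", "name": "Widget A", "price": 29.99, "stock": 1250}

# Indexing a missing key raises KeyError
try:
    product["weight"]
except KeyError as err:
    print(f"Missing key: {err}")

# Membership test and .get() with a default avoid the exception
print("weight" in product)        # False
print(product.get("weight", 0))   # 0

# Iterate over key-value pairs (dictionaries keep insertion order)
for key, value in product.items():
    print(f"{key:<6} -> {value}")
```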

2.6 Tuples

Tuples are immutable sequences. Use them for fixed collections like coordinates, database records, or function return values.

# Warehouse coordinates (lat, lon)
location = (40.7128, -74.0060)
lat, lon = location  # Tuple unpacking
print(f"Latitude: {lat}, Longitude: {lon}")
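Tuples as function return values work the same way as the unpacking shown above. A small sketch follows; price_range is a hypothetical helper, not a built-in.

```python
def price_range(prices):
    """Return the lowest and highest price as a (low, high) tuple."""
    return min(prices), max(prices)

# Unpack the returned tuple into two names
low, high = price_range([29.99, 14.50, 7.25])
print(f"Prices run from ${low:.2f} to ${high:.2f}")

# Tuples cannot be changed in place
location = (40.7128, -74.0060)
try:
    location[0] = 41.0
except TypeError as err:
    print(f"Tuples are immutable: {err}")
```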

2.7 Sets and Frozensets

A set is an unordered collection of unique elements. Sets are useful when you need to remove duplicates, test membership quickly, or compute intersections and unions. The frozenset is an immutable version that can be used as a dictionary key or stored in another set.

# Create sets
east_skus  = {"A101", "B202", "C303", "A101"}  # duplicate removed
west_skus  = {"B202", "D404", "E505"}
print(east_skus)  # {'A101', 'B202', 'C303'} (display order may vary)

# Set operations
print(east_skus & west_skus)   # {'B202'} — intersection (sold in both)
print(east_skus | west_skus)   # all unique SKUs — union
print(east_skus - west_skus)   # {'A101', 'C303'} — East only

# Membership test (O(1) average, much faster than lists)
print("A101" in east_skus)     # True

# frozenset — immutable, can be a dict key
region_key = frozenset(["East", "Central"])
coverage = {region_key: 0.92}

# Remove duplicates from a list
raw_ids = [1, 2, 2, 3, 3, 3]
unique_ids = list(set(raw_ids))  # [1, 2, 3] (order is not guaranteed; use sorted(set(raw_ids)) if it matters)

When to use a set vs. a list: If you need to check whether an item exists in a collection, a set is dramatically faster than a list. Checking membership in a list scans elements one by one (O(n) in the worst case), while a set uses a hash table (O(1) on average). For a collection of 1 million items, a single worst-case lookup takes on the order of milliseconds in a list but well under a microsecond in a set, and across thousands of lookups that gap grows to seconds.
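You can measure the difference yourself with the standard-library timeit module. This is a rough sketch: the collection size and repeat count are arbitrary, and absolute timings will vary by machine.

```python
import timeit

n = 100_000                      # illustrative size; adjust as needed
id_list = list(range(n))
id_set = set(id_list)
missing = -1                     # not present, so the list is scanned fully

# Time 100 membership tests against each container
list_time = timeit.timeit(lambda: missing in id_list, number=100)
set_time = timeit.timeit(lambda: missing in id_set, number=100)

print(f"list: {list_time:.4f} s for 100 lookups")
print(f"set:  {set_time:.6f} s for 100 lookups")
```

Looking up a value that is not present is the worst case for the list, because every element has to be compared before Python can answer False.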

2.8 Conditionals: if / elif / else

service_level = 0.92

if service_level >= 0.95:
    rating = "Excellent"
elif service_level >= 0.90:
    rating = "Good"
elif service_level >= 0.80:
    rating = "Acceptable"
else:
    rating = "Below Standard"

print(f"Service level {service_level:.0%} — Rating: {rating}")

2.9 Loops

Use for to iterate over a known collection and while when you need to loop until a condition changes.

# For loop with enumerate
products = ["Widget", "Gadget", "Bracket"]
for i, name in enumerate(products, start=1):
    print(f"{i}. {name}")

# While loop: reorder point check
inventory = 200
daily_demand = 15
day = 0
while inventory > 50:
    inventory -= daily_demand
    day += 1
print(f"Reorder needed on day {day} (stock: {inventory})")

enumerate() and zip()

enumerate() and zip() are two built-in functions that make loops cleaner and eliminate the need for manual index tracking. enumerate() gives you both the index and the value. zip() pairs elements from two or more iterables together.

# enumerate — avoids manual counter variables
skus = ["A101", "B202", "C303"]
for idx, sku in enumerate(skus):
    print(f"Row {idx}: {sku}")

# zip — iterate over multiple lists in parallel
names    = ["Widget", "Gadget", "Bracket"]
prices   = [29.99, 14.50, 7.25]
stocks   = [500, 320, 750]

for name, price, stock in zip(names, prices, stocks):
    print(f"{name:10} ${price:6.2f}  stock: {stock}")

# Combine enumerate + zip for index + paired values
for i, (name, price) in enumerate(zip(names, prices), 1):
    print(f"{i}. {name}: ${price}")

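Why zip matters: Without zip, you would iterate using index variables like for i in range(len(names)) and then access names[i], prices[i], etc. This is error-prone and hard to read. zip produces cleaner code, but note that it stops at the shortest iterable without raising an error, so a length mismatch silently drops the extra items. In Python 3.10 and later, pass strict=True to raise a ValueError when the lengths differ.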
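One behavior worth seeing in code: zip stops at the shortest iterable, so unequal lengths lose data without any error. The sketch below shows the truncation and one way to guard against it. zip_checked is a hypothetical helper written for this example; on Python 3.10 and later, zip(names, prices, strict=True) offers a built-in check instead.

```python
names = ["Widget", "Gadget", "Bracket"]
prices = [29.99, 14.50]          # one price is missing (deliberate)

# zip silently drops "Bracket" because prices is shorter
pairs = list(zip(names, prices))
print(pairs)                     # [('Widget', 29.99), ('Gadget', 14.5)]

def zip_checked(a, b):
    """Hypothetical helper: pair two lists, refusing mismatched lengths."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return list(zip(a, b))

try:
    zip_checked(names, prices)
except ValueError as err:
    print(f"Caught: {err}")      # Caught: length mismatch: 3 vs 2
```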

2.10 Nested Data Structures

Real-world data often involves structures nested inside other structures: lists of dictionaries (like rows from a database), dictionaries of lists (like column-oriented data), or dictionaries containing other dictionaries. Understanding how to navigate and build these is essential for working with JSON data, API responses, and configuration files.

# List of dictionaries — each dict is a record (row)
orders = [
    {"id": 1001, "product": "Widget", "qty": 50, "region": "East"},
    {"id": 1002, "product": "Gadget", "qty": 120, "region": "West"},
    {"id": 1003, "product": "Widget", "qty": 30, "region": "East"},
]

# Access nested data
print(orders[0]["product"])  # "Widget"

# Loop through records
total_qty = sum(order["qty"] for order in orders)
print(f"Total units ordered: {total_qty}")

# Dictionary of lists — column-oriented layout
columns = {
    "month":   ["Jan", "Feb", "Mar"],
    "revenue": [45000, 52000, 48000],
    "cost":    [30000, 33000, 31000]
}
# Access a column
print(columns["revenue"])  # [45000, 52000, 48000]
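The third shape mentioned above, a dictionary containing other dictionaries, can be sketched as follows. The catalog contents are made up for illustration, and the second half shows one way to build a summary dictionary from a list of records.

```python
# Dictionary of dictionaries: SKU -> record (hypothetical catalog)
catalog = {
    "WDG-2024": {"name": "Widget A", "price": 29.99},
    "GDG-2024": {"name": "Gadget B", "price": 14.50},
}
print(catalog["WDG-2024"]["price"])   # 29.99

# Chain .get() calls so a missing SKU does not raise KeyError
print(catalog.get("XYZ-0000", {}).get("price", "N/A"))   # N/A

# Build a summary dictionary from a list of dictionaries
orders = [
    {"id": 1001, "product": "Widget", "qty": 50, "region": "East"},
    {"id": 1002, "product": "Gadget", "qty": 120, "region": "West"},
    {"id": 1003, "product": "Widget", "qty": 30, "region": "East"},
]
qty_by_region = {}
for order in orders:
    region = order["region"]
    # Start from 0 the first time a region appears
    qty_by_region[region] = qty_by_region.get(region, 0) + order["qty"]

print(qty_by_region)   # {'East': 80, 'West': 120}
```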

2.11 List and Dictionary Comprehensions

Comprehensions are a concise way to create lists by combining a loop and an optional condition in a single expression. Dictionary and set comprehensions follow the same pattern but produce key-value pairs or unique elements.

prices = [12.5, 8.0, 25.0, 3.5, 19.99, 45.0]

# List comprehension: filter and transform
premium = [p for p in prices if p > 15]
print(premium)  # [25.0, 19.99, 45.0]

# Apply 10% discount
discounted = [p * 0.9 for p in prices]
print(discounted)

# Dictionary comprehension — create a lookup table
inventory = {"A": 120, "B": 45, "C": 200, "D": 30}
low_stock = {sku: qty for sku, qty in inventory.items() if qty < 50}
print(low_stock)  # {'B': 45, 'D': 30}

# Invert a dictionary (swap keys and values)
region_codes = {"East": "E", "West": "W", "Central": "C"}
code_to_region = {v: k for k, v in region_codes.items()}
print(code_to_region)  # {'E': 'East', 'W': 'West', 'C': 'Central'}

# Set comprehension (reuses the orders list from Section 2.10)
regions = {order["region"] for order in orders}
print(regions)  # {'East', 'West'} (display order may vary)

Readability guideline: If a comprehension needs more than one if clause or spans more than about 80 characters, rewrite it as a regular for-loop. Comprehensions are meant to simplify simple patterns, not to compress complex logic into a single unreadable line.
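As an illustration of that guideline, here is the same logic written both ways. The discontinued set and the target level of 100 units are assumptions made up for this sketch.

```python
inventory = {"A": 120, "B": 45, "C": 200, "D": 30}
discontinued = {"D"}             # hypothetical set of retired SKUs

# Dense: two conditions and a calculation squeezed into one line
reorder_dense = {s: 100 - q for s, q in inventory.items() if q < 50 if s not in discontinued}

# Same logic as a regular loop, with room for comments
reorder = {}
for sku, qty in inventory.items():
    if sku in discontinued:
        continue                  # never reorder retired SKUs
    if qty < 50:
        reorder[sku] = 100 - qty  # top up to an assumed target of 100 units

print(reorder)   # {'B': 55}
```

Both versions produce the same dictionary, but the loop leaves space to explain each rule and is easier to extend.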

2.12 Error Handling: try / except

When your code encounters an error at runtime (like dividing by zero or converting an invalid string to a number), Python raises an exception and, unless something handles it, the program stops. The try/except block lets you catch these errors, handle them gracefully, and keep your program running. This is especially important when processing real-world data that may contain unexpected values.

# Basic try/except
try:
    price = float("not_a_number")
except ValueError:
    print("Could not convert to float — check your data")

# Handling multiple error types
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        print("Cannot divide by zero")
        return None
    except TypeError:
        print("Both arguments must be numbers")
        return None

print(safe_divide(10, 0))      # Cannot divide by zero → None
print(safe_divide(10, "a"))   # Both arguments must be numbers → None
print(safe_divide(10, 3))      # 3.333...

# try/except/else/finally — the full pattern
try:
    f = open("data.csv")
except FileNotFoundError:
    print("File not found — check the path")
else:
    # "with" closes the file even if reading raises an error
    with f:
        print(f"Opened file with {len(f.readlines())} lines")
finally:
    print("This runs no matter what")

Do not catch everything blindly. Writing a bare except: without specifying the error type hides bugs and makes debugging very difficult. Always catch the specific exception types you expect (ValueError, KeyError, FileNotFoundError, etc.).
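A small sketch of how a bare except can hide a bug. Both functions below contain the same deliberate typo in the key name; the record and the function names are hypothetical.

```python
record = {"sku": "A101", "qty": "12"}   # hypothetical input row

def qty_blind(row):
    try:
        return int(row["quantity"])     # bug: the key is "qty", not "quantity"
    except:                             # bare except swallows the KeyError
        return 0

def qty_specific(row):
    try:
        return int(row["quantity"])     # same bug
    except ValueError:                  # only bad number strings are expected
        return 0

print(qty_blind(record))                # 0, and the typo goes unnoticed

try:
    qty_specific(record)
except KeyError as err:
    print(f"Bug surfaced: missing key {err}")
```

The blind version quietly reports zero units for every row, while the specific version fails loudly on the first row and points straight at the wrong key.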

Exercise 2.1

Given the dictionary inventory = {"A": 120, "B": 45, "C": 200, "D": 30, "E": 95}, write a loop that prints each SKU that has stock below 50 units. Then rewrite it as a dictionary comprehension that returns only the low-stock SKUs with their quantities.

Exercise 2.2

Create a list of monthly sales figures for 12 months. Use a for loop to compute the running cumulative total and store each month's cumulative value in a new list. Print both lists side by side.

Exercise 2.3

Given two lists, products = ["Widget", "Gadget", "Bracket"] and prices = [29.99, 14.50, 7.25], use zip() to create a dictionary mapping product names to prices. Then use a dictionary comprehension to create a new dictionary containing only products that cost more than $10.

Exercise 2.4

Write a function that takes a list of strings representing prices (e.g., ["29.99", "N/A", "14.50", "", "7.25"]) and returns a list of floats, replacing any invalid entries with 0.0. Use try/except inside your loop to handle conversion errors gracefully.

Official Resources

Python Built-in Types
Python Control Flow Tutorial
Real Python: List Comprehensions

Chapter 2 Takeaways

This chapter covered four scalar types (int, float, str, bool) and four built-in collections (list, dict, tuple, set), plus frozenset as the immutable form of set.
Lists are mutable ordered sequences; dictionaries map keys to values; tuples are immutable; sets store unique elements.
String methods like split(), join(), strip(), and replace() are essential for cleaning real-world data.
enumerate() and zip() eliminate manual index tracking and produce cleaner loops.
List and dictionary comprehensions provide a compact alternative to simple for-loop-plus-append patterns.
Use try/except to handle errors gracefully; always catch specific exception types.
Nested data structures (lists of dicts, dicts of lists) appear constantly when working with JSON and API data.
