Chapter 1: Getting Started

Installation, IDEs, and Your First Script

1.1 Why Python for Analytics?

Python has become the default language for data science in industry and academia. It is open source, has a massive ecosystem of analytics libraries, and reads almost like pseudocode. Whether you plan to work in consulting, supply chain, finance, or tech, Python literacy gives you a significant advantage.

Key advantage: Python's ecosystem includes pandas for data wrangling, scikit-learn for machine learning, matplotlib for visualization, and statsmodels for econometrics, all within a single language.

1.2 Installing Python

The two most common installation routes are Anaconda (recommended for beginners) and the official python.org installer, which includes pip. Anaconda bundles Python with 250+ data science packages and the conda package manager.

Option A: Anaconda (Recommended)

Download the installer from anaconda.com/download. After installation, open the Anaconda Navigator to launch Jupyter Notebook or Spyder.

# Verify installation in your terminal
conda --version
python --version

Option B: pip + python.org

Download from python.org/downloads. After installing, use pip to add packages as needed.

# Install common data science packages
pip install numpy pandas matplotlib seaborn scikit-learn statsmodels jupyter
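
To confirm that the packages above actually landed in the interpreter you are using, one option is to ask Python for their installed versions. The sketch below uses importlib.metadata from the standard library (Python 3.8+). The helper name installed_versions is illustrative, not part of any library. Note that it looks up distribution names (scikit-learn), which can differ from import names (sklearn).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


packages = ["numpy", "pandas", "matplotlib", "seaborn",
            "scikit-learn", "statsmodels", "jupyter"]

for name, ver in installed_versions(packages).items():
    print(f"{name:<14}{ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with pip install in the terminal.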

1.3 Virtual Environments

A virtual environment is an isolated Python installation that keeps each project's packages separate. Without environments, installing a package for one project can break another project that depends on a different version. Getting into the habit of creating one environment per project prevents "dependency hell" as your projects grow.

Using venv (Built-in)

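Python ships with venv in the standard library. It creates a lightweight directory that contains its own Python interpreter (a copy of, or a symlink to, the base interpreter, depending on the platform), its own pip, and a separate site-packages folder for installed packages.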

# Create a virtual environment called 'myenv'
python -m venv myenv

# Activate it
# macOS / Linux:
source myenv/bin/activate

# Windows:
myenv\Scripts\activate

# Your prompt changes to show the active env
(myenv) $ pip install pandas numpy

# Deactivate when done
deactivate
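
If you are unsure whether an environment is active, you can check from inside Python. This is a minimal sketch that relies on the standard library only: inside a venv, sys.prefix points at the environment while sys.base_prefix points at the base installation. The function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a venv:", in_virtualenv())
```

Run it once before and once after activating myenv; the interpreter path should change to a location inside the myenv directory.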

Using conda Environments

If you installed Anaconda or Miniconda, conda manages environments and can install non-Python dependencies (C libraries, R, etc.) that pip cannot handle.

# Create a conda environment with a specific Python version
conda create -n analytics python=3.11

# Activate it
conda activate analytics

# Install packages (conda can mix conda + pip)
conda install pandas numpy scikit-learn
pip install statsmodels

# List all environments
conda env list

# Export environment for reproducibility
conda env export > environment.yml

# Recreate from file on another machine
conda env create -f environment.yml
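
A similar check works for conda. The sketch below assumes that conda activate sets the CONDA_DEFAULT_ENV environment variable to the name of the active environment; if your setup behaves differently, conda env list in the terminal remains the authoritative answer. The helper name active_conda_env is illustrative.

```python
import os
import sys


def active_conda_env():
    """Return the active conda environment name, or None if none is detected."""
    return os.environ.get("CONDA_DEFAULT_ENV")


print("Python version:", sys.version.split()[0])
print("Conda environment:", active_conda_env() or "none detected")
```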

pip vs conda: Quick Comparison

Feature | pip | conda
Source | PyPI (Python Package Index) | Anaconda / conda-forge channels
Language support | Python only | Python, R, C/C++, Julia
Environment tool | venv (separate tool) | Built-in (conda create)
Dependency resolver | Basic (improved in recent versions) | Full SAT solver, handles C-level deps
Speed | Generally faster for pure-Python packages | Slower to solve, but installs pre-built binaries
Best for | Lightweight projects, web apps, CI/CD | Data science, packages with C extensions

Rule of thumb: Use conda if you are doing data science and need packages like NumPy, SciPy, or TensorFlow that have complex C dependencies. Use pip + venv for lightweight scripting projects. You can also mix them: create a conda environment, then pip-install packages that are not available on conda channels.

1.4 Choosing an IDE

Tool | Best For | Key Feature
Jupyter Notebook | Exploration, EDA, reports | Cell-by-cell execution with inline plots
VS Code | Larger projects, scripts | Debugger, Git integration, extensions
Google Colab | Cloud, collaboration | Free GPU, zero setup, shareable links
PyCharm | Professional software development | Intelligent code completion, refactoring tools
Spyder | Scientific computing, R-like workflow | Variable explorer, MATLAB-like layout

Recommendation: Start with Jupyter Notebook for interactive exploration. Move to VS Code when writing reusable scripts and modules. If you are transitioning from MATLAB or R, Spyder's variable explorer and console layout may feel most familiar.

Jupyter Magic Commands

Jupyter notebooks support special commands that are not regular Python: "magic" commands, prefixed with % (line magics) or %% (cell magics), and shell commands, prefixed with !. These are useful for timing code, installing packages, or configuring plots without leaving the notebook.

# Time a single statement (runs it many times and reports average)
%timeit sum(range(10000))

# Time an entire cell (put %%timeit at the top of the cell)
%%timeit
total = 0
for i in range(10000):
    total += i

# Display matplotlib plots inline (run once at top of notebook)
%matplotlib inline

# Install a package from a notebook cell
# (%pip installs into the environment of the running kernel)
%pip install seaborn

# Run a shell command
!ls *.csv

# Show all variables in the current namespace
%whos

# Reset all variables (start fresh)
%reset -f

Why %timeit matters: As a data analyst, you will often compare two approaches to see which is faster. The %timeit magic runs your code many times (it picks the number of loops automatically) and reports the mean and standard deviation, giving you a more reliable benchmark than a single noisy measurement.
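
Outside Jupyter, the standard library's timeit module offers the same idea for plain .py scripts. The sketch below compares the two summation approaches timed above. The function names loop_sum and builtin_sum and the repeat counts are choices made for this example, not fixed conventions.

```python
import timeit


def loop_sum(n=10_000):
    """Add the numbers 0..n-1 with an explicit for loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n=10_000):
    """Add the numbers 0..n-1 with the built-in sum()."""
    return sum(range(n))


RUNS = 200  # calls per timing

# Take 5 timings of RUNS calls each and keep the fastest one,
# which is the measurement least disturbed by other processes.
loop_time = min(timeit.repeat(loop_sum, number=RUNS, repeat=5))
builtin_time = min(timeit.repeat(builtin_sum, number=RUNS, repeat=5))

print(f"for loop:    {loop_time / RUNS * 1e6:.1f} microseconds per call")
print(f"built-in sum: {builtin_time / RUNS * 1e6:.1f} microseconds per call")
```

The absolute numbers depend on your machine, so compare the two results with each other rather than with anyone else's figures.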

1.5 Your First Script

Create a new file called hello.py or open a Jupyter cell and type the following:

# hello.py — Your first Python script
print("Hello, Data Science!")

# Variables and basic arithmetic
revenue = 150000
cost    = 95000
profit  = revenue - cost

print(f"Revenue: ${revenue:,}")
print(f"Cost:    ${cost:,}")
print(f"Profit:  ${profit:,}")

Output:
Hello, Data Science!
Revenue: $150,000
Cost:    $95,000
Profit:  $55,000

1.6 Variables and print()

Python variables do not need type declarations. The interpreter infers the type from the assigned value. Use print() to display output and f-strings for formatting.

# Variable assignment — no type declaration needed
product_name = "Widget A"       # str
units_sold   = 1250              # int
unit_price   = 29.99             # float
in_stock     = True              # bool

# f-string formatting
total = units_sold * unit_price
print(f"{product_name}: sold {units_sold} units at ${unit_price} each")
print(f"Total revenue: ${total:,.2f}")

# Check type
print(type(unit_price))  # <class 'float'>
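
The part after the colon inside an f-string placeholder is a format specifier. As a short illustration, the sketch below collects a few specifiers that come up often in reports; the numbers reuse the revenue and profit figures from the first script, and the variable names are chosen for this example only.

```python
revenue = 150000
profit = 55000
margin = profit / revenue

with_commas = f"{revenue:,}"                   # thousands separator
as_money = f"${profit:,.2f}"                   # separator plus two decimals
as_percent = f"{margin:.1%}"                   # ratio shown as a percentage
aligned = f"{'Profit':<10}{profit:>10,}"       # left- and right-aligned columns

print(with_commas)   # 150,000
print(as_money)      # $55,000.00
print(as_percent)    # 36.7%
print(aligned)       # Profit        55,000
```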

1.7 type() and Type Checking

Python is dynamically typed: a variable name can be rebound to a value of a different type at any time. The type() function reports the type of the value a variable currently holds, and isinstance() checks whether a value is an instance of a given type (or of one of several types). Understanding types helps you debug unexpected behavior, especially when reading data from files where numbers might arrive as strings.

quantity = 42
print(type(quantity))           # <class 'int'>

quantity = "forty-two"
print(type(quantity))           # <class 'str'>  — type changed!

# isinstance() is preferred for type checking
price = 19.99
print(isinstance(price, float))      # True
print(isinstance(price, (int, float))) # True (check multiple types)

# Type conversion (casting)
user_input = "150"
units = int(user_input)        # Convert string to int
rate  = float("3.14")          # Convert string to float
label = str(2024)              # Convert int to string

Why this matters: When you read a CSV file without pandas (for example, with the built-in csv module), every value arrives as a string. You must convert numeric columns explicitly with int() or float() before doing arithmetic. Pandas usually infers numeric types for you, but understanding manual conversion helps you debug the cases where it does not.
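
To make the CSV point concrete, here is a small sketch using the standard library's csv module. The three-line dataset is invented for illustration and is held in memory with io.StringIO, so the example runs without a file on disk.

```python
import csv
import io

# Hypothetical sample data standing in for a CSV file on disk
raw = io.StringIO(
    "product,units,price\n"
    "Widget A,10,2.50\n"
    "Widget B,4,10.00\n"
)

rows = list(csv.DictReader(raw))
print(type(rows[0]["units"]))   # <class 'str'>

total = 0.0
for row in rows:
    units = int(row["units"])     # "10"   -> 10
    price = float(row["price"])   # "2.50" -> 2.5
    total += units * price

print(f"Total revenue: ${total:,.2f}")  # Total revenue: $65.00
```

Skipping the int() and float() calls would raise a TypeError, because Python cannot multiply two strings.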

1.8 User Input with input()

The input() function pauses the program and waits for the user to type something. It always returns a string, so you need to convert the result if you expect a number.

# input() always returns a string
name = input("Enter product name: ")
qty  = int(input("Enter quantity: "))
cost = float(input("Enter unit cost: "))

total = qty * cost
print(f"Order for {qty} x {name}: ${total:,.2f}")

Practical note: In data analytics scripts, input() is rarely used because data comes from files, databases, or APIs. However, it is valuable when building quick command-line tools for teammates who are not comfortable editing code directly. For example, you might write a script that asks "Enter the forecast horizon (weeks):" and then runs the appropriate analysis.

1.9 Comments and Code Style

Good code is readable code. Follow PEP 8 conventions: use snake_case for variable names, write comments that explain why the code does something (not what it does), and limit lines to 79 characters.

# GOOD: descriptive variable names
order_quantity = 500
lead_time_days = 14

# AVOID: cryptic single-letter names
# q = 500
# l = 14
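
The difference between a "what" comment and a "why" comment is easiest to see side by side. The sketch below is a made-up inventory example: the demand and safety stock figures are hypothetical, and the reorder-point calculation is only there to give the comments something to describe.

```python
daily_demand = 40      # units per day (hypothetical)
lead_time_days = 14
safety_stock = 100     # units (hypothetical)

# "What" comment (repeats what the code already says):
# multiply daily demand by lead time and add safety stock

# "Why" comment (records reasoning the code cannot show):
# Reorder before stock runs out during the supplier's lead time;
# the safety stock absorbs demand spikes while the order is in transit.
reorder_point = daily_demand * lead_time_days + safety_stock

print(f"Reorder point: {reorder_point:,} units")  # Reorder point: 660 units
```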

Exercise 1.1

Create a script that calculates the Economic Order Quantity (EOQ). Define variables for annual demand (10,000 units), ordering cost ($50 per order), and holding cost ($2 per unit per year). Compute EOQ = sqrt(2 * D * S / H) and print the result with a formatted message.

Hint: import math and use math.sqrt(), or use the ** 0.5 exponent.

Exercise 1.2

Open a Jupyter Notebook and create three cells: (1) import the math module, (2) define a variable radius = 5 and compute the area of a circle, (3) print the result. Run each cell individually and observe the output.

1.10 Common Beginner Errors

Every beginner encounters the same handful of errors. Recognizing them quickly will save you hours of frustration. Read the error message carefully, starting from the bottom line, which tells you the type of error and a brief description.

# 1. IndentationError — Python uses whitespace to define code blocks
if True:
print("oops")  # IndentationError: expected an indented block

# Fix: indent the body of if/for/while/def with 4 spaces
if True:
    print("fixed")

# 2. NameError — using a variable before defining it
print(total_cost)  # NameError: name 'total_cost' is not defined

# 3. TypeError — operating on incompatible types
result = "Price: " + 29.99  # TypeError: can only concatenate str (not "float") to str
result = "Price: " + str(29.99)  # Fix: convert to string first

# 4. SyntaxError — forgetting colons, quotes, or parentheses
if x > 5     # SyntaxError: expected ':'
print("hello"  # SyntaxError: '(' was never closed (older versions: unexpected EOF while parsing)

Read errors bottom-up: Python tracebacks show the chain of function calls, with the actual error at the very bottom. Start reading there. The line number tells you where the error occurred, and the error type (NameError, TypeError, etc.) tells you what went wrong. Googling the exact error message almost always leads to a helpful Stack Overflow answer.

# 5. IndexError — accessing a list position that doesn't exist
items = ["A", "B", "C"]
print(items[5])  # IndexError: list index out of range

# 6. ModuleNotFoundError — package not installed
import pandas  # ModuleNotFoundError: No module named 'pandas'
# Fix: pip install pandas  (in your terminal, not in Python)

# 7. Mutable default argument trap (subtle!)
def add_item(item, basket=[]):  # BAD: default list is shared
    basket.append(item)
    return basket

print(add_item("apple"))   # ['apple']
print(add_item("banana"))  # ['apple', 'banana'] — unexpected!

# Fix: use None as default
def add_item(item, basket=None):
    if basket is None:
        basket = []
    basket.append(item)
    return basket
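
The "read errors bottom-up" advice can be tried out safely. The sketch below triggers a NameError on purpose, captures the traceback text with the standard library's traceback module, and prints only its last line. The function load_order_total is invented for this demonstration.

```python
import traceback


def load_order_total():
    # unit_price is deliberately never defined, so this raises NameError
    return unit_price * 3


try:
    load_order_total()
except NameError:
    tb_lines = traceback.format_exc().strip().splitlines()
    last_line = tb_lines[-1]

    print(tb_lines[0])   # Traceback (most recent call last):
    print(last_line)     # NameError: name 'unit_price' is not defined
```

The first line only announces that a traceback follows. The last line names the error type and the cause, which is why it is the best place to start reading.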

Exercise 1.3

Write a script that uses input() to ask the user for a product name, a unit price, and a quantity. Compute the total cost and print a formatted receipt. Handle the case where the user enters a non-numeric price by wrapping the conversion in a try/except block and printing a helpful error message.

Hint: use try: price = float(input(...)) and except ValueError: print("Please enter a number").

Exercise 1.4

Create a virtual environment using python -m venv, activate it, install numpy, and write a three-line script that imports numpy, creates an array of [10, 20, 30], and prints its mean. Then deactivate the environment. Document each terminal command you used.
