Installation, IDEs, and Your First Script
Python has become the default language for data science in industry and academia. It is open source, has a massive ecosystem of analytics libraries, and reads almost like pseudocode. Whether you plan to work in consulting, supply chain, finance, or tech, Python literacy gives you a significant advantage.
The two most common installation routes are Anaconda (recommended for beginners) and the official python.org installer combined with pip. Anaconda bundles Python with 250+ data science packages and the conda package manager.
Download the installer from anaconda.com/download. After installation, open the Anaconda Navigator to launch Jupyter Notebook or Spyder.
# Verify installation in your terminal
conda --version
python --version
Alternatively, download the official installer from python.org/downloads. After installing, use pip to add packages as needed.
# Install common data science packages
pip install numpy pandas matplotlib seaborn scikit-learn statsmodels jupyter
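Once the install finishes, a quick way to confirm the packages are visible to your interpreter is to ask Python whether it can locate them. The sketch below is one possible check, not an official procedure; the helper name check_packages is made up for this example, and note that scikit-learn is imported under the name sklearn.

```python
from importlib.util import find_spec

# Import names to look for (scikit-learn is imported as "sklearn").
PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "statsmodels"]


def check_packages(import_names):
    """Return {import_name: True/False} depending on whether Python can find it."""
    # find_spec() returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in import_names}


for name, found in check_packages(PACKAGES).items():
    status = "installed" if found else "MISSING"
    print(f"{name:<12} {status}")
```

Any package reported as MISSING can be installed with pip from the terminal and the check run again.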
A virtual environment is an isolated Python installation that keeps each project's packages separate. Without environments, installing a package for one project can break another project that depends on a different version. Getting into the habit of creating one environment per project prevents "dependency hell" as your projects grow.
Python ships with venv in the standard library. It creates a lightweight directory containing a private copy of the Python interpreter and its own pip.
# Create a virtual environment called 'myenv'
python -m venv myenv

# Activate it
# macOS / Linux:
source myenv/bin/activate
# Windows:
myenv\Scripts\activate

# Your prompt changes to show the active env
(myenv) $ pip install pandas numpy

# Deactivate when done
deactivate
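If you are unsure whether activation worked, you can ask the interpreter itself. The following is a minimal sketch for environments created with venv; the function name in_virtual_environment is invented for this example, and conda environments are typically not detected by this particular check.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

When the environment is active, the interpreter path should sit inside the myenv directory.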
If you installed Anaconda or Miniconda, conda manages environments and can install non-Python dependencies (C libraries, R, etc.) that pip cannot handle.
# Create a conda environment with a specific Python version
conda create -n analytics python=3.11

# Activate it
conda activate analytics

# Install packages (conda can mix conda + pip)
conda install pandas numpy scikit-learn
pip install statsmodels

# List all environments
conda env list

# Export environment for reproducibility
conda env export > environment.yml

# Recreate from file on another machine
conda env create -f environment.yml
| Feature | pip | conda |
|---|---|---|
| Source | PyPI (Python Package Index) | Anaconda / conda-forge channels |
| Language support | Python only | Python, R, C/C++, Julia |
| Environment tool | venv (separate tool) | Built-in (conda create) |
| Dependency resolver | Basic (improved in recent versions) | Full SAT solver, handles C-level deps |
| Speed | Generally faster for pure-Python packages | Slower to solve, but installs pre-built binaries |
| Best for | Lightweight projects, web apps, CI/CD | Data science, packages with C extensions |
Popular editors and IDEs for Python data work:

| Tool | Best For | Key Feature |
|---|---|---|
| Jupyter Notebook | Exploration, EDA, reports | Cell-by-cell execution with inline plots |
| VS Code | Larger projects, scripts | Debugger, Git integration, extensions |
| Google Colab | Cloud, collaboration | Free GPU, zero setup, shareable links |
| PyCharm | Professional software development | Intelligent code completion, refactoring tools |
| Spyder | Scientific computing, R-like workflow | Variable explorer, MATLAB-like layout |
Jupyter notebooks support "magic" commands (prefixed with % for a single line or %% for a whole cell) and shell escapes (prefixed with !). Neither is regular Python. They are useful for timing code, installing packages, or configuring plots without leaving the notebook.
# Time a single statement (runs it many times and reports average)
%timeit sum(range(10000))

# Time an entire cell (put %%timeit at the top of the cell)
%%timeit
total = 0
for i in range(10000):
    total += i

# Display matplotlib plots inline (run once at top of notebook)
%matplotlib inline

# Install a package directly from a notebook cell
!pip install seaborn

# Run a shell command
!ls *.csv

# Show all variables in the current namespace
%whos

# Reset all variables (start fresh)
%reset -f
The %timeit magic runs your code many times, choosing the number of loops automatically, and reports the mean and standard deviation across runs. This gives you a more reliable benchmark than a single noisy measurement.
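Magic commands only work inside Jupyter or IPython. In a plain .py script, the standard library's timeit module offers a comparable measurement. The sketch below times the same statement; the counts of 1,000 runs and 5 trials are arbitrary choices for illustration.

```python
import timeit

# Run the statement 1,000 times per trial, for 5 trials.
trials = timeit.repeat("sum(range(10000))", number=1000, repeat=5)

# Each entry is the total time for 1,000 runs, so divide to get per-run time.
best_per_run = min(trials) / 1000
print(f"Best of 5 trials: {best_per_run * 1e6:.1f} microseconds per run")
```

Taking the minimum across trials is a common convention, since slower trials usually reflect interference from other processes rather than the code itself.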
Create a new file called hello.py or open a Jupyter cell and type the following:
# hello.py: your first Python script
print("Hello, Data Science!")

# Variables and basic arithmetic
revenue = 150000
cost = 95000
profit = revenue - cost

print(f"Revenue: ${revenue:,}")
print(f"Cost: ${cost:,}")
print(f"Profit: ${profit:,}")
Python variables do not need type declarations. The interpreter infers the type from the assigned value. Use print() to display output and f-strings for formatting.
# Variable assignment: no type declaration needed
product_name = "Widget A"   # str
units_sold = 1250           # int
unit_price = 29.99          # float
in_stock = True             # bool

# f-string formatting
total = units_sold * unit_price
print(f"{product_name}: sold {units_sold} units at ${unit_price} each")
print(f"Total revenue: ${total:,.2f}")

# Check type
print(type(unit_price))     # <class 'float'>
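The part after the colon inside an f-string's braces is a format specifier. The short sketch below shows a few specifiers that come up often in reports; the margin value is hypothetical and exists only to demonstrate percentage formatting.

```python
units_sold = 1250
unit_price = 29.99
margin = 0.3667  # hypothetical profit margin, used only for illustration

with_commas = f"{units_sold:,}"       # thousands separator
two_decimals = f"{unit_price:.2f}"    # fixed two decimal places
as_percent = f"{margin:.1%}"          # multiply by 100 and append %
padded = f"{'Widget A':<12}|"         # left-align in a 12-character field

for text in (with_commas, two_decimals, as_percent, padded):
    print(text)
```

Specifiers can be combined, as in the :,.2f used above for total revenue.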
Since Python is dynamically typed, variables can change type at any time. The type() function tells you what type a value currently holds, and isinstance() checks whether a value belongs to a specific type. Understanding types helps you debug unexpected behavior, especially when reading data from files where numbers might arrive as strings.
quantity = 42
print(type(quantity))   # <class 'int'>
quantity = "forty-two"
print(type(quantity))   # <class 'str'> (type changed!)

# isinstance() is preferred for type checking
price = 19.99
print(isinstance(price, float))          # True
print(isinstance(price, (int, float)))   # True (check multiple types)

# Type conversion (casting)
user_input = "150"
units = int(user_input)   # Convert string to int
rate = float("3.14")      # Convert string to float
label = str(2024)         # Convert int to string
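When numbers arrive as strings, for example from a text file or user input, convert them with int() or float() before doing arithmetic. Pandas usually infers numeric types automatically when it reads a file, but understanding manual conversion helps you debug cases where it does not.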
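To see why conversion matters, consider what happens when arithmetic operators are applied to a number that is still stored as text. The following sketch uses made-up values; the comma-stripping step is one simple way to handle thousands separators, not the only one.

```python
raw_quantity = "150"   # e.g. a value read from a text file

repeated = raw_quantity * 2        # string repetition, not multiplication
doubled = int(raw_quantity) * 2    # numeric multiplication after conversion

print(repr(repeated))  # '150150'
print(doubled)         # 300

# float() tolerates surrounding whitespace but not thousands separators,
# so commas have to be removed before converting.
clean = float(" 19.99 ")
without_commas = float("1,250".replace(",", ""))
print(clean, without_commas)
```

No error is raised in the first case, which is what makes this kind of bug easy to miss.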
The input() function pauses the program and waits for the user to type something. It always returns a string, so you need to convert the result if you expect a number.
# input() always returns a string
name = input("Enter product name: ")
qty = int(input("Enter quantity: "))
cost = float(input("Enter unit cost: "))

total = qty * cost
print(f"Order for {qty} x {name}: ${total:,.2f}")
In data science work, input() is rarely used because data usually comes from files, databases, or APIs. However, it is valuable when building quick command-line tools for teammates who are not comfortable editing code directly. For example, you might write a script that asks "Enter the forecast horizon (weeks):" and then runs the appropriate analysis.
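A tool like the forecast-horizon prompt benefits from checking what the user typed before running any analysis. The sketch below shows one possible approach; the function name parse_horizon_weeks and the rule that the horizon must be at least one week are assumptions made for this example. Fixed strings stand in for input() so the example runs without a keyboard.

```python
def parse_horizon_weeks(raw_text):
    """Convert typed text such as ' 12 ' into a positive whole number of weeks."""
    weeks = int(raw_text.strip())   # raises ValueError for text like "twelve"
    if weeks <= 0:
        raise ValueError("forecast horizon must be at least 1 week")
    return weeks


# In a real script the text would come from:
#   input("Enter the forecast horizon (weeks): ")
for typed in ["12", " 4 ", "0", "twelve"]:
    try:
        print(f"{typed!r} -> {parse_horizon_weeks(typed)} weeks")
    except ValueError as error:
        print(f"{typed!r} -> rejected ({error})")
```

Keeping the validation in its own function makes it easy to test separately from the prompt.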
Good code is readable code. Follow PEP 8 conventions: use snake_case for variables, add comments to explain why (not what), and keep lines under 79 characters.
# GOOD: descriptive variable names
order_quantity = 500
lead_time_days = 14

# AVOID: cryptic single-letter names
# q = 500
# l = 14
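The advice to comment on why rather than what is easier to see in context. The sketch below is an illustrative example only; the reorder-point calculation, the variable daily_demand, and the SAFETY_STOCK_UNITS value are hypothetical and were chosen to fit the inventory theme.

```python
# Constants use UPPER_SNAKE_CASE; ordinary variables use snake_case.
SAFETY_STOCK_UNITS = 100   # hypothetical buffer, chosen for illustration

daily_demand = 40
lead_time_days = 14

# Stock keeps selling while we wait for the supplier, so reorder once the
# remaining stock only covers lead-time demand plus the safety buffer.
reorder_point = daily_demand * lead_time_days + SAFETY_STOCK_UNITS

print(f"Reorder when stock falls to {reorder_point} units")
```

A comment such as "multiply demand by lead time" would only restate the code, whereas the comment here records the reasoning a future reader cannot get from the expression alone.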
Create a script that calculates the Economic Order Quantity (EOQ). Define variables for annual demand (10,000 units), ordering cost ($50 per order), and holding cost ($2 per unit per year). Compute EOQ = sqrt(2 * D * S / H) and print the result with a formatted message.
Hint: import math and use math.sqrt(), or use the ** 0.5 exponent.
Open a Jupyter Notebook and create three cells: (1) import the math module, (2) define a variable radius = 5 and compute the area of a circle, (3) print the result. Run each cell individually and observe the output.
Every beginner encounters the same handful of errors. Recognizing them quickly will save you hours of frustration. Read the error message carefully, starting from the bottom line, which tells you the type of error and a brief description.
# 1. IndentationError: Python uses whitespace to define code blocks
if True:
print("oops")                    # IndentationError: expected an indented block
# Fix: indent the body of if/for/while/def with 4 spaces
if True:
    print("fixed")

# 2. NameError: using a variable before defining it
print(total_cost)                # NameError: name 'total_cost' is not defined

# 3. TypeError: operating on incompatible types
result = "Price: " + 29.99       # TypeError: can only concatenate str (not "float") to str
result = "Price: " + str(29.99)  # Fix: convert to string first

# 4. SyntaxError: forgetting colons, quotes, or parentheses
if x > 5                         # SyntaxError: expected ':'
print("hello"                    # SyntaxError: '(' was never closed (older versions: unexpected EOF)
# 5. IndexError: accessing a list position that doesn't exist
items = ["A", "B", "C"]
print(items[5])                  # IndexError: list index out of range

# 6. ModuleNotFoundError: package not installed
import pandas                    # ModuleNotFoundError: No module named 'pandas'
# Fix: pip install pandas (in your terminal, not in Python)

# 7. Mutable default argument trap (subtle!)
def add_item(item, basket=[]):   # BAD: default list is shared
    basket.append(item)
    return basket

print(add_item("apple"))         # ['apple']
print(add_item("banana"))        # ['apple', 'banana'] (unexpected!)

# Fix: use None as default
def add_item(item, basket=None):
    if basket is None:
        basket = []
    basket.append(item)
    return basket
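The bottom line of a traceback always has the shape "ErrorType: message". One way to get comfortable reading it is to trigger a few of these errors on purpose and print just that line. The helper describe_error below is a hypothetical name used only for this sketch.

```python
def describe_error(action):
    """Run a zero-argument function and report the error it raises, if any."""
    try:
        action()
    except Exception as error:
        # The last line of a traceback has this same "Type: message" shape.
        return f"{type(error).__name__}: {error}"
    return "no error"


items = ["A", "B", "C"]

print(describe_error(lambda: items[5]))           # index past the end of the list
print(describe_error(lambda: "Price: " + 29.99))  # str + float
print(describe_error(lambda: undefined_name))     # name never assigned
print(describe_error(lambda: int("150")))         # valid conversion
```

Syntax and indentation errors cannot be caught this way because Python rejects the file before any of it runs.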
Write a script that uses input() to ask the user for a product name, a unit price, and a quantity. Compute the total cost and print a formatted receipt. Handle the case where the user enters a non-numeric price by wrapping the conversion in a try/except block and printing a helpful error message.
Hint: use try: price = float(input(...)) and except ValueError: print("Please enter a number").
Create a virtual environment using python -m venv, activate it, install numpy, and write a three-line script that imports numpy, creates an array of [10, 20, 30], and prints its mean. Then deactivate the environment. Document each terminal command you used.
Jupyter magic commands such as %timeit and %matplotlib inline streamline common tasks.
Use type() and isinstance() to inspect types when debugging.