Assignment 3a: Package Management Concepts¶
| Author | Robert Frenken |
|---|---|
| Estimated time | 3–4 hours |
| Prerequisites | Assignment 1 completed, Python basics |
What you'll learn
Why package management exists, how dependency resolution works, and when to use pip vs conda vs uv. By the end you'll be able to create reproducible Python environments for any project — and you'll understand why that's a solved problem you no longer have to re-solve.
3a and 3b are assigned together
You have 2 weeks to complete both. 3a is conceptual (package management). 3b is hands-on coding (a neural network from scratch). Both include a blog post deliverable.
Part 0: What is a Package?¶
Python's standard library ships with the language — modules like os, json, math, and pathlib are always available. But most real projects need third-party packages from PyPI (the Python Package Index), a public repository of over 500,000 packages.
When you run pip install requests, pip:
- Queries PyPI for the `requests` package
- Downloads a wheel (`.whl`) — a pre-built archive of Python code
- Installs it into your environment's `site-packages/` directory
- Resolves and installs any dependencies that `requests` itself needs
Task¶
Create and activate a fresh virtual environment, then count what's already installed:
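One way to do it — a minimal sketch, assuming bash or zsh on Linux/macOS (on Windows, activate with `scratch-env\Scripts\activate` instead; the name `scratch-env` is just an example):

```bash
# Make a throwaway environment
python -m venv scratch-env
source scratch-env/bin/activate

# List and count what a fresh venv ships with
pip list
pip list | wc -l   # note: the count includes two header lines
```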
You should see only pip (and, on older Python versions, setuptools) — a clean slate. Now install one package and count again:
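For example, sticking with `requests`:

```bash
# Install one package, then recount
pip install requests
pip list
pip list | wc -l
```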
How many packages are installed now?
You asked for 1 package but got several. Where did the extras come from? Keep this question in mind as you continue.
When you're done exploring, deactivate and delete the scratch environment:
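Assuming the environment name used above:

```bash
deactivate
rm -rf scratch-env
```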
- Created a fresh venv and confirmed the minimal package list
- Installed `requests` and counted the extras
Part 1: Why Package Management Matters¶
The dependency chain problem¶
When you install a package, it brings its own dependencies — and those dependencies have dependencies too. This is called a dependency tree.
Here's what happens when you install requests:
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#e8f4fd', 'primaryTextColor': '#1a1a1a', 'lineColor': '#555'}}}%%
graph TD
A["fa:fa-box requests"]:::process --> B["charset-normalizer"]:::data
A --> C["idna"]:::data
A --> D["urllib3"]:::data
A --> E["certifi"]:::data
classDef process fill:#e8f4fd,stroke:#3b82f6,color:#1a1a1a,stroke-width:2px
classDef data fill:#d1fae5,stroke:#059669,color:#1a1a1a,stroke-width:2px

Four dependencies — manageable. Now look at what happens with a large ML framework like PyTorch:
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#e8f4fd', 'primaryTextColor': '#1a1a1a', 'lineColor': '#555'}}}%%
graph TD
A["fa:fa-fire torch"]:::process --> B["filelock"]:::data
A --> C["typing-extensions"]:::data
A --> D["sympy"]:::data
A --> E["networkx"]:::data
A --> F["jinja2"]:::data
A --> G["fsspec"]:::data
A --> H["fa:fa-microchip nvidia-cuda-runtime-cu12"]:::external
A --> I["fa:fa-microchip nvidia-cudnn-cu12"]:::external
A --> J["fa:fa-microchip nvidia-cublas-cu12"]:::external
A --> K["fa:fa-microchip nvidia-nccl-cu12"]:::external
A --> L["triton"]:::data
D --> M["mpmath"]:::subdata
F --> N["MarkupSafe"]:::subdata
I --> J
classDef process fill:#e8f4fd,stroke:#3b82f6,color:#1a1a1a,stroke-width:2px
classDef data fill:#d1fae5,stroke:#059669,color:#1a1a1a,stroke-width:2px
classDef external fill:#ede9fe,stroke:#7c3aed,color:#1a1a1a,stroke-width:2px
classDef subdata fill:#ecfdf5,stroke:#059669,color:#1a1a1a,stroke-width:2px

One `pip install torch` pulls in 14+ packages, including GPU libraries, a symbolic math engine, and a template language. Every one of those packages has its own version requirements, and they all have to be compatible with each other.
How dependency resolution actually works
When you pip install torch, pip does a depth-first traversal of the dependency graph, collecting every package torch needs, then every package those need, and so on. At each step it has to pick a version that satisfies all constraints — if torch needs numpy >= 1.24 and another already-installed package needs numpy < 2, the resolver has to find a version in the intersection.
Modern resolvers (pip ≥ 20.3, uv, conda) use backtracking or SAT solvers: if they hit a dead end, they undo recent choices and try different versions. When no valid combination exists, you get a ResolutionImpossible error — and the only fix is relaxing a constraint somewhere.
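If you want to inspect a resolved tree yourself, one option (not required for this assignment) is the third-party `pipdeptree` package. A quick sketch, run inside an activated venv:

```bash
pip install pipdeptree requests

# Print the dependency tree pip resolved for requests
pipdeptree -p requests
```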
Why this matters¶
Imagine Project A needs numpy==1.26.4 and Project B needs numpy==2.0.0. If both are installed into the same Python environment, one project breaks. This is the version conflict problem, and it's the core reason package management tools exist.
Reflection
What could go wrong if two projects share a single Python installation and one of them upgrades a shared dependency?
- Dependency-tree reasoning makes sense to you
- Part 1 reflection answered
Part 2: Virtual Environments¶
A virtual environment (venv) is an isolated Python installation. Each venv has its own site-packages/ directory, so packages installed in one venv don't affect another.
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#e8f4fd', 'primaryTextColor': '#1a1a1a', 'lineColor': '#555'}}}%%
graph LR
subgraph "System Python"
SYS["fa:fa-computer python3.12"]:::system
end
subgraph "Project A venv"
A_PY["python3.12"]:::processA
A_NP["numpy 1.26.4"]:::processA
A_PD["pandas 2.1.5"]:::processA
end
subgraph "Project B venv"
B_PY["python3.12"]:::processB
B_NP["numpy 2.0.0"]:::processB
B_SK["scikit-learn 1.5.0"]:::processB
end
SYS -.->|"venv created from"| A_PY
SYS -.->|"venv created from"| B_PY
classDef system fill:#f3f4f6,stroke:#6b7280,color:#1a1a1a,stroke-width:2px
classDef processA fill:#e8f4fd,stroke:#3b82f6,color:#1a1a1a,stroke-width:2px
classDef processB fill:#d1fae5,stroke:#059669,color:#1a1a1a,stroke-width:2px

Each project gets exactly the versions it needs. No conflicts.
Hands-on: prove isolation works¶
Create two virtual environments with different numpy versions:
# Environment 1: numpy 1.x
python -m venv env-numpy1
source env-numpy1/bin/activate
pip install "numpy>=1.26,<2"
python -c "import numpy; print(f'env-numpy1: numpy {numpy.__version__}')"
deactivate
# Environment 2: numpy 2.x
python -m venv env-numpy2
source env-numpy2/bin/activate
pip install "numpy>=2.0,<3"
python -c "import numpy; print(f'env-numpy2: numpy {numpy.__version__}')"
deactivate
Verify that each environment has its own version. Then clean up:
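One way to verify without re-activating, then clean up (Linux/macOS paths):

```bash
# Each venv's own interpreter sees only its own site-packages
env-numpy1/bin/python -c "import numpy; print(numpy.__version__)"
env-numpy2/bin/python -c "import numpy; print(numpy.__version__)"

# Delete both scratch environments
rm -rf env-numpy1 env-numpy2
```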
What just happened
You created two fully independent Python installations that share nothing but the interpreter. Installing a new package in one has zero effect on the other. This is what lets a lab machine host 20 projects without any of them breaking each other.
Reflection
Why not install everything into the system Python? What problems would that cause on a shared system like OSC where multiple users and projects coexist?
- Two isolated venvs created with different numpy versions
- Part 2 reflection answered
Part 3: Package Managers Compared¶
The lab uses three package managers. Each has different strengths:
| Feature | pip | conda | uv |
|---|---|---|---|
| Source | PyPI (Python packages) | conda-forge / Anaconda (any language) | PyPI (Python packages) |
| Speed | Moderate | Slow | Very fast (10–100x pip) |
| Resolver | Backtracking (since pip 20.3) | SAT solver | PubGrub-style resolver |
| Lock files | No built-in (pip freeze is approximate) | environment.yml (not locked) | uv.lock (deterministic) |
| Non-Python packages | No | Yes (CUDA, MKL, compilers) | No |
| Lab usage | Legacy projects, quick installs | CUDA toolkit, complex C dependencies | New projects, daily development |
When to use each¶
- uv — Default choice for new projects. Fast, deterministic, excellent error messages.
- conda — When you need non-Python dependencies (CUDA, MKL, system libraries) that aren't available as pip wheels.
- pip — When a package is only on PyPI and not on conda-forge, or you're working in a codebase that hasn't migrated yet.
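For orientation, the same install expressed in each tool (illustrative only; numpy stands in for any package available on both PyPI and conda-forge):

```bash
pip install numpy                     # from PyPI, into the active environment
conda install -c conda-forge numpy    # from conda-forge; can also pull non-Python deps
uv pip install numpy                  # from PyPI, pip-compatible interface, much faster
```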
For details on how we use these on OSC, see the Environment Management guide and the Python Environment Setup guide.
Hands-on: compare pip and uv speed¶
Install the same set of packages with pip and uv, then compare the time:
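A sketch of one way to run the comparison, assuming both tools are installed; the package set (requests, pandas, matplotlib) is just an example:

```bash
# Time pip in a fresh venv
python -m venv timing-pip
source timing-pip/bin/activate
time pip install requests pandas matplotlib
deactivate

# Time uv in a fresh venv
uv venv timing-uv
source timing-uv/bin/activate
time uv pip install requests pandas matplotlib
deactivate

# Clean up
rm -rf timing-pip timing-uv
```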
Record the `real` time reported by each `time` command.
What you'll probably see
On a recent laptop with warm caches, rough numbers:
| Tool | Time |
|---|---|
| `pip install` | 15–40s |
| `uv pip install` | 1–3s |
The gap is bigger the first time (cold cache) and for larger package sets (e.g., pytorch + its dependencies). uv also produces cleaner error messages when there's a conflict.
Reflection
Conda can install non-Python packages like CUDA libraries. Why is this useful? When would pip or uv alone not be enough?
- Speed comparison run with times recorded
- Part 3 reflection answered
Part 4: Requirements Files¶
A requirements file records exactly which packages (and versions) a project needs, so anyone can recreate the environment.
Two approaches¶
pip freeze — captures everything currently installed, including transitive dependencies:
pip freeze > requirements.txt
# Output includes every package with exact versions:
# certifi==2024.8.30
# charset-normalizer==3.4.0
# idna==3.10
# requests==2.32.3
# urllib3==2.2.3
Hand-curated — list only your direct dependencies with version ranges:
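For example, a hand-curated file might contain only your direct dependencies (the versions here are illustrative):

```text
# requirements.txt — direct dependencies only
requests>=2.31,<3
pandas>=2.1,<3
```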
The hand-curated approach is easier to maintain — you only list what you directly use. But it's less reproducible because transitive dependency versions can drift.
Lock files: the best of both worlds¶
A lock file pins every dependency (direct and transitive) to exact versions, generated from your hand-curated requirements:
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#e8f4fd', 'primaryTextColor': '#1a1a1a', 'lineColor': '#555'}}}%%
graph LR
A@{ shape: doc, label: "fa:fa-file-lines requirements.txt<br/>(what you want)" }:::decision -->|"uv pip compile"| B@{ shape: doc, label: "fa:fa-lock requirements.lock<br/>(exact versions)" }:::data
B -->|"uv pip install -r"| C(["fa:fa-check Reproducible<br/>environment"]):::success
classDef decision fill:#fef3c7,stroke:#d97706,color:#1a1a1a,stroke-width:2px
classDef data fill:#d1fae5,stroke:#059669,color:#1a1a1a,stroke-width:2px
classDef success fill:#d1fae5,stroke:#059669,color:#1a1a1a,stroke-width:2px

# Create a hand-curated requirements.txt first, then:
uv pip compile requirements.txt -o requirements.lock
uv pip install -r requirements.lock
Hands-on: create and use a requirements file¶
1. Create a virtual environment and install some packages.
2. Generate a requirements file.
3. Test that it works — create a fresh environment and install from the file.
4. Clean up.

All four steps are sketched below.
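Assuming bash on Linux/macOS; the environment names and packages here are only examples:

```bash
# 1. Create a venv and install some packages
python -m venv req-demo
source req-demo/bin/activate
pip install requests pandas

# 2. Generate a requirements file
pip freeze > requirements.txt
deactivate

# 3. Prove it reproduces: fresh venv, install from the file
python -m venv req-demo-copy
source req-demo-copy/bin/activate
pip install -r requirements.txt
pip list
deactivate

# 4. Clean up
rm -rf req-demo req-demo-copy
```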
- Captured both a `pip freeze` and a hand-curated requirements file
- Reproduced an environment from a requirements file in a fresh venv
Part 5: Publish¶
Write a short blog post on your Quarto website explaining one concept from this assignment. Pick the topic that surprised you most or that you think would be most useful to a classmate. Follow the Publishing Guide for the full workflow.
Quick reference for this assignment:
- Suggested categories: `[tooling, python, reflection]`
- Post angles (pick one):
    - Why virtual environments exist — and what specifically goes wrong without them
    - How dependency trees work — with a diagram you draw for a package of your choice
    - pip vs conda vs uv — when to reach for each, with the time comparison you measured
Your post should include at least one code example or diagram.
Final Deliverables¶
- Screenshot of `pip list` from Part 0 showing installed packages after `pip install requests`
- Screenshots showing different numpy versions in two separate venvs (Part 2)
- Terminal output comparing pip vs uv install times (Part 3)
- Contents of your `requirements.txt` from Part 4
- Three reflection question answers (Parts 1, 2, and 3)
- Blog post URL from Part 5
Troubleshooting¶
| Problem | Cause | Fix |
|---|---|---|
| `error: externally-managed-environment` | System Python is locked down (common on Ubuntu 23.04+, OSC) | Use a virtual environment — `python -m venv .venv && source .venv/bin/activate` |
| `pip: command not found` after activating venv | venv was created without pip, or activation failed | Recreate with `python -m venv --clear .venv` or use `uv venv` |
| `ResolutionImpossible` or version conflict | Two packages need incompatible versions of a shared dependency | Read the error to find the conflicting package. Try relaxing version pins or check if a newer version resolves the conflict |
| `uv: command not found` | uv is not installed | Install with `curl -LsSf https://astral.sh/uv/install.sh \| sh`, then restart your terminal |
Sources¶
- Python Packaging User Guide — PyPA (Python Packaging Authority)
- uv documentation — Astral
- Conda documentation — Anaconda, Inc.