Anaconda to Apptainer: Reproducible Python Environments
Learn how to export your local Conda environment into a production-ready Apptainer container that runs identically across laptops, HPC clusters, and cloud systems. This step-by-step guide eliminates broken builds and version conflicts in distributed computing.
From Anaconda to Apptainer: Reproducible Python Environments Without Broken HPC Deployments
You’ve spent weeks perfecting a Python environment locally. Dependencies are locked in. Your code runs flawlessly on your laptop. Then you push it to the HPC cluster, and everything breaks.
Missing libraries. Version conflicts. Runtime errors at 3 AM when your job finally reaches the queue. The problem: your local Conda environment doesn’t travel. It’s fragile, system-dependent, and impossible to reproduce across different machines.
Apptainer (formerly Singularity) solves this. By containerizing your Conda environment into a portable .sif file, you get a frozen snapshot of your entire Python stack—OS, libraries, environment variables, everything—that runs identically everywhere.
What This Is
30-second pitch:
Apptainer is a lightweight container runtime designed for HPC environments (unlike Docker, which requires root). This tutorial shows you how to:
- Export your local Conda environment as a reproducible YAML snapshot
- Build an Apptainer container that includes that exact environment
- Bind local data and code folders to avoid copying terabytes of files
- Update containers using sandboxes without rebuilding from scratch
By the end, you’ll have a production-ready .sif file that runs identically on your laptop, university cluster, and cloud HPC systems.
Prerequisites
You’ll need:
- Apptainer 1.0+ (or Singularity 3.8+) — installation guide
- Conda/Miniconda — already installed locally
- Linux system (native, or WSL2 on Windows; on macOS, run Apptainer inside a Linux VM)
- Sudo access (required for building and sandbox operations)
- ~5 GB disk space for the container image
Check your setup:
apptainer --version
conda --version
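If you script this check, a small guard that fails loudly is handy. This is a sketch; `require` is a hypothetical helper name, not part of Apptainer or Conda:

```shell
# Preflight check: report whether each required tool is on PATH.
# `require` is a hypothetical helper, not an Apptainer command.
require() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

require apptainer
require conda
```

Running it on a machine without Apptainer prints `missing: apptainer`, so a job script can abort early instead of failing mid-queue.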
Step 1: Create and Populate Your Conda Environment Locally
Start with a clean environment. This becomes your “golden image.”
conda create -n conda-env python=3.12
conda activate conda-env
pip install pandas seaborn scipy matplotlib
Test that imports work:
python -c "import pandas; import seaborn; print('All packages loaded')"
Step 2: Export Your Conda Environment as YAML
This YAML file is the recipe for your container—it captures every package and version.
conda env export -n conda-env > config.yaml
Verify the file was created:
head -20 config.yaml
You should see every package with a pinned version; pip-installed packages appear under a pip: section (e.g., pandas==2.0.1).
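For orientation, the exported file looks roughly like this (the package names and versions here are illustrative, not what your export will contain):

```yaml
name: conda-env
channels:
  - defaults
dependencies:
  - python=3.12
  - pip
  - pip:
      - pandas==2.2.0
      - seaborn==0.13.2
      - scipy==1.13.0
      - matplotlib==3.9.0
```

A real export also ends with a prefix: line recording your local path; it is ignored on other machines. If build strings cause cross-platform conflicts, conda env export also accepts --no-builds.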
Step 3: Create the Apptainer Recipe File
Create a file called Singularity.def:
Bootstrap: docker
From: continuumio/miniconda3:latest

%files
    config.yaml /opt/config.yaml

%post
    apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        && rm -rf /var/lib/apt/lists/*
    conda env create -f /opt/config.yaml
    conda clean --all -y

%environment
    export PATH=/opt/conda/envs/conda-env/bin:$PATH
    export CONDA_DEFAULT_ENV=conda-env

%runscript
    exec python "$@"
What each section does:
- Bootstrap: docker — pulls a base image with Conda pre-installed
- %files — copies your config.yaml into the container
- %post — installs system dependencies and creates the Conda environment
- %environment — sets PATH so Conda is always active
- %runscript — makes the container executable with Python by default
Step 4: Build the Apptainer Container
sudo apptainer build conda.sif Singularity.def
This takes 3–10 minutes depending on package count. Once complete, you have conda.sif—your portable container.
⚠️ Note:
sudo is required because Apptainer needs root access to build the image. On shared HPC clusters, check with your admin about unprivileged builds (recent Apptainer versions support apptainer build --fakeroot).
Core Workflow
Run Python Scripts Inside the Container
Execute any Python script without needing Conda on the host:
apptainer exec conda.sif python script.py
The container automatically activates your Conda environment.
Bind Local Folders (Critical for Data & Code)
If your code or datasets live outside the container, bind them instead of copying:
apptainer exec --bind /local/path/to/data:/container/data conda.sif python analysis.py
Why binding matters:
- Your 2 TB dataset stays on disk; the container just sees it
- Code updates don’t require rebuilding
- Cluster storage mounts seamlessly
The syntax is --bind source:destination, where source is on your host and destination is inside the container.
Set Working Directory Inside Container
If your script expects files in a specific location, use --pwd:
apptainer exec --bind $(pwd):/work --pwd /work conda.sif python script.py
This maps your current directory to /work inside the container and sets it as the working directory.
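These flags get long in job scripts, so one way to keep them manageable is a small wrapper that assembles the command. This is a sketch: run_in_container and the /data and /work mount points are illustrative names, and DRY_RUN=1 prints the command instead of executing it so you can verify the binds before submitting a job:

```shell
# Hypothetical wrapper around `apptainer exec` with bind mounts.
# DRY_RUN=1 prints the assembled command instead of running it.
DATA_DIR="${DATA_DIR:-/cluster/shared/datasets}"
IMAGE="${IMAGE:-conda.sif}"

run_in_container() {
    local script="$1"
    local cmd=(apptainer exec
               --bind "$DATA_DIR:/data"
               --bind "$PWD:/work"
               --pwd /work
               "$IMAGE" python "$script")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

DRY_RUN=1 run_in_container analysis.py
```

Because the binds come from environment variables, the same wrapper works on your laptop and on the cluster by overriding DATA_DIR.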
Create a Sandbox for Testing Updates
A sandbox is a writable directory version of your container—use it to test changes before rebuilding:
sudo apptainer build --sandbox conda-env-sandbox conda.sif
This creates a folder called conda-env-sandbox/ with your entire container unpacked.
Modify the Sandbox
Open a writable shell:
sudo apptainer shell --writable conda-env-sandbox
Inside the container:
conda activate conda-env
pip install markdown
exit
Test your code with the new package without rebuilding the entire container.
Rebuild Container from Updated Sandbox
Once you’ve tested changes, create a new .sif:
sudo apptainer build conda-updated.sif conda-env-sandbox/
Now conda-updated.sif includes the new packages.
Practical Example
Scenario: You have a data analysis script that reads CSV files from a shared cluster directory, generates plots, and saves results.
Local setup:
- Script: ~/projects/analysis/demo.py
- Data: /cluster/shared/datasets/ (2 TB)
- Output: ~/projects/analysis/results/
Step-by-step:
1. Create and export the Conda environment:
conda create -n data-analysis python=3.10
conda activate data-analysis
pip install pandas matplotlib seaborn scikit-learn
conda env export > config.yaml
2. Build the container:
sudo apptainer build analysis.sif Singularity.def
3. Run on the HPC cluster with bound data:
apptainer exec \
--bind /cluster/shared/datasets:/data \
--bind ~/projects/analysis:/work \
--pwd /work \
analysis.sif python demo.py
What’s happening:
- /cluster/shared/datasets is visible inside the container as /data
- ~/projects/analysis (your code and output) is visible as /work
- The working directory is /work, so relative paths work
- Your script reads from /data and writes results to /work/results/
The script runs without copying any files. If you update demo.py, just re-run the command—no rebuild needed.
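On a SLURM cluster, that command typically lives in a batch script. The sketch below writes one out; the job name, time limit, and memory request are placeholders for whatever your cluster requires:

```shell
# Generate a hypothetical SLURM batch script around the apptainer command.
# The #SBATCH resource values are placeholders, not recommendations.
cat > run_analysis.sbatch <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=analysis
#SBATCH --time=01:00:00
#SBATCH --mem=8G

apptainer exec \
    --bind /cluster/shared/datasets:/data \
    --bind "$HOME/projects/analysis:/work" \
    --pwd /work \
    analysis.sif python demo.py
EOF

echo "wrote run_analysis.sbatch"
```

Submit it with sbatch run_analysis.sbatch; since data and code are bound at run time, editing demo.py never forces a container rebuild.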
Common Issues & Fixes
“working data not found” error
You forgot to bind the folder or didn’t set --pwd. Use:
apptainer exec --bind $(pwd):/work --pwd /work conda.sif python script.py
Permission denied when building
Add sudo:
sudo apptainer build conda.sif Singularity.def
“conda: command not found” inside container
Verify %environment is in your .def file and includes the PATH export. Rebuild:
sudo apptainer build --force conda.sif Singularity.def
Sandbox changes aren’t reflected in the rebuilt container
Always rebuild after making changes:
sudo apptainer build conda-updated.sif conda-env-sandbox/
Container is too large (>5 GB)
Add cleanup to the %post section of your .def file:
conda clean --all -y
apt-get autoremove -y
apt-get clean
Then rebuild from scratch.
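Folded into the recipe, the %post section from Step 3 with cleanup in place looks like this (same commands, just shown in context):

```
%post
    apt-get update && apt-get install -y --no-install-recommends \
        build-essential
    conda env create -f /opt/config.yaml
    conda clean --all -y
    apt-get autoremove -y
    apt-get clean
    rm -rf /var/lib/apt/lists/*
```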
Next Steps
You now have a reproducible, portable container for any Python environment.
What to do next:
- Test on your HPC cluster — copy conda.sif and verify bound paths work
- Version your .def file — keep it in Git alongside your code for reproducibility
- Automate rebuilds — script the export → build → test pipeline if packages change frequently
- Share with collaborators — send them the .sif file; they run your exact environment without setup
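The automation step can be as simple as hashing the exported YAML and rebuilding only when it changes. This is a sketch: rebuild_if_changed is a hypothetical helper, and the build command it gates is the one from Step 4:

```shell
# Rebuild only when config.yaml has actually changed since the last build.
# `rebuild_if_changed` is a hypothetical helper; it prints "rebuild" or
# "unchanged" so a calling script can decide whether to run `apptainer build`.
rebuild_if_changed() {
    local yaml="$1" stamp="$1.sha256"
    if [ -f "$stamp" ] && sha256sum -c "$stamp" >/dev/null 2>&1; then
        echo "unchanged"
    else
        sha256sum "$yaml" > "$stamp"
        echo "rebuild"
    fi
}

# Typical use after `conda env export -n conda-env > config.yaml`:
# [ "$(rebuild_if_changed config.yaml)" = rebuild ] && \
#     sudo apptainer build --force conda.sif Singularity.def
```

The stamp file sits next to the YAML, so committing both to Git also records which environment the last image was built from.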
What’s your biggest pain point with Python environments on HPC clusters right now? Reply in the comments—I’m building follow-ups on GPU support, distributed training, and cluster job submission integration.