#############################################
HPC Software Modules (Python, PyTorch, Julia)
#############################################

********
Overview
********

The Department of Statistics HPC provides centrally managed software via a
shared ``/apps`` filesystem using *Environment Modules*. This allows you to:

* Load specific software versions on demand
* Switch between versions easily
* Use consistent, supported environments across all compute nodes

******************
Available Software
******************

You can list the available modules with:

.. code-block:: bash

   module avail

Example output (simplified):

.. code-block:: text

   /apps/modulefiles/Core:
   conda/miniforge3
   julia/1.11.3
   python/3.10.20
   python/3.11.15
   python/3.12.13
   pytorch/cu124-py310
   pytorch/cu124-py311
   aiml/py311

Basic Usage
===========

Before using modules, initialise the module system:

.. code-block:: bash

   source /etc/profile.d/modules.sh

Then:

.. code-block:: bash

   module purge
   module use /apps/modulefiles/Core
   module load <module-name>

Examples
--------

Python
^^^^^^

.. code-block:: bash

   module purge
   module use /apps/modulefiles/Core
   module load python/3.11.15
   python3 --version

PyTorch (GPU)
^^^^^^^^^^^^^

.. code-block:: bash

   module purge
   module use /apps/modulefiles/Core
   module load pytorch/cu124-py311
   python -c "import torch; print(torch.cuda.is_available())"

Julia
^^^^^

.. code-block:: bash

   module purge
   module use /apps/modulefiles/Core
   module load julia/1.11.3
   julia --version

AI/ML Environment
^^^^^^^^^^^^^^^^^

A preconfigured environment with common scientific and machine learning
libraries:

.. code-block:: bash

   module purge
   module use /apps/modulefiles/Core
   module load aiml/py311
   python -c "import numpy, pandas, sklearn, torch; print('ok')"

Using Modules in Slurm Jobs
===========================

In Slurm sbatch jobs, the module system is not always automatically
available, so you **must initialise it explicitly**.

Example Slurm script:
.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=test_python
   #SBATCH --partition=standard-gpu
   #SBATCH --clusters=srf_gpu_01
   #SBATCH --gres=gpu:1
   #SBATCH --cpus-per-task=2
   #SBATCH --mem=4G
   #SBATCH --time=00:05:00
   #SBATCH --output=test_python_%j.out

   set -euo pipefail

   source /etc/profile.d/modules.sh
   module purge
   module use /apps/modulefiles/Core
   module load python/3.10.20

   echo "Python executable:"
   which python3
   echo "Python version:"
   python3 --version

Common Issue: ``module: command not found``
-------------------------------------------

**If you see:**

.. code-block:: text

   module: command not found

the module system has not been initialised. **Fix it by adding:**

.. code-block:: bash

   source /etc/profile.d/modules.sh

to your script.

Conda / Miniforge
-----------------

A shared Conda installation is available:

.. code-block:: bash

   module load conda/miniforge3

You can then create your own environments:

.. code-block:: bash

   # Create a directory in /scratch/fast for envs
   mkdir -p $SCRATCH_FAST/$USER/conda-envs

   # Create the environment
   mamba create -y -p $SCRATCH_FAST/$USER/conda-envs/mytorch python=3.11

   # Activate it
   source /apps/conda/miniforge3/bin/activate $SCRATCH_FAST/$USER/conda-envs/mytorch

   # Install packages
   mamba install -y pytorch torchvision torchaudio pytorch-cuda=12.4 \
       -c pytorch -c nvidia

or install packages with ``pip install <package>``.

.. note::

   The first run can be slow because:

   * packages are downloaded
   * environments are built

   Subsequent runs are fast.

Avoid ``$HOME`` for large environments (``~/.conda/envs``); keep the package
cache on ``/scratch`` instead:

.. code-block:: bash

   export CONDA_PKGS_DIRS=$SCRATCH_FAST/$USER/conda-pkgs

This keeps the package cache off ``$HOME``.

.. warning::

   * Do not modify shared environments under ``/apps/conda/envs``
   * Create your own environments in ``$HOME`` or ``/scratch``
   * ``/scratch`` is not permanent and is wiped every 30 days; recreate
     environments there as needed.

Best Practices
--------------

* Always start with:
  .. code-block:: bash

     module purge

* Always load modules explicitly in scripts
* Use shared modules for stable, supported environments
* Use personal Conda environments for experimentation

Summary
-------

The module system provides:

* Consistent software across the cluster
* Easy version switching
* Preconfigured GPU and AI environments

If unsure, start with:

.. code-block:: bash

   module avail

and load the version you need.
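For scripts that may run on nodes where the module system is absent, the initialisation step can be made defensive: source the init file only when the ``module`` command is missing, and warn rather than abort otherwise. This is a minimal sketch of that guard pattern (the guard itself is our suggestion, not a site requirement; the init path is the one used throughout this page):

```shell
#!/bin/bash
# Initialise Environment Modules only if 'module' is not already defined.
module_ready=no
if command -v module >/dev/null 2>&1; then
    # Already initialised (e.g. interactive login shell)
    module_ready=yes
elif [ -r /etc/profile.d/modules.sh ]; then
    # Standard init script, as used elsewhere on this page
    . /etc/profile.d/modules.sh && module_ready=yes
else
    # Warn instead of failing silently on nodes without the module system
    echo "warning: Environment Modules not available on this node" >&2
fi
echo "module ready: $module_ready"
```

With this guard, a job script using ``set -euo pipefail`` degrades gracefully instead of aborting when ``/etc/profile.d/modules.sh`` does not exist.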