Storage¶
HPC Storage Overview¶
The HPC now uses CephFS for shared storage across all nodes. This provides consistent access to files regardless of which node your job runs on.
Note
SUMMARY:

* Home directory = safe + permanent
* Scratch = big + temporary (30 days)
* /data/tmp/ = fast + disposable (per job)
* /vols/bitbucket/$USER/ → now called /bitbucket/$USER/ (see 7. Additional storage)

* If it matters → home directory
* If it's large → /scratch/bulk or /scratch/fast
* If it's temporary → /data/tmp/
Storage is divided into three main types:
1. Home Directories (Persistent Storage)¶
Location:
/ceph/volumes/homes_<group>/<user>/...
Purpose:
Your primary working environment
Source code, scripts, configurations
Small to moderate datasets
Characteristics:
Safe for important data
Not intended for very large or high-I/O scratch workloads
2. Scratch Storage (Temporary, High-Capacity)¶
Locations:
/scratch/bulk/$USER
/scratch/fast/$USER
Warning
Files on /scratch/ are automatically deleted after 30 days. Make sure you move anything you wish to keep to another location, e.g. your home directory or project directory.
Purpose:
Large datasets
Intermediate files
Slurm job working directories
Characteristics:
High capacity
Shared across nodes (CephFS-backed)
Files are automatically deleted after 30 days
Guidance:
Always copy important results back to your home directory
Do not rely on scratch for long-term storage
3. Node-Local Temporary Storage (Ephemeral)¶
Location:
/data/tmp/$SLURM_JOB_ID (rolling out soon)
Warning
Files on /data/tmp/ are automatically deleted when your Slurm job ends. Include a file retrieval/sync step in your Slurm script so that anything you wish to keep is copied to another location, e.g. your home directory or project directory, before the job finishes.
Purpose:
High-performance temporary I/O during jobs
Characteristics:
Local to each compute node; not shared between nodes
Fast (no network overhead)
Deleted automatically when the Slurm job completes
Guidance:
Ideal for temporary files during jobs
Never store important data here
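The stage-in/compute/stage-out pattern above can be sketched as a job-script fragment. Since /data/tmp is still rolling out, the path is checked before use, and the input/output file names are hypothetical:

```shell
#!/bin/bash
# Sketch only: assumes /data/tmp/$SLURM_JOB_ID exists inside a job
# (this feature is still rolling out); file names are hypothetical.
JOB_TMP="/data/tmp/${SLURM_JOB_ID:-demo}"

if [ -d "$JOB_TMP" ]; then
  # Stage input from shared storage onto fast node-local disk
  cp "$HOME/data/input.dat" "$JOB_TMP/"

  # ... compute steps reading/writing under "$JOB_TMP" ...

  # Copy results back BEFORE the job ends; $JOB_TMP is removed
  # automatically when the job completes.
  mkdir -p "$HOME/results"
  cp "$JOB_TMP/output.dat" "$HOME/results/"
fi
```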
4. What to Avoid¶
Do not use:
/data/localhost/not-backed-up/$USER
/data/localhost/not-backed-up/scratch/$USER
These locations:
Are not backed up
Will be wiped and decommissioned in the coming weeks.
5. Checking storage quota and usage¶
Q: How much is my storage quota?
$ cephquota
Q: How much storage am I using?
$ cephdu
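cephquota and cephdu are the quota-aware tools for this system. For a quick, rough per-directory breakdown, standard utilities also work:

```shell
# Rough usage breakdown with standard tools (cephdu remains authoritative):
du -sh "$HOME"                             # total size of your home directory
du -sh "$HOME"/*/ 2>/dev/null | sort -h    # subdirectories, largest last
```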
6. Recommended Slurm workflow¶
#!/bin/bash
#SBATCH --job-name=<job-name>
#SBATCH --output=<output-file>
#SBATCH --time=<time-limit>
# ... further #SBATCH directives as needed ...

set -euo pipefail

# Per-job working areas on shared scratch (auto-deleted after 30 days)
export SCRATCH_BULK="/scratch/bulk/$USER/$SLURM_JOB_ID"
export SCRATCH_FAST="/scratch/fast/$USER/$SLURM_JOB_ID"
export TMPDIR="$SCRATCH_FAST/tmp"
mkdir -p "$SCRATCH_BULK" "$TMPDIR"
7. Additional storage¶
/bitbucket/$USER/
Warning
/bitbucket/$USER/ is not backed up. Please do not use this storage area as a backup.
While use of /vols/bitbucket/ for HPC workloads was previously discouraged, you can now use /bitbucket/ for HPC workloads. Store data only in your own subfolder, named after your Department of Statistics username; any other files may be deleted or moved without further warning.
The default quota is 500 GB per user directory.
8. Storage areas to decommission¶
Warning
The following storage areas are being wiped and decommissioned. If you wish to keep any of your data, please ensure you move them to another location (see above). If you need any help with this, please contact Stats IT.
* /data/localhost/$USER/
* /data/localhost/not-backed-up/$USER/
* /data/localhost/not-backed-up/scratch/$USER/