Storage¶
HPC Storage Overview¶
The HPC now uses CephFS for shared storage across all nodes. This provides consistent access to files regardless of which node your job runs on.
Note
SUMMARY:
Home Directory = safe + permanent
Scratch = big + temporary (30 days)
/data/tmp/ = fast + disposable (per job)
/bitbucket/$USER/ –> formerly called /vols/bitbucket/$USER/
/opig-shared/ = for OPIG users only
If it matters –> Home Directory
If it’s large –> /scratch/bulk or /scratch/fast
If it’s temporary –> /data/tmp/
1. Storage Quotas¶
The following storage quotas apply per user:
| Storage area | Location | Quota | Notes |
|---|---|---|---|
| Home Directory | /ceph/volumes/<CEPHGROUP>/$USER/<CEPHID>/ | 20 GB | persistent |
| Scratch (bulk) | /scratch/bulk/ | 500 GB | temporary |
| Scratch (fast) | /scratch/fast/ | 100 GB | temporary |
| Bitbucket | /bitbucket/ | 500 GB | persistent |
| OPIG Projects | /opig-shared/ | | persistent (OPIG users only) |
2. Home Directories (Persistent Storage)¶
Location:
/ceph/volumes/homes_<group>/<user>/...
Purpose:
Your primary working environment
Source code, scripts, configurations
Small to moderate datasets
Characteristics:
Safe for important data
Shared across HPC nodes
Backed by CephFS
Not intended for high-I/O scratch workloads
3. Scratch Storage (Temporary, High-Capacity)¶
Locations:
/scratch/bulk/$USER
/scratch/fast/$USER
Inside Slurm jobs, job-specific scratch directories are automatically created for you.
Example:
/scratch/fast/$USER/slurm-jobs/$SLURM_JOB_ID
/scratch/bulk/$USER/slurm-jobs/$SLURM_JOB_ID
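A minimal sketch of how a job script might build the path to its per-job directory, assuming the layout shown in the example above. The helper name `job_scratch_dir` is hypothetical and not part of the cluster's tooling.

```shell
#!/bin/bash
# Hypothetical helper: build the per-job scratch path from the layout above.
# Argument: tier ("fast" or "bulk"). Reads USER and SLURM_JOB_ID.
job_scratch_dir() {
    local tier="$1"
    case "$tier" in
        fast|bulk) ;;
        *) echo "unknown scratch tier: $tier" >&2; return 2 ;;
    esac
    # SLURM_JOB_ID is only set inside a running Slurm job.
    if [[ -z "${SLURM_JOB_ID:-}" ]]; then
        echo "SLURM_JOB_ID is not set; not inside a Slurm job" >&2
        return 1
    fi
    printf '/scratch/%s/%s/slurm-jobs/%s\n' "$tier" "$USER" "$SLURM_JOB_ID"
}
```

Inside a job, `cd "$(job_scratch_dir fast)"` would then move into the fast per-job directory.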
Warning
Files on /scratch/ are automatically deleted after 30 days.
Please ensure you move anything you wish to keep back to persistent storage.
Purpose:
Large datasets
Intermediate files
Slurm job working directories
Package caches and temporary software environments
Characteristics:
High capacity
Shared across nodes (CephFS-backed)
/scratch/fast uses SSD-backed storage
/scratch/bulk uses HDD-backed storage
Automatically wiped after 30 days
Guidance:
Use $SCRATCH_FAST for high-I/O temporary workloads
Use $SCRATCH_BULK for larger temporary datasets
Always copy important results back to persistent storage
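This page does not say whether the 30-day limit is measured from creation, modification, or last access. Assuming modification time, the following sketch lists files that are getting close to the limit so they can be copied out in time. The helper name `list_old_files` and the 23-day default are illustrative choices.

```shell
#!/bin/bash
# List regular files under a directory that have not been modified for
# more than a given number of days.
# ASSUMPTION: the purge is keyed on modification time; this is not stated.
list_old_files() {
    local dir="$1" days="${2:-23}"
    # -mtime +N matches files whose age, rounded down to whole days,
    # is greater than N.
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, `list_old_files "/scratch/bulk/$USER" 23` would print files last modified at least 24 days ago.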
4. Node-Local Temporary Storage (Ephemeral)¶
Location:
/data/tmp/$SLURM_JOB_ID
Warning
Files on /data/tmp/ are automatically deleted after your Slurm job ends.
Please ensure you copy anything you wish to keep back to persistent storage.
Purpose:
High-performance temporary I/O during jobs
Node-local scratch space
Characteristics:
Local to the compute node
Not shared between nodes
No network overhead
Automatically deleted when the job finishes
Guidance:
Ideal for temporary files during jobs
Never store important data here
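Because node-local files disappear when the job ends, one common pattern is to register a copy-back step that runs when the script exits. The sketch below assumes the `/data/tmp/$SLURM_JOB_ID` location given above. The `stage_out` helper and the destination path are hypothetical.

```shell
#!/bin/bash
# Copy the contents of a node-local work directory to a persistent
# destination. Both paths are supplied by the caller.
stage_out() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    # Trailing slash on "$src/" copies the directory's contents.
    rsync -a "$src/" "$dest/"
}

# Only act inside a Slurm job, where SLURM_JOB_ID is set.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    work="/data/tmp/$SLURM_JOB_ID"
    # Hypothetical destination; use any persistent location you own.
    dest="/bitbucket/$USER/results/$SLURM_JOB_ID"
    # Run stage_out when the script exits, whether it succeeded or failed.
    trap 'stage_out "$work" "$dest"' EXIT
    cd "$work"
    # ... run the computation here, writing outputs into "$work" ...
fi
```

An EXIT trap does not run if the process is killed outright (for example with SIGKILL), so long jobs may still want to copy intermediate results out periodically.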
5. What to Avoid¶
Do not use:
/data/localhost/not-backed-up/$USER
/data/localhost/not-backed-up/scratch/$USER
These locations:
Are not backed up
Are being decommissioned
May be wiped without notice
6. Checking Storage Quota and Usage¶
Check your quota:
cephquota
Check your storage usage:
cephdu
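`cephquota` and `cephdu` are site-provided commands, and their output format is not described here. To see which subdirectories account for the usage they report, standard GNU `du` and `sort` can be combined as in this sketch. The helper name `usage_by_subdir` is hypothetical.

```shell
#!/bin/bash
# Print the size of a directory and each of its immediate subdirectories,
# smallest first, in human-readable units (requires GNU du and sort).
usage_by_subdir() {
    local dir="${1:-$HOME}"
    # --max-depth=1 limits the report to one level below "$dir";
    # sort -h orders human-readable sizes such as 4.0K, 2.0M, 1.5G.
    du -h --max-depth=1 "$dir" 2>/dev/null | sort -h
}
```

Walking a large directory tree with `du` can take a while on network storage, so it is best pointed at one area at a time, for example `usage_by_subdir "/bitbucket/$USER"`.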
7. Recommended Slurm Workflow¶
For best performance and reliability:
Keep source code, scripts, and small persistent files in $HOME
Use $SCRATCH_FAST for temporary high-performance job data
Use $SCRATCH_BULK for larger temporary datasets
Use $TMPDIR for temporary files generated during jobs
Copy important results back to persistent storage before jobs finish
The following environment variables are automatically available inside Slurm jobs:
| Variable | Purpose |
|---|---|
| $SCRATCH_FAST | Fast SSD-backed temporary scratch space |
| $SCRATCH_BULK | Large-capacity temporary scratch space |
| $TMPDIR | Per-job temporary directory inside |
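Since these variables exist only inside Slurm jobs, a script that is accidentally run elsewhere would see them empty. A small guard at the top of the script can catch that early. The helper name `require_vars` is hypothetical.

```shell
#!/bin/bash
# Return non-zero, naming the first missing variable, unless every named
# variable is set to a non-empty value.
require_vars() {
    local name
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable
        # whose name is stored in $name.
        if [[ -z "${!name:-}" ]]; then
            echo "required variable $name is not set" >&2
            return 1
        fi
    done
}
```

A job script could then start with `require_vars SCRATCH_FAST SCRATCH_BULK TMPDIR || exit 1` before touching any data.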
Example Slurm workflow:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
set -euo pipefail
echo "Running on $(hostname)"
echo "SCRATCH_FAST=$SCRATCH_FAST"
echo "SCRATCH_BULK=$SCRATCH_BULK"
echo "TMPDIR=$TMPDIR"
# Copy input data into fast scratch
rsync -av input.dat "$SCRATCH_FAST/"
# Move into scratch working directory
cd "$SCRATCH_FAST"
# Run computation
python3 ~/myproject/run_analysis.py
# Copy results back to persistent storage
mkdir -p "/bitbucket/$USER/results/myjob"
rsync -av output/ "/bitbucket/$USER/results/myjob/"
Warning
Scratch storage is temporary.
Files under /scratch are automatically deleted after 30 days.
Avoid using cp -r for large data movement. Prefer rsync: if a transfer is interrupted, re-running the same command skips files that have already been copied.
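To illustrate the rsync recommendation, here is a sketch of a copy followed by a verification pass. The helper name `copy_verified` is hypothetical; the flags used are standard rsync options.

```shell
#!/bin/bash
# Copy the contents of src into dest, then confirm that nothing differs.
copy_verified() {
    local src="$1" dest="$2"
    mkdir -p "$dest"
    # Trailing slash on "$src/" means "the contents of src", so files land
    # directly in dest rather than in dest/<basename of src>.
    # --partial keeps partly transferred files so a re-run can resume them.
    rsync -a --partial "$src/" "$dest/" || return 1
    # Second pass: -n (dry run) with -c (compare checksums) and -i
    # (itemize changes) prints one line per item that still differs.
    local remaining
    remaining="$(rsync -anci "$src/" "$dest/")" || return 1
    [[ -z "$remaining" ]]
}
```

The checksum pass reads every file on both sides, so it is slower than the copy itself; for very large datasets it may be reasonable to drop `-c` and rely on rsync's default size and timestamp comparison.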
8. Additional Storage¶
/bitbucket/$USER/
Warning
/bitbucket/$USER/ is not backed up. Please do not use this area as your only copy of important data.
You may use /bitbucket/$USER/ for:
Larger persistent datasets
Persistent Conda environments
Research project storage
The default quota is 500 GB per user directory.
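As an illustration of keeping a persistent Conda environment under /bitbucket, the sketch below creates one at an explicit path. `--prefix` is a standard `conda create` option. The `conda-envs` directory, the environment name `myenv`, and the Python version are placeholders, not site conventions.

```shell
#!/bin/bash
# Hypothetical layout: one directory per environment under
# /bitbucket/$USER/conda-envs/. The base directory can be overridden.
conda_env_prefix() {
    local name="$1" base="${2:-/bitbucket/$USER/conda-envs}"
    printf '%s/%s\n' "$base" "$name"
}

# Create the environment only where conda and the bitbucket area exist.
if command -v conda >/dev/null 2>&1 && [[ -d "/bitbucket/$USER" ]]; then
    prefix="$(conda_env_prefix myenv)"
    # --prefix installs the environment at an explicit path instead of
    # conda's default envs directory.
    conda create --yes --prefix "$prefix" python=3.11
    echo "Activate with: conda activate $prefix"
fi
```

Environments created this way are activated by path rather than by name.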
9. Storage Areas Being Decommissioned¶
Warning
The following storage areas are being decommissioned.
Please move any data you wish to keep to another location.
/data/localhost/$USER/
/data/localhost/not-backed-up/$USER/
/data/localhost/not-backed-up/scratch/$USER/
If you need help migrating data, please contact Stats IT.
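Before migrating, it can help to see which of the legacy areas listed above still hold data. The sketch below checks the three paths for one user. The helper name `report_legacy_dirs` is hypothetical, and the root directory is a parameter so the function can be tried against a test directory.

```shell
#!/bin/bash
# Print the size of each legacy area that exists for a user, and note
# the ones that are absent. The root defaults to /data/localhost.
report_legacy_dirs() {
    local user="$1" root="${2:-/data/localhost}"
    local dir
    for dir in "$root/$user" \
               "$root/not-backed-up/$user" \
               "$root/not-backed-up/scratch/$user"; do
        if [[ -d "$dir" ]]; then
            # -s gives one summary line per directory, -h human-readable.
            du -sh "$dir"
        else
            echo "absent: $dir"
        fi
    done
}
```

Running `report_legacy_dirs "$USER"` prints one line per area. The path name suggests these areas may be local to each machine; if that is the case, the check would need repeating on every machine where you stored data.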