Storage

HPC Storage Overview

The HPC now uses CephFS for shared storage across all nodes. This provides consistent access to files regardless of which node your job runs on.

Note

SUMMARY:

  • Home Directory = safe + permanent

  • Scratch = big + temporary (30 days)

  • /data/tmp/ = fast + disposable (per job)

  • /bitbucket/$USER/ = large + persistent, not backed up (formerly /vols/bitbucket/$USER/)

  • /opig-shared/ = for OPIG users only

  • If it matters -> Home Directory

  • If it’s large -> /scratch/bulk or /scratch/fast

  • If it’s temporary -> /data/tmp/

1. Storage Quotas

The following storage quotas apply per user:

Storage area     Location                                    Quota     Notes
Home Directory   /ceph/volumes/<CEPHGROUP>/$USER/<CEPHID>/   20 GB     persistent
Scratch (bulk)   /scratch/bulk/                              500 GB    temporary
Scratch (fast)   /scratch/fast/                              100 GB    temporary
Bitbucket        /bitbucket/                                 500 GB    persistent
OPIG Projects    /opig-shared/                                         persistent (OPIG users only)

2. Home Directories (Persistent Storage)

Location:

/ceph/volumes/homes_<group>/<user>/...

Purpose:

  • Your primary working environment

  • Source code, scripts, configurations

  • Small to moderate datasets

Characteristics:

  • Safe for important data

  • Shared across HPC nodes

  • Backed by CephFS

  • Not intended for high-I/O scratch workloads

3. Scratch Storage (Temporary, High-Capacity)

Locations:

/scratch/bulk/$USER
/scratch/fast/$USER

Inside Slurm jobs, job-specific scratch directories are automatically created for you.

Example:

/scratch/fast/$USER/slurm-jobs/$SLURM_JOB_ID
/scratch/bulk/$USER/slurm-jobs/$SLURM_JOB_ID

Warning

Files on /scratch/ are automatically deleted after 30 days.

Please ensure you move anything you wish to keep back to persistent storage.
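
To see which files may be getting close to deletion, something like the following can help. It is only a sketch: it assumes the 30-day limit is counted from each file's last modification time, which this page does not state, and old_scratch_files is a hypothetical helper name rather than a site-provided command.

```shell
# Hypothetical helper: list files under a directory that were last modified
# more than a given number of days ago (default 21).
# ASSUMPTION: the cleanup is based on modification time; the real policy may differ.
old_scratch_files() {
    local dir="$1"
    local days="${2:-21}"
    # find rounds file ages down to whole days, so "+21" means 22 days or more.
    find "$dir" -type f -mtime "+${days}" -print
}

# Example: files in bulk scratch not modified for over three weeks.
# old_scratch_files "/scratch/bulk/$USER" 21
```

Anything the helper lists that you still need can then be copied to your home directory or another persistent area.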

Purpose:

  • Large datasets

  • Intermediate files

  • Slurm job working directories

  • Package caches and temporary software environments

Characteristics:

  • High capacity

  • Shared across nodes (CephFS-backed)

  • /scratch/fast uses SSD-backed storage

  • /scratch/bulk uses HDD-backed storage

  • Automatically wiped after 30 days

Guidance:

  • Use $SCRATCH_FAST for high-I/O temporary workloads

  • Use $SCRATCH_BULK for larger temporary datasets

  • Always copy important results back to persistent storage
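
The guidance above can be sketched as a small Slurm batch script. This is an illustration under stated assumptions, not the site's recommended template: run_in_scratch is a hypothetical helper, the job name and time limit are placeholders, the job directory is assumed to follow the example path shown earlier, and $HOME/results is an assumed destination in persistent storage.

```shell
#!/bin/bash
#SBATCH --job-name=scratch-example   # placeholder job name
#SBATCH --time=00:10:00              # placeholder time limit

# Hypothetical helper: run a command inside a working directory, then copy
# everything in that directory to a persistent destination.
run_in_scratch() {
    local workdir="$1" dest="$2"
    shift 2
    mkdir -p "$workdir" "$dest"
    # Run in a subshell so the caller's current directory is left unchanged.
    ( cd "$workdir" && "$@" ) || return 1
    # Copy the directory contents (including dotfiles) back to persistent storage.
    cp -a "$workdir"/. "$dest"/
}

# Only act when running under Slurm.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    # ASSUMPTION: job directory layout as in the example paths above;
    # $HOME/results is an illustrative destination.
    run_in_scratch "/scratch/fast/$USER/slurm-jobs/$SLURM_JOB_ID" \
                   "$HOME/results/$SLURM_JOB_ID" \
                   sh -c 'echo "replace with your workload" > output.txt'
fi
```

Swapping /scratch/fast for /scratch/bulk in the path gives the same pattern for larger, less I/O-intensive datasets.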

4. Node-Local Temporary Storage (Ephemeral)

Location:

/data/tmp/$SLURM_JOB_ID

Warning

Files on /data/tmp/ are automatically deleted after your Slurm job ends.

Please ensure you copy anything you wish to keep back to persistent storage.

Purpose:

  • High-performance temporary I/O during jobs

  • Node-local scratch space

Characteristics:

  • Local to the compute node

  • Not shared between nodes

  • No network overhead

  • Automatically deleted when the job finishes

Guidance:

  • Ideal for temporary files during jobs

  • Never store important data here
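
Because node-local files disappear when the job ends, it can help to copy results back automatically, even if the workload fails. The sketch below does this with a shell trap. with_local_tmp is a hypothetical helper, the results subdirectory and $HOME/results destination are assumptions, and the script creates /data/tmp/$SLURM_JOB_ID itself in case it does not already exist.

```shell
#!/bin/bash
# Hypothetical helper: run a command in a node-local directory and copy its
# "results" subdirectory to a persistent location when the command exits.
with_local_tmp() {
    local local_dir="$1" keep_dir="$2"
    shift 2
    (
        mkdir -p "$local_dir/results" "$keep_dir" || exit 1
        # The EXIT trap runs when this subshell ends, whether or not the
        # command succeeded, so partial results are also copied back.
        trap 'cp -a "$local_dir/results/." "$keep_dir/"' EXIT
        cd "$local_dir" && "$@"
    )
}

# Only act when running under Slurm.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    # ASSUMPTION: $HOME/results is an illustrative persistent destination.
    with_local_tmp "/data/tmp/$SLURM_JOB_ID" "$HOME/results/$SLURM_JOB_ID" \
        sh -c 'echo "replace with your workload" > results/output.txt'
fi
```

The helper returns the exit status of the command, so a failed workload is still reported as a failure after its results have been copied.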

5. What to Avoid

Do not use:

/data/localhost/not-backed-up/$USER
/data/localhost/not-backed-up/scratch/$USER

These locations:

  • Are not backed up

  • Are being decommissioned

  • May be wiped without notice

6. Checking Storage Quota and Usage

Check your quota:

cephquota

Check your storage usage:

cephdu
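
cephquota and cephdu are the site tools for this. If you also want to see which subdirectories take up the most space, standard GNU du and sort can be combined as in the sketch below. largest_dirs is a hypothetical helper name, and walking a large tree with du can be slow on a shared filesystem.

```shell
# Hypothetical helper: show the N largest directories (sizes in KiB) one level
# below a directory. The first output line is the total for the directory itself.
largest_dirs() {
    local dir="$1"
    local count="${2:-10}"
    # du -k reports sizes in KiB; sort -nr puts the largest first.
    du -k --max-depth=1 "$dir" | sort -nr | head -n "$count"
}

# Example: the largest directories in your bulk scratch area.
# largest_dirs "/scratch/bulk/$USER" 10
```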

8. Additional Storage

/bitbucket/$USER/

Warning

/bitbucket/$USER/ is not backed up. Please do not use this area as your only copy of important data.

You may use /bitbucket/$USER/ for:

  • Larger persistent datasets

  • Persistent Conda environments

  • Research project storage

The default quota is 500 GB per user directory.
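
One way to keep Conda environments in this area is to point Conda at it through the CONDA_ENVS_PATH environment variable, which Conda reads as its list of environment directories. The sketch below is an illustration only: the conda/envs subdirectory and the environment name are assumptions, not site conventions.

```shell
# ASSUMPTION: "conda/envs" is an illustrative subdirectory name.
export CONDA_ENVS_PATH="/bitbucket/$USER/conda/envs"

# With this set, new named environments should be created under that
# directory, for example ("myproject" is a hypothetical name):
# conda create --name myproject python=3.11
```

Adding the export line to your shell startup file would make the setting apply to every session.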

9. Storage Areas Being Decommissioned

Warning

The following storage areas are being decommissioned.

Please move any data you wish to keep to another location.

/data/localhost/$USER/
/data/localhost/not-backed-up/$USER/
/data/localhost/not-backed-up/scratch/$USER/

If you need help migrating data, please contact Stats IT.
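
For moving data yourself, a minimal rsync sketch is shown below. migrate_dir is a hypothetical helper and the destination path in the example is an assumption; choose whichever persistent area suits the data, keeping in mind that /bitbucket/ is not backed up.

```shell
# Hypothetical helper: copy a directory tree to a new location with rsync.
# Pass "--dry-run" as a third argument to list what would be transferred
# without copying any files.
migrate_dir() {
    local src="$1" dest="$2" mode="${3:-}"
    mkdir -p "$dest"
    # -a preserves permissions and timestamps, -v lists each file; the trailing
    # slash on the source copies its contents rather than the directory itself.
    rsync -av ${mode:+"$mode"} "$src"/ "$dest"/
}

# Example: preview first, then copy ("migrated" is an illustrative name).
# migrate_dir "/data/localhost/$USER" "/bitbucket/$USER/migrated" --dry-run
# migrate_dir "/data/localhost/$USER" "/bitbucket/$USER/migrated"
```

Running the same command again after an interrupted transfer only copies files that are missing or have changed.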