Storage

HPC Storage Overview

The HPC now uses CephFS for shared storage across all nodes. This provides consistent access to files regardless of which node your job runs on.

Note

SUMMARY:

  • Home Directory = safe + permanent

  • Scratch = big + temporary (30 days)

  • /data/tmp/ = fast + disposable (per job)

  • /vols/bitbucket/$USER/ → now called /bitbucket/$USER/ (see 7. Additional storage)

  • If it matters → Home Directory

  • If it’s large → /scratch/bulk or /scratch/fast

  • If it’s temporary → /data/tmp/

Storage is divided into three main types:

1. Home Directories (Persistent Storage)

Location:

/ceph/volumes/homes_<group>/<user>/...

Purpose:

  • Your primary working environment

  • Source code, scripts, configurations

  • Small to moderate datasets

Characteristics:

  • Safe for important data

  • Not intended for very large or high-I/O scratch workloads

2. Scratch Storage (Temporary, High-Capacity)

Locations:

/scratch/bulk/$USER
/scratch/fast/$USER

Warning

Files on /scratch/ are automatically deleted after 30 days. Please ensure you move anything you wish to keep to another location, e.g. your home directory or project directory.

Purpose:

  • Large datasets

  • Intermediate files

  • Slurm job working directories

Characteristics:

  • High capacity

  • Shared across nodes (CephFS-backed)

  • Files are automatically deleted after 30 days

Guidance:

  • Always copy important results back to your home directory

  • Do not rely on scratch for long-term storage

3. Node-Local Temporary Storage (Ephemeral)

Location:

/data/tmp/$SLURM_JOB_ID (rolling out soon)

Warning

Files on /data/tmp/ are automatically deleted when your Slurm job ends. Please include a file retrieval/sync step in your Slurm script, so that anything you wish to keep is copied to another location, e.g. your home directory or project directory.

Purpose:

  • High-performance temporary I/O during jobs

Characteristics:

  • Local to each compute node; not shared across nodes

  • Fast (no network overhead)

  • Deleted automatically when the Slurm job completes

Guidance:

  • Ideal for temporary files during jobs

  • Never store important data here

4. What to Avoid

Do not use:

/data/localhost/not-backed-up/$USER
/data/localhost/not-backed-up/scratch/$USER

These locations:

  • Are not backed up

  • Will be wiped and decommissioned in the coming weeks

5. Checking storage quota and usage

Q: How much is my storage quota?

$ cephquota

Q: How much storage am I using?

$ cephdu
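cephquota and cephdu are the site-specific commands; as a hedged aside, standard tools also give quick spot checks of individual directories (the demo tree below is illustrative — on the cluster you would point du at e.g. ~/myproject or /scratch/bulk/$USER):

```shell
#!/bin/bash
# Spot-check sizes with standard tools. DIR is a throwaway demo tree
# standing in for a real directory such as ~/myproject.
DIR=$(mktemp -d)
dd if=/dev/zero of="$DIR/blob" bs=1024 count=64 status=none

du -sh "$DIR"                 # human-readable total for one tree
du -sb "$DIR/blob" | cut -f1  # exact apparent size in bytes
```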

7. Additional storage

/bitbucket/$USER/

Warning

/bitbucket/$USER/ is not backed up. Please do not use this storage area as a backup.

While the use of /vols/bitbucket/ for HPC workloads was previously discouraged, you can now use /bitbucket/ for HPC workloads. Please store data only in your own subfolder, named after your Department of Statistics username. Any other files may be deleted or moved without further warning.

The default quota is 500 GB per user directory.
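Setting up your subfolder might look like the sketch below. ROOT defaults to a throwaway directory so the snippet can be tried anywhere; on the cluster it would be /bitbucket, and the chmod step is an optional suggestion rather than a site requirement:

```shell
#!/bin/bash
# Create a personal subfolder named after your username. ROOT stands in
# for /bitbucket; set ROOT=/bitbucket when running on the cluster.
ROOT="${ROOT:-$(mktemp -d)}"
ME="${USER:-$(id -un)}"       # your Department of Statistics username

mkdir -p "$ROOT/$ME"
chmod 700 "$ROOT/$ME"         # optional: keep the folder private
```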

8. Storage areas to decommission

Warning

The following storage areas are being wiped and decommissioned. If you wish to keep any of your data, please ensure you move it to another location (see above). If you need any help with this, please contact Stats IT.

* /data/localhost/$USER/
* /data/localhost/not-backed-up/$USER/
* /data/localhost/not-backed-up/scratch/$USER/