Differences

This shows you the differences between two versions of the page.

services:computing:hpc [2022/05/12 11:40]
calucci [Periodic Summary Reports from Slurm] enabled for new accounts
services:computing:hpc [2024/03/06 13:30] (current)
calucci hwperf
Line 33: Line 33:
  
   * **''regular1''** (old nodes) and **''regular2''** (new nodes): max 16 nodes, max 12h
-  * **''wide1''** and **''wide2''**: max 32 nodes, max 8h
+  * **''wide1''** and **''wide2''**: max 32 nodes, max 8h, max 2 concurrently running jobs per user
-  * **''long1''** and **''long2''**: max 8 nodes, max 48h
+  * **''long1''** and **''long2''**: max 8 nodes, max 48h, max 6 concurrently running jobs per user
   * **''gpu1''** and **''gpu2''**: max 4 nodes, max 12h
-  * **''power9''**: max nodes, max 24h
+  * **''power9''**: max nodes, max 24h
  
 <note tip>Please note that hyperthreading is enabled on all nodes (it was disabled on old Ulysses). If you **do not** want to use hyperthreading, the ''%%--hint=nomultithread%%'' option to srun/sbatch will help.
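As an illustration only (the partition, node count, time limit and program name below are placeholders, to be adapted within the limits listed above), a submission script could look like this:

<code bash>
#!/bin/bash
#SBATCH --partition=regular2      # placeholder: any partition listed above
#SBATCH --nodes=1                 # placeholder: stay within the per-partition node limit
#SBATCH --time=02:00:00           # placeholder: stay within the per-partition time limit
#SBATCH --hint=nomultithread      # optional: do not use hyperthreading

srun ./my_program                 # "my_program" is a placeholder for your executable
</code>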
Line 45: Line 45:
 Job scheduling is fair-share based, so the scheduling priority of your jobs depends on the waiting time in the queue AND on the amount of resources consumed by your other jobs. If you have an urgent need to start a **single** job ASAP (e.g. for debugging), you can use the ''fastlane'' QoS that will give your job a substantial priority boost (to prevent abuse, only one job per user can use fastlane at a time, and you will "pay" for the priority boost with a lower priority for your subsequent jobs).
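For example, assuming ''job.sh'' is the batch script you would normally submit with ''sbatch job.sh'', the priority boost can be requested as follows:

<code bash>
# One-off priority boost for a single urgent job (e.g. for debugging).
# Only one fastlane job per user runs at a time; subsequent jobs get a lower priority.
sbatch --qos=fastlane job.sh      # "job.sh" is a placeholder for your batch script
</code>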
  
-You //should// always use the ''%%--mem%%'' or ''%%--mem-per-cpu%%'' slurm options to specify the amount of memory needed by your job. This is especially important if your job doesn't use all available CPUs on a node (40 threads on IBM nodes, 64 on HP), and failing to do so will negatively impact the scheduling performance.
+You //should// always use the ''%%--mem%%'' slurm option to specify the amount of memory needed by your job; ''%%--mem-per-cpu%%'' is also possible, but not recommended due to the scheduler configuration. This is especially important if your job doesn't use all available CPUs on a node (40 threads on IBM nodes, 64 on HP), and failing to do so will negatively impact the scheduling performance.
+
+<note warning>Please note that ''%%--mem=0%%'' (i.e. "all available memory") is **not** recommended since the amount of memory actually available on each node may vary (e.g. in case of hardware failures).</note>
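A minimal sketch of a memory request (the figures below are purely illustrative, not recommendations):

<code bash>
#!/bin/bash
#SBATCH --ntasks=8                # illustrative: a job that does not use all CPUs on a node
#SBATCH --mem=4G                  # illustrative: total memory requested for the job
#SBATCH --time=01:00:00

srun ./my_program                 # "my_program" is a placeholder for your executable
</code>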
  
 <note tip>
Line 52: Line 54:
  
  
-====== Simplest possible job ======
+===== Simplest possible job =====
 This is a single-core job with default time and memory limits (1 hour and 0.5GB)
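A sketch of what such a script could look like (''my_program'' is a placeholder, and the cluster's default partition is assumed to be acceptable):

<code bash>
#!/bin/bash
#SBATCH --ntasks=1                # a single core; default limits (1 hour, 0.5GB) apply

./my_program                      # "my_program" is a placeholder for your executable
</code>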
  
Line 72: Line 74:
  
 <note warning>Please note that MPI jobs are only supported if they allocate all available cores/threads on each node (so 20c/40t on *1 partitions and 32c/64t on *2 partitions). In this context, //not supported// means that jobs using fewer cores/threads than available may or may not work, depending on how cores //not// allocated to your job are used.</note>
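For instance, a sketch of an MPI job that allocates whole nodes on a *2 partition (node count, time limit and program name are placeholders):

<code bash>
#!/bin/bash
#SBATCH --partition=regular2      # a *2 partition: 32 cores / 64 threads per node
#SBATCH --nodes=2                 # placeholder node count
#SBATCH --ntasks-per-node=32      # one MPI rank per physical core, i.e. whole nodes
#SBATCH --time=06:00:00

srun ./my_mpi_program             # "my_mpi_program" is a placeholder for your MPI executable
</code>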
 +
 +==== Access to hardware-based performance counters ====
 +
 +Access to hardware-based performance counters is disabled by default for security reasons. It can be enabled on request, and only for node-exclusive jobs (i.e. for allocations where a single job is allowed to run on each node); in that case, use ''sbatch -C hwperf --exclusive ...''
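For example, assuming ''job.sh'' is your batch script and hwperf access has already been granted for your account:

<code bash>
# Node-exclusive job with access to the hardware performance counters
sbatch -C hwperf --exclusive job.sh   # "job.sh" is a placeholder for your batch script
</code>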
 +
 ===== Filesystem Usage and Backup Policy =====
  
Line 86: Line 93:
 Daily backups are taken of ''/home'', while no backup is available for ''/scratch''. If you need to recover some deleted or damaged file from a backup set, please write to [[helpdesk-hpc@sissa.it]]. Daily backups are kept for one week, a weekly backup is kept for one month, and monthly backups are kept for one year.
  
-Due to their inherent volatility, some directories can be excluded from the backup set. At this time, the list of excluded directories includes only one item, namely ''/home/$USER/.cache''
+Due to their inherent volatility, some directories can be excluded from the backup set. At this time, the list of excluded directories includes only ''/home/$USER/.cache'' and ''/home/$USER/.singularity/cache''
  
 ===== Job E-Mail =====