Differences

This shows you the differences between two versions of the page.

services:computing:hpc [2021/05/24 15:06]
calucci filesystems & backups
services:computing:hpc [2022/02/11 15:20]
calucci email reports
Line 7: Line 7:
  
  
-Ulysses v2 can be accessed via the login nodes at ''frontend1.hpc.sissa.it'' or ''frontend2.hpc.sissa.it'' from SISSA network or from SISSA [[:vpn|VPN]]. More access options might be made available in due time.
+SSH access to Ulysses v2 is provided via the login nodes at ''frontend1.hpc.sissa.it'' or ''frontend2.hpc.sissa.it'' from SISSA network or from SISSA [[:vpn|VPN]]. More access options might be made available in due time.
  
 ===== Hardware and Software =====
Line 22: Line 22:
  
 The software tree is the same you have on Linux workstations, with the same [[services:modules|Lmod modules]] system (with the only exception of desktop-oriented software packages).
 +
 +A small number of POWER9-based nodes are also available (2 sockets, 16 cores, 4 threads per core; 256 GB RAM) with 2 or 4 Tesla V100 GPUs. Please note that you cannot run x86 code on POWER9. For an interactive shell on a P9 machine, please type ''p9login'' on ''frontend1'' or ''frontend2''.
  
 ===== Queue System =====
Line 34: Line 36:
   * **''long1''** and **''long2''**: max 8 nodes, max 48h
   * **''gpu1''** and **''gpu2''**: max 4 nodes, max 12h
 +  * **''​power9''​**:​ max 2 nodes, max 24h
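As an illustration of the limits listed above, here is a minimal sketch of a job script for the ''power9'' partition. It requests one node for the maximum 24 hours and checks the CPU architecture before choosing a binary, since x86 code cannot run on POWER9. The job name and the two binary names are hypothetical, and the script assumes that ''uname -m'' reports a name beginning with ''ppc64'' on the POWER9 nodes.

```shell
#!/bin/bash
#SBATCH --job-name=p9-demo
#SBATCH --partition=power9
#SBATCH --nodes=1
#SBATCH --time=24:00:00

# Pick a build that matches the architecture of the node we landed on.
# Both binary names are hypothetical placeholders.
arch=$(uname -m)
case "$arch" in
    x86_64) binary=./my_program.x86_64 ;;
    ppc64*) binary=./my_program.ppc64le ;;
    *)      binary="" ;;
esac

if [ -n "$binary" ]; then
    # Replace this echo with the actual launch command for your program.
    echo "architecture ${arch}: would run ${binary}"
else
    echo "architecture ${arch}: no matching build" >&2
fi
```

The script would be submitted with ''sbatch''; the architecture check only prints what it would run, so it is safe to try on a login node first.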
  
 <note tip>Please note that hyperthreading is enabled on all nodes (it was disabled on old Ulysses). If you **do not** want to use hyperthreading, the ''%%--hint=nomultithread%%'' option to srun/sbatch will help.
Line 68: Line 71:
 </code>
  
 +<note warning>Please note that MPI jobs are only supported if they allocate all available cores/threads on each node (so 20c/40t on *1 partitions and 32c/64t on *2 partitions). In this context, //not supported// means that jobs using fewer cores/threads than available may or may not work, depending on how the cores //not// allocated to your job are used.</note>
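The full-node rule in the warning above can be sketched as a small shell helper that returns the number of hardware threads to request per node. The function name ''threads_per_node'' and the job script name ''my_mpi_job.sh'' are hypothetical. Requesting one MPI task per hardware thread is only one possible way to fill a node; fewer tasks with more CPUs per task may suit your code better.

```shell
#!/bin/bash
# Threads to request per node so that an MPI job fills the whole node.
# Values come from the warning above: 20c/40t on *1 partitions, 32c/64t on *2.
threads_per_node() {
    case "$1" in
        *1) echo 40 ;;
        *2) echo 64 ;;
        *)  echo "unknown partition: $1" >&2; return 1 ;;
    esac
}

partition=long2
nodes=2
ntasks=$(threads_per_node "$partition")

# Print (rather than run) the resulting submission command.
# my_mpi_job.sh is a hypothetical job script name.
echo "sbatch --partition=$partition --nodes=$nodes --ntasks-per-node=$ntasks my_mpi_job.sh"
```

The helper deliberately rejects partitions whose names do not end in 1 or 2 (such as ''power9''), because the warning gives no figure for them.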
 ===== Filesystem Usage and Backup Policy =====
  
Line 83: Line 87:
  
 Due to their inherent volatility, some directories can be excluded from the backup set. At this time, the list of excluded directories includes only one item, namely ''/home/$USER/.cache''
 +
 +===== Job E-Mail =====
 +You can enable e-mail notifications at various stages of each job's life with the ''%%--mail-type=TYPE%%'' option, where ''TYPE'' can be a comma-separated list such as ''BEGIN,END,FAIL'' (more details are available in ''man sbatch''). By default, notifications are sent to your SISSA e-mail address, but you can select a different address with ''%%--mail-user%%''. The **end-job** notification includes a summary of consumed resources (CPU time and memory) as absolute values and as a percentage of requested resources. Please note that memory usage is sampled at 30-second intervals, so if your job is terminated by an out-of-memory condition arising from a very large failed allocation, the reported value can be grossly underestimated.
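As a minimal sketch of the options described above, the job script below asks for notifications at job start, end and failure. The job name, the partition choice, the time limit and the e-mail address are illustrative assumptions; drop the ''%%--mail-user%%'' line to keep the default recipient.

```shell
#!/bin/bash
#SBATCH --job-name=mail-demo
#SBATCH --partition=long1
#SBATCH --nodes=1
#SBATCH --time=00:10:00
# Send a message when the job starts, ends, or fails.
#SBATCH --mail-type=BEGIN,END,FAIL
# Optional: hypothetical address, replace it or drop the line to use the default.
#SBATCH --mail-user=someone@example.org

# Placeholder workload: report which node the job ran on.
node_name=$(uname -n)
echo "job running on ${node_name}"
```

The same options can also be passed on the ''sbatch'' command line instead of being written in the script.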
 +==== Energy Accounting ====
 +An experimental energy accounting system has been enabled on Ulysses, and energy usage estimates are reported in the end-job notification. This is intended as a very rough estimate of the energy impact of your job, and is **not** accurate enough to be used for proper cost/energy/environmental accounting. Known limits of the energy accounting system in use include:
 +  * very small values are completely unreliable (and are not included at all in the end-job notification, so for very short or "mostly idle" jobs you will find no value at all)
 +  * only CPU and memory energy usage are considered, while energy consumed by other devices (network cards, disk controllers, service processors, power supplies) is not accounted for; energy used "outside" the compute nodes is not considered either (this includes network devices, external storage, UPS, HVAC), so even for a CPU-intensive job the "real" energy consumption can easily be twice as much as reported
 +  * on the other hand, //if your job doesn't use all available cores on each allocated node//, energy consumption can be overestimated
  
 ===== Reporting Issues =====
Line 88: Line 100:
 When reporting issues with Ulysses, please keep to the following guidelines:
  
-  * write to [[helpdesk-hpc@sissa.it]], not to personal email addresses: this way your enquiry will be seen by more than one person
+  * write to [[helpdesk-hpc@sissa.it]], not to personal email addresses: this way your request will be seen by more than one person
   * please use a clear and descriptive subject for your message: "missing library libwhatever.so.12 from package whatever-libs" is OK, "missing software" is less useful, "Ulysses issues" is definitely not useful
   * please open one ticket for each issue; **do not** reply to old, closed tickets for unrelated issues