Differences

services:computing:hpc [2022/02/11 15:20]
calucci email reports
services:computing:hpc [2022/05/12 11:40]
calucci [Periodic Summary Reports from Slurm] enabled for new accounts
Line 95:
   * only CPU and memory energy usage are considered, while energy consumed by other devices (network cards, disk controllers, service processors, power supplies) is not accounted for; energy used "outside" the compute nodes is not considered either (this includes network devices, external storage, UPS, HVAC), so even for a CPU-intensive job the "real" energy consumption can easily be twice as much as reported
   * on the other hand, //if your job doesn't use all available cores on each allocated node//, energy consumption can be overestimated
 +
 +===== Periodic Summary Reports from Slurm =====
 +
 +You can enable the generation of periodic reports on your cluster usage, which will be delivered to your email address on a daily, weekly and/or monthly basis.
 +
 +Each summary report includes the number of jobs that completed their lifecycle during the selected interval, along with the total amount of CPU*hours consumed and an estimate of total energy consumption; the number of jobs in each partition; and the final states of completed jobs (usually one of ''COMPLETED'', ''TIMEOUT'', ''CANCELLED'', ''FAILED'' or ''OUT_OF_MEMORY''). Optionally, a detailed listing of all jobs can be included as an attachment (this will be a zipped CSV file that can be further processed with your software of choice, but it is also human-readable).
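As an illustration of "further processing", here is a minimal Python sketch that opens a zipped CSV attachment and tallies the final job states. It makes several assumptions that are not confirmed by this page: that the archive contains a single CSV file with a header row, and that the final state is stored in a column named ''State'' (a hypothetical name; check the header of your own report). The demo archive is built in memory only to keep the example self-contained.

```python
import csv
import io
import zipfile
from collections import Counter


def read_report_rows(zip_source):
    """Return the rows of the first CSV file found inside a zipped report.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in the archive")
        with archive.open(csv_names[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


def count_states(rows, state_column="State"):
    """Count jobs per final state; the column name is an assumption."""
    return Counter(row[state_column] for row in rows)


def make_demo_archive():
    """Build a tiny in-memory zip that stands in for a real attachment."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "jobs.csv",
            "JobID,State\n1,COMPLETED\n2,TIMEOUT\n3,COMPLETED\n",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    rows = read_report_rows(make_demo_archive())
    for state, count in count_states(rows).most_common():
        print(state, count)
```

With a real report, pass the path of the downloaded attachment to ''read_report_rows'' instead of the demo archive.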
 +
 +To enable the reports with the default options (no daily report; weekly report with job details and monthly report, delivered to your_username@sissa.it), just create an empty ''.slurm_report'' file in your home directory on Ulysses:
 +<code>
 +touch $HOME/.slurm_report
 +</code>
 +
 +If you need to tune some parameters (e.g. enable daily reports, enable/disable job details, change the mail delivery address), please copy the default configuration file to your home directory
 +<code>
 +cp /usr/local/etc/slurm_report.ini $HOME/.slurm_report
 +</code>
 +and edit the local copy. If your account has no "@sissa.it" email address, it is recommended that you edit the ''mailto='' line.
 +
 +<note tip>Since 2022-05-12, Slurm reports are enabled for all new accounts; if you want to disable the reports, just delete the configuration file ''$HOME/.slurm_report''.</note>
 +
 +==== How to read the detailed report ====
 +
 +The detailed report, if requested, is attached as a Zip-compressed CSV file. You should be able to open and decompress it on any modern computing platform, and the CSV file is both human- and machine-readable. Timestamps are in ISO 8601 format, ''YYYY-MM-DDThh:mm:ss'', with an implicit local time zone; e.g. ''2022-03-04T09:30:00'' is "half past nine in the morning of March 4th, 2022". Four timestamps are provided for each job: **submit** (when the job was created with ''sbatch'' or similar commands), **eligible** (when the job became runnable, i.e. there were no blocking conditions such as dependencies on other jobs or exceeded user limits), **start** and **end** (when the job actually began and ended execution).
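The four timestamps make it easy to derive how long a job waited in the queue and how long it ran. The following Python sketch shows one way to do this for a single job record. The dictionary keys ''submit'', ''eligible'', ''start'' and ''end'' are assumed to match the CSV column names, which this page does not confirm; adjust them to the header of your own report. The sample values are made up for illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_timestamp(value):
    """Parse a YYYY-MM-DDThh:mm:ss string into a naive (local time) datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def job_timings(job):
    """Return (queue wait, run time) as timedeltas for one job record.

    Queue wait is measured from 'eligible' to 'start';
    run time is measured from 'start' to 'end'.
    """
    eligible = parse_timestamp(job["eligible"])
    start = parse_timestamp(job["start"])
    end = parse_timestamp(job["end"])
    return start - eligible, end - start


# Hypothetical job record, using the assumed column names.
example_job = {
    "submit": "2022-03-04T09:00:00",
    "eligible": "2022-03-04T09:30:00",
    "start": "2022-03-04T10:00:00",
    "end": "2022-03-04T12:15:00",
}

wait, run = job_timings(example_job)
print("queue wait:", wait)
print("run time:", run)
```

Because the timestamps carry no explicit time zone, durations computed this way can be off by an hour for jobs that span a daylight saving time change.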
  
 ===== Reporting Issues =====