services:computing:hpc [2022/02/11 15:20]
calucci email reports
services:computing:hpc [2022/03/08 15:01]
calucci summary reports
  * only CPU and memory energy usage is considered. Energy consumed by other devices (network cards, disk controllers, service processors, power supplies) is not accounted for, and neither is energy used "outside" the compute nodes (network devices, external storage, UPS, HVAC). So even for a CPU-intensive job, the "real" energy consumption can easily be twice as much as reported
  * on the other hand, //if your job doesn't use all available cores on each allocated node//, energy consumption can be overestimated

===== Periodic Summary Reports from Slurm =====

You can enable periodic reports on your cluster usage, delivered to your email address on a daily, weekly and/or monthly basis.

Each summary report includes the number of jobs that completed their lifecycle during the selected interval, along with the total CPU-hours consumed and an estimate of the total energy consumption; the number of jobs in each partition; and the final states of completed jobs (usually one of ''COMPLETED'', ''TIMEOUT'', ''CANCELLED'', ''FAILED'' or ''OUT_OF_MEMORY''). Optionally, a detailed listing of all jobs can be included as an attachment. This is a zipped CSV file that is human-readable and can also be processed further with your software of choice.

To enable the reports with the default options (no daily report; a weekly report with job details and a monthly report, delivered to your_username@sissa.it), just create an empty ''.slurm_report'' file in your home directory on Ulysses:
<code>
touch $HOME/.slurm_report
</code>

If you need to tune some parameters (e.g. enable daily reports, enable or disable job details, change the delivery address), copy the default configuration file to your home directory as ''.slurm_report'':
<code>
cp /usr/local/etc/slurm_report.ini $HOME/.slurm_report
</code>
and edit the local copy. If your account has no "@sissa.it" email address, it is recommended that you edit the ''mailto='' line.
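If you prefer to change the delivery address from the command line instead of an editor, here is a minimal sketch. It assumes GNU ''sed'' (for in-place editing with ''-i'') and that the address is set on a single line of the form ''mailto=...'' in ''.slurm_report''. The helper name ''set_mailto'' and the address ''jane.doe@example.org'' are hypothetical placeholders, not part of the site tools.

<code bash>
#!/usr/bin/env bash
# set_mailto CONFIG ADDRESS  (hypothetical helper, not a site-provided tool)
# Rewrites any "mailto=..." line (spaces around "=" allowed) to "mailto=ADDRESS".
# If CONFIG has no mailto line, the file is left unchanged.
# Assumes GNU sed, which supports in-place editing with -i.
set_mailto() {
    local config=$1 address=$2
    sed -i "s/^[[:space:]]*mailto[[:space:]]*=.*/mailto=${address}/" "$config"
}

# Usage on Ulysses (placeholder address):
#   [ -s "$HOME/.slurm_report" ] || cp /usr/local/etc/slurm_report.ini "$HOME/.slurm_report"
#   set_mailto "$HOME/.slurm_report" jane.doe@example.org
#   grep '^mailto=' "$HOME/.slurm_report"
</code>

The ''[ -s ... ] ||'' guard copies the default configuration only if ''.slurm_report'' is missing or empty (for example, after the ''touch'' shown above), so an existing customized copy is not overwritten.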

==== How to read the detailed report ====

The detailed report, if requested, is attached as a Zip-compressed CSV file. You should be able to open and decompress it on any modern computing platform, and the CSV file is both human- and machine-readable. Timestamps are in ISO 8601 format ''YYYY-MM-DDThh:mm:ss'' with an implicit local time zone. For example, ''2022-03-04T09:30:00'' is "half past nine in the morning of March 4th, 2022". Four timestamps are provided for each job: **submit** (when the job was created with ''sbatch'' or similar commands), **eligible** (when the job became runnable, i.e. when no conflicting conditions remained, such as a dependency on other jobs or exceeded user limits), **start** and **end** (when the job actually began and ended execution).
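The difference between the **eligible** and **start** timestamps is roughly how long a job waited for resources once it could run. The difference between **start** and **end** is its run time. The following is a minimal sketch for computing such differences from the shell. It assumes GNU ''date'' (coreutils), which accepts the ''YYYY-MM-DDThh:mm:ss'' form and reads it in the local time zone. ''elapsed_seconds'' is a hypothetical helper name.

<code bash>
#!/usr/bin/env bash
# elapsed_seconds FROM TO  (hypothetical helper)
# Prints TO - FROM in seconds for timestamps such as 2022-03-04T09:30:00.
# Assumes GNU date: values without an explicit offset are read as local time.
elapsed_seconds() {
    local from to
    from=$(date -d "$1" +%s) || return 1
    to=$(date -d "$2" +%s) || return 1
    echo $(( to - from ))
}

# Example: eligible at 09:30:00, started at 10:15:30 on March 4th, 2022
elapsed_seconds 2022-03-04T09:30:00 2022-03-04T10:15:30   # prints 2730
</code>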
  
===== Reporting Issues =====