The Ulysses cluster v2 is available for scientific computation to all SISSA users. If you have an active SISSA account, please write to helpdesk-hpc@sissa.it in order to have it enabled on Ulysses.
SSH access to Ulysses v2 is provided via the login nodes at frontend1.hpc.sissa.it or frontend2.hpc.sissa.it, from the SISSA network or via the SISSA VPN. More access options may be made available in due time.
Available compute nodes include:
All nodes are connected to an Infiniband QDR fabric.
The software tree is the same as on SISSA Linux workstations, with the same Lmod modules system (with the only exception of desktop-oriented software packages).
A small number of POWER9-based nodes are also available (2 sockets, 16 cores, 4 threads per core; 256GB RAM) with 2 or 4 Tesla V100. Please note that you cannot run x86 code on POWER9. For an interactive shell on a P9 machine, please type p9login on frontend[12].
The queue system is now SLURM (https://slurm.schedmd.com/documentation.html), so if you were used to TORQUE on the old Ulysses you will need to modify your job scripts somewhat.
Available partitions (or “queues” in TORQUE old-speak) include:

- regular1 (old nodes) and regular2 (new nodes): max 16 nodes, max 12h
- wide1 and wide2: max 32 nodes, max 8h, max 2 concurrently running jobs per user
- long1 and long2: max 8 nodes, max 48h, max 6 concurrently running jobs per user
- gpu1 and gpu2: max 4 nodes, max 12h
- power9: max 4 nodes, max 24h

If your code performs better with one task per physical core, the --hint=nomultithread option to srun/sbatch will help.
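The partition limits can also be inspected on the cluster itself with standard SLURM commands; for example, from a frontend node:

```shell
# Summarize all partitions (name, availability, time limit, node counts)
sinfo --summarize

# Show one partition in detail, e.g. regular1:
# partition name, time limit, node count, availability
sinfo -p regular1 -o "%P %l %D %a"
```

The output reflects the live configuration, so it is the authoritative source if the limits above ever change.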
Job scheduling is fair-share based, so the scheduling priority of your jobs depends both on the waiting time in the queue AND on the amount of resources consumed by your other jobs. If you urgently need to start a single job ASAP (e.g. for debugging), you can use the fastlane QoS, which gives your job a substantial priority boost (to prevent abuse, only one job per user can use fastlane at a time, and you will “pay” for the priority boost with a lower priority for your subsequent jobs).
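For example, assuming an existing job script myscript.sh (the script name is illustrative), an urgent debugging run could be submitted like this:

```shell
# Submit one urgent job with the fastlane QoS (only one fastlane job
# per user at a time; it lowers the priority of your subsequent jobs)
sbatch -p regular1 --qos=fastlane myscript.sh

# Check the QoS and priority of your queued jobs
squeue -u $USER -O jobid,qos,prioritylong
```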
You should always use the --mem slurm option to specify the amount of memory needed by your job; --mem-per-cpu is also possible, but not recommended due to the scheduler configuration. This is especially important if your job doesn't use all available CPUs on a node (40 threads on IBM nodes, 64 on HP): failing to specify the memory will negatively impact scheduling performance. --mem=0 (i.e. “all available memory”) is not recommended, since the amount of memory actually available on each node may vary (e.g. in case of hardware failures).
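As a sketch, a job that uses only part of a node could declare its memory needs like this (the 8 tasks, the 16G figure and the program name are illustrative):

```shell
#!/bin/bash
#
# Request 8 tasks on a single node and 16GB of memory for the whole
# job, so the scheduler can place other jobs on the remaining cores
# and memory of the node.
#SBATCH -N1
#SBATCH -n8
#SBATCH --mem=16G

./my_program   # hypothetical executable
```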
Rather than only specifying the total number of tasks (#SBATCH --ntasks=...), it is recommended that you explicitly request a number of nodes and tasks per node (usually, all tasks that can fit in a given node) for best performance. Otherwise, your job can end up “spread” over more nodes than necessary, sharing resources with other unrelated jobs on each node. E.g. on regular1, -N2 -n80 will allocate all threads on 2 nodes, while -n80 alone can spread them over as many as 40 different nodes.
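The difference between the two submissions is only whether the node count is pinned (myscript.sh is a placeholder for your own job script):

```shell
# Compact placement: 80 tasks packed on exactly 2 nodes
sbatch -p regular1 -N2 -n80 myscript.sh

# Equivalent explicit form of the compact placement
sbatch -p regular1 -N2 --ntasks-per-node=40 myscript.sh

# Without -N, the same 80 tasks may be scattered over up to 40 nodes,
# sharing each node with unrelated jobs
sbatch -p regular1 -n80 myscript.sh
```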
This is a single-core job with default time and memory limits (1 hour and 0.5GB):

$ cat myscript.sh
#!/bin/bash
#
#SBATCH -N1
#SBATCH -n1

echo "Hello, World!"

$ sbatch -p regular1 myscript.sh
Submitted batch job 730384
$ cat slurm-730384.out
Hello, World!
Access to hardware-based performance counters is disabled by default for security reasons. It can be enabled on request, only for node-exclusive jobs (i.e. for allocations where a single job is allowed to run on each node): use sbatch -C hwperf --exclusive …
/home
and /scratch
are both general-purpose filesystems; they are based on the same hardware and provide the same performance level. When you first log in on Ulysses, /home/$USER
comes pre-populated with a small number of files that provide some reasonable configuration defaults. At the same time, /scratch/$USER
is created for you, with write permission.
Default quotas are 200GB on /home
and 5TB on /scratch
; short-term users (e.g. accounts created for workshops, summer schools and other events, that usually expire in a matter of weeks) are usually given smaller quotas in agreement with workshop organizers. On special and motivated request a larger quota can be granted: please write to helpdesk-hpc@sissa.it, Cc: your supervisor (if applicable) with your request; please note that the storage resource is limited, and not every request can be granted.
The quota command is available and will give you a summary of your filesystem usage. “ls -s …” will report the actual allocated space for your files instead of (or alongside, depending on the command line) their apparent size.
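The difference between apparent and allocated size is easy to see with a sparse file (a quick local experiment, not specific to Ulysses):

```shell
# Create a file with a 1MB apparent size but (almost) no allocated blocks
truncate -s 1M sparse.img
ls -l sparse.img   # reports the apparent size: 1048576 bytes
ls -s sparse.img   # reports the allocated size, close to 0 blocks
rm sparse.img
```

Quota accounting is based on allocated space, so the two figures can differ noticeably for sparse or compressed data.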
Daily backups are taken of /home
, while no backup is available for /scratch
. If you need to recover some deleted or damaged file from a backup set, please write to helpdesk-hpc@sissa.it. Daily backups are kept for one week, a weekly backup is kept for one month, and monthly backups are kept for one year.
Due to their inherent volatility, some directories can be excluded from the backup set. At this time, the list of excluded directories includes only /home/$USER/.cache
and /home/$USER/.singularity/cache.
You can enable e-mail notifications at various stages of each job's life with the --mail-type=TYPE
option where TYPE
can be a comma-separated list such as BEGIN,END,FAIL
(more details are available in man sbatch
). The notification recipient is by default your SISSA e-mail address, but you can select a different address with --mail-user
. End-job notifications include a summary of consumed resources (CPU time and memory), as absolute values and as a percentage of requested resources. Please note that memory usage is sampled at 30-second intervals, so if your job is terminated by an out-of-memory condition arising from a very large failed allocation, the reported value can be grossly underestimated.
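For example (the script name and the external address are illustrative):

```shell
# Get an e-mail when the job starts, ends or fails
sbatch --mail-type=BEGIN,END,FAIL myscript.sh

# Same, but delivered to a non-SISSA address
sbatch --mail-type=END,FAIL --mail-user=name@example.org myscript.sh
```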
An experimental energy accounting system has been enabled on Ulysses, and energy usage estimates are reported in end-job notifications. This is intended as a very rough estimate of the energy impact of your job, and is not accurate enough to be used for proper cost/energy/environmental accounting. Known limitations of the energy accounting system in use include:
You can enable the generation of periodic reports on your cluster usage, delivered to your email address on a daily, weekly and/or monthly basis.
Each summary report includes the number of jobs that completed their lifecycle during the selected interval, along with the total amount of CPU*hours consumed and an estimate of total energy consumption; the number of jobs in each partition; and the final states of completed jobs (usually one of COMPLETED
, TIMEOUT
, CANCELLED
, FAILED
or OUT_OF_MEMORY
). Optionally, a detailed listing of all jobs can be included as an attachment (a Zip-compressed CSV file that can be further processed with your software of choice, but is also human-readable).
To enable the reports with the default options (no daily report; weekly report with jobs detail and monthly report delivered to your_username@sissa.it) just create an empty .slurm_report
file in your home directory on Ulysses:
touch $HOME/.slurm_report
If you need to tune some parameters (e.g. enable daily reports, enable/disable job details, change the mail delivery address), please copy the default configuration file to your home directory
cp /usr/local/etc/slurm_report.ini $HOME/.slurm_report
and edit the local copy. If your account has no “@sissa.it” email, it is recommended that you edit the mailto=
line.
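As an illustration only (the authoritative list of keys is in the distributed slurm_report.ini; mailto= is the option mentioned above, while the address is a placeholder), the edited line in $HOME/.slurm_report would look like:

```ini
; deliver reports to an external mailbox instead of your_username@sissa.it
mailto=your.name@example.org
```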
The detailed report, if requested, is attached as a Zip-compressed CSV file. You should be able to open / decompress it on any modern computing platform, and the CSV file is both human- and machine-readable. Timestamps are in ISO 8601 format with an implicit local time zone (YYYY-MM-DDThh:mm:ss): e.g. 2022-03-04T09:30:00 is “half past nine in the morning of March 4th, 2022”. Four timestamps are provided for each job: submit (when the job was created with sbatch or similar commands), eligible (when the job becomes runnable, i.e. there are no conflicting conditions, like dependency on other jobs or exceeded user limits), start and end (when the job actually begins and ends execution).
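Since the timestamps are plain ISO 8601 local times, they can be parsed directly, e.g. with GNU date, to compute durations from the detailed report (the first timestamp is the example above; the second is an assumed end time):

```shell
# Elapsed seconds between two report timestamps; the same arithmetic
# gives queue wait (start - submit) or wall time (end - start)
start=$(date -d '2022-03-04T09:30:00' +%s)
end=$(date -d '2022-03-04T10:15:30' +%s)
echo "elapsed: $(( end - start )) seconds"   # elapsed: 2730 seconds
```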
When reporting issues with Ulysses, please keep to the following guidelines:
If your .bashrc
or .bash_profile
include anything but the system defaults, please state so clearly. /opt/contrib
and help in the creation of a suitable module