Convergence is composed of one frontend and ten compute nodes:
Computer | Model | Memory | Processor | Cores | GPUs |
---|---|---|---|---|---|
front | DELL PowerEdge R650xs | 125 GB | 2 x Intel Xeon Silver 4310 | 24 cores / 48 threads @ 2.10 GHz | |
node01 | DELL PowerEdge XE8545 | 2 TB | 2 x AMD EPYC 7543 | 64 cores / 128 threads @ 2.80 GHz | 4 x NVIDIA A100 80 GB SXM |
node[02-06] | DELL PowerEdge R750xa | 2 TB | 2 x Intel Xeon Gold 6330 | 56 cores / 112 threads @ 2.00 GHz | 4 x NVIDIA A100 80 GB PCIe |
node[07-10] | DELL PowerEdge R750xa | 1 TB | 2 x Intel Xeon Gold 6330 | 56 cores / 112 threads @ 2.00 GHz | 4 x NVIDIA A100 80 GB PCIe |
On each node, 4 cores (8 threads) and 4 GB of RAM are reserved for the system and slurm.
By default, when you reserve a GPU, slurm allocates you 4 cores (8 threads) and 64 GB of RAM.
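For example (a sketch; the option values are illustrative), you can accept or override these defaults with standard slurm options:

```
# Default allocation: one MIG GPU plus 4 cores (8 threads) and 64 GB of RAM
salloc --gpus=a100_3g.40gb:1 --time=60

# Explicitly request more CPUs and memory alongside the same GPU
salloc --gpus=a100_3g.40gb:1 --cpus-per-task=16 --mem=128G --time=60
```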
MIG is used to partition the A100 80 GB GPUs into smaller GPUs. Each compute node presents:
- 2 GPUs kept whole (a100_7g.80gb),
- 2 GPUs each split into two a100_3g.40gb instances (4 instances in total).
/home (300 TB) is hosted by front (DELL ME5084 disk array - SAS 12 Gb - 28 x HDD 16 TB) and exported to the compute nodes through NFS.
Each compute node has a local storage space mounted at /scratch (1.6 TB on NVMe).
Access to front is done through a 10 Gb/s Ethernet link.
Compute nodes and front are interconnected by a 200 Gb/s InfiniBand network (Mellanox QM8700).
To access Convergence, you need to establish an ssh connection to the cluster's frontend (front.convergence.lip6.fr).
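For example (mylogin is a placeholder for your actual login):

```
ssh mylogin@front.convergence.lip6.fr
```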
LIP6 members automatically get access to Convergence.
Users who do not belong to LIP6 can request an account at convergence@lip6.fr.
You can access compute resources through the slurm resource manager (see https://slurm.schedmd.com/).
Using the sinfo command, you can list the cluster's partitions and the state of their nodes:
```
root@front:~# sinfo -O "partition:13,available:8,nodelist:18,defaulttime:13,time:13,nodeai:10"
PARTITION    AVAIL   NODELIST          DEFAULTTIME  TIMELIMIT    NODES(A/I)
convergence* up      node[01-10]       1:00:00      30-00:00:00  1/9
```

Explanation for the above output: there is a single partition named convergence (the * marks it as the default); it is up and contains node[01-10]; the default job duration is 1 hour, the maximum job duration is 30 days, and 1 node is allocated (A) while 9 are idle (I).
```
root@front:~# sinfo -p convergence --Node -O "nodelist:13,features:8,socketcorethread:8,cpusstate:15,memory:8,allocmem:10,gres:60,gresused:60,statelong:20,reason:20"
NODELIST  AVAIL_FEATURES  S:C:T   CPUS(A/I/O/T)  MEMORY   ALLOCMEM  GRES                                             GRES_USED                                              STATE  REASON
node01    intel           2:28:2  0/112/0/112    2048000  0         gpu:a100_7g.80gb:2,gpu:a100_3g.40gb:4            gpu:a100_7g.80gb:0,gpu:a100_3g.40gb:0                  idle~  none
node02    intel           2:28:2  0/112/0/112    2048000  0         gpu:a100_7g.80gb:2,gpu:a100_3g.40gb:4            gpu:a100_7g.80gb:0,gpu:a100_3g.40gb:0                  idle~  none
node03    intel           2:28:2  0/112/0/112    2048000  0         gpu:a100_7g.80gb:2,gpu:a100_3g.40gb:4            gpu:a100_7g.80gb:0,gpu:a100_3g.40gb:0                  idle~  none
node04    intel           2:28:2  0/112/0/112    2048000  0         gpu:a100_7g.80gb:2,gpu:a100_3g.40gb:4            gpu:a100_7g.80gb:0,gpu:a100_3g.40gb:0                  idle~  none
node05    intel           2:28:2  0/112/0/112    2048000  0         gpu:a100_7g.80gb:2,gpu:a100_3g.40gb:4            gpu:a100_7g.80gb:0,gpu:a100_3g.40gb:0                  idle~  none
node06    intel           2:28:2  0/112/0/112    1024000  0         gpu:a100_7g.80gb:2,gpu:a100_3g.40gb:4            gpu:a100_7g.80gb:0,gpu:a100_3g.40gb:0                  idle~  none
node07    intel           2:28:2  0/112/0/112    1024000  0         gpu:a100_7g.80gb:2,gpu:a100_3g.40gb:4            gpu:a100_7g.80gb:0,gpu:a100_3g.40gb:0                  idle~  none
node08    intel           2:28:2  0/112/0/112    1024000  0         gpu:a100_7g.80gb:2,gpu:a100_3g.40gb:4            gpu:a100_7g.80gb:0,gpu:a100_3g.40gb:0                  idle~  none
node09    intel           2:28:2  0/112/0/112    1024000  0         gpu:a100_7g.80gb:2,gpu:a100_3g.40gb:4            gpu:a100_7g.80gb:0,gpu:a100_3g.40gb:0                  idle~  none
node10    amd             2:32:2  16/112/0/128   2048000  524288    gpu:a100_7g.80gb:2(S:0),gpu:a100_3g.40gb:4(S:1)  gpu:a100_7g.80gb:0(IDX:N/A),gpu:a100_3g.40gb:1(IDX:2)  mixed  none
```

Explanation for the above output: for each compute node, this command displays its features (intel or amd processors), its socket:core:thread layout (S:C:T), the state of its CPUs (allocated/idle/other/total), its total and allocated memory in MB, its generic resources (GRES, here the MIG GPU instances), the GRES currently in use, its state (idle~ means idle and powered down, mixed means partially allocated) and the reason for that state. Here a job is running on node10: it uses 16 CPUs, 524288 MB of RAM and one a100_3g.40gb GPU.
Using the squeue command, you can list running jobs and get their identifiers.
```
root@front:~# squeue
JOBID  PARTITION  NAME  USER    ST  TIME   NODES  NODELIST(REASON)
   68  convergen  test  leroux  R   28:06  1      node10
```

Explanation for the above output: there is one job; its identifier is 68, its name is test, it was started by user leroux, it has been running (R) for 28 minutes and 6 seconds, and it is using resources on node10. The different states of a job are described in the man page of squeue.
Using the sacct command, you can get more details about a job.
```
root@front:~# sacct -j 68 --format="JobID,JobName,User,Account,NodeList,AllocTres%80,Start,End,State,Reason" -X
JobID  JobName  User    Account  NodeList  AllocTRES                                                  Start                End      State    Reason
-----  -------  ------  -------  --------  ---------------------------------------------------------  -------------------  -------  -------  ------
68     test     leroux  lip6     node10    billing=16,cpu=16,gres/gpu:a100_3g.40gb=1,mem=512G,node=1  2023-04-19T15:31:12  Unknown  RUNNING  None
```

Explanation for the above output: job 68, named test, was started by user leroux under the lip6 account; it runs on node10 with 16 CPUs, one a100_3g.40gb GPU, 512 GB of RAM and 1 node allocated; it started at 2023-04-19T15:31:12 and is still RUNNING, so its end time is not yet known.
You can use the salloc command to get an interactive session. You will get a shell on the frontend from which you will be able to run commands on reserved resources with srun. If you close the shell, the job is terminated.
```
leroux@front:~$ salloc --nodes=2 --gpus-per-node=a100_3g.40gb:1 --time=60
salloc: Granted job allocation 60
salloc: Waiting for resource configuration
salloc: Nodes node[01,10] are ready for job
leroux@front:~$ srun nvidia-smi -L
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-6bfd077d-a528-62e5-5ffd-f5ccf9e5a557)
  MIG 3g.40gb     Device  0: (UUID: MIG-a5a1e127-c156-5892-ae71-8518fcd84332)
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-b90dfade-11bc-8d5b-321b-9f6f6284b497)
  MIG 3g.40gb     Device  0: (UUID: MIG-81ca0f5d-30f9-5a88-9f4a-1cec8fd84f6c)
leroux@front:~$ srun hostname
node10
node01
leroux@front:~$ exit
salloc: Relinquishing job allocation 60
salloc: Job allocation 60 has been revoked.
leroux@front:~$
```
You can use salloc's --x11 option to enable graphical display forwarding through slurm (you also need ssh's -X option when connecting to the frontend).
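For example (a minimal sketch; xclock stands in for any graphical application):

```
# From your workstation: connect to the frontend with X forwarding
ssh -X front.convergence.lip6.fr

# On the frontend: allocate resources with X11 forwarding, then run a graphical program
salloc --x11 --gpus=a100_3g.40gb:1 --time=60
srun xclock
```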
You can use salloc's --no-shell option to allocate resources without having to keep a shell open on the frontend for the duration of your job. You can then access the allocated resources with srun's --jobid option, or directly by ssh.
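For example (a sketch; replace <JOBID> with the job identifier printed by salloc):

```
# Allocate resources without keeping a shell open on the frontend
salloc --no-shell --gpus=a100_3g.40gb:1 --time=60

# Later, run a command on the allocated resources
srun --jobid=<JOBID> nvidia-smi -L
```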
You can connect by ssh to compute nodes on which salloc allocated resources for your job.
The sbatch command allows you to submit a script which will be executed non-interactively. You can configure the reservation by adding #SBATCH directives at the beginning of the script.
Example of sbatch script:
```
leroux@front:~$ cat batch1.sh
#!/bin/bash
#SBATCH --job-name=exemple
#SBATCH --nodes=1
#SBATCH --constraint=amd
#SBATCH --cpus-per-task=16
#SBATCH --mem=512G
#SBATCH --gpus=a100_3g.40gb:1
#SBATCH --time=5
#SBATCH --mail-type=ALL
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

nvidia-smi -L
sleep 300
leroux@front:~$ sbatch batch1.sh
Submitted batch job 20
leroux@front:~$ cat exemple-20.out
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-b90dfade-11bc-8d5b-321b-9f6f6284b497)
  MIG 3g.40gb     Device  0: (UUID: MIG-81ca0f5d-30f9-5a88-9f4a-1cec8fd84f6c)
```
If your script reserves resources on several compute nodes, the script itself runs on the first allocated node.
slurm defines many 'SLURM_*' environment variables that you can use in your scripts, as in the sketch below.
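A minimal sketch (SLURM_JOB_ID, SLURM_JOB_NAME, SLURM_JOB_NODELIST and SLURM_JOB_NUM_NODES are standard slurm variables; the full list is in the sbatch man page):

```
#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH --nodes=2
#SBATCH --time=5

# These variables are set by slurm for every job
echo "Job id:          ${SLURM_JOB_ID}"
echo "Job name:        ${SLURM_JOB_NAME}"
echo "Node list:       ${SLURM_JOB_NODELIST}"
echo "Number of nodes: ${SLURM_JOB_NUM_NODES}"
```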
You can connect by ssh to compute nodes on which sbatch allocated resources for your job.
You can use the --constraint= option of salloc or sbatch to specify additional characteristics of the compute nodes you want.
This option lets you select compute nodes by their features; see the output of sinfo to list these features.
For example, you can choose a compute node with Intel processors (--constraint=intel) or AMD processors (--constraint=amd), as in the sketch below.
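A sketch combining a constraint with a GPU reservation:

```
# Reserve one MIG GPU on a node with Intel processors
salloc --constraint=intel --gpus=a100_3g.40gb:1 --time=60
```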
You can use the srun command to run commands simultaneously on compute nodes on which salloc or sbatch allocated resources for your job.
In a job, each call to srun is a step.
srun inherits the reservation directives given to salloc or sbatch.
```
leroux@front:~$ cat batch3.sh
#!/bin/bash
#SBATCH --job-name=exemple
#SBATCH --nodes=2
#SBATCH --gpus-per-node=a100_3g.40gb:1
#SBATCH --time=1
#SBATCH --mail-type=ALL
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

srun hostname
srun nvidia-smi -L
leroux@front:~$ sbatch batch3.sh
Submitted batch job 24
leroux@front:~$ cat exemple-24.out
node01
node10
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-6bfd077d-a528-62e5-5ffd-f5ccf9e5a557)
  MIG 3g.40gb     Device  0: (UUID: MIG-a5a1e127-c156-5892-ae71-8518fcd84332)
GPU 0: NVIDIA A100-SXM4-80GB (UUID: GPU-b90dfade-11bc-8d5b-321b-9f6f6284b497)
  MIG 3g.40gb     Device  0: (UUID: MIG-81ca0f5d-30f9-5a88-9f4a-1cec8fd84f6c)
```
A call to srun is blocking. You have to wait for the command to finish on every node before executing the next command.
You can use the shell's & operator to execute several srun commands in parallel. In that case, you have to tell srun which resources each step will consume, so that slurm can run the steps in parallel.
```
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gpus-per-node=a100_3g.40gb:3
#SBATCH --time=5

srun -n2 -c8 --gpus-per-node=a100_3g.40gb:1 bash tache1.sh &
srun -n2 -c8 --gpus-per-node=a100_3g.40gb:1 bash tache2.sh &
srun -n2 -c8 --gpus-per-node=a100_3g.40gb:1 bash tache3.sh
wait
```
If a job you started on a node (via salloc or sbatch) has resources currently allocated, you can connect to that node directly via ssh. Your session will be restricted to these resources.
Compute nodes are not reachable from the internet. To access them, you must first pass through the frontend front:

```
ssh -J front.convergence.lip6.fr node01.convergence.lip6.fr
```
Using the scancel command, you can cancel a job:
```
leroux@front:~$ scancel 177
```
The module command can be used to configure your environment to use specific versions of software:
```
leroux@front:~$ module avail
----------------------------------- /etc/environment-modules/modules -----------------------------------
cuda/11.0  cuda/11.1  maple/2019.0  maple/2020.0  mathematica/12.1  matlab/R2019b  matlab/R2020a  python/anaconda3
leroux@front:~$ module load cuda/11.1 python/anaconda3
leroux@front:~$ module list
Currently Loaded Modulefiles:
 1) cuda/11.1   2) python/anaconda3
leroux@front:~$ module unload cuda
leroux@front:~$ module purge
```
Thanks to Sorbonne Universités' site licences, maple, mathematica and matlab are available on the cluster via the module command.
You can use the conda command, available by loading module python/anaconda3, to manage your own python environments.
Your shell needs to be initialized before you can use conda. You can use conda init to permanently modify your .bashrc so that your shell is automatically initialized for conda in interactive sessions. Scripts executed by slurm do not run in an interactive session, so there you need to initialize your shell with eval "$(conda shell.bash hook)" (see the jupyter example below).
conda's documentation is available on its official website.
To use jupyter, first create a conda environment and install the notebook package in it:
```
leroux@front:~$ conda create -n myenv
leroux@front:~$ conda install -n myenv notebook
```
Then submit an sbatch script that activates the environment and starts jupyter:

```
#!/bin/bash
#SBATCH --job-name=test_jupyter
#SBATCH --nodes=1
#SBATCH --gpus-per-node=a100_3g.40gb:1
#SBATCH --time=60
#SBATCH --mail-type=ALL
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module purge                     # Clean up the environment
module load python/anaconda3     # Load the anaconda3 module
eval "$(conda shell.bash hook)"  # Initialize the shell for conda
conda activate myenv             # Activate your python environment
jupyter notebook                 # Start jupyter
```
Once the job is running, retrieve the notebook's URL and token from the job's error file:

```
cat test_jupyter-226.err
...
    or http://127.0.0.1:8888/?token=97c4066cee8dcc55cb40b7311bcf1240cb503a6872c88038
...
```
Then, from your workstation, create an ssh tunnel to the compute node running jupyter and open the URL in your browser:

```
ssh -J front.convergence.lip6.fr -L 8888:localhost:8888 node01.convergence.lip6.fr
```
On Convergence, you can run docker-like containers thanks to the pyxis slurm plugin. This plugin uses enroot to run the containers.
Example of sbatch script:
```
#!/bin/bash -x
#SBATCH --job-name=test
#SBATCH --nodes=1
#SBATCH --gpus=a100_3g.40gb:1
#SBATCH --container-image nvcr.io\#nvidia/pytorch:23.04-py3
#SBATCH --container-mount-home
#SBATCH --time=60
#SBATCH --mail-type=ALL
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

hostname
python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.device_count()); print(torch.cuda.get_device_name())"

# The default jupyter configuration file creates wrong URLs
echo "" > /tmp/jupyter_notebook_config.py
jupyter notebook --config=/tmp/jupyter_notebook_config.py
```
Script commands are executed inside the container defined by the --container-image option.
The user's home directory can be mounted inside the container with the --container-mount-home option.
The reserved GPU can be used in the container.
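For example (a sketch; pyxis also adds the --container-* options to srun), you can check GPU visibility from inside a container interactively:

```
srun --gpus=a100_3g.40gb:1 \
     --container-image=nvcr.io#nvidia/pytorch:23.04-py3 \
     --container-mount-home \
     nvidia-smi -L
```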
You can get a shell inside of the container:
```
# Get the PID of your process
leroux@node10:~$ ps aux|grep leroux
leroux  3435  0.2  0.0   5784   3268 ?      S   14:25  0:00 /bin/bash -x /var/spool/slurmd/job00128/slurm_script
leroux  5838  2.8  0.0 808204 105396 ?      Sl  14:30  0:01 /usr/bin/python /usr/local/bin/jupyter-notebook
root    5961  0.0  0.0  46596  12436 ?      Ss  14:30  0:00 sshd: leroux [priv]
leroux  6001  0.1  0.0  46596   8960 ?      S   14:30  0:00 sshd: leroux@pts/0
leroux  6002  0.0  0.0  18004   5844 pts/0  Ss  14:30  0:00 -bash
leroux  6065  0.0  0.0  19160   3612 pts/0  R+  14:31  0:00 ps aux
leroux  6066  0.0  0.0   6608   2260 pts/0  S+  14:31  0:00 grep --color=auto leroux

# Start a shell inside of the container
leroux@node10:~$ enroot exec 5838 bash

# A simple test using pytorch in the container
leroux@node10:/workspace$ python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.device_count()); print(torch.cuda.get_device_name())"
True
1
NVIDIA A100-SXM4-80GB MIG 3g.40gb

# Exit the container
leroux@node10:/workspace$ exit
```
Send any requests about Convergence to convergence@lip6.fr.
To get news about Convergence, you should subscribe to the convergence-news@listes.lip6.fr mailing list. Non-LIP6 users are automatically added to this list when they get an account.