GPU jobs
GPU cards available
Several GPU servers (lapp-wngpu0xx.in2p3.fr) are available with various NVIDIA GPU cards:
Server number | NVIDIA cards per server | Profile | CPUs (vprocs) | RAM (GiB) |
---|---|---|---|---|
002 | 2 x Tesla K80 | Default | 12 (24) | 128 |
003 | 2 x Tesla K80 | Default | 24 (48) | 192 |
004 | 1 x Tesla V100 | Training | 24 (48) | 192 |
005 | 1 x Quadro P6000 | Default | 16 (32) | 64 |
006 | 4 x Tesla T4 | Inference | 32 (64) | 192 |
007 | 3 x Ampere A100 40GB | Training | 16 (32) | 384 |
008 | 3 x Ampere A100 40GB | Training | 16 (32) | 256 |
009 | 1 x Ampere A100 40GB | Training (restricted access to LISTIC laboratory users) | 16 (32) | 256 |
010-11 | 3 x Ampere A100 80GB | Training | 32 (64) | 512 |
012 | 1 x Ampere A100 80GB | Training (restricted access to LAPTh laboratory users) | 32 (64) | 512 |
GPU specifications
Tesla K80 specifications: numbers in brackets refer to the aggregate of the 2 GPUs.
3g.20gb refers to a MIG instance with 3 compute units and 20 GB of memory (see below).
A100 refers to two GPU types, A100 40 GB and A100 80 GB. Users targeting the A100 80 GB should explicitly request the "a100 80gb" GPU type (see below).
Dynamic GPU allocation using partitionable slots
GPUs are treated as job resources managed by HTCondor. To ensure dynamic resource allocation, partitionable slots are used on the lapp-wngpu0xx machines.
On each GPU worker, the resources reserved for GPU jobs are assigned to one partitionable slot from which dynamic slots are created at claim time and assigned the requested resources. When dynamic slots are unclaimed, their resources are merged back into the parent partitionable slot.
GPU partitionable slot: 100% of the GPU resources, plus a certain amount of CPUs and memory, are reserved for GPU jobs.
The table below shows how the GPU partitionable slot is defined on each machine (a quick way to inspect these slots is shown after the table). Keep in mind that requesting the total number of CPUs or the total memory defined by the GPU partitionable slot prevents any other GPU job from being scheduled on that server.
Server | GPU(s) | Partitionable slot configuration |
---|---|---|
gpu002/003 | 2 x Tesla K80 | cpus=4, gpus=100%, memory=16 GiB |
gpu004 | 1 x Tesla V100 | cpus=8, gpus=100%, memory=64 GiB |
gpu005 | 1 x Quadro P6000 | cpus=4, gpus=100%, memory=16 GiB |
gpu006 | 4 x Tesla T4 | cpus=16, gpus=100%, memory=64 GiB |
gpu007 | 3 x Ampere A100 40GB | cpus=24, gpus=100%, memory=240 GiB |
gpu008 | 3 x Ampere A100 40GB | cpus=24, gpus=100%, memory=240 GiB |
gpu009 | 1 x Ampere A100 40GB | cpus=16, gpus=100%, memory=128 GiB (restricted to LISTIC) |
gpu010-11 | 3 x Ampere A100 80GB | cpus=24, gpus=100%, memory=576 GiB |
gpu012 | 1 x Ampere A100 80GB | cpus=32, gpus=100%, memory=256 GiB (restricted to LAPTh) |
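A quick way to check the current state of these partitionable slots from a UI is the command below; the attribute names are the standard HTCondor ones and may differ slightly with the local configuration:
# list GPU partitionable slots with their remaining CPUs, memory (MiB) and GPUs
condor_status -constraint 'PartitionableSlot && TotalGpus > 0' -af Machine Cpus Memory TotalGpus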
Description file
To use one or more GPU cards in a job, the following line needs to be added to the description file:
# specify the number of GPU cards to use in the same server
# replace X with a value from 1 up to the maximum number of cards available in the desired server
request_gpus = X
Additional lines can be added if you want to specify more precisely the kind of job you want to run:
# for a specific GPU type, replace XXX with "k80", "v100", "p6000", "t4", "a100" or "a100 80gb"
+wantGpuType = "XXX"
or
# for a specific GPU profile, replace XXX with Inference or Training
+wantGpuUsage = "XXX"
If none of these options is defined, the default usage will be applied (execution on K80 or p6000).
If both options are specified, priority will be given to +WantGpuType.
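For example, a minimal description file requesting one A100 might look like the following sketch (the executable and file names are placeholders; adapt the resource requests to the tables above):
universe       = vanilla
executable     = my_gpu_job.sh
request_gpus   = 1
request_cpus   = 8
request_memory = 64 GB
+wantGpuType   = "a100"
output         = my_gpu_job.out
error          = my_gpu_job.err
log            = my_gpu_job.log
queue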
What you should be aware of
As explained above, the combination below must be used with caution.
request_gpus = 1
+wantGpuType = "a100"
request_cpus = 24
This combination will prevent access to the two other GPU cards available on the multi-GPU server used by the job. This concerns the multi-GPU servers lapp-wngpu007/008 and lapp-wngpu010/011.
REMEMBER! When HTCondor runs out of CPUs or memory in the partitionable slot, it runs out of slots for GPU jobs.
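On a 3-GPU A100 server whose partitionable slot offers 24 CPUs and 240 GiB of memory, a more sharing-friendly request would, for instance, ask for roughly one third of the slot per GPU:
request_gpus = 1
+wantGpuType = "a100"
request_cpus = 8
request_memory = 80 GB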
Specific option for reserved servers
Please contact us via support-must@lapp.in2p3.fr if you need to reserve a specific server for your jobs.
Once it is configured, please add the following line to the description file in addition to the type or profile:
# for a specific GPU server, replace XXX with 001 to 012 according to your needs
requirements = machine == "lapp-wngpuXXX.in2p3.fr"
Executable file
Once a job matches to a given slot, it needs to know which GPU(s) to use, if multiple are present.
The UIDs of the GPU(s) that the job is permitted to use are published into the job's environment via the variable _CONDOR_Assignedgpus.
HTCondor now has a wrapper that automatically sets the CUDA_VISIBLE_DEVICES environment variable to the card(s) assigned by HTCondor. Your job will therefore use the correct card(s).
If you launch an interactive job, the wrapper is not used, so you must manually set the CUDA_VISIBLE_DEVICES environment variable:
export CUDA_VISIBLE_DEVICES=${_CONDOR_Assignedgpus}
If you want to deactivate the wrapper, please specify the following in your description file:
+noWrapper = "True"
You will then have to remember to set CUDA_VISIBLE_DEVICES manually in your job.
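For illustration, a minimal executable script for a batch job run with the wrapper disabled might look like the following sketch (the script and payload names are placeholders):
#!/bin/bash
# with +noWrapper = "True", set CUDA_VISIBLE_DEVICES from the GPU(s) assigned by HTCondor
export CUDA_VISIBLE_DEVICES=${_CONDOR_Assignedgpus}
# check which card(s) the job actually sees
nvidia-smi
# then launch the actual GPU workload, e.g. a training script
python my_training.py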
NVIDIA Multi-instance GPU (MIG) support
Starting with the NVIDIA Ampere architecture, MIG is an advanced capability offered by NVIDIA; it has been configured and tested on the GPU007 server. It enables multiple GPU instances to run in parallel on a single physical NVIDIA Ampere GPU, allowing users to see and schedule jobs on these virtual GPU instances as if they were physical GPUs.
MIG is currently not activated, but do not hesitate to contact support-must@lapp.in2p3.fr if you are interested.
MIG allows multiple vGPUs (and thereby VMs) to run in parallel on a single A100, while preserving the isolation guarantees that vGPU provides. For more information on GPU partitioning using vGPU and MIG, refer to the NVIDIA technical brief.
When configured with MIG, each of the 3 A100 cards of lapp-wngpu007.in2p3.fr may be split into 2 instances with 3 compute units and 20 GB of memory each. Up to 6 jobs may then be active on the GPU007 server at the same time. Other MIG configurations are possible. To use a MIG GPU, users must specify in the description file:
+WantGpuType = "3g.20gb"
Choosing a MIG GPU type does not allow requesting more than one GPU.
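Putting the two constraints together, a MIG job request would then look like:
request_gpus = 1
+WantGpuType = "3g.20gb"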
Using TensorBoard display with conda
This requires the log files from the TensorFlow computation to be stored under the MUST shared file system /mustfs/LAPP-DATA or /mustfs/MUST-DATA, or accessed via symlinks such as /uds_data/.
Then, please use 2 SSH terminal windows connected to the same UI.
In the first terminal, run:
# .bashrc includes conda initialize commands added after miniconda installation
source .bashrc
# <your_env> includes tensorboard
conda activate <your_env>
cd <path_to_tensorboars_run_logs>
tensorboard --logdir=.
or launch only particular experiments:
tensorboard --logdir=exp1_folder
To select several experiments to display in TensorBoard, create a new folder with symlinks to the desired experiments:
mkdir my_experiment_runs
cd my_experiment_runs
ln -s <path_to_exp1_folder> exp1_folder
ln -s <path_to_exp2_folder> exp2_folder
cd ..
and run:
tensorboard --logdir=my_experiment_runs
If everything works properly, TensorBoard is launched and the terminal will show:
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.6.0 at http://localhost:6008/ (PRESS CTRL+C to quit)
Check the port used by TensorBoard in the displayed URL (6008 in this example).
Then, in the second terminal window, create an SSH connection forwarding the TensorBoard port to a local port:
ssh -X -Y -tt -L 6006:localhost:<tensorboard_port> <your_login>@<UI>.in2p3.fr
Finally, open the localhost URL (http://localhost:6006/) in a browser.
Use of the NVIDIA HPC SDK
The NVIDIA HPC Software Development Kit version 21.9 is available on the latest GPUs. It includes compilers, libraries and software tools supporting GPU acceleration with standard C++ and Fortran, and provides performance profiling and debugging tools. More information is available at https://developer.nvidia.com/hpc-sdk.
Using the nvc++ compiler, it is possible to execute C++17 code (for compute capabilities ≥ 6.0, working with G++-9 or newer) on GPUs. The use of the nvcc compiler requires compute capabilities ≥ 3.5.
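As a minimal sketch, GPU offload of C++17 parallel algorithms can be enabled at compile time (the source file name is a placeholder; check the local installation for the exact path to the HPC SDK):
# compile C++17 code with GPU offload of parallel algorithms (file name is a placeholder)
nvc++ -std=c++17 -stdpar=gpu -o hadamard hadamard.cpp
# run on the GPU assigned by HTCondor
./hadamard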
You can refer to the course by Pierre Aubert (CTA/LAPP) (in French). This course provides a simple example of Hadamard product C++ code with a submission script that can be used as a quick start.
A Quick Start guide for the nvc++ compiler is available.
Global monitoring
The MUST GPU monitoring page allows you to check which GPU cards are available on all the GPU servers.
If you wish to get access to this page, please send your request via support-must@lapp.in2p3.fr.