Using GPUs

Here is a detailed guide on how to leverage the GPUs on the NBI cluster.

Preparation work: making sure that your software is GPU-aware

Before proceeding, it is recommended that you have a local installation of (Ana-/Mini-)conda with a recent Python version (preferably 3.10+).

Installing CUDA

Normally, you would use the system-wide CUDA installation to make sure that it is compatible with the GPUs. In fact, there are environment modules for CUDA (e.g. cuda/11.2; note that you will need to load the astro module first) pre-installed on the system.
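
For reference, loading the system-wide installation would look roughly like the following (module names as mentioned above; the exact CUDA versions available may differ):

module load astro
module load cuda/11.2
nvcc --version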

Here we take a different route -- we install our own (and newer) CUDA for greater control. Usually you would want to install the latest CUDA that your GPUs support but, as of this writing, torch lacks support for the latest CUDA release (12.x), so we opt for an earlier one (11.8).

To install CUDA via conda, do

conda install cuda -c nvidia/label/cuda-11.8.0

You should check that your installation works by running

nvcc --version

This should match the version that you just installed.

(NOTE: This number can differ from the one reported by nvidia-smi, since nvidia-smi shows the latest CUDA version supported by the installed driver. In other words, you should make sure that the CUDA you install is at most that version.)

Installing torch

Once you have CUDA properly installed, everything else should be a breeze. To install torch with CUDA awareness, simply do

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Test your installation with the following simple code snippet

import torch

torch.zeros(100).cuda()

If no error message appears, you have installed torch with CUDA support successfully.
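
As an additional sanity check, you can query torch for the devices it sees. A minimal sketch using standard torch calls:

import torch

print(torch.cuda.is_available())      # should print True
print(torch.cuda.device_count())      # number of visible GPUs/MIG instances
print(torch.cuda.get_device_name(0))  # name of the first visible device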

Installing cupy

Again, if you have CUDA installed, the installation of cupy is very straightforward. Simply run

conda install -c conda-forge cupy cudnn cutensor nccl

Note that conda should automatically detect the proper versions to install based on your current CUDA installation.

Test your installation with the following simple code snippet

import cupy

cupy.random.rand(100).device

It should say something like <CUDA Device 0>.
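
To verify that computations actually run on the GPU, you can compare a small GPU result against the corresponding CPU result. A minimal sketch:

import cupy as cp
import numpy as np

# Small matrix product on the GPU, compared with the same product on the CPU
a = cp.random.rand(100, 100)
b = cp.random.rand(100, 100)
gpu_result = cp.asnumpy(a @ b)
cpu_result = cp.asnumpy(a) @ cp.asnumpy(b)
print(np.allclose(gpu_result, cpu_result))  # should print True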

Installing jax

Installation of jax with CUDA is also simple. Run

pip install --upgrade "jax[cuda11_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Test your installation with

import jax

jax.devices()

It should say something like [cuda(id=0)].
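
You can also confirm that jax uses the GPU backend for actual computations. A minimal sketch:

import jax
import jax.numpy as jnp

# Should print 'gpu' when the CUDA build of jax is active
print(jax.default_backend())

# A small computation; block_until_ready() forces execution on the device
x = jnp.ones((1000, 1000))
print((x @ x).block_until_ready().sum())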

Running a job directly on a GPU-equipped headnode

The GPU-equipped headnode/frontend is astro02 (accessible at astro02.hpc.ku.dk). It hosts 3 physical Nvidia A30 GPUs. One of them is split into 4 smaller, independent virtual GPUs (in Nvidia's terms, MIG or Multi-Instance GPU instances), one is split into 2 larger MIG instances, and one is configured as a single MIG instance spanning the whole GPU (see the listing below).

To specify which GPU to use, set the environment variable CUDA_VISIBLE_DEVICES. To see the list of 'compute instances' available, run

nvidia-smi -L

On astro02, you should see something like

GPU 0: NVIDIA A30 (UUID: GPU-654aa619-952d-3f17-01ec-0c050ac8df88)
  MIG 1g.6gb      Device  0: (UUID: MIG-3868837f-57d0-5089-9887-19240a8809b4)
  MIG 1g.6gb      Device  1: (UUID: MIG-d28bcf9f-db13-5ad0-9be2-62d0e25c92a9)
  MIG 1g.6gb      Device  2: (UUID: MIG-e175ec33-0f38-5952-98d5-1c118bd9d398)
  MIG 1g.6gb      Device  3: (UUID: MIG-53cc4525-2ae7-5c11-9680-302d1d4177ba)
GPU 1: NVIDIA A30 (UUID: GPU-cb8c2438-a361-3e30-4ff5-4481d43c9e83)
  MIG 2g.12gb     Device  0: (UUID: MIG-0a768004-2ded-55f6-ac2b-4dd3f696a222)
  MIG 2g.12gb     Device  1: (UUID: MIG-0296d938-ea26-5174-a884-cd3c686bf660)
GPU 2: NVIDIA A30 (UUID: GPU-9bcd54bd-5a72-2e7b-90c8-3e3719d09e5c)
  MIG 4g.24gb     Device  0: (UUID: MIG-a8cb1bd5-6f68-54a1-8e88-ca2fa4ef80c0)

For example, if we want to use the third MIG 1g.6gb instance (Device 2, with the UUID MIG-e175ec33-0f38-5952-98d5-1c118bd9d398), set the environment variable:

export CUDA_VISIBLE_DEVICES=MIG-e175ec33-0f38-5952-98d5-1c118bd9d398
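
With this variable set, only the selected instance is visible to CUDA applications; for example, torch should report exactly one device:

import torch

print(torch.cuda.device_count())  # should print 1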

Then, running the same test code for torch and checking with nvidia-smi, we see the following:

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                |        ECC|                       |
|==================+================================+===========+=======================|
|  0    3   0   0  |              12MiB /  5952MiB  | 14      0 |  1   0    1    0    0 |
|                  |               0MiB /  8191MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  0    4   0   1  |              12MiB /  5952MiB  | 14      0 |  1   0    1    0    0 |
|                  |               0MiB /  8191MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  0    5   0   2  |             107MiB /  5952MiB  | 14      0 |  1   0    1    0    0 |
|                  |               2MiB /  8191MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  0    6   0   3  |              12MiB /  5952MiB  | 14      0 |  1   0    1    0    0 |
|                  |               0MiB /  8191MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  1    1   0   0  |              25MiB / 11968MiB  | 28      0 |  2   0    2    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  1    2   0   1  |              25MiB / 11968MiB  | 28      0 |  2   0    2    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  2    0   0   0  |               1MiB / 24062MiB  | 56      0 |  4   0    4    1    1 |
|                  |               1MiB / 32768MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0    5    0     530009      C   ...nda3/envs/igwn-py310/bin/python3.10       88MiB |
+---------------------------------------------------------------------------------------+

Indeed, we are using the desired MIG instance.

Submitting a job to the GPU partition with slurm

Simply specify the GPU partition, astro2_gpu, and how many 'generic resources' (GRES; in this case, GPUs) you want to use when submitting a job with slurm.

An example command is

srun -p astro2_gpu --gres=gpu:1 nvidia-smi

This should show the full GPU (not a virtual MIG instance) that has been assigned to you.
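
For longer-running jobs, the same options can be used in a batch script submitted with sbatch. A minimal sketch, where the job name, time limit, and script name are placeholders that you should adapt:

#!/bin/bash
#SBATCH --partition=astro2_gpu
#SBATCH --gres=gpu:1
#SBATCH --job-name=gpu-test   # placeholder job name
#SBATCH --time=01:00:00       # placeholder time limit; adjust as needed

nvidia-smi
python my_gpu_script.py       # placeholder for your own GPU-aware script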

As far as I know, there are 11 Nvidia A100 GPUs in this partition.