Hardware
Tycho contains both frontends ("Analysis Hardware"), which are accessible from the outside and can be used for interactive work such as development and analysis, and compute nodes ("Cluster Hardware"), which are only accessible through the SLURM queue system.
Name | CPUs | Memory | Memory Bandwidth | GPUs | Scratch | Notes |
---|---|---|---|---|---|---|
astro01.hpc.ku.dk | 2 x 24 cores Epyc Rome 7F72 @ 3.2 GHz | 1 TB DDR4-3200 MHz - 21 GB / core | 410 GB/s, 8.5 GB/s/core | 4x A100 | 11 TB | L2: 512 KB / core, L3: 192 MB / socket, AVX2, EDR 100 Gbit/s to storage |
astro02.hpc.ku.dk | 1 x 64 cores Epyc Genoa 9554P @ 3.1 GHz | 768 GB DDR5-4800 MHz - 12 GB / core | 461 GB/s, 7.2 GB/s/core | 3x A30 | 28 TB | L2: 1 MB / core, L3: 256 MB, AVX-512, EDR 100 Gbit/s to storage |
astro03.hpc.ku.dk | 2 x 64 cores Epyc Genoa 9554 @ 3.1 GHz | 1.5 TB DDR5-4800 MHz - 12 GB / core | 922 GB/s, 7.2 GB/s/core | None | None | L2: 1 MB / core, L3: 256 MB, AVX-512, HDR 100 Gbit/s to storage |
astro04.hpc.ku.dk | 1 x 48 cores Epyc Genoa 9454P @ 2.75 GHz | 768 GB DDR5-4800 MHz - 16 GB / core | 461 GB/s, 9.6 GB/s/core | 4x RTX A6000 | 42 TB | L2: 1 MB / core, L3: 256 MB, AVX-512, 2 x NDR 200 Gbit/s to storage |
Queue Name | #Nodes | CPUs | Memory | Memory Bandwidth | Notes |
---|---|---|---|---|---|
astro_XX | 16 | 2 x 10 cores Xeon E5-2680 v2 @ 2.8 GHz | 64 GB DDR3-1866 MHz - 3.2 GB / core | 120 GB/s, 6 GB/s/core | L2: 256 KB / core, L3: 25 MB / socket, AVX, FDR 56 Gbit/s |
astro2_XX | 70 | 2 x 24 cores Xeon Gold 6248R @ 3.0 GHz | 192 GB DDR4-2933 MHz - 4 GB / core | 282 GB/s, 5.9 GB/s/core | L2: 1 MB / core, L3: 35.75 MB / socket, AVX-512, EDR 100 Gbit/s |
astro3_XX | 50 | 2 x 64 cores Epyc Genoa 9554 @ 3.1 GHz | 768 GB DDR5-4800 MHz - 6 GB / core | 922 GB/s, 7.2 GB/s/core | L2: 1 MB / core, L3: 256 MB / socket, AVX-512, 2 x NDR 200 Gbit/s |
astro_gpu | 3 | 1 x 48 cores Epyc Genoa 9454P @ 2.75 GHz | 768 GB DDR5-4800 MHz - 16 GB / core | 461 GB/s, 9.6 GB/s/core | 2x H100 GPUs, L2: 1 MB / core, L3: 256 MB, AVX-512, HDR 100 Gbit/s |
astro2_gpu | 1 | 2 x 16 cores Epyc Rome 7302 @ 3.0 GHz | 1 TB DDR4-3200 MHz - 32 GB / core | 410 GB/s, 12.8 GB/s/core | 4x A100 GPUs, L2: 512 KB / core, L3: 192 MB / socket, AVX2, EDR 100 Gbit/s |
Servers
The astro_XX nodes are based on a Dell C6220 II shoe-box design with dual 10-core Ivy Bridge CPUs. The astro2_XX nodes are Huawei FusionServer Pro X6000 nodes with dual 24-core Cascade Lake CPUs. The astro3_XX nodes are xFusion 1258H V7 servers with dual 64-core Genoa CPUs.
GPUs
GPUs are accessible interactively on the astro01, astro02, and astro04 frontend machines and through SLURM in the astro_gpu and astro2_gpu queues. GPUs are few in number on Tycho but provide potentially enormous computational value. _Please_ test whether your code can efficiently use e.g. a full GPU, or more than one GPU, before running long production jobs on them. In particular, machine-learning jobs, or codes that off-load calculations from high-level languages such as Python and Julia, may effectively block a full GPU, and sometimes speculatively allocate all GPU memory, without actually making good use of the resources. Therefore, test by profiling your code, using timers, or simply running on the different machines (astro01, astro04, and the different-sized virtual GPUs on astro02) to determine how well the workload scales.
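A quick way to compare GPUs (or virtual GPU slices) is to time a representative kernel on each of them. The sketch below illustrates the timer approach, assuming PyTorch with CUDA support is available in your environment; replace the matrix multiplication with a piece of your own workload.

```python
# Minimal GPU scaling test -- a sketch, assuming PyTorch with CUDA support is installed.
# Run the same script on e.g. astro01, astro04, or one of the virtual GPUs on astro02
# and compare the timings to judge how well your workload uses the hardware.
import time
import torch

device = torch.device("cuda")          # first GPU visible to the process

for n in (2048, 4096, 8192):           # problem sizes; adjust to resemble your workload
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.cuda.synchronize()           # make sure setup has finished before timing
    t0 = time.perf_counter()
    c = a @ b                          # stand-in for your real kernel
    torch.cuda.synchronize()           # wait for the GPU to finish the work
    print(f"n={n}: {time.perf_counter() - t0:.4f} s")
```

While such a test runs, `nvidia-smi` on the same machine shows the actual GPU utilization and memory use.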
- 8 A100 GPUs, available on the astro01 frontend machine and in the astro2_gpu queue, are equipped with 40 GB of memory each. They have full FP64 performance and are well suited for large-scale computations as well as machine-learning workloads, but they are not the best ML GPUs available at Tycho. You can read more about their specs here https://www.nvidia.com/en-us/data-center/a100 .
- 3 A30 GPUs available on astro02 have 24 GB of memory per GPU. They have been split up so that the first GPU is fully available, the second GPU is split into 2 virtual GPUs (each with 28 SMs), and the third GPU is split into 4 virtual GPUs (each with 14 SMs). The GPUs on astro02 have full FP64 capability; they are smaller than those on the other machines and very useful for longer-running jobs that only require a smaller amount of GPU computing (see the device-selection sketch after this list). You can read more about their specs here https://www.nvidia.com/en-us/data-center/products/a30-gpu .
- 4 RTX A6000 GPUs available on astro04 have 48 GB of RAM per GPU and very low FP64 performance. They are therefore not well suited for scientific calculations, but they provide hardware-accelerated remote visualization and have machine-learning performance similar to the A100 GPUs. You can read more about their specs here https://www.nvidia.com/en-us/design-visualization/rtx-a6000 .
- 6 H100 GPUs available through the astro_gpu queue have 94 GB of memory each and are our newest GPUs. They have full FP64 performance and very high machine-learning performance, and are well suited for both scientific and machine-learning jobs. You can read more about their specs here https://www.nvidia.com/en-us/data-center/h100/ .
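On machines with several GPUs, or with the virtual GPU instances on astro02, it is good practice to restrict your process to the device you intend to use so that you do not block the others. One common way to do this is through the CUDA_VISIBLE_DEVICES environment variable; the sketch below assumes PyTorch, but the same variable is honoured by most CUDA-based frameworks, and the device index/UUID used here is only an example.

```python
# Selecting one specific GPU (or virtual GPU instance) -- a sketch.
# The available devices and their indices/UUIDs can be listed with `nvidia-smi -L`.
import os

# Must be set before the first CUDA call; it restricts the process to one device.
# "0" is just an example index -- on astro02 the virtual GPU instances have their
# own identifiers, and a MIG UUID from `nvidia-smi -L` can be used instead.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # imported after setting the variable

print(torch.cuda.device_count())      # should now report a single visible device
x = torch.ones(1024, device="cuda")   # all work goes to the selected device
print(x.sum().item())
```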
Global Storage
- The home directory (/groups/astro) is a fully backed-up Lustre filesystem. We have a shared 6 TB quota and individual quotas of 50 GB per user.
- The scratch directory (/lustre/astro) is a ZFS-based high-performance Lustre filesystem with dedicated hardware for our group. The total space (disregarding the transparent compression) is 1300 TB. The default quota on scratch is 5 TB; if you need more, please contact Troels Haugbølle with your supervisor / mentor / sponsor in CC and explain why and how much you need.
- The archive consists of two ZFS filesystems exported as NFS volumes from a storage server connected to the clusters with a 10 Gbit/s network connection. They can be found under /groups/astro/archive0 and /groups/astro/archive1. These filesystems are old and new users will not get directories on them; they will soon be decommissioned.
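To get a quick overview of how full these filesystems are, you can query them directly from Python, as in the sketch below. Note that this only reports filesystem-wide totals, not your personal quota; for the per-user Lustre quotas the `lfs quota` command is the appropriate tool.

```python
# Filesystem-level free-space check -- a sketch; the paths are the mount points listed above.
# This shows totals for the whole filesystem, not your personal or group quota.
import shutil

for path in ("/groups/astro", "/lustre/astro"):
    usage = shutil.disk_usage(path)
    print(f"{path}: {usage.used / 1e12:.1f} TB used of {usage.total / 1e12:.1f} TB, "
          f"{usage.free / 1e12:.1f} TB free")
```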
Scratch storage
The scratch disks on astro01, astro02, and astro04 are RAID0 volumes consisting of a number of locally mounted NVMe disks. They have slightly higher bandwidth than the global filesystem, but can only be accessed from the specific machine. The scratch disks have several orders of magnitude higher IOPS than the global filesystem, so random-access I/O, or operations that open and close a lot of files, perform much faster on the scratch disks. _Space is limited_. Please clean up after use, and remember there are no backups or redundancy on the scratch disks.
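The difference matters mostly for metadata-heavy work. A simple way to see it is to time the creation and re-reading of many small files on the local scratch disk and on /lustre/astro. The sketch below takes the target directory as a command-line argument; the exact scratch mount point depends on the machine, so it is deliberately left as an argument rather than hard-coded.

```python
# Small-file I/O timing -- a sketch; pass a directory on the local scratch disk or on
# /lustre/astro as the first argument and compare the timings between the two.
import sys
import time
import tempfile
from pathlib import Path

target = Path(sys.argv[1])
with tempfile.TemporaryDirectory(dir=target) as tmp:
    tmp = Path(tmp)
    t0 = time.perf_counter()
    for i in range(1000):                      # create many small files
        (tmp / f"f{i:04d}.dat").write_bytes(b"x" * 1024)
    t1 = time.perf_counter()
    for i in range(1000):                      # re-open and read them back
        (tmp / f"f{i:04d}.dat").read_bytes()
    t2 = time.perf_counter()
    print(f"create: {t1 - t0:.2f} s, read: {t2 - t1:.2f} s")
```

Run it as, e.g., python io_test.py /lustre/astro/<your directory>, and again with a directory on the local scratch disk of the machine you are on (the script name is just an example).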
Networks
- External connection: The local HPC center is a Tier-1 CERN node and has a direct dual 400 Gbit/s connection to the Danish entry point of the European GEANT network in Lyngby. In practice we easily reach 100 MB/s for transfers of larger files, and higher speeds are possible with parallel transfers.
- The backend storage servers for /groups/astro and /lustre/astro are all interconnected through a 100 Gbit/s HDR InfiniBand switch, which has uplinks to the different cluster networks.
- All frontend machines have Ethernet or InfiniBand adapters to provide optimal I/O bandwidth.
- The astro_XX nodes have FDR (56 Gbit/s) InfiniBand connected to a single switch.
- The astro2_XX nodes have EDR (100 Gbit/s) InfiniBand with a 2:1 blocking factor and 24 nodes per switch (3 uplink switches, 1 core switch).
- The astro3_XX nodes have two NDR-200 (200 Gbit/s) adapters, one per CPU socket, connected directly to a single 128-port NDR-200 switch.