Hardware
Tycho contains both frontends ("Analysis Hardware"), which are accessible from the outside and can be used for interactive work such as development and analysis, and compute nodes ("Cluster Hardware"), which are only accessible through the SLURM queue system.
Name | CPUs | Memory | Memory Bandwidth | GPUs | Scratch | Notes |
---|---|---|---|---|---|---|
astro01.hpc.ku.dk | 2 x 24 cores Epyc Rome 7F72 @ 3.2 GHz | 1 TB DDR4-3200 MHz - 21 GB / core | 410 GB/s, 8.5 GB/s/core | 4x A100 | 11 TB | L2: 512 KB / core, L3: 192 MB / socket, AVX2, EDR 100 Gbit/s to storage |
astro02.hpc.ku.dk | 1 x 64 cores Epyc Genoa 9554P @ 3.1 GHz | 768 GB DDR5-4800 MHz - 12 GB / core | 461 GB/s, 7.2 GB/s/core | 3x A30 | 28 TB | L2: 1 MB / core, L3: 256 MB, AVX-512, EDR 100 Gbit/s to storage |
astro03.hpc.ku.dk | 2 x 64 cores Epyc Genoa 9554 @ 3.1 GHz | 1.5 TB DDR5-4800 MHz - 12 GB / core | 922 GB/s, 7.2 GB/s/core | None | None | L2: 1 MB / core, L3: 256 MB, AVX-512, HDR 100 Gbit/s to storage |
astro04.hpc.ku.dk | 1 x 48 cores Epyc Genoa 9454P @ 2.75 GHz | 768 GB DDR5-4800 MHz - 16 GB / core | 461 GB/s, 9.6 GB/s/core | 4x RTX A6000 | 42 TB | L2: 1 MB / core, L3: 256 MB, AVX-512, 2 x NDR 200 Gbit/s to storage |
Queue Name | #Nodes | CPUs | Memory | Memory Bandwidth | Notes |
---|---|---|---|---|---|
astro_XX | 16 | 2 x 10 cores Xeon E5-2680 v2 @ 2.8 GHz | 64 GB DDR3-1866 MHz - 3.2 GB / core | 120 GB/s, 6 GB/s/core | L2: 256 KB / core, L3: 25 MB / socket, AVX, FDR 56 Gbit/s |
astro2_XX | 70 | 2 x 24 cores Xeon Gold 6248R @ 3.0 GHz | 192 GB DDR4-2933 MHz - 4 GB / core | 282 GB/s, 5.9 GB/s/core | L2: 1 MB / core, L3: 35.75 MB / socket, AVX-512, EDR 100 Gbit/s |
astro3_XX | 50 | 2 x 64 cores Epyc Genoa 9554 @ 3.1 GHz | 768 GB DDR5-4800 MHz - 6 GB / core | 922 GB/s, 7.2 GB/s/core | L2: 1 MB / core, L3: 256 MB / socket, AVX-512, 2 x NDR 200 Gbit/s |
astro_gpu | 3 | 1 x 48 cores Epyc Genoa 9454P @ 2.75 GHz | 768 GB DDR5-4800 MHz - 16 GB / core | 461 GB/s, 9.6 GB/s/core | 2x H100 GPUs, L2: 1 MB / core, L3: 256 MB, AVX-512, HDR 100 Gbit/s |
astro2_gpu | 1 | 2 x 16 cores Epyc Rome 7302 @ 3.0 GHz | 1 TB DDR4-3200 MHz - 32 GB / core | 410 GB/s, 12.8 GB/s/core | 4x A100 GPUs, L2: 512 KB / core, L3: 192 MB / socket, AVX2, EDR 100 Gbit/s |
Servers
The astro_XX nodes are based on a Dell C6220 II shoe-box design with dual 10-core Ivy Bridge CPUs. The astro2_XX nodes are Huawei FusionServer Pro X6000 nodes with dual 24-core Cascade Lake CPUs. The astro3_XX nodes are xFusion 1258H V7 servers with dual 64-core Genoa CPUs.
GPUs
GPUs are accessible interactively on the astro01, astro02, and astro04 frontend machines and through SLURM in the astro_gpu and astro2_gpu queues. GPUs are few in number on Tycho but provide potentially enormous computational value. _Please_ test whether your code can efficiently use e.g. a full GPU, or more than one GPU, before running long production jobs on them. In particular, machine-learning jobs, or codes that off-load calculations from high-level languages such as Python and Julia, may effectively block a full GPU, and sometimes speculatively allocate all GPU memory, without actually making good use of the resources. Therefore, test by profiling your code, using timers, or simply running on the different machines (astro01, astro04, and the different-sized virtual GPUs on astro02) to determine how well the workload scales.
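A quick way to compare GPUs (or virtual GPU slices) is to time a representative kernel on each of them. The sketch below illustrates the timer approach, assuming PyTorch with CUDA support is available in your environment; replace the matrix multiplication with a piece of your own workload.

```python
# Minimal GPU scaling test -- a sketch, assuming PyTorch with CUDA support is installed.
# Run the same script on e.g. astro01, astro04, or one of the virtual GPUs on astro02
# and compare the timings to judge how well your workload uses the hardware.
import time
import torch

device = torch.device("cuda")          # first GPU visible to the process

for n in (2048, 4096, 8192):           # problem sizes; adjust to resemble your workload
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.cuda.synchronize()           # make sure setup has finished before timing
    t0 = time.perf_counter()
    c = a @ b                          # stand-in for your real kernel
    torch.cuda.synchronize()           # wait for the GPU to finish the work
    print(f"n={n}: {time.perf_counter() - t0:.4f} s")
```

While such a test runs, `nvidia-smi` on the same machine shows the actual GPU utilization and memory use.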
- 8 A100 GPUs, available on the astro01 frontend machine and in the astro2_gpu queue, are equipped with 40 GB of memory each. They have full FP64 performance and are well suited for large-scale computations as well as machine-learning workloads, but they are not the best ML GPUs available at Tycho. You can read more about their specs here https://www.nvidia.com/en-us/data-center/a100 .
- 3 A30 GPUs available on astro02 have 24 GB of memory per GPU. They have been split up so that the first GPU is fully available, the second GPU is split into 2 virtual GPUs (each with 28 SMs), and the third GPU is split into 4 virtual GPUs (each with 14 SMs). The GPUs on astro02 have full FP64 capability; they are smaller than those on the other machines and very useful for longer-running jobs that only require a smaller amount of GPU computing (see the device-selection sketch after this list). You can read more about their specs here https://www.nvidia.com/en-us/data-center/products/a30-gpu .
- 4 RTX A6000 GPUs available on astro04 have 48 GB of RAM per GPU and very low FP64 performance. They are therefore not well suited for scientific calculations, but they provide hardware-accelerated remote visualization and have machine-learning performance similar to the A100 GPUs. You can read more about their specs here https://www.nvidia.com/en-us/design-visualization/rtx-a6000 .
- 6 H100 GPUs available through the astro_gpu queue have 94 GB of memory each and are our newest GPUs. They have full FP64 performance and very high machine-learning performance, and are well suited for both scientific and machine-learning jobs. You can read more about their specs here https://www.nvidia.com/en-us/data-center/h100/ .
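On machines with several GPUs, or with the virtual GPU instances on astro02, it is good practice to restrict your process to the device you intend to use so that you do not block the others. One common way to do this is through the CUDA_VISIBLE_DEVICES environment variable; the sketch below assumes PyTorch, but the same variable is honoured by most CUDA-based frameworks, and the device index/UUID used here is only an example.

```python
# Selecting one specific GPU (or virtual GPU instance) -- a sketch.
# The available devices and their indices/UUIDs can be listed with `nvidia-smi -L`.
import os

# Must be set before the first CUDA call; it restricts the process to one device.
# "0" is just an example index -- on astro02 the virtual GPU instances have their
# own identifiers, and a MIG UUID from `nvidia-smi -L` can be used instead.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch  # imported after setting the variable

print(torch.cuda.device_count())      # should now report a single visible device
x = torch.ones(1024, device="cuda")   # all work goes to the selected device
print(x.sum().item())
```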
Global Storage
- The home directory (/groups/astro) is a fully backed-up Lustre filesystem. We have a shared 6 TB quota and individual quotas of 50 GB per user.
- The scratch directory (/lustre/astro) is a ZFS-based high-performance Lustre filesystem with dedicated hardware for our group. The total space (disregarding the transparent compression) is 1300 TB. The default quota on scratch is 5 TB; if you need more, please contact Troels Haugbølle with your supervisor / mentor / sponsor in CC and explain why and how much you need.
- The archive consists of two ZFS filesystems exported as NFS volumes from a storage server connected to the clusters with a 10 Gbit/s network connection. They can be found under /groups/astro/archive0 and /groups/astro/archive1. These filesystems are old and new users will not get directories on them; they will soon be decommissioned.
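To get a quick overview of how full these filesystems are, you can query them directly from Python, as in the sketch below. Note that this only reports filesystem-wide totals, not your personal quota; for the per-user Lustre quotas the `lfs quota` command is the appropriate tool.

```python
# Filesystem-level free-space check -- a sketch; the paths are the mount points listed above.
# This shows totals for the whole filesystem, not your personal or group quota.
import shutil

for path in ("/groups/astro", "/lustre/astro"):
    usage = shutil.disk_usage(path)
    print(f"{path}: {usage.used / 1e12:.1f} TB used of {usage.total / 1e12:.1f} TB, "
          f"{usage.free / 1e12:.1f} TB free")
```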
Scratch storage
The scratch disks on astro01, astro02, and astro04 are RAID0 volumes consisting of a number of locally mounted NVMe disks. They have slightly higher bandwidth than the global filesystem, but can only be accessed from the specific machine. The scratch disks have several orders of magnitude higher IOPS than the global filesystem, so random-access I/O, or operations that open and close a lot of files, perform much faster on the scratch disks. _Space is limited_. Please clean up after use, and remember there are no backups or redundancy on the scratch disks.
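The difference matters mostly for metadata-heavy work. A simple way to see it is to time the creation and re-reading of many small files on the local scratch disk and on /lustre/astro. The sketch below takes the target directory as a command-line argument; the exact scratch mount point depends on the machine, so it is deliberately left as an argument rather than hard-coded.

```python
# Small-file I/O timing -- a sketch; pass a directory on the local scratch disk or on
# /lustre/astro as the first argument and compare the timings between the two.
import sys
import time
import tempfile
from pathlib import Path

target = Path(sys.argv[1])
with tempfile.TemporaryDirectory(dir=target) as tmp:
    tmp = Path(tmp)
    t0 = time.perf_counter()
    for i in range(1000):                      # create many small files
        (tmp / f"f{i:04d}.dat").write_bytes(b"x" * 1024)
    t1 = time.perf_counter()
    for i in range(1000):                      # re-open and read them back
        (tmp / f"f{i:04d}.dat").read_bytes()
    t2 = time.perf_counter()
    print(f"create: {t1 - t0:.2f} s, read: {t2 - t1:.2f} s")
```

Run it as, e.g., python io_test.py /lustre/astro/<your directory>, and again with a directory on the local scratch disk of the machine you are on (the script name is just an example).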
Networks
- External connection: The local HPC center is a Tier-1 CERN node and has a direct dual 400 Gbit/s connection to the Danish entry point of the European GEANT network in Lyngby. In practice we easily reach 100 MB/s for transfers of larger files, and higher speeds are possible with parallel transfers.
- The backend storage servers for /groups/astro and /lustre/astro are all interconnected through a 100 Gbit/s HDR InfiniBand switch, which has uplinks to the different cluster networks.
- All frontend machines have Ethernet or InfiniBand adapters to provide optimal I/O bandwidth.
- The astro_XX nodes have FDR (56 Gbit/s) InfiniBand connected to a single switch.
- The astro2_XX nodes have EDR (100 Gbit/s) InfiniBand with a 2:1 blocking factor and 24 nodes per switch (3 uplink switches, 1 core switch).
- The astro3_XX nodes have two NDR-200 (200 Gbit/s) adapters, one per CPU socket, connected directly to a single 128-port NDR-200 switch.