Hardware

Tycho consists of frontends ("Analysis Hardware"), which are accessible from the outside and can be used for interactive work such as development and analysis, and compute nodes ("Cluster Hardware"), which are only accessible through the SLURM queue system.
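Access to the compute nodes goes through batch jobs submitted to SLURM from a frontend. The following is a minimal sketch of that workflow; the partition name astro2_XX mirrors the queue naming in the tables below but is only a placeholder, and ./my_simulation is a hypothetical executable.

  import subprocess
  import textwrap

  # Minimal batch script: the partition name is a placeholder and must be
  # replaced with an actual numbered queue; ./my_simulation is hypothetical.
  job_script = textwrap.dedent("""\
      #!/bin/bash
      #SBATCH --job-name=example
      #SBATCH --partition=astro2_XX
      #SBATCH --nodes=1
      #SBATCH --ntasks-per-node=48
      #SBATCH --time=01:00:00
      srun ./my_simulation
  """)

  with open("job.sh", "w") as fh:
      fh.write(job_script)

  # sbatch prints "Submitted batch job <id>" on success.
  result = subprocess.run(["sbatch", "job.sh"], capture_output=True, text=True, check=True)
  print(result.stdout.strip())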

Analysis Hardware
Name | CPUs | Memory | Memory Bandwidth | GPUs | Scratch | Notes
astro01.hpc.ku.dk | 2 x 24 cores Epyc Rome 7F72 @ 3.2 GHz | 1 TB DDR4-3200 MHz (21 GB / core) | 410 GB/s (8.5 GB/s / core) | 4x A100 | 11 TB | L2: 512 KB / core, L3: 192 MB / socket, AVX2, EDR 100 Gbit/s to storage
astro02.hpc.ku.dk | 1 x 64 cores Epyc Genoa 9554P @ 3.1 GHz | 768 GB DDR5-4800 MHz (12 GB / core) | 461 GB/s (7.2 GB/s / core) | 3x A30 | 28 TB | L2: 1 MB / core, L3: 256 MB, AVX-512, EDR 100 Gbit/s to storage
astro06.hpc.ku.dk | 2 x 14 cores Broadwell E5-2680 v4 @ 2.40 GHz | 512 GB DDR4-2400 MHz (20 GB / core) | - | - | - | L2: 512 KB / core, L3: 35 MB / socket, AVX2, QDR 40 Gbit/s to storage
astro07.hpc.ku.dk | 2 x 14 cores Broadwell E5-2680 v4 @ 2.40 GHz | 512 GB DDR4-2400 MHz (20 GB / core) | - | - | - | L2: 512 KB / core, L3: 35 MB / socket, AVX2, QDR 40 Gbit/s to storage
Cluster Hardware
Queue Name | #Nodes | CPUs | Memory | Memory Bandwidth | Notes
astro_XX | 16 | 2 x 10 cores Xeon E5-2680 v2 @ 2.8 GHz | 64 GB DDR3-1866 MHz (3.2 GB / core) | 120 GB/s (6 GB/s / core) | L2: 256 KB / core, L3: 25 MB / socket, AVX, FDR 56 Gbit/s
astro2_XX | 70 | 2 x 24 cores Xeon 6248R @ 3.0 GHz | 192 GB DDR4-2933 MHz (4 GB / core) | 282 GB/s (5.9 GB/s / core) | L2: 1 MB / core, L3: 35.75 MB / socket, AVX-512, EDR 100 Gbit/s
astro3_XX | 50 | 2 x 64 cores Epyc Genoa 9554 @ 3.1 GHz | 768 GB DDR5-4800 MHz (6 GB / core) | 922 GB/s (7.2 GB/s / core) | L2: 1 MB / core, L3: 256 MB / socket, AVX-512, 2 x NDR 200 Gbit/s
astro2_gpu | 1 | 2 x 16 cores Epyc Rome 7302 @ 3.0 GHz | 1 TB DDR4-3200 MHz (32 GB / core) | 410 GB/s (12.8 GB/s / core) | 4x A100 GPUs, L2: 512 KB / core, L3: 192 MB / socket, AVX2, EDR 100 Gbit/s
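The per-core memory and bandwidth figures in both tables are simply the node totals divided by the number of cores; a short sketch reproducing the astro3_XX numbers as an example:

  # Per-core figures are node totals divided by the core count.
  # Example values: an astro3_XX node (2 x 64 cores, 768 GB, 922 GB/s).
  sockets, cores_per_socket = 2, 64
  memory_gb, bandwidth_gb_s = 768, 922

  cores = sockets * cores_per_socket
  print(f"{memory_gb / cores:.0f} GB / core")        # -> 6 GB / core
  print(f"{bandwidth_gb_s / cores:.1f} GB/s / core") # -> 7.2 GB/s / core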

Servers

The astro_XX nodes are based on the Dell C6220 II shoe-box design with dual 10-core Ivy Bridge CPUs. The astro2_XX nodes are Huawei FusionServer Pro X6000 nodes with dual 24-core Cascade Lake CPUs. The astro3_XX nodes are xFusion 1258H V7 servers with dual 64-core Genoa CPUs.

Global Storage

  • The home directory (/groups/astro) is a fully backed-up Lustre filesystem. We have a shared 6 TB group quota and individual quotas of 50 GB per user.
  • The scratch directory (/lustre/astro) is a ZFS-based, high-performance Lustre filesystem with dedicated hardware for our group. The total space (disregarding the transparent compression) is 1300 TB. The default quota on scratch is 5 TB; if you need more, please contact Troels Haugbølle with your supervisor / mentor / sponsor in CC and explain why and how much you need. Current usage can be checked as shown in the sketch after this list.
  • The archive consists of two ZFS filesystems exported as NFS volumes from a storage server connected to the clusters through a 10 Gbit/s network connection. The archive systems are mounted under /groups/astro/archive0 and /groups/astro/archive1. These filesystems are old, new users will not get directories on them, and they will soon be decommissioned.
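Quota usage on the Lustre filesystems can be inspected with the standard Lustre client command lfs quota; a minimal sketch, assuming lfs is available on the frontends:

  import getpass
  import subprocess

  # "lfs quota -u <user> <mountpoint>" reports usage and limits for a user.
  user = getpass.getuser()
  for mountpoint in ("/groups/astro", "/lustre/astro"):
      print(f"--- quota for {user} on {mountpoint} ---")
      subprocess.run(["lfs", "quota", "-u", user, mountpoint], check=False)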

Scratch storage

The scratch disks on astro01 and astro02 are RAID0 volumes built from locally mounted NVMe disks. They have slightly higher bandwidth than the global filesystem but can only be accessed from the specific machine. They also have several orders of magnitude higher IOPS than the global filesystem, so random-access I/O and operations that open and close many files run much faster on the scratch disks. Space is limited: please clean up after use, and remember that there are no backups and no redundancy on the scratch disks.
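A minimal sketch of the intended workflow: do the metadata-heavy work on the local scratch, copy the results you want to keep back to the global filesystem, and clean up. The mount point /scratch and the destination directory below are placeholders, not the actual paths on astro01/astro02.

  import shutil
  import tempfile
  from pathlib import Path

  # Placeholder destination on the global filesystem.
  lustre_results = Path("/lustre/astro/myuser/results")

  # TemporaryDirectory removes the scratch working directory on exit,
  # which takes care of cleaning up after use.
  with tempfile.TemporaryDirectory(dir="/scratch") as workdir:
      work = Path(workdir)
      # Many small files: fast on local NVMe, slow on the global filesystem.
      for i in range(10_000):
          (work / f"chunk_{i:05d}.dat").write_bytes(b"\0" * 1024)
      # Copy only what should survive back to the global filesystem.
      lustre_results.mkdir(parents=True, exist_ok=True)
      shutil.copy(work / "chunk_00000.dat", lustre_results)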

Networks

  • External connection: The local HPC center is a CERN Tier-1 node and has a direct dual 400 Gbit/s connection to the Danish entry point of the European GEANT network in Lyngby. In practice we easily reach 100 MB/s for transfers of larger files, with higher speeds possible through parallel transfers (see the sketch after this list).
  • The backend storage servers for /groups/astro and /lustre/astro are all interconnected with 100 Gbit/s HDR InfiniBand through a common switch, which has uplinks to the different cluster networks.
  • All frontend machines have InfiniBand adapters to provide optimal bandwidth for I/O.
  • Astro_XX nodes have FDR (56 Gbit/s) InfiniBand connected to a single switch.
  • Astro2_XX nodes have EDR (100 Gbit/s) with a 2:1 blocking factor and 24 nodes per switch (3 uplink switches, 1 core switch).
  • Astro3_XX nodes have two NDR-200 (200 Gbit/s) adapters with one adapter per CPU socket. They are connected directly to a single 128-port NDR-200 switch.
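A minimal sketch of the parallel-transfer approach mentioned under External connection: several concurrent rsync streams instead of one. The remote host and the source directory are placeholders.

  import subprocess
  from concurrent.futures import ThreadPoolExecutor
  from pathlib import Path

  # Placeholders: remote destination and local source directory.
  remote = "user@remote.example.org:/data/incoming/"
  files = sorted(Path("/lustre/astro/myuser/output").glob("snapshot_*.dat"))

  def push(path):
      # One rsync process per file; -a preserves timestamps and permissions.
      return subprocess.run(["rsync", "-a", str(path), remote], check=True)

  # A handful of concurrent streams usually beats a single transfer.
  with ThreadPoolExecutor(max_workers=4) as pool:
      list(pool.map(push, files))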