To respond quickly to our collaborators' modeling requests, we develop and use simulation codes that run on parallel architectures.
The framework is organized as follows.
Methods
Community code development
We centralize all code development on Git platforms such as Bitbucket.org.
Development resources
Shared memory computers (< 1,000,000 core-hours per year)
Workstation | CPU | GPU | RAM | HDD | Equiv. core-hours per year (CPU + GPU) |
---|---|---|---|---|---|
Office 201, station 1 | 2×8 cores, Intel Xeon E5-2650 v2 @ 2.6 GHz, TDP 95 W | Nvidia 96 CUDA cores @ 700 MHz, 2 GB, CUDA capability 2.1 | 32 GB | 886 GB + 466 GB | 140,000 + 840,960 |
Office 206, station 1 | 2×6 cores, Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz (TDP 80 W) | Radeon Tahiti PRO HD 7950/8950 OEM / R9 280 | 16 GB | 884 GB | 105,000 |
Office 201, station 2 | 2×12 cores, Intel Xeon E5-2650 v4 @ 2.2 GHz | Nvidia 640 CUDA cores, 4 GB, CUDA capability 6.1 | 64 GB | | 210,240 + 5,606,400 |
Luzicka | 2×4 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz | Nvidia ~2304 CUDA cores, 8 GB, CUDA v11.1 | 8 GB | >4 TB | 70,080 + 20,183,040 |
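The "Equiv. core-hours per year" column appears to be the number of cores multiplied by the 8,760 hours in a year; this formula is inferred from the table values and is not stated explicitly. A minimal Python sketch under that assumption (the function name `equiv_core_hours` and the dictionary are illustrative only):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years and downtime are ignored


def equiv_core_hours(cores: int) -> int:
    """Core-hours delivered in one year if `cores` cores run continuously."""
    return cores * HOURS_PER_YEAR


# (CPU cores, GPU CUDA cores) as listed in the table above
workstations = {
    "Office 201, station 1": (2 * 8, 96),
    "Office 201, station 2": (2 * 12, 640),
    "Luzicka": (2 * 4, 2304),
}

for name, (cpu_cores, gpu_cores) in workstations.items():
    cpu_hours = equiv_core_hours(cpu_cores)
    gpu_hours = equiv_core_hours(gpu_cores)
    print(f"{name}: {cpu_hours:,} + {gpu_hours:,}")
```

Under this assumption the GPU figures match the table exactly. The CPU figures for the two older stations (140,000 and 105,000) look like rounded versions of 140,160 and 105,120.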
Production resources
- BIATRI-SRV (Intel)
- IT4I.cz/Barbora (Intel): the real-time job queue of the Barbora cluster (IT4I.cz) is available here, together with an estimate of the remaining simulation time.
- IT4I.cz/Karolina (AMD).
- PRACE.eu/Navigator (Intel, Portugal).
Performance currently reached with Octopus
We try to make efficient use of the computing time allocated to us. In particular, we prepare a large number of time-dependent (TD) runs with the Octopus software.
Here we keep brief notes on the performance reached on various machines.
Example of silicon primitive cell
Machine | Octopus version | Compiler & parameters | Time per SCF step, CPU | Time per SCF step, GPU | Time per TD step, CPU | Time per TD step, GPU |
---|---|---|---|---|---|---|
IBM Power9 (1 node + 1xGPU 16 GB) | Octopus 10.4 (distrib. binary) | GCC/GFortran | 1539 s | 500 s (single-kgrid, StatePack=no) | 1860 s | 115 s (StatePack=no) |
IBM Power9 (1 node + 4xGPU 16 GB) | Octopus 10.4 (distrib. binary) | GCC/GFortran | – | 36 s (single-kgrid, StatePack=no) | – | 22 s (single-kgrid, StatePack=no) |
IBM Power9 (1 node + 4xGPU 16 GB) | Octopus 10.4 (distrib. binary) | GCC/GFortran | 113 s (single-kgrid, StatePack=yes) | – | 22 s (single-kgrid, StatePack=yes) | |
IBM Power9 (1 node + 4xGPU 16 GB) | Octopus 10.4 (distrib. binary) | GCC/GFortran | 139 s (4x kgrid, StatePack=no) | – | xx s (4x kgrid, StatePack=no) | |
MPCDF Draco (4 nodes) | Octopus (version of 2019 09 11) | Intel compilers | – | 2.5 s | – | |
IT4I Salomon (4 nodes) | Octopus (version of 2019 06 10) | Intel compilers | – | 5.0 s | – | |
IT4I Barbora (8 nodes) | Octopus (version of 2019 06 10) | Intel compilers | 1.91 s | – | xx s | – |
PRACE Prometheus (Poland) (4 nodes) | Octopus (version of 2019 10 16) | Intel compilers | – | 5.0 s | – | |
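As a rough reading of the table above, the GPU speedup can be taken as the ratio of CPU time to GPU time per step. The sketch below uses only the first row (IBM Power9, 1 node + 1 GPU), where both timings are given. The helper `speedup` is illustrative, and the comparison assumes both runs used the same input apart from the GPU settings noted in the table.

```python
def speedup(cpu_seconds: float, gpu_seconds: float) -> float:
    """Ratio of CPU wall time to GPU wall time for one step."""
    return cpu_seconds / gpu_seconds


# Timings from the IBM Power9 (1 node + 1 GPU) row
scf_speedup = speedup(1539, 500)  # time per SCF step
td_speedup = speedup(1860, 115)   # time per TD step

print(f"SCF step: {scf_speedup:.1f}x, TD step: {td_speedup:.1f}x")
```

For this row the gain is about 3x per SCF step and about 16x per TD step, so the TD propagation benefits far more from the GPU than the SCF cycle does.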
Example of silica primitive cell
- Grid spacing dx = 0.22 Bohr. k-point grid: 8³. RAM: 22 GB. Duration per SCF cycle on Barbora: < 72 s.