POWER10

POWER10
General Info
Launched	2020
Designed by	IBM, OpenPower partners
Common manufacturer(s)	Samsung
Performance
Max. CPU clock rate	3.5+ GHz to 4+ GHz
Cache
L1 cache	48+32 KB per core
L2 cache	2 MB per core
L3 cache	120 MB per chip
Architecture and classification
Min. feature size	7 nm
Microarchitecture	P10
Instruction set	Power ISA (Power ISA v.3.1)
Physical specifications
Cores	15 SMT8 cores
Package(s)	OLGA SCM and DCM
Socket(s)	1-16
History
Predecessor	POWER9
Successor	POWER11

POWER10 is a superscalar, multithreaded, symmetric multiprocessing processor based on the Power ISA, announced in August 2020 at the Hot Chips conference. The processor is designed with 16 cores, but only 15 are available due to yield issues. POWER10-based processors are manufactured by Samsung on a 7 nm process with 18 metal layers. The 602 mm² silicon die holds 18 billion transistors.[1][2][3][4]

The main features of POWER10 are improved performance per watt, a better memory and I/O architecture, and a focus on artificial intelligence (AI) workloads.[5] Performance per watt is addressed mainly by Samsung's 7 nm fabrication process. Better I/O and memory are handled by the PowerAXON facilities, which manage communication with other chips and systems; by the Open Memory Interface (OMI) memory technology, which scales from core caches through RAM all the way to 2 PB of unified clustered memory shared across multiple cluster nodes; and by support for PCIe 5. AI workloads benefit from many new SIMD features and from new data types such as bfloat16 and INT4.
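To make the bfloat16 data type mentioned above concrete, here is a minimal Python sketch of the format itself: bfloat16 keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of an IEEE 754 single-precision value. The function names are hypothetical, and the sketch truncates for simplicity; it says nothing about how POWER10 rounds or implements the conversion in hardware.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (simple truncation, an assumption)."""
    # Encode x as a big-endian float32 and read it back as a 32-bit integer.
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    # Keep the upper 16 bits: sign, 8 exponent bits, top 7 mantissa bits.
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float."""
    # Pad the lower 16 mantissa bits with zeros and decode as float32.
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


def roundtrip(x: float) -> float:
    """Show the precision lost by storing x as bfloat16."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(x))


if __name__ == "__main__":
    print(hex(float32_to_bfloat16_bits(1.0)))  # 0x3f80
    print(roundtrip(3.14159))                  # 3.140625
```

The format keeps the full float32 exponent range while giving up mantissa precision, which is why a value such as 3.14159 comes back as 3.140625.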

Systems with POWER10 are intended to reach customers in the fourth quarter of 2021.

Design

Each POWER10 core has doubled most functional units compared to its predecessor, POWER9. The core is eight-way multithreaded (SMT8) and has a 48 KB instruction and a 32 KB data L1 cache, a 2 MB L2 cache and a very large translation lookaside buffer (TLB) with 4096 entries.[3] Latency, in cycles, to the different cache levels and the TLB has been reduced significantly. Each core has eight execution slices, each with one FPU, ALU, branch predictor, load–store unit and SIMD engine, and each able to be fed 128-bit (64+64) instructions from the new prefix/fuse instructions of Power ISA v.3.1. Each execution slice can handle 20 instructions, backed by a shared 512-entry instruction table, and feeds a 128-entry (64 single-threaded) load queue and an 80-entry (40 single-threaded) store queue. Better branch prediction features have doubled the accuracy. Each core has four matrix math assist (MMA) engines for better handling of SIMD code, especially matrix multiplication instructions, where AI inference workloads see a 20-fold performance increase.[6]
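As an illustration of the kind of arithmetic a matrix multiplication engine accelerates, the sketch below computes a matrix product as a sum of rank-1 (outer-product) updates into an accumulator. This formulation is an assumption chosen for illustration; the function name is hypothetical and the code does not describe the actual MMA instruction semantics.

```python
def matmul_outer(a, b):
    """Multiply a (n x k) by b (k x m) by accumulating k outer products."""
    n, k, m = len(a), len(b), len(b[0])
    # The accumulator plays the role of the running result tile.
    acc = [[0.0] * m for _ in range(n)]
    for p in range(k):
        # Rank-1 update: column p of a times row p of b.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a[i][p] * b[p][j]
    return acc


if __name__ == "__main__":
    print(matmul_outer([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # [[19.0, 22.0], [43.0, 50.0]]
```

Each pass over `p` touches every accumulator element once, so the accumulator can stay in place while operands stream through, which is the property that makes this formulation attractive for dedicated hardware.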

The whole processor has two "hemispheres" of eight cores each, with each hemisphere sharing a 64 MB L3 cache, for a total of 16 cores and 128 MB of L3 cache. Due to yield issues, at least one core is always disabled, which also reduces the L3 cache by 8 MB, leaving a usable total of 15 cores and 120 MB of L3 cache. Each chip also has eight crypto accelerators that offload common algorithms such as AES and SHA-3.
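The core and cache totals above follow from simple arithmetic, sketched here in Python. The constants come from the text; the function name and the assumption that every disabled core removes an equal 8 MB share of L3 are illustrative only.

```python
HEMISPHERES = 2
CORES_PER_HEMISPHERE = 8
L3_PER_HEMISPHERE_MB = 64


def chip_totals(disabled_cores: int = 1):
    """Return (usable cores, usable L3 in MB) for one chip."""
    designed_cores = HEMISPHERES * CORES_PER_HEMISPHERE   # 16
    designed_l3_mb = HEMISPHERES * L3_PER_HEMISPHERE_MB   # 128
    l3_per_core_mb = designed_l3_mb // designed_cores     # 8
    usable_cores = designed_cores - disabled_cores
    # Assumption: each disabled core takes its 8 MB L3 share with it.
    usable_l3_mb = designed_l3_mb - disabled_cores * l3_per_core_mb
    return usable_cores, usable_l3_mb


if __name__ == "__main__":
    print(chip_totals())   # (15, 120)
    print(chip_totals(0))  # (16, 128)
```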

Increased clock gating and a microarchitecture reworked at every stage, together with fuse/prefix instructions that do more work with fewer work units, and smarter caches with lower memory latencies and effective-address tagging that reduces cache misses, enable the POWER10 core to consume half the power of a POWER9 core. Combined with improvements of up to 30% in the compute facilities, this makes the whole processor perform 2.6× better per watt than its predecessor. With two chips mounted on the same module, it is up to 3 times as fast in the same power budget.
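The 2.6× figure is consistent with combining the two numbers given above, as this small Python sketch shows. Treating the gains as a simple ratio of relative performance to relative power is an assumption about how the figure was derived, and the function name is hypothetical.

```python
def perf_per_watt_gain(relative_performance: float, relative_power: float) -> float:
    """Ratio of new to old performance per watt, both inputs relative to POWER9 = 1.0."""
    return relative_performance / relative_power


if __name__ == "__main__":
    # Up to 30% more compute at half the power, per the figures in the text.
    print(perf_per_watt_gain(1.3, 0.5))  # 2.6
```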

As each core can act as eight logical processors, the 15-core processor looks like 120 cores to the operating system. On a dual chip module, that becomes 240 simultaneous threads per socket.

I/O

The chip has completely reworked memory and I/O architectures. The Open Memory Interface (OMI) enables extremely low-latency and high-bandwidth RAM. Using serial memory communication to off-chip controllers reduces the signaling lanes to and from the chip, increases the bandwidth and makes the processor agnostic to the memory technology at the other end, making the system flexible and future-proof.[4]

The RAM can be anything from DDR3 through DDR5 to GDDR and HBM, or persistent storage memory, depending on what is practical for the application.

DDR4 - Support for up to 4 TB RAM, 410 GB/s, 10 ns latency
GDDR6 - Up to 800 GB/s
Persistent storage - Up to 2 PB

POWER10 can encrypt data with no performance penalty at every stage, from RAM, across accelerators and cluster nodes, to data at rest.

POWER10 comes with the PowerAXON facility, which provides chip-to-chip and system-to-system links and an OpenCAPI bus for accelerators, I/O and other high-performance cache-coherent peripherals. It manages communication between nodes in a 16-socket SCM cluster or a 4-socket DCM cluster. It also manages the memory semantics for clustering systems, enabling load/store access from the core to up to 2 PB of RAM across the entire POWER10 cluster. IBM calls this feature Memory Inception.

Both OMI and PowerAXON can handle 1 TB/s communications off the chip.

POWER10 also supports PCIe 5. The SCM has 32 and the DCM has 64 PCIe 5 lanes. IBM and Nvidia agreed that including NVLink in POWER10 would be somewhat redundant, since PCIe 5 is fast enough for attaching many GPUs, so it is not included.[3] On-chip support for NVLink was previously a unique selling point of POWER8 and POWER9.
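For a rough sense of what those lane counts mean, the Python sketch below estimates raw PCIe 5 link bandwidth per direction. The 32 GT/s per-lane rate and 128b/130b encoding come from the PCIe 5.0 specification, not from this article; the function name is hypothetical, and packet and protocol overhead are ignored, so real throughput is lower.

```python
PCIE5_TRANSFERS_PER_S = 32e9       # 32 GT/s per lane (PCIe 5.0)
ENCODING_EFFICIENCY = 128 / 130    # 128b/130b line encoding


def pcie5_gbytes_per_s(lanes: int) -> float:
    """Raw one-direction bandwidth in GB/s for a given lane count."""
    bits_per_s = lanes * PCIE5_TRANSFERS_PER_S * ENCODING_EFFICIENCY
    return bits_per_s / 8 / 1e9


if __name__ == "__main__":
    for name, lanes in (("SCM", 32), ("DCM", 64)):
        print(f"{name}: x{lanes} lanes, about {pcie5_gbytes_per_s(lanes):.0f} GB/s per direction")
```

Under these assumptions the SCM's 32 lanes come to roughly 126 GB/s per direction and the DCM's 64 lanes to roughly 252 GB/s.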

Modules

The POWER10 comes in two plastic land grid array packages: a single chip module (SCM) and a dual chip module (DCM).

SCM — 4+ GHz, up to 15 SMT8 cores. Can be clustered up to 16 sockets. x32 PCIe 5 lanes.
DCM — 3.5+ GHz, up to 30 SMT8 cores. Can be clustered up to four sockets. x64 PCIe 5 lanes. The DCM is in the same thermal range as previous offerings.
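The module figures above can be combined into per-socket and maximum per-system thread counts, as in this Python sketch. The class and attribute names are hypothetical; the numbers are those given in the text, assuming every chip has 15 usable SMT8 cores.

```python
from dataclasses import dataclass

THREADS_PER_CORE = 8        # SMT8
USABLE_CORES_PER_CHIP = 15  # one of 16 cores disabled


@dataclass(frozen=True)
class Module:
    name: str
    chips: int        # chips per socket
    max_sockets: int  # largest cluster size given in the text

    def threads_per_socket(self) -> int:
        return self.chips * USABLE_CORES_PER_CHIP * THREADS_PER_CORE

    def max_system_threads(self) -> int:
        return self.max_sockets * self.threads_per_socket()


SCM = Module("SCM", chips=1, max_sockets=16)
DCM = Module("DCM", chips=2, max_sockets=4)

if __name__ == "__main__":
    for module in (SCM, DCM):
        print(module.name, module.threads_per_socket(), module.max_system_threads())
    # SCM 120 1920
    # DCM 240 960
```

The DCM doubles the threads per socket, but because it clusters to only four sockets, the largest SCM configuration ends up with twice as many hardware threads overall.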

Operating system support

Linux, version 5.9[7]
PowerVM with nested KVM
AIX[8]
IBM i[8]

References

[1] Cutress, Ian (2020-08-17). "Hot Chips 2020 Live Blog: IBM's POWER10 Processor on Samsung 7nm". AnandTech.
[2] Quach, Katyanna (2020-08-17). "IBM takes Power10 processors down to 7nm with Samsung, due to ship by end of 2021". The Register.
[3] Schilling, Andreas (2020-08-17). "IBM Power10 offers 30 cores with SMT8, PCIe 5.0 and DDR5". Hardware LUXX (in German).
[4] Kennedy, Patrick (2020-08-17). "IBM POWER10 Searching for the Holy Grail of Compute". ServeTheHome.
[5] "IBM Reveals Next-Generation IBM POWER10 Processor". IBM. 2020-08-17.
[6] Russell, John (2020-08-17). "IBM Debuts Power10; Touts New Memory Scheme, Security, and Inferencing". HPCwire.
[7] Larabel, Michael (2020-08-09). "Linux 5.9 Brings More IBM POWER10 Support, New/Faster SCV System Call ABI". Phoronix.
[8] Prickett Morgan, Timothy (2019-08-06). "Talking High Bandwidth with IBM's POWER10 Architect". The Next Platform.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

