Nvidia DGX

Nvidia DGX is a line of servers and workstations produced by Nvidia that specialize in using GPGPU to accelerate deep learning applications.

DGX-1

DGX-1 servers feature eight GPUs on Pascal- or Volta-based daughter cards[1] with HBM2 memory, connected by an NVLink mesh network.[2]

The product line is intended to bridge the gap between GPUs and AI accelerators: the device has specific features that specialize it for deep learning workloads.[3] The initial Pascal-based DGX-1 delivered 170 teraflops of half-precision processing,[4] while the Volta-based upgrade increased this to 960 teraflops.[5]

DGX-2

The successor to the Nvidia DGX-1 is the Nvidia DGX-2, which uses sixteen 32 GB V100 (second-generation) cards in a single unit. This raises performance to up to 2 petaflops, provides 512 GB of shared memory for tackling larger problems, and uses NVSwitch to speed up internal communication.

There is also a higher-performance version of the DGX-2, the DGX-2H. A notable difference is the replacement of the dual Intel Xeon Platinum 8168 CPUs at 2.7 GHz with dual Intel Xeon Platinum 8174 CPUs at 3.1 GHz.[6]
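The DGX-2 totals above are consistent with simply multiplying per-card figures by the card count. The following is a minimal sketch of that arithmetic, assuming the 125 TFLOPS FP16 Tensor figure listed for the V100 in the accelerator comparison and ideal scaling with no overhead; the function and variable names are illustrative only.

```python
# Figures taken from this article: 16 V100 cards per DGX-2, 32 GB each,
# and 125 TFLOPS of FP16 Tensor throughput per V100 (accelerator table).
NUM_GPUS = 16
MEMORY_PER_GPU_GB = 32
FP16_TENSOR_TFLOPS_PER_GPU = 125.0


def aggregate(num_gpus: int, per_gpu: float) -> float:
    """Ideal aggregate: per-GPU figure times GPU count (no overhead modelled)."""
    return num_gpus * per_gpu


total_memory_gb = aggregate(NUM_GPUS, MEMORY_PER_GPU_GB)
# Convert TFLOPS to PFLOPS (1 PFLOPS = 1000 TFLOPS).
total_pflops = aggregate(NUM_GPUS, FP16_TENSOR_TFLOPS_PER_GPU) / 1000.0

print(f"GPU memory: {total_memory_gb:.0f} GB")
print(f"Peak FP16 Tensor throughput: {total_pflops:.1f} PFLOPS")
```

These are peak figures; sustained throughput on real workloads will generally be lower.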

DGX A100

The third generation of DGX server, the DGX A100, was announced and released on May 14, 2020. It includes eight Ampere-based A100 accelerators.[7] Also included are 15 TB of PCIe Gen 4 NVMe storage,[8] two 64-core AMD Rome 7742 CPUs, 1 TB of RAM, and a Mellanox-powered HDR InfiniBand interconnect. The initial price for the DGX A100 was $199,000.[7]

Accelerators

Comparison of accelerators used in DGX:[7]

Accelerator	Architecture	FP32 CUDA Cores	Boost Clock	Memory Clock	Memory Bus Width	Memory Bandwidth	VRAM	Single Precision	Double Precision	INT8 Tensor	FP16 Tensor	TF32 Tensor	Interconnect	GPU	GPU Die Size	Transistor Count	TDP	Manufacturing Process
A100	Ampere	6912	~1410MHz	2.4Gbps HBM2	5120-bit	1.6TB/sec	40GB	19.5 TFLOPs	9.7 TFLOPs	624 TOPs	312 TFLOPs	156 TFLOPs	600GB/sec	GA100	826mm²	54.2B	400W	TSMC 7N
V100	Volta	5120	1530MHz	1.75Gbps HBM2	4096-bit	900GB/sec	16GB/32GB	15.7 TFLOPs	7.8 TFLOPs	N/A	125 TFLOPs	N/A	300GB/sec	GV100	815mm²	21.1B	300W/350W	TSMC 12nm FFN
P100	Pascal	3584	1480MHz	1.4Gbps HBM2	4096-bit	720GB/sec	16GB	10.6 TFLOPs	5.3 TFLOPs	N/A	N/A	N/A	160GB/sec	GP100	610mm²	15.3B	300W	TSMC 16nm FinFET
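
The memory bandwidth figures in the table appear to follow from the memory clock and bus width columns: the per-pin data rate multiplied by the bus width, divided by 8 to convert bits to bytes. The sketch below applies that relation to the three accelerators, assuming the listed memory clock is the per-pin data rate in Gbit/s; the function name and dictionary are illustrative only.

```python
def hbm2_bandwidth_gb_per_s(pin_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate (Gbit/s) * bus width (bits) / 8."""
    return pin_rate_gbps * bus_width_bits / 8


# (per-pin data rate in Gbit/s, bus width in bits), as listed in the table.
ACCELERATORS = {
    "A100": (2.4, 5120),
    "V100": (1.75, 4096),
    "P100": (1.4, 4096),
}

for name, (pin_rate, bus_width) in ACCELERATORS.items():
    bandwidth = hbm2_bandwidth_gb_per_s(pin_rate, bus_width)
    print(f"{name}: {bandwidth:.0f} GB/s")
```

The results come out at roughly 1536, 896 and 717 GB/s, which suggests the table's 1.6TB/sec, 900GB/sec and 720GB/sec entries are rounded values.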

References

[1] "nvidia dgx-1" (PDF).
[2] "inside pascal". Eight GPU hybrid cube mesh architecture with NVLink
[3] "deep learning supercomputer".
[4] "DGX-1 deep learning system" (PDF). NVIDIA DGX-1 Delivers 75X Faster Training...Note: Caffe benchmark with AlexNet, training 1.28M images with 90 epochs
[5] "DGX Server". DGX Server. Nvidia. Retrieved 7 September 2017.
[6] https://docs.nvidia.com/dgx/pdf/dgx2-user-guide.pdf
[7] Ryan Smith (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
[8] Tom Warren; James Vincent (May 14, 2020). "Nvidia's first Ampere GPU is designed for data centers and AI, not your PC". The Verge.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
