Manycore processor

Manycore processors are specialist multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores (from a few tens of cores to thousands or more). Manycore processors are used extensively in embedded computers and high-performance computing.

Contrast with multicore architecture

Manycore processors are distinct from multi-core processors in being optimized from the outset for a higher degree of explicit parallelism, and for higher throughput (or lower power consumption) at the expense of latency and single-thread performance.

Processors in the broader multi-core category, by contrast, are usually designed to run both parallel and serial code efficiently, and therefore place more emphasis on high single-thread performance (e.g. devoting more silicon to out-of-order execution, deeper pipelines, more superscalar execution units, and larger, more general caches) and on shared memory. These techniques devote runtime resources to discovering implicit parallelism in a single thread. Such processors are used in systems that have evolved continuously (with backward compatibility) from single-core processors. They usually have a 'few' cores (e.g. 2, 4, 8), and may be complemented by a manycore accelerator (such as a GPU) in a heterogeneous system.

Motivation

Cache coherency is an issue limiting the scaling of multicore processors. Manycore processors may bypass this with methods such as message passing,[1] scratchpad memory, DMA,[2] partitioned global address space,[3] or read-only/non-coherent caches. A manycore processor using a network on a chip and local memories gives software the opportunity to explicitly optimise the spatial layout of tasks (e.g. as seen in tooling developed for TrueNorth).[4]
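
The idea of optimising the spatial layout of tasks can be sketched with a toy cost model. The example below is only an illustration under stated assumptions: cores sit on a small 2D mesh, the cost of a message is taken to be proportional to the Manhattan hop count between sender and receiver, and the names `hops`, `traffic_cost` and `best_placement` are hypothetical. It is not the tooling used for TrueNorth or any other real chip.

```python
from itertools import permutations


def hops(a, b):
    """Manhattan distance between two (row, col) mesh coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def traffic_cost(placement, traffic):
    """Total hop-weighted traffic.

    placement maps task name -> (row, col) core coordinate.
    traffic maps (source task, destination task) -> number of messages.
    """
    return sum(count * hops(placement[src], placement[dst])
               for (src, dst), count in traffic.items())


def best_placement(tasks, cores, traffic):
    """Exhaustive search over placements; only feasible for tiny examples."""
    best = min(permutations(cores, len(tasks)),
               key=lambda p: traffic_cost(dict(zip(tasks, p)), traffic))
    return dict(zip(tasks, best))


# A four-stage pipeline a -> b -> c -> d on an assumed 2x2 mesh.
tasks = ["a", "b", "c", "d"]
cores = [(0, 0), (0, 1), (1, 0), (1, 1)]
traffic = {("a", "b"): 10, ("b", "c"): 10, ("c", "d"): 10}

naive = dict(zip(tasks, cores))                 # row-major placement
tuned = best_placement(tasks, cores, traffic)   # pipeline folded around the mesh
```

In this toy model the row-major placement puts stages b and c on diagonally opposite cores, so their messages take two hops, whereas the searched placement keeps every pair of neighbouring stages on adjacent cores.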

Manycore processors may have more in common (conceptually) with technologies originating in high performance computing such as clusters and vector processors.[5]

GPUs may be considered a form of manycore processor: they have multiple shader processing units and are suitable only for highly parallel code (high throughput, but extremely poor single-thread performance).

Suitable programming models

Message Passing Interface
OpenCL[6] or other APIs supporting compute kernels
Partitioned global address space
Actor model
OpenMP[7]
Dataflow
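
As a rough illustration of the message-passing style that several of the models above share, the following sketch simulates cores that work only on private data and communicate by sending explicit messages. It rests on loud assumptions: Python threads and a `queue.Queue` stand in for hardware cores and an on-chip network, the function names are hypothetical, and the code demonstrates the structure of the model rather than any real speed-up.

```python
import queue
import threading


def core(core_id, chunk, outbox):
    """One simulated core: compute on private data, then send one message."""
    # 'chunk' stands in for data already copied into this core's local memory.
    local_sum = sum(x * x for x in chunk)
    # The only communication is an explicit message; no shared variable is written.
    outbox.put((core_id, local_sum))


def sum_of_squares(data, num_cores=4):
    """Sum of squares computed by num_cores simulated message-passing cores."""
    outbox = queue.Queue()
    # Slicing copies the data, so each core owns a disjoint private chunk.
    chunks = [data[i::num_cores] for i in range(num_cores)]
    workers = [threading.Thread(target=core, args=(i, chunks[i], outbox))
               for i in range(num_cores)]
    for w in workers:
        w.start()
    total = 0
    # The host receives exactly one message per core and reduces them.
    for _ in range(num_cores):
        _sender, partial = outbox.get()
        total += partial
    for w in workers:
        w.join()
    return total
```

For example, `sum_of_squares(list(range(10)))` returns 285. Because no core ever reads another core's memory, the pattern does not depend on hardware cache coherency.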

Classes of manycore systems

GPUs, which can be described as manycore vector processors
Massively parallel processor array
Asynchronous array of simple processors

Specific manycore architectures

ZettaScaler, Japanese PEZY Computing 2048-core modules
Xeon Phi coprocessor,[8] which has MIC (Many Integrated Cores) architecture
Tilera
Adapteva Epiphany Architecture, a manycore chip using PGAS scratchpad memory
Coherent Logix hx3100 Processor, a 100-core DSP/GPP processor based on HyperX Architecture
Movidius Myriad 2, a manycore vision processing unit
Kalray, a manycore PCIe accelerator for data-intensive tasks
Teraflops Research Chip, a manycore processor using message passing
TrueNorth, an AI accelerator with a manycore network on a chip architecture
Green arrays, a manycore processor using message passing aimed at low power applications
Eyeriss, a manycore processor designed for running convolutional neural nets for embedded vision applications[9]

Specific manycore computers with 1M+ CPU cores

A number of computers built from multicore processors have one million or more individual CPU cores. Examples include:

Sunway TaihuLight, a massively parallel Chinese supercomputer (10M CPU cores) using a custom manycore architecture. Once one of the fastest supercomputers in the world, it was ranked third on the TOP500 list as of November 2018. It obtains its performance from 40,960 SW26010 manycore processors, each containing 256 cores.
Gyoukou (Japanese: 暁光 Hepburn: gyōkō, dawn light), a supercomputer developed by ExaScaler and PEZY Computing.
SpiNNaker, a massively parallel (1M CPU cores) manycore processor built as part of the Human Brain Project
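
The "10M" figure given for Sunway TaihuLight follows directly from the processor and per-processor core counts quoted in its entry. A minimal check of that arithmetic, with a hypothetical helper name:

```python
def total_cores(processors, cores_per_processor):
    """Total core count of a machine built from identical manycore processors."""
    return processors * cores_per_processor


# Figures quoted in the Sunway TaihuLight entry.
taihulight_cores = total_cores(40_960, 256)   # 10,485,760, i.e. roughly 10 million
```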

References

[1] Mattson, Tim (January 2010). "The Future of Many Core Computing: A tale of two processors" (PDF).
[2] Hendry, Gilbert; Kretschmann, Mark. "IBM Cell Processor" (PDF).
[3] Olofsson, Andreas; Nordström, Tomas; Ul-Abdin, Zain (2014). "Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany". arXiv:1412.5538 [cs.AR].
[4] Amir, Arnon (June 11, 2015). "IBM SyNAPSE Deep Dive Part 3". IBM Research.
[5] "cell architecture". "The Cell architecture is like nothing we have ever seen in commodity microprocessors, it is closer in design to multiprocessor vector supercomputers"
[6] Merritt, Rick (June 20, 2011). "OEMs show systems with Intel MIC chips". EE Times. www.eetimes.com.
[7] Barker, J; Bowden, J (2013). "Manycore Parallelism through OpenMP". OpenMP in the Era of Low Power Devices and Accelerators. IWOMP. Lecture Notes in Computer Science, vol 8122. Springer. doi:10.1007/978-3-642-40698-0_4.
[8] Mittal, Sparsh; Anand, Osho; Kumarr, Visnu P (May 2019). "A Survey on Evaluating and Optimizing Performance of Intel Xeon Phi".
[9] Chen, Yu-Hsin; Krishna, Tushar; Emer, Joel; Sze, Vivienne (2016). "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks". IEEE International Solid-State Circuits Conference, ISSCC 2016, Digest of Technical Papers. pp. 262–263.

External links

Architecting solutions for the Manycore future, published on Feb 19, 2010 (more than one dead link in the slide)
Eyeriss architecture

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
