Comparison of ARMv8-A cores

This table compares 64/32-bit ARMv8-A cores, that is, microarchitectures that implement the AArch64 instruction set together with its mandatory or optional extensions. Most chips also support 32-bit AArch32 for legacy applications. All chips of this type have a floating-point unit (FPU) that is better than the one in older ARMv7 and NEON (SIMD) chips. Some of these chips have coprocessors that also include cores from the older 32-bit architecture (ARMv7). Some of the chips are SoCs that combine different cores, such as the Samsung Exynos 7 Octa, which pairs ARM Cortex-A53 with ARM Cortex-A57.

Table

Company | Core | Released | Revision | Decode | Pipeline depth | Out-of-order execution | Branch prediction | big.LITTLE role | Exec. ports | Fab (in nm) | Simult. MT | L0 cache | L1 cache Instr + Data (in KiB) | L2 cache | L3 cache | Core configurations | DMIPS/MHz | ARM part number (in the main ID register)
ARM Holdings | Cortex-A32 (32-bit)[1] | 2017 | ARMv8.0-A (only 32-bit) | 2-wide | 8 | No |  | LITTLE | ? | 28[2] | No | No | 8–64 + 8–64 | 0–1 MiB | No | 1–4+ |  | 0xD01
ARM Holdings | Cortex-A34 (64-bit)[3] | 2019 | ARMv8.0-A (only 64-bit) | 2-wide | 8 | No |  | LITTLE | ? |  | No | No | 8–64 + 8–64 | 0–1 MiB | No | 1–4+ |  | 0xD02
ARM Holdings | Cortex-A35[4] | 2017 | ARMv8.0-A | 2-wide[5] | 8 | No | Yes | LITTLE | ? | 28 / 16 / 14 / 10 | No | No | 8–64 + 8–64 | 0 / 128 KiB–1 MiB | No | 1–4+ | 1.78 | 0xD04
ARM Holdings | Cortex-A53[6] | 2014 | ARMv8.0-A | 2-wide | 8 | No | Conditional + Indirect branch prediction | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 10 | No | No | 8–64 + 8–64 | 128 KiB–2 MiB | No | 1–4+ | 2.24 | 0xD03
ARM Holdings | Cortex-A55[7] | 2017 | ARMv8.2-A | 2-wide | 8 | No |  | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 12 / 10 | No | No | 16–64 + 16–64 | 0–256 KiB/core | 0–4 MiB | 1–8+ | 2.65[8] | 0xD05
ARM Holdings | Cortex-A57[9] | 2013 | ARMv8.0-A | 3-wide | 15 | Yes, 3-wide dispatch | Two-level | big | 8 | 28 / 20 / 16[10] / 14 | No | No | 48 + 32 | 0.5–2 MiB | No | 1–4+ | 4.6 | 0xD07
ARM Holdings | Cortex-A65[11] | 2019 | ARMv8.2-A | ? | ? | Yes | Two-level | ? | 2 | ? | No | No | ? | ? | ? | ? | ? | 0xD06
ARM Holdings | Cortex-A65AE[12] | 2019 | ARMv8.2-A | ? | ? | Yes | Two-level | ? | 2 | ? | SMT2 | No | 16–64 + 16–64 | 64–256 KiB | 0–4 MB | 1–8 | ? | 0xD43
ARM Holdings | Cortex-A72[13] | 2015 | ARMv8.0-A | 3-wide | 15 | Yes, 5-wide dispatch | Two-level | big | 8 | 28 / 16 | No | No | 48 + 32 | 0.5–4 MiB | No | 1–4+ | 4.72 | 0xD08
ARM Holdings | Cortex-A73[14] | 2016 | ARMv8.0-A | 2-wide | 11–12 | Yes, 4-wide dispatch | Two-level | big | 7 | 28 / 16 / 10 | No | No | 64 + 32/64 | 1–8 MiB | No | 1–4+ | ~6.35 | 0xD09
ARM Holdings | Cortex-A75[7] | 2017 | ARMv8.2-A | 3-wide | 11–13 | Yes, 6-wide dispatch | Two-level | big | 8? | 28 / 16 / 10 | No | No | 64 + 64 | 256–512 KiB/core | 0–4 MiB | 1–8+ | ? | 0xD0A
ARM Holdings | Cortex-A76[15] | 2018 | ARMv8.2-A | 4-wide | 11–13 | Yes, 8-wide dispatch | Two-level | big | 8 | 10 / 7 | No | No | 64 + 64 | 256–512 KiB/core | 1–4 MiB | 1–4 | ? | 0xD0B
ARM Holdings | Cortex-A76AE[16] | 2018 | ARMv8.2-A | ? | ? | Yes | Two-level | big | ? | ? | SMT2 | No | ? | ? | ? | ? | ? | 0xD0E
ARM Holdings | Cortex-A77[17] | 2019 | ARMv8.2-A | 4-wide | 11–13 | Yes, 10-wide dispatch | Two-level | big | 12 | 7 | No | 1.5K entries | 64 + 64 | 256–512 KiB/core | 1–4 MiB | 1–4 | ? | 0xD0D
ARM Holdings | Cortex-A78[18][19] | 2020 | ARMv8.2-A | 4-wide |  | Yes | Yes | big | 13 |  | No | 1.5K entries | 32/64 + 32/64 | 256–512 KiB/core | 1–4 MiB | 1–4 | ? | 0xD41
ARM Holdings | Cortex-X1[20] | 2020 | ARMv8.2-A | 5-wide[20] | ? | Yes | Yes | big | 15 |  | No | 3K entries | 64 + 64 | up to 1 MiB[20] | up to 8 MiB[20] | custom[20] | ? | 0xD44
Apple Inc. | Cyclone[21] | 2013 | ARMv8.0-A | 6-wide[22] | 16[22] | Yes[22] | Yes | No | 9[22] | 28[23] | No | No | 64 + 64[22] | 1 MiB[22] | 4 MiB[22] | 2[24] | ? |
Apple Inc. | Typhoon | 2014 | ARMv8.0-A | 6-wide[25] | 16[25] | Yes[25] | Yes | No | 9 | 20 | No | No | 64 + 64[22] | 1 MiB[25] | 4 MiB[22] | 2, 3 (A8X) | ? |
Apple Inc. | Twister | 2015 | ARMv8.0-A | 6-wide[25] | 16[25] | Yes[25] | Yes | No | 9 | 16 / 14 | No | No | 64 + 64[25] | 3 MiB[25] | 4 MiB[25]; No (A9X) | 2 | ? |
Apple Inc. | Hurricane | 2016 | ARMv8.1-A | 6-wide[26] | 16 | Yes | Yes | "big" (in A10/A10X, paired with "LITTLE" Zephyr cores) | 9 | 16 (A10); 10 (A10X) | No | No | 64 + 64[27] | 3 MiB[27] (A10); 8 MiB (A10X) | 4 MiB[27] (A10); No (A10X) | 2x Hurricane + 2x Zephyr (A10); 3x Hurricane + 3x Zephyr (A10X) | ? |
Apple Inc. | Zephyr | 2016 | ARMv8.1-A | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 16 (A10); 10 (A10X) | No | No | 32 + 32[28] | 1 MiB | 4 MiB[27] (A10); No (A10X) | 2x Hurricane + 2x Zephyr (A10); 3x Hurricane + 3x Zephyr (A10X) | ? |
Apple Inc. | Monsoon | 2017 | ARMv8.2-A[29] | 7-wide | 16 | Yes | Yes | "big" (in Apple A11, paired with "LITTLE" Mistral cores) | 13 | 10 | No | No | 64 + 64[28] | 8 MiB | No | 2x Monsoon + 4x Mistral | ? |
Apple Inc. | Mistral | 2017 | ARMv8.2-A[29] | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 10 | No | No | 32 + 32[28] | 1 MiB | No | 2x Monsoon + 4x Mistral | ? |
Apple Inc. | Vortex | 2018 | ARMv8.3-A[30] | 7-wide | 16 | Yes | Yes | "big" (in Apple A12/A12X/A12Z, paired with "LITTLE" Tempest cores) | 13 | 7 | No | No | 128 + 128[28] | 8 MiB | No | 2x Vortex + 4x Tempest (A12); 4x Vortex + 4x Tempest (A12X/A12Z) | ? |
Apple Inc. | Tempest | 2018 | ARMv8.3-A[30] | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 7 | No | No | 32 + 32[28] | 2 MiB | No | 2x Vortex + 4x Tempest (A12); 4x Vortex + 4x Tempest (A12X/A12Z) | ? |
Apple Inc. | Lightning | 2019 | ARMv8.4-A[31] | 7-wide | 16 | Yes | Yes | "big" (in Apple A13, paired with "LITTLE" Thunder cores) | 13 | 7 | No | No | 128 + 128[32] | 8 MiB | No | 2x Lightning + 4x Thunder | ? |
Apple Inc. | Thunder | 2019 | ARMv8.4-A[33] | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 7 | No | No | 32 + 48[34] | 4 MiB | No | 2x Lightning + 4x Thunder | ? |
Nvidia | Denver[35][36] | 2014 | ARMv8-A | 2-wide hardware decoder, up to 7-wide variable-length VLIW micro-ops | 13 | Not if the hardware decoder is in use. Can be provided by dynamic software translation into VLIW. | Direct + Indirect branch prediction | No | 7 | 28 | No | No | 128 + 64 | 2 MiB | No | 2 | ? |
Nvidia | Denver 2[37] | 2016 | ARMv8-A | ? | 13 | Not if the hardware decoder is in use. Can be provided by dynamic software translation into VLIW. | Direct + Indirect branch prediction | "Super" Nvidia's own implementation | ? | 16 | No | No | 128 + 64 | 2 MiB | No | 2 | ? |
Nvidia | Carmel | 2018 | ARMv8.2-A | ? |  |  | Direct + Indirect branch prediction | ? |  | 12 | No | No | 128 + 64 | 2 MiB | (4 MiB @ 8 cores) | 2 (+ 8) | ? |
Cavium | ThunderX[38][39] | 2014 | ARMv8-A | 2-wide | 9[39] | Yes[38] | Two-level | ? |  | 28 | No | No | 78 + 32[40][41] | 16 MiB[40][41] | No | 8–16, 24–48 | ? |
Cavium | ThunderX2[42] (ex. Broadcom Vulcan[43]) | 2018[44] | ARMv8.1-A[45] | 4-wide, "4 μops"[46][47] | ? | Yes[48] | Multi-level | ? | ? | 16[49] | SMT4 | No | 32 + 32 (data 8-way) | 256 KB per core[50] | 1 MB per core[50] | 16–32[50] | ? |
Marvell | ThunderX3 | 2020[51] | ARMv8.3+[51] | ? | ? | Yes | Multi-level | ? | ? | 7[51] | SMT4[51] | ? | ? | ? | ? | ? | ? |
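Applied Micro | Helix | 2014 | ? | ? | ? | ? | ? | ? | ? | 40 / 28 | No | No | 32 + 32 (per core; write-through w/parity)[52] | 256 KiB shared per core pair (with ECC) | 1 MiB/core | 2, 4, 8 | ? |
Applied Micro | X-Gene | 2013 | ? | 4-wide | 15 | Yes | ? | ? | ? | 40[53] | No | No |  |  | 8 MiB | 8 | 4.2 |
Applied Micro | X-Gene 2 | 2015 | ? | 4-wide | 15 | Yes | ? | ? | ? | 28[54] | No | No |  |  | 8 MiB | 8 | 4.2 |
Applied Micro | X-Gene 3[54] | 2017 | ? | ? | ? | ? | ? | ? | ? | 16 | No | No | ? | ? | 32 MiB | 32 | ? |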
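Qualcomm | Kryo | 2016 | ARMv8-A | ? | ? | Yes | Two-level? | "big" or "LITTLE" (Qualcomm's own similar implementation) | ? | 14[55] | No | No | 32 + 24[56] | 0.5–1 MiB |  | 2, 4 | 6.3 |
Qualcomm | Kryo 2XX (Gold) | 2017 | ARMv8-A | 2-wide | 11–12 | Yes, 7-wide dispatch | Two-level | big | 7 | 14 / 11 / 10[57] | No | No | 64 + 32/64? | 512 KiB/Gold Core | No | 4 | ? |
Qualcomm | Kryo 2XX (Silver) | 2017 | ARMv8-A | 2-wide | 8 | No | Conditional + Indirect branch prediction | ? | 2 |  | No | No | 8–64? + 8–64? | 256 KiB/Silver Core |  | 4 | ? |
Qualcomm | Kryo 3XX (Gold) | 2018 | ARMv8.2-A | 3-wide | 11–13 | Yes, 8-wide dispatch | Two-level | big | 8 | 10[57] | No | No | 64 + 64[57] | 256 KiB/Gold Core | 2 MiB | 4 | ? |
Qualcomm | Kryo 3XX (Silver) | 2018 | ARMv8.2-A | 2-wide | 8 | No | Conditional + Indirect branch prediction | ? | 28 |  | No | No | 16–64? + 16–64? | 128 KiB/Silver |  | 4 | ? |
Qualcomm | Kryo 4XX (Gold) | 2019 | ARMv8.2-A | 4-wide | 11–13 | Yes, 8-wide dispatch | Yes | big | 8 | 11 / 8 / 7 | No | No | 64 + 64 | 512 KiB/Gold Prime; 256 KiB/Gold | 2 MiB | 1+3 | ? |
Qualcomm | Kryo 4XX (Silver) | 2019 | ARMv8.2-A | 2-wide | 8 | No | Conditional + Indirect branch prediction | ? | 2 |  | No | No | 16–64? + 16–64? | 128 KiB/Silver |  | 4 | ? |
Qualcomm | Falkor[58][59] | 2017[60] | "ARMv8.1-A features";[59] AArch64 only (not 32-bit)[59] | 4-wide | 10–15 | Yes, 8-wide dispatch | Yes | ? | 8 | 10 | No | 24 KiB | 88[59] + 32 | 500 KiB | 1.25 MiB | 40–48 | ? |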
Samsung | M1/M2[61][62] | 2015 | ARMv8-A | 4-wide | 13[63] | Yes, 9-wide dispatch[64] | Two-level | big | 8 | 14 / 10 | No | No | 64 + 32 | 2 MiB[65] | No | 4 | ? |
Samsung | M3[63][66] | 2018 | ARMv8.2-A | 6-wide | 15 | Yes, 12-wide dispatch | Two-level | big | 12 | 10 | No | No | 64 + 64 | 512 KiB per core | 4096 KB | 4 | ? |
Samsung | M4[67] | 2019 | ARMv8.2-A | 6-wide | 15 | Yes, 12-wide dispatch | Two-level | big | 12 | 8 / 7 | No | No | 64 + 64 | 512 KiB per core | 4096 KB | 2 | ? |
Fujitsu | A64FX[68][69] | 2019 | ARMv8.2-A | 4/2-wide | 7+ | Yes, 5-way? | Yes | n/a | 8+ | 7 | No | No | 64 + 64 | 8 MiB per 12+1 cores | No | 48+4 | 1.9 GHz+; 15 GF/W+ |
HiSilicon | TaiShan V110[70] | 2019 | ARMv8.2-A | 4-wide | ? | Yes | Yes | n/a | 8 | 7 | No | No | 64 + 64 | 512 KiB per core | 1 MiB per core | ? | ? |
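The part numbers in the last column can be read back from a core's main ID register (MIDR_EL1 in AArch64), where the part number occupies bits [15:4]. The following Python sketch decodes such a register value and looks the part number up in a dictionary copied from the table above. The function names `decode_midr` and `core_name` are illustrative, not part of any library, and the lookup covers only the Arm-designed cores for which the table gives a part number.

```python
# Part numbers copied from the table's last column (Arm-designed cores only).
ARM_PART_NUMBERS = {
    0xD01: "Cortex-A32",
    0xD02: "Cortex-A34",
    0xD03: "Cortex-A53",
    0xD04: "Cortex-A35",
    0xD05: "Cortex-A55",
    0xD06: "Cortex-A65",
    0xD07: "Cortex-A57",
    0xD08: "Cortex-A72",
    0xD09: "Cortex-A73",
    0xD0A: "Cortex-A75",
    0xD0B: "Cortex-A76",
    0xD0D: "Cortex-A77",
    0xD0E: "Cortex-A76AE",
    0xD41: "Cortex-A78",
    0xD43: "Cortex-A65AE",
    0xD44: "Cortex-X1",
}

ARM_IMPLEMENTER = 0x41  # implementer code for Arm (ASCII 'A')


def decode_midr(midr):
    """Split a 32-bit main ID register value into its fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # bits [31:24]
        "variant": (midr >> 20) & 0xF,        # bits [23:20]
        "architecture": (midr >> 16) & 0xF,   # bits [19:16]
        "part_number": (midr >> 4) & 0xFFF,   # bits [15:4]
        "revision": midr & 0xF,               # bits [3:0]
    }


def core_name(midr):
    """Return the core name from the table, or None if it is not listed."""
    fields = decode_midr(midr)
    if fields["implementer"] != ARM_IMPLEMENTER:
        return None  # part numbers are only meaningful per implementer
    return ARM_PART_NUMBERS.get(fields["part_number"])


if __name__ == "__main__":
    # 0x410FD034 is used here as an example value for a Cortex-A53.
    print(hex(decode_midr(0x410FD034)["part_number"]), core_name(0x410FD034))
```

Part numbers are assigned per implementer, so the sketch checks the implementer field first and returns `None` for cores from other vendors, for which the table lists no part number.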

As Dhrystone (implied in "DMIPS") is a synthetic benchmark developed in the 1980s, it is no longer representative of prevailing workloads; use the DMIPS/MHz figures with caution.
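For reference, a DMIPS/MHz figure is a per-clock score, so a nominal single-core DMIPS value follows from multiplying it by the clock frequency. The sketch below shows that arithmetic in Python; the function name and the 1500 MHz clock are illustrative assumptions, not values from the table, and the result inherits all the limitations of Dhrystone noted above.

```python
def estimated_dmips(dmips_per_mhz, clock_mhz):
    """Nominal single-core Dhrystone MIPS at a given clock frequency."""
    return dmips_per_mhz * clock_mhz


if __name__ == "__main__":
    # DMIPS/MHz values taken from the table; the 1500 MHz clock is hypothetical.
    a53 = estimated_dmips(2.24, 1500)
    a72 = estimated_dmips(4.72, 1500)
    print(f"Cortex-A53: {a53:.0f} DMIPS, Cortex-A72: {a72:.0f} DMIPS")
    print(f"Per-clock ratio A72/A53: {4.72 / 2.24:.2f}")
```

Because the clock cancels out when two cores run at the same frequency, comparing the DMIPS/MHz column directly gives the same ratio.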

References

  1. Frumusanu, Andrei (22 February 2016). "ARM Announces Cortex-A32 IoT and Embedded Processor". Anandtech.com. Retrieved 13 June 2016.
  2. "New Ultra-efficient ARM Cortex-A32 Processor Expands… - ARM". www.arm.com. Retrieved 1 October 2016.
  3. Arm Ltd. "Cortex-A34". ARM Developer. Retrieved 10 October 2019.
  4. "Cortex-A35 Processor". ARM. ARM Ltd.
  5. Frumusanu, Andrei. "ARM Announces New Cortex-A35 CPU - Ultra-High Efficiency For Wearables & More".
  6. "Cortex-A53 Processor". ARM. ARM Ltd.
  7. Humrick, Matt (29 May 2017). "Exploring DynamIQ and ARM's New CPUs: Cortex-A75, Cortex-A55". Anandtech.com. Retrieved 29 May 2017.
  8. Based on 18% perf. increment over Cortex-A53 "Arm Cortex-A55: Efficient performance from edge to cloud". ARM. ARM Ltd.
  9. Frumusanu, Andrei; Smith, Ryan. "ARM A53/A57/T760 investigated - Samsung Galaxy Note 4 Exynos Review". www.anandtech.com. Retrieved 17 June 2019.
  10. "TSMC Delivers First Fully Functional 16FinFET Networking Processor". TSMC. 25 September 2014. Retrieved 19 February 2015.
  11. "Cortex-A65 - Arm Developer". ARM Ltd. Retrieved 14 July 2020.
  12. "Cortex-A65AE - Arm Developer". ARM Ltd. Retrieved 26 April 2019.
  13. Frumusanu, Andrei. "ARM Reveals Cortex-A72 Architecture Details". Anandtech. Retrieved 25 April 2015.
  14. Frumusanu, Andrei (29 May 2016). "The ARM Cortex A73 - Artemis Unveiled". Anandtech.com. Retrieved 31 May 2016.
  15. Frumusanu, Andrei (31 May 2018). "ARM Cortex-A76 CPU Unveiled". Anandtech. Retrieved 1 June 2018.
  16. "Cortex-A76AE - Arm Developer". ARM Ltd. Retrieved 14 July 2020.
  17. Schor, David (26 May 2019). "Arm Unveils Cortex-A77, Emphasizes Single-Thread Performance". WikiChip Fuse. Retrieved 17 June 2019.
  18. "Arm Unveils the Cortex-A78: When Less Is More". WikiChip Fuse. 26 May 2020. Retrieved 28 May 2020.
  19. Arm Ltd. "Cortex-A78". ARM Developer. Retrieved 28 May 2020.
  20. "Introducing the Arm Cortex-X Custom program". community.arm.com. Retrieved 28 May 2020.
  21. Lal Shimpi, Anand (17 September 2013). "The iPhone 5s Review: The Move to 64-bit". AnandTech. Retrieved 3 July 2014.
  22. Lal Shimpi, Anand (31 March 2014). "Apple's Cyclone Microarchitecture Detailed". AnandTech. Retrieved 3 July 2014.
  23. Dixon-Warren, Sinjin (20 January 2014). "Samsung 28nm HKMG Inside the Apple A7". Chipworks. Archived from the original on 6 April 2014. Retrieved 3 July 2014.
  24. Lal Shimpi, Anand (17 September 2013). "The iPhone 5s Review: A7 SoC Explained". AnandTech. Retrieved 3 July 2014.
  25. Ho, Joshua; Smith, Ryan (2 November 2015). "The Apple iPhone 6s and iPhone 6s Plus Review". AnandTech. Retrieved 13 February 2016.
  26. "Apple had shifted the microarchitecture in Hurricane (A10) from a 6-wide decode from to a 7-wide decode". AnandTech. 5 October 2018.
  27. "Apple A10 Fusion". system-on-a-chip.specout.com. Retrieved 1 October 2016.
  28. "Measured and Estimated Cache Sizes". AnandTech. 5 October 2018.
  29. "Apple A11 New Instruction Set Extensions" (PDF). Apple Inc. 8 June 2018.
  30. "Apple A12 Pointer Authentication Codes". Jonathan Levin, @Morpheus. 12 September 2018.
  31. "A13 has ARMv8.4, apparently (LLVM project sources, thanks, @Longhorn)". Jonathan Levin, @Morpheus. 13 March 2020.
  32. "The Apple A13 SoC: Lightning & Thunder". AnandTech. 16 October 2019.
  33. "A13 has ARMv8.4, apparently (LLVM project sources, thanks, @Longhorn)". Jonathan Levin, @Morpheus. 13 March 2020.
  34. "The A13's Memory Subsystem: Faster L2, More SLC BW". AnandTech. 16 October 2019.
  35. Stam, Nick (11 August 2014). "Mile High Milestone: Tegra K1 "Denver" Will Be First 64-bit ARM Processor for Android". NVidia. Retrieved 11 August 2014.
  36. Gwennap, Linley. "Denver Uses Dynamic Translation to Outperform Mobile Rivals". The Linley Group. Retrieved 24 April 2015.
  37. Ho, Joshua (25 August 2016). "Hot Chips 2016: NVIDIA Discloses Tegra Parker Details". Anandtech. Retrieved 25 August 2016.
  38. De Gelas, Johan (16 December 2014). "ARM Challenging Intel in the Server Market". Anandtech. Retrieved 8 March 2017.
  39. De Gelas, Johan (15 June 2016). "Investigating the Cavium ThunderX". Anandtech. Retrieved 8 March 2017.
  40. "64-bit Cortex Platform To Take On x86 Servers In The Cloud". electronic design. 5 June 2014. Retrieved 7 February 2015.
  41. "ThunderX_CP™ Family of Workload Optimized Compute Processors" (PDF). Cavium. 2014. Retrieved 7 February 2015.
  42. "A Look at Cavium's New High-Performance ARM Microprocessors and the Isambard Supercomputer". WikiChip Fuse. 3 June 2018. Retrieved 17 June 2019.
  43. "D30510 Vulcan is now ThunderX2T99". reviews.llvm.org.
  44. Kennedy, Patrick (7 May 2018). "Cavium ThunderX2 256 Thread Arm Platforms Hit General Availability". Retrieved 10 May 2018.
  45. "D21500 [AARCH64] Add support for Broadcom Vulcan". reviews.llvm.org.
  46. https://hpcuserforum.com/presentations/santafe2014/Broadcom%20Monday%20night.pdf
  47. "The Linley Group - Processor Conference 2013". www.linleygroup.com.
  48. "ThunderX2 ARM Processors- A Game Changing Family of Workload Optimized Processors for Data Center and Cloud Applications - Cavium". www.cavium.com.
  49. "Broadcom Announces Server-Class ARMv8-A Multi-Core Processor Architecture". Broadcom. 15 October 2013. Retrieved 11 August 2014.
  50. Kennedy, Patrick (9 May 2018). "Cavium ThunderX2 Review and Benchmarks a Real Arm Server Option". Serve the Home. Retrieved 10 May 2018.
  51. Frumusanu, Andrei (16 March 2020). "Marvell Announces ThunderX3: 96 Cores & 384 Thread 3rd Gen Arm Server Processor".
  52. Ganesh T S (3 October 2014). "ARMv8 Goes Embedded with Applied Micro's HeliX SoCs". AnandTech. Retrieved 9 October 2014.
  53. Morgan, Timothy Prickett (12 August 2014). "Applied Micro Plots Out X-Gene ARM Server Future". Enterprisetech. Retrieved 9 October 2014.
  54. De Gelas, Johan (15 March 2017). "AppliedMicro's X-Gene 3 SoC Begins Sampling". Anandtech. Retrieved 15 March 2017.
  55. "Snapdragon 820 and Kryo CPU: heterogeneous computing and the role of custom compute". Qualcomm. 2 September 2015. Retrieved 6 September 2015.
  56. Smith, Ryan; Frumusanu, Andrei. "The Qualcomm Snapdragon 820 Performance Preview: Meet Kryo".
  57. Frumusanu, Andrei; Smith, Ryan. "The Snapdragon 845 Performance Preview: Setting the Stage for Flagship Android 2018". Retrieved 11 June 2018.
  58. Shilov, Anton (16 December 2016). "Qualcomm Demos 48-Core Centriq 2400 SoC in Action, Begins Sampling". Anandtech. Retrieved 8 March 2017. In 2015, Qualcomm teamed up with Xilinx and Mellanox to ensure that its server SoCs are compatible with FPGA-based accelerators and data-center connectivity solutions (the fruits of this partnership will likely emerge in 2018 at best).
  59. Cutress, Ian (20 August 2017). "Analyzing Falkor's Microarchitecture". Anandtech. Retrieved 21 August 2017. The CPU cores, code named Falkor, will be ARMv8.0 compliant although with ARMv8.1 features, allowing software to potentially seamlessly transition from other ARM environments (or need a recompile). The Centriq 2400 family is set to be AArch64 only, without support for AArch32: Qualcomm states that this saves some power and die area, but that they primarily chose this route because the ecosystems they are targeting have already migrated to 64-bit. Qualcomm’s Chris Bergen, Senior Director of Product Management for the Centriq 2400, stated that the majority of new and upcoming companies have started off with 64-bit as their base in the data center, and not even considering 32-bit, which is a reason for the AArch64-only choice here. [..] Micro-op cache / L0 I-cache with Way prediction [..] The L1 I-cache is 64KB, which is similar to other ARM architecture core designs, and also uses 64-byte lines but with an 8-way associativity. To software, as the L0 is transparent, the L1 I-cache will show as an 88KB cache.
  60. Shrout, Ryan (8 November 2017). "Qualcomm Centriq 2400 Arm-based Server Processor Begins Commercial Shipment". PC Per. Retrieved 8 November 2017.
  61. Ho, Joshua. "Hot Chips 2016: Exynos M1 Architecture Disclosed".
  62. Frumusanu, Andrei. "Samsung Announces Exynos 8890 with Cat.12/13 Modem and Custom CPU".
  63. Frumusanu, Andrei (23 January 2018). "The Samsung Exynos M3 - 6-wide Decode with 50%+ IPC Increase". Anandtech. Retrieved 25 January 2018.
  64. Frumusanu, Andrei. "Hot Chips 2016: Exynos M1 Architecture Disclosed". Anandtech. Retrieved 29 May 2017.
  65. "'Neural network' spotted deep inside Samsung's Galaxy S7 silicon brain".
  66. Frumusanu, Andrei. "Hot Chips 2018: Samsung's Exynos-M3 CPU Architecture Deep Dive". www.anandtech.com. Retrieved 17 June 2019.
  67. Schor, David (14 January 2019). "Samsung Discloses Exynos M4 Changes, Upgrades Support for ARMv8.2, Rearranges The Back-End". WikiChip Fuse. Retrieved 17 June 2019.
  68. Fujitsu High Performance CPU for the Post-K Computer (PDF), 21 July 2018, retrieved 16 September 2019
  69. Arm A64fx and Post-K: Game Changing CPU & Supercomputer for HPC and its Convergence of with Big Data / AI (PDF), 3 April 2019, retrieved 16 September 2019
  70. Schor, David (3 May 2019). "Huawei Expands Kunpeng Server CPUs, Plans SMT, SVE For Next Gen". WikiChip Fuse. Retrieved 13 December 2019.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.