AArch64 or ARM64 is the 64-bit extension of the ARM architecture family. It was introduced with the Armv8-A architecture and has since received many extension updates.[1]
Announced in October 2011,[2] ARMv8-A represents a fundamental change to the ARM architecture. It adds an optional 64-bit architecture, named "AArch64", and the associated new "A64" instruction set. AArch64 provides user-space compatibility with the existing 32-bit architecture ("AArch32" / ARMv7-A) and its instruction set ("A32"). The mixed 16/32-bit Thumb instruction set is referred to as "T32" and has no 64-bit counterpart. ARMv8-A allows 32-bit applications to run under a 64-bit OS, and a 32-bit OS to run under the control of a 64-bit hypervisor.[3]
ARM announced its Cortex-A53 and Cortex-A57 cores on 30 October 2012.[4] Apple was the first to release an ARMv8-A compatible core (Cyclone) in a consumer product (the iPhone 5S). AppliedMicro, using an FPGA, was the first to demonstrate ARMv8-A.[5] The first ARMv8-A SoC from Samsung is the Exynos 5433, used in the Galaxy Note 4. It features two clusters, one of four Cortex-A57 cores and one of four Cortex-A53 cores, in a big.LITTLE configuration, but it runs only in AArch32 mode.[6]
ARMv8-A includes the VFPv3/v4 and advanced SIMD (Neon) as standard features in both AArch32 and AArch64. It also adds cryptography instructions supporting AES, SHA-1/SHA-256 and finite field arithmetic.[7]
Extension: Data gathering hint (ARMv8.0-DGH).
AArch64 was introduced in ARMv8-A and is included in subsequent versions of ARMv8-A. It was also introduced in ARMv8-R as an option, after its introduction in ARMv8-A; it is not included in ARMv8-M.
The main opcode for selecting which group an A64 instruction belongs to is at bits 25-28.
Type | Bit 31 | Bits 28-25 (x = bit belongs to another field) | Other fields, from high to low bits
---|---|---|---
Reserved | 0 | 0000 | op0, op1
SME | 1 | 0000 | op0, varies
Unallocated | | 0001 |
SVE | | 0010 | varies
Unallocated | | 0011 |
Data Processing (Immediate, PC-relative) | op | 1000, with bit 24 = 0 | immlo, immhi, Rd
Data Processing (Immediate, others) | sf | 100x, with bits 25-24 = 01 to 11 | Rd
Branches + System Instructions | (in op0) | 101x | op0, op1, op2
Load and Store Instructions | (in op0) | x1x0 | op0, op1, op2, op3, op4
Data Processing (Register) | sf | x101 | op0, op1, op2, op3
Data Processing (Floating Point and SIMD) | (in op0) | x111 | op0, op1, op2, op3
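As an illustration of how the main opcode selects a group, the following is a minimal sketch of a classifier that looks only at bits 28-25 (plus bit 31 to separate the Reserved and SME groups). The function name `a64_top_level_group` is hypothetical, the returned strings follow the group names in the table, and the sample instruction words in the usage note are standard A64 encodings that do not come from this article. It is not a full decoder and does not check whether the remaining fields form a valid instruction.

```python
def a64_top_level_group(word: int) -> str:
    """Classify a 32-bit A64 instruction word by its main opcode (bits 28-25).

    Hypothetical helper for illustration only: it names the top-level
    encoding group and ignores every other field of the instruction.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit instruction word")

    main_opcode = (word >> 25) & 0xF   # bits 28-25
    bit31 = (word >> 31) & 1

    if main_opcode == 0b0000:
        return "SME" if bit31 else "Reserved"
    if main_opcode in (0b0001, 0b0011):
        return "Unallocated"
    if main_opcode == 0b0010:
        return "SVE"
    if main_opcode & 0b1110 == 0b1000:     # 100x
        return "Data Processing (Immediate)"
    if main_opcode & 0b1110 == 0b1010:     # 101x
        return "Branches + System Instructions"
    if main_opcode & 0b0101 == 0b0100:     # x1x0
        return "Load and Store Instructions"
    if main_opcode & 0b0111 == 0b0101:     # x101
        return "Data Processing (Register)"
    return "Data Processing (Floating Point and SIMD)"  # x111
```

For example, `a64_top_level_group(0xD503201F)` (the encoding of `NOP`) falls in the branches and system group, and `a64_top_level_group(0x91000420)` (`ADD X0, X1, #1`) falls in the immediate data-processing group.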
In December 2014, ARMv8.1-A,[9] an update with "incremental benefits over v8.0", was announced. The enhancements fell into two categories: changes to the instruction set, and changes to the exception model and memory translation.
Instruction set enhancements included the following:
Enhancements for the exception model and memory translation system included the following:
In January 2016, ARMv8.2-A was announced.[11] Its enhancements fell into four categories:
The Scalable Vector Extension (SVE) is "an optional extension to the ARMv8.2-A architecture and newer" developed specifically for vectorization of high-performance computing scientific workloads.[12][13] The specification allows for variable vector lengths to be implemented from 128 to 2048 bits. The extension is complementary to, and does not replace, the NEON extensions.
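Because the vector length is chosen by the implementation, SVE code is typically written so that the same loop is correct for any length. The sketch below simulates that idea in plain Python. It is not SVE code and uses no SVE intrinsics. The names `lanes_per_vector` and `vla_add` are hypothetical, and the check that the length is a multiple of 128 bits is an assumption beyond the 128 to 2048 bit range stated above.

```python
def lanes_per_vector(vector_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector of the given length."""
    # Assumption: lengths are multiples of 128 bits within 128..2048.
    if not 128 <= vector_bits <= 2048 or vector_bits % 128 != 0:
        raise ValueError("vector length must be a multiple of 128 in 128..2048")
    if element_bits not in (8, 16, 32, 64):
        raise ValueError("element size must be 8, 16, 32 or 64 bits")
    return vector_bits // element_bits


def vla_add(a, b, vector_bits, element_bits=64):
    """Element-wise add written without assuming a fixed vector length.

    Each outer iteration models one vector operation; the final partial
    chunk is handled by limiting the number of active lanes, which plays
    the role of a predicate.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    lanes = lanes_per_vector(vector_bits, element_bits)
    out = []
    for start in range(0, len(a), lanes):
        active = min(lanes, len(a) - start)  # lanes enabled in this step
        out.extend(a[start + i] + b[start + i] for i in range(active))
    return out
```

Calling `vla_add` with `vector_bits` set to 128, 512 or 2048 gives the same result; only the number of loop iterations changes.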
A 512-bit SVE variant has already been implemented on the Fugaku supercomputer using the Fujitsu A64FX ARM processor; this computer[14] was the fastest supercomputer in the world for two years, from June 2020[15] to May 2022.[16] A more flexible version, 2x256 SVE, was implemented by the AWS Graviton3 ARM processor.
SVE is supported by the GCC compiler, with GCC 8 supporting automatic vectorization[13] and GCC 10 supporting C intrinsics. As of July 2020, LLVM and clang support C and IR intrinsics. ARM's own fork of LLVM supports auto-vectorization.[17]
In October 2016, ARMv8.3-A was announced. Its enhancements fell into six categories:[18]
The ARMv8.3-A architecture is supported by GCC, starting with GCC 7.[22]
In November 2017, ARMv8.4-A was announced. Its enhancements fell into these categories:[23][24][25]
In September 2018, ARMv8.5-A was announced. Its enhancements fell into these categories:[26][27][28]
On 2 August 2019, Google announced Android would adopt Memory Tagging Extension (MTE).[30]
In March 2021, ARMv9-A was announced. ARMv9-A's baseline is all the features from ARMv8.5.[31][32][33] ARMv9-A also adds:
In September 2019, ARMv8.6-A was announced. Its enhancements fell into these categories:[26][38]
Examples include fine-grained traps, Wait-for-Event (WFE) instructions, EnhancedPAC2 and FPAC. The bfloat16 extensions for SVE and Neon are mainly intended for deep learning.[40]
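To make the bfloat16 format concrete: a bfloat16 value has the same sign bit and 8-bit exponent as an IEEE 754 single-precision float but keeps only 7 fraction bits, so it corresponds to the upper 16 bits of a float32. The sketch below shows that relationship in plain Python. The function names are hypothetical, and the conversion simply truncates the low bits, which is a simplification and not necessarily the rounding that hardware conversion instructions apply.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x, truncating the low bits."""
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 fraction bits


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    bits32 = (bits16 & 0xFFFF) << 16  # the low 16 fraction bits become zero
    return struct.unpack("<f", struct.pack("<I", bits32))[0]
```

Values such as 1.0 and -2.0 survive the round trip exactly, while a value such as 3.14159 loses precision and comes back as 3.140625. The representable range stays the same as float32; only the precision is reduced.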
In September 2020, ARMv8.7-A was announced. Its enhancements fell into these categories:[26][41]
In September 2021, ARMv8.8-A and ARMv9.3-A were announced. Their enhancements fell into these categories:[26][43]
LLVM 15 supports ARMv8.8-A and ARMv9.3-A.[44]
In September 2022, ARMv8.9-A and ARMv9.4-A were announced, including:[45]
Optional AArch64 support was added to the Armv8-R profile, with the first Arm core implementing it being the Cortex-R82.[46] It adds the A64 instruction set, with some changes to the memory barrier instructions.[47]
The pointer authentication extension is defined as a mandatory extension in ARMv8.3-A; it is not optional.
The ARMv8.3-A architecture is now supported. It can be used by specifying the -march=armv8.3-a option. [..] The option -msign-return-address= is supported to enable return address protection using ARMv8.3-A Pointer Authentication Extensions.