| ============================== | 
 | Memory Layout on AArch64 Linux | 
 | ============================== | 
 |  | 
 | Author: Catalin Marinas <catalin.marinas@arm.com> | 
 |  | 
 | This document describes the virtual memory layout used by the AArch64 | 
 | Linux kernel. The architecture allows up to 4 levels of translation | 
 | tables with a 4KB page size and up to 3 levels with a 64KB page size. | 
 |  | 
 | AArch64 Linux uses either 3 levels or 4 levels of translation tables | 
 | with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit | 
 | (256TB) virtual addresses, respectively, for both user and kernel. With | 
 | 64KB pages, only 2 levels of translation tables are used, allowing | 
 | 42-bit (4TB) virtual addresses, but the memory layout is the same. | 
 |  | 
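The address widths quoted above follow from the table geometry. A minimal
sketch of that arithmetic, assuming 8-byte descriptors and tables that each
fill exactly one page (the helper name va_bits_for is illustrative and is not
a kernel symbol). Note that with 64KB pages and 3 levels the formula yields
55 bits; in that configuration the top-level table is only partially
populated, as the 64KB lookup diagram further down shows.

```c
#include <assert.h>

/*
 * Illustrative helper, not a kernel symbol: the number of virtual address
 * bits resolved by `levels` full translation levels, assuming each table
 * occupies one page of 8-byte descriptors.
 */
static unsigned int va_bits_for(unsigned int page_shift, unsigned int levels)
{
	assert(page_shift > 3);
	/* A page holds 2^(page_shift - 3) descriptors, so each level
	 * resolves page_shift - 3 bits; the in-page offset adds page_shift. */
	return page_shift + levels * (page_shift - 3);
}
```
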
 | ARMv8.2 adds optional support for Large Virtual Address space. This is | 
 | only available when running with a 64KB page size and expands the | 
 | number of descriptors in the first level of translation. | 
 |  | 
 | User addresses have bits 63:48 set to 0 while the kernel addresses have | 
 | the same bits set to 1. TTBRx selection is given by bit 63 of the | 
 | virtual address. The swapper_pg_dir contains only kernel (global) | 
 | mappings while the user pgd contains only user (non-global) mappings. | 
 | The swapper_pg_dir address is written to TTBR1 and never written to | 
 | TTBR0. | 
 |  | 
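The user/kernel split described above can be expressed as two small
predicates. This is only a sketch for the 48-bit case; the function names are
hypothetical and do not correspond to kernel helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 selects the translation table base register, per the text above. */
static bool va_uses_ttbr1(uint64_t va)
{
	return (va >> 63) != 0;
}

/* True if bits 63:48 are all zeros (user) or all ones (kernel). */
static bool va_is_valid_48bit(uint64_t va)
{
	uint64_t top = va >> 48;

	return top == 0 || top == 0xffff;
}
```

For example, 0xffff000000000000 passes both checks (a kernel address
translated via TTBR1), while 0x0001000000000000 fails the second one because
bits 63:48 are neither all zeros nor all ones.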
 |  | 
 | AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):: | 
 |  | 
 |   Start			End			Size		Use | 
 |   ----------------------------------------------------------------------- | 
 |   0000000000000000	0000ffffffffffff	 256TB		user | 
 |   ffff000000000000	ffff7fffffffffff	 128TB		kernel logical memory map | 
 |   ffff800000000000	ffff9fffffffffff	  32TB		kasan shadow region | 
 |   ffffa00000000000	ffffa00007ffffff	 128MB		bpf jit region | 
 |   ffffa00008000000	ffffa0000fffffff	 128MB		modules | 
 |   ffffa00010000000	fffffdffbffeffff	 ~93TB		vmalloc | 
 |   fffffdffbfff0000	fffffdfffe5f8fff	~998MB		[guard region] | 
 |   fffffdfffe5f9000	fffffdfffe9fffff	4124KB		fixed mappings | 
 |   fffffdfffea00000	fffffdfffebfffff	   2MB		[guard region] | 
 |   fffffdfffec00000	fffffdffffbfffff	  16MB		PCI I/O space | 
 |   fffffdffffc00000	fffffdffffdfffff	   2MB		[guard region] | 
 |   fffffdffffe00000	ffffffffffdfffff	   2TB		vmemmap | 
 |   ffffffffffe00000	ffffffffffffffff	   2MB		[guard region] | 
 |  | 
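As a cross-check of the 4KB pages + 4 levels table, the rows can be
transcribed into a lookup array. The sketch below copies the start and end
addresses verbatim from the table; the struct and function names are
illustrative only, and the values describe this document's layout rather than
any particular running kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the 4KB pages + 4 levels (48-bit) table. */
struct va_region {
	uint64_t start;
	uint64_t end;		/* inclusive, as in the table */
	const char *use;
};

static const struct va_region layout_4k_48bit[] = {
	{ 0x0000000000000000ULL, 0x0000ffffffffffffULL, "user" },
	{ 0xffff000000000000ULL, 0xffff7fffffffffffULL, "kernel logical memory map" },
	{ 0xffff800000000000ULL, 0xffff9fffffffffffULL, "kasan shadow region" },
	{ 0xffffa00000000000ULL, 0xffffa00007ffffffULL, "bpf jit region" },
	{ 0xffffa00008000000ULL, 0xffffa0000fffffffULL, "modules" },
	{ 0xffffa00010000000ULL, 0xfffffdffbffeffffULL, "vmalloc" },
	{ 0xfffffdffbfff0000ULL, 0xfffffdfffe5f8fffULL, "[guard region]" },
	{ 0xfffffdfffe5f9000ULL, 0xfffffdfffe9fffffULL, "fixed mappings" },
	{ 0xfffffdfffea00000ULL, 0xfffffdfffebfffffULL, "[guard region]" },
	{ 0xfffffdfffec00000ULL, 0xfffffdffffbfffffULL, "PCI I/O space" },
	{ 0xfffffdffffc00000ULL, 0xfffffdffffdfffffULL, "[guard region]" },
	{ 0xfffffdffffe00000ULL, 0xffffffffffdfffffULL, "vmemmap" },
	{ 0xffffffffffe00000ULL, 0xffffffffffffffffULL, "[guard region]" },
};

#define LAYOUT_ROWS (sizeof(layout_4k_48bit) / sizeof(layout_4k_48bit[0]))

/* Return the row containing va, or NULL if va falls between the user
 * and kernel ranges (a non-canonical 48-bit address). */
static const struct va_region *find_region(uint64_t va)
{
	for (size_t i = 0; i < LAYOUT_ROWS; i++)
		if (va >= layout_4k_48bit[i].start && va <= layout_4k_48bit[i].end)
			return &layout_4k_48bit[i];
	return NULL;
}

/* Size of a row in bytes (end is inclusive, hence the + 1). */
static uint64_t region_size(const struct va_region *r)
{
	return r->end - r->start + 1;
}
```

Computing region_size for the exact rows (for instance the kasan shadow
region or the fixed mappings) reproduces the Size column of the table.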
 |  | 
 | AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):: | 
 |  | 
 |   Start			End			Size		Use | 
 |   ----------------------------------------------------------------------- | 
 |   0000000000000000	000fffffffffffff	   4PB		user | 
 |   fff0000000000000	fff7ffffffffffff	   2PB		kernel logical memory map | 
 |   fff8000000000000	fffd9fffffffffff	1440TB		[gap] | 
 |   fffda00000000000	ffff9fffffffffff	 512TB		kasan shadow region | 
 |   ffffa00000000000	ffffa00007ffffff	 128MB		bpf jit region | 
 |   ffffa00008000000	ffffa0000fffffff	 128MB		modules | 
 |   ffffa00010000000	fffff81ffffeffff	 ~88TB		vmalloc | 
 |   fffff81fffff0000	fffffc1ffe58ffff	  ~3TB		[guard region] | 
 |   fffffc1ffe590000	fffffc1ffe9fffff	4544KB		fixed mappings | 
 |   fffffc1ffea00000	fffffc1ffebfffff	   2MB		[guard region] | 
 |   fffffc1ffec00000	fffffc1fffbfffff	  16MB		PCI I/O space | 
 |   fffffc1fffc00000	fffffc1fffdfffff	   2MB		[guard region] | 
 |   fffffc1fffe00000	ffffffffffdfffff	3968GB		vmemmap | 
 |   ffffffffffe00000	ffffffffffffffff	   2MB		[guard region] | 
 |  | 
 |  | 
 | Translation table lookup with 4KB pages:: | 
 |  | 
 |   +--------+--------+--------+--------+--------+--------+--------+--------+ | 
 |   |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0| | 
 |   +--------+--------+--------+--------+--------+--------+--------+--------+ | 
 |    |                 |         |         |         |         | | 
 |    |                 |         |         |         |         v | 
 |    |                 |         |         |         |   [11:0]  in-page offset | 
 |    |                 |         |         |         +-> [20:12] L3 index | 
 |    |                 |         |         +-----------> [29:21] L2 index | 
 |    |                 |         +---------------------> [38:30] L1 index | 
 |    |                 +-------------------------------> [47:39] L0 index | 
 |    +-------------------------------------------------> [63] TTBR0/1 | 
 |  | 
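The bit fields in the 4KB diagram translate directly into shifts and masks.
The following sketch splits a virtual address into the fields shown; the
struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a virtual address with 4KB pages and 4 levels (48-bit). */
struct va_fields_4k {
	bool ttbr1;		/* bit 63: TTBR1 (kernel) if set */
	unsigned int l0;	/* bits [47:39] */
	unsigned int l1;	/* bits [38:30] */
	unsigned int l2;	/* bits [29:21] */
	unsigned int l3;	/* bits [20:12] */
	unsigned int offset;	/* bits [11:0]  */
};

static struct va_fields_4k split_va_4k(uint64_t va)
{
	struct va_fields_4k f;

	f.ttbr1  = (va >> 63) & 1;
	f.l0     = (va >> 39) & 0x1ff;	/* 9 bits per level */
	f.l1     = (va >> 30) & 0x1ff;
	f.l2     = (va >> 21) & 0x1ff;
	f.l3     = (va >> 12) & 0x1ff;
	f.offset = va & 0xfff;		/* 12-bit in-page offset */
	return f;
}
```

Applied to 0xffffa00008000000 (the start of the modules region in the 48-bit
layout above), this gives L0 index 320, L1 index 0, L2 index 64, L3 index 0
and offset 0, with TTBR1 selected.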
 |  | 
 | Translation table lookup with 64KB pages:: | 
 |  | 
 |   +--------+--------+--------+--------+--------+--------+--------+--------+ | 
 |   |63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0| | 
 |   +--------+--------+--------+--------+--------+--------+--------+--------+ | 
 |    |                 |    |               |              | | 
 |    |                 |    |               |              v | 
 |    |                 |    |               |            [15:0]  in-page offset | 
 |    |                 |    |               +----------> [28:16] L3 index | 
 |    |                 |    +--------------------------> [41:29] L2 index | 
 |    |                 +-------------------------------> [47:42] L1 index (48-bit) | 
 |    |                                                   [51:42] L1 index (52-bit) | 
 |    +-------------------------------------------------> [63] TTBR0/1 | 
 |  | 
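With 64KB pages the L2 and L3 indices are 13 bits wide, and the width of the
L1 index depends on whether 48-bit or 52-bit addressing is in use. A sketch
of the decomposition, with illustrative names and the VA size passed in
explicitly:

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a virtual address with 64KB pages and 3 levels. */
struct va_fields_64k {
	unsigned int l1;	/* bits [47:42] (48-bit) or [51:42] (52-bit) */
	unsigned int l2;	/* bits [41:29] */
	unsigned int l3;	/* bits [28:16] */
	unsigned int offset;	/* bits [15:0]  */
};

static struct va_fields_64k split_va_64k(uint64_t va, unsigned int va_bits)
{
	struct va_fields_64k f;
	unsigned int l1_bits;

	assert(va_bits == 48 || va_bits == 52);
	l1_bits = va_bits - 42;		/* 6 bits or 10 bits */

	f.l1     = (va >> 42) & ((1u << l1_bits) - 1);
	f.l2     = (va >> 29) & 0x1fff;
	f.l3     = (va >> 16) & 0x1fff;
	f.offset = va & 0xffff;
	return f;
}
```

The L1 mask is the only part that changes between the two modes: 6 index bits
address 64 first-level descriptors, while 10 bits address 1024.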
 |  | 
 | When using KVM without the Virtualization Host Extensions, the | 
 | hypervisor maps kernel pages in EL2 at a fixed (and potentially | 
 | random) offset from the linear mapping. See the kern_hyp_va macro and | 
 | kvm_update_va_mask function for more details. MMIO devices such as | 
 | GICv2 get mapped next to the HYP idmap page, as do vectors when | 
 | ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs. | 
 |  | 
 | When using KVM with the Virtualization Host Extensions, no additional | 
 | mappings are created, since the host kernel runs directly in EL2. | 
 |  | 
 | 52-bit VA support in the kernel | 
 | ------------------------------- | 
 | If the ARMv8.2-LVA optional feature is present and the kernel is | 
 | running with a 64KB page size, it is possible to use 52 bits of | 
 | address space for both userspace and kernel addresses. However, any | 
 | kernel binary that supports 52-bit VAs must also be able to fall back | 
 | to 48-bit at early boot time if the hardware feature is not present. | 
 |  | 
 | This fallback mechanism requires the kernel .text to be placed in the | 
 | higher addresses, so that its addresses are invariant to 48/52-bit | 
 | VAs. Because the kasan shadow is a fraction of the entire kernel VA | 
 | space, the end of the kasan shadow must also be in the higher half of | 
 | the kernel VA space for both 48/52-bit. (When switching from 48-bit to | 
 | 52-bit, the end of the kasan shadow is invariant and dependent on | 
 | ~0UL, whilst the start address will "grow" towards the lower | 
 | addresses.) | 
 |  | 
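The way the shadow start "grows" downwards can be reproduced from the two
layout tables above. The sketch below takes the shadow end address from those
tables and assumes that the shadow covers one eighth of the kernel VA space
(a scale shift of 3); that ratio is inferred from the table sizes (32TB out
of 256TB, 512TB out of 4PB) and is an assumption here, as are all the names.

```c
#include <assert.h>
#include <stdint.h>

/* End of the kasan shadow region, identical in both layout tables. */
#define EXAMPLE_KASAN_SHADOW_END	0xffffa00000000000ULL
/* Assumption: one shadow byte per 8 bytes of kernel VA space. */
#define EXAMPLE_SHADOW_SCALE_SHIFT	3

/* Start of the shadow region for a given VA size (illustrative only). */
static uint64_t example_kasan_shadow_start(unsigned int va_bits)
{
	assert(va_bits > EXAMPLE_SHADOW_SCALE_SHIFT && va_bits < 64);
	/* The end is fixed; only the size depends on va_bits. */
	return EXAMPLE_KASAN_SHADOW_END -
	       (1ULL << (va_bits - EXAMPLE_SHADOW_SCALE_SHIFT));
}
```
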
 | In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET | 
 | is kept constant at 0xFFF0000000000000 (corresponding to 52-bit), | 
 | which obviates the need for an extra variable read. The physvirt | 
 | offset and vmemmap offsets are computed at early boot to enable | 
 | this logic. | 
 |  | 
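One way to picture this scheme is a single offset that is computed once at
boot and then added or subtracted on every conversion. The following is a
simplified sketch under that assumption, not the kernel's actual code; every
name prefixed with example_ or EXAMPLE_ is hypothetical, and the physical
base address is supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Constant PAGE_OFFSET value quoted in the text (52-bit). */
#define EXAMPLE_PAGE_OFFSET	0xfff0000000000000ULL

/* Hypothetical boot-time state: physical base minus linear map start. */
static uint64_t example_physvirt_offset;

/* Compute the offset once, for the VA size the hardware supports. */
static void example_init_physvirt(uint64_t phys_base, unsigned int vabits_actual)
{
	uint64_t linear_start;

	assert(vabits_actual == 48 || vabits_actual == 52);
	/* Start of the linear map: all bits above vabits_actual set. */
	linear_start = ~0ULL << vabits_actual;
	example_physvirt_offset = phys_base - linear_start;
}

/* Conversions need only the precomputed offset (unsigned wraparound). */
static uint64_t example_phys_to_virt(uint64_t pa)
{
	return pa - example_physvirt_offset;
}

static uint64_t example_virt_to_phys(uint64_t va)
{
	return va + example_physvirt_offset;
}
```

With a 52-bit VA size the first physical address maps to EXAMPLE_PAGE_OFFSET;
with the 48-bit fallback it maps to 0xffff000000000000, the start of the
kernel logical memory map in the 48-bit table above.
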
 | As a single binary will need to support both 48-bit and 52-bit VA | 
 | spaces, the VMEMMAP must be sized large enough for 52-bit VAs and | 
 | also must be sized large enough to accommodate a fixed PAGE_OFFSET. | 
 |  | 
 | Most code in the kernel should not need to consider the VA_BITS. For | 
 | code that does need to know the VA size, the variables are defined as | 
 | follows: | 
 |  | 
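 | =============  ========  =========================== | 
 | VA_BITS        constant  the *maximum* VA space size | 
 | VA_BITS_MIN    constant  the *minimum* VA space size | 
 | vabits_actual  variable  the *actual* VA space size | 
 | =============  ========  =========================== | 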
 |  | 
 |  | 
 | Maximum and minimum sizes can be useful to ensure that buffers are | 
 | sized large enough or that addresses are positioned close enough for | 
 | the "worst" case. | 
 |  | 
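A typical use of the maximum is to size a static buffer for the worst case
while bounding run-time accesses by the actual value. The sketch below
illustrates the pattern for a 64KB page, 52-bit capable configuration; the
EXAMPLE_ constants, the example_vabits_actual variable and the tracking array
are stand-ins invented for this illustration.

```c
#include <stddef.h>

#define EXAMPLE_VA_BITS		52	/* stand-in for VA_BITS (maximum) */
#define EXAMPLE_VA_BITS_MIN	48	/* stand-in for VA_BITS_MIN */
#define EXAMPLE_PGDIR_SHIFT	42	/* first-level index starts at bit 42 */

/* Worst-case number of first-level descriptors (1024 for 52-bit). */
#define EXAMPLE_MAX_PGD_ENTRIES	(1UL << (EXAMPLE_VA_BITS - EXAMPLE_PGDIR_SHIFT))

/* Stand-in for vabits_actual; a real kernel sets this at early boot. */
static unsigned int example_vabits_actual = EXAMPLE_VA_BITS_MIN;

/* Sized at compile time for the maximum, so it fits either VA size. */
static unsigned char example_pgd_seen[EXAMPLE_MAX_PGD_ENTRIES];

/* Number of first-level descriptors valid for the actual VA size. */
static size_t example_pgd_entries_in_use(void)
{
	return (size_t)1 << (example_vabits_actual - EXAMPLE_PGDIR_SHIFT);
}

/* Record an index; reject indices beyond the actual VA size. */
static int example_mark_pgd_entry(size_t index)
{
	if (index >= example_pgd_entries_in_use())
		return -1;
	example_pgd_seen[index] = 1;
	return 0;
}
```
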
 | 52-bit userspace VAs | 
 | -------------------- | 
 | To maintain compatibility with software that relies on the ARMv8.0 | 
 | VA space maximum size of 48 bits, the kernel will, by default, | 
 | return virtual addresses to userspace from a 48-bit range. | 
 |  | 
 | Software can opt in to receiving VAs from a 52-bit space by | 
 | specifying an mmap hint parameter that is larger than 48 bits. | 
 |  | 
 | For example: | 
 |  | 
 | .. code-block:: c | 
 |  | 
 |    maybe_high_address = mmap((void *)~0UL, size, prot, flags, ...); | 
 |  | 
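A slightly fuller sketch of the opt-in, assuming a 64-bit Linux userspace and
an anonymous private mapping (the helper names are illustrative). Whether the
returned address actually lies above the 48-bit range depends on the kernel
configuration and the hardware, so the caller should check rather than assume.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Request an anonymous mapping with a hint above the 48-bit range.
 * The hint is advisory: the kernel may still return a 48-bit address.
 * Returns MAP_FAILED on error, like mmap itself.
 */
static void *map_with_high_hint(size_t size)
{
	assert(size > 0);
	return mmap((void *)~0UL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* True if the address does not fit in 48 bits. */
static bool needs_more_than_48_bits(const void *addr)
{
	return ((uintptr_t)addr >> 48) != 0;
}
```

Software that stores tags or other data in the upper bits of pointers can use
a check like needs_more_than_48_bits to detect that it has been handed an
address from the larger range.
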
 | It is also possible to build a debug kernel that returns addresses | 
 | from a 52-bit space by enabling the following kernel config options: | 
 |  | 
 | .. code-block:: sh | 
 |  | 
 |    CONFIG_EXPERT=y | 
 |    CONFIG_ARM64_FORCE_52BIT=y | 
 |  | 
 | Note that this option is only intended for debugging applications | 
 | and should not be used in production. |