|
| 1 | +@page page_kernel_smp_boot QEMU virt64 AArch64 SMP Boot Flow |
| 2 | + |
| 3 | +# QEMU virt64 AArch64 SMP Boot Flow |
| 4 | + |
| 5 | +This guide walks through the multi-core boot path of RT-Thread on AArch64 using `bsp/qemu-virt64-aarch64` as the concrete reference. It is written to be beginner-friendly and mirrors the current BSP implementation: from `_start` assembly, early MMU bring-up, `rtthread_startup()`, PSCI wakeup of secondary cores, to all CPUs entering the scheduler. The key transitions are paired with static SVG diagrams so the flow stays readable on GitHub and in generated documentation. |
| 6 | + |
| 7 | +- Target setup: QEMU `-machine virt`, `-cpu cortex-a57`, `-smp N` with N >= 2, `RT_USING_SMP` enabled, and a device tree whose CPU nodes contain `enable-method = "psci"`. |
| 8 | +- Goal: Know who does what, where the code lives, and what to check when SMP does not come up. |
| 9 | + |
| 10 | +## Big Picture First |
| 11 | + |
| 12 | + |
| 13 | + |
| 14 | +The overview highlights the two key boundaries in this BSP: CPU0 completes the early assembly and common board bring-up, then `rt_hw_secondary_cpu_up()` hands secondary cores over to their own ASM plus C initialization path before everyone meets in the scheduler. |
| 15 | + |
| 16 | +## Boot CPU: from `_start` to MMU on |
| 17 | + |
| 18 | +**Input registers**: QEMU firmware loads the image and jumps to `_start` in `libcpu/aarch64/cortex-a/entry_point.S`, passing the DTB physical address in `x0` (`x1` to `x3` are reserved). |
| 19 | + |
| 20 | +**What `_start` does (short version)** |
| 21 | + |
| 22 | +1. Clear thread pointers: zero `tpidr_el1` and `tpidrro_el0` to avoid stale per-CPU state. |
| 23 | +2. Unify exception level: `init_cpu_el` drops to EL1h, enables timer access, masks unwanted traps. |
| 24 | +3. Clear BSS: `init_kernel_bss` fills `__bss` with zeros so globals start clean. |
| 25 | +4. Prepare stack: `init_cpu_stack_early` switches to SP_EL1 and uses `.boot_cpu_stack_top` as the early stack. |
| 26 | +5. Remember the FDT: `rt_hw_fdt_install_early(x0)` stores DTB address/size before MMU is enabled. |
| 27 | +6. Early MMU mapping: `init_mmu_early`/`enable_mmu_early` build an identity map of the 0 to 1 GiB range, set TTBR0/TTBR1 and SCTLR_EL1, flush the I/D caches and TLB, then branch to `rtthread_startup()` (address held in `x8`). |
| 28 | + |
| 29 | +> Tip: the early page table only covers minimal kernel space; the C phase will remap a fuller layout. |
| 30 | + |
| 31 | +## C-side startup backbone |
| 32 | + |
| 33 | +`rtthread_startup()` (in `src/components.c`) is the spine of the sequence: |
| 34 | + |
| 35 | +- **Interrupts off + spinlock ready**: `rt_hw_local_irq_disable()` followed by `_cpus_lock` init to keep early steps non-preemptible. |
| 36 | +- **Board init**: the BSP's `rt_hw_board_init()` (`bsp/qemu-virt64-aarch64/drivers/board.c`) directly calls the shared `rt_hw_common_setup()` (`libcpu/aarch64/common/setup.c`) to: |
| 37 | + - set VBAR, build kernel address space, copy DTB to a safe region and pre-parse it; |
| 38 | + - configure MMU mappings; init memblock/page allocator/system heap; |
| 39 | + - parse DT for console, memory, initrd; |
| 40 | + - init GIC (and GICv3 Redistributor if enabled), UART, global GTIMER; |
| 41 | + - install SMP IPIs (`RT_SCHEDULE_IPI`, `RT_STOP_IPI`, `RT_SMP_CALL_IPI`) and unmask them; |
| 42 | + - set idle hook `rt_hw_idle_wfi` so idle CPUs enter low-power wait. |
| 43 | +- **Kernel subsystems**: init system timer, scheduler, signals, and create main/timer/idle threads. |
| 44 | +- **Start scheduling**: `rt_system_scheduler_start()` runs `main_thread_entry()` first. |
| 45 | + |
| 46 | +## How secondary cores are brought up |
| 47 | + |
| 48 | +`main_thread_entry()` calls `rt_hw_secondary_cpu_up()` before invoking user `main()`, so the secondary CPUs are woken and can join scheduling before application code runs. |
| 49 | + |
| 50 | +### What `rt_hw_secondary_cpu_up()` does |
| 51 | + |
| 52 | +1. Convert `_secondary_cpu_entry` to a physical address via `rt_kmem_v2p()`; this is the real entry address the firmware jumps to. |
| 53 | +2. Walk CPU nodes recorded at boot (`cpu_info_init()` stored DTB info in `cpu_np[]` and `rt_cpu_mpidr_table[]`). |
| 54 | +3. Read `enable-method`: |
| 55 | + - QEMU virt64: `"psci"` → use `cpu_psci_ops.cpu_boot()` to issue `CPU_ON(target, entry)` to firmware. |
| 56 | + - Legacy compatibility: `"spin-table"` → write `cpu-release-addr` and `sev` to wake. |
| 57 | +4. Any failure prints a warning but does not halt the boot flow, making diagnosis easier. |
| 58 | + |
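The dispatch on `enable-method` described above can be sketched in plain C. This is only an illustration under stated assumptions, not the code in `setup.c`: `struct demo_cpu` and every `demo_*` function are hypothetical stand-ins, the firmware calls are replaced by stubs that print what they would do, and the loop is assumed to skip index 0 because that is the boot CPU. The only value taken from outside this guide is the PSCI function ID for `CPU_ON` in the SMC64 calling convention.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PSCI function ID for CPU_ON in the SMC64 calling convention. */
#define DEMO_PSCI_CPU_ON_64 0xC4000003u

/* Hypothetical per-CPU record; the real code reads this from cpu_np[]. */
struct demo_cpu {
    const char *enable_method; /* value of the DT "enable-method" property */
    uint64_t    mpidr;         /* affinity value used as the wakeup target */
};

/* Stub for the PSCI call; a real port would trap to firmware here. */
static int demo_psci_cpu_on(uint64_t target, uint64_t entry_pa)
{
    printf("CPU_ON(0x%08x): target=0x%" PRIx64 " entry=0x%" PRIx64 "\n",
           DEMO_PSCI_CPU_ON_64, target, entry_pa);
    return 0;
}

/* Stub for spin-table: would store entry_pa at cpu-release-addr, then sev. */
static int demo_spin_table_release(uint64_t target, uint64_t entry_pa)
{
    printf("spin-table: target=0x%" PRIx64 " entry=0x%" PRIx64 "\n",
           target, entry_pa);
    return 0;
}

/* Returns the number of secondary CPUs started, or -1 if the entry is bad. */
int demo_secondary_cpu_up(const struct demo_cpu *cpus, int count,
                          uint64_t entry_pa)
{
    int started = 0;

    assert(cpus != NULL);
    if (entry_pa == 0)
        return -1; /* assumed check: a zero physical entry is unusable */

    for (int i = 1; i < count; i++) { /* index 0 is the boot CPU */
        int ret = -1;

        if (strcmp(cpus[i].enable_method, "psci") == 0)
            ret = demo_psci_cpu_on(cpus[i].mpidr, entry_pa);
        else if (strcmp(cpus[i].enable_method, "spin-table") == 0)
            ret = demo_spin_table_release(cpus[i].mpidr, entry_pa);

        if (ret != 0) {
            /* Warn and keep going, so one bad core does not stop the boot. */
            printf("warning: cpu %d not started (enable-method \"%s\")\n",
                   i, cpus[i].enable_method);
            continue;
        }
        started++;
    }
    return started;
}
```

Calling `demo_secondary_cpu_up()` with a four-entry array in which one secondary core carries an unknown method starts the other two and prints one warning, which mirrors the "warn but keep booting" behaviour in step 4.
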
| 59 | +### What happens on a secondary core |
| 60 | + |
| 61 | +- **Assembly entry `_secondary_cpu_entry`**: |
| 62 | + - Read `mpidr_el1`, compare with `rt_cpu_mpidr_table` to find the logical CPU id, store it back, and write it into `TPIDR` for per-cpu access. |
| 63 | + - Allocate its own stack by offsetting `ARCH_SECONDARY_CPU_STACK_SIZE` per core. |
| 64 | + - Re-run `init_cpu_el`/`init_cpu_stack_early`, reuse the same early MMU path, then branch to `rt_hw_secondary_cpu_bsp_start()`. |
| 65 | + |
| 66 | +- **C-side handoff `rt_hw_secondary_cpu_bsp_start()`** (`libcpu/aarch64/common/setup.c`): |
| 67 | + - Reset VBAR and synchronize with the boot CPU via `_cpus_lock`. |
| 68 | + - Update this core's MPIDR entry and bind the shared `MMUTable`. |
| 69 | + - Init local vector table, GIC CPU interface (and GICv3 Redistributor if present), enable the local GTIMER. |
| 70 | + - Unmask the three SMP IPIs; re-calibrate `loops_per_tick` for microsecond delay if needed. |
| 71 | + - Call `rt_dm_secondary_cpu_init()` to register the CPU device, then enter the scheduler via `rt_system_scheduler_start()`. |
| 72 | + |
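The first two assembly steps (finding the logical CPU id from `mpidr_el1` and picking a per-core stack) are easier to follow in C. The sketch below is an assumption-laden model, not a transcription of `entry_point.S`: the table contents, `DEMO_SECONDARY_STACK_SIZE` (standing in for `ARCH_SECONDARY_CPU_STACK_SIZE`), and the downward stack layout are all hypothetical. The affinity mask follows the architectural `MPIDR_EL1` layout (Aff3 in bits 39:32, Aff2 to Aff0 in bits 23:0).

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CPUS_NR              4
#define DEMO_SECONDARY_STACK_SIZE 4096u /* stand-in for ARCH_SECONDARY_CPU_STACK_SIZE */

/* Keep only the affinity fields: Aff3 [39:32] and Aff2..Aff0 [23:0]. */
#define DEMO_MPIDR_AFF_MASK 0xFF00FFFFFFull

/* Hypothetical contents; the real rt_cpu_mpidr_table[] is filled from the DTB. */
static const uint64_t demo_mpidr_table[DEMO_CPUS_NR] = { 0x0, 0x1, 0x2, 0x3 };

/* Map a raw MPIDR_EL1 value to a logical CPU id, or -1 if it is unknown. */
int demo_logical_cpu_id(uint64_t mpidr_el1)
{
    uint64_t aff = mpidr_el1 & DEMO_MPIDR_AFF_MASK;

    for (int i = 0; i < DEMO_CPUS_NR; i++) {
        if (demo_mpidr_table[i] == aff)
            return i;
    }
    return -1;
}

/*
 * Assumed layout: secondary stacks are carved downward from region_top,
 * one DEMO_SECONDARY_STACK_SIZE slot per core, with core 1 on top.
 */
uintptr_t demo_secondary_stack_top(uintptr_t region_top, int cpu_id)
{
    assert(cpu_id >= 1 && cpu_id < DEMO_CPUS_NR);
    return region_top - (uintptr_t)(cpu_id - 1) * DEMO_SECONDARY_STACK_SIZE;
}
```

With this table, a raw value of `0x80000001` (affinity 1 plus the reserved bit 31) resolves to logical CPU 1, and each further core gets a stack top one slot lower than the previous one.
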
| 73 | +### Bring-up timeline |
| 74 | + |
| 75 | + |
| 76 | + |
| 77 | +Read this figure from top to bottom: CPU0 reaches `rt_system_scheduler_start()` first, `main_thread_entry()` triggers `CPU_ON`, and each secondary CPU repeats the minimal EL plus MMU path before entering `rt_hw_secondary_cpu_bsp_start()`. |
| 78 | + |
| 79 | +## Source map (where to read the code) |
| 80 | + |
| 81 | +| Stage | File | Role | |
| 82 | +| --- | --- | --- | |
| 83 | +| Boot assembly | `libcpu/aarch64/cortex-a/entry_point.S` | `_start`, `_secondary_cpu_entry`, early MMU enable | |
| 84 | +| BSP hook | `bsp/qemu-virt64-aarch64/drivers/board.c` | Wires `rt_hw_board_init()` to `rt_hw_common_setup()` | |
| 85 | +| Memory/GIC/IPI init | `libcpu/aarch64/common/setup.c` | `rt_hw_common_setup()`, `rt_hw_secondary_cpu_up()`, `rt_hw_secondary_cpu_bsp_start()` | |
| 86 | +| C entry skeleton | `src/components.c` | `rtthread_startup()`, `main_thread_entry()` | |
| 87 | + |
| 88 | +## Quick checks when SMP fails to come up |
| 89 | + |
| 90 | +- Device tree: contains `enable-method = "psci"` and QEMU is started with `-machine virt` (PSCI firmware included). |
| 91 | +- `_secondary_cpu_entry` physical address: `rt_kmem_v2p()` must not return 0; otherwise the entry-address check fails and the secondary cores cannot be started. |
| 92 | +- Init order: the GIC and timer must be ready before `rt_hw_secondary_cpu_up()` is called; if you fork a custom BSP, initialize them first. |
| 93 | +- UART logs: look for `Call cpu X on success/failed`; add extra prints in `_secondary_cpu_entry` if needed, and use QEMU `-d cpu_reset -smp N` to debug. |
| 94 | + |
| 95 | +## AArch64 pocket notes (just enough) |
| 96 | + |
| 97 | +- **Exception levels**: startup may be at EL3/EL2; `init_cpu_el` descends to EL1h where the kernel runs. |
| 98 | +- **Two stack pointers**: `msr spsel, #1` selects `SP_EL1`, so EL0 code (which uses `SP_EL0`) cannot touch the kernel stack. |
| 99 | +- **MMU bring-up order**: build page tables → configure TCR/TTBR → flush cache/TLB → set `SCTLR_EL1.M/C/I` → `isb`. |
| 100 | +- **MPIDR**: unique core affinity; stored in `rt_cpu_mpidr_table[]` to map logical CPU ids and IPI targets. |
| 101 | + |
| 102 | +With these in place, the QEMU virt64 AArch64 BSP SMP path is clear: the boot CPU prepares memory and shared peripherals, `main_thread_entry()` issues PSCI wakeups, secondary cores land with the same MMU/EL setup, and all CPUs join the scheduler. |