Introduced with iOS9, the kernel integrity protection aka watchtower aka KPP, posed a new problem to arm64 jailbreaks. Effectively, it was observed that after patching the kernel code - customary in older jailbreaks - the device would silently panic after a while. It soon became obvious that some thing was checking the kernel code. That thing was outside the kernel, and it became evident pretty quickly that there was a kind of hypervisor at work.

To this day, I’m still surprised nobody did a write-up on the KPP – at least none that I know of. So, I will try to explain how the hypervision works. Specific implementation details will maybe come at a later time, provided I have the time for it. Now, without further ado, here we go…

Part 1 (the setup)

The KPP lies in a Mach-O executable file appended right after the compressed kernel chunk, inside the kernelcache img4. The iBoot carves out the KPP image, loads it at 0x4100000000 and runs it in EL3.


EL3

_start(monitor_boot_args *mba)

The structure received upon entry looks like this:

struct monitor_boot_args {
    uint64_t version;
    uint64_t virtBase;
    uint64_t physBase;
    uint64_t memSize;
    struct kernel_boot_args *kernArgs;
    uint64_t kernEntry;
    uint64_t kernPhysBase;
    uint64_t kernPhysSlide;
    uint64_t kernVirtSlide;
};

The KPP overwrites its own Mach-O header at 0x4100000000 with a trampoline which calls _start(NULL) and installs two exception handlers, sync_handler and irq_handler. Recall the AArch64 exception table:

VBAR_ELn Exception Type Description
+0x000 Synchronous Current EL with SP0
+0x080 IRQ/vIRQ  
+0x100 FIQ/vFIQ  
+0x180 SError/vSError  
+0x200 Synchronous Current EL with SPx
+0x280 IRQ/vIRQ  
+0x300 FIQ/vFIQ  
+0x380 SError/vSError  
+0x400 Synchronous Lower EL using AArch64
+0x480 IRQ/vIRQ  
+0x500 FIQ/vFIQ  
+0x580 SError/vSError  
+0x600 Synchronous Lower EL using AArch32
+0x680 IRQ/vIRQ  
+0x700 FIQ/vFIQ  
+0x780 SError/vSError  

ExceptionVector with location of the two handlers.

Next, it parses the kernel and its kexts (from __PRELINK_INFO):

  • save __TEXT, __DATA segments to map list
  • save __TEXT, __DATA::__const zones to hash list

Finally, if enabled – which always is – it sets (among others) the following registers:

CPACR_EL1 = 0x100000;  // CPACR_EL1.FPEN=1, causes instructions in EL0 that use the Floating Point execution to be trapped
CPTR_EL3 = 0x80000000; // CPTR_EL3.TCPAC=1, accesses to CPACR_EL1 will trap from EL2 and EL1 to EL3
SCR_EL3 = 0x631;       // SCR_EL3.IRQ=0, When executing at any Exception level, physical IRQ interrupts are NOT taken to EL3
                       // SCR_EL3.SMD=0, SMC instructions are ENABLED at EL1 and above
                       // SCR_EL3.SIF=1, Secure state instruction fetches from Non-secure memory are NOT permitted

EL1

Kernel starts executing in EL1:

_start() => start_first_cpu() => arm_init():
    => cpu_machine_idle_init() => monitor_call(0x800)
    => machine_startup() => kernel_bootstrap() => kernel_bootstrap_thread() => monitor_call(0x801)
_start_cpu() => arm_init_cpu() => cpu_machine_idle_init() => monitor_call(0x800)

monitor_call() will escalate to EL3 into hypervisor’s sync_handler:


EL3

sync_handler:

if (ESR_EL3 == 0x5E000011) { // ESR_EL3.EC==0x17 && ESR_EL3.IL==1 && ESR_EL3.ISS==0x11 aka "SMC #0x11" aka monitor_call() inside the kernel
    switch (arg0) {
        case 0x800: // called by cpu_machine_idle_init()
            /* save kernel entrypoint */
            return ok;
        case 0x801: // called by kernel_bootstrap_thread()
            if (enabled) {
                if (locked) {
                    FAIL(4);
                }
                /* do lockdown:
                 * hash all regions from hash list
                 * initialize some vars
                 * save SCTLR_EL1, TCR_EL1, TTBR1_EL1, VBAR_EL1
                 */
                ...
                SCR_EL3.SMD=1;
                locked = 1;
            }
            return OK;
        case 0x802: // wtf is this shit?
            FAIL(5);
    }
}
(to be continued)

When something goes wrong, FAIL(code) sets a global variable violated and signals the kernel by:

ESR_EL1 = 0xBF575400 | code // ESR_EL1.EC=0x2F, ESR_EL1.ISV=1, ESR_EL1.IS=0x575400|code
Code Meaning
1 violation in frame
2 bad syscall
3 not locked
4 already locked
5 software request
6 invalid TTE/PTE
7 violation in mapping
8 violation in system register

Execution is then transferred down as SError back into kernel’s ExceptionTable:

SError => fleh_serror() => sleh_serror() => kernel_integrity_error_handler() => panic()

Otherwise, if everything is fine and dandy, execution resumes in the kernel, right after monitor_call().

This was the setup phase, required in order to let the kernel set-up the write-once memory locations. Next, onto the heartbeat phase…

Part 2 (the ticking)

Meanwhile, userland runs code. When a FPU instruction is executed, CPACR_EL1.FPEN==1 causes a trap in kernel.


EL1

In the running kernel, fleh_synchronous() fleh_irq() fleh_fiq() and fleh_serror() all eventually tail to

exception_return_dispatch() => check_user_asts() => MSR CPACR_EL1, X0

When CPACR_EL1 is hit, it is trapped by EL3, as per CPTR_EL3.TCPAC==1 and execution is transferred to EL3 sync_handler.


EL3

This time, it means business.

sync_handler:

(continued)
else if (ESR_EL3 == 0x62340400) { // ESR_EL3.EC==0x18 && ESR_EL3.IL==1 && ESR_EL3.ISS==0x340400 aka trapped by "MSR CPACR_EL1, X0"
    if (violated) {
        FAIL(1);
    }
    if (!locked) {
        FAIL(3);
    }
    if (!(++number_of_hits & watchtower_throttle)) {
        if (!(++flip_flop & 1)) {
            if (hash_is_ready) {
                blake2b_final(&hash, digest);
                if (memcmp(cur->digest, digest, 32)) {
                    FAIL(1);
                }
                cur = get_next_region();
                if (!cur) {
                    cur = get_first_region();
                }
                cur_data_ptr = cur->base;
                cur_data_left = cur->size;
                blake2b_init(&hash);
                hash_is_ready = 0;
            } else {
                chunk = min(cur_data_left, 128);
                blake2b_update(&hash, cur_data_ptr, chunk);
                cur_data_ptr += chunk;
                cur_data_left -= chunk;
                if (!cur_data_left) {
                    hash_is_ready = 1;
                }
            }
        } else {
            /* walk and check TTE/PTE
             * verify map list
             * check system registers SCTLR_EL1, TCR_EL1, TTBR1_EL1, VBAR_EL1
             */
            ...
        }
    }
    ELR_EL3 += 4;         // skip insn
    CPTR_EL3 = 0;         // CPTR_EL3.TCPAC=0, accesses to CPACR_EL1 will not trap from EL2 and EL1 to EL3
    CPACR_EL1 = 0x300000; // CPACR_EL1.FPEN=3, does not cause any FPU instruction to be trapped
    SCR_EL3 = 0x6B3;      // SCR_EL3.IRQ=1, When executing at any Exception level, physical IRQ interrupts are taken to EL3
                          // SCR_EL3.SMD=1, SMC instructions are UNDEFINED at EL1 and above
                          // SCR_EL3.SIF=1, Secure state instruction fetches from Non-secure memory are NOT permitted
    return OK;
}

As you can see, it skips a few beats once in a while (watchtower_throttle==4) to go easy on CPU usage and/or battery. It also slowly but surely crawl over all protected areas in a sequential manner.

If all checks pass, the hypervisor disables FPU trapping (allowing the FPU to finally execute), enables IRQ to EL3 (to make sure it is hit again), resumes kernel right after CPACR_EL1 hit and waits.


EL1

Kernel/userland runs happily. When the next IRQ fires, it is taken to hypervisor’s EL3 IRQ handler.


EL3

irq_handler:

CPACR_EL1 = 0x100000;  // CPACR_EL1.FPEN=1, causes instructions in EL0 that use the Floating Point execution to be trapped
CPTR_EL3 = 0x80000000; // CPTR_EL3.TCPAC=1, accesses to CPACR_EL1 will trap from EL2 and EL1 to EL3
SCR_EL3 = 0x431;       // SCR_EL3.IRQ=0, When executing at any Exception level, physical IRQ interrupts are NOT taken to EL3
                       // SCR_EL3.SMD=0, SMC instructions are ENABLED at EL1 and above
                       // SCR_EL3.SIF=0, Secure state instruction fetches from Non-secure memory are permitted

That is: reset IRQs back to EL1, re-enable FPU trapping, re-enable trap for CPACR_EL1 accesses.

These last 4 steps are then repeated forever.

* * *

In summary: KPP makes sure the FPU always trap and the trap cannot be disabled. When FPU hits (tick) the kernel tries to disable the trapping but is immediately taken to KPP. KPP then runs its checks, frees the FPU, but routes the IRQs to itself. As soon as any IRQ fires (tock) it makes the FPU trap again and de-routes the IRQs.

This is the engine that keeps the hypervisor beating. If you patch away the trigger, that is, CPACR_EL1 access, the FPU can’t execute. However, there is one catch. We can “steal” away the CPACR_EL1 access to a separate trampoline:

  1. undo patches
  2. hit CPACR_EL1, hypervisor runs and restores execution right after our CPACR_EL1
  3. redo patches
  4. profit

This bypass was demo-ed by @qwertyoruiop in yalu102.