From aeb7cd3ca58563d66ac85f5aa7c2c06c686ccca1 Mon Sep 17 00:00:00 2001
From: HeatCrab
Date: Sat, 13 Dec 2025 21:13:43 +0800
Subject: [PATCH] Implement kernel stack isolation for U-mode tasks

User mode tasks require kernel stack isolation to prevent malicious or
corrupted user stack pointers from compromising kernel memory during
interrupt handling. Without this protection, a user task could set its
stack pointer to an invalid or controlled address, causing the ISR to
write trap frames to arbitrary memory locations.

This commit implements stack isolation using the mscratch register as a
discriminator between machine mode and user mode execution contexts. The
ISR entry performs a blind swap with mscratch: for machine mode tasks
(mscratch=0), the swap is immediately undone to restore the kernel stack
pointer. For user mode tasks (mscratch=kernel_stack), the swap provides
the kernel stack while preserving the user stack pointer in mscratch.
Each user mode task is allocated a dedicated 512-byte kernel stack to
ensure complete isolation between tasks and prevent stack overflow
attacks.

The task control block is extended to track per-task kernel stack
allocations. A global pointer references the current task's kernel stack
and is updated during each context switch. The ISR loads this pointer to
access the appropriate per-task kernel stack through mscratch, replacing
the previous approach of using a single global kernel stack shared by all
user mode tasks.

The interrupt frame structure is extended to include dedicated storage
for the stack pointer. Task initialization zeroes the entire frame and
correctly sets the initial stack pointer to support the new restoration
path. For user mode tasks, the initial ISR frame is constructed on the
kernel stack rather than the user stack, ensuring the frame is protected
from user manipulation. Enumeration constants replace magic number usage
for improved code clarity and consistency.
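
For illustration only (this is not part of the patch), the blind swap
described above can be modeled in portable C. The names `cpu_model`,
`swap_sp_mscratch`, and `isr_entry_model`, as well as all addresses, are
hypothetical; the real logic is the `_isr` assembly in arch/riscv/boot.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers the ISR entry touches. */
struct cpu_model {
    uint32_t sp;       /* x2, the stack pointer */
    uint32_t mscratch; /* machine scratch CSR */
};

/* Models `csrrw sp, mscratch, sp`: exchange SP and mscratch. */
static void swap_sp_mscratch(struct cpu_model *cpu)
{
    uint32_t old_sp = cpu->sp;
    cpu->sp = cpu->mscratch;
    cpu->mscratch = old_sp;
}

/* Models the entry decision. Returns 1 for a trap taken from U-mode and
 * 0 for a trap taken from M-mode. Afterwards cpu->sp is the stack on
 * which the trap frame would be allocated.
 */
int isr_entry_model(struct cpu_model *cpu)
{
    swap_sp_mscratch(cpu); /* blind swap */
    if (cpu->sp != 0)
        return 1; /* U-mode: SP = kernel stack, mscratch = user SP */
    swap_sp_mscratch(cpu); /* M-mode: undo the swap, mscratch is 0 again */
    return 0;
}

/* Self-check covering both conventions; the addresses are made up. */
void isr_entry_model_selfcheck(void)
{
    /* M-mode task: mscratch = 0, SP is already a kernel-owned stack. */
    struct cpu_model m = {.sp = 0x80010000u, .mscratch = 0};
    assert(isr_entry_model(&m) == 0);
    assert(m.sp == 0x80010000u && m.mscratch == 0);

    /* U-mode task with a corrupted SP: the frame goes to the kernel stack. */
    struct cpu_model u = {.sp = 0xDEADBEEFu, .mscratch = 0x80020200u};
    assert(isr_entry_model(&u) == 1);
    assert(u.sp == 0x80020200u && u.mscratch == 0xDEADBEEFu);
}
```

In the user mode case of this model, the corrupted SP is never used as a
frame address; it is only parked in mscratch until the entry path reads
it back.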

The ISR implementation now includes separate entry and restoration paths
for each privilege mode. The M-mode path maintains mscratch=0 throughout
execution. The U-mode path saves the user stack pointer from mscratch
immediately after frame allocation and restores mscratch to the current
task's kernel stack address before returning to user mode, enabling the
next trap to use the correct per-task kernel stack.

Task initialization was updated to configure mscratch appropriately
during the first dispatch. The dispatcher checks the current privilege
level and sets mscratch to zero for machine mode tasks. For user mode
tasks, it loads the current task's kernel stack pointer if available,
with a fallback to the global kernel stack for initial dispatch before
the first task switch. The main scheduler initialization ensures the
first task's kernel stack pointer is set before entering the scheduling
loop.

The user mode output system call was modified to bypass the asynchronous
logger queue and implement task-level synchronization. Direct output
ensures strict FIFO ordering for test output clarity, while preventing
task preemption during character transmission avoids interleaving when
multiple user tasks print concurrently. This ensures each string is
output atomically with respect to other tasks.

A test helper function was added to support stack pointer manipulation
during validation. Following the Linux kernel's context switching
pattern, this provides precise control over stack operations without
compiler interference. The validation harness uses this to verify
syscall stability under corrupted stack pointer conditions.

Documentation updates include the calling convention guide's stack
layout section, which now distinguishes between machine mode and user
mode task stack organization with detailed diagrams of the dual-stack
design.
The context switching guide's task initialization section reflects the
updated function signature for building initial interrupt frames with
per-task kernel stack parameters.

Testing validates that system calls succeed even when invoked with a
malicious stack pointer (0xDEADBEEF), confirming the ISR correctly uses
the per-task kernel stack from mscratch rather than the user-controlled
stack pointer.
---
 Documentation/hal-calling-convention.md   |  55 ++++++-
 Documentation/hal-riscv-context-switch.md |  34 +++-
 app/umode.c                               |  53 ++++--
 arch/riscv/boot.c                         | 189 ++++++++++++++++++----
 arch/riscv/entry.c                        |  24 +++
 arch/riscv/hal.c                          | 163 +++++++++++++------
 arch/riscv/hal.h                          |  18 ++-
 include/sys/task.h                        |   4 +
 kernel/main.c                             |   5 +
 kernel/syscall.c                          |  17 +-
 kernel/task.c                             |  33 +++-
 11 files changed, 486 insertions(+), 109 deletions(-)

diff --git a/Documentation/hal-calling-convention.md b/Documentation/hal-calling-convention.md
index 6df5017f..a8ed9d30 100644
--- a/Documentation/hal-calling-convention.md
+++ b/Documentation/hal-calling-convention.md
@@ -109,14 +109,14 @@ void hal_context_restore(jmp_buf env, int32_t val); /* Restore context + process
 The ISR in `boot.c` performs a complete context save of all registers:
 
 ```
-Stack Frame Layout (144 bytes, 33 words × 4 bytes, offsets from sp):
+Stack Frame Layout (144 bytes, 36 words × 4 bytes, offsets from sp):
 0: ra, 4: gp, 8: tp, 12: t0, 16: t1, 20: t2
 24: s0, 28: s1, 32: a0, 36: a1, 40: a2, 44: a3
 48: a4, 52: a5, 56: a6, 60: a7, 64: s2, 68: s3
 72: s4, 76: s5, 80: s6, 84: s7, 88: s8, 92: s9
 96: s10, 100:s11, 104:t3, 108: t4, 112: t5, 116: t6
-120: mcause, 124: mepc, 128: mstatus
-132-143: padding (12 bytes for 16-byte alignment)
+120: mcause, 124: mepc, 128: mstatus, 132: sp (for restore)
+136-143: padding (8 bytes for 16-byte alignment)
 ```
 
 Why full context save in ISR?
@@ -127,12 +127,14 @@ Why full context save in ISR?
### ISR Stack Requirements -Each task stack must reserve space for the ISR frame: +Each task requires space for the ISR frame: ```c -#define ISR_STACK_FRAME_SIZE 144 /* 33 words × 4 bytes, 16-byte aligned */ +#define ISR_STACK_FRAME_SIZE 144 /* 36 words × 4 bytes, 16-byte aligned */ ``` -This "red zone" is reserved at the top of every task stack to guarantee ISR safety. +**M-mode tasks**: This "red zone" is reserved at the top of the task stack to guarantee ISR safety. + +**U-mode tasks**: The ISR frame is allocated on the per-task kernel stack (512 bytes), not on the user stack. This provides stack isolation and prevents user tasks from corrupting kernel trap handling state. ## Function Calling in Linmo @@ -181,7 +183,9 @@ void task_function(void) { ### Stack Layout -Each task has its own stack with this layout: +#### Machine Mode Tasks + +Each M-mode task has its own stack with this layout: ``` High Address @@ -197,6 +201,43 @@ High Address Low Address ``` +#### User Mode Tasks (Per-Task Kernel Stack) + +U-mode tasks maintain separate user and kernel stacks for isolation: + +**User Stack** (application execution): +``` +High Address ++------------------+ <- user_stack_base + user_stack_size +| | +| User Stack | <- Grows downward +| (Dynamic) | <- Task executes here in U-mode +| | ++------------------+ <- user_stack_base +Low Address +``` + +**Kernel Stack** (trap handling): +``` +High Address ++------------------+ <- kernel_stack_base + kernel_stack_size (512 bytes) +| ISR Frame | <- 144 bytes for trap context +| (144 bytes) | <- Traps switch to this stack ++------------------+ +| Trap Handler | <- Kernel code execution during traps +| Stack Space | ++------------------+ <- kernel_stack_base +Low Address +``` + +When a U-mode task enters a trap (syscall, interrupt, exception): +1. ISR swaps SP with `mscratch` via `csrrw` (mscratch contains kernel stack top) +2. ISR saves full context to kernel stack +3. Trap handler executes on kernel stack +4. 
Return path restores user SP and switches back + +This dual-stack design prevents user tasks from corrupting kernel state and provides strong isolation between privilege levels. + ### Stack Alignment - 16-byte alignment: Required by RISC-V ABI for stack pointer - 4-byte alignment: Minimum for all memory accesses on RV32I diff --git a/Documentation/hal-riscv-context-switch.md b/Documentation/hal-riscv-context-switch.md index f66a41f4..d274bae8 100644 --- a/Documentation/hal-riscv-context-switch.md +++ b/Documentation/hal-riscv-context-switch.md @@ -123,14 +123,26 @@ a complete interrupt service routine frame: ```c void *hal_build_initial_frame(void *stack_top, void (*task_entry)(void), - int user_mode) + int user_mode, + void *kernel_stack, + size_t kernel_stack_size) { - /* Place frame in stack with initial reserve below for proper startup */ - uint32_t *frame = (uint32_t *) ((uint8_t *) stack_top - 256 - - ISR_STACK_FRAME_SIZE); + /* For U-mode tasks, build frame on kernel stack for stack isolation. + * For M-mode tasks, build frame on user stack as before. + */ + uint32_t *frame; + if (user_mode && kernel_stack) { + /* U-mode: Place frame on per-task kernel stack */ + void *kstack_top = (uint8_t *) kernel_stack + kernel_stack_size; + frame = (uint32_t *) ((uint8_t *) kstack_top - ISR_STACK_FRAME_SIZE); + } else { + /* M-mode: Place frame on user stack with reserve below */ + frame = (uint32_t *) ((uint8_t *) stack_top - 256 - + ISR_STACK_FRAME_SIZE); + } /* Initialize all general purpose registers to zero */ - for (int i = 0; i < 32; i++) + for (int i = 0; i < 36; i++) frame[i] = 0; /* Compute thread pointer: aligned to 64 bytes from _end */ @@ -152,6 +164,18 @@ void *hal_build_initial_frame(void *stack_top, /* Set entry point */ frame[FRAME_EPC] = (uint32_t) task_entry; + /* SP value for when ISR returns (stored in frame[33]). + * For U-mode: Set to user stack top. + * For M-mode: Set to frame + ISR_STACK_FRAME_SIZE. 
+ */ + if (user_mode && kernel_stack) { + /* U-mode: frame[33] should contain user SP */ + frame[FRAME_SP] = (uint32_t) ((uint8_t *) stack_top - 256); + } else { + /* M-mode: frame[33] contains kernel SP after frame deallocation */ + frame[FRAME_SP] = (uint32_t) ((uint8_t *) frame + ISR_STACK_FRAME_SIZE); + } + return frame; /* Return frame base as initial stack pointer */ } ``` diff --git a/app/umode.c b/app/umode.c index 518e111d..a406b90f 100644 --- a/app/umode.c +++ b/app/umode.c @@ -1,43 +1,72 @@ #include -/* U-mode Validation Task +/* Architecture-specific helper for SP manipulation testing. + * Implemented in arch/riscv/entry.c as a naked function. + */ +extern uint32_t __switch_sp(uint32_t new_sp); + +/* U-mode validation: syscall stability and privilege isolation. * - * Integrates two tests into a single task flow to ensure sequential execution: - * 1. Phase 1: Mechanism Check - Verify syscalls work. - * 2. Phase 2: Security Check - Verify privileged instructions trigger a trap. + * Phase 1: Verify syscalls work under various SP conditions (normal, + * malicious). Phase 2: Verify privileged instructions trap. */ void umode_validation_task(void) { - /* --- Phase 1: Mechanism Check (Syscalls) --- */ - umode_printf("[umode] Phase 1: Testing Syscall Mechanism\n"); + /* --- Phase 1: Kernel Stack Isolation Test --- */ + umode_printf("[umode] Phase 1: Testing Kernel Stack Isolation\n"); + umode_printf("\n"); - /* Test 1: sys_tid() - Simplest read-only syscall. 
*/ + /* Test 1a: Baseline - Syscall with normal SP */ + umode_printf("[umode] Test 1a: sys_tid() with normal SP\n"); int my_tid = sys_tid(); if (my_tid > 0) { umode_printf("[umode] PASS: sys_tid() returned %d\n", my_tid); } else { umode_printf("[umode] FAIL: sys_tid() failed (ret=%d)\n", my_tid); } + umode_printf("\n"); + + /* Test 1b: Verify ISR uses mscratch, not malicious user SP */ + umode_printf("[umode] Test 1b: sys_tid() with malicious SP\n"); + + uint32_t saved_sp = __switch_sp(0xDEADBEEF); + int my_tid_bad_sp = sys_tid(); + __switch_sp(saved_sp); + + if (my_tid_bad_sp > 0) { + umode_printf( + "[umode] PASS: sys_tid() succeeded, ISR correctly used kernel " + "stack\n"); + } else { + umode_printf( + "[umode] FAIL: Syscall failed with malicious SP (ret=%d)\n", + my_tid_bad_sp); + } + umode_printf("\n"); - /* Test 2: sys_uptime() - Verify value transmission is correct. */ + /* Test 1c: Verify syscall functionality is still intact */ + umode_printf("[umode] Test 1c: sys_uptime() with normal SP\n"); int uptime = sys_uptime(); if (uptime >= 0) { umode_printf("[umode] PASS: sys_uptime() returned %d\n", uptime); } else { umode_printf("[umode] FAIL: sys_uptime() failed (ret=%d)\n", uptime); } + umode_printf("\n"); - /* Note: Skipping sys_tadd for now, as kernel user pointer checks might - * block function pointers in the .text segment, avoiding distraction. 
- */ + umode_printf( + "[umode] Phase 1 Complete: Kernel stack isolation validated\n"); + umode_printf("\n"); /* --- Phase 2: Security Check (Privileged Access) --- */ umode_printf("[umode] ========================================\n"); + umode_printf("\n"); umode_printf("[umode] Phase 2: Testing Security Isolation\n"); + umode_printf("\n"); umode_printf( "[umode] Action: Attempting to read 'mstatus' CSR from U-mode.\n"); umode_printf("[umode] Expect: Kernel Panic with 'Illegal instruction'.\n"); - umode_printf("[umode] ========================================\n"); + umode_printf("\n"); /* CRITICAL: Delay before suicide to ensure logs are flushed from * buffer to UART. diff --git a/arch/riscv/boot.c b/arch/riscv/boot.c index 8e46f4c9..1c260aea 100644 --- a/arch/riscv/boot.c +++ b/arch/riscv/boot.c @@ -19,6 +19,9 @@ void main(void); void do_trap(uint32_t cause, uint32_t epc); void hal_panic(void); +/* Current task's kernel stack top (set by dispatcher, NULL for M-mode tasks) */ +extern void *current_kernel_stack_top; + /* Machine-mode entry point ('_entry'). This is the first code executed on * reset. It performs essential low-level setup of the processor state, * initializes memory, and then jumps to the C-level main function. @@ -93,22 +96,29 @@ __attribute__((naked, section(".text.prologue"))) void _entry(void) : "memory"); } -/* Size of the full trap context frame saved on the stack by the ISR. - * 30 GPRs (x1, x3-x31) + mcause + mepc + mstatus = 33 words * 4 bytes = 132 - * bytes. Round up to 144 bytes for 16-byte alignment. +/* ISR trap frame layout (144 bytes = 36 words). + * [0-29]: GPRs (ra, gp, tp, t0-t6, s0-s11, a0-a7) + * [30]: mcause + * [31]: mepc + * [32]: mstatus + * [33]: SP (user SP in U-mode, original SP in M-mode) */ #define ISR_CONTEXT_SIZE 144 -/* Low-level Interrupt Service Routine (ISR) trampoline. - * - * This is the common entry point for all traps. It performs a FULL context - * save, creating a complete trap frame on the stack. 
This makes the C handler - * robust, as it does not need to preserve any registers itself. - */ +/* Low-level ISR common entry for all traps with full context save */ __attribute__((naked, aligned(4))) void _isr(void) { asm volatile( - /* Allocate stack frame for full context save */ + /* Blind swap with mscratch for kernel stack isolation. + * Convention: M-mode (mscratch=0, SP=kernel), U-mode (mscratch=kernel, + * SP=user). After swap: if SP != 0 came from U-mode, else M-mode. + */ + "csrrw sp, mscratch, sp\n" + "bnez sp, .Lumode_entry\n" + + /* Undo swap and continue for M-mode */ + "csrrw sp, mscratch, sp\n" + "addi sp, sp, -%0\n" /* Save all general-purpose registers except x0 (zero) and x2 (sp). @@ -120,7 +130,7 @@ __attribute__((naked, aligned(4))) void _isr(void) * 48: a4, 52: a5, 56: a6, 60: a7, 64: s2, 68: s3 * 72: s4, 76: s5, 80: s6, 84: s7, 88: s8, 92: s9 * 96: s10, 100:s11, 104:t3, 108: t4, 112: t5, 116: t6 - * 120: mcause, 124: mepc + * 120: mcause, 124: mepc, 128: mstatus, 132: SP */ "sw ra, 0*4(sp)\n" "sw gp, 1*4(sp)\n" @@ -153,33 +163,158 @@ __attribute__((naked, aligned(4))) void _isr(void) "sw t5, 28*4(sp)\n" "sw t6, 29*4(sp)\n" - /* Save trap-related CSRs and prepare arguments for do_trap */ + /* Save original SP before frame allocation */ + "addi t0, sp, %0\n" + "sw t0, 33*4(sp)\n" + + /* Save machine CSRs (mcause, mepc, mstatus) */ "csrr a0, mcause\n" "csrr a1, mepc\n" - "csrr a2, mstatus\n" /* For context switching in privilege change */ - + "csrr a2, mstatus\n" "sw a0, 30*4(sp)\n" "sw a1, 31*4(sp)\n" "sw a2, 32*4(sp)\n" - "mv a2, sp\n" /* a2 = isr_sp */ - - /* Call the high-level C trap handler. - * Returns: a0 = SP to use for restoring context (may be different - * task's stack if context switch occurred). 
- */ + /* Call trap handler with frame pointer */ + "mv a2, sp\n" "call do_trap\n" + "mv sp, a0\n" + + /* Load mstatus and extract MPP to determine M-mode or U-mode return + path */ + "lw t0, 32*4(sp)\n" + "csrw mstatus, t0\n" + + "srli t1, t0, 11\n" + "andi t1, t1, 0x3\n" + "beqz t1, .Lrestore_umode\n" + + /* M-mode restore */ + ".Lrestore_mmode:\n" + "csrw mscratch, zero\n" + + "lw t1, 31*4(sp)\n" /* Restore mepc */ + "csrw mepc, t1\n" + + /* Restore all GPRs */ + "lw ra, 0*4(sp)\n" + "lw gp, 1*4(sp)\n" + "lw tp, 2*4(sp)\n" + "lw t0, 3*4(sp)\n" + "lw t1, 4*4(sp)\n" + "lw t2, 5*4(sp)\n" + "lw s0, 6*4(sp)\n" + "lw s1, 7*4(sp)\n" + "lw a0, 8*4(sp)\n" + "lw a1, 9*4(sp)\n" + "lw a2, 10*4(sp)\n" + "lw a3, 11*4(sp)\n" + "lw a4, 12*4(sp)\n" + "lw a5, 13*4(sp)\n" + "lw a6, 14*4(sp)\n" + "lw a7, 15*4(sp)\n" + "lw s2, 16*4(sp)\n" + "lw s3, 17*4(sp)\n" + "lw s4, 18*4(sp)\n" + "lw s5, 19*4(sp)\n" + "lw s6, 20*4(sp)\n" + "lw s7, 21*4(sp)\n" + "lw s8, 22*4(sp)\n" + "lw s9, 23*4(sp)\n" + "lw s10, 24*4(sp)\n" + "lw s11, 25*4(sp)\n" + "lw t3, 26*4(sp)\n" + "lw t4, 27*4(sp)\n" + "lw t5, 28*4(sp)\n" + "lw t6, 29*4(sp)\n" + + /* Restore SP from frame[33] */ + "lw sp, 33*4(sp)\n" - /* Use returned SP for context restore (enables context switching) */ + /* Return from trap */ + "mret\n" + + /* U-mode entry receives kernel stack in SP and user SP in mscratch */ + ".Lumode_entry:\n" + "addi sp, sp, -%0\n" + + /* Save t6 first to preserve it before using it as scratch */ + "sw t6, 29*4(sp)\n" + + /* Retrieve user SP from mscratch into t6 and save it */ + "csrr t6, mscratch\n" + "sw t6, 33*4(sp)\n" + + /* Save remaining GPRs */ + "sw ra, 0*4(sp)\n" + "sw gp, 1*4(sp)\n" + "sw tp, 2*4(sp)\n" + "sw t0, 3*4(sp)\n" + "sw t1, 4*4(sp)\n" + "sw t2, 5*4(sp)\n" + "sw s0, 6*4(sp)\n" + "sw s1, 7*4(sp)\n" + "sw a0, 8*4(sp)\n" + "sw a1, 9*4(sp)\n" + "sw a2, 10*4(sp)\n" + "sw a3, 11*4(sp)\n" + "sw a4, 12*4(sp)\n" + "sw a5, 13*4(sp)\n" + "sw a6, 14*4(sp)\n" + "sw a7, 15*4(sp)\n" + "sw s2, 16*4(sp)\n" + 
"sw s3, 17*4(sp)\n" + "sw s4, 18*4(sp)\n" + "sw s5, 19*4(sp)\n" + "sw s6, 20*4(sp)\n" + "sw s7, 21*4(sp)\n" + "sw s8, 22*4(sp)\n" + "sw s9, 23*4(sp)\n" + "sw s10, 24*4(sp)\n" + "sw s11, 25*4(sp)\n" + "sw t3, 26*4(sp)\n" + "sw t4, 27*4(sp)\n" + "sw t5, 28*4(sp)\n" + /* t6 already saved */ + + /* Save CSRs */ + "csrr a0, mcause\n" + "csrr a1, mepc\n" + "csrr a2, mstatus\n" + "sw a0, 30*4(sp)\n" + "sw a1, 31*4(sp)\n" + "sw a2, 32*4(sp)\n" + + "mv a2, sp\n" /* a2 = ISR frame pointer */ + "call do_trap\n" "mv sp, a0\n" - /* Restore mstatus from frame[32] */ + /* Check MPP in mstatus to determine return path */ "lw t0, 32*4(sp)\n" "csrw mstatus, t0\n" - /* Restore mepc from frame[31] (might have been modified by handler) */ + "srli t1, t0, 11\n" + "andi t1, t1, 0x3\n" + "bnez t1, .Lrestore_mmode\n" + + /* Setup mscratch for U-mode restore to prepare for next trap */ + ".Lrestore_umode:\n" "lw t1, 31*4(sp)\n" "csrw mepc, t1\n" + + /* Setup mscratch = kernel stack for next trap entry. + * U-mode convention: mscratch holds kernel stack, SP holds user stack. + * On next trap, csrrw will swap them: SP gets kernel, mscratch gets + * user. Load current task's kernel stack top (set by dispatcher). 
+ */ + "la t0, current_kernel_stack_top\n" + "lw t0, 0(t0)\n" /* t0 = *current_kernel_stack_top */ + "bnez t0, 1f\n" /* If non-NULL, use it */ + "la t0, _stack\n" /* Fallback to global stack if NULL */ + "1:\n" + "csrw mscratch, t0\n" + + /* Restore all GPRs */ "lw ra, 0*4(sp)\n" "lw gp, 1*4(sp)\n" "lw tp, 2*4(sp)\n" @@ -211,12 +346,12 @@ __attribute__((naked, aligned(4))) void _isr(void) "lw t5, 28*4(sp)\n" "lw t6, 29*4(sp)\n" - /* Deallocate stack frame */ - "addi sp, sp, %0\n" + /* Restore user SP from frame[33] */ + "lw sp, 33*4(sp)\n" /* Return from trap */ "mret\n" - : /* no outputs */ - : "i"(ISR_CONTEXT_SIZE) /* +16 for mcause, mepc, mstatus */ + : /* no outputs */ + : "i"(ISR_CONTEXT_SIZE) : "memory"); } diff --git a/arch/riscv/entry.c b/arch/riscv/entry.c index 9956558e..da9a53b4 100644 --- a/arch/riscv/entry.c +++ b/arch/riscv/entry.c @@ -15,6 +15,7 @@ */ #include +#include /* Architecture-specific syscall implementation using ecall trap. * This overrides the weak symbol defined in kernel/syscall.c. @@ -40,3 +41,26 @@ int syscall(int num, void *arg1, void *arg2, void *arg3) return a0; } + +/* Stack Pointer Swap for Testing + * + * This naked function provides atomic SP swapping for kernel validation tests. + * Using __attribute__((naked)) ensures the compiler does not generate any + * prologue/epilogue code that would use the stack, and prevents instruction + * reordering that could break the swap semantics. + * + * Inspired by Linux kernel's __switch_to for context switching. + */ + +/* Atomically swap the stack pointer with a new value. 
+ * @new_sp: New stack pointer value to install (in a0) + * @return: Previous stack pointer value (in a0) + */ +__attribute__((naked)) uint32_t __switch_sp(uint32_t new_sp) +{ + asm volatile( + "mv t0, sp \n" /* Save current SP to temporary */ + "mv sp, a0 \n" /* Install new SP from argument */ + "mv a0, t0 \n" /* Return old SP in a0 */ + "ret \n"); +} diff --git a/arch/riscv/hal.c b/arch/riscv/hal.c index 7ad5806f..e2dde667 100644 --- a/arch/riscv/hal.c +++ b/arch/riscv/hal.c @@ -35,7 +35,7 @@ #define CONTEXT_MSTATUS 16 /* Machine Status CSR */ /* Defines the size of the full trap frame saved by the ISR in 'boot.c'. - * The _isr routine saves 33 words (30 GPRs + mcause + mepc + mstatus), + * The _isr routine saves 34 words (30 GPRs + mcause + mepc + mstatus + SP), * resulting in a 144-byte frame with alignment padding. This space MUST be * reserved at the top of every task's stack (as a "red zone") to guarantee * that an interrupt, even at peak stack usage, will not corrupt memory @@ -48,39 +48,40 @@ * Indices are in word offsets (divide byte offset by 4). 
*/ enum { - FRAME_RA = 0, /* x1 - Return Address */ - FRAME_GP = 1, /* x3 - Global Pointer */ - FRAME_TP = 2, /* x4 - Thread Pointer */ - FRAME_T0 = 3, /* x5 - Temporary register 0 */ - FRAME_T1 = 4, /* x6 - Temporary register 1 */ - FRAME_T2 = 5, /* x7 - Temporary register 2 */ - FRAME_S0 = 6, /* x8 - Saved register 0 / Frame Pointer */ - FRAME_S1 = 7, /* x9 - Saved register 1 */ - FRAME_A0 = 8, /* x10 - Argument/Return 0 */ - FRAME_A1 = 9, /* x11 - Argument/Return 1 */ - FRAME_A2 = 10, /* x12 - Argument 2 */ - FRAME_A3 = 11, /* x13 - Argument 3 */ - FRAME_A4 = 12, /* x14 - Argument 4 */ - FRAME_A5 = 13, /* x15 - Argument 5 */ - FRAME_A6 = 14, /* x16 - Argument 6 */ - FRAME_A7 = 15, /* x17 - Argument 7 / Syscall Number */ - FRAME_S2 = 16, /* x18 - Saved register 2 */ - FRAME_S3 = 17, /* x19 - Saved register 3 */ - FRAME_S4 = 18, /* x20 - Saved register 4 */ - FRAME_S5 = 19, /* x21 - Saved register 5 */ - FRAME_S6 = 20, /* x22 - Saved register 6 */ - FRAME_S7 = 21, /* x23 - Saved register 7 */ - FRAME_S8 = 22, /* x24 - Saved register 8 */ - FRAME_S9 = 23, /* x25 - Saved register 9 */ - FRAME_S10 = 24, /* x26 - Saved register 10 */ - FRAME_S11 = 25, /* x27 - Saved register 11 */ - FRAME_T3 = 26, /* x28 - Temporary register 3 */ - FRAME_T4 = 27, /* x29 - Temporary register 4 */ - FRAME_T5 = 28, /* x30 - Temporary register 5 */ - FRAME_T6 = 29, /* x31 - Temporary register 6 */ - FRAME_MCAUSE = 30, /* Machine Cause CSR */ - FRAME_EPC = 31, /* Machine Exception PC (mepc) */ - FRAME_MSTATUS = 32 /* Machine Status CSR */ + FRAME_RA = 0, /* x1 - Return Address */ + FRAME_GP = 1, /* x3 - Global Pointer */ + FRAME_TP = 2, /* x4 - Thread Pointer */ + FRAME_T0 = 3, /* x5 - Temporary register 0 */ + FRAME_T1 = 4, /* x6 - Temporary register 1 */ + FRAME_T2 = 5, /* x7 - Temporary register 2 */ + FRAME_S0 = 6, /* x8 - Saved register 0 / Frame Pointer */ + FRAME_S1 = 7, /* x9 - Saved register 1 */ + FRAME_A0 = 8, /* x10 - Argument/Return 0 */ + FRAME_A1 = 9, /* x11 - 
Argument/Return 1 */ + FRAME_A2 = 10, /* x12 - Argument 2 */ + FRAME_A3 = 11, /* x13 - Argument 3 */ + FRAME_A4 = 12, /* x14 - Argument 4 */ + FRAME_A5 = 13, /* x15 - Argument 5 */ + FRAME_A6 = 14, /* x16 - Argument 6 */ + FRAME_A7 = 15, /* x17 - Argument 7 / Syscall Number */ + FRAME_S2 = 16, /* x18 - Saved register 2 */ + FRAME_S3 = 17, /* x19 - Saved register 3 */ + FRAME_S4 = 18, /* x20 - Saved register 4 */ + FRAME_S5 = 19, /* x21 - Saved register 5 */ + FRAME_S6 = 20, /* x22 - Saved register 6 */ + FRAME_S7 = 21, /* x23 - Saved register 7 */ + FRAME_S8 = 22, /* x24 - Saved register 8 */ + FRAME_S9 = 23, /* x25 - Saved register 9 */ + FRAME_S10 = 24, /* x26 - Saved register 10 */ + FRAME_S11 = 25, /* x27 - Saved register 11 */ + FRAME_T3 = 26, /* x28 - Temporary register 3 */ + FRAME_T4 = 27, /* x29 - Temporary register 4 */ + FRAME_T5 = 28, /* x30 - Temporary register 5 */ + FRAME_T6 = 29, /* x31 - Temporary register 6 */ + FRAME_MCAUSE = 30, /* Machine Cause CSR */ + FRAME_EPC = 31, /* Machine Exception PC (mepc) */ + FRAME_MSTATUS = 32, /* Machine Status CSR */ + FRAME_SP = 33 /* Stack Pointer (saved for restore) */ }; /* Global variable to hold the new stack pointer for pending context switch. @@ -96,6 +97,15 @@ static void *pending_switch_sp = NULL; */ static uint32_t current_isr_frame_sp = 0; + +/* Current task's kernel stack top address for U-mode trap entry. + * For U-mode tasks: points to (kernel_stack + kernel_stack_size). + * For M-mode tasks: NULL (uses global _stack). + * Updated by dispatcher during context switches. + * The ISR restore path loads this into mscratch before mret. + */ +void *current_kernel_stack_top = NULL; + /* NS16550A UART0 - Memory-mapped registers for the QEMU 'virt' machine's serial * port. 
*/ @@ -495,33 +505,38 @@ extern uint32_t _gp, _end; */ void *hal_build_initial_frame(void *stack_top, void (*task_entry)(void), - int user_mode) + int user_mode, + void *kernel_stack, + size_t kernel_stack_size) { #define INITIAL_STACK_RESERVE \ 256 /* Reserve space below stack_top for task startup */ - /* Place frame deeper in stack so after ISR deallocates (sp += 128), - * SP will be at (stack_top - INITIAL_STACK_RESERVE), not at stack_top. + /* For U-mode tasks, build frame on kernel stack instead of user stack. + * For M-mode tasks, build frame on user stack as before. */ - uint32_t *frame = - (uint32_t *) ((uint8_t *) stack_top - INITIAL_STACK_RESERVE - - ISR_STACK_FRAME_SIZE); + uint32_t *frame; + if (user_mode && kernel_stack) { + /* U-mode: Place frame on kernel stack */ + void *kstack_top = (uint8_t *) kernel_stack + kernel_stack_size; + frame = (uint32_t *) ((uint8_t *) kstack_top - ISR_STACK_FRAME_SIZE); + } else { + /* M-mode: Place frame on user stack with reserve space */ + frame = (uint32_t *) ((uint8_t *) stack_top - INITIAL_STACK_RESERVE - + ISR_STACK_FRAME_SIZE); + } /* Zero out entire frame */ - for (int i = 0; i < 32; i++) { + for (int i = 0; i < 36; i++) { frame[i] = 0; } /* Compute tp value same as boot.c: aligned to 64 bytes from _end */ uint32_t tp_val = ((uint32_t) &_end + 63) & ~63U; - /* Initialize critical registers for proper task startup: - * - frame[1] = gp: Global pointer, required for accessing global variables - * - frame[2] = tp: Thread pointer, required for thread-local storage - * - frame[32] = mepc: Task entry point, where mret will jump to - */ - frame[1] = (uint32_t) &_gp; /* gp - global pointer */ - frame[2] = tp_val; /* tp - thread pointer */ + /* Initialize critical registers for proper task startup */ + frame[FRAME_GP] = (uint32_t) &_gp; /* gp - global pointer */ + frame[FRAME_TP] = tp_val; /* tp - thread pointer */ /* Initialize mstatus for new task: * - MPIE=1: mret will copy this to MIE, enabling interrupts after task 
@@ -535,6 +550,19 @@ void *hal_build_initial_frame(void *stack_top, frame[FRAME_EPC] = (uint32_t) task_entry; /* mepc - entry point */ + /* SP value for when ISR returns (frame[33] will hold this value). + * For U-mode: Set to user stack top (will be saved to frame[33] in ISR). + * For M-mode: Set to frame + ISR_STACK_FRAME_SIZE as before. + */ + if (user_mode && kernel_stack) { + /* U-mode: frame[33] should contain user SP */ + frame[FRAME_SP] = + (uint32_t) ((uint8_t *) stack_top - INITIAL_STACK_RESERVE); + } else { + /* M-mode: frame[33] contains kernel SP after frame deallocation */ + frame[FRAME_SP] = (uint32_t) ((uint8_t *) frame + ISR_STACK_FRAME_SIZE); + } + return (void *) frame; } @@ -744,6 +772,21 @@ void hal_switch_stack(void **old_sp, void *new_sp) pending_switch_sp = new_sp; } +/* Updates the kernel stack top for the current task. + * Called by dispatcher during context switch to set up mscratch for next trap. + */ +void hal_set_kernel_stack(void *kernel_stack, size_t kernel_stack_size) +{ + if (kernel_stack && kernel_stack_size > 0) { + /* U-mode task: point to top of per-task kernel stack */ + current_kernel_stack_top = + (void *) ((uint8_t *) kernel_stack + kernel_stack_size); + } else { + /* M-mode task: NULL signals to use global _stack */ + current_kernel_stack_top = NULL; + } +} + /* Enable interrupts on first run of a task. * Checks if task's return address still points to entry (meaning it hasn't * run yet), and if so, enables global interrupts. @@ -811,7 +854,29 @@ static void __attribute__((naked, used)) __dispatch_init_isr(void) "lw t0, 32*4(sp)\n" "csrw mstatus, t0\n" - /* Restore mepc from frame[31] */ + /* Initialize mscratch based on MPP field in mstatus. + * For M-mode set mscratch to zero, for U-mode set to kernel stack. + * ISR uses this to detect privilege mode via blind swap. 
+ */ + "srli t2, t0, 11\n" + "andi t2, t2, 0x3\n" + "bnez t2, .Ldispatch_mmode\n" + + /* U-mode path */ + "la t2, current_kernel_stack_top\n" + "lw t2, 0(t2)\n" + "bnez t2, .Ldispatch_umode_ok\n" + "la t2, _stack\n" + ".Ldispatch_umode_ok:\n" + "csrw mscratch, t2\n" + "j .Ldispatch_done\n" + + /* M-mode path */ + ".Ldispatch_mmode:\n" + "csrw mscratch, zero\n" + ".Ldispatch_done:\n" + + /* Restore mepc */ "lw t1, 31*4(sp)\n" "csrw mepc, t1\n" diff --git a/arch/riscv/hal.h b/arch/riscv/hal.h index 7946a0fe..a665d63f 100644 --- a/arch/riscv/hal.h +++ b/arch/riscv/hal.h @@ -3,9 +3,10 @@ #include /* Symbols from the linker script, defining memory boundaries */ -extern uint32_t _stack_start, _stack_end; /* Start/end of the STACK memory */ -extern uint32_t _heap_start, _heap_end; /* Start/end of the HEAP memory */ -extern uint32_t _heap_size; /* Size of HEAP memory */ +extern uint32_t _gp; /* Global pointer initialized at reset */ +extern uint32_t _stack; /* Kernel stack top for ISR and boot */ +extern uint32_t _heap_start, _heap_end; /* Start/end of the HEAP memory */ +extern uint32_t _heap_size; /* Size of HEAP memory */ extern uint32_t _sidata; /* Start address for .data initialization */ extern uint32_t _sdata, _edata; /* Start/end address for .data section */ extern uint32_t _sbss, _ebss; /* Start/end address for .bss section */ @@ -88,6 +89,13 @@ void hal_dispatch_init(void *ctx); */ void hal_switch_stack(void **old_sp, void *new_sp); +/* Updates the kernel stack top for the current task. + * Called by dispatcher during context switch to set up mscratch for next trap. + * @kernel_stack: Base address of task's kernel stack (NULL for M-mode tasks) + * @kernel_stack_size: Size of kernel stack in bytes (0 for M-mode tasks) + */ +void hal_set_kernel_stack(void *kernel_stack, size_t kernel_stack_size); + /* Provides a blocking, busy-wait delay. * This function monopolizes the CPU and should only be used for very short * delays or in pre-scheduling initialization code. 
@@ -112,7 +120,9 @@ void hal_interrupt_tick(void); /* Enable interrupts on first task run */
 void *hal_build_initial_frame(
     void *stack_top,
     void (*task_entry)(void),
-    int user_mode); /* Build ISR frame for preemptive mode */
+    int user_mode,
+    void *kernel_stack,
+    size_t kernel_stack_size); /* Build ISR frame for preemptive mode */
 
 /* Initializes the context structure for a new task.
  * @ctx : Pointer to jmp_buf to initialize (must be non-NULL).
diff --git a/include/sys/task.h b/include/sys/task.h
index ccf5f4fa..4ccc159d 100644
--- a/include/sys/task.h
+++ b/include/sys/task.h
@@ -72,6 +72,10 @@ typedef struct tcb {
     size_t stack_sz; /* Total size of the stack in bytes */
     void (*entry)(void); /* Task's entry point function */
 
+    /* Kernel Stack for U-mode Tasks */
+    void *kernel_stack; /* Base address of kernel stack (NULL for M-mode) */
+    size_t kernel_stack_size; /* Size of kernel stack in bytes (0 for M-mode) */
+
     /* Scheduling Parameters */
     uint16_t prio; /* Encoded priority (base and time slice counter) */
     uint8_t prio_level; /* Priority level (0-7, 0 = highest) */
diff --git a/kernel/main.c b/kernel/main.c
index 0015dca7..b7eaa2e4 100644
--- a/kernel/main.c
+++ b/kernel/main.c
@@ -72,6 +72,11 @@ int32_t main(void)
      */
     scheduler_started = true;
 
+    /* Initialize kernel stack for first task */
+    if (kcb->preemptive)
+        hal_set_kernel_stack(first_task->kernel_stack,
+                             first_task->kernel_stack_size);
+
     /* In preemptive mode, tasks are managed via ISR frames (sp).
      * In cooperative mode, tasks are managed via jmp_buf (context).
      */
diff --git a/kernel/syscall.c b/kernel/syscall.c
index 7be66632..a89f6ba8 100644
--- a/kernel/syscall.c
+++ b/kernel/syscall.c
@@ -384,16 +384,25 @@ int sys_uptime(void)
 }
 
 /* User mode safe output syscall.
- * Outputs a string from user mode by executing puts() in kernel context.
- * This avoids privilege violations from printf's logger mutex operations.
+ * Outputs a string from user mode directly via UART, bypassing the logger
+ * queue. Direct output ensures strict ordering for U-mode tasks and avoids race
+ * conditions with the async logger task.
  */
 static int _tputs(const char *str)
 {
     if (unlikely(!str))
         return -EINVAL;
 
-    /* Use puts() which will handle logger enqueue or direct output */
-    return puts(str);
+    /* Prevent task switching during output to avoid character interleaving.
+     * Ensures the entire string is output atomically with respect to other
+     * tasks. Limit output length to prevent unbounded blocking.
+     */
+    NOSCHED_ENTER();
+    for (const char *p = str; *p && (p - str) < 256; p++)
+        _putchar(*p);
+    NOSCHED_LEAVE();
+
+    return 0;
 }
 
 int sys_tputs(const char *str)
diff --git a/kernel/task.c b/kernel/task.c
index c9973e19..6725b9fa 100644
--- a/kernel/task.c
+++ b/kernel/task.c
@@ -44,6 +44,9 @@ static volatile uint32_t timer_work_generation = 0; /* counter for coalescing */
 #define TIMER_WORK_DELAY_UPDATE (1U << 1) /* Task delay processing */
 #define TIMER_WORK_CRITICAL (1U << 2) /* High-priority timer work */
 
+/* Kernel stack size for U-mode tasks */
+#define KERNEL_STACK_SIZE 512 /* 512 bytes per U-mode task */
+
 #if CONFIG_STACK_PROTECTION
 /* Stack canary checking frequency - check every N context switches */
 #define STACK_CHECK_INTERVAL 32
@@ -628,6 +631,10 @@ void dispatch(void)
          * When we return, ISR will restore from next_task's stack.
          */
         hal_switch_stack(&prev_task->sp, next_task->sp);
+
+        /* Update kernel stack for next trap entry */
+        hal_set_kernel_stack(next_task->kernel_stack,
+                             next_task->kernel_stack_size);
     } else {
         /* Cooperative mode: Always call hal_context_restore() because it uses
          * setjmp/longjmp mechanism. Even if same task continues, we must
@@ -778,15 +785,37 @@ static int32_t task_spawn_impl(void *task_entry,
 
     CRITICAL_LEAVE();
 
+    /* Allocate per-task kernel stack for U-mode tasks */
+    if (user_mode) {
+        tcb->kernel_stack = malloc(KERNEL_STACK_SIZE);
+        if (!tcb->kernel_stack) {
+            CRITICAL_ENTER();
+            list_remove(kcb->tasks, node);
+            kcb->task_count--;
+            CRITICAL_LEAVE();
+            free(tcb->stack);
+            free(tcb);
+            panic(ERR_STACK_ALLOC);
+        }
+        tcb->kernel_stack_size = KERNEL_STACK_SIZE;
+    } else {
+        tcb->kernel_stack = NULL;
+        tcb->kernel_stack_size = 0;
+    }
+
     /* Initialize execution context outside critical section. */
     hal_context_init(&tcb->context, (size_t) tcb->stack, new_stack_size,
                      (size_t) task_entry, user_mode);
 
     /* Initialize SP for preemptive mode.
      * Build initial ISR frame on stack with mepc pointing to task entry.
+     * For U-mode tasks, frame is built on kernel stack; for M-mode on user
+     * stack.
      */
     void *stack_top = (void *) ((uint8_t *) tcb->stack + new_stack_size);
-    tcb->sp = hal_build_initial_frame(stack_top, task_entry, user_mode);
+    tcb->sp =
+        hal_build_initial_frame(stack_top, task_entry, user_mode,
+                                tcb->kernel_stack, tcb->kernel_stack_size);
 
     printf("task %u: entry=%p stack=%p size=%u prio_level=%u time_slice=%u\n",
            tcb->id, task_entry, tcb->stack, (unsigned int) new_stack_size,
@@ -843,6 +872,8 @@ int32_t mo_task_cancel(uint16_t id)
 
     /* Free memory outside critical section */
     free(tcb->stack);
+    if (tcb->kernel_stack)
+        free(tcb->kernel_stack);
     free(tcb);
     return ERR_OK;
 }
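Reviewer note: the hunks above pass a kernel stack base and size to hal_set_kernel_stack(), while the dispatch assembly reads a single word, current_kernel_stack_top, and falls back to _stack when that word is zero. The following host-side sketch shows one way the two could relate. It is an assumption, not the patch's implementation: the function bodies, the sketch_ names, and the use of uintptr_t (instead of the target's 32-bit word) are hypothetical, and only the prototype and the assembly fallback come from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define KERNEL_STACK_SIZE 512 /* same value the patch defines in task.c */

/* Stand-in for the word read by "la t2, current_kernel_stack_top".
 * The real symbol's type and definition are not shown in these hunks. */
static uintptr_t current_kernel_stack_top;

/* Hypothetical body for hal_set_kernel_stack(): stacks grow downward on
 * RISC-V, so the value a trap needs is base + size. M-mode tasks pass
 * (NULL, 0) and leave the word at zero. */
static void sketch_set_kernel_stack(void *kernel_stack,
                                    size_t kernel_stack_size)
{
    if (kernel_stack && kernel_stack_size)
        current_kernel_stack_top =
            (uintptr_t) kernel_stack + kernel_stack_size;
    else
        current_kernel_stack_top = 0;
}

/* Mirrors the U-mode dispatch path: use the per-task top when it is set,
 * otherwise fall back to the global stack top (the "_stack" symbol). */
static uintptr_t sketch_umode_mscratch(uintptr_t global_stack_top)
{
    return current_kernel_stack_top ? current_kernel_stack_top
                                    : global_stack_top;
}
```

Under these assumptions, a U-mode task spawned with a 512-byte kernel stack would have mscratch set to the address one past the end of its allocation, and a dispatch that happens before any kernel stack is registered would use the global stack top instead.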