From: Lunaixsky
Date: Sun, 18 Feb 2024 17:15:39 +0000 (+0000)
Subject: A Total Overhaul on the Lunaix's Virtual Memory Model (#26)
X-Git-Url: https://scm.lunaixsky.com/lunaix-os.git/commitdiff_plain/69777bdcab284335651a8002e2896f3862fa423d?hp=965940833071025bf0d386f4a9c70a5258453dbd

A Total Overhaul on the Lunaix's Virtual Memory Model (#26)

* * Introduce a new declarative PTE manipulation toolset.

Prior to this patch, the original page table API was a simple and straightforward, yet rather verbose, design, with the following characteristics:

1. `vmm_set_mapping` was the only way to set a PTE in the page table. It required explicitly specifying the physical address, the virtual address, and the PTE attributes, which was by design, for comprehensiveness. In practice, however, it was always accompanied by cumbersome address calculations and nasty PTE bit-masking just to get those arguments right, especially for non-trivial mappings.

2. The existing design assumed strict 2-level paging and a fixed 4K page size, tightly coupled to x86 32-bit paging. This made it impossible to extend beyond those assumptions, for example to add huge pages or to support any non-x86 MMU.

3. Page table manipulation was not centralised: a vast number of eccentric and odd APIs dangled in the kboot area.

In light of these limitations, we have redesigned the entire virtual memory interface. A pointer to a PTE already encodes enough information to complete a PTE read or write at any level, and pointer arithmetic automatically yields a valid pointer to the desired PTE, allowing us to remove the bloat of invoking vmm_set_mapping. Architecture-dependent PTE details are abstracted away from the generic kernel code base, giving purely declarative PTE construction and page table manipulation.

* Refactoring done to make kboot use the new API.
* Refactoring done on the pfault handler.
* * Correct the ptep address deduction to take the PTE size into account; previously this resulted in unaligned PTE writes.
* Correct the use of memset and TLB invalidation when zeroing a newly allocated page table: deduce the next-level ptep and use it accordingly.
* Simplify the pre-boot code (boot.S); move the setting of the CRx registers into a more readable form.
* Allocate a new bootstrap stack residing in higher-half memory, allowing us to free the bootctx safely before entering lunad.
* Adjust the bootctx helpers to work with the new vmm API.
* (LunaDBG) Update the mm lookup to detect huge-page mappings correctly.
* * Dynamically allocate a page table when a ptep triggers a page fault by pointing at a PTE whose containing table does not exist. Previously we always assumed the table was allocated before the PTE was written. This on-demand allocation removes considerable overhead, as we no longer need to walk all n levels just to ensure the hierarchy exists.
* The page fault handling procedure is refactored: all the important information, such as the faulting PTE and EIP, is now gathered into a dedicated struct fault_context.
* State the definitions we have invented, to make things clear.
* Rewrite the vmap function with the new ptep feature; the reduction in LoC and complexity is significant.
* * Use huge pages to perform fast, memory-efficient identity mapping of the physical address space (first 3GiB). Doing so eliminates the need for selective mapping of the bootloader's mem_map.
* Correct the address calculation in __alloc_contig_ptes.
* Change the behaviour of the former pagetable_alloc to offload most PTE setting to its caller, making it more portable. It is also renamed to `vmm_alloc_page`.
* Perform some formatting to make things easier to read.
* * Rewrite VMS duplication and deletion. With the latest vmm refactoring, the implementation is much cleaner and more intuitive than before, although slightly longer in LoC.
The rewritten versions are named `vmscpy` and `vmsfree`, as they remove the assumption that the source VMS is VMS_SELF.
* Add `pmm_free_one` to let callers free a pmm page based on its attributes. This is intended to fix the recently discovered leak of physical page resources: the old pmm_free_page lacked the ability to free PP_FGLOCKED pages allocated to page tables, leaving pages that could not be freed by any means.
* Rename some functions for better clarity.
* * Rewrite vmm_lookupat with the new PTE interface.
* Adjust the memory layout so the guest VMS mount point sits just before the VMS self-mount point. This removes the effort of locating and skipping it during vmscpy.
* Add an empty thread object as a placeholder, to prevent writes to an undefined location when a context save/restore happens before the threaded environment is initialized.
* * Fix more issues related to the recent refactoring:
1. Introduce pte_mkhuge to mark a PTE as a leaf. Previously the use of the PS bit was confusing, since that bit has another interpretation in a last-level PTE.
2. Fix the address incrementing in vmap.
3. Invalidate the TLB cache whenever we dynamically allocate a page.
* (LunaDBG) Rewrite the VM probing, employing the latest PTE interface and making it much more efficient by performing an actual page walk rather than scanning linearly.
* * Fix an issue where the bootstrap stack was too small, so that overflow corrupted adjacent kernel structures.
* Add assertions in pmm to enforce better consistency and invariants.
* The page fault handler is now aware of ptep faults and assigns suitable permissions for level creation and page pre-allocation.
* Ensure the mappings on dest_mnt are properly invalidated in the TLB cache after we set up the VMS to be copied into.
* (LunaDBG) Fix the ptep calculation at a specified level when querying an individual PTE.
* * Rework the VMS mounts: they now have a more unified interface and remove the burden of passing vm_mnt to each function call.
It also allows us to track any dangling mount points.
* Fix an issue where dup_kernel_stack used the stack top as the start address for copying, which corrupted the subsequent exec address.
* Fix ptep_step_out failing on a non-VMS_SELF mount point.
* Change the way assertion failures are reported: they are now reported directly, without going through another sys-trap, thus preserving the stack around the failing point to ease debugging.
* * Ensure the tail PTE check is performed regardless of the PTE value when walking the page table (e.g., in vmsfree and vmscpy); previously this was a bug.
* The self-mount point was located incorrectly, causing the wrong one to be freed (vmsfree).
* Ensure we unref the physical page only when the corresponding PTE is present (so that the physical address is meaningful).
* Add a flag in fault_context to indicate the memory-access privilege level.
* Address an issue where the stack-start ptep calculation was off by one, causing a destroyed thread to accidentally free an adjacent thread's kernel stack.
* * Purge the old page.h.
* * Refactor fault.c to remove unneeded things from the arch-dependent side.
* (LunaDBG) add utilities to interpret pte value and manipulate the ptep * * Add generic definition for arch-dependent pagetable --- diff --git a/docs/img/lunaix-os-mem.png b/docs/img/lunaix-os-mem.png index bb241be..519ea34 100644 Binary files a/docs/img/lunaix-os-mem.png and b/docs/img/lunaix-os-mem.png differ diff --git a/lunaix-os/arch/generic/includes/sys/mm/pagetable.h b/lunaix-os/arch/generic/includes/sys/mm/pagetable.h new file mode 100644 index 0000000..8522a11 --- /dev/null +++ b/lunaix-os/arch/generic/includes/sys/mm/pagetable.h @@ -0,0 +1,318 @@ +/** + * @file pagetable.h + * @author Lunaixsky (lunaxisky@qq.com) + * @brief Generic (skeleton) definition for pagetable.h + * @version 0.1 + * @date 2024-02-18 + * + * @copyright Copyright (c) 2024 + * + */ + +#ifndef __LUNAIX_ARCH_PAGETABLE_H +#define __LUNAIX_ARCH_PAGETABLE_H + +#include +#include + +/* ******** Page Table Manipulation ******** */ + +// Levels of page table to traverse for a single page walk +#define _PTW_LEVEL 2 + +#define _PAGE_BASE_SHIFT 12 +#define _PAGE_BASE_SIZE ( 1UL << _PAGE_BASE_SHIFT ) +#define _PAGE_BASE_MASK ( _PAGE_BASE_SIZE - 1) + +#define _PAGE_LEVEL_SHIFT 10 +#define _PAGE_LEVEL_SIZE ( 1UL << _PAGE_LEVEL_SHIFT ) +#define _PAGE_LEVEL_MASK ( _PAGE_LEVEL_SIZE - 1 ) +#define _PAGE_Ln_SIZE(n) ( 1UL << (_PAGE_BASE_SHIFT + _PAGE_LEVEL_SHIFT * (_PTW_LEVEL - (n) - 1)) ) + +// Note: we set VMS_SIZE = VMS_MASK as it is impossible +// to express 4Gi in 32bit unsigned integer + +#define VMS_MASK ( -1UL ) +#define VMS_SIZE VMS_MASK + +/* General size of a LnT huge page */ + +#define L0T_SIZE _PAGE_Ln_SIZE(0) +#define L1T_SIZE _PAGE_Ln_SIZE(1) +#define L2T_SIZE _PAGE_Ln_SIZE(1) +#define L3T_SIZE _PAGE_Ln_SIZE(1) +#define LFT_SIZE _PAGE_Ln_SIZE(1) + +/* General mask to get page offset of a LnT huge page */ + +#define L0T_MASK ( L0T_SIZE - 1 ) +#define L1T_MASK ( L1T_SIZE - 1 ) +#define L2T_MASK ( L2T_SIZE - 1 ) +#define L3T_MASK ( L3T_SIZE - 1 ) +#define LFT_MASK ( LFT_SIZE - 1 ) + +/* 
Masks to get index of a LnTE */ + +#define L0T_INDEX_MASK ( VMS_MASK ^ L0T_MASK ) +#define L1T_INDEX_MASK ( L0T_MASK ^ L1T_MASK ) +#define L2T_INDEX_MASK ( L1T_MASK ^ L2T_MASK ) +#define L3T_INDEX_MASK ( L2T_MASK ^ L3T_MASK ) +#define LFT_INDEX_MASK ( L3T_MASK ^ LFT_MASK ) + +#define PAGE_SHIFT _PAGE_BASE_SHIFT +#define PAGE_SIZE _PAGE_BASE_SIZE +#define PAGE_MASK _PAGE_BASE_MASK + +#define LEVEL_SHIFT _PAGE_LEVEL_SHIFT +#define LEVEL_SIZE _PAGE_LEVEL_SIZE +#define LEVEL_MASK _PAGE_LEVEL_MASK + +// max PTEs number +#define MAX_PTEN _PAGE_LEVEL_SIZE + +// max translation level supported +#define MAX_LEVEL _PTW_LEVEL + + +/* ******** PTE Manipulation ******** */ + +struct __pte { + unsigned int val; +} align(4); + +#ifndef pte_t +typedef struct __pte pte_t; +#endif + +typedef unsigned int pfn_t; +typedef unsigned int pte_attr_t; + +#define _PTE_P (0) +#define _PTE_W (0) +#define _PTE_U (0) +#define _PTE_A (0) +#define _PTE_D (0) +#define _PTE_X (0) +#define _PTE_R (0) + +#define _PTE_PROT_MASK ( _PTE_W | _PTE_U | _PTE_X ) + +#define KERNEL_PAGE ( _PTE_P ) +#define KERNEL_EXEC ( KERNEL_PAGE | _PTE_X ) +#define KERNEL_DATA ( KERNEL_PAGE | _PTE_W ) +#define KERNEL_RDONLY ( KERNEL_PAGE ) + +#define USER_PAGE ( _PTE_P | _PTE_U ) +#define USER_EXEC ( USER_PAGE | _PTE_X ) +#define USER_DATA ( USER_PAGE | _PTE_W ) +#define USER_RDONLY ( USER_PAGE ) + +#define SELF_MAP ( KERNEL_DATA | _PTE_WT | _PTE_CD ) + +#define __mkpte_from(pte_val) ((pte_t){ .val = (pte_val) }) +#define __MEMGUARD 0xdeadc0deUL + +#define null_pte ( __mkpte_from(0) ) +#define guard_pte ( __mkpte_from(__MEMGUARD) ) +#define pte_val(pte) ( pte.val ) + + +static inline bool +pte_isguardian(pte_t pte) +{ + return pte.val == __MEMGUARD; +} + +static inline pte_t +mkpte_prot(pte_attr_t prot) +{ + return null_pte; +} + +static inline pte_t +mkpte(ptr_t paddr, pte_attr_t prot) +{ + return null_pte; +} + +static inline pte_t +mkpte_root(ptr_t paddr, pte_attr_t prot) +{ + return null_pte; +} + +static inline pte_t 
+mkpte_raw(unsigned long pte_val) +{ + return null_pte; +} + +static inline pte_t +pte_setpaddr(pte_t pte, ptr_t paddr) +{ + return pte; +} + +static inline ptr_t +pte_paddr(pte_t pte) +{ + return 0; +} + +static inline pte_t +pte_setprot(pte_t pte, ptr_t prot) +{ + return pte; +} + +static inline pte_attr_t +pte_prot(pte_t pte) +{ + return 0; +} + +static inline bool +pte_isnull(pte_t pte) +{ + return !pte.val; +} + +static inline pte_t +pte_mkhuge(pte_t pte) +{ + return pte; +} + +static inline pte_t +pte_mkvolatile(pte_t pte) +{ + return pte; +} + +static inline pte_t +pte_mkroot(pte_t pte) +{ + return pte; +} + +static inline pte_t +pte_usepat(pte_t pte) +{ + return pte; +} + +static inline bool +pte_huge(pte_t pte) +{ + return false; +} + +static inline pte_t +pte_mkloaded(pte_t pte) +{ + return pte; +} + +static inline pte_t +pte_mkunloaded(pte_t pte) +{ + return pte; +} + +static inline bool +pte_isloaded(pte_t pte) +{ + return false; +} + +static inline pte_t +pte_mkwprotect(pte_t pte) +{ + return pte; +} + +static inline pte_t +pte_mkwritable(pte_t pte) +{ + return pte; +} + +static inline bool +pte_iswprotect(pte_t pte) +{ + return false; +} + +static inline pte_t +pte_mkuser(pte_t pte) +{ + return pte; +} + +static inline pte_t +pte_mkkernel(pte_t pte) +{ + return pte; +} + +static inline bool +pte_allow_user(pte_t pte) +{ + return false; +} + +static inline pte_t +pte_mkexec(pte_t pte) +{ + return pte; +} + +static inline pte_t +pte_mknonexec(pte_t pte) +{ + return pte; +} + +static inline bool +pte_isexec(pte_t pte) +{ + return false; +} + +static inline pte_t +pte_mkuntouch(pte_t pte) +{ + return pte; +} + +static inline bool +pte_istouched(pte_t pte) +{ + return false; +} + +static inline pte_t +pte_mkclean(pte_t pte) +{ + return pte; +} + +static inline bool +pte_dirty(pte_t pte) +{ + return false; +} + +static inline void +set_pte(pte_t* ptep, pte_t pte) +{ + ptep->val = pte.val; +} + +static inline pte_t +pte_at(pte_t* ptep) { + return *ptep; +} + 
+ +#endif /* __LUNAIX_ARCH_PAGETABLE_H */ diff --git a/lunaix-os/arch/i386/boot/archinit.h b/lunaix-os/arch/i386/boot/archinit.h new file mode 100644 index 0000000..b936ca1 --- /dev/null +++ b/lunaix-os/arch/i386/boot/archinit.h @@ -0,0 +1,14 @@ +#ifndef __LUNAIX_ARCHINIT_H +#define __LUNAIX_ARCHINIT_H + +#include +#include +#include + +ptr_t boot_text +kpg_init(); + +struct boot_handoff* boot_text +mb_parse(struct multiboot_info* mb); + +#endif /* __LUNAIX_ARCHINIT_H */ diff --git a/lunaix-os/arch/i386/boot/boot.S b/lunaix-os/arch/i386/boot/boot.S index 9d75d98..2f48bbc 100644 --- a/lunaix-os/arch/i386/boot/boot.S +++ b/lunaix-os/arch/i386/boot/boot.S @@ -13,23 +13,8 @@ /* 根据System V ABI,栈地址必须16字节对齐 */ /* 这里只是一个临时栈,在_hhk_init里面我们会初始化内核专用栈 */ .align 16 - stack_bottom: - .skip 4096, 0 - __stack_top: - - -/* - 1 page directory, - 9 page tables: - 1. Mapping reserved area and hhk_init - 2-9. Remapping the kernels -*/ - -.section .kpg - .global _k_ptd - _k_ptd: - .skip KPG_SIZE, 0 - + .skip 256, 0 + __boot_stack_top: .section .boot.text .global start_ @@ -40,49 +25,14 @@ cld # 确保屏蔽所有外中断,直到我们准备好PIC为止 cli - movl $__stack_top, %esp + movl $__boot_stack_top, %esp subl $16, %esp - /* - parse multiboot struct into arch-agnostic boot info struct - */ - movl %ebx, (%esp) - call mb_parse - - /* - kpg_init用来初始化内核页表: - 1. 初始化最简单的PD与PT(重新映射我们的内核至3GiB处,以及对相应的地方进行Identity Map) - */ - - movl $(KPG_SIZE), 4(%esp) - movl $(_k_ptd - 0xC0000000), (%esp) /* PTD物理地址 */ - call kpg_init - - /* - 基本的映射定义好了,我们可以放心的打开分页了 - 我们只需要把PTD的基地址加载进CR3就好了。 - */ - - /* 加载PTD基地址(物理地址) */ - movl (%esp), %eax - andl $0xfffff000, %eax # 有点多余,但写上还算明白一点 - movl %eax, %cr3 - - movl %cr0, %eax - orl $0x80000000, %eax /* 开启分页与地址转换 (CR0.PG=1, CR0.WP=0) */ - andl $0xfffffffb, %eax - orl $0x2, %eax /* 启用x87 FPU (CR0.MP=1, CR0.EM=0) */ - movl %eax, %cr0 - - movl %cr4, %eax - orl $0x600, %eax - movl %eax, %cr4 /* CR4.OSFXSR=1, CR4.OSXMMEXCPT=1 */ - /* x87 FPU 已配置 */ + call x86_init addl $16, %esp - /* 进入高半核! 
*/ pushl $hhk_entry_ ret \ No newline at end of file diff --git a/lunaix-os/arch/i386/boot/init32.c b/lunaix-os/arch/i386/boot/init32.c new file mode 100644 index 0000000..1b908f8 --- /dev/null +++ b/lunaix-os/arch/i386/boot/init32.c @@ -0,0 +1,17 @@ +#include "archinit.h" +#include +#include + +void boot_text +x86_init(struct multiboot_info* mb) +{ + mb_parse(mb); + + cr4_setfeature(CR4_OSXMMEXCPT | CR4_OSFXSR | CR4_PSE36); + + ptr_t pagetable = kpg_init(); + cpu_chvmspace(pagetable); + + cr0_unsetfeature(CR0_WP | CR0_EM); + cr0_setfeature(CR0_PG | CR0_MP); +} \ No newline at end of file diff --git a/lunaix-os/arch/i386/boot/kpt_setup.c b/lunaix-os/arch/i386/boot/kpt_setup.c index 50d95ee..26a348d 100644 --- a/lunaix-os/arch/i386/boot/kpt_setup.c +++ b/lunaix-os/arch/i386/boot/kpt_setup.c @@ -1,121 +1,105 @@ #define __BOOT_CODE__ -#include +#include +#include #include -#include +#include -#define PT_ADDR(ptd, pt_index) ((ptd_t*)ptd + (pt_index + 1) * 1024) -#define SET_PDE(ptd, pde_index, pde) *((ptd_t*)ptd + pde_index) = pde; -#define SET_PTE(ptd, pt_index, pte_index, pte) \ - *(PT_ADDR(ptd, pt_index) + pte_index) = pte; -#define sym_val(sym) (ptr_t)(&sym) - -#define KERNEL_PAGE_COUNT \ - ((sym_val(__kexec_end) - sym_val(__kexec_start) + 0x1000 - 1) >> 12); -#define HHK_PAGE_COUNT \ - ((sym_val(__kexec_boot_end) - 0x100000 + 0x1000 - 1) >> 12) - -#define V2P(vaddr) ((ptr_t)(vaddr)-KERNEL_EXEC) - -// use table #1 -#define PG_TABLE_IDENTITY 0 +// Provided by linker (see linker.ld) +extern u8_t __kexec_start[]; +extern u8_t __kexec_end[]; +extern u8_t __kexec_text_start[]; +extern u8_t __kexec_text_end[]; +extern u8_t __kboot_start[]; +extern u8_t __kboot_end[]; -// use table #2-8 -// hence the max size of kernel is 8MiB -#define PG_TABLE_KERNEL 1 +// define the initial page table layout +struct kernel_map { + pte_t l0t[_PAGE_LEVEL_SIZE]; -// use table #9 -#define PG_TABLE_STACK 8 + struct { + pte_t _lft[_PAGE_LEVEL_SIZE]; + } kernel_lfts[16]; +} align(4); -// 
Provided by linker (see linker.ld) -extern u8_t __kexec_start; -extern u8_t __kexec_end; -extern u8_t __kexec_text_start; -extern u8_t __kexec_text_end; +static struct kernel_map kernel_pt __section(".kpg"); +export_symbol(debug, boot, kernel_pt); -extern u8_t __kexec_boot_end; void boot_text -_init_page(x86_page_table* ptd) +_init_page() { - ptd->entry[0] = NEW_L1_ENTRY(PG_PREM_RW, (ptd_t*)ptd + PG_MAX_ENTRIES); - - // 对低1MiB空间进行对等映射(Identity - // mapping),也包括了我们的VGA,方便内核操作。 - x86_page_table* id_pt = - (x86_page_table*)GET_PG_ADDR(ptd->entry[PG_TABLE_IDENTITY]); + struct kernel_map* kpt_pa = (struct kernel_map*)to_kphysical(&kernel_pt); - for (u32_t i = 0; i < 256; i++) { - id_pt->entry[i] = NEW_L2_ENTRY(PG_PREM_RW, (i << PG_SIZE_BITS)); - } + pte_t* kl0tep = (pte_t*) &kpt_pa->l0t[pfn_at(KERNEL_RESIDENT, L0T_SIZE)]; + pte_t* kl1tep = (pte_t*) kpt_pa->kernel_lfts; + pte_t* boot_l0tep = (pte_t*) kpt_pa; - // 对等映射我们的hhk_init,这样一来,当分页与地址转换开启后,我们依然能够照常执行最终的 - // jmp 指令来跳转至 - // 内核的入口点 - for (u32_t i = 0; i < HHK_PAGE_COUNT; i++) { - id_pt->entry[256 + i] = - NEW_L2_ENTRY(PG_PREM_RW, 0x100000 + (i << PG_SIZE_BITS)); - } + set_pte(boot_l0tep, pte_mkhuge(mkpte_prot(KERNEL_DATA))); // --- 将内核重映射至高半区 --- - // 这里是一些计算,主要是计算应当映射进的 页目录 与 页表 的条目索引(Entry - // Index) - u32_t kernel_pde_index = L1_INDEX(sym_val(__kexec_start)); - u32_t kernel_pte_index = L2_INDEX(sym_val(__kexec_start)); - u32_t kernel_pg_counts = KERNEL_PAGE_COUNT; - - // 将内核所需要的页表注册进页目录 - // 当然,就现在而言,我们的内核只占用不到50个页(每个页表包含1024个页) - // 这里分配了3个页表(12MiB),未雨绸缪。 - for (u32_t i = 0; i < PG_TABLE_STACK - PG_TABLE_KERNEL; i++) { - ptd->entry[kernel_pde_index + i] = - NEW_L1_ENTRY(PG_PREM_RW, PT_ADDR(ptd, PG_TABLE_KERNEL + i)); + // Hook the kernel reserved LFTs onto L0T + pte_t pte = mkpte((ptr_t)kl1tep, KERNEL_DATA); + + for (u32_t i = 0; i < KEXEC_RSVD; i++) { + pte = pte_setpaddr(pte, (ptr_t)&kpt_pa->kernel_lfts[i]); + set_pte(kl0tep, pte); + + kl0tep++; } - // 首先,检查内核的大小是否可以fit进我们这几个表(12MiB) - if (kernel_pg_counts > 
- (PG_TABLE_STACK - PG_TABLE_KERNEL) * PG_MAX_ENTRIES) { + // Ensure the size of kernel is within the reservation + pfn_t kimg_pagecount = + pfn((ptr_t)__kexec_end - (ptr_t)__kexec_start); + if (kimg_pagecount > KEXEC_RSVD * _PAGE_LEVEL_SIZE) { // ERROR: require more pages // here should do something else other than head into blocking asm("ud2"); } - // 计算内核.text段的物理地址 - ptr_t kernel_pm = V2P(&__kexec_start); - ptr_t ktext_start = V2P(&__kexec_text_start); - ptr_t ktext_end = V2P(&__kexec_text_end); + // Now, map the kernel + + pfn_t kimg_end = pfn(to_kphysical(__kexec_end)); + pfn_t i = pfn(to_kphysical(__kexec_text_start)); + kl1tep += i; - // 重映射内核至高半区地址(>=0xC0000000) - for (u32_t i = 0; i < kernel_pg_counts; i++) { - ptr_t paddr = kernel_pm + (i << PG_SIZE_BITS); - u32_t flags = PG_PREM_RW; + // kernel .text + pte = pte_setprot(pte, KERNEL_EXEC); + pfn_t ktext_end = pfn(to_kphysical(__kexec_text_end)); + for (; i < ktext_end; i++) { + pte = pte_setpaddr(pte, page_addr(i)); + set_pte(kl1tep, pte); + + kl1tep++; + } - if (paddr >= ktext_start && paddr <= ktext_end) { - flags = PG_PREM_R; - } + // all remaining kernel sections + pte = pte_setprot(pte, KERNEL_DATA); + for (; i < kimg_end; i++) { + pte = pte_setpaddr(pte, page_addr(i)); + set_pte(kl1tep, pte); - SET_PTE(ptd, - PG_TABLE_KERNEL, - kernel_pte_index + i, - NEW_L2_ENTRY(flags, paddr)) + kl1tep++; } - // 最后一个entry用于循环映射 - ptd->entry[PG_MAX_ENTRIES - 1] = NEW_L1_ENTRY(T_SELF_REF_PERM, ptd); + // XXX: Mapping the kernel .rodata section? + + // Build up self-reference + pte = mkpte_root((ptr_t)kpt_pa, KERNEL_DATA); + set_pte(boot_l0tep + _PAGE_LEVEL_MASK, pte); } -void boot_text -kpg_init(x86_page_table* ptd, u32_t kpg_size) +ptr_t boot_text +kpg_init() { - - // 初始化 kpg 全为0 - // P.s. 真没想到GRUB会在这里留下一堆垃圾! 老子的页表全乱套了! 
- u8_t* kpg = (u8_t*)ptd; - for (u32_t i = 0; i < kpg_size; i++) { - *(kpg + i) = 0; + ptr_t kmap_pa = to_kphysical(&kernel_pt); + for (size_t i = 0; i < sizeof(kernel_pt); i++) { + ((u8_t*)kmap_pa)[i] = 0; } - _init_page(ptd); + _init_page(); + + return kmap_pa; } \ No newline at end of file diff --git a/lunaix-os/arch/i386/boot/prologue.S b/lunaix-os/arch/i386/boot/prologue.S index ff77ef5..f8d59ce 100644 --- a/lunaix-os/arch/i386/boot/prologue.S +++ b/lunaix-os/arch/i386/boot/prologue.S @@ -3,12 +3,24 @@ #define __ASM__ #include +.section .bss + .align 16 + .skip 2048, 0 + __kinit_stack_top: + # TODO + # This stack was too small that corrupt the ambient kernel structures. + # (took me days to figure this out!) + # We should spent more time to implement a good strategy to detect such + # run-over (we can check these invariants when we trapped in some non-recoverable + # state and provide good feedback to user) + .section .text .global hhk_entry_ hhk_entry_: /* 欢迎来到虚拟内存的世界! :D */ + movl $__kinit_stack_top, %esp andl $stack_alignment, %esp subl $16, %esp /* diff --git a/lunaix-os/arch/i386/exceptions/i386_isrm.c b/lunaix-os/arch/i386/exceptions/i386_isrm.c index 0628535..f1facf9 100644 --- a/lunaix-os/arch/i386/exceptions/i386_isrm.c +++ b/lunaix-os/arch/i386/exceptions/i386_isrm.c @@ -90,7 +90,7 @@ isrm_bindirq(int irq, isr_cb irq_handler) { int iv; if (!(iv = isrm_ivexalloc(irq_handler))) { - panickf("out of IV resource. 
(irq=%d)", irq); + fail("out of IV resource."); return 0; // never reach } diff --git a/lunaix-os/arch/i386/exceptions/interrupt.S b/lunaix-os/arch/i386/exceptions/interrupt.S index d8df554..e6656c9 100644 --- a/lunaix-os/arch/i386/exceptions/interrupt.S +++ b/lunaix-os/arch/i386/exceptions/interrupt.S @@ -112,11 +112,11 @@ andl $3, %eax jz 1f - ## FIXME x87 fpu context - movl current_thread, %eax - movl thread_ustack_top(%eax), %eax - test %eax, %eax - jz 1f + # # FIXME x87 fpu context + # movl current_thread, %eax + # movl thread_ustack_top(%eax), %eax + # test %eax, %eax + # jz 1f # fxrstor (%eax) 1: diff --git a/lunaix-os/arch/i386/exceptions/interrupts.c b/lunaix-os/arch/i386/exceptions/interrupts.c index 5951236..8e6fa20 100644 --- a/lunaix-os/arch/i386/exceptions/interrupts.c +++ b/lunaix-os/arch/i386/exceptions/interrupts.c @@ -6,7 +6,6 @@ #include #include -#include #include #include #include @@ -50,7 +49,9 @@ intr_handler(isr_param* param) done: - intc_notify_eoi(0, execp->vector); + if (execp->vector > IV_BASE_END) { + intc_notify_eoi(0, execp->vector); + } return; } \ No newline at end of file diff --git a/lunaix-os/arch/i386/includes/sys/crx.h b/lunaix-os/arch/i386/includes/sys/crx.h new file mode 100644 index 0000000..270a897 --- /dev/null +++ b/lunaix-os/arch/i386/includes/sys/crx.h @@ -0,0 +1,52 @@ +#ifndef __LUNAIX_CRX_H +#define __LUNAIX_CRX_H + +#define CR4_PSE36 ( 1UL << 4 ) +#define CR4_OSXMMEXCPT ( 1UL << 10 ) +#define CR4_OSFXSR ( 1UL << 9 ) +#define CR4_PCIDE ( 1UL << 17 ) +#define CR4_PGE ( 1UL << 7 ) +#define CR4_LA57 ( 1UL << 12 ) + +#define CR0_PG ( 1UL << 31 ) +#define CR0_WP ( 1UL << 16 ) +#define CR0_EM ( 1UL << 2 ) +#define CR0_MP ( 1UL << 1 ) + +#define crx_addflag(crx, flag) \ + asm( \ + "movl %%" #crx ", %%eax\n" \ + "orl %0, %%eax\n" \ + "movl %%eax, %%" #crx "\n" \ + ::"r"(flag) \ + :"eax" \ + ); + +#define crx_rmflag(crx, flag) \ + asm( \ + "movl %%" #crx ", %%eax\n" \ + "andl %0, %%eax\n" \ + "movl %%eax, %%" #crx "\n" \ + 
::"r"(~(flag)) \ + :"eax" \ + ); + +static inline void +cr4_setfeature(unsigned long feature) +{ + crx_addflag(cr4, feature); +} + +static inline void +cr0_setfeature(unsigned long feature) +{ + crx_addflag(cr0, feature); +} + +static inline void +cr0_unsetfeature(unsigned long feature) +{ + crx_rmflag(cr0, feature); +} + +#endif /* __LUNAIX_CR4_H */ diff --git a/lunaix-os/arch/i386/includes/sys/interrupts.h b/lunaix-os/arch/i386/includes/sys/interrupts.h index a138610..540da4c 100644 --- a/lunaix-os/arch/i386/includes/sys/interrupts.h +++ b/lunaix-os/arch/i386/includes/sys/interrupts.h @@ -7,9 +7,6 @@ #include #include -#define saved_fp(isrm) ((isrm)->registers.ebp) -#define kernel_context(isrm) (!(((isrm)->execp->cs) & 0b11)) - struct exec_param; struct regcontext @@ -50,6 +47,9 @@ struct exec_param u32_t ss; } compact; +#define saved_fp(isrm) ((isrm)->registers.ebp) +#define kernel_context(isrm) (!(((isrm)->execp->cs) & 0b11)) + #endif #endif /* __LUNAIX_INTERRUPTS_H */ diff --git a/lunaix-os/arch/i386/includes/sys/mm/memory.h b/lunaix-os/arch/i386/includes/sys/mm/memory.h new file mode 100644 index 0000000..62bab8d --- /dev/null +++ b/lunaix-os/arch/i386/includes/sys/mm/memory.h @@ -0,0 +1,27 @@ +#ifndef __LUNAIX_MEMORY_H +#define __LUNAIX_MEMORY_H + +#include +#include + +static inline pte_attr_t +translate_vmr_prot(unsigned int vmr_prot) +{ + pte_attr_t _pte_prot = _PTE_U; + if ((vmr_prot & PROT_READ)) { + _pte_prot |= _PTE_R; + } + + if ((vmr_prot & PROT_WRITE)) { + _pte_prot |= _PTE_W; + } + + if ((vmr_prot & PROT_EXEC)) { + _pte_prot |= _PTE_X; + } + + return _pte_prot; +} + + +#endif /* __LUNAIX_MEMORY_H */ diff --git a/lunaix-os/arch/i386/includes/sys/mm/mempart.h b/lunaix-os/arch/i386/includes/sys/mm/mempart.h index 0ffaad1..d923d45 100644 --- a/lunaix-os/arch/i386/includes/sys/mm/mempart.h +++ b/lunaix-os/arch/i386/includes/sys/mm/mempart.h @@ -25,33 +25,33 @@ #define USR_STACK_SIZE 0x40000UL #define USR_STACK_END 0xbffffff0UL -#define KERNEL_EXEC 
0xc0000000UL -#define KERNEL_EXEC_SIZE 0x4000000UL -#define KERNEL_EXEC_END 0xc3ffffffUL +#define KERNEL_IMG 0xc0000000UL +#define KERNEL_IMG_SIZE 0x4000000UL +#define KERNEL_IMG_END 0xc3ffffffUL -#define VMS_MOUNT_1 0xc4000000UL -#define VMS_MOUNT_1_SIZE 0x400000UL -#define VMS_MOUNT_1_END 0xc43fffffUL - -#define PG_MOUNT_1 0xc4400000UL +#define PG_MOUNT_1 0xc4000000UL #define PG_MOUNT_1_SIZE 0x1000UL -#define PG_MOUNT_1_END 0xc4400fffUL +#define PG_MOUNT_1_END 0xc4000fffUL -#define PG_MOUNT_2 0xc4401000UL +#define PG_MOUNT_2 0xc4001000UL #define PG_MOUNT_2_SIZE 0x1000UL -#define PG_MOUNT_2_END 0xc4401fffUL +#define PG_MOUNT_2_END 0xc4001fffUL -#define PG_MOUNT_3 0xc4402000UL +#define PG_MOUNT_3 0xc4002000UL #define PG_MOUNT_3_SIZE 0x1000UL -#define PG_MOUNT_3_END 0xc4402fffUL +#define PG_MOUNT_3_END 0xc4002fffUL -#define PG_MOUNT_4 0xc4403000UL +#define PG_MOUNT_4 0xc4003000UL #define PG_MOUNT_4_SIZE 0x1000UL -#define PG_MOUNT_4_END 0xc4403fffUL +#define PG_MOUNT_4_END 0xc4003fffUL -#define VMAP 0xc4800000UL +#define VMAP 0xc4400000UL #define VMAP_SIZE 0x3b400000UL -#define VMAP_END 0xffbfffffUL +#define VMAP_END 0xff7fffffUL + +#define VMS_MOUNT_1 0xff800000UL +#define VMS_MOUNT_1_SIZE 0x400000UL +#define VMS_MOUNT_1_END 0xffbfffffUL #define PD_REF 0xffc00000UL #define PD_REF_SIZE 0x400000UL diff --git a/lunaix-os/arch/i386/includes/sys/mm/mm_defs.h b/lunaix-os/arch/i386/includes/sys/mm/mm_defs.h index 8f51352..8dd7df6 100644 --- a/lunaix-os/arch/i386/includes/sys/mm/mm_defs.h +++ b/lunaix-os/arch/i386/includes/sys/mm/mm_defs.h @@ -2,12 +2,32 @@ #define __LUNAIX_MM_DEFS_H #include "mempart.h" +#include "pagetable.h" -#define KSTACK_SIZE (3 * MEM_PAGE) +#define KSTACK_PAGES 3 +#define KSTACK_SIZE (KSTACK_PAGES * MEM_PAGE) -#define MEMGUARD 0xdeadc0deUL +/* + Regardless architecture we need to draw the line very carefully, and must + take the size of VM into account. 
In general, we aims to achieve + "sufficiently large" of memory for kernel -#define kernel_addr(addr) ((addr) >= KERNEL_EXEC) -#define guardian_page(pte) ((pte) == MEMGUARD) + In terms of x86_32: + * #768~1022 PTEs of PD (0x00000000c0000000, ~1GiB) + + In light of upcomming x86_64 support (for Level 4&5 Paging): + * #510 entry of PML4 (0x0000ff0000000000, ~512GiB) + * #510 entry of PML5 (0x01fe000000000000, ~256TiB) +*/ +// Where the kernel getting re-mapped. +#define KERNEL_RESIDENT 0xc0000000UL + +// Pages reserved for kernel image +#define KEXEC_RSVD 16 + +#define kernel_addr(addr) ((addr) >= KERNEL_RESIDENT || (addr) < USR_EXEC) + +#define to_kphysical(k_va) ((ptr_t)(k_va) - KERNEL_RESIDENT) +#define to_kvirtual(k_pa) ((ptr_t)(k_pa) - KERNEL_RESIDENT) #endif /* __LUNAIX_MM_DEFS_H */ diff --git a/lunaix-os/arch/i386/includes/sys/mm/pagetable.h b/lunaix-os/arch/i386/includes/sys/mm/pagetable.h new file mode 100644 index 0000000..4fd9439 --- /dev/null +++ b/lunaix-os/arch/i386/includes/sys/mm/pagetable.h @@ -0,0 +1,315 @@ +#ifndef __LUNAIX_ARCH_PAGETABLE_H +#define __LUNAIX_ARCH_PAGETABLE_H + +#include +#include + +/* ******** Page Table Manipulation ******** */ + +// Levels of page table to traverse for a single page walk +#define _PTW_LEVEL 2 + +#define _PAGE_BASE_SHIFT 12 +#define _PAGE_BASE_SIZE ( 1UL << _PAGE_BASE_SHIFT ) +#define _PAGE_BASE_MASK ( _PAGE_BASE_SIZE - 1) + +#define _PAGE_LEVEL_SHIFT 10 +#define _PAGE_LEVEL_SIZE ( 1UL << _PAGE_LEVEL_SHIFT ) +#define _PAGE_LEVEL_MASK ( _PAGE_LEVEL_SIZE - 1 ) +#define _PAGE_Ln_SIZE(n) ( 1UL << (_PAGE_BASE_SHIFT + _PAGE_LEVEL_SHIFT * (_PTW_LEVEL - (n) - 1)) ) + +// Note: we set VMS_SIZE = VMS_MASK as it is impossible +// to express 4Gi in 32bit unsigned integer + +#define VMS_MASK ( -1UL ) +#define VMS_SIZE VMS_MASK + +/* General size of a LnT huge page */ + +#define L0T_SIZE _PAGE_Ln_SIZE(0) +#define L1T_SIZE _PAGE_Ln_SIZE(1) +#define L2T_SIZE _PAGE_Ln_SIZE(1) +#define L3T_SIZE _PAGE_Ln_SIZE(1) +#define LFT_SIZE 
_PAGE_Ln_SIZE(1) + +/* General mask to get page offset of a LnT huge page */ + +#define L0T_MASK ( L0T_SIZE - 1 ) +#define L1T_MASK ( L1T_SIZE - 1 ) +#define L2T_MASK ( L2T_SIZE - 1 ) +#define L3T_MASK ( L3T_SIZE - 1 ) +#define LFT_MASK ( LFT_SIZE - 1 ) + +/* Masks to get index of a LnTE */ + +#define L0T_INDEX_MASK ( VMS_MASK ^ L0T_MASK ) +#define L1T_INDEX_MASK ( L0T_MASK ^ L1T_MASK ) +#define L2T_INDEX_MASK ( L1T_MASK ^ L2T_MASK ) +#define L3T_INDEX_MASK ( L2T_MASK ^ L3T_MASK ) +#define LFT_INDEX_MASK ( L3T_MASK ^ LFT_MASK ) + +#define PAGE_SHIFT _PAGE_BASE_SHIFT +#define PAGE_SIZE _PAGE_BASE_SIZE +#define PAGE_MASK _PAGE_BASE_MASK + +#define LEVEL_SHIFT _PAGE_LEVEL_SHIFT +#define LEVEL_SIZE _PAGE_LEVEL_SIZE +#define LEVEL_MASK _PAGE_LEVEL_MASK + +// max PTEs number +#define MAX_PTEN _PAGE_LEVEL_SIZE + +// max translation level supported +#define MAX_LEVEL _PTW_LEVEL + + +/* ******** PTE Manipulation ******** */ + +struct __pte { + unsigned int val; +} align(4); + +#ifndef pte_t +typedef struct __pte pte_t; +#endif + +typedef unsigned int pfn_t; +typedef unsigned int pte_attr_t; + +#define _PTE_P (1 << 0) +#define _PTE_W (1 << 1) +#define _PTE_U (1 << 2) +#define _PTE_WT (1 << 3) +#define _PTE_CD (1 << 4) +#define _PTE_A (1 << 5) +#define _PTE_D (1 << 6) +#define _PTE_PS (1 << 7) +#define _PTE_PAT (1 << 7) +#define _PTE_G (1 << 8) +#define _PTE_X (0) +#define _PTE_R (0) + +#define _PTE_PROT_MASK ( _PTE_W | _PTE_U | _PTE_X ) + +#define KERNEL_PAGE ( _PTE_P ) +#define KERNEL_EXEC ( KERNEL_PAGE | _PTE_X ) +#define KERNEL_DATA ( KERNEL_PAGE | _PTE_W ) +#define KERNEL_RDONLY ( KERNEL_PAGE ) + +#define USER_PAGE ( _PTE_P | _PTE_U ) +#define USER_EXEC ( USER_PAGE | _PTE_X ) +#define USER_DATA ( USER_PAGE | _PTE_W ) +#define USER_RDONLY ( USER_PAGE ) + +#define SELF_MAP ( KERNEL_DATA | _PTE_WT | _PTE_CD ) + +#define __mkpte_from(pte_val) ((pte_t){ .val = (pte_val) }) +#define __MEMGUARD 0xdeadc0deUL + +#define null_pte ( __mkpte_from(0) ) +#define guard_pte ( 
__mkpte_from(__MEMGUARD) ) +#define pte_val(pte) ( pte.val ) + + +static inline bool +pte_isguardian(pte_t pte) +{ + return pte.val == __MEMGUARD; +} + +static inline pte_t +mkpte_prot(pte_attr_t prot) +{ + pte_attr_t attrs = (prot & _PTE_PROT_MASK) | _PTE_P; + return __mkpte_from(attrs); +} + +static inline pte_t +mkpte(ptr_t paddr, pte_attr_t prot) +{ + pte_attr_t attrs = (prot & _PTE_PROT_MASK) | _PTE_P; + return __mkpte_from((paddr & ~_PAGE_BASE_MASK) | attrs); +} + +static inline pte_t +mkpte_root(ptr_t paddr, pte_attr_t prot) +{ + pte_attr_t attrs = (prot & _PTE_PROT_MASK) | _PTE_P; + return __mkpte_from((paddr & ~_PAGE_BASE_MASK) | attrs); +} + +static inline pte_t +mkpte_raw(unsigned long pte_val) +{ + return __mkpte_from(pte_val); +} + +static inline pte_t +pte_setpaddr(pte_t pte, ptr_t paddr) +{ + return __mkpte_from((pte.val & _PAGE_BASE_MASK) | (paddr & ~_PAGE_BASE_MASK)); +} + +static inline ptr_t +pte_paddr(pte_t pte) +{ + return pte.val & ~_PAGE_BASE_MASK; +} + +static inline pte_t +pte_setprot(pte_t pte, ptr_t prot) +{ + return __mkpte_from((pte.val & ~_PTE_PROT_MASK) | (prot & _PTE_PROT_MASK)); +} + +static inline pte_attr_t +pte_prot(pte_t pte) +{ + return (pte.val & _PTE_PROT_MASK); +} + +static inline bool +pte_isnull(pte_t pte) +{ + return !pte.val; +} + +static inline pte_t +pte_mkhuge(pte_t pte) +{ + return __mkpte_from(pte.val | _PTE_PS); +} + +static inline pte_t +pte_mkvolatile(pte_t pte) +{ + return __mkpte_from(pte.val | _PTE_WT | _PTE_CD); +} + +static inline pte_t +pte_mkroot(pte_t pte) +{ + return __mkpte_from(pte.val & ~_PTE_PS); +} + +static inline pte_t +pte_usepat(pte_t pte) +{ + return __mkpte_from(pte.val | _PTE_PAT); +} + +static inline bool +pte_huge(pte_t pte) +{ + return !!(pte.val & _PTE_PS); +} + +static inline pte_t +pte_mkloaded(pte_t pte) +{ + return __mkpte_from(pte.val | _PTE_P); +} + +static inline pte_t +pte_mkunloaded(pte_t pte) +{ + return __mkpte_from(pte.val & ~_PTE_P); +} + +static inline bool 
+pte_isloaded(pte_t pte) +{ + return !!(pte.val & _PTE_P); +} + +static inline pte_t +pte_mkwprotect(pte_t pte) +{ + return __mkpte_from(pte.val & ~_PTE_W); +} + +static inline pte_t +pte_mkwritable(pte_t pte) +{ + return __mkpte_from(pte.val | _PTE_W); +} + +static inline bool +pte_iswprotect(pte_t pte) +{ + return !(pte.val & _PTE_W); +} + +static inline pte_t +pte_mkuser(pte_t pte) +{ + return __mkpte_from(pte.val | _PTE_U); +} + +static inline pte_t +pte_mkkernel(pte_t pte) +{ + return __mkpte_from(pte.val & ~_PTE_U); +} + +static inline bool +pte_allow_user(pte_t pte) +{ + return !!(pte.val & _PTE_U); +} + +static inline pte_t +pte_mkexec(pte_t pte) +{ + return __mkpte_from(pte.val | _PTE_X); +} + +static inline pte_t +pte_mknonexec(pte_t pte) +{ + return __mkpte_from(pte.val & ~_PTE_X); +} + +static inline bool +pte_isexec(pte_t pte) +{ + return !!(pte.val & _PTE_X); +} + +static inline pte_t +pte_mkuntouch(pte_t pte) +{ + return __mkpte_from(pte.val & ~_PTE_A); +} + +static inline bool +pte_istouched(pte_t pte) +{ + return !!(pte.val & _PTE_A); +} + +static inline pte_t +pte_mkclean(pte_t pte) +{ + return __mkpte_from(pte.val & ~_PTE_D); +} + +static inline bool +pte_dirty(pte_t pte) +{ + return !!(pte.val & _PTE_D); +} + +static inline void +set_pte(pte_t* ptep, pte_t pte) +{ + ptep->val = pte.val; +} + +static inline pte_t +pte_at(pte_t* ptep) { + return *ptep; +} + + +#endif /* __LUNAIX_ARCH_PAGETABLE_H */ diff --git a/lunaix-os/arch/i386/mm/fault.c b/lunaix-os/arch/i386/mm/fault.c new file mode 100644 index 0000000..44f01a6 --- /dev/null +++ b/lunaix-os/arch/i386/mm/fault.c @@ -0,0 +1,24 @@ +#include +#include +#include +#include + +#include + +bool +__arch_prepare_fault_context(struct fault_context* fault) +{ + isr_param* ictx = fault->ictx; + + ptr_t ptr = cpu_ldeaddr(); + if (!ptr) { + return false; + } + + fault->fault_ptep = mkptep_va(VMS_SELF, ptr); + fault->fault_data = ictx->execp->err_code; + fault->fault_instn = ictx->execp->eip; + 
fault->fault_va = ptr; + + return true; +} \ No newline at end of file diff --git a/lunaix-os/arch/i386/mm/pfault.c b/lunaix-os/arch/i386/mm/pfault.c deleted file mode 100644 index d0dd56c..0000000 --- a/lunaix-os/arch/i386/mm/pfault.c +++ /dev/null @@ -1,170 +0,0 @@ -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -#include - -LOG_MODULE("pf") - - - -#define COW_MASK (REGION_RSHARED | REGION_READ | REGION_WRITE) - -extern void -__print_panic_msg(const char* msg, const isr_param* param); - -void -intr_routine_page_fault(const isr_param* param) -{ - if (param->depth > 10) { - // Too many nested fault! we must messed up something - // XXX should we failed silently? - spin(); - } - - uint32_t errcode = param->execp->err_code; - ptr_t ptr = cpu_ldeaddr(); - if (!ptr) { - goto segv_term; - } - - v_mapping mapping; - if (!vmm_lookup(ptr, &mapping)) { - goto segv_term; - } - - // XXX do kernel trigger pfault? - - volatile x86_pte_t* pte = &PTE_MOUNTED(VMS_SELF, ptr >> 12); - - if (guardian_page(*pte)) { - ERROR("memory region over-running"); - goto segv_term; - } - - vm_regions_t* vmr = vmregions(__current); - struct mm_region* hit_region = region_get(vmr, ptr); - - if (!hit_region) { - // 当你凝视深渊时…… - goto segv_term; - } - - if (PG_IS_PRESENT(*pte)) { - if (((errcode ^ mapping.flags) & PG_ALLOW_USER)) { - // invalid access - DEBUG("invalid user access. 
(%p->%p, attr:0x%x)", - mapping.va, - mapping.pa, - mapping.flags); - goto segv_term; - } - if ((hit_region->attr & COW_MASK) == COW_MASK) { - // normal page fault, do COW - cpu_flush_page((ptr_t)pte); - - ptr_t pa = (ptr_t)vmm_dup_page(PG_ENTRY_ADDR(*pte)); - - pmm_free_page(*pte & ~0xFFF); - *pte = (*pte & 0xFFF & ~PG_DIRTY) | pa | PG_WRITE; - - goto resolved; - } - // impossible cases or accessing privileged page - goto segv_term; - } - - // an anonymous page and not present - // -> a new page need to be alloc - if ((hit_region->attr & REGION_ANON)) { - if (!PG_IS_PRESENT(*pte)) { - cpu_flush_page((ptr_t)pte); - - ptr_t pa = pmm_alloc_page(0); - if (!pa) { - goto oom; - } - - *pte = pa | region_ptattr(hit_region); - memset((void*)PG_ALIGN(ptr), 0, PG_SIZE); - goto resolved; - } - // permission denied on anon page (e.g., write on readonly page) - goto segv_term; - } - - // if mfile is set (Non-anonymous), then it is a mem map - if (hit_region->mfile && !PG_IS_PRESENT(*pte)) { - struct v_file* file = hit_region->mfile; - - ptr = PG_ALIGN(ptr); - - u32_t mseg_off = (ptr - hit_region->start); - u32_t mfile_off = mseg_off + hit_region->foff; - ptr_t pa = pmm_alloc_page(0); - - if (!pa) { - goto oom; - } - - cpu_flush_page((ptr_t)pte); - *pte = pa | region_ptattr(hit_region); - - memset((void*)ptr, 0, PG_SIZE); - - int errno = file->ops->read_page(file->inode, (void*)ptr, mfile_off); - - if (errno < 0) { - ERROR("fail to populate page (%d)", errno); - goto segv_term; - } - - *pte &= ~PG_DIRTY; - - goto resolved; - } - - // page not present, might be a chance to introduce swap file? 
- __print_panic_msg("WIP page fault route", param); - while (1) - ; - -oom: - ERROR("out of memory"); - -segv_term: - ERROR("(pid: %d) Segmentation fault on %p (%p:%p,e=0x%x)", - __current->pid, - ptr, - param->execp->cs, - param->execp->eip, - param->execp->err_code); - - trace_printstack_isr(param); - - if (kernel_context(param)) { - ERROR("[page fault on kernel]"); - // halt kernel if segv comes from kernel space - spin(); - } - - thread_setsignal(current_thread, _SIGSEGV); - - schedule(); - // should not reach - while (1) - ; - -resolved: - cpu_flush_page(ptr); - return; -} \ No newline at end of file diff --git a/lunaix-os/arch/i386/mm/vmutils.c b/lunaix-os/arch/i386/mm/vmutils.c new file mode 100644 index 0000000..d7a98ea --- /dev/null +++ b/lunaix-os/arch/i386/mm/vmutils.c @@ -0,0 +1,23 @@ +#include +#include +#include + +ptr_t +vmm_dup_page(ptr_t pa) +{ + ptr_t new_ppg = pmm_alloc_page(0); + mount_page(PG_MOUNT_3, new_ppg); + mount_page(PG_MOUNT_4, pa); + + asm volatile("movl %1, %%edi\n" + "movl %2, %%esi\n" + "rep movsl\n" ::"c"(1024), + "r"(PG_MOUNT_3), + "r"(PG_MOUNT_4) + : "memory", "%edi", "%esi"); + + unmount_page(PG_MOUNT_3); + unmount_page(PG_MOUNT_4); + + return new_ppg; +} \ No newline at end of file diff --git a/lunaix-os/arch/i386/pcontext.c b/lunaix-os/arch/i386/pcontext.c index a0ff6a6..6b56b41 100644 --- a/lunaix-os/arch/i386/pcontext.c +++ b/lunaix-os/arch/i386/pcontext.c @@ -13,17 +13,17 @@ volatile struct x86_tss _tss = { .link = 0, bool inject_transfer_context(ptr_t vm_mnt, struct transfer_context* tctx) { - v_mapping mapping; - if (!vmm_lookupat(vm_mnt, tctx->inject, &mapping)) { + pte_t pte; + if (!vmm_lookupat(vm_mnt, tctx->inject, &pte)) { return false; } - vmm_mount_pg(PG_MOUNT_4, mapping.pa); + mount_page(PG_MOUNT_4, pte_paddr(pte)); - ptr_t mount_inject = PG_MOUNT_4 + PG_OFFSET(tctx->inject); + ptr_t mount_inject = PG_MOUNT_4 + va_offset(tctx->inject); memcpy((void*)mount_inject, &tctx->transfer, sizeof(tctx->transfer)); - 
vmm_unmount_pg(PG_MOUNT_4); + unmount_page(PG_MOUNT_4); return true; } diff --git a/lunaix-os/hal/char/devnull.c b/lunaix-os/hal/char/devnull.c index 410f3ba..77650b0 100644 --- a/lunaix-os/hal/char/devnull.c +++ b/lunaix-os/hal/char/devnull.c @@ -1,11 +1,11 @@ #include -#include +#include static int __null_wr_pg(struct device* dev, void* buf, size_t offset) { // do nothing - return PG_SIZE; + return PAGE_SIZE; } static int diff --git a/lunaix-os/hal/char/devzero.c b/lunaix-os/hal/char/devzero.c index 77885d7..9013a47 100644 --- a/lunaix-os/hal/char/devzero.c +++ b/lunaix-os/hal/char/devzero.c @@ -1,13 +1,13 @@ #include -#include +#include #include static int __zero_rd_pg(struct device* dev, void* buf, size_t offset) { - memset(&((u8_t*)buf)[offset], 0, PG_SIZE); - return PG_SIZE; + memset(&((u8_t*)buf)[offset], 0, PAGE_SIZE); + return PAGE_SIZE; } static int diff --git a/lunaix-os/hal/char/lxconsole.c b/lunaix-os/hal/char/lxconsole.c index 53d01bb..6387670 100644 --- a/lunaix-os/hal/char/lxconsole.c +++ b/lunaix-os/hal/char/lxconsole.c @@ -99,13 +99,13 @@ done: int __tty_write_pg(struct device* dev, void* buf, size_t offset) { - return __tty_write(dev, buf, offset, PG_SIZE); + return __tty_write(dev, buf, offset, PAGE_SIZE); } int __tty_read_pg(struct device* dev, void* buf, size_t offset) { - return __tty_read(dev, buf, offset, PG_SIZE); + return __tty_read(dev, buf, offset, PAGE_SIZE); } int diff --git a/lunaix-os/hal/rng/rngx86.c b/lunaix-os/hal/rng/rngx86.c index 0b8a586..b108740 100644 --- a/lunaix-os/hal/rng/rngx86.c +++ b/lunaix-os/hal/rng/rngx86.c @@ -1,5 +1,5 @@ #include -#include +#include #include @@ -19,8 +19,8 @@ rng_fill(void* data, size_t len) static int __rand_rd_pg(struct device* dev, void* buf, size_t offset) { - rng_fill(buf, PG_SIZE); - return PG_SIZE; + rng_fill(buf, PAGE_SIZE); + return PAGE_SIZE; } static int diff --git a/lunaix-os/includes/lunaix/compiler.h b/lunaix-os/includes/lunaix/compiler.h index c3eab58..a0b8374 100644 --- 
a/lunaix-os/includes/lunaix/compiler.h +++ b/lunaix-os/includes/lunaix/compiler.h @@ -1,30 +1,32 @@ #ifndef __LUNAIX_COMPILER_H #define __LUNAIX_COMPILER_H -#define likely(x) __builtin_expect(!!(x), 1) -#define unlikely(x) __builtin_expect(!!(x), 0) - -#define weak_alias(name) __attribute__((weak, alias(name))) -#define weak __attribute__((weak)) -#define noret __attribute__((noreturn)) -#define optimize(opt) __attribute__((optimize(opt))) -#define must_inline __attribute__((always_inline)) -#define must_emit __attribute__((used)) -#define unreachable __builtin_unreachable() - -#define clz(bits) __builtin_clz(bits) +#define likely(x) __builtin_expect(!!(x), 1) +#define unlikely(x) __builtin_expect(!!(x), 0) + +#define __section(name) __attribute__((section(name))) +#define weak_alias(name) __attribute__((weak, alias(name))) +#define optimize(opt) __attribute__((optimize(opt))) +#define weak __attribute__((weak)) +#define noret __attribute__((noreturn)) +#define must_inline __attribute__((always_inline)) +#define must_emit __attribute__((used)) +#define unreachable __builtin_unreachable() +#define no_inline __attribute__((noinline)) + +#define clz(bits) __builtin_clz(bits) #define sadd_overflow(a, b, of) __builtin_sadd_overflow(a, b, of) #define umul_overflow(a, b, of) __builtin_umul_overflow(a, b, of) -#define offsetof(f, m) __builtin_offsetof(f, m) +#define offsetof(f, m) __builtin_offsetof(f, m) -#define prefetch_rd(ptr, ll) __builtin_prefetch((ptr), 0, ll) -#define prefetch_wr(ptr, ll) __builtin_prefetch((ptr), 1, ll) +#define prefetch_rd(ptr, ll) __builtin_prefetch((ptr), 0, ll) +#define prefetch_wr(ptr, ll) __builtin_prefetch((ptr), 1, ll) #define stringify(v) #v #define stringify__(v) stringify(v) -#define compact __attribute__((packed)) -#define align(v) __attribute__((aligned (v))) +#define compact __attribute__((packed)) +#define align(v) __attribute__((aligned (v))) #define export_symbol(domain, namespace, symbol)\ typeof(symbol)* must_emit 
__SYMEXPORT_Z##domain##_N##namespace##_S##symbol = &(symbol) diff --git a/lunaix-os/includes/lunaix/mm/fault.h b/lunaix-os/includes/lunaix/mm/fault.h new file mode 100644 index 0000000..77aab84 --- /dev/null +++ b/lunaix-os/includes/lunaix/mm/fault.h @@ -0,0 +1,49 @@ +#ifndef __LUNAIX_FAULT_H +#define __LUNAIX_FAULT_H + +#include +#include +#include + +#define RESOLVE_OK ( 0b000001 ) +#define NO_PREALLOC ( 0b000010 ) + +struct fault_context +{ + isr_param* ictx; + + struct + { + pte_t* fault_ptep; + ptr_t fault_va; + ptr_t fault_data; + ptr_t fault_instn; + }; // arch-dependent fault state + + pte_t fault_pte; // the fault-causing pte + ptr_t fault_refva; // referenced va for a ptep fault; equals fault_va otherwise + pte_t resolving; // the pte that will resolve the fault + + ptr_t prealloc_pa; // preallocated physical page, in case of an empty fault-pte + + bool kernel_vmfault:1; // faulting address is in kernel space + bool ptep_fault:1; // faulting address is a ptep + bool remote_fault:1; // referenced faulting address is in a remote vms + bool kernel_access:1; // fault caused by a kernel memory access + + struct proc_mm* mm; // process memory space associated with the fault; might be remote + struct mm_region* vmr; + + int resolve_type; +}; + +bool +__arch_prepare_fault_context(struct fault_context* context); + +static inline void +fault_resolved(struct fault_context* fault, pte_t resolved, int flags) +{ + fault->resolving = resolved; + fault->resolve_type |= (flags | RESOLVE_OK); +} +#endif /* __LUNAIX_FAULT_H */ diff --git a/lunaix-os/includes/lunaix/mm/mm.h b/lunaix-os/includes/lunaix/mm/mm.h index 31954d3..d5b8edd 100.644 --- a/lunaix-os/includes/lunaix/mm/mm.h +++ b/lunaix-os/includes/lunaix/mm/mm.h @@ -2,6 +2,9 @@ #define __LUNAIX_MM_H #include +#include + +#include #include @@ -42,4 +45,30 @@ #define REGION_TYPE_STACK (4 << 16) #define REGION_TYPE_VARS (5 << 16) +struct mm_region +{ + struct llist_header head; // must be first field! 
+ struct proc_mm* proc_vms; + + // file mapped to this region + struct v_file* mfile; + // mapped file offset + off_t foff; + // mapped file length + u32_t flen; // XXX it seems that we don't need this actually.. + + ptr_t start; + ptr_t end; + u32_t attr; + + void** index; // fast reference, to accelerate access to this very region. + + void* data; + // when a region is copied + void (*region_copied)(struct mm_region*); + // when a region is unmapped + void (*destruct_region)(struct mm_region*); +}; + + #endif /* __LUNAIX_MM_H */ diff --git a/lunaix-os/includes/lunaix/mm/page.h b/lunaix-os/includes/lunaix/mm/page.h deleted file mode 100644 index c28cca0..0000000 --- a/lunaix-os/includes/lunaix/mm/page.h +++ /dev/null @@ -1,110 +0,0 @@ -#ifndef __LUNAIX_PAGE_H -#define __LUNAIX_PAGE_H -#include - -#define PG_SIZE_BITS 12 -#define PG_SIZE (1 << PG_SIZE_BITS) - -#define PG_MAX_ENTRIES 1024U -#define PG_LAST_TABLE PG_MAX_ENTRIES - 1 -#define PG_FIRST_TABLE 0 - -#define PTE_NULL 0 - -#define PG_ALIGN(addr) ((ptr_t)(addr)&0xFFFFF000UL) -#define PG_MOD(addr) ((ptr_t)(addr) & (PG_SIZE - 1)) -#define PG_ALIGNED(addr) (!((ptr_t)(addr)&0x00000FFFUL)) -#define PN(addr) (((ptr_t)(addr) >> 12)) - -#define L1_INDEX(vaddr) (u32_t)(((ptr_t)(vaddr)&0xFFC00000UL) >> 22) -#define L2_INDEX(vaddr) (u32_t)(((ptr_t)(vaddr)&0x003FF000UL) >> 12) -#define PG_OFFSET(vaddr) (u32_t)((ptr_t)(vaddr)&0x00000FFFUL) - -#define GET_PT_ADDR(pde) PG_ALIGN(pde) -#define GET_PG_ADDR(pte) PG_ALIGN(pte) - -#define IS_CACHED(entry) ((entry & 0x1)) - -#define PG_PRESENT (0x1) -#define PG_DIRTY (1 << 6) -#define PG_ACCESSED (1 << 5) -#define PG_WRITE (0x1 << 1) -#define PG_ALLOW_USER (0x1 << 2) -#define PG_WRITE_THROUGH (1 << 3) -#define PG_DISABLE_CACHE (1 << 4) -#define PG_PDE_4MB (1 << 7) - -#define PG_IS_DIRTY(pte) ((pte)&PG_DIRTY) -#define PG_IS_ACCESSED(pte) ((pte)&PG_ACCESSED) -#define PG_IS_PRESENT(pte) ((pte)&PG_PRESENT) - -#define NEW_L1_ENTRY(flags, pt_addr) \ - (PG_ALIGN(pt_addr) | (((flags) | 
PG_WRITE_THROUGH) & 0xfff)) -#define NEW_L2_ENTRY(flags, pg_addr) (PG_ALIGN(pg_addr) | ((flags)&0xfff)) - -#define V_ADDR(pd, pt, offset) ((pd) << 22 | (pt) << 12 | (offset)) -#define P_ADDR(ppn, offset) ((ppn << 12) | (offset)) - -#define PG_ENTRY_FLAGS(entry) ((entry)&0xFFFU) -#define PG_ENTRY_ADDR(entry) ((entry) & ~0xFFFU) - -#define HAS_FLAGS(entry, flags) ((PG_ENTRY_FLAGS(entry) & (flags)) == flags) -#define CONTAINS_FLAGS(entry, flags) (PG_ENTRY_FLAGS(entry) & (flags)) - -#define PG_PREM_R (PG_PRESENT) -#define PG_PREM_RW (PG_PRESENT | PG_WRITE) -#define PG_PREM_UR (PG_PRESENT | PG_ALLOW_USER) -#define PG_PREM_URW (PG_PRESENT | PG_WRITE | PG_ALLOW_USER) - -// 用于对PD进行循环映射,因为我们可能需要对PD进行频繁操作,我们在这里禁用TLB缓存 -#define T_SELF_REF_PERM PG_PREM_RW | PG_DISABLE_CACHE | PG_WRITE_THROUGH - -// 页目录的虚拟基地址,可以用来访问到各个PDE -#define L1_BASE_VADDR 0xFFFFF000U - -// 页表的虚拟基地址,可以用来访问到各个PTE -#define L2_BASE_VADDR 0xFFC00000U - -// 用来获取特定的页表的虚拟地址 -#define L2_VADDR(pd_offset) (L2_BASE_VADDR | (pd_offset << 12)) - -typedef unsigned long ptd_t; -typedef unsigned long pt_t; -typedef unsigned int pt_attr; -typedef u32_t x86_pte_t; - -/** - * @brief 虚拟映射属性 - * - */ -typedef struct -{ - // 虚拟页地址 - ptr_t va; - // 物理页码(如果不存在映射,则为0) - u32_t pn; - // 物理页地址(如果不存在映射,则为0) - ptr_t pa; - // 映射的flags - u16_t flags; - // PTE地址 - x86_pte_t* pte; -} v_mapping; - -typedef struct -{ - x86_pte_t entry[PG_MAX_ENTRIES]; -} __attribute__((packed, aligned(4))) x86_page_table; - -/* 四个页挂载点,两个页目录挂载点: 用于临时创建&编辑页表 */ -#define PG_MOUNT_RANGE(l1_index) (701 <= l1_index && l1_index <= 703) - -/* - 当前进程内存空间挂载点 -*/ -#define VMS_SELF L2_BASE_VADDR - -#define PTE_MOUNTED(mnt, vpn) \ - (((x86_page_table*)((mnt) | (((vpn)&0xffc00) << 2)))->entry[(vpn)&0x3ff]) - -#endif /* __LUNAIX_PAGE_H */ diff --git a/lunaix-os/includes/lunaix/mm/pagetable.h b/lunaix-os/includes/lunaix/mm/pagetable.h new file mode 100644 index 0000000..4c2fce5 --- /dev/null +++ b/lunaix-os/includes/lunaix/mm/pagetable.h @@ -0,0 +1,507 @@ +#ifndef 
__LUNAIX_PAGETABLE_H +#define __LUNAIX_PAGETABLE_H + +/* + Defines page-related attributes for the different page table + hierarchies. In Lunaix, we define five arch-agnostic aliases + for those arch-dependent hierarchies: + + + L0T: Level-0 Table, the root page table + + L1T: Level-1 Table, indexed by L0T entries + + L2T: Level-2 Table, indexed by L1T entries + + L3T: Level-3 Table, indexed by L2T entries + + LFT: Leaf-Level Table (Level-4), indexed by L3T entries + + Therefore, "LnTE" can be used to refer to an "Entry in a Level-n Table". + Consequently, we can further define + + + LnTEP - pointer to an entry within LnT + + LnTP - pointer to (the first entry of) LnT + + To better identify all values derived from virtual and physical + addresses, we define: + + + Virtual Address Space (VAS): + The set of all virtual addresses that can be interpreted + by the page table walker. + + + Virtual Mappings Space (VMS): + An imaginary linear table composed of a set of mappings that + define the translation rules from the virtual memory address + space to physical memory addresses. (Note: mappings are stored + in the hierarchy of page tables; however this is transparent, + just like indexing into a big linear table, thus 'imaginary'.) + A VMS is a realisation of a VAS. + + + Virtual Mappings Mounting (VM_MNT or MNT): + A subregion within the current VAS where the VA/PA mappings + of another VMS are accessible. + + + Page Frame Number (PFN): + Index of a page at its base size. For most machines, this + is 4K. Note, this is not limited to physical addresses; for + a virtual address, this is the index of a virtual page within + its VMS. + + + Virtual Frame Number (VFN): + Index of a virtual page within its parent page table. 
Its + range is bounded above by the maximum number of PTEs per page + table + + In the context of the x86 architecture (regardless of x86_32 or x86_64), + these names have the following realisations: + + + L0T: PD (32-Bit 2-Level paging) + PML4 (4-Level Paging) + PML5 (5-Level Paging) + + + L1T: ( N/A ) (32-Bit 2-Level paging) + PDP (4-Level Paging) + PML4 (5-Level Paging) + + + L2T: ( N/A ) (32-Bit 2-Level paging) + PD (4-Level Paging) + PDP (5-Level Paging) + + + L3T: ( N/A ) (32-Bit 2-Level paging) + ( N/A ) (4-Level Paging) + PD (5-Level Paging) + + + LFT: Page Table (All) + + + VAS: + [0, 2^32) (32-Bit 2-Level paging) + [0, 2^48) (4-Level Paging) + [0, 2^57) (5-Level Paging) + + + PFN (Virtual): + [0, 2^22) (32-Bit 2-Level paging) + [0, 2^36) (4-Level Paging) + [0, 2^45) (5-Level Paging) + + + PFN (Physical): + [0, 2^32) (x86_32) + [0, 2^52) (x86_64, all paging modes) + + + VFN: [0, 1024) (32-Bit 2-Level paging) + [0, 512) (4-Level Paging) + [0, 512) (5-Level Paging) + + In addition, we also define VMS_{MASK|SIZE} to provide + information about the maximum size of the addressable virtual + memory space (hence VMS). 
Which is effectively a + "Level -1 Page" (i.e., _PAGE_Ln_SIZE(-1)) + + +*/ + +struct __pte; +typedef struct __pte pte_t; + + +#include +#include + +#define _LnTEP_AT(vm_mnt, sz) ( ((vm_mnt) | L0T_MASK) & ~(sz) ) +#define _L0TEP_AT(vm_mnt) ( ((vm_mnt) | L0T_MASK) & ~LFT_MASK ) +#define _L1TEP_AT(vm_mnt) ( ((vm_mnt) | L0T_MASK) & ~L3T_MASK ) +#define _L2TEP_AT(vm_mnt) ( ((vm_mnt) | L0T_MASK) & ~L2T_MASK ) +#define _L3TEP_AT(vm_mnt) ( ((vm_mnt) | L0T_MASK) & ~L1T_MASK ) +#define _LFTEP_AT(vm_mnt) ( ((vm_mnt) | L0T_MASK) & ~L0T_MASK ) + +#define _VM_OF(ptep) ( (ptr_t)(ptep) & ~L0T_MASK ) +#define _VM_PFN_OF(ptep) ( ((ptr_t)(ptep) & L0T_MASK) / sizeof(pte_t) ) +#define VMS_SELF ( ~L0T_MASK & VMS_MASK ) + +#define __LnTI_OF(ptep, n)\ + (_VM_PFN_OF(ptep) * LFT_SIZE / L##n##T_SIZE) + +#define __LnTEP(ptep, va, n)\ + ( (pte_t*)_L##n##TEP_AT(_VM_OF(ptep)) + (((va) & VMS_MASK) / L##n##T_SIZE) ) + +#define __LnTEP_OF(ptep, n)\ + ( (pte_t*)_L##n##TEP_AT(_VM_OF(ptep)) + __LnTI_OF(ptep, n)) + +#define __LnTEP_SHIFT_NEXT(ptep)\ + ( (pte_t*)(_VM_OF(ptep) | ((_VM_PFN_OF(ptep) * LFT_SIZE) & L0T_MASK)) ) + +#define _has_LnT(n) (L##n##T_SIZE != LFT_SIZE) +#define LnT_ENABLED(n) _has_LnT(n) + +#define ptep_with_level(ptep, lvl_size) \ + ({ \ + ptr_t __p = _LnTEP_AT(_VM_OF(ptep), lvl_size); \ + ((ptr_t)(ptep) & __p) == __p; \ + }) + +pte_t +vmm_alloc_page(pte_t* ptep, pte_t pte); + +/** + * @brief Try page walk to the pte pointed by ptep and + * allocate any missing level-table en-route + * + * @param ptep + * @param va + */ +void +ptep_alloc_hierarchy(pte_t* ptep, ptr_t va, pte_attr_t prot); + +static inline bool +__alloc_level(pte_t* ptep, pte_t pte, pte_attr_t prot) +{ + if (!pte_isnull(pte)) { + return true; + } + + pte = pte_setprot(pte, prot); + return !pte_isnull(vmm_alloc_page(ptep, pte)); +} + +/** + * @brief Get the page frame number encoded in ptep + * + * @param ptep + * @return ptr_t + */ +static inline ptr_t +ptep_pfn(pte_t* ptep) +{ + return _VM_PFN_OF(ptep); +} + +/** + * 
@brief Get the virtual frame number encoded in ptep + * + * @param ptep + * @return ptr_t + */ +static inline unsigned int +ptep_vfn(pte_t* ptep) +{ + return ((ptr_t)ptep & PAGE_MASK) / sizeof(pte_t); +} + +static inline ptr_t +ptep_va(pte_t* ptep, size_t lvl_size) +{ + return ((ptr_t)ptep) / sizeof(pte_t) * lvl_size; +} + +static inline ptr_t +ptep_vm_mnt(pte_t* ptep) +{ + return _VM_OF(ptep); +} + +/** + * @brief Make a L0TEP from given ptep + * + * @param ptep + * @return pte_t* + */ +static inline pte_t* +mkl0tep(pte_t* ptep) +{ + return __LnTEP_OF(ptep, 0); +} + +/** + * @brief Make a L1TEP from given ptep + * + * @param ptep + * @return pte_t* + */ +static inline pte_t* +mkl1tep(pte_t* ptep) +{ + return __LnTEP_OF(ptep, 1); +} + +/** + * @brief Make a L2TEP from given ptep + * + * @param ptep + * @return pte_t* + */ +static inline pte_t* +mkl2tep(pte_t* ptep) +{ + return __LnTEP_OF(ptep, 2); +} + +/** + * @brief Make a L3TEP from given ptep + * + * @param ptep + * @return pte_t* + */ +static inline pte_t* +mkl3tep(pte_t* ptep) +{ + return __LnTEP_OF(ptep, 3); +} + +/** + * @brief Create the L1T pointed by L0TE + * + * @param l0t_ptep + * @param va + * @return pte_t* + */ +static inline pte_t* +mkl1t(pte_t* l0t_ptep, ptr_t va, pte_attr_t prot) +{ +#if _has_LnT(1) + if (!l0t_ptep) { + return NULL; + } + + pte_t pte = pte_at(l0t_ptep); + + if (pte_huge(pte)) { + return l0t_ptep; + } + + return __alloc_level(l0t_ptep, pte, prot) + ? __LnTEP(l0t_ptep, va, 1) + : NULL; +#else + return l0t_ptep; +#endif +} + +/** + * @brief Create the L2T pointed by L1TE + * + * @param l0t_ptep + * @param va + * @return pte_t* + */ +static inline pte_t* +mkl2t(pte_t* l1t_ptep, ptr_t va, pte_attr_t prot) +{ +#if _has_LnT(2) + if (!l1t_ptep) { + return NULL; + } + + pte_t pte = pte_at(l1t_ptep); + + if (pte_huge(pte)) { + return l1t_ptep; + } + + return __alloc_level(l1t_ptep, pte, prot) + ? 
__LnTEP(l1t_ptep, va, 2) + : NULL; +#else + return l1t_ptep; +#endif +} + +/** + * @brief Create the L3T pointed by L2TE + * + * @param l0t_ptep + * @param va + * @return pte_t* + */ +static inline pte_t* +mkl3t(pte_t* l2t_ptep, ptr_t va, pte_attr_t prot) +{ +#if _has_LnT(3) + if (!l2t_ptep) { + return NULL; + } + + pte_t pte = pte_at(l2t_ptep); + + if (pte_huge(pte)) { + return l2t_ptep; + } + + return __alloc_level(l2t_ptep, pte, prot) + ? __LnTEP(l2t_ptep, va, 3) + : NULL; +#else + return l2t_ptep; +#endif +} + +/** + * @brief Create the LFT pointed by L3TE + * + * @param l0t_ptep + * @param va + * @return pte_t* + */ +static inline pte_t* +mklft(pte_t* l3t_ptep, ptr_t va, pte_attr_t prot) +{ + if (!l3t_ptep) { + return NULL; + } + + pte_t pte = pte_at(l3t_ptep); + + if (pte_huge(pte)) { + return l3t_ptep; + } + + return __alloc_level(l3t_ptep, pte, prot) + ? __LnTEP(l3t_ptep, va, F) + : NULL; +} + +static inline pte_t* +getl1tep(pte_t* l0t_ptep, ptr_t va) { +#if _has_LnT(1) + return __LnTEP(l0t_ptep, va, 1); +#else + return l0t_ptep; +#endif +} + +static inline pte_t* +getl2tep(pte_t* l1t_ptep, ptr_t va) { +#if _has_LnT(2) + return __LnTEP(l1t_ptep, va, 2); +#else + return l1t_ptep; +#endif +} + +static inline pte_t* +getl3tep(pte_t* l2t_ptep, ptr_t va) { +#if _has_LnT(3) + return __LnTEP(l2t_ptep, va, 3); +#else + return l2t_ptep; +#endif +} + +static inline pte_t* +getlftep(pte_t* l3t_ptep, ptr_t va) { + return __LnTEP(l3t_ptep, va, F); +} + +static inline unsigned int +l0te_index(pte_t* ptep) { + return __LnTI_OF(ptep, 1); +} + +static inline unsigned int +l1te_index(pte_t* ptep) { + return __LnTI_OF(ptep, 2); +} + +static inline unsigned int +l2te_index(pte_t* ptep) { + return __LnTI_OF(ptep, 3); +} + +static inline unsigned int +l3te_index(pte_t* ptep) { + return __LnTI_OF(ptep, F); +} + +static inline pfn_t +pfn(ptr_t addr) { + return (addr / PAGE_SIZE) & VMS_MASK; +} + +static inline size_t +leaf_count(size_t size) { + return (size + PAGE_MASK) / 
PAGE_SIZE; +} + +static inline size_t +page_count(size_t size, size_t page_size) { + return (size + (page_size - 1)) / page_size; +} + +static inline unsigned int +va_offset(ptr_t addr) { + return addr & PAGE_MASK; +} + +static inline ptr_t +page_addr(ptr_t pfn) { + return pfn * PAGE_SIZE; +} + +static inline ptr_t +va_align(ptr_t va) { + return va & ~PAGE_MASK; +} + +static inline ptr_t +va_alignup(ptr_t va) { + return (va + PAGE_MASK) & ~PAGE_MASK; +} + +static inline pte_t* +mkptep_va(ptr_t vm_mnt, ptr_t vaddr) +{ + return (pte_t*)(vm_mnt & ~L0T_MASK) + pfn(vaddr); +} + +static inline pte_t* +mkptep_pn(ptr_t vm_mnt, ptr_t pn) +{ + return (pte_t*)(vm_mnt & ~L0T_MASK) + (pn & L0T_MASK); +} + +static inline pfn_t +pfn_at(ptr_t va, size_t lvl_size) { + return va / lvl_size; +} + + +/** + * @brief Shift the ptep such that it points to an + * immediate lower level of page table + * + * @param ptep + * @return pte_t* + */ +static inline pte_t* +ptep_step_into(pte_t* ptep) +{ + return __LnTEP_SHIFT_NEXT(ptep); +} + +/** + * @brief Shift the ptep such that it points to an + * immediate upper level of page table + * + * @param ptep + * @return pte_t* + */ +static inline pte_t* +ptep_step_out(pte_t* ptep) +{ + ptr_t unshifted = (ptr_t)mkptep_pn(VMS_SELF, ptep_pfn(ptep)); + return mkptep_va(_VM_OF(ptep), unshifted); +} + +/** + * @brief Make a L0TEP from given mnt and va + * + * @param ptep + * @return pte_t* + */ +static inline pte_t* +mkl0tep_va(ptr_t mnt, ptr_t va) +{ + return mkl0tep(mkptep_va(mnt, va)); +} + +static inline bool +pt_last_level(int level) +{ + return level == _PTW_LEVEL - 1; +} + +#endif /* __LUNAIX_PAGETABLE_H */ diff --git a/lunaix-os/includes/lunaix/mm/pmm.h b/lunaix-os/includes/lunaix/mm/pmm.h index ef9ec0b..fa22f7b 100644 --- a/lunaix-os/includes/lunaix/mm/pmm.h +++ b/lunaix-os/includes/lunaix/mm/pmm.h @@ -13,13 +13,13 @@ * @brief 长久页:不会被缓存,但允许释放 * */ -#define PP_FGPERSIST 0x1 +#define PP_FGPERSIST 0b00001 /** * @brief 锁定页:不会被缓存,不能被释放 * */ -#define 
PP_FGLOCKED 0x2 +#define PP_FGLOCKED 0b00011 typedef u32_t pp_attr_t; @@ -96,15 +96,42 @@ struct pp_struct* pmm_query(ptr_t pa); /** - * @brief 释放一个已分配的物理页,假若页地址不存在,则无操作。 + * @brief Free a physical page, subject to the given attribute mask + * + * @param page + * @return int + */ +int +pmm_free_one(ptr_t page, pp_attr_t attr_mask); + +/** + * @brief Free a normal physical page * * @param page 页地址 * @return 是否成功 */ -int -pmm_free_page(ptr_t page); +static inline int +pmm_free_page(ptr_t page) +{ + return pmm_free_one(page, 0); +} + +/** + * @brief Free a physical page regardless of its attributes + * + * @param page + * @return int + */ +static inline int +pmm_free_any(ptr_t page) +{ + return pmm_free_one(page, -1); +} int pmm_ref_page(ptr_t page); +void +pmm_set_attr(ptr_t page, pp_attr_t attr); + #endif /* __LUNAIX_PMM_H */ diff --git a/lunaix-os/includes/lunaix/mm/procvm.h b/lunaix-os/includes/lunaix/mm/procvm.h index 295d93a..820bf1e 100644 --- a/lunaix-os/includes/lunaix/mm/procvm.h +++ b/lunaix-os/includes/lunaix/mm/procvm.h @@ -5,41 +5,17 @@ #include #include #include +#include struct proc_mm; struct proc_info; -struct mm_region -{ - struct llist_header head; // must be first field! - struct proc_mm* proc_vms; - - // file mapped to this region - struct v_file* mfile; - // mapped file offset - off_t foff; - // mapped file length - u32_t flen; // XXX it seems that we don't need this actually.. - - ptr_t start; - ptr_t end; - u32_t attr; - - void** index; // fast reference, to accelerate access to this very region. - - void* data; - // when a region is copied - void (*region_copied)(struct mm_region*); - // when a region is unmapped - void (*destruct_region)(struct mm_region*); -}; - struct remote_vmctx { ptr_t vms_mnt; ptr_t local_mnt; ptr_t remote; - size_t page_cnt; + pfn_t page_cnt; }; @@ -55,10 +31,13 @@ typedef struct llist_header vm_regions_t; struct proc_mm { // virtual memory root (i.e. 
root page table) - ptr_t vmroot; - vm_regions_t regions; + ptr_t vmroot; + ptr_t vm_mnt; // current mount point + vm_regions_t regions; + struct mm_region* heap; struct proc_info* proc; + struct proc_mm* guest_mm; // vmspace mounted by this vmspace }; /** @@ -71,26 +50,45 @@ struct proc_mm* procvm_create(struct proc_info* proc); /** - * @brief Initialize the vm of `proc` to duplication of current process + * @brief Initialize and mount the vms of `proc` as a duplicate of the current process * * @param proc * @return struct proc_mm* */ void -procvm_dup(struct proc_info* proc); +procvm_dupvms_mount(struct proc_mm* proc); void -procvm_cleanup(ptr_t vm_mnt, struct proc_info* proc); +procvm_unmount_release(struct proc_mm* proc); +void +procvm_mount(struct proc_mm* mm); + +void +procvm_unmount(struct proc_mm* mm); /** - * @brief Initialize the vm of `proc` as a clean slate which contains + * @brief Initialize and mount the vms of `proc` as a clean slate which contains * nothing but shared global mapping of kernel image. * * @param proc */ void -procvm_init_clean(struct proc_info* proc); +procvm_initvms_mount(struct proc_mm* mm); + + +/* + Mount and unmount from VMS_SELF. + Although every vms is mounted to that spot by default, + this just serves to ensure the scheduled + vms is not left dangling in some other process's vms. 
+*/ + +void +procvm_mount_self(struct proc_mm* mm); + +void +procvm_unmount_self(struct proc_mm* mm); /* @@ -101,13 +99,13 @@ procvm_init_clean(struct proc_info* proc); ptr_t procvm_enter_remote_transaction(struct remote_vmctx* rvmctx, struct proc_mm* mm, - ptr_t vm_mnt, ptr_t remote_base, size_t size); + ptr_t remote_base, size_t size); int procvm_copy_remote(struct remote_vmctx* rvmctx, ptr_t remote_dest, void* local_src, size_t sz); void -procvm_exit_remote_transaction(struct remote_vmctx* rvmctx); +procvm_exit_remote(struct remote_vmctx* rvmctx); #endif /* __LUNAIX_PROCVM_H */ diff --git a/lunaix-os/includes/lunaix/mm/region.h b/lunaix-os/includes/lunaix/mm/region.h index d030f6d..cc7d3df 100644 --- a/lunaix-os/includes/lunaix/mm/region.h +++ b/lunaix-os/includes/lunaix/mm/region.h @@ -2,7 +2,6 @@ #define __LUNAIX_REGION_H #include -#include #include #define prev_region(vmr) list_prev(vmr, struct mm_region, head) @@ -31,6 +30,21 @@ region_size(struct mm_region* mm) { return mm->end - mm->start; } +static inline bool +anon_region(struct mm_region* mm) { + return (mm->attr & REGION_ANON); +} + +static inline bool +writable_region(struct mm_region* mm) { + return !!(mm->attr & (REGION_RSHARED | REGION_WRITE)); +} + +static inline bool +shared_writable_region(struct mm_region* mm) { + return !!(mm->attr & REGION_WSHARED); +} + struct mm_region* region_create(ptr_t start, ptr_t end, u32_t attr); @@ -56,17 +70,10 @@ region_copy_mm(struct proc_mm* src, struct proc_mm* dest); struct mm_region* region_dup(struct mm_region* origin); -static u32_t -region_ptattr(struct mm_region* vmr) +static inline pte_attr_t +region_pteprot(struct mm_region* vmr) { - u32_t vmr_attr = vmr->attr; - u32_t ptattr = PG_PRESENT | PG_ALLOW_USER; - - if ((vmr_attr & PROT_WRITE)) { - ptattr |= PG_WRITE; - } - - return ptattr & 0xfff; + return translate_vmr_prot(vmr->attr); } #endif /* __LUNAIX_REGION_H */ diff --git a/lunaix-os/includes/lunaix/mm/vmm.h b/lunaix-os/includes/lunaix/mm/vmm.h index 
74cba52..a3da8c0 100644 --- a/lunaix-os/includes/lunaix/mm/vmm.h +++ b/lunaix-os/includes/lunaix/mm/vmm.h @@ -1,6 +1,7 @@ #ifndef __LUNAIX_VMM_H #define __LUNAIX_VMM_H -#include + +#include #include #include // Virtual memory manager @@ -38,14 +39,6 @@ void vmm_init(); -/** - * @brief Create a page directory - * - * @return ptd_entry* physical address of the page directory, ready to be loaded into CR3 - */ -x86_page_table* -vmm_init_pd(); - /** * @brief Add a mapping to the given address space * * @param mnt * @param va * @param pa * @param attr * @return int */ int -vmm_set_mapping(ptr_t mnt, ptr_t va, ptr_t pa, pt_attr attr, int options); +vmm_set_mapping(ptr_t mnt, ptr_t va, ptr_t pa, pte_attr_t prot); + +static inline void +vmm_set_ptes_contig(pte_t* ptep, pte_t pte, size_t lvl_size, size_t n) +{ + do { + set_pte(ptep, pte); + pte_val(pte) += lvl_size; + ptep++; + } while (--n > 0); +} + +static inline void +vmm_set_ptes(pte_t* ptep, pte_t pte, size_t n) +{ + do { + set_pte(ptep, pte); + ptep++; + } while (--n > 0); +} + + +static inline void +vmm_unset_ptes(pte_t* ptep, size_t n) +{ + do { + set_pte(ptep, null_pte); + ptep++; + } while (--n > 0); +} + /** * @brief Remove a mapping @@ -69,14 +92,8 @@ vmm_set_mapping(ptr_t mnt, ptr_t va, ptr_t pa, pt_attr attr, int options); ptr_t vmm_del_mapping(ptr_t mnt, ptr_t va); -/** - * @brief Look up a mapping in the current virtual address space - * - * @param va virtual address - * @param mapping mapping attributes - */ -int -vmm_lookup(ptr_t va, v_mapping* mapping); +pte_t +vmm_tryptep(pte_t* ptep, size_t lvl_size); /** * @brief Look up a mapping in the given virtual address space * * @param mnt * @param va * @param mapping mapping attributes * @return int */ -int -vmm_lookupat(ptr_t mnt, ptr_t va, v_mapping* mapping); +static inline bool +vmm_lookupat(ptr_t mnt, ptr_t va, pte_t* pte_out) +{ + pte_t pte = vmm_tryptep(mkptep_va(mnt, va), LFT_SIZE); + *pte_out = pte; + + return !pte_isnull(pte); +} + /** * @brief (COW) Create a copy of a virtual page. @@ -105,25 +129,28 @@ vmm_dup_page(ptr_t pa); * @return ptr_t */ ptr_t -vmm_mount_pd(ptr_t mnt, ptr_t pde); +vms_mount(ptr_t mnt, ptr_t pde); /** * @brief Unmount a mounted virtual address space * */ ptr_t
-vmm_unmount_pd(ptr_t mnt); +vms_unmount(ptr_t mnt); static inline ptr_t -vmm_mount_pg(ptr_t mnt, ptr_t pa) { +mount_page(ptr_t mnt, ptr_t pa) { assert(pa); - vmm_set_mapping(VMS_SELF, mnt, pa, PG_PREM_RW, 0); + pte_t* ptep = mkptep_va(VMS_SELF, mnt); + set_pte(ptep, mkpte(pa, KERNEL_DATA)); + cpu_flush_page(mnt); return mnt; } static inline ptr_t -vmm_unmount_pg(ptr_t mnt) { - vmm_del_mapping(VMS_SELF, mnt); +unmount_page(ptr_t mnt) { + pte_t* ptep = mkptep_va(VMS_SELF, mnt); + set_pte(ptep, null_pte); return mnt; } @@ -133,15 +160,6 @@ vmm_ioremap(ptr_t paddr, size_t size); void* vmm_next_free(ptr_t start, int options); -/** - * @brief Translate a virtual address in the current address space to a physical address. - * - * @param va virtual address - * @return void* - */ -ptr_t -vmm_v2p(ptr_t va); - /** * @brief Translate a virtual address in the given address space to a physical address * * @param mnt * @param va * @return ptr_t */ ptr_t vmm_v2pat(ptr_t mnt, ptr_t va); -/* - Represents a vmap area - (One must not get confused with vmap_area in Linux!) -*/ -struct vmap_area -{ - ptr_t start; - size_t size; - pt_attr area_attr; -}; - /** - * @brief Map a contiguous physical address range into the kernel virtual address space + * @brief Translate a virtual address in the current address space to a physical address. * - * @param paddr base address of the physical range - * @param size size of the physical range + * @param va virtual address * @return void* */ -void* -vmap(ptr_t paddr, size_t size, pt_attr attr, int flags); +static inline ptr_t +vmm_v2p(ptr_t va) +{ + return vmm_v2pat(VMS_SELF, va); +} /** - * @brief Create a vmap area - * - * @param paddr - * @param attr - * @return ptr_t + * @brief Maps a number of contiguous ptes in kernel + address space + + @param pte the pte to be mapped + @param lvl_size size of the page pointed by the given pte + @param n number of ptes + @return ptr_t */ -struct vmap_area* -vmap_varea(size_t size, pt_attr attr); +ptr_t +vmap_ptes_at(pte_t pte, size_t lvl_size, int n); /** - * @brief Map a single page inside a vmap area - * - * @param paddr - * @param attr - * @return ptr_t + * @brief Maps a number of contiguous ptes in kernel + * address space (leaf page size) + * + * @param pte the pte to be mapped + * @param n number of ptes + * @return
ptr_t */ -ptr_t -vmap_area_page(struct vmap_area* area, ptr_t paddr, pt_attr attr); +static inline ptr_t +vmap_leaf_ptes(pte_t pte, int n) +{ + return vmap_ptes_at(pte, LFT_SIZE, n); +} /** - * @brief Remove a mapped page from a vmap area - * - * @param paddr - * @return ptr_t + * @brief Maps a contiguous range of physical addresses + * into kernel address space (leaf page size) + * + * @param paddr start of the physical address range + * @param size size of the physical range + * @param prot default protection to be applied + * @return ptr_t */ -ptr_t -vmap_area_rmpage(struct vmap_area* area, ptr_t vaddr); +static inline ptr_t +vmap(ptr_t paddr, size_t size, pte_attr_t prot) +{ + pte_t _pte = mkpte(paddr, prot); + return vmap_ptes_at(_pte, LFT_SIZE, leaf_count(size)); +} #endif /* __LUNAIX_VMM_H */ diff --git a/lunaix-os/includes/lunaix/process.h b/lunaix-os/includes/lunaix/process.h index 78756de..f4a526f 100644 --- a/lunaix-os/includes/lunaix/process.h +++ b/lunaix-os/includes/lunaix/process.h @@ -6,7 +6,7 @@ #include #include #include -#include +#include #include #include #include @@ -167,19 +167,19 @@ set_current_executing(struct thread* thread) static inline struct proc_mm* vmspace(struct proc_info* proc) { - return proc->mm; + return proc ? proc->mm : NULL; } static inline ptr_t vmroot(struct proc_info* proc) { - return proc->mm->vmroot; + return proc ? proc->mm->vmroot : 0; } static inline vm_regions_t* vmregions(struct proc_info* proc) { - return &proc->mm->regions; + return proc ?
&proc->mm->regions : NULL; } static inline void @@ -302,7 +302,7 @@ struct thread* alloc_thread(struct proc_info* process); void -destory_thread(ptr_t vm_mnt, struct thread* thread); +destory_thread(struct thread* thread); void terminate_thread(struct thread* thread, ptr_t val); @@ -311,26 +311,26 @@ void terminate_current_thread(ptr_t val); struct thread* -create_thread(struct proc_info* proc, ptr_t vm_mnt, bool with_ustack); +create_thread(struct proc_info* proc, bool with_ustack); void -start_thread(struct thread* th, ptr_t vm_mnt, ptr_t entry); +start_thread(struct thread* th, ptr_t entry); static inline void spawn_kthread(ptr_t entry) { assert(kernel_process(__current)); - struct thread* th = create_thread(__current, VMS_SELF, false); + struct thread* th = create_thread(__current, false); assert(th); - start_thread(th, VMS_SELF, entry); + start_thread(th, entry); } void exit_thread(void* val); void -thread_release_mem(struct thread* thread, ptr_t vm_mnt); +thread_release_mem(struct thread* thread); /* ========= Signal ========= diff --git a/lunaix-os/includes/lunaix/spike.h b/lunaix-os/includes/lunaix/spike.h index da901fc..cf40bab 100644 --- a/lunaix-os/includes/lunaix/spike.h +++ b/lunaix-os/includes/lunaix/spike.h @@ -84,7 +84,7 @@ #define must_success(statement) \ do { \ int err = (statement); \ - if (err) panickf(#statement "failed with errcode=%d", err); \ + if (err) fail(#statement " failed"); \ } while(0) #define fail(msg) __assert_fail(msg, __FILE__, __LINE__); @@ -101,9 +101,6 @@ __assert_fail(const char* expr, const char* file, unsigned int line) void noret panick(const char* msg); -void noret -panickf(const char* fmt, ...); - #define wait_until(cond) \ while (!(cond)) \ ; diff --git a/lunaix-os/includes/lunaix/syslog.h b/lunaix-os/includes/lunaix/syslog.h index 423823c..621c81a 100644 --- a/lunaix-os/includes/lunaix/syslog.h +++ b/lunaix-os/includes/lunaix/syslog.h @@ -2,6 +2,7 @@ #define __LUNAIX_SYSLOG_H #include +#include #include #define 
KLOG_DEBUG 0 @@ -35,7 +36,7 @@ #define FATAL(fmt, ...) \ ({ \ kprintf(KFATAL fmt, ##__VA_ARGS__); \ - spin(); \ + fail(fmt); \ }) void diff --git a/lunaix-os/kernel/block/block.c b/lunaix-os/kernel/block/block.c index bf83b5f..8ca5117 100644 --- a/lunaix-os/kernel/block/block.c +++ b/lunaix-os/kernel/block/block.c @@ -9,7 +9,7 @@ #include #include #include -#include +#include #include #include #include @@ -156,7 +156,7 @@ __block_read_page(struct device* dev, void* buf, size_t offset) struct block_dev* bdev = (struct block_dev*)dev->underlay; u32_t lba = offset / bdev->blk_size + bdev->start_lba; - u32_t rd_lba = MIN(lba + PG_SIZE / bdev->blk_size, bdev->end_lba); + u32_t rd_lba = MIN(lba + PAGE_SIZE / bdev->blk_size, bdev->end_lba); if (rd_lba <= lba) { return 0; @@ -183,7 +183,7 @@ __block_write_page(struct device* dev, void* buf, size_t offset) struct block_dev* bdev = (struct block_dev*)dev->underlay; u32_t lba = offset / bdev->blk_size + bdev->start_lba; - u32_t wr_lba = MIN(lba + PG_SIZE / bdev->blk_size, bdev->end_lba); + u32_t wr_lba = MIN(lba + PAGE_SIZE / bdev->blk_size, bdev->end_lba); if (wr_lba <= lba) { return 0; diff --git a/lunaix-os/kernel/boot_helper.c b/lunaix-os/kernel/boot_helper.c index 3ac5bcb..f7ad158 100644 --- a/lunaix-os/kernel/boot_helper.c +++ b/lunaix-os/kernel/boot_helper.c @@ -1,11 +1,10 @@ #include #include -#include #include #include #include #include -#include +#include /** * @brief Reserve memory for kernel bootstrapping initialization @@ -16,34 +15,36 @@ void boot_begin(struct boot_handoff* bhctx) { bhctx->prepare(bhctx); + + // Identity-map the first 3GiB address spaces + pte_t* ptep = mkl0tep(mkptep_va(VMS_SELF, 0)); + pte_t pte = mkpte_prot(KERNEL_DATA); + size_t count = page_count(KERNEL_RESIDENT, L0T_SIZE); + + vmm_set_ptes_contig(ptep, pte_mkhuge(pte), L0T_SIZE, count); struct boot_mmapent *mmap = bhctx->mem.mmap, *mmapent; for (size_t i = 0; i < bhctx->mem.mmap_len; i++) { mmapent = &mmap[i]; - size_t size_pg = 
PN(ROUNDUP(mmapent->size, PG_SIZE)); + size_t size_pg = leaf_count(mmapent->size); + pfn_t start_pfn = pfn(mmapent->start); if (mmapent->type == BOOT_MMAP_FREE) { - pmm_mark_chunk_free(PN(mmapent->start), size_pg); + pmm_mark_chunk_free(start_pfn, size_pg); continue; } - - ptr_t pa = PG_ALIGN(mmapent->start); - for (size_t j = 0; j < size_pg && pa < KERNEL_EXEC; - j++, pa += PM_PAGE_SIZE) { - vmm_set_mapping(VMS_SELF, pa, pa, PG_PREM_RW, VMAP_IGNORE); - } } /* Reserve region for all loaded modules */ for (size_t i = 0; i < bhctx->mods.mods_num; i++) { struct boot_modent* mod = &bhctx->mods.entries[i]; - pmm_mark_chunk_occupied(PN(mod->start), - CEIL(mod->end - mod->start, PG_SIZE_BITS), - PP_FGLOCKED); + unsigned int counts = leaf_count(mod->end - mod->start); + + pmm_mark_chunk_occupied(pfn(mod->start), counts, PP_FGLOCKED); } } -extern u8_t __kexec_boot_end; /* link/linker.ld */ +extern u8_t __kboot_end; /* link/linker.ld */ /** * @brief Release memory for kernel bootstrapping initialization @@ -56,20 +57,14 @@ boot_end(struct boot_handoff* bhctx) struct boot_mmapent *mmap = bhctx->mem.mmap, *mmapent; for (size_t i = 0; i < bhctx->mem.mmap_len; i++) { mmapent = &mmap[i]; - size_t size_pg = PN(ROUNDUP(mmapent->size, PG_SIZE)); - - if (mmapent->start >= KERNEL_EXEC || mmapent->type == BOOT_MMAP_FREE) { - continue; - } + size_t size_pg = leaf_count(mmapent->size); if (mmapent->type == BOOT_MMAP_RCLM) { - pmm_mark_chunk_free(PN(mmapent->start), size_pg); + pmm_mark_chunk_free(pfn(mmapent->start), size_pg); } - ptr_t pa = PG_ALIGN(mmapent->start); - for (size_t j = 0; j < size_pg && pa < KERNEL_EXEC; - j++, pa += PM_PAGE_SIZE) { - vmm_del_mapping(VMS_SELF, pa); + if (mmapent->type == BOOT_MMAP_FREE) { + continue; } } @@ -83,11 +78,9 @@ boot_end(struct boot_handoff* bhctx) void boot_cleanup() { - // clean up - for (size_t i = 0; i < (ptr_t)(&__kexec_boot_end); i += PG_SIZE) { - vmm_del_mapping(VMS_SELF, (ptr_t)i); - pmm_free_page((ptr_t)i); - } + pte_t* ptep = 
mkl0tep(mkptep_va(VMS_SELF, 0)); + size_t count = page_count(KERNEL_RESIDENT, L0T_SIZE); + vmm_unset_ptes(ptep, count); } void diff --git a/lunaix-os/kernel/debug/trace.c b/lunaix-os/kernel/debug/trace.c index 1e5c44a..2e30b09 100644 --- a/lunaix-os/kernel/debug/trace.c +++ b/lunaix-os/kernel/debug/trace.c @@ -1,4 +1,3 @@ -#include #include #include #include @@ -23,11 +22,12 @@ trace_modksyms_init(struct boot_handoff* bhctx) for (size_t i = 0; i < bhctx->mods.mods_num; i++) { struct boot_modent* mod = &bhctx->mods.entries[i]; if (streq(mod->str, "modksyms")) { - assert(PG_ALIGNED(mod->start)); + assert(!va_offset(mod->start)); - ptr_t end = ROUNDUP(mod->end, PG_SIZE); - ptr_t ksym_va = - (ptr_t)vmap(mod->start, (end - mod->start), PG_PREM_R, 0); + pte_t pte = mkpte(mod->start, KERNEL_DATA); + size_t n = pfn(mod->end) - pfn(mod->start); + + ptr_t ksym_va = vmap_leaf_ptes(pte, n); assert(ksym_va); trace_ctx.ksym_table = (struct ksyms*)ksym_va; @@ -43,7 +43,7 @@ trace_sym_lookup(ptr_t addr) int i = c - 1, j = 0, m = 0; - if (addr > ksent[i].pc || addr < ksent[j].pc || addr < KERNEL_EXEC) { + if (addr > ksent[i].pc || addr < ksent[j].pc || !kernel_addr(addr)) { return NULL; } @@ -147,7 +147,12 @@ trace_printstack_of(ptr_t fp) void trace_printstack() { - trace_printstack_of(abi_get_callframe()); + if (current_thread) { + trace_printstack_isr(current_thread->intr_ctx); + } + else { + trace_printstack_of(abi_get_callframe()); + } } static void diff --git a/lunaix-os/kernel/device/input.c b/lunaix-os/kernel/device/input.c index 6ced39d..cc3cc13 100644 --- a/lunaix-os/kernel/device/input.c +++ b/lunaix-os/kernel/device/input.c @@ -1,6 +1,6 @@ #include #include -#include +#include #include #include #include @@ -66,7 +66,7 @@ __input_dev_read(struct device* dev, void* buf, size_t offset, size_t len) int __input_dev_read_pg(struct device* dev, void* buf, size_t offset) { - return __input_dev_read(dev, buf, offset, PG_SIZE); + return __input_dev_read(dev, buf, offset, 
PAGE_SIZE); } struct input_device* diff --git a/lunaix-os/kernel/exe/elf32/ldelf32.c b/lunaix-os/kernel/exe/elf32/ldelf32.c index a72a14d..64e0eba 100644 --- a/lunaix-os/kernel/exe/elf32/ldelf32.c +++ b/lunaix-os/kernel/exe/elf32/ldelf32.c @@ -1,7 +1,6 @@ #include #include #include -#include #include #include @@ -15,7 +14,7 @@ elf32_smap(struct load_context* ldctx, { struct v_file* elfile = (struct v_file*)elf->elf_file; - assert(PG_ALIGNED(phdre->p_offset)); + assert(!va_offset(phdre->p_offset)); int proct = 0; if ((phdre->p_flags & PF_R)) { @@ -33,17 +32,17 @@ elf32_smap(struct load_context* ldctx, struct mmap_param param = { .vms_mnt = container->vms_mnt, .pvms = vmspace(container->proc), .proct = proct, - .offset = PG_ALIGN(phdre->p_offset), - .mlen = ROUNDUP(phdre->p_memsz, PG_SIZE), + .offset = va_align(phdre->p_offset), + .mlen = va_alignup(phdre->p_memsz), .flags = MAP_FIXED | MAP_PRIVATE, .type = REGION_TYPE_CODE }; struct mm_region* seg_reg; - int status = mmap_user(NULL, &seg_reg, PG_ALIGN(va), elfile, &param); + int status = mmap_user(NULL, &seg_reg, va_align(va), elfile, &param); if (!status) { size_t next_addr = phdre->p_memsz + va; - ldctx->end = MAX(ldctx->end, ROUNDUP(next_addr, PG_SIZE)); + ldctx->end = MAX(ldctx->end, va_alignup(next_addr)); ldctx->mem_sz += phdre->p_memsz; } else { // we probably fucked up our process @@ -117,7 +116,7 @@ load_executable(struct load_context* context, const struct v_file* exefile) continue; } - if (phdr->p_align != PG_SIZE) { + if (phdr->p_align != PAGE_SIZE) { // surprising alignment! errno = ENOEXEC; break; diff --git a/lunaix-os/kernel/exe/exec.c b/lunaix-os/kernel/exe/exec.c index 9ab8ad7..6f2962e 100644 --- a/lunaix-os/kernel/exe/exec.c +++ b/lunaix-os/kernel/exe/exec.c @@ -120,7 +120,7 @@ exec_load(struct exec_container* container, struct v_file* executable) if (!argv_extra[1]) { // If loading a statically linked file, we can remap the heap now; // otherwise it is delayed.
- create_heap(vmspace(proc), PG_ALIGN(container->exe.end)); + create_heap(vmspace(proc), va_align(container->exe.end)); } if (container->vms_mnt == VMS_SELF) { diff --git a/lunaix-os/kernel/fs/pcache.c b/lunaix-os/kernel/fs/pcache.c index 4c6f69c..3e77cf5 100644 --- a/lunaix-os/kernel/fs/pcache.c +++ b/lunaix-os/kernel/fs/pcache.c @@ -1,7 +1,6 @@ #include #include #include -#include #include #include #include @@ -36,7 +35,7 @@ pcache_alloc_page() return NULL; } - if (!(va = (ptr_t)vmap(pp, PG_SIZE, PG_PREM_RW, 0))) { + if (!(va = (ptr_t)vmap(pp, PAGE_SIZE, KERNEL_DATA))) { pmm_free_page(pp); return NULL; } @@ -47,7 +46,7 @@ pcache_alloc_page() void pcache_init(struct pcache* pcache) { - btrie_init(&pcache->tree, PG_SIZE_BITS); + btrie_init(&pcache->tree, PAGE_SHIFT); llist_init_head(&pcache->dirty); llist_init_head(&pcache->pages); @@ -134,7 +133,7 @@ pcache_write(struct v_inode* inode, void* data, u32_t len, u32_t fpos) struct pcache_pg* pg; while (buf_off < len && errno >= 0) { - u32_t wr_bytes = MIN(PG_SIZE - pg_off, len - buf_off); + u32_t wr_bytes = MIN(PAGE_SIZE - pg_off, len - buf_off); int new_page = pcache_get_page(pcache, fpos, &pg_off, &pg); @@ -145,7 +144,7 @@ pcache_write(struct v_inode* inode, void* data, u32_t len, u32_t fpos) if (errno < 0) { break; } - if (errno < PG_SIZE) { + if (errno < (int)PAGE_SIZE) { // EOF len = MIN(len, buf_off + errno); } @@ -182,7 +181,7 @@ pcache_read(struct v_inode* inode, void* data, u32_t len, u32_t fpos) if (errno < 0) { break; } - if (errno < PG_SIZE) { + if (errno < (int)PAGE_SIZE) { // EOF len = MIN(len, buf_off + errno); } diff --git a/lunaix-os/kernel/fs/vfs.c b/lunaix-os/kernel/fs/vfs.c index c59584a..eb42396 100644 --- a/lunaix-os/kernel/fs/vfs.c +++ b/lunaix-os/kernel/fs/vfs.c @@ -47,7 +47,6 @@ #include #include #include -#include #include #include #include @@ -1454,7 +1453,7 @@ __DEFINE_LXSYSCALL2(int, fstat, int, fd, struct file_stat*, stat) .st_blocks = vino->lb_usage, .st_size = vino->fsize, .mode = 
vino->itype, - .st_ioblksize = PG_SIZE, + .st_ioblksize = PAGE_SIZE, .st_blksize = vino->sb->blksize}; if (VFS_DEVFILE(vino->itype)) { diff --git a/lunaix-os/kernel/kinit.c b/lunaix-os/kernel/kinit.c index 2d911bd..b19dc30 100644 --- a/lunaix-os/kernel/kinit.c +++ b/lunaix-os/kernel/kinit.c @@ -7,7 +7,6 @@ #include #include #include -#include #include #include #include @@ -23,7 +22,7 @@ #include #include -#include +#include #include #include @@ -56,7 +55,7 @@ kernel_bootstrap(struct boot_handoff* bhctx) invoke_init_function(on_earlyboot); // FIXME this goes to hal/gfxa - tty_init(ioremap(0xB8000, PG_SIZE)); + tty_init(ioremap(0xB8000, PAGE_SIZE)); tty_set_theme(VGA_COLOR_WHITE, VGA_COLOR_BLACK); device_sysconf_load(); @@ -94,6 +93,7 @@ kernel_bootstrap(struct boot_handoff* bhctx) * and start getting into uspace */ boot_end(bhctx); + boot_cleanup(); spawn_lunad(); } @@ -124,13 +124,23 @@ kmem_init(struct boot_handoff* bhctx) { extern u8_t __kexec_end; // Mark the pages occupied by the kernel, including the first 1MB and hhk_init, as used - size_t pg_count = ((ptr_t)&__kexec_end - KERNEL_EXEC) >> PG_SIZE_BITS; + size_t pg_count = leaf_count((ptr_t)&__kexec_end - KERNEL_RESIDENT); pmm_mark_chunk_occupied(0, pg_count, PP_FGLOCKED); - // reserve higher half - for (size_t i = L1_INDEX(KERNEL_EXEC); i < 1023; i++) { - assert(vmm_set_mapping(VMS_SELF, i << 22, 0, 0, VMAP_NOMAP)); - } + pte_t* ptep = mkptep_va(VMS_SELF, KERNEL_RESIDENT); + ptep = mkl0tep(ptep); + + do { +#if LnT_ENABLED(1) + assert(mkl1t(ptep++, 0, KERNEL_DATA)); +#elif LnT_ENABLED(2) + assert(mkl2t(ptep++, 0, KERNEL_DATA)); +#elif LnT_ENABLED(3) + assert(mkl3t(ptep++, 0, KERNEL_DATA)); +#else + assert(mklft(ptep++, 0, KERNEL_DATA)); +#endif + } while (ptep_vfn(ptep) < MAX_PTEN - 2); // allocators cake_init(); diff --git a/lunaix-os/kernel/lunad.c b/lunaix-os/kernel/lunad.c index 99cd066..34368b6 100644 --- a/lunaix-os/kernel/lunad.c +++ b/lunaix-os/kernel/lunad.c @@ -74,12 +74,6 @@ lunad_do_usr() { void _preemptible lunad_main() { - /* - * We must defer boot
code/data cleaning to here, after we successfully - * escape that area - */ - boot_cleanup(); - spawn_kthread((ptr_t)init_platform); /* diff --git a/lunaix-os/kernel/mm/cake.c b/lunaix-os/kernel/mm/cake.c index 9503260..9707d52 100644 --- a/lunaix-os/kernel/mm/cake.c +++ b/lunaix-os/kernel/mm/cake.c @@ -32,7 +32,7 @@ __alloc_cake(unsigned int cake_pg) if (!pa) { return NULL; } - return vmap(pa, cake_pg * PG_SIZE, PG_PREM_RW, 0); + return (void*)vmap(pa, cake_pg * PAGE_SIZE, KERNEL_DATA); } struct cake_s* @@ -86,7 +86,7 @@ __init_pile(struct cake_pile* pile, *pile = (struct cake_pile){ .piece_size = piece_size, .cakes_count = 0, .pieces_per_cake = - (pg_per_cake * PG_SIZE) / + (pg_per_cake * PAGE_SIZE) / (piece_size + sizeof(piece_index_t)), .pg_per_cake = pg_per_cake }; diff --git a/lunaix-os/kernel/mm/dmm.c b/lunaix-os/kernel/mm/dmm.c index a76b9de..f665e9d 100644 --- a/lunaix-os/kernel/mm/dmm.c +++ b/lunaix-os/kernel/mm/dmm.c @@ -21,7 +21,7 @@ create_heap(struct proc_mm* pvms, ptr_t addr) .flags = MAP_ANON | MAP_PRIVATE, .type = REGION_TYPE_HEAP, .proct = PROT_READ | PROT_WRITE, - .mlen = PG_SIZE }; + .mlen = PAGE_SIZE }; int status = 0; struct mm_region* heap; if ((status = mmap_user(NULL, &heap, addr, NULL, &map_param))) { diff --git a/lunaix-os/kernel/mm/fault.c b/lunaix-os/kernel/mm/fault.c new file mode 100644 index 0000000..6ba63a7 --- /dev/null +++ b/lunaix-os/kernel/mm/fault.c @@ -0,0 +1,312 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include + +LOG_MODULE("pf") + +static void +__gather_memaccess_info(struct fault_context* context) +{ + pte_t* ptep = (pte_t*)context->fault_va; + ptr_t mnt = ptep_vm_mnt(ptep); + ptr_t refva; + + context->mm = vmspace(__current); + + if (mnt < VMS_MOUNT_1) { + refva = (ptr_t)ptep; + goto done; + } + + context->ptep_fault = true; + context->remote_fault = (mnt != VMS_SELF); + + if (context->remote_fault && context->mm) { + context->mm = 
context->mm->guest_mm; + assert(context->mm); + } + +#if LnT_ENABLED(1) + ptep = (pte_t*)page_addr(ptep_pfn(ptep)); + mnt = ptep_vm_mnt(ptep); + if (mnt < VMS_MOUNT_1) { + refva = (ptr_t)ptep; + goto done; + } +#endif + +#if LnT_ENABLED(2) + ptep = (pte_t*)page_addr(ptep_pfn(ptep)); + mnt = ptep_vm_mnt(ptep); + if (mnt < VMS_MOUNT_1) { + refva = (ptr_t)ptep; + goto done; + } +#endif + +#if LnT_ENABLED(3) + ptep = (pte_t*)page_addr(ptep_pfn(ptep)); + mnt = ptep_vm_mnt(ptep); + if (mnt < VMS_MOUNT_1) { + refva = (ptr_t)ptep; + goto done; + } +#endif + + ptep = (pte_t*)page_addr(ptep_pfn(ptep)); + mnt = ptep_vm_mnt(ptep); + + assert(mnt < VMS_MOUNT_1); + refva = (ptr_t)ptep; + +done: + context->fault_refva = refva; +} + +static bool +__prepare_fault_context(struct fault_context* fault) +{ + if (!__arch_prepare_fault_context(fault)) { + return false; + } + + __gather_memaccess_info(fault); + + pte_t* fault_ptep = fault->fault_ptep; + ptr_t fault_va = fault->fault_va; + pte_t fault_pte = *fault_ptep; + bool kernel_vmfault = kernel_addr(fault_va); + bool kernel_refaddr = kernel_addr(fault->fault_refva); + + // for a ptep fault, the parent page tables should match the actual + // accesser permission + if (kernel_refaddr) { + ptep_alloc_hierarchy(fault_ptep, fault_va, KERNEL_DATA); + } else { + ptep_alloc_hierarchy(fault_ptep, fault_va, USER_DATA); + } + + fault->fault_pte = fault_pte; + + if (fault->ptep_fault && !kernel_refaddr) { + fault->resolving = pte_setprot(fault_pte, USER_DATA); + } else { + fault->resolving = pte_setprot(fault_pte, KERNEL_DATA); + } + + fault->kernel_vmfault = kernel_vmfault; + fault->kernel_access = kernel_context(fault->ictx); + + return true; +} + +static void +__handle_conflict_pte(struct fault_context* fault) +{ + pte_t pte = fault->fault_pte; + ptr_t fault_pa = pte_paddr(pte); + if (!pte_allow_user(pte)) { + return; + } + + assert(pte_iswprotect(pte)); + + if (writable_region(fault->vmr)) { + // normal page fault, do COW + // TODO makes 
`vmm_dup_page` arch-independent + ptr_t pa = (ptr_t)vmm_dup_page(fault_pa); + + pmm_free_page(fault_pa); + pte_t new_pte = pte_setpaddr(pte, pa); + new_pte = pte_mkwritable(new_pte); + + fault_resolved(fault, new_pte, NO_PREALLOC); + } + + return; +} + + +static void +__handle_anon_region(struct fault_context* fault) +{ + pte_t pte = fault->resolving; + pte_attr_t prot = region_pteprot(fault->vmr); + pte = pte_setprot(pte, prot); + + fault_resolved(fault, pte, 0); +} + + +static void +__handle_named_region(struct fault_context* fault) +{ + struct mm_region* vmr = fault->vmr; + struct v_file* file = vmr->mfile; + + pte_t pte = fault->resolving; + ptr_t fault_va = va_align(fault->fault_va); + + u32_t mseg_off = (fault_va - vmr->start); + u32_t mfile_off = mseg_off + vmr->foff; + + int errno = file->ops->read_page(file->inode, (void*)fault_va, mfile_off); + if (errno < 0) { + ERROR("fail to populate page (%d)", errno); + return; + } + + pte_attr_t prot = region_pteprot(vmr); + pte = pte_setprot(pte, prot); + + fault_resolved(fault, pte, 0); +} + +static void +__handle_kernel_page(struct fault_context* fault) +{ + // we must ensure only ptep fault is resolvable + if (fault->fault_va < VMS_MOUNT_1) { + return; + } + + fault_resolved(fault, fault->resolving, 0); + pmm_set_attr(fault->prealloc_pa, PP_FGPERSIST); +} + + +static void +fault_prealloc_page(struct fault_context* fault) +{ + if (!pte_isnull(fault->fault_pte)) { + return; + } + + pte_t pte; + + pte = vmm_alloc_page(fault->fault_ptep, fault->resolving); + if (pte_isnull(pte)) { + return; + } + + fault->resolving = pte; + fault->prealloc_pa = pte_paddr(fault->resolving); + + pmm_set_attr(fault->prealloc_pa, 0); + cpu_flush_page(fault->fault_va); +} + + +static void noret +__fail_to_resolve(struct fault_context* fault) +{ + if (fault->prealloc_pa) { + pmm_free_page(fault->prealloc_pa); + } + + ERROR("(pid: %d) Segmentation fault on %p (%p,e=0x%x)", + __current->pid, + fault->fault_va, + fault->fault_instn, + 
fault->fault_data); + + trace_printstack_isr(fault->ictx); + + if (fault->kernel_access) { + // if a page fault from kernel is not resolvable, then + // something must have gone south + FATAL("unresolvable page fault"); + unreachable; + } + + thread_setsignal(current_thread, _SIGSEGV); + + schedule(); + fail("Unexpected return from segfault"); + + unreachable; +} + +static bool +__try_resolve_fault(struct fault_context* fault) +{ + pte_t fault_pte = fault->fault_pte; + if (pte_isguardian(fault_pte)) { + ERROR("memory region over-running"); + return false; + } + + if (fault->kernel_vmfault && fault->kernel_access) { + __handle_kernel_page(fault); + goto done; + } + + assert(fault->mm); + vm_regions_t* vmr = &fault->mm->regions; + fault->vmr = region_get(vmr, fault->fault_va); + + if (!fault->vmr) { + return false; + } + + if (pte_isloaded(fault_pte)) { + __handle_conflict_pte(fault); + } + else if (anon_region(fault->vmr)) { + __handle_anon_region(fault); + } + else if (fault->vmr->mfile) { + __handle_named_region(fault); + } + else { + // page not present, might be a chance to introduce swap file? + ERROR("WIP page fault route"); + } + +done: + return !!(fault->resolve_type & RESOLVE_OK); +} + +void +intr_routine_page_fault(const isr_param* param) +{ + if (param->depth > 10) { + // Too many nested faults! we must have messed something up + // XXX should we fail silently?
+ spin(); + } + + struct fault_context fault = { .ictx = param }; + + if (!__prepare_fault_context(&fault)) { + __fail_to_resolve(&fault); + } + + fault_prealloc_page(&fault); + + if (!__try_resolve_fault(&fault)) { + __fail_to_resolve(&fault); + } + + if ((fault.resolve_type & NO_PREALLOC)) { + if (fault.prealloc_pa) { + pmm_free_page(fault.prealloc_pa); + } + } + + set_pte(fault.fault_ptep, fault.resolving); + + cpu_flush_page(fault.fault_va); + cpu_flush_page((ptr_t)fault.fault_ptep); +} \ No newline at end of file diff --git a/lunaix-os/kernel/mm/mmap.c b/lunaix-os/kernel/mm/mmap.c index 1c3b8b8..3f273c0 100644 --- a/lunaix-os/kernel/mm/mmap.c +++ b/lunaix-os/kernel/mm/mmap.c @@ -6,12 +6,12 @@ #include #include -#include +#include #include // any size beyond this is bullshit -#define BS_SIZE (KERNEL_EXEC - USR_MMAP) +#define BS_SIZE (KERNEL_RESIDENT - USR_MMAP) int mem_has_overlap(vm_regions_t* regions, ptr_t start, ptr_t end) @@ -65,7 +65,7 @@ mmap_user(void** addr_out, struct v_file* file, struct mmap_param* param) { - param->range_end = KERNEL_EXEC; + param->range_end = KERNEL_RESIDENT; param->range_start = USR_EXEC; return mem_map(addr_out, created, addr, file, param); @@ -168,7 +168,7 @@ mem_map(void** addr_out, { assert_msg(addr, "addr can not be NULL"); - ptr_t last_end = USR_EXEC, found_loc = PG_ALIGN(addr); + ptr_t last_end = USR_EXEC, found_loc = va_align(addr); struct mm_region *pos, *n; vm_regions_t* vm_regions = ¶m->pvms->regions; @@ -215,20 +215,7 @@ found: region->proc_vms = param->pvms; region_add(vm_regions, region); - - int proct = param->proct; - int attr = PG_ALLOW_USER; - if ((proct & REGION_WRITE)) { - attr |= PG_WRITE; - } - if ((proct & REGION_KERNEL)) { - attr &= ~PG_ALLOW_USER; - } - - for (size_t i = 0; i < param->mlen; i += PG_SIZE) { - vmm_set_mapping(param->vms_mnt, found_loc + i, 0, attr, 0); - } - + if (file) { vfs_ref_file(file); } @@ -264,22 +251,24 @@ mem_sync_pages(ptr_t mnt, if (!region->mfile || !(region->attr & 
REGION_WSHARED)) { return; } + + pte_t* ptep = mkptep_va(mnt, start); + ptr_t va = va_align(start); - v_mapping mapping; - for (size_t i = 0; i < length; i += PG_SIZE) { - if (!vmm_lookupat(mnt, start + i, &mapping)) { + for (; va < start + length; va += PAGE_SIZE, ptep++) { + pte_t pte = vmm_tryptep(ptep, LFT_SIZE); + if (pte_isnull(pte)) { continue; } - if (PG_IS_DIRTY(*mapping.pte)) { - size_t offset = mapping.va - region->start + region->foff; + if (pte_dirty(pte)) { + size_t offset = va - region->start + region->foff; struct v_inode* inode = region->mfile->inode; - region->mfile->ops->write_page(inode, (void*)mapping.va, offset); - - *mapping.pte &= ~PG_DIRTY; + region->mfile->ops->write_page(inode, (void*)va, offset); - cpu_flush_page((ptr_t)mapping.pte); + set_pte(ptep, pte_mkclean(pte)); + cpu_flush_page(va); } else if ((options & MS_INVALIDATE)) { goto invalidate; } @@ -291,9 +280,9 @@ mem_sync_pages(ptr_t mnt, continue; invalidate: - *mapping.pte &= ~PG_PRESENT; - pmm_free_page(mapping.pa); - cpu_flush_page((ptr_t)mapping.pte); + set_pte(ptep, null_pte); + pmm_free_page(pte_paddr(pte)); + cpu_flush_page(va); } } @@ -332,13 +321,17 @@ mem_unmap_region(ptr_t mnt, struct mm_region* region) valloc_ensure_valid(region); - size_t len = ROUNDUP(region->end - region->start, PG_SIZE); - mem_sync_pages(mnt, region, region->start, len, 0); + pfn_t pglen = leaf_count(region->end - region->start); + mem_sync_pages(mnt, region, region->start, pglen * PAGE_SIZE, 0); - for (size_t i = region->start; i <= region->end; i += PG_SIZE) { - ptr_t pa = vmm_del_mapping(mnt, i); - if (pa) { - pmm_free_page(pa); + pte_t* ptep = mkptep_va(mnt, region->start); + for (size_t i = 0; i < pglen; i++, ptep++) { + pte_t pte = pte_at(ptep); + ptr_t pa = pte_paddr(pte); + + set_pte(ptep, null_pte); + if (pte_isloaded(pte)) { + pmm_free_page(pte_paddr(pte)); } } @@ -393,22 +386,25 @@ __unmap_overlapped_cases(ptr_t mnt, shrink = vmr->end - seg_start; umps_len = shrink; umps_start = seg_start; 
- } else if (CASE_HITE(vmr, seg_start, seg_len)) { + } + else if (CASE_HITE(vmr, seg_start, seg_len)) { shrink = vmr->end - seg_start; umps_len = shrink; umps_start = seg_start; - } else if (CASE_HETI(vmr, seg_start, seg_len)) { + } + else if (CASE_HETI(vmr, seg_start, seg_len)) { displ = seg_len - (vmr->start - seg_start); umps_len = displ; umps_start = vmr->start; - } else if (CASE_HETE(vmr, seg_start, seg_len)) { + } + else if (CASE_HETE(vmr, seg_start, seg_len)) { shrink = vmr->end - vmr->start; umps_len = shrink; umps_start = vmr->start; } mem_sync_pages(mnt, vmr, vmr->start, umps_len, 0); - for (size_t i = 0; i < umps_len; i += PG_SIZE) { + for (size_t i = 0; i < umps_len; i += PAGE_SIZE) { ptr_t pa = vmm_del_mapping(mnt, vmr->start + i); if (pa) { pmm_free_page(pa); @@ -434,8 +430,8 @@ __unmap_overlapped_cases(ptr_t mnt, int mem_unmap(ptr_t mnt, vm_regions_t* regions, ptr_t addr, size_t length) { - length = ROUNDUP(length, PG_SIZE); - ptr_t cur_addr = PG_ALIGN(addr); + length = ROUNDUP(length, PAGE_SIZE); + ptr_t cur_addr = va_align(addr); struct mm_region *pos, *n; llist_for_each(pos, n, regions, head) @@ -467,7 +463,7 @@ __DEFINE_LXSYSCALL3(void*, sys_mmap, void*, addr, size_t, length, va_list, lst) ptr_t addr_ptr = (ptr_t)addr; - if (!length || length > BS_SIZE || !PG_ALIGNED(addr_ptr)) { + if (!length || length > BS_SIZE || va_offset(addr_ptr)) { errno = EINVAL; goto done; } @@ -497,7 +493,7 @@ __DEFINE_LXSYSCALL3(void*, sys_mmap, void*, addr, size_t, length, va_list, lst) } struct mmap_param param = { .flags = options, - .mlen = ROUNDUP(length, PG_SIZE), + .mlen = ROUNDUP(length, PAGE_SIZE), .offset = offset, .type = REGION_TYPE_GENERAL, .proct = proct, @@ -519,7 +515,7 @@ __DEFINE_LXSYSCALL2(int, munmap, void*, addr, size_t, length) __DEFINE_LXSYSCALL3(int, msync, void*, addr, size_t, length, int, flags) { - if (!PG_ALIGNED(addr) || ((flags & MS_ASYNC) && (flags & MS_SYNC))) { + if (va_offset((ptr_t)addr) || ((flags & MS_ASYNC) && (flags & MS_SYNC))) { 
return DO_STATUS(EINVAL); } diff --git a/lunaix-os/kernel/mm/mmio.c b/lunaix-os/kernel/mm/mmio.c index ccb57d7..1e262cc 100644 --- a/lunaix-os/kernel/mm/mmio.c +++ b/lunaix-os/kernel/mm/mmio.c @@ -6,12 +6,12 @@ void* ioremap(ptr_t paddr, u32_t size) { - void* ptr = vmap(paddr, size, PG_PREM_RW | PG_DISABLE_CACHE, 0); + // FIXME implement a page policy interface allow to decouple the + // arch-dependent caching behaviour + void* ptr = (void*)vmap(paddr, size, KERNEL_DATA); if (ptr) { - pmm_mark_chunk_occupied(paddr >> PG_SIZE_BITS, - CEIL(size, PG_SIZE_BITS), - PP_FGLOCKED); + pmm_mark_chunk_occupied(pfn(paddr), leaf_count(size), PP_FGLOCKED); } return ptr; @@ -20,8 +20,12 @@ ioremap(ptr_t paddr, u32_t size) void iounmap(ptr_t vaddr, u32_t size) { - for (size_t i = 0; i < size; i += PG_SIZE) { - ptr_t paddr = vmm_del_mapping(VMS_SELF, vaddr + i); - pmm_free_page(paddr); + pte_t* ptep = mkptep_va(VMS_SELF, vaddr); + for (size_t i = 0; i < size; i += PAGE_SIZE, ptep++) { + pte_t pte = pte_at(ptep); + + set_pte(ptep, null_pte); + if (pte_isloaded(pte)) + pmm_free_page(pte_paddr(pte)); } } \ No newline at end of file diff --git a/lunaix-os/kernel/mm/page.c b/lunaix-os/kernel/mm/page.c deleted file mode 100644 index 400197f..0000000 --- a/lunaix-os/kernel/mm/page.c +++ /dev/null @@ -1 +0,0 @@ -#include \ No newline at end of file diff --git a/lunaix-os/kernel/mm/pmm.c b/lunaix-os/kernel/mm/pmm.c index b1cea96..72b57d7 100644 --- a/lunaix-os/kernel/mm/pmm.c +++ b/lunaix-os/kernel/mm/pmm.c @@ -1,6 +1,7 @@ -#include #include #include +#include +#include // This is a very large array... 
static struct pp_struct pm_table[PM_BMP_MAX_SIZE]; @@ -12,6 +13,9 @@ export_symbol(debug, pmm, max_pg); void pmm_mark_page_free(ptr_t ppn) { + if ((pm_table[ppn].attr & PP_FGLOCKED)) { + return; + } pm_table[ppn].ref_counts = 0; } @@ -26,6 +30,9 @@ void pmm_mark_chunk_free(ptr_t start_ppn, size_t page_count) { for (size_t i = start_ppn; i < start_ppn + page_count && i < max_pg; i++) { + if ((pm_table[i].attr & PP_FGLOCKED)) { + continue; + } pm_table[i].ref_counts = 0; } } @@ -49,7 +56,7 @@ volatile size_t pg_lookup_ptr; void pmm_init(ptr_t mem_upper_lim) { - max_pg = (PG_ALIGN(mem_upper_lim) >> 12); + max_pg = pfn(mem_upper_lim); pg_lookup_ptr = LOOKUP_START; @@ -112,22 +119,16 @@ pmm_alloc_page(pp_attr_t attr) } int -pmm_free_page(ptr_t page) +pmm_free_one(ptr_t page, pp_attr_t attr_mask) { - struct pp_struct* pm = &pm_table[page >> 12]; - - // Is this a MMIO mapping or double free? - if ((page >> 12) >= max_pg || !(pm->ref_counts)) { - return 0; - } + pfn_t ppfn = pfn(page); + struct pp_struct* pm = &pm_table[ppfn]; - // locked pages are left untouched - if ((pm->attr & PP_FGLOCKED)) { + assert(ppfn < max_pg && pm->ref_counts); + if (pm->attr && !(pm->attr & attr_mask)) { return 0; } - // TODO: check permissions, ensuring: 1) only processes actively using the page (sharers included) may free it; - // 2) the kernel may free any page. pm->ref_counts--; return 1; } @@ -135,21 +136,29 @@ pmm_free_page(ptr_t page) int pmm_ref_page(ptr_t page) { - u32_t ppn = page >> 12; + u32_t ppn = pfn(page); if (ppn >= PM_BMP_MAX_SIZE) { return 0; } struct pp_struct* pm = &pm_table[ppn]; - if (ppn >= max_pg || !pm->ref_counts) { - return 0; - } + assert(ppn < max_pg && pm->ref_counts); pm->ref_counts++; return 1; } +void +pmm_set_attr(ptr_t page, pp_attr_t attr) +{ + struct pp_struct* pp = &pm_table[pfn(page)]; + + if (pp->ref_counts) { + pp->attr = attr; + } +} + struct pp_struct* pmm_query(ptr_t pa) { diff --git a/lunaix-os/kernel/mm/procvm.c b/lunaix-os/kernel/mm/procvm.c index d9e357a..ea49624 100644 --- a/lunaix-os/kernel/mm/procvm.c +++ b/lunaix-os/kernel/mm/procvm.c @@ -6,13

+6,13 @@ #include #include -#include +#include #include struct proc_mm* procvm_create(struct proc_info* proc) { - struct proc_mm* mm = valloc(sizeof(struct proc_mm)); + struct proc_mm* mm = vzalloc(sizeof(struct proc_mm)); assert(mm); @@ -23,122 +23,247 @@ procvm_create(struct proc_info* proc) { return mm; } - static ptr_t -__dup_vmspace(ptr_t mount_point, bool only_kernel) +vmscpy(ptr_t dest_mnt, ptr_t src_mnt, bool only_kernel) { - ptr_t ptd_pp = pmm_alloc_page(PP_FGPERSIST); - vmm_set_mapping(VMS_SELF, PG_MOUNT_1, ptd_pp, PG_PREM_RW, VMAP_NULL); - - x86_page_table* ptd = (x86_page_table*)PG_MOUNT_1; - x86_page_table* pptd = (x86_page_table*)(mount_point | (0x3FF << 12)); - - size_t kspace_l1inx = L1_INDEX(KERNEL_EXEC); - size_t i = 1; // skip first 4MiB, to avoid bring other thread's stack + pte_t* ptep_dest = mkl0tep(mkptep_va(dest_mnt, 0)); + pte_t* ptep = mkl0tep(mkptep_va(src_mnt, 0)); + pte_t* ptepd_kernel = mkl0tep(mkptep_va(dest_mnt, KERNEL_RESIDENT)); + pte_t* ptep_kernel = mkl0tep(mkptep_va(src_mnt, KERNEL_RESIDENT)); + + // Build the self-reference on dest vms + pte_t* ptep_sms = mkptep_va(VMS_SELF, (ptr_t)ptep_dest); + pte_t* ptep_ssm = mkptep_va(VMS_SELF, (ptr_t)ptep_sms); + pte_t pte_sms = mkpte_prot(KERNEL_DATA); + + pte_sms = vmm_alloc_page(ptep_ssm, pte_sms); + set_pte(ptep_sms, pte_sms); + + cpu_flush_page((ptr_t)dest_mnt); - ptd->entry[0] = 0; if (only_kernel) { - i = kspace_l1inx; - memset(ptd, 0, PG_SIZE); + ptep = ptep_kernel; + ptep_dest += ptep_vfn(ptep_kernel); + } else { + ptep++; + ptep_dest++; } - for (; i < PG_MAX_ENTRIES - 1; i++) { + int level = 0; + while (ptep < ptep_kernel) + { + pte_t pte = *ptep; + ptr_t pa = pte_paddr(pte); + + if (pte_isnull(pte)) { + goto cont; + } + + if (pt_last_level(level) || pte_huge(pte)) { + set_pte(ptep_dest, pte); + + if (pte_isloaded(pte)) + pmm_ref_page(pa); + } + else if (!pt_last_level(level)) { + vmm_alloc_page(ptep_dest, pte); + + ptep = ptep_step_into(ptep); + ptep_dest = 
ptep_step_into(ptep_dest); + level++; - x86_pte_t ptde = pptd->entry[i]; - // Empty or not-yet-present L1 entries are copied over verbatim. - // The kernel address space is shared directly. - if (!ptde || i >= kspace_l1inx || !(ptde & PG_PRESENT)) { - ptd->entry[i] = ptde; continue; } + + cont: + if (ptep_vfn(ptep) == MAX_PTEN - 1) { + assert(level > 0); + ptep = ptep_step_out(ptep); + ptep_dest = ptep_step_out(ptep_dest); + level--; + } + + ptep++; + ptep_dest++; + } + + // Ensure we step back to L0T + assert(!level); + assert(ptep_dest == ptepd_kernel); + + // Carry over the kernel (exclude last two entries) + while (ptep_vfn(ptep) < MAX_PTEN - 2) { + pte_t pte = *ptep; + assert(!pte_isnull(pte)); + + set_pte(ptep_dest, pte); + pmm_ref_page(pte_paddr(pte)); + + ptep++; + ptep_dest++; + } + + return pte_paddr(*(ptep_dest + 1)); +} + +static void optimize("O0") +vmsfree(ptr_t vm_mnt) +{ + pte_t* ptep_head = mkl0tep(mkptep_va(vm_mnt, 0)); + pte_t* ptep_kernel = mkl0tep(mkptep_va(vm_mnt, KERNEL_RESIDENT)); + + int level = 0; + volatile pte_t* ptep = ptep_head; + while (ptep < ptep_kernel) + { + pte_t pte = *ptep; + ptr_t pa = pte_paddr(pte); + + if (pte_isnull(pte)) { + goto cont; + } - // copy the L2 page table - ptr_t pt_pp = pmm_alloc_page(PP_FGPERSIST); - vmm_set_mapping(VMS_SELF, PG_MOUNT_2, pt_pp, PG_PREM_RW, VMAP_NULL); + if (!pt_last_level(level) && !pte_huge(pte)) { + ptep = ptep_step_into(ptep); + level++; - x86_page_table* ppt = (x86_page_table*)(mount_point | (i << 12)); - x86_page_table* pt = (x86_page_table*)PG_MOUNT_2; + continue; + } - for (size_t j = 0; j < PG_MAX_ENTRIES; j++) { - x86_pte_t pte = ppt->entry[j]; - pmm_ref_page(PG_ENTRY_ADDR(pte)); - pt->entry[j] = pte; + if (pte_isloaded(pte)) + pmm_free_any(pa); + + cont: + if (ptep_vfn(ptep) == MAX_PTEN - 1) { + ptep = ptep_step_out(ptep); + pmm_free_any(pte_paddr(pte_at(ptep))); + level--; } - ptd->entry[i] = (ptr_t)pt_pp | PG_ENTRY_FLAGS(ptde); + ptep++; } - ptd->entry[PG_MAX_ENTRIES - 1] = NEW_L1_ENTRY(T_SELF_REF_PERM, ptd_pp); + ptr_t self_pa = pte_paddr(ptep_head[MAX_PTEN - 1]);
+ pmm_free_any(self_pa); +} + +static inline void +__attach_to_current_vms(struct proc_mm* guest_mm) +{ + struct proc_mm* mm_current = vmspace(__current); + if (mm_current) { + assert(!mm_current->guest_mm); + mm_current->guest_mm = guest_mm; + } +} - return ptd_pp; +static inline void +__detach_from_current_vms(struct proc_mm* guest_mm) +{ + struct proc_mm* mm_current = vmspace(__current); + if (mm_current) { + assert(mm_current->guest_mm == guest_mm); + mm_current->guest_mm = NULL; + } } + void -procvm_dup(struct proc_info* proc) { - struct proc_mm* mm = vmspace(proc); - struct proc_mm* mm_current = vmspace(__current); - - mm->heap = mm_current->heap; - mm->vmroot = __dup_vmspace(VMS_SELF, false); +procvm_dupvms_mount(struct proc_mm* mm) { + assert(__current); + assert(!mm->vm_mnt); + + struct proc_mm* mm_current = vmspace(__current); + + __attach_to_current_vms(mm); - region_copy_mm(mm_current, mm); + mm->heap = mm_current->heap; + mm->vm_mnt = VMS_MOUNT_1; + mm->vmroot = vmscpy(VMS_MOUNT_1, VMS_SELF, false); + + region_copy_mm(mm_current, mm); } void -procvm_init_clean(struct proc_info* proc) +procvm_mount(struct proc_mm* mm) { - struct proc_mm* mm = vmspace(proc); - mm->vmroot = __dup_vmspace(VMS_SELF, true); -} + assert(!mm->vm_mnt); + assert(mm->vmroot); + vms_mount(VMS_MOUNT_1, mm->vmroot); -static void -__delete_vmspace(ptr_t vm_mnt) -{ - x86_page_table* pptd = (x86_page_table*)(vm_mnt | (0x3FF << 12)); + __attach_to_current_vms(mm); - // only remove user address space - for (size_t i = 0; i < L1_INDEX(KERNEL_EXEC); i++) { - x86_pte_t ptde = pptd->entry[i]; - if (!ptde || !(ptde & PG_PRESENT)) { - continue; - } + mm->vm_mnt = VMS_MOUNT_1; +} - x86_page_table* ppt = (x86_page_table*)(vm_mnt | (i << 12)); +void +procvm_unmount(struct proc_mm* mm) +{ + assert(mm->vm_mnt); - for (size_t j = 0; j < PG_MAX_ENTRIES; j++) { - x86_pte_t pte = ppt->entry[j]; - // free the 4KB data page - if ((pte & PG_PRESENT)) { - pmm_free_page(PG_ENTRY_ADDR(pte)); - } - } - // 
free the L2 page table - pmm_free_page(PG_ENTRY_ADDR(ptde)); + vms_unmount(VMS_MOUNT_1); + struct proc_mm* mm_current = vmspace(__current); + if (mm_current) { + mm_current->guest_mm = NULL; } - // free the L1 directory - pmm_free_page(PG_ENTRY_ADDR(pptd->entry[PG_MAX_ENTRIES - 1])); + + mm->vm_mnt = 0; +} + +void +procvm_initvms_mount(struct proc_mm* mm) +{ + assert(!mm->vm_mnt); + + __attach_to_current_vms(mm); + + mm->vm_mnt = VMS_MOUNT_1; + mm->vmroot = vmscpy(VMS_MOUNT_1, VMS_SELF, true); } void -procvm_cleanup(ptr_t vm_mnt, struct proc_info* proc) { +procvm_unmount_release(struct proc_mm* mm) { + ptr_t vm_mnt = mm->vm_mnt; struct mm_region *pos, *n; - llist_for_each(pos, n, vmregions(proc), head) + llist_for_each(pos, n, &mm->regions, head) { mem_sync_pages(vm_mnt, pos, pos->start, pos->end - pos->start, 0); region_release(pos); } - vfree(proc->mm); + vfree(mm); + vmsfree(vm_mnt); + vms_unmount(vm_mnt); + + __detach_from_current_vms(mm); +} + +void +procvm_mount_self(struct proc_mm* mm) +{ + assert(!mm->vm_mnt); + assert(!mm->guest_mm); + + mm->vm_mnt = VMS_SELF; +} + +void +procvm_unmount_self(struct proc_mm* mm) +{ + assert(mm->vm_mnt == VMS_SELF); - __delete_vmspace(vm_mnt); + mm->vm_mnt = 0; } ptr_t procvm_enter_remote(struct remote_vmctx* rvmctx, struct proc_mm* mm, - ptr_t vm_mnt, ptr_t remote_base, size_t size) + ptr_t remote_base, size_t size) { - ptr_t size_pn = PN(size + MEM_PAGE); + ptr_t vm_mnt = mm->vm_mnt; + assert(vm_mnt); + + pfn_t size_pn = pfn(size + MEM_PAGE); assert(size_pn < REMOTEVM_MAX_PAGES); struct mm_region* region = region_get(&mm->regions, remote_base); @@ -147,23 +272,25 @@ procvm_enter_remote(struct remote_vmctx* rvmctx, struct proc_mm* mm, rvmctx->vms_mnt = vm_mnt; rvmctx->page_cnt = size_pn; - remote_base = PG_ALIGN(remote_base); + remote_base = va_align(remote_base); rvmctx->remote = remote_base; rvmctx->local_mnt = PG_MOUNT_4_END + 1; - v_mapping m; - unsigned int pattr = region_ptattr(region); - ptr_t raddr = remote_base, 
lmnt = rvmctx->local_mnt; - for (size_t i = 0; i < size_pn; i++, lmnt += MEM_PAGE, raddr += MEM_PAGE) + pte_t* rptep = mkptep_va(vm_mnt, remote_base); + pte_t* lptep = mkptep_va(VMS_SELF, rvmctx->local_mnt); + unsigned int pattr = region_pteprot(region); + + for (size_t i = 0; i < size_pn; i++) { - if (vmm_lookupat(vm_mnt, raddr, &m) && PG_IS_PRESENT(m.flags)) { - vmm_set_mapping(VMS_SELF, lmnt, m.pa, PG_PREM_RW, 0); + pte_t pte = vmm_tryptep(rptep, PAGE_SIZE); + if (pte_isloaded(pte)) { + set_pte(lptep, pte); continue; } ptr_t pa = pmm_alloc_page(0); - vmm_set_mapping(VMS_SELF, lmnt, pa, PG_PREM_RW, 0); - vmm_set_mapping(vm_mnt, raddr, pa, pattr, 0); + set_pte(lptep, mkpte(pa, KERNEL_DATA)); + set_pte(rptep, mkpte(pa, pattr)); } return vm_mnt; @@ -179,7 +306,7 @@ procvm_copy_remote_transaction(struct remote_vmctx* rvmctx, } ptr_t offset = remote_dest - rvmctx->remote; - if (PN(offset + sz) >= rvmctx->page_cnt) { + if (pfn(offset + sz) >= rvmctx->page_cnt) { return -1; } @@ -189,11 +316,8 @@ procvm_copy_remote_transaction(struct remote_vmctx* rvmctx, } void -procvm_exit_remote_transaction(struct remote_vmctx* rvmctx) +procvm_exit_remote(struct remote_vmctx* rvmctx) { - ptr_t lmnt = rvmctx->local_mnt; - for (size_t i = 0; i < rvmctx->page_cnt; i++, lmnt += MEM_PAGE) - { - vmm_del_mapping(VMS_SELF, lmnt); - } + pte_t* lptep = mkptep_va(VMS_SELF, rvmctx->local_mnt); + vmm_unset_ptes(lptep, rvmctx->page_cnt); } \ No newline at end of file diff --git a/lunaix-os/kernel/mm/region.c b/lunaix-os/kernel/mm/region.c index a2bf82e..7dead6d 100644 --- a/lunaix-os/kernel/mm/region.c +++ b/lunaix-os/kernel/mm/region.c @@ -1,4 +1,3 @@ -#include #include #include #include @@ -11,8 +10,8 @@ struct mm_region* region_create(ptr_t start, ptr_t end, u32_t attr) { - assert_msg(PG_ALIGNED(start), "not page aligned"); - assert_msg(PG_ALIGNED(end), "not page aligned"); + assert_msg(!va_offset(start), "not page aligned"); + assert_msg(!va_offset(end), "not page aligned"); struct mm_region* 
region = valloc(sizeof(struct mm_region)); *region = (struct mm_region){ .attr = attr, .start = start, .end = end - 1 }; @@ -22,8 +21,8 @@ region_create(ptr_t start, ptr_t end, u32_t attr) struct mm_region* region_create_range(ptr_t start, size_t length, u32_t attr) { - assert_msg(PG_ALIGNED(start), "not page aligned"); - assert_msg(PG_ALIGNED(length), "not page aligned"); + assert_msg(!va_offset(start), "not page aligned"); + assert_msg(!va_offset(length), "not page aligned"); struct mm_region* region = valloc(sizeof(struct mm_region)); *region = (struct mm_region){ .attr = attr, .start = start, @@ -131,7 +130,7 @@ region_get(vm_regions_t* lead, unsigned long vaddr) struct mm_region *pos, *n; - vaddr = PG_ALIGN(vaddr); + vaddr = va_align(vaddr); llist_for_each(pos, n, lead, head) { diff --git a/lunaix-os/kernel/mm/vmap.c b/lunaix-os/kernel/mm/vmap.c index f805adc..02711fb 100644 --- a/lunaix-os/kernel/mm/vmap.c +++ b/lunaix-os/kernel/mm/vmap.c @@ -2,143 +2,70 @@ #include #include #include +#include #include static ptr_t start = VMAP; +static volatile ptr_t prev_va = 0; -void* -vmap(ptr_t paddr, size_t size, pt_attr attr, int flags) +static pte_t* +__alloc_contig_ptes(pte_t* ptep, size_t base_sz, int n) { - // next fit - assert_msg((paddr & 0xfff) == 0, "vmap: bad alignment"); - size = ROUNDUP(size, PG_SIZE); - - ptr_t current_addr = start; - size_t examed_size = 0, wrapped = 0; - x86_page_table* pd = (x86_page_table*)L1_BASE_VADDR; - - while (!wrapped || current_addr < start) { - size_t l1inx = L1_INDEX(current_addr); - if (!(pd->entry[l1inx])) { - // empty 4mb region - examed_size += MEM_4M; - current_addr = (current_addr & 0xffc00000) + MEM_4M; - } else { - x86_page_table* ptd = (x86_page_table*)(L2_VADDR(l1inx)); - size_t i = L2_INDEX(current_addr), j = 0; - for (; i < PG_MAX_ENTRIES && examed_size < size; i++, j++) { - if (!ptd->entry[i]) { - examed_size += PG_SIZE; - } else if (examed_size) { - // found a discontinuity, start from beginning - examed_size = 0; 
- j++; - break; - } - } - current_addr += j << 12; - } - - if (examed_size >= size) { - goto done; + int _n = 0; + size_t sz = L0T_SIZE; + ptr_t va = page_addr(ptep_pfn(ptep)); + + ptep = mkl0tep(ptep); + + while (_n < n && va < VMAP_END) { + pte_t pte = *ptep; + if (pte_isnull(pte)) { + _n += sz / base_sz; + } + else if ((sz / LEVEL_SIZE) < base_sz) { + _n = 0; } - - if (current_addr >= VMAP_END) { - wrapped = 1; - examed_size = 0; - current_addr = VMAP; + else { + sz = sz / LEVEL_SIZE; + ptep = ptep_step_into(ptep); + continue; } - } - - return NULL; -done: - ptr_t alloc_begin = current_addr - examed_size; - start = alloc_begin + size; - - if ((flags & VMAP_NOMAP)) { - for (size_t i = 0; i < size; i += PG_SIZE) { - vmm_set_mapping(VMS_SELF, alloc_begin + i, -1, 0, 0); + if (ptep_vfn(ptep) + 1 == LEVEL_SIZE) { + ptep = ptep_step_out(++ptep); + va += sz; + + sz = sz * LEVEL_SIZE; + continue; } - - return (void*)alloc_begin; - } - - for (size_t i = 0; i < size; i += PG_SIZE) { - vmm_set_mapping(VMS_SELF, alloc_begin + i, paddr + i, attr, 0); - pmm_ref_page(paddr + i); + + va += sz; + ptep++; } - return (void*)alloc_begin; -} - -/* - This is a kernel memory region that represent a contiguous virtual memory - address such that all memory allocation/deallocation can be concentrated - into a single big chunk, which will help to mitigate the external - fragmentation in the VMAP address domain. It is significant if our - allocation granule is single page or in some use cases. - - XXX (vmap_area) - A potential performance improvement on pcache? (need more analysis!) - -> In exchange of a fixed size buffer pool. (does it worth?) 
-*/ - -struct vmap_area* -vmap_varea(size_t size, pt_attr attr) -{ - ptr_t start = (ptr_t)vmap(0, size, attr ^ PG_PRESENT, VMAP_NOMAP); - - if (!start) { + if (va >= VMAP_END) { return NULL; } - struct vmap_area* varea = valloc(sizeof(struct vmap_area)); - *varea = - (struct vmap_area){ .start = start, .size = size, .area_attr = attr }; - - return varea; + va -= base_sz * _n; + assert(prev_va < va); + + prev_va = va; + return mkptep_va(ptep_vm_mnt(ptep), va); } ptr_t -vmap_area_page(struct vmap_area* area, ptr_t paddr, pt_attr attr) +vmap_ptes_at(pte_t pte, size_t lvl_size, int n) { - ptr_t current = area->start; - size_t bound = current + area->size; + pte_t* ptep = mkptep_va(VMS_SELF, start); + ptep = __alloc_contig_ptes(ptep, lvl_size, n); - while (current < bound) { - x86_pte_t* pte = - (x86_pte_t*)(L2_VADDR(L1_INDEX(current)) | L2_INDEX(current)); - if (PG_IS_PRESENT(*pte)) { - current += PG_SIZE; - continue; - } - - *pte = NEW_L2_ENTRY(attr | PG_PRESENT, paddr); - cpu_flush_page(current); - break; - } - - return current; -} - -ptr_t -vmap_area_rmpage(struct vmap_area* area, ptr_t vaddr) -{ - ptr_t current = area->start; - size_t bound = current + area->size; - - if (current > vaddr || vaddr > bound) { + if (!ptep) { return 0; } - x86_pte_t* pte = - (x86_pte_t*)(L2_VADDR(L1_INDEX(current)) | L2_INDEX(current)); - ptr_t pa = PG_ENTRY_ADDR(*pte); - - *pte = NEW_L2_ENTRY(0, -1); - cpu_flush_page(current); + vmm_set_ptes_contig(ptep, pte, lvl_size, n); - return pa; + return page_addr(ptep_pfn(ptep)); } \ No newline at end of file diff --git a/lunaix-os/kernel/mm/vmm.c b/lunaix-os/kernel/mm/vmm.c index 5e09e9d..6b497b8 100644 --- a/lunaix-os/kernel/mm/vmm.c +++ b/lunaix-os/kernel/mm/vmm.c @@ -7,7 +7,7 @@ #include #include -LOG_MODULE("VMM") +LOG_MODULE("VM") void vmm_init() @@ -15,217 +15,138 @@ vmm_init() // XXX: something here? 
} -x86_page_table* -vmm_init_pd() +pte_t +vmm_alloc_page(pte_t* ptep, pte_t pte) { - x86_page_table* dir = - (x86_page_table*)pmm_alloc_page(PP_FGPERSIST); - for (size_t i = 0; i < PG_MAX_ENTRIES; i++) { - dir->entry[i] = PTE_NULL; + ptr_t pa = pmm_alloc_page(PP_FGPERSIST); + if (!pa) { + return null_pte; } - // recursive mapping, so software can walk the tables to translate addresses - dir->entry[PG_MAX_ENTRIES - 1] = NEW_L1_ENTRY(T_SELF_REF_PERM, dir); + pte = pte_setpaddr(pte, pa); + pte = pte_mkloaded(pte); + set_pte(ptep, pte); - return dir; + mount_page(PG_MOUNT_1, pa); + memset((void*)PG_MOUNT_1, 0, LFT_SIZE); + unmount_page(PG_MOUNT_1); + + cpu_flush_page((ptr_t)ptep); + + return pte; } int -vmm_set_mapping(ptr_t mnt, ptr_t va, ptr_t pa, pt_attr attr, int options) +vmm_set_mapping(ptr_t mnt, ptr_t va, ptr_t pa, pte_attr_t prot) { - assert((ptr_t)va % PG_SIZE == 0); - - ptr_t l1_inx = L1_INDEX(va); - ptr_t l2_inx = L2_INDEX(va); - x86_page_table* l1pt = (x86_page_table*)(mnt | (1023 << 12)); - x86_page_table* l2pt = (x86_page_table*)(mnt | (l1_inx << 12)); - - // See if attr make sense - assert(attr <= 128); - - x86_pte_t* l1pte = &l1pt->entry[l1_inx]; - if (!*l1pte) { - x86_page_table* new_l1pt_pa = - (x86_page_table*)pmm_alloc_page(PP_FGPERSIST); - - // out of physical memory!
- if (!new_l1pt_pa) { - return 0; - } - - // This must be writable - *l1pte = NEW_L1_ENTRY(attr | PG_WRITE | PG_PRESENT, new_l1pt_pa); - - // make sure our new l2 table is visible to CPU - cpu_flush_page((ptr_t)l2pt); - - memset((void*)l2pt, 0, PG_SIZE); - } else { - if ((attr & PG_ALLOW_USER) && !(*l1pte & PG_ALLOW_USER)) { - *l1pte |= PG_ALLOW_USER; - } - - x86_pte_t pte = l2pt->entry[l2_inx]; - if (pte && (options & VMAP_IGNORE)) { - return 1; - } - } + assert(!va_offset(va)); - if (mnt == VMS_SELF) { - cpu_flush_page(va); - } + pte_t* ptep = mkptep_va(mnt, va); + pte_t pte = mkpte(pa, prot); - if ((options & VMAP_NOMAP)) { - return 1; - } + set_pte(ptep, pte); - if (!(options & VMAP_GUARDPAGE)) { - l2pt->entry[l2_inx] = NEW_L2_ENTRY(attr, pa); - } else { - l2pt->entry[l2_inx] = MEMGUARD; - } - return 1; } ptr_t vmm_del_mapping(ptr_t mnt, ptr_t va) { - assert(((ptr_t)va & 0xFFFU) == 0); - - u32_t l1_index = L1_INDEX(va); - u32_t l2_index = L2_INDEX(va); - - // prevent unmap of recursive mapping region - if (l1_index == 1023) { - return 0; - } + assert(!va_offset(va)); - x86_page_table* l1pt = (x86_page_table*)(mnt | (1023 << 12)); + pte_t* ptep = mkptep_va(mnt, va); - x86_pte_t l1pte = l1pt->entry[l1_index]; - - if (l1pte) { - x86_page_table* l2pt = (x86_page_table*)(mnt | (l1_index << 12)); - x86_pte_t l2pte = l2pt->entry[l2_index]; - - cpu_flush_page(va); - l2pt->entry[l2_index] = PTE_NULL; - - return PG_ENTRY_ADDR(l2pte); - } - - return 0; -} - -int -vmm_lookup(ptr_t va, v_mapping* mapping) -{ - return vmm_lookupat(VMS_SELF, va, mapping); -} + pte_t old = *ptep; -int -vmm_lookupat(ptr_t mnt, ptr_t va, v_mapping* mapping) -{ - u32_t l1_index = L1_INDEX(va); - u32_t l2_index = L2_INDEX(va); - - x86_page_table* l1pt = (x86_page_table*)(mnt | 1023 << 12); - x86_pte_t l1pte = l1pt->entry[l1_index]; - - if (l1pte) { - x86_pte_t* l2pte = - &((x86_page_table*)(mnt | (l1_index << 12)))->entry[l2_index]; - - if (l2pte) { - mapping->flags = PG_ENTRY_FLAGS(*l2pte); - 
mapping->pa = PG_ENTRY_ADDR(*l2pte); - mapping->pn = mapping->pa >> PG_SIZE_BITS; - mapping->pte = l2pte; - mapping->va = va; - return 1; - } - } + set_pte(ptep, null_pte); - return 0; + return pte_paddr(old); } -ptr_t -vmm_v2p(ptr_t va) +pte_t +vmm_tryptep(pte_t* ptep, size_t lvl_size) { - u32_t l1_index = L1_INDEX(va); - u32_t l2_index = L2_INDEX(va); - - x86_page_table* l1pt = (x86_page_table*)L1_BASE_VADDR; - x86_pte_t l1pte = l1pt->entry[l1_index]; - - if (l1pte) { - x86_pte_t* l2pte = - &((x86_page_table*)L2_VADDR(l1_index))->entry[l2_index]; - - if (l2pte) { - return PG_ENTRY_ADDR(*l2pte) | ((ptr_t)va & 0xfff); - } - } - return 0; + ptr_t va = ptep_va(ptep, lvl_size); + pte_t* _ptep = mkl0tep(ptep); + pte_t pte; + + if (pte_isnull(pte = *_ptep) || _ptep == ptep) + return pte; + +#if LnT_ENABLED(1) + _ptep = getl1tep(_ptep, va); + if (_ptep == ptep || pte_isnull(pte = *_ptep)) + return pte; +#endif +#if LnT_ENABLED(2) + _ptep = getl2tep(_ptep, va); + if (_ptep == ptep || pte_isnull(pte = *_ptep)) + return pte; +#endif +#if LnT_ENABLED(3) + _ptep = getl3tep(_ptep, va); + if (_ptep == ptep || pte_isnull(pte = *_ptep)) + return pte; +#endif + _ptep = getlftep(_ptep, va); + return *_ptep; } ptr_t vmm_v2pat(ptr_t mnt, ptr_t va) { - u32_t l1_index = L1_INDEX(va); - u32_t l2_index = L2_INDEX(va); - - x86_page_table* l1pt = (x86_page_table*)(mnt | 1023 << 12); - x86_pte_t l1pte = l1pt->entry[l1_index]; - - if (l1pte) { - x86_pte_t* l2pte = - &((x86_page_table*)(mnt | (l1_index << 12)))->entry[l2_index]; + ptr_t va_off = va_offset(va); + pte_t* ptep = mkptep_va(mnt, va); - if (l2pte) { - return PG_ENTRY_ADDR(*l2pte) | ((ptr_t)va & 0xfff); - } - } - return 0; + return pte_paddr(pte_at(ptep)) + va_off; } ptr_t -vmm_mount_pd(ptr_t mnt, ptr_t pde) +vms_mount(ptr_t mnt, ptr_t vms_root) { - assert(pde); + assert(vms_root); - x86_page_table* l1pt = (x86_page_table*)L1_BASE_VADDR; - l1pt->entry[(mnt >> 22)] = NEW_L1_ENTRY(T_SELF_REF_PERM, pde); + pte_t* ptep = 
mkl0tep_va(VMS_SELF, mnt); + set_pte(ptep, mkpte(vms_root, KERNEL_DATA)); cpu_flush_page(mnt); return mnt; } ptr_t -vmm_unmount_pd(ptr_t mnt) +vms_unmount(ptr_t mnt) { - x86_page_table* l1pt = (x86_page_table*)L1_BASE_VADDR; - l1pt->entry[(mnt >> 22)] = 0; + pte_t* ptep = mkl0tep_va(VMS_SELF, mnt); + set_pte(ptep, null_pte); cpu_flush_page(mnt); return mnt; } -ptr_t -vmm_dup_page(ptr_t pa) + +void +ptep_alloc_hierarchy(pte_t* ptep, ptr_t va, pte_attr_t prot) { - ptr_t new_ppg = pmm_alloc_page(0); - vmm_set_mapping(VMS_SELF, PG_MOUNT_3, new_ppg, PG_PREM_RW, VMAP_NULL); - vmm_set_mapping(VMS_SELF, PG_MOUNT_4, pa, PG_PREM_RW, VMAP_NULL); + pte_t* _ptep; + + _ptep = mkl0tep(ptep); + if (_ptep == ptep) { + return; + } - asm volatile("movl %1, %%edi\n" - "movl %2, %%esi\n" - "rep movsl\n" ::"c"(1024), - "r"(PG_MOUNT_3), - "r"(PG_MOUNT_4) - : "memory", "%edi", "%esi"); + _ptep = mkl1t(_ptep, va, prot); + if (_ptep == ptep) { + return; + } - vmm_del_mapping(VMS_SELF, PG_MOUNT_3); - vmm_del_mapping(VMS_SELF, PG_MOUNT_4); + _ptep = mkl2t(_ptep, va, prot); + if (_ptep == ptep) { + return; + } + + _ptep = mkl3t(_ptep, va, prot); + if (_ptep == ptep) { + return; + } - return new_ppg; + _ptep = mklft(_ptep, va, prot); + assert(_ptep == ptep); } \ No newline at end of file diff --git a/lunaix-os/kernel/process/fork.c b/lunaix-os/kernel/process/fork.c index f087028..9518c19 100644 --- a/lunaix-os/kernel/process/fork.c +++ b/lunaix-os/kernel/process/fork.c @@ -25,24 +25,21 @@ region_maybe_cow(struct mm_region* region) return; } - ptr_t start_vpn = PN(region->start); - ptr_t end_vpn = PN(region->end); - for (size_t i = start_vpn; i <= end_vpn; i++) { - x86_pte_t* curproc = &PTE_MOUNTED(VMS_SELF, i); - x86_pte_t* newproc = &PTE_MOUNTED(VMS_MOUNT_1, i); + pfn_t start_pn = pfn(region->start); + pfn_t end_pn = pfn(region->end); + + for (size_t i = start_pn; i <= end_pn; i++) { + pte_t* self = mkptep_pn(VMS_SELF, i); + pte_t* guest = mkptep_pn(VMS_MOUNT_1, i); - 
cpu_flush_page((ptr_t)newproc); + cpu_flush_page(page_addr(ptep_pfn(self))); if ((attr & REGION_MODE_MASK) == REGION_RSHARED) { - // For read-shared regions, mark both as read-only so any write triggers the COW policy. - cpu_flush_page((ptr_t)curproc); - cpu_flush_page((ptr_t)(i << 12)); - - *curproc = *curproc & ~PG_WRITE; - *newproc = *newproc & ~PG_WRITE; + set_pte(self, pte_mkwprotect(*self)); + set_pte(guest, pte_mkwprotect(*guest)); } else { // Private pages are removed from the new process. - *newproc = 0; + set_pte(guest, null_pte); } } } @@ -62,22 +59,24 @@ __dup_fdtable(struct proc_info* pcb) static void __dup_kernel_stack(struct thread* thread, ptr_t vm_mnt) { - ptr_t kstack_pn = PN(current_thread->kstack); + ptr_t kstack_pn = pfn(current_thread->kstack); + kstack_pn -= pfn(KSTACK_SIZE) - 1; // copy the kernel stack - for (size_t i = 0; i < PN(KSTACK_SIZE); i++) { - volatile x86_pte_t* orig_ppte = &PTE_MOUNTED(VMS_SELF, kstack_pn); - x86_pte_t p = *orig_ppte; - ptr_t kstack = kstack_pn * PG_SIZE; + pte_t* src_ptep = mkptep_pn(VMS_SELF, kstack_pn); + pte_t* dest_ptep = mkptep_pn(vm_mnt, kstack_pn); + for (size_t i = 0; i < pfn(KSTACK_SIZE); i++) { + pte_t p = *src_ptep; - if (guardian_page(p)) { - vmm_set_mapping(vm_mnt, kstack, 0, 0, VMAP_GUARDPAGE); + if (pte_isguardian(p)) { + set_pte(dest_ptep, guard_pte); } else { - ptr_t ppa = vmm_dup_page(PG_ENTRY_ADDR(p)); - vmm_set_mapping(vm_mnt, kstack, ppa, p & 0xfff, 0); + ptr_t ppa = vmm_dup_page(pte_paddr(p)); + set_pte(dest_ptep, pte_setpaddr(p, ppa)); } - kstack_pn--; + src_ptep++; + dest_ptep++; } } @@ -166,14 +165,14 @@ dup_proc() } __dup_fdtable(pcb); - procvm_dup(pcb); - vmm_mount_pd(VMS_MOUNT_1, vmroot(pcb)); + struct proc_mm* mm = vmspace(pcb); + procvm_dupvms_mount(mm); struct thread* main_thread = dup_active_thread(VMS_MOUNT_1, pcb); if (!main_thread) { syscall_result(ENOMEM); - vmm_unmount_pd(VMS_MOUNT_1); + procvm_unmount(mm); delete_process(pcb); return -1; } @@ -185,7 +184,7 @@ dup_proc() region_maybe_cow(pos); } - vmm_unmount_pd(VMS_MOUNT_1); + procvm_unmount(mm);
commit_process(pcb); commit_thread(main_thread); diff --git a/lunaix-os/kernel/process/process.c b/lunaix-os/kernel/process/process.c index 91fe240..cdaa12a 100644 --- a/lunaix-os/kernel/process/process.c +++ b/lunaix-os/kernel/process/process.c @@ -61,23 +61,22 @@ int spawn_process(struct thread** created, ptr_t entry, bool with_ustack) { struct proc_info* kproc = alloc_process(); + struct proc_mm* mm = vmspace(kproc); - procvm_init_clean(kproc); - - vmm_mount_pd(VMS_MOUNT_1, vmroot(kproc)); + procvm_initvms_mount(mm); - struct thread* kthread = create_thread(kproc, VMS_MOUNT_1, with_ustack); + struct thread* kthread = create_thread(kproc, with_ustack); if (!kthread) { - vmm_unmount_pd(VMS_MOUNT_1); + procvm_unmount(mm); delete_process(kproc); return -1; } commit_process(kproc); - start_thread(kthread, VMS_MOUNT_1, entry); + start_thread(kthread, entry); - vmm_unmount_pd(VMS_MOUNT_1); + procvm_unmount(mm); if (created) { *created = kthread; @@ -92,17 +91,16 @@ spawn_process_usr(struct thread** created, char* path, { // FIXME remote injection of user stack not yet implemented - struct proc_info* proc = alloc_process(); + struct proc_info* proc = alloc_process(); + struct proc_mm* mm = vmspace(proc); assert(!kernel_process(proc)); - procvm_init_clean(proc); - - vmm_mount_pd(VMS_MOUNT_1, vmroot(proc)); + procvm_initvms_mount(mm); int errno = 0; struct thread* main_thread; - if (!(main_thread = create_thread(proc, VMS_MOUNT_1, true))) { + if (!(main_thread = create_thread(proc, true))) { errno = ENOMEM; goto fail; } @@ -114,17 +112,17 @@ spawn_process_usr(struct thread** created, char* path, } commit_process(proc); - start_thread(main_thread, VMS_MOUNT_1, container.exe.entry); + start_thread(main_thread, container.exe.entry); if (created) { *created = main_thread; } - vmm_unmount_pd(VMS_MOUNT_1); + procvm_unmount(mm); return 0; fail: - vmm_unmount_pd(VMS_MOUNT_1); + procvm_unmount(mm); delete_process(proc); return errno; } diff --git a/lunaix-os/kernel/process/sched.c 
b/lunaix-os/kernel/process/sched.c index c92cde9..bdd9736 100644 --- a/lunaix-os/kernel/process/sched.c +++ b/lunaix-os/kernel/process/sched.c @@ -23,8 +23,10 @@ #include +struct thread empty_thread_obj; + volatile struct proc_info* __current; -volatile struct thread* current_thread; +volatile struct thread* current_thread = &empty_thread_obj; struct scheduler sched_ctx; @@ -53,6 +55,7 @@ run(struct thread* thread) thread->process->state = PS_RUNNING; thread->process->th_active = thread; + procvm_mount_self(vmspace(thread->process)); set_current_executing(thread); switch_context(); @@ -80,9 +83,11 @@ cleanup_detached_threads() { continue; } - vmm_mount_pd(VMS_MOUNT_1, vmroot(pos->process)); - destory_thread(VMS_MOUNT_1, pos); - vmm_unmount_pd(VMS_MOUNT_1); + struct proc_mm* mm = vmspace(pos->process); + + procvm_mount(mm); + destory_thread(pos); + procvm_unmount(mm); i++; } @@ -173,8 +178,10 @@ schedule() if (!(current_thread->state & ~PS_RUNNING)) { current_thread->state = PS_READY; __current->state = PS_READY; + } + procvm_unmount_self(vmspace(__current)); check_sleepers(); // round-robin scheduler @@ -447,7 +454,7 @@ commit_process(struct proc_info* process) } void -destory_thread(ptr_t vm_mnt, struct thread* thread) +destory_thread(struct thread* thread) { cake_ensure_valid(thread); @@ -458,7 +465,7 @@ destory_thread(ptr_t vm_mnt, struct thread* thread) llist_delete(&thread->sleep.sleepers); waitq_cancel_wait(&thread->waitqueue); - thread_release_mem(thread, vm_mnt); + thread_release_mem(thread); proc->thread_count--; sched_ctx.ttable_len--; @@ -470,6 +477,7 @@ void delete_process(struct proc_info* proc) { pid_t pid = proc->pid; + struct proc_mm* mm = vmspace(proc); assert(pid); // long live the pid0 !! 
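[Annotation] The hunks above all make the same ownership change: the vm mount point used to be threaded through every call site (`destory_thread(VMS_MOUNT_1, ...)`, `vmm_mount_pd`/`vmm_unmount_pd` pairs), and now it lives inside the per-process `proc_mm`, with callees asserting that the caller has mounted. A minimal Python sketch of that pattern follows; the class and the mount address are hypothetical stand-ins modeled on the patch, not the real kernel API.

```python
class ProcMM:
    """Hypothetical analogue of struct proc_mm after this patch."""
    VMS_MOUNT_1 = 0xFF800000      # assumed mount-window address, for illustration

    def __init__(self):
        self.vm_mnt = 0           # 0 means "not mounted"

    def mount(self):              # analogue of procvm_mount()
        assert self.vm_mnt == 0, "vmspace already mounted"
        self.vm_mnt = ProcMM.VMS_MOUNT_1

    def unmount(self):            # analogue of procvm_unmount()
        assert self.vm_mnt != 0, "vmspace not mounted"
        self.vm_mnt = 0

def destroy_thread(mm):
    # Like thread_release_mem() in the patch: the callee fetches the mount
    # point from the mm object instead of taking a vm_mnt parameter.
    assert mm.vm_mnt != 0, "caller must mount the vmspace first"
    return mm.vm_mnt

mm = ProcMM()
mm.mount()
assert destroy_thread(mm) == ProcMM.VMS_MOUNT_1
mm.unmount()
```

The upshot, visible in `delete_process()` above, is that the mount/unmount pair brackets a whole region of work once, rather than being re-passed to every helper.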
@@ -503,17 +511,15 @@ delete_process(struct proc_info* proc) signal_free_registers(proc->sigreg); - vmm_mount_pd(VMS_MOUNT_1, vmroot(proc)); + procvm_mount(mm); struct thread *pos, *n; llist_for_each(pos, n, &proc->threads, proc_sibs) { // terminate and destory all thread unconditionally - destory_thread(VMS_MOUNT_1, pos); + destory_thread(pos); } - procvm_cleanup(VMS_MOUNT_1, proc); - - vmm_unmount_pd(VMS_MOUNT_1); + procvm_unmount_release(mm); cake_release(proc_pile, proc); } diff --git a/lunaix-os/kernel/process/signal.c b/lunaix-os/kernel/process/signal.c index a132de8..9f5357e 100644 --- a/lunaix-os/kernel/process/signal.c +++ b/lunaix-os/kernel/process/signal.c @@ -11,8 +11,6 @@ #include -// FIXME issues with signal - LOG_MODULE("SIG") extern struct scheduler sched_ctx; /* kernel/sched.c */ diff --git a/lunaix-os/kernel/process/thread.c b/lunaix-os/kernel/process/thread.c index 97e37a2..e74718e 100644 --- a/lunaix-os/kernel/process/thread.c +++ b/lunaix-os/kernel/process/thread.c @@ -3,7 +3,6 @@ #include #include #include -#include #include #include #include @@ -15,12 +14,6 @@ LOG_MODULE("THREAD") -static inline void -inject_guardian_page(ptr_t vm_mnt, ptr_t va) -{ - vmm_set_mapping(vm_mnt, PG_ALIGN(va), 0, 0, VMAP_GUARDPAGE); -} - static ptr_t __alloc_user_thread_stack(struct proc_info* proc, struct mm_region** stack_region, ptr_t vm_mnt) { @@ -43,65 +36,63 @@ __alloc_user_thread_stack(struct proc_info* proc, struct mm_region** stack_regio return 0; } - // Pre-allocate a page contains stack top, to avoid immediate trap to kernel - // upon thread execution - ptr_t pa = pmm_alloc_page(0); - ptr_t stack_top = align_stack(th_stack_top + USR_STACK_SIZE - 1); - if (likely(pa)) { - vmm_set_mapping(vm_mnt, PG_ALIGN(stack_top), - pa, region_ptattr(vmr), 0); - } - - inject_guardian_page(vm_mnt, vmr->start); + set_pte(mkptep_va(vm_mnt, vmr->start), guard_pte); *stack_region = vmr; + ptr_t stack_top = align_stack(th_stack_top + USR_STACK_SIZE - 1); return stack_top; } 
static ptr_t __alloc_kernel_thread_stack(struct proc_info* proc, ptr_t vm_mnt) { - v_mapping mapping; - ptr_t kstack = PG_ALIGN(KSTACK_AREA_END - KSTACK_SIZE); - while (kstack >= KSTACK_AREA) { + pfn_t kstack_top = leaf_count(KSTACK_AREA_END); + pfn_t kstack_end = pfn(KSTACK_AREA); + pte_t* ptep = mkptep_pn(vm_mnt, kstack_top); + while (ptep_pfn(ptep) > kstack_end) { + ptep -= KSTACK_PAGES; + // first page in the kernel stack is guardian page - if (!vmm_lookupat(vm_mnt, kstack + MEM_PAGE, &mapping) - || !PG_IS_PRESENT(mapping.flags)) - { - break; + pte_t pte = *(ptep + 1); + if (pte_isnull(pte)) { + goto found; } - - kstack -= KSTACK_SIZE; } - if (kstack < KSTACK_AREA) { - WARN("failed to create kernel stack: max stack num reach\n"); - return 0; - } + WARN("failed to create kernel stack: max stack num reach\n"); + return 0; - ptr_t pa = pmm_alloc_cpage(PN(KSTACK_SIZE) - 1, 0); +found:; + ptr_t pa = pmm_alloc_cpage(KSTACK_PAGES - 1, 0); if (!pa) { WARN("failed to create kernel stack: nomem\n"); return 0; } - inject_guardian_page(vm_mnt, kstack); - for (size_t i = MEM_PAGE, j = 0; i < KSTACK_SIZE; i+=MEM_PAGE, j+=MEM_PAGE) { - vmm_set_mapping(vm_mnt, kstack + i, pa + j, PG_PREM_RW, 0); - } + set_pte(ptep, guard_pte); + + pte_t pte = mkpte(pa, KERNEL_DATA); + vmm_set_ptes_contig(ptep + 1, pte, LFT_SIZE, KSTACK_PAGES - 1); - return align_stack(kstack + KSTACK_SIZE - 1); + ptep += KSTACK_PAGES; + return align_stack(ptep_va(ptep, LFT_SIZE) - 1); } void -thread_release_mem(struct thread* thread, ptr_t vm_mnt) +thread_release_mem(struct thread* thread) { - for (size_t i = 0; i < KSTACK_SIZE; i+=MEM_PAGE) { - ptr_t stack_page = PG_ALIGN(thread->kstack - i); - vmm_del_mapping(vm_mnt, stack_page); - } + struct proc_mm* mm = vmspace(thread->process); + ptr_t vm_mnt = mm->vm_mnt; + + // Ensure we have mounted + assert(vm_mnt); + + pte_t* ptep = mkptep_va(vm_mnt, thread->kstack); + + ptep -= KSTACK_PAGES - 1; + vmm_unset_ptes(ptep, KSTACK_PAGES); if (thread->ustack) { if 
((thread->ustack->start & 0xfff)) { @@ -112,8 +103,12 @@ thread_release_mem(struct thread* thread, ptr_t vm_mnt) } struct thread* -create_thread(struct proc_info* proc, ptr_t vm_mnt, bool with_ustack) +create_thread(struct proc_info* proc, bool with_ustack) { + struct proc_mm* mm = vmspace(proc); + assert(mm->vm_mnt); + + ptr_t vm_mnt = mm->vm_mnt; struct mm_region* ustack_region = NULL; if (with_ustack && !(__alloc_user_thread_stack(proc, &ustack_region, vm_mnt))) @@ -139,9 +134,12 @@ create_thread(struct proc_info* proc, ptr_t vm_mnt, bool with_ustack) } void -start_thread(struct thread* th, ptr_t vm_mnt, ptr_t entry) +start_thread(struct thread* th, ptr_t entry) { assert(th && entry); + struct proc_mm* mm = vmspace(th->process); + + assert(mm->vm_mnt); struct transfer_context transfer; if (!kernel_addr(entry)) { @@ -157,7 +155,7 @@ start_thread(struct thread* th, ptr_t vm_mnt, ptr_t entry) thread_create_kernel_transfer(&transfer, th->kstack, entry); } - inject_transfer_context(vm_mnt, &transfer); + inject_transfer_context(mm->vm_mnt, &transfer); th->intr_ctx = (isr_param*)transfer.inject; commit_thread(th); @@ -185,12 +183,12 @@ thread_find(struct proc_info* proc, tid_t tid) __DEFINE_LXSYSCALL4(int, th_create, tid_t*, tid, struct uthread_info*, thinfo, void*, entry, void*, param) { - struct thread* th = create_thread(__current, VMS_SELF, true); + struct thread* th = create_thread(__current, true); if (!th) { return EAGAIN; } - start_thread(th, VMS_SELF, (ptr_t)entry); + start_thread(th, (ptr_t)entry); ptr_t ustack_top = th->ustack_top; *((void**)ustack_top) = param; @@ -234,7 +232,7 @@ __DEFINE_LXSYSCALL2(int, th_join, tid_t, tid, void**, val_ptr) *val_ptr = (void*)th->exit_val; } - destory_thread(VMS_SELF, th); + destory_thread(th); return 0; } diff --git a/lunaix-os/kernel/spike.c b/lunaix-os/kernel/spike.c index ad751e1..91fe9f6 100644 --- a/lunaix-os/kernel/spike.c +++ b/lunaix-os/kernel/spike.c @@ -1,19 +1,19 @@ #include #include #include +#include 
+#include -static char buffer[1024]; +LOG_MODULE("spike") void noret __assert_fail(const char* expr, const char* file, unsigned int line) { - ksprintf(buffer, "%s (%s:%u)", expr, file, line); - - // Here we load the buffer's address into %edi ("D" constraint) - // This is a convention we made that the LUNAIX_SYS_PANIC syscall will - // print the panic message passed via %edi. (see - // kernel/asm/x86/interrupts.c) - cpu_trap_panic(buffer); + // Don't do another trap, print it right-away, allow + // the stack context being preserved + cpu_disable_interrupt(); + ERROR("assertion fail (%s:%u)\n\t%s", file, line, expr); + trace_printstack(); spin(); // never reach } @@ -24,15 +24,3 @@ panick(const char* msg) cpu_trap_panic(msg); spin(); } - -void -panickf(const char* fmt, ...) -{ - va_list args; - va_start(args, fmt); - ksnprintfv(buffer, fmt, 1024, args); - va_end(args); - - asm("int %0" ::"i"(LUNAIX_SYS_PANIC), "D"(buffer)); - spin(); -} diff --git a/lunaix-os/link/linker.ld b/lunaix-os/link/linker.ld index a409542..34833f0 100644 --- a/lunaix-os/link/linker.ld +++ b/lunaix-os/link/linker.ld @@ -2,6 +2,7 @@ ENTRY(start_) SECTIONS { . 
= 0x100000; + __kboot_start = .; /* higher-half kernel init code and data sections live here */ .boot.text BLOCK(4K) : @@ -29,7 +30,7 @@ SECTIONS { { *(.boot.rodata) } - __kexec_boot_end = ALIGN(4K); + __kboot_end = ALIGN(4K); /* ---- boot end ---- */ diff --git a/lunaix-os/makeinc/qemu.mkinc b/lunaix-os/makeinc/qemu.mkinc index e81f3fa..1fc8a91 100644 --- a/lunaix-os/makeinc/qemu.mkinc +++ b/lunaix-os/makeinc/qemu.mkinc @@ -2,6 +2,7 @@ QEMU_MON_TERM := gnome-terminal QEMU_MON_PORT := 45454 get_qemu_options = -s -S -m 1G \ + -smp 1 \ -rtc base=utc \ -no-reboot \ -machine q35 \ @@ -14,4 +15,4 @@ get_qemu_options = -s -S -m 1G \ -drive id=cdrom,file="$(1)",readonly=on,if=none,format=raw \ -device ahci,id=ahci \ -device ide-cd,drive=cdrom,bus=ahci.0 \ - -monitor telnet::$(QEMU_MON_PORT),server,nowait & \ No newline at end of file + -monitor telnet::$(QEMU_MON_PORT),server,nowait,logfile=qm.log & \ No newline at end of file diff --git a/lunaix-os/makeinc/toolchain.mkinc b/lunaix-os/makeinc/toolchain.mkinc index c63f846..7421f45 100644 --- a/lunaix-os/makeinc/toolchain.mkinc +++ b/lunaix-os/makeinc/toolchain.mkinc @@ -9,7 +9,8 @@ STRIP_OSDEP_LD := -nostdlib -nolibc -z noexecstack -no-pie -Wl,--build-id=none ARCH_OPT := -m32 -D__ARCH_IA32 O := -O2 -W := -Wall -Wextra -Werror -Wno-unknown-pragmas \ +W := -Wall -Wextra -Werror \ + -Wno-unknown-pragmas \ -Wno-unused-function \ -Wno-unused-variable\ -Wno-unused-but-set-variable \ diff --git a/lunaix-os/scripts/gdb/lunadbg/__init__.py b/lunaix-os/scripts/gdb/lunadbg/__init__.py index 0bd147c..b1bf38a 100644 --- a/lunaix-os/scripts/gdb/lunadbg/__init__.py +++ b/lunaix-os/scripts/gdb/lunadbg/__init__.py @@ -7,9 +7,12 @@ from .region_dump import MemoryRegionDump from .sched_dump import ProcessDump, SchedulerDump from .mem import MMStats from .syslog import SysLogDump +from .pte_utils import PteInterpreter, PtepInterpreter MemoryRegionDump() SchedulerDump() ProcessDump() SysLogDump() MMStats() +PtepInterpreter() +PteInterpreter() \ No newline at end of file diff
--git a/lunaix-os/scripts/gdb/lunadbg/arch/x86/__init__.py b/lunaix-os/scripts/gdb/lunadbg/arch/x86/__init__.py index 68cf165..2ee9e2d 100644 --- a/lunaix-os/scripts/gdb/lunadbg/arch/x86/__init__.py +++ b/lunaix-os/scripts/gdb/lunadbg/arch/x86/__init__.py @@ -1,6 +1,6 @@ import os -if os.environ["LUNADBG_ARCH"] == 'x86_32': - from .pte import PageTableHelper32 as PageTableHelper -else: +if os.environ["LUNADBG_ARCH"] == 'x86_64': from .pte import PageTableHelper64 as PageTableHelper +else: + from .pte import PageTableHelper32 as PageTableHelper diff --git a/lunaix-os/scripts/gdb/lunadbg/arch/x86/pte.py b/lunaix-os/scripts/gdb/lunadbg/arch/x86/pte.py index 5785376..444cb3f 100644 --- a/lunaix-os/scripts/gdb/lunadbg/arch/x86/pte.py +++ b/lunaix-os/scripts/gdb/lunadbg/arch/x86/pte.py @@ -38,23 +38,31 @@ class PageTableHelperBase: @staticmethod def vaddr_width(): raise NotImplementedError() + + @staticmethod + def pte_size(): + raise NotImplementedError() class PageTableHelper32(PageTableHelperBase): @staticmethod def translation_level(level = -1): return [0, 1][level] + @staticmethod + def pgtable_len(): + return (1 << 10) + @staticmethod def translation_shift_bits(level): - return [9, 0][level] + 12 + return [10, 0][level] + 12 @staticmethod def mapping_present(pte): return bool(pte & 1) @staticmethod - def huge_page(pte): - return bool(pte & (1 << 7)) + def huge_page(pte, po): + return bool(pte & (1 << 7)) and po @staticmethod def protections(pte): @@ -96,6 +104,10 @@ class PageTableHelper32(PageTableHelperBase): @staticmethod def vaddr_width(): return 32 + + @staticmethod + def pte_size(): + return 4 class PageTableHelper64(PageTableHelperBase): pass \ No newline at end of file diff --git a/lunaix-os/scripts/gdb/lunadbg/commands.py b/lunaix-os/scripts/gdb/lunadbg/commands.py index 9df5983..ec3fe82 100644 --- a/lunaix-os/scripts/gdb/lunadbg/commands.py +++ b/lunaix-os/scripts/gdb/lunadbg/commands.py @@ -1,6 +1,7 @@ from gdb import Command, COMMAND_USER import 
argparse +import shlex class LunadbgCommand(Command): def __init__(self, name: str) -> None: @@ -8,10 +9,21 @@ class LunadbgCommand(Command): self._parser = argparse.ArgumentParser() def _parse_args(self, gdb_argstr: str): - args, argv = self._parser.parse_known_args(gdb_argstr.strip().split(' '), None) - if argv: - print('unrecognized arguments: %s'%(' '.join(argv))) - print(self._parser.format_usage()) - print(self._parser.format_help()) - return None - return args \ No newline at end of file + try: + args, argv = self._parser.parse_known_args(shlex.split(gdb_argstr), None) + if argv: + print('unrecognized arguments: %s'%(' '.join(argv))) + else: + return args + except SystemExit: + pass + return None + + def invoke(self, argument: str, from_tty: bool) -> None: + parsed = self._parse_args(argument) + if not parsed: + return + self.on_execute(parsed, argument, from_tty) + + def on_execute(self, parsed, gdb_args, from_tty): + raise NotImplementedError() \ No newline at end of file diff --git a/lunaix-os/scripts/gdb/lunadbg/mem.py b/lunaix-os/scripts/gdb/lunadbg/mem.py index 8829e1b..558ad3a 100644 --- a/lunaix-os/scripts/gdb/lunadbg/mem.py +++ b/lunaix-os/scripts/gdb/lunadbg/mem.py @@ -1,7 +1,7 @@ from .commands import LunadbgCommand from .pp import MyPrettyPrinter from .profiling.pmstat import PhysicalMemProfile -from .structs.pagetable import PageTable, PageTableEntry +from .structs.pagetable import PageTable class MMStats(LunadbgCommand): def __init__(self) -> None: @@ -27,16 +27,17 @@ class MMStats(LunadbgCommand): pmem.rescan_pmem(optn.granule) pp.printf("Total: %dKiB (%d@4K)", - pmem.max_mem_sz, pmem.max_mem_pg) + pmem.max_mem_sz / 1024, pmem.max_mem_pg) pp.printf("Used: %dKiB (%d@4K) ~%.2f%%", - pmem.consumed_pg * 4096, + pmem.consumed_pg * 4096 / 1024, pmem.consumed_pg, pmem.utilisation * 100) pp.printf("Fragmentations: %d ~%.2f%%", pmem.discontig, pmem.fragmentation * 100) pp.print() pp.print("Distribution") + pp.print("( . 
= empty, * = full, [0-9]0% full )") pp2 = pp.next_level(2) row = [] for i in range(0, len(pmem.mem_distr)): @@ -45,7 +46,7 @@ class MMStats(LunadbgCommand): if ratio == 0: row.append('.') elif ratio == 1: - row.append('F') + row.append('*') else: row.append(str(cat)) @@ -79,8 +80,7 @@ class MMStats(LunadbgCommand): else: print("unknow mem type:", optn.state_type) - def invoke(self, argument: str, from_tty: bool) -> None: - optn = self._parse_args(argument) + def on_execute(self, optn, gdb_args, from_tty) -> None: pp = MyPrettyPrinter() if optn.cmd == 'stats': diff --git a/lunaix-os/scripts/gdb/lunadbg/pte_utils.py b/lunaix-os/scripts/gdb/lunadbg/pte_utils.py new file mode 100644 index 0000000..051bb48 --- /dev/null +++ b/lunaix-os/scripts/gdb/lunadbg/pte_utils.py @@ -0,0 +1,98 @@ +from .commands import LunadbgCommand +from .structs.pagetable import PageTable, PageTableEntry +from .pp import MyPrettyPrinter +from gdb import parse_and_eval, lookup_type + +class PteInterpreter(LunadbgCommand): + def __init__(self) -> None: + super().__init__("pte") + + self._parser.description = "Interpret the PTE based on give raw value or ptep" + self._parser.add_argument("val") + self._parser.add_argument("--va", action='store_true', default=False, + help="treat the given as virtual address") + + self._parser.add_argument("--ptep", action='store_true', default=False, + help="treat the given as ptep") + + self._parser.add_argument('-l', "--at-level", type=int, default=-1, + help="translation level that the given virtual address located") + + self._parser.add_argument('-m', "--mnt", default=-1, + help="vms mount point that the given virtual address located") + + @staticmethod + def print_pte(pp, pte_val, level): + pte = PageTableEntry.from_pteval(pte_val, level) + pp.print(pte) + + @staticmethod + def print_ptep(pp, ptep, level): + pte = PageTableEntry(ptep, level) + pp.print(pte) + + def on_execute(self, parsed, gdb_args, from_tty): + pp = MyPrettyPrinter() + + val = 
int(parse_and_eval(parsed.val)) + lvl = parsed.at_level + if not parsed.va: + PteInterpreter.print_pte(pp, val, lvl) + return + + if not parsed.ptep: + ptep = PageTable.mkptep_at(parsed.mnt, val, lvl) + PteInterpreter.print_ptep(pp, ptep, lvl) + return + + PteInterpreter.print_ptep(pp, val, lvl) + +class PtepInterpreter(LunadbgCommand): + def __init__(self) -> None: + super().__init__("ptep") + + self._parser.description = "Manipulate the pte pointer" + self._parser.add_argument("ptep") + self._parser.add_argument("--pfn", action='store_true', default=False, + help="get the pfn (relative to mount point) implied by this ptep") + + self._parser.add_argument("--vfn", action='store_true', default=False, + help="get the vfn implied by this ptep") + + self._parser.add_argument("--level", action='store_true', default=False, + help="estimate the translation level implied by this ptep") + + self._parser.add_argument("--to-level", type=int, default=None, + help="convert given ptep to specified level before any other processing") + + self._parser.add_argument("--sn", action='store_true', default=False, + help="shift the ptep to next translation level") + + self._parser.add_argument("--sp", action='store_true', default=False, + help="shift the ptep to previous translation level") + + + def on_execute(self, parsed, gdb_args, from_tty): + pp = MyPrettyPrinter() + + ptep = int(parse_and_eval(parsed.ptep)) + if parsed.to_level is not None: + ptep = PageTable.get_lntep(ptep, parsed.to_level) + pp.printf("ptep: 0x%016x", ptep) + + if parsed.pfn: + ptep = PageTable.get_pfn(ptep) + pp.set_prefix("pfn: ") + elif parsed.vfn: + ptep = PageTable.get_vfn(ptep) + pp.set_prefix("vfn: ") + elif parsed.level: + l, m = PageTable.ptep_infer_level(ptep) + pp.printf("Level %d ptep (mnt=0x%016x, vfn=%d)", l, m, PageTable.get_vfn(ptep)) + return + elif parsed.sn: + ptep = PageTable.shift_ptep_nextlevel(ptep) + elif parsed.sp: + ptep = PageTable.shift_ptep_prevlevel(ptep) + + pp.printf("0x%016x", 
ptep) diff --git a/lunaix-os/scripts/gdb/lunadbg/sched_dump.py b/lunaix-os/scripts/gdb/lunadbg/sched_dump.py index 9621f02..4dce934 100644 --- a/lunaix-os/scripts/gdb/lunadbg/sched_dump.py +++ b/lunaix-os/scripts/gdb/lunadbg/sched_dump.py @@ -10,9 +10,9 @@ class ProcessDump(LunadbgCommand): def __init__(self) -> None: super().__init__("proc") - def invoke(self, argument: str, from_tty: bool) -> None: + def execute(self, parsed, gdb_args, from_tty): pp = MyPrettyPrinter() - ProcInfo.process_at(argument).print_detailed(pp) + ProcInfo.process_at(gdb_args).print_detailed(pp) class SchedulerDump(LunadbgCommand): @@ -23,11 +23,7 @@ class SchedulerDump(LunadbgCommand): self._parser.add_argument("-l", "--long-list", required=False, default=False, action='store_true') - def invoke(self, argument: str, from_tty: bool) -> None: - args = self._parse_args(argument) - if args is None: - return - + def on_execute(self, args, gdb_args, from_tty): sched_context = gdb.parse_and_eval("&sched_ctx") sched = Scheduler(sched_context) diff --git a/lunaix-os/scripts/gdb/lunadbg/structs/pagetable.py b/lunaix-os/scripts/gdb/lunadbg/structs/pagetable.py index 1ffc062..58ed0c8 100644 --- a/lunaix-os/scripts/gdb/lunadbg/structs/pagetable.py +++ b/lunaix-os/scripts/gdb/lunadbg/structs/pagetable.py @@ -3,22 +3,33 @@ from . 
import KernelStruct from ..arch import PageTableHelper as TLB class PageTableEntry(KernelStruct): - def __init__(self, gdb_inferior: Value, level, va) -> None: + def __init__(self, ptep, level, pte=None) -> None: self.level = level - self.pg_mask = self.get_page_mask() - self.va = va & ~self.pg_mask self.base_page_order = TLB.translation_shift_bits(-1) + self.ptep = ptep - ptep = gdb_inferior[va // (self.pg_mask + 1)].address - super().__init__(ptep, PageTableEntry) - - try: - self.pte = int(self._kstruct.dereference()) - except: - self.pte = 0 + super().__init__(Value(ptep), PageTableEntry) + + if pte: + self.pte = pte + self.va = None + else: + self.va = PageTable.va_at(ptep, level) + try: + self.pte = int(self._kstruct['val']) + except: + self.pte = 0 self.pa = TLB.physical_pfn(self.pte) << self.base_page_order + self.page_order = TLB.translation_shift_bits(self.level) + self.page_size = 1 << self.page_order + self.page_order -= self.base_page_order + + @staticmethod + def from_pteval(pte_val, level): + return PageTableEntry(0, level, pte=pte_val) + def print_abstract(self, pp, *args): self.print_detailed(pp, *args) @@ -30,24 +41,29 @@ class PageTableEntry(KernelStruct): pp.print("") return - page_order = TLB.translation_shift_bits(self.level) - page_order -= self.base_page_order - pp.printf("Level %d Translation", TLB.translation_level(self.level)) pp2 = pp.next_level() - pp2.printf("Entry value: 0x%x", self.pte) - pp2.printf("Virtual address: 0x%x (ptep=0x%x)", self.va, int(self._kstruct)) - pp2.printf("Mapped physical: 0x%x (order %d page)", self.pa, page_order) + pp2.printf("PTE raw value: 0x%016x", self.pte) + + if not self.va: + pp2.printf("Virtual address: (ptep=)") + else: + pp2.printf("Virtual address: 0x%016x (ptep=0x%016x)", self.va, int(self._kstruct)) + + pp2.printf("Mapped physical: 0x%016x (order %d page)", self.pa, self.page_order) pp2.printf("Page Protection: %s", self.get_page_prot()) pp2.printf("Present: %s", self.present()) - 
pp2.printf("Huge: %s", TLB.huge_page(self.pte)) + pp2.printf("Huge: %s", TLB.huge_page(self.pte, self.page_order)) pp2.print("Attributes:") pp2.next_level().print(self.get_attributes()) + + def leaf(self): + return TLB.huge_page(self.pte, self.page_order) or not self.page_order @staticmethod def get_type() -> Type: - return lookup_type("unsigned int").pointer() + return lookup_type("pte_t").pointer() def get_page_mask(self): return PageTableEntry.get_level_shift(self.level) - 1 @@ -60,7 +76,8 @@ class PageTableEntry(KernelStruct): def get_attributes(self): attrs = [ self.get_page_prot(), - *TLB.other_attributes(self.level, self.pte) ] + *TLB.other_attributes(self.level, self.pte), + "leaf" if self.leaf() else "root" ] return ', '.join(attrs) def null(self): @@ -76,22 +93,108 @@ class PageTableEntry(KernelStruct): @staticmethod def max_page_count(): return 1 << (TLB.vaddr_width() - TLB.translation_shift_bits(-1)) + + def pfn(self): + return (self.ptep & (((1 << TLB.translation_shift_bits(0)) - 1))) // TLB.pte_size() + + def vfn(self): + return (self.ptep & (((1 << TLB.translation_shift_bits(-1)) - 1))) // TLB.pte_size() class PageTable(): def __init__(self) -> None: - self.levels = [ - Value(0xFFFFF000).cast(PageTableEntry.get_type()), - Value(0xFFC00000).cast(PageTableEntry.get_type()) - ] + pass - def get_pte(self, va, level=-1) -> PageTableEntry: - return PageTableEntry(self.levels[level], level, va) + @staticmethod + def get_pfn(ptep): + pfn_mask = ((1 << TLB.translation_shift_bits(0)) - 1) + return ((ptep & pfn_mask) // TLB.pte_size()) + + @staticmethod + def get_vfn(ptep): + vfn_mask = ((1 << TLB.translation_shift_bits(-1)) - 1) + return ((ptep & vfn_mask) // TLB.pte_size()) + + @staticmethod + def mkptep_for(mnt, va): + mnt_mask = ~((1 << TLB.translation_shift_bits(0)) - 1) + offset = (TLB.physical_pfn(va) * TLB.pte_size()) & ~mnt_mask + + return (mnt & mnt_mask) | offset + + @staticmethod + def ptep_infer_level(ptep): + l = 0 + pfn = PageTable.get_pfn(ptep) 
+ pfn = pfn << TLB.translation_shift_bits(-1) + vfn = (TLB.pgtable_len() - 1) + msk = vfn << TLB.translation_shift_bits(l) + max_l = TLB.translation_level() + + mnt = ptep & msk + + while (pfn & msk) == msk: + l+=1 + msk = vfn << TLB.translation_shift_bits(l) + if l == max_l: + break + + return (max_l - l, mnt) + + @staticmethod + def va_at(ptep, level): + vms_mask = ((1 << TLB.vaddr_width()) - 1) + + ptep = PageTable.get_pfn(ptep) << TLB.translation_shift_bits(level) + return ptep & vms_mask + + @staticmethod + def get_l0tep(ptep): + return PageTable.get_lntep(ptep, 0) + + @staticmethod + def get_lntep(ptep, level): + lnmask = (1 << TLB.translation_shift_bits(level)) - 1 + size = (1 << TLB.translation_shift_bits(-1)) + vpfn = (lnmask * size) & lnmask + offset = ((ptep // TLB.pte_size()) * size // (lnmask + 1)) & (size - 1) + + return (ptep & ~lnmask) | vpfn | (offset * TLB.pte_size()) + + @staticmethod + def mkptep_at(mnt, va, level): + lfmask = (1 << TLB.translation_shift_bits(-1)) - 1 + lsize = (1 << TLB.translation_shift_bits(level)) + offset = (va // lsize) * TLB.pte_size() + + return mnt | ((lsize - 1) & ~lfmask) | offset + + @staticmethod + def shift_ptep_nextlevel(ptep): + mnt_mask = ~((1 << TLB.translation_shift_bits(0)) - 1) + size = (1 << TLB.translation_shift_bits(-1)) + mnt = ptep & mnt_mask + vpfn = ((ptep // TLB.pte_size()) * size) & ~mnt_mask + + return mnt | vpfn + + @staticmethod + def shift_ptep_prevlevel(ptep): + mnt_mask = ~((1 << TLB.translation_shift_bits(0)) - 1) + self_mnt = (TLB.pgtable_len() - 1) * (~mnt_mask + 1) + unshifted = PageTable.get_pfn(ptep) << TLB.translation_shift_bits(-1) + unshifted = PageTable.mkptep_for(self_mnt, unshifted) + return PageTable.mkptep_for(ptep & mnt_mask, unshifted) def __print_pte_ranged(self, pp, pte_head, pte_tail): start_va = pte_head.va - end_va = pte_tail.va + + if pte_head == pte_tail: + end_va = pte_head.va + pte_head.page_size + else: + end_va = pte_tail.va + sz = end_va - start_va - if not 
(pte_head.null() and pte_tail.null()): + if not pte_head.null(): pp.printf("0x%016x...0x%016x, 0x%016x [0x%08x] %s", start_va, end_va - 1, pte_head.pa, sz, pte_head.get_attributes()) @@ -99,25 +202,64 @@ class PageTable(): pp.printfa("0x{:016x}...0x{:016x}, {:^18s} [0x{:08x}] ", start_va, end_va - 1, "n/a", sz) - def print_ptes_between(self, pp, va, va_end, level=-1): - shift = PageTableEntry.get_level_shift(level) - n = (va_end - va) // shift - self.print_ptes(pp, va, n, level) - - def print_ptes(self, pp, va, pte_num, level=-1): - head_pte = PageTableEntry(self.levels[level], level, va) - curr_pte = head_pte - va = head_pte.va + def __scan_pagetable(self, pp, start_ptep, end_ptep, max_level = -1): + ptep = PageTable.get_l0tep(start_ptep) + level = 0 + max_level = TLB.translation_level(max_level) + va_end = PageTable.va_at(end_ptep, -1) + 1 + head_pte = None + prev_pte = None pp.printfa("{:^18s} {:^18s} {:^18s} {:^10s} {:^20s}", "va-start", "va-end", "physical", "size", "attributes") - for i in range(1, pte_num): - va_ = va + i * PageTableEntry.get_level_shift(level) - curr_pte = PageTableEntry(self.levels[level], level, va_) + while PageTable.va_at(ptep, level) <= va_end: + pte = PageTableEntry(ptep, level) + if head_pte == None: + head_pte = pte + prev_pte = pte + + if pte.null(): + if not head_pte.null(): + self.__print_pte_ranged(pp, head_pte, prev_pte) + head_pte = pte + elif not pte.leaf() and level < max_level: + ptep = PageTable.shift_ptep_nextlevel(ptep) + level+=1 + continue + else: + if head_pte.null(): + self.__print_pte_ranged(pp, head_pte, prev_pte) + head_pte = pte + else: + n = pte.pfn() - head_pte.pfn() + pa = head_pte.pa + (n << pte.base_page_order) + if pa != pte.pa or not pte.same_kind_to(head_pte): + self.__print_pte_ranged(pp, head_pte, prev_pte) + head_pte = pte + + prev_pte = pte + if pte.vfn() == TLB.pgtable_len() - 1: + if level != 0: + ptep = PageTable.shift_ptep_prevlevel(ptep + TLB.pte_size()) + level-=1 + continue + break - if not 
curr_pte.same_kind_to(head_pte): - self.__print_pte_ranged(pp, head_pte, curr_pte) - head_pte = curr_pte + ptep += TLB.pte_size() + - if curr_pte != head_pte: - self.__print_pte_ranged(pp, head_pte, curr_pte) \ No newline at end of file + self.__print_pte_ranged(pp, head_pte, prev_pte) + + def print_ptes_between(self, pp, va, va_end, level=-1, mnt=0xFFC00000): + ptep_start = PageTable.mkptep_for(mnt, va) + ptep_end = PageTable.mkptep_for(mnt, va_end) + self.__scan_pagetable(pp, ptep_start, ptep_end, level) + + def get_pte(self, va, level=-1, mnt=0xFFC00000) -> PageTableEntry: + ptep = PageTable.mkptep_at(mnt, va, level) + return PageTableEntry(ptep, level) + + def print_ptes(self, pp, va, pte_num, level=-1, mnt=0xFFC00000): + ptep_start = PageTable.mkptep_for(mnt, va) + ptep_end = ptep_start + pte_num * TLB.pte_size() + self.__scan_pagetable(pp, ptep_start, ptep_end, level) \ No newline at end of file diff --git a/lunaix-os/scripts/gdb/lunadbg/syslog.py b/lunaix-os/scripts/gdb/lunadbg/syslog.py index e8a56c4..b7948d4 100644 --- a/lunaix-os/scripts/gdb/lunadbg/syslog.py +++ b/lunaix-os/scripts/gdb/lunadbg/syslog.py @@ -21,4 +21,4 @@ class SysLogDump(gdb.Command): head = log_recs.deref_and_access("kp_ents.ents").address ent_type = gdb.lookup_type("struct kp_entry").pointer() - llist_foreach(head, ent_type, "ents", lambda a,b: self.syslog_entry_callback(a, b)) \ No newline at end of file + llist_foreach(head, ent_type, "ents", lambda a,b: self.syslog_entry_callback(a, b), inclusive=False) \ No newline at end of file diff --git a/lunaix-os/scripts/templates/i386/config.json b/lunaix-os/scripts/templates/i386/config.json index 911bb0e..4b2a186 100644 --- a/lunaix-os/scripts/templates/i386/config.json +++ b/lunaix-os/scripts/templates/i386/config.json @@ -72,18 +72,11 @@ "stk_align": 16 }, { - "name": "kernel_exec", + "name": "kernel_img", "start": "3@1G", "size": "16@4M", "block": "1@page" }, - { - "$type": "list", - "$range": "[1..*vms_mnts]", - "name": 
"vms_mount_{index}", - "size": "1@4M", - "block": "1@huge" - }, { "$type": "list", "$range": "[1..*page_mnts]", @@ -94,6 +87,13 @@ "name": "vmap", "block": "1@huge" }, + { + "$type": "list", + "$range": "[1..*vms_mnts]", + "name": "vms_mount_{index}", + "size": "1@4M", + "block": "1@huge" + }, { "name": "pd_ref", "start": "1023@4M",
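[Annotation] A note on the `translation_shift_bits` correction in the lunadbg hunk above (`[9, 0]` became `[10, 0]`): x86-32 non-PAE paging uses 10-bit directory and table indices, not the 9-bit indices of PAE/64-bit paging. With the corrected value, a level-0 entry used as a leaf covers exactly 4MiB, which is what makes the huge-page identity mapping of the first 3GiB (from the commit message) so cheap. A sketch verifying the arithmetic, mirroring the fixed Python helper:

```python
PD_BITS, PAGE_SHIFT = 10, 12

def translation_shift_bits(level):
    # x86-32 two-level paging: level 0 = page directory entry,
    # level -1 = leaf PTE. Mirrors the corrected lunadbg helper.
    return [PD_BITS, 0][level] + PAGE_SHIFT

huge_page_size = 1 << translation_shift_bits(0)   # one PDE mapped as a leaf
assert huge_page_size == 4 * 1024 * 1024          # 4 MiB, not 2 MiB as with 9 bits

# Identity-mapping the first 3 GiB with huge pages needs only 768 PDEs,
# versus 786432 4K PTEs plus the page tables to hold them:
n_pdes = (3 << 30) // huge_page_size
assert n_pdes == 768
```

This also matches the new `huge_page(pte, po)` signature above: the PS bit is only meaningful when the entry sits at a level whose page order is non-zero.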