Lunaixsky's Repo - lunaix-os.git/commitdiff

This patch brings functional multi-threading support to the Lunaix kernel,
together with the essential syscalls needed to support the POSIX pthread interface.

About the threading model in Lunaix
Like the Linux kernel, the threading feature is built upon the existing
multi-processing infrastructure. However, unlike Linux, which takes a lazier
yet clever approach and implements threads as a specialized kind of process,
Lunaix implements threads in a way that closely follows their orthodox
definition. This requires a massive, from-scratch refactoring of the existing
process model. Doing so makes things clearer and lets us pursue the true
lightweightness that threads are supposed to have.

Kernel thread and preemptive kernel
As a natural result of our implementation, we have introduced the concept
of kernel threads, which belong to a special process (pid=0) that runs
in kernel mode. Treating the kernel as a dedicated process, rather than as a
parasite on other processes, enables us to implement an advanced feature: a
preemptive kernel. Unlike Linux, where the kernel is preemptible anywhere, in
Lunaix only functions called directly from a kernel thread can be preempted,
which allows more fine-grained control. This reduces the refactoring effort and
eases the writing of new kernel code, for which the non-preemptive assumption
can be kept.

Spawning and forking
This patch introduces a set of tools for performing remote virtual memory space
transactions, which allow the kernel to inject data into another address space.
They will serve as infrastructure for kernel-level support of the `posix_spawn`
interface, which creates a process from scratch rather than forking from another
one. This lets us skip duplicating the parent process's VM space and reduces
overhead.

LunaDBG
LunaDBG has been refactored to be modular and arch-agnostic. A new set of
commands has been added:

        mm: a sophisticated tool for examining page table mappings and performing
            physical memory profiling (for detailed usage, see the upcoming
            documentation)
     sched: tools for examining the scheduler context, listing all threads and
            processes

--------------
All changes included in this patch:

* * Signal mechanism refactor

   The sigctx of proc_info is changed to a pointer reference, and the sigact
   array now stores references as well. This keeps the overall size of proc_info
   and sigctx small and thus avoids the internal fragmentation caused by
   allocating a large cake piece.

   Some refactoring was also done on signal-related methods/structs to improve
   overall readability
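
   The size effect of storing references can be illustrated with a minimal
   sketch. All type names, field names, and the signal count below are
   hypothetical and are not taken from the Lunaix source; the point is only
   that a pointer-sized slot is smaller than an embedded record.

```c
#include <assert.h>
#include <stddef.h>

#define NSIG_SKETCH 32 /* hypothetical number of signals */

/* Hypothetical per-signal action record. */
struct sigact_sketch {
    void (*handler)(int);
    unsigned long mask;
};

/* Before: the whole signal context is embedded in the process object. */
struct sigctx_embedded {
    unsigned long pending;
    struct sigact_sketch acts[NSIG_SKETCH];
};
struct proc_embedded {
    int pid;
    struct sigctx_embedded sigctx;
};

/* After: the process holds one pointer, and each slot holds a reference. */
struct sigctx_ref {
    unsigned long pending;
    struct sigact_sketch *acts[NSIG_SKETCH]; /* NULL means "default action" */
};
struct proc_ref {
    int pid;
    struct sigctx_ref *sigctx;
};

/* Install an action record; a slot stays NULL until it is first used. */
static void set_sigact(struct sigctx_ref *ctx, int signum,
                       struct sigact_sketch *act)
{
    assert(signum >= 0 && signum < NSIG_SKETCH);
    ctx->acts[signum] = act;
}
```

   With this layout the process object shrinks to roughly an id plus a pointer,
   and action records are only allocated for signals that actually have a
   handler installed.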

* Temporarily remove x87 state from context switching until a space-efficient
  workaround emerges

* Add a check for kernel-originated seg-faults and halt the kernel (for debugging),
  since by assumption the kernel mapping is always present (at least for now, as
  page swapping and staging are not implemented in Lunaix yet).

* Re-group the fork-related functions into a dedicated fork.c file

* Fix an incorrect check on the privilege level of the interrupt context when
  printing a trace

* * Make proc_mm a pointer reference to further reduce the single allocation size
  and to make things more flexible

* Remove the need for a pid when allocating physical memory. Due to the complexity
  and dynamics of physical page ownership, there is no point in doing such checking
  and tracking.

* Add some shortcuts for accessing commonly used proc_mm fields, to avoid nasty
  chains of cascading '->' for the sake of readability.

* * Introduce struct thread to represent a lightweight schedulable element.

  The `struct thread` abstracts the execution context out of the process, while the
  latter is now composed only of descriptors for program resources (e.g., files,
  memory, installed signal handlers). This makes it possible to duplicate
  concurrent execution flows while imposing rather less kernel overhead (e.g., the
  cost of a context switch) than old-fashioned fork()-assisted concurrency.

  Such a change to the process model requires a vast amount of refactoring in
  almost every subsystem that uses processes directly, as well as additional
  tools to create the initial process. This commit contains only preliminary
  refactoring work; parts that require additional workarounds are commented out
  and marked with `FIXME`
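
  The split between execution context and resource descriptors can be sketched
  roughly as follows. This is an illustration only: the struct layouts, field
  names, the per-process limit, and `attach_thread` are assumptions made for
  the example, not the actual Lunaix definitions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS_SKETCH 4 /* hypothetical per-process thread limit */

struct proc_sketch;

/* Execution context: what the scheduler switches between. */
struct thread_sketch {
    int tid;
    int state;
    void *kstack_top;            /* thread-local kernel stack */
    void *ustack_top;            /* thread-local user stack */
    struct proc_sketch *process; /* owner of the shared resources */
};

/* Resource container: shared by every thread of the process. */
struct proc_sketch {
    int pid;
    void *mm;      /* address space descriptor */
    void *fdtable; /* open files */
    void *sigctx;  /* installed signal handlers */
    struct thread_sketch *threads[MAX_THREADS_SKETCH];
    int thread_count;
};

/* Attach a thread to a process; returns -1 once the limit is reached. */
static int attach_thread(struct proc_sketch *p, struct thread_sketch *t)
{
    assert(p != NULL && t != NULL);
    if (p->thread_count >= MAX_THREADS_SKETCH)
        return -1;
    t->process = p;
    p->threads[p->thread_count++] = t;
    return 0;
}
```

  Creating a second thread in this model only needs a new execution context;
  the address space, files, and signal handlers are reached through the shared
  process pointer instead of being duplicated as fork() would do.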

* Other refactoring and cleaning has been done to improve the semantics of certain
  pieces of code.

* * Process and thread spawning. This reduces the system overhead
  introduced by invoking fork to create a process. However, creating
  a process that houses an executable image is not implemented yet, as it
  requires remote injection of the user stack, which is still under consideration

* Introduce the kernel process and kernel threads. Prior to the threading
  patch, the dummy process was a terrible mimic of a kernel process
  and served merely as a fallback when no schedulable process could be found.
  This made it rather useless and a waste of kernel object pool space.
  The introduction of threads and the new scheduler design promotes the dummy
  process to a fully functioning kernel thread; its preemptiveness opens the
  opportunity to integrate advanced, periodic, event-driven kernel
  tasks (such as memory region coalescing or lightweight irq handlers)

* Some minor refactoring is also performed to make things cleaner

* Update the virtual memory layout to reflect the current development

* * Fix the issue of the transfer context being injected into the wrong address
  because the page offset was somehow not considered
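
  A minimal sketch of the offset handling this fix refers to, under the
  assumption that the remote page has been temporarily mapped into a local
  window. The function name, the window mechanism, and the page size constant
  are hypothetical and only illustrate why the in-page offset matters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096u

/*
 * Copy `len` bytes to the remote virtual address `remote_va`. The remote
 * page that contains `remote_va` is assumed to be mapped at `local_window`
 * in the current address space.
 */
static void inject_into_page(uint8_t *local_window, uintptr_t remote_va,
                             const void *data, size_t len)
{
    /* The low bits of the address select the byte inside the page.
     * Dropping them would write to the start of the page instead. */
    size_t offset = remote_va & (PAGE_SIZE_SKETCH - 1);

    /* This sketch does not handle writes that cross a page boundary. */
    assert(offset + len <= PAGE_SIZE_SKETCH);
    memcpy(local_window + offset, data, len);
}
```

  For a remote address such as 0x08049f10 the data must land 0xf10 bytes into
  the mapped window, not at its beginning.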

* Fix the refactoring and various compile-time errors

* Adjust lunadbg to work with the latest threading refactoring.

* Also fix the issue that lunadbg's llist iterator made a false
  assumption about the offset of the embedded llist_header.
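
  The offset issue can be illustrated in C with the usual container-of idiom.
  LunaDBG itself is not written in C, and the `llist_header` layout,
  `thread_node`, and the macro name below are assumptions made for this
  sketch; it only shows why an embedded list header cannot be assumed to sit
  at offset 0.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of an intrusive doubly linked list header. */
struct llist_header {
    struct llist_header *prev;
    struct llist_header *next;
};

/* Recover the enclosing object from a pointer to its embedded member. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical element whose list header is NOT the first field. */
struct thread_node {
    int tid;
    struct llist_header sibling;
};

static int tid_of(struct llist_header *link)
{
    assert(link != NULL);
    /* Casting `link` directly would only be valid at offset 0. */
    return container_of_sketch(link, struct thread_node, sibling)->tid;
}
```

  An iterator that treats the header address as the object address works by
  accident for structs whose header comes first and silently reads garbage
  for every other struct.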

* Rename spawn_thread -> create_thread, and introduce spawn_kthread
  to spawn a kernel thread within the kernel process.

* Fix the issue in vmm_lookupat that ignored the present bit when
  probing the pte

* Leave some holes for later investigation
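
  A minimal sketch of a PTE probe that honours the present bit, using the
  32-bit x86 (non-PAE) entry layout where bit 0 is the present flag and bits
  31:12 hold the frame address. The function name and signature are
  hypothetical and do not mirror vmm_lookupat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT   0x00000001u /* bit 0 of a 32-bit x86 PTE */
#define PTE_ADDR_MASK 0xFFFFF000u /* bits 31:12, physical frame address */

/*
 * Translate a raw PTE into a physical frame address.
 * Returns false (and leaves *pa_out untouched) when the entry is not present,
 * because the remaining bits of a non-present entry carry no valid address.
 */
static bool pte_lookup(uint32_t pte, uint32_t *pa_out)
{
    assert(pa_out != NULL);
    if (!(pte & PTE_PRESENT))
        return false;
    *pa_out = pte & PTE_ADDR_MASK;
    return true;
}
```

  Without the early return, a stale or swapped-out entry would be reported as
  a valid mapping.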

* * Make the threading feature work properly

* Fixed left-over issues as well as newly found ones:

    1. incorrect check on thread state in 'can_schedule', causing the
       scheduler to be unable to select a suitable thread even though
       one exists

    2. double free of struct v_file when destroying a process, caused
       by a missing vfs_ref_file in elf32_openat

    3. destory_thread removed the wrong thread from the global thread list

    4. thread-local kernel and user stacks don't get released when
       destroying a thread

    5. lunad should spawn a new process for the user-space initd rather than
       kexec on the current kernel process

    6. guard the end of the thread entry function with 'thread_exit'
       to prevent running over into the abyss.

    7. fix tracing.c printing the wrong context entering-leaving order

    8. zero-fill the first pde when duplicating the vm space to avoid
       garbage interfering with the vmm
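
  Item 2 above is a classic reference-counting imbalance. The sketch below
  uses hypothetical names (`file_sketch`, `file_ref`, `file_unref`) rather
  than the real vfs API, and counts releases instead of freeing memory so the
  effect can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted file object. */
struct file_sketch {
    int refs;  /* number of live holders */
    int freed; /* how many times the object was released */
};

/* Every additional holder must take its own reference. */
static struct file_sketch *file_ref(struct file_sketch *f)
{
    assert(f != NULL);
    f->refs++;
    return f;
}

/* Drop one reference; release the object when the last holder lets go. */
static void file_unref(struct file_sketch *f)
{
    assert(f != NULL && f->refs > 0);
    if (--f->refs == 0)
        f->freed++; /* stands in for the real release routine */
}
```

  If a second holder (for example an executable loader keeping the file open)
  skips `file_ref`, both holders eventually call `file_unref` on a count of 1,
  and the second call operates on an already released object.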

* * Allow each process to store its executable arguments

* Refactor the lunadbg toolset (done: process, thread and scheduler)

* * Fix: can_schedule() should check against the thread's state rather than the
  process's state

* Remove the hack of using ebp in 'syscall_hndlr', to prevent it from
  interfering with the stack-walker

* Fine-tune the output of the tracer when encountering an unknown symbol

* (LunaDBG) Add implementation for examining sigctx
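
  A minimal sketch of the corrected check, with hypothetical state names and
  struct layouts. It is not the actual can_schedule() body; it only shows
  that the decision depends on the thread's own state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scheduling states. */
enum state_sketch { ST_READY, ST_RUNNING, ST_BLOCKED, ST_TERMINATED };

struct sched_proc_sketch {
    enum state_sketch state;
};

struct sched_thread_sketch {
    enum state_sketch state;
    struct sched_proc_sketch *process;
};

/* A thread is eligible based on its own state, not its process's state. */
static bool can_schedule_sketch(const struct sched_thread_sketch *t)
{
    assert(t != NULL);
    return t->state == ST_READY;
}
```

  A process with one running thread and one ready thread shows the difference:
  checking the process state would reject the ready thread as well.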

* * Add related syscalls to manipulate threads

* Factor out the access to the frame pointer and return address into abi.h

* Shrink the default per-thread user stack size to 256K, to account
  for the shortage of 32-bit virtual address space.

* Add a check for the kernel preemptible function context

* Add different test cases to exercise various pthread_* APIs

* * (My Little Pthread Test) Fix all sorts of issues found in the current threading model
  implementation with a set of simple pthread tests.

* Add more sanity checks to the tracing and pfault handlers, to keep them from spamming the output
  stream when the failure is severe enough to cause infinite nesting (e.g., when the vm mapping of
  the kernel stack gets messed up)

* Add a guard page at the end of the thread-local kernel and user stacks to detect stack overflow

* Remove an unwanted interrupt enablement in ps2kbd.c (whose behaviour is undefined in the booting
  stage)

* Temporarily fix issues with the vmr dump py utils (they need to be adapted to the new design
  sooner or later)

* Specify a cpu model for QEMU, which makes things more deterministic
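
  How a guard page turns an overflow into a detectable fault can be sketched
  as below. The layout is an assumption for illustration: the stack grows
  downward (as on x86), so the guard is the lowest page of the region and is
  left unmapped. The names and the 4 KiB page size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u

/* A stack region [base, base + size); `base` is the lowest address. */
struct stack_sketch {
    uintptr_t base;
    uintptr_t size; /* total size, including the guard page */
};

/* Initial stack pointer: the highest address of the region. */
static uintptr_t stack_top(const struct stack_sketch *s)
{
    return s->base + s->size;
}

/* Lowest address the thread may legitimately touch. */
static uintptr_t stack_usable_bottom(const struct stack_sketch *s)
{
    assert(s->size > PAGE_SIZE_SKETCH);
    return s->base + PAGE_SIZE_SKETCH;
}

/* A fault inside the unmapped guard page indicates a stack overflow. */
static bool hits_guard(const struct stack_sketch *s, uintptr_t fault_addr)
{
    return fault_addr >= s->base && fault_addr < s->base + PAGE_SIZE_SKETCH;
}
```

  A page fault handler could use a predicate like `hits_guard` to report a
  stack overflow instead of a generic segfault.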

* * Change the mmap flag for creating the thread-local user stack to non-FIXED.
  A previous experiment showed that in high-concurrency situations
  the calculation of the ustack location for a new thread can be affected and
  risks smashing an existing thread's ustack, causing undefined behaviour
  when returning from the kernel (as the stack address is implied from
  proc_info::threads_count). For this reason it should be treated as a
  hint to mem_map rather than a hard requirement.

* Re-implement the VMR allocation algorithm so that it treats the vicinity of
  the hinted address as the high-priority search area, rather than dumbly
  starting from the beginning.
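
  One way to picture a hint-first search is the sketch below. It is not the
  Lunaix VMR allocator: it assumes a sorted array of non-overlapping regions,
  scans for the first fitting gap at or above the hint, and only falls back to
  scanning from the lowest address when nothing fits there. All names are
  hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An occupied range [start, end); the array is sorted and non-overlapping. */
struct region_sketch {
    uintptr_t start;
    uintptr_t end;
};

#define NO_FIT ((uintptr_t)-1)

/* First-fit scan over the gaps, never placing the block below `floor`. */
static uintptr_t scan_from(const struct region_sketch *r, size_t n,
                           uintptr_t floor, uintptr_t limit, uintptr_t len)
{
    uintptr_t cand = floor;
    for (size_t i = 0; i < n; i++) {
        if (r[i].end <= cand)
            continue; /* region lies entirely below the candidate */
        if (r[i].start >= cand && r[i].start - cand >= len)
            return cand; /* the gap in front of this region fits */
        cand = r[i].end; /* region overlaps, or gap too small: skip past it */
    }
    /* Try the space between the last region and the upper limit. */
    return (cand <= limit && limit - cand >= len) ? cand : NO_FIT;
}

/* Prefer the vicinity of `hint`; fall back to a scan from `lowest`. */
static uintptr_t find_free(const struct region_sketch *r, size_t n,
                           uintptr_t lowest, uintptr_t limit,
                           uintptr_t hint, uintptr_t len)
{
    assert(len > 0);
    uintptr_t at = scan_from(r, n, hint, limit, len);
    if (at == NO_FIT)
        at = scan_from(r, n, lowest, limit, len);
    return at;
}
```

  Because the hint is advisory, an occupied hint address simply moves the
  result to the next fitting gap instead of overwriting an existing mapping,
  which is the behaviour a non-FIXED stack mapping relies on.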

* Remove the undesired pmm_free_page from __dup_kernel_stack. Since we
  now skip the first 4MiB when duplicating the page table, the
  ref counters for these physical pages are already 1 after fork. This
  has been identified as the root cause of a randomly appearing segfault
  during memory-intensive tasks such as test_pthread, as these falsely
  released physical pages get repurposed. However, this also raises
  a question about Lunaix's memory utilisation, as the next-free strategy
  is unlikely to revisit previously allocated pages when there is plenty of
  free space ahead. More effort should be put into investigating memory
  performance.

* Add more assertions and checks to enhance robustness and ease
  the debugging experience.

* Adjust some output format and refactor the test_pthread code.

* * (lunadbg) `mm` command for probing page table and physical memory profiling

* Add missing syscall-table doc

* Add more test cases related to pthread

* * (LunaDBG) decouple the pte-related operations as an arch-dependent feature

* (LunaDBG) adjust the output format

* * (LunaDBG) Refactor VMR dump

* * Adjust the thread id generation to reduce the duplication ratio

* Cap the number of threads per process

* (LunaDBG) Fix the issue with the display of percentages in pmem profiling


author	Lunaixsky <lunaixsky@qq.com>
	Mon, 5 Feb 2024 17:20:02 +0000 (17:20 +0000)
committer	GitHub <noreply@github.com>
	Mon, 5 Feb 2024 17:20:02 +0000 (17:20 +0000)