#######################################################
Assignments for an xv6 OS Class: Kernel Thread Support
#######################################################

.. sidebar:: Contents

   .. contents::
      :depth: 2
      :local:

This is a modified version of an assignment that ran first in Fall 2015 at
JHU.  See http://gaming.jhu.edu/~phf/2015/fall/cs318/assignment-uthreads.shtml
and the `Changelog`_ below.  If you are interested in using this assignment
in your course, please contact me for reference implementations and test
cases!

The first half of this assignment is :doc:`osassign-uthreads`.

Overview
########

Oh yes, we're back!  Now that you have a working ``uthr`` library, it's time
for you to take off the training wheels!  By that, I mean, essentially,
``mv kernel kernel.provided`` and make things work again!

As before, this is almost entirely taken from `Carnegie Mellon's Operating
Systems class `_.  *This assignment has been modified from its original
version.  It has been* `reformatted to fit this semester `_.

You'll start with the xv6 from `nwf's git repository `_ at commit
a7934c7cd236bab8e431e7c9cde93256dd40371b for this assignment.  This commit
includes no test cases, but does include some useful fixes to the kernel
code.  Any additional test cases will be distributed via the course Piazza
page.

System Calls
============

The kernel system calls you have to support are *as described* in the
:doc:`previous assignment `.  To quickly review:

Scheduling
----------

* ``void yield(int tid)`` will invoke the scheduler again with a *hint* that
  it would be good for ``tid`` to be on core *soon* (ideally, *now*), giving
  up the remainder of this thread's timeslice.  Note that if there are no
  other runnable threads, the kernel may return to this one without context
  switching.  If ``tid`` is negative, the kernel will simply run the next
  ``RUNNABLE`` thread.  Yielding to one's self is *not an error*, but *may*
  return immediately rather than actually entering the scheduler.

* ``void desch(int *guard)`` will put the current thread to sleep (that is,
  suspend its execution) if ``*guard`` is nonzero.  If the process
  ``exit()``-s or is subject to ``kill()`` while this thread remains so
  suspended, this thread will be destroyed, too.

* ``int mkrun(int tid)`` should awaken a thread that is sleeping in
  ``desch()``.  If there is no such thread, it will return a negative number;
  otherwise, it will return ``0``.  ``mkrun(tid)`` will return failure if
  there is no thread identified by ``tid``, and will return success if
  ``tid`` exists and was asleep due to ``desch()``.  For the purpose of this
  exercise, its return value is undefined in other cases (though obviously
  the kernel blob does *something*).  It is not an error to call ``mkrun()``
  on a thread that is not sleeping due to ``desch()``, but the return value
  is not specified (it could be viewed as success -- the target thread is,
  after all, now not blocked having ``desch()``-ed itself... but it could
  also be viewed as failure -- the target thread wasn't in ``desch()`` to be
  made runnable, now was it?).

A thread's invocation of ``desch()`` is *atomic* with respect to any
``mkrun()`` targeting that thread.  That is, the decision to deschedule
happens *entirely before* or *entirely after* any decision to wake up;
dually, the decision to wake up happens *entirely before* or *entirely
after* any decision to deschedule.
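To make that guarantee concrete, here is the usage pattern ``desch()`` and
``mkrun()`` are designed for, as a userland sketch against the calls above
(the names ``parked``, ``sleeper_tid``, ``block()``, and ``unblock()`` are
made up for illustration and are not part of the assignment):

.. code-block:: c

   /* Shared user memory between two cooperating threads. */
   static volatile int parked;   /* nonzero: the sleeper intends to sleep   */
   static int sleeper_tid;       /* the sleeper's gettid(), published early */

   /* Sleeper side: park until somebody pokes us. */
   void
   block(void)
   {
     parked = 1;
     /* desch() examines *guard atomically with respect to any mkrun()
      * aimed at us: either it observes 0 and returns immediately, or it
      * suspends us and a later mkrun(sleeper_tid) wakes us.  Either way,
      * the wakeup below cannot be lost. */
     desch((int *)&parked);
   }

   /* Waker side: release the sleeper (or keep it from sleeping at all). */
   void
   unblock(void)
   {
     parked = 0;
     mkrun(sleeper_tid);   /* fine even if the sleeper never actually slept */
   }

This is, presumably, the pattern your ``uthr`` library already leans on; the
atomicity guarantee is what keeps the ``parked = 0`` / ``mkrun()`` pair from
slipping into the gap between the sleeper's check and its suspension.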
Threads
-------

* ``int gettid(void)`` will return the current thread's identifier.

* ``int tspawn(void *stktop, void (*f)(void *), void *a)`` starts a new
  thread on the stack indicated by ``stktop``, running the function ``f``
  with argument ``a``.  If it is unable to do so, it will return a negative
  number; otherwise, it will return the thread identifier of the spawned
  thread (somewhat like ``fork()``).  ``tspawn()`` requires two words
  *below* ``stktop``.

* ``void texit(int *where, int what)`` ends the execution of the current
  thread after validating that ``where`` is a valid pointer and *atomically*
  committing ``what`` to ``where``.

Miscellany
----------

* ``exit()`` and ``kill()`` will terminate (or schedule for prompt
  termination) *all* threads in a process.  Thankfully, a mechanism already
  exists in xv6 for signaling this.

* ``wait()`` returns only when a child *process* has no more *threads* in
  it; that is, when *all threads* have exited (or been killed).

* ``fork()`` and ``exec()`` are permitted to fail if there is more than one
  thread running in the current process.

Memory Safety Problems
######################

One of the biggest challenges of a multi-threaded system is that the kernel
and userland can now race each other.  In stock ``xv6``, there's only one
CPU accessing a given (non-kernel) page at a time, so the only locking that
has to happen is between kernel functions.  We punted on this in the ``shm``
assignment by telling you to make the kernel consider shared memory segments
invalid userland pointers, which was sort of OK; maybe the kernel doesn't
have a whole lot of business looking at your shared memory anyway, and if
you want to ``read()`` or ``write()`` (for example) something in shared
memory, you can use a **bounce buffer** that isn't in shared memory.
Unfortunately, with our thread model, we can no longer punt: *all* memory in
a process is shared between threads.  There's just no way around it any
more.

Representative of various classes of challenge are "boring" syscalls like
``sleep()``, "straightforward" calls like ``sbrk()`` and ``close()``,
"asynchronously memory-accessing" calls like ``write()``, and
"memory-access-with-parsing" calls like ``open()``.  Let's look in detail at
some of the challenges before we talk about any solutions.

Use-Site Validation
===================

It is informative to begin by thinking about the following case, assuming
that ``sbrk(0)`` returns a *page-aligned* value.

+---------------------------+--------------------------+
| Thread (CPU) 1            | Thread (CPU) 2           |
+===========================+==========================+
| Call ``ptr = sbrk(0)``    |                          |
+---------------------------+--------------------------+
| Call ``write(0,ptr,1)``   | Call ``sbrk(-PGSIZE)``   |
+---------------------------+--------------------------+
| Load ``sz = proc->sz``    | ``sz = proc->sz``        |
+---------------------------+--------------------------+
| Check that ``ptr < sz``   |                          |
+---------------------------+--------------------------+
|                           | ``growproc(-PGSIZE)``    |
+---------------------------+--------------------------+
|                           | ``proc->sz = sz-PGSIZE`` |
+---------------------------+--------------------------+
| Do file stuff...          | Return to userland       |
+---------------------------+--------------------------+
| Read from checked ``ptr`` |                          |
+---------------------------+--------------------------+

What happens?  One answer (there are, sadly, others; keep reading) is that
CPU 1 ``panic()``-s because it takes a page fault while in the kernel!
Well, that stinks!  What happened?
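For concreteness, the check CPU 1 made above is the one in xv6's
pointer-argument helper in ``syscall.c``, which in the upstream tree looks
roughly like this (paraphrased; your revision may differ slightly):

.. code-block:: c

   // Fetch the nth word-sized system call argument as a pointer to a
   // block of memory of size bytes.  Check that the pointer lies within
   // the process address space.
   int
   argptr(int n, char **pp, int size)
   {
     int i;

     if(argint(n, &i) < 0)
       return -1;
     if((uint)i >= proc->sz || (uint)i+size > proc->sz)
       return -1;
     *pp = (char*)i;
     return 0;
   }

The ``proc->sz`` comparison is exactly the "Check that ``ptr < sz``" row of
the table; nothing stops another thread from shrinking ``proc->sz`` between
that comparison and the eventual dereference.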
We have introduced, by adding threads to the system, a `Time-Of-Check to
Time-of-Use ("TOCTOU") `_ window wherein our check (that ``ptr`` was below
``proc->sz``, which it *was*!) is no longer relevant by the time we *use*
the data we checked.  Any fix will require careful auditing of the entire
xv6 syscall vocabulary to determine when ``proc->sz`` is checked vs. when
these observations are actually depended upon.  Thankfully, there are not
many syscalls!

File Descriptors
================

A similar race happens between two threads accessing the collection of open
file descriptors: a ``read()`` could race with a ``close()``, for example.
If we were not careful, ``read()`` could end up with a pointer to a closed
file structure, and madness would ensue.  One must also guard the management
of ``proc->cwd``, but thankfully, that is easy.

Buffer Content Validity
=======================

In many cases, because it "knows" that there is no possibility of concurrent
mutation, the xv6 kernel checks that a buffer is "real" (backed by memory)
and then proceeds to perform (potentially) *multiple* operations on that
buffer.  (See, for example, the behavior of ``namex()``: the ``path``
parameter might be a validated pointer to user memory, arriving via, e.g.,
``sys_unlink`` and ``nameiparent``.)  This all works fine when pages are not
accessible by other mutators, but we run into problems here, now that things
can change out from under us!  This is even worse than the earlier problems
because the other things that might "interfere" are threads running
*userland* code, rather than being in the kernel... so we're gonna have to
get creative.

Other CPUs' TLBs
================

There's another problem, though, I'm sad to say: ``growproc()`` knows how to
shrink a process and knows how to ``kfree()`` the frame that it just
extracted from a ``pte``, but... it's not safe to do that any more.  In a
multi-*processor* system, another CPU may well have that PTE cached in its
TLB, and so your freed memory is now exposed to userland and **OOPS**!  In
tabular form,

+------------------------------------+-----------------------------------+
| Thread (CPU) 1                     | Thread (CPU) 2                    |
+====================================+===================================+
| Call ``ptr = sbrk(0) - PGSIZE``    |                                   |
+------------------------------------+-----------------------------------+
| ``x = *ptr``; this pulls the       |                                   |
| PTE backing ``ptr`` into the TLB   |                                   |
+------------------------------------+-----------------------------------+
|                                    | Call ``sbrk(-PGSIZE)``            |
+------------------------------------+-----------------------------------+
|                                    | ``growproc(-PGSIZE)`` changes     |
|                                    | ``proc->pgdir`` to remove the PTE |
|                                    | backing ``ptr`` and ``kfree()``-s |
|                                    | that page                         |
+------------------------------------+-----------------------------------+
|                                    | ``growproc()`` calls              |
|                                    | ``switchuvm()`` to flush the TLB  |
|                                    | of **this** CPU!                  |
+------------------------------------+-----------------------------------+
| Read and write above ``ptr`` to    |                                   |
| snoop the kernel's internals!      |                                   |
+------------------------------------+-----------------------------------+

In particular, this process would have free access to ``kalloc()``'s *free
list*, which would enable it with relatively little effort to take over the
kernel.  (Details left as an exercise for the reader, but the basic idea is:
you can, with high confidence, control what the ``kalloc()``-after-next will
return, since the freshly freed page is quite likely to be at the head of
the free list.
If you can cause ``sbrk()`` to map a page of kernel code for you...)  Isn't
that **scary**?!

Memory Safety Solutions
#######################

What's to be done?  Let's start with the most terrifying, TLB shootdown
issues.  Broadly speaking, two things were needed to get us into that
problem:

1. the same address space (``pgdir``) mapped on multiple CPUs, and

2. a lack of synchronization between CPUs when the address space was
   updated, which would have ensured that there were no other references to
   the removed page given to ``kfree()``.

We're going to require different things of different students:

**Undergraduate students** are going to take a *large hammer* approach and
eliminate the problem by preventing multiple CPUs from sharing a context at
once.

**Graduate students** will be required to live with multiple CPUs running in
the same address space, and so will attack the problem by addressing the
lack of synchronization.  That is, only graduate kernels will actually offer
the benefit of multiple cores to multi-threaded processes.

(Though enthusiastic undergrads are invited, *having solved the problem
using our stipulated approach first*, to submit a delta showing the second
approach.  We will grade the first and will look at the second for a little
extra credit.)

Undergraduates: Two-Tier Scheduling
===================================

While real kernels schedule *threads* and let the *process* components come
along for the ride, we're going to stick to a *historically real* approach
of scheduling *processes* and *threads within processes*.  That is, just
like the current xv6 scheduler, your scheduler will pick a *process* to
bring on core and prevent any other CPU from scheduling that process while
it's claimed by this one.  Having made that decision, the scheduler will
*then* pick, in a round-robin fashion, a thread from the runnable threads
within this process.  (There should be no *running* threads at this point,
right?)  It is up to you to decide how often to switch between processes
versus threads within a process; options range from "run every thread before
moving to the next process" to "run only one thread before considering a
different process".  Note that your scheduler should not favor some subset
of threads in a process: even if it switches processes before exhausting all
threads within a process, every thread should still get a chance to run,
even if none of the other threads ever voluntarily sleeps.

Interestingly, this approach also extends naturally to handling our buffer
validity problems: it suffices to *prevent intra-process inter-thread
scheduling* while manipulating a buffer.  This gives the buffer manipulation
code the illusion that nothing has changed, because there will be no
concurrent mutators in this address space!  Isn't that cute?

It is *OK* if you make certain other operations done by the kernel prevent
intra-process inter-thread scheduling, for some period of time, even if they
do not (and should not) prevent inter-process scheduling.  Please try to
keep the number of affected subsystems to a minimum.  Part of keeping this
to a minimum will involve allowing inter-thread (esp. intra-process)
scheduling to occur even with, e.g., an ``sbrk()`` call in progress.  (That
is, we expect the kernel to be maximally "preemptive" -- it should allow a
thread that is running *purely userland code* to be scheduled and run
normally even while other threads (in the same process or not) are
contending over various resources and going out to disk and... so on.)
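To give a feel for the two tiers, here is one possible shape for the pick,
written against a *hypothetical* post-split ``ptable`` (``struct thread``,
``p->threads``, ``T_RUNNABLE``, ``CLAIMED``, and ``curthread`` are all
assumed names for illustration, not code you have been given):

.. code-block:: c

   void
   scheduler(void)
   {
     struct proc *p;
     struct thread *t;

     for(;;){
       sti();                         // enable interrupts on this CPU
       acquire(&ptable.lock);
       for(p = ptable.proc; p < &ptable.proc[NPROC]; p++){
         if(p->state != RUNNABLE)     // outer tier: pick a process...
           continue;
         p->state = CLAIMED;          // ...and claim it so no other CPU will

         // Inner tier: round-robin over this process's runnable threads.
         // (A real version must cope with the process exiting, threads
         // changing state underneath us, and so on.)
         for(t = p->threads; t != 0; t = t->pnext){
           if(t->state != T_RUNNABLE)
             continue;
           curthread = t;             // per-CPU "current thread"
           switchuvm(p);              // post-split, this also needs t's kstack
           t->state = T_RUNNING;
           swtch(&cpu->scheduler, t->context);
           switchkvm();               // thread came back; on kernel pgdir again
           curthread = 0;
         }

         p->state = RUNNABLE;         // give the process back
       }
       release(&ptable.lock);
     }
   }

This particular sketch runs every runnable thread once before moving on,
i.e. the first of the two extremes described above; sliding toward the other
extreme just means breaking out of the inner loop earlier.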
Two-tier scheduling does *not*, however, completely solve the TOCTOU and
open files issues; see below.

Graduates: CPU Synchronization Tricks
=====================================

Alright, we're going to actually *solve* this problem, so that
multi-threaded workloads actually get a benefit from having multiple cores
available!

One option -- one frequently employed in the real world, even -- is to use
the LAPIC and things called Inter-Processor Interrupts to cause other
processors to promptly evict entries from their TLB.  *You are welcome to
take this approach and ignore what follows, but please* **carefully
document** *anything and everything you study to write your code*.

However, this being ``xv6``, we're never really in all that much of a hurry,
so it's OK if reducing ``proc->sz`` takes a while, especially if it keeps
code simpler.  Besides, how often does a multi-threaded application *reduce*
its memory footprint?

**One way** to do this is to borrow a technique from some really cool
technology called `Read-Copy-Update Mutual Exclusion ("RCU") `_.  The chain
of reasoning goes like this:

1. Imagine we knew when all CPUs in the system had flushed their TLBs.
   Concretely, imagine a magic function ``wait_everyone_tlb_flush(void)``
   that somehow knows just how long to wait until this happens.

2. If we had removed the PTE before that, but not freed the page, then
   *after* ``wait_everyone_tlb_flush()`` returns, we would know that no CPU
   could have that entry cached in its TLB, by construction.

3. When do processors running ``xv6`` flush their TLBs?  Rarely, but: *at
   least every time* they return to the ``scheduler()`` function, which
   calls ``switchkvm()`` when it takes a process off-core.

4. How could we know when *every* CPU has gone through ``scheduler()``?
   What if the CPUs formed a "bucket-brigade ring", passing threads to the
   next one in line?

5. Ah, so a thread could put itself *into* the bucket-brigade ring at the
   "next" CPU, saying "my stop is my current CPU".  (**Think carefully**
   about any atomicity requirements on "current" and "next"!)  Then, around
   the ring it goes until the current CPU says "Oh, look, a thread for me!"
   and makes it run again.

In tabular form,

+----------------------+----------------------+----------------------+
| CPU 1                | CPU 2                | CPU 3                |
+======================+======================+======================+
| Decreasing ``sz``,   |                      |                      |
| want to              |                      |                      |
| ``kfree(page)``      |                      |                      |
+----------------------+----------------------+----------------------+
| Thread x puts itself |                      |                      |
| into the ring at CPU |                      |                      |
| 2 and goes off-core. |                      |                      |
+----------------------+----------------------+----------------------+
| ``scheduler()``      |                      |                      |
| runs someone else    |                      |                      |
+----------------------+----------------------+----------------------+
|                      | Timer interrupt!     |                      |
|                      | ``trap()`` calls     |                      |
|                      | ``yield()``, which   |                      |
|                      | gets us to           |                      |
|                      | ``scheduler()``      |                      |
+----------------------+----------------------+----------------------+
|                      | Pass x to CPU 3      |                      |
+----------------------+----------------------+----------------------+
|                      |                      | ``scheduler()``      |
|                      |                      | passes x to CPU 1    |
+----------------------+----------------------+----------------------+
| ``scheduler()`` sees |                      |                      |
| the thread come home |                      |                      |
| and makes it         |                      |                      |
| runnable again       |                      |                      |
+----------------------+----------------------+----------------------+

After this point, the thread will come on core again and will be secure in
the knowledge that all other CPUs have flushed their TLBs.
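Here is one possible shape for the scheduler-side half of that bucket
brigade (everything here -- ``struct tlbring``, ``homecpu``, ``ringnext``,
``ringpush()``, ``makerunnable()``, and the deliberately simplistic locking
-- is an assumed name or simplification for illustration only):

.. code-block:: c

   struct tlbring {
     struct spinlock lock;
     struct thread *waiters;    // singly linked through t->ringnext
   };

   struct tlbring ring[NCPU];

   // Called from scheduler() on each pass, at a point where this CPU has
   // just reloaded %cr3 (e.g. right after switchkvm()) and therefore holds
   // no stale user-space TLB entries.
   void
   tlbring_pass(int mycpu)
   {
     struct thread *t, *next, *list;

     acquire(&ring[mycpu].lock);
     list = ring[mycpu].waiters;
     ring[mycpu].waiters = 0;
     release(&ring[mycpu].lock);

     for(t = list; t != 0; t = next){
       next = t->ringnext;
       if(t->homecpu == mycpu)
         makerunnable(t);                  // full lap: every CPU has flushed
       else
         ringpush((mycpu + 1) % ncpu, t);  // not ours: pass it along
     }
   }

The other half, named ``waitscheds()`` in the next paragraph, would record
``homecpu``, push the current thread onto the *next* CPU's ring, and go to
sleep until ``makerunnable()`` brings it back.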
So: if we had a function like ``waitscheds(...)`` and a cooperatively
modified ``scheduler()``, then we'd just have to

1. remove the PTE from the process's page directory,

2. wait for ``waitscheds(...)`` to return, and

3. call ``kfree()``.

Isn't that awesome?  **You probably want to write and test this RCU ring
stand-alone**!

P.S. There may be some funny locking involved!  In particular, the thread
must go to sleep when it puts itself on the bucket brigade, and so...

P.S. Do we have to play this RCU game when we're cleaning up a page
directory?  Why or why not?

P.S. The story told here is a per-freed-page story.  Is there something
better you can think of?

P.S. In a real system, these kinds of bucket brigades often have a
*callback* function (typically the common ``void f(void *)`` and ``void *a``
pattern) that's run on *each CPU* during (the equivalent of)
``scheduler()``.  The description above elided these callbacks as we do not
*need* them (if we had them, we'd just be using ``return;``).  You may,
however, find that callbacks are useful for testing the bucket brigade or
that they could be interesting for other reasons.

Graduate Students: Buffer Safety
================================

Despite the tricks above, we are still faced with the problem that userspace
memory could be mutated by multiple threads running on multiple CPUs.  What
should we do?  There are **at least** two solutions I can think of:

* One answer is to use **bounce buffers**: that is, to make a *private
  copy* of a userland string and then use *that*, rather than the userland
  copy.  The *copy* operation itself still potentially races with other
  threads, but thereafter all the kernel's attention is focused on the
  in-kernel copy.  This will involve auditing and modification of
  ``sysfile.c`` but not too much else!

* A *combination* of the RCU-esque trick outlined above and the undergrad
  one-thread-per-process approach could also be viable!  If we could
  *temporarily* declare our intention to run only one thread per process,
  *and ensure that the other CPUs knew*... we'd be in good shape!  This will
  likely involve the introduction and use of two new kernel primitive
  operations, maybe named something like ``gost()`` and ``gomt()``, which
  transition the current process between single-threaded and multi-threaded
  modes.  These will probably need to use ``xchg()`` internally to ensure
  that CPU store buffers are flushed, as with ``release()``.  ``gost()``
  will, obviously, take some time.  This approach is probably less ideal
  than the above, but perhaps involves less machinery.

Note that buffers that are treated *opaquely* by the kernel, such as file
data for ``write()`` and the landing area for ``read()``, do not need this
kind of careful handling.  That is, userland can only hurt itself if it
races threads writing to ``buf`` against a ``write(fd, buf, n)``, right?

Everyone: TOCTOU and ofiles
===========================

OK, finally, the simplest problems.

There are a few approaches available, but the most readily available options
for the ``proc->sz`` problem, perhaps, are:

* Narrow the TOCTOU window so that the check and use may be made atomic.
  While possibly the most *efficient* option, this requires modification in
  some or all of ``console.c``, ``file.c``, ``fs.c``, and ``pipe.c``, among
  others.  Notably, it requires the introduction of so-called "safe-copy"
  primitives within the kernel.

* Serialize all uses of and changes to ``proc->sz``.  This is less
  efficient, since it implies a great deal more contention between threads,
  but it is far more expedient.

Unfortunately, the sole encapsulated primitive offered by ``xv6`` for
serialization, ``spinlock``-s, is not sufficient, as a thread cannot sleep
with a spinlock held.  You will have to use ``spinlock``-s to build
something better.

(Graduate students should note that a ``gost()``-like approach here is *not
acceptable*, since often the TOC and TOU are separated by blocking
operations like disk access and we would dearly love to let some other
thread onto the core!)
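For reference, the flavor of "something better" being hinted at is a *sleep
lock*: a lock whose holder is allowed to block, built from a ``spinlock``
plus ``sleep()``/``wakeup()``.  Newer revisions of xv6 ship a very similar
``sleeplock.c``; a minimal sketch (names mine, not provided code) looks like
this:

.. code-block:: c

   struct slock {
     struct spinlock lk;   // protects 'locked'
     int locked;
   };

   void
   slock_acquire(struct slock *s)
   {
     acquire(&s->lk);
     while(s->locked)
       sleep(s, &s->lk);   // atomically releases lk and sleeps; reacquired on wake
     s->locked = 1;
     release(&s->lk);
   }

   void
   slock_release(struct slock *s)
   {
     acquire(&s->lk);
     s->locked = 0;
     wakeup(s);
     release(&s->lk);
   }

The spinlock is held only for the handful of instructions around the flag,
so other threads (and other CPUs) keep running while the lock holder is off
blocking on the disk.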
For ``proc->ofile``, we again have a few options:

* Make the ``ofile`` array per-*thread* instead of per-*process*.  While not
  ideal, it's at least one option to consider!  (Maybe if we were serious
  about this option, we would introduce a way of passing files between
  threads.)  We will **forbid** this approach this time around.

* Take advantage of ``xv6``'s built-in file reference counting: have a small
  critical section around the process's ``ofile`` array which bumps the
  reference count, then drop the lock and thereafter refer to the open file
  structure obtained from ``ofile`` directly, rather than going back through
  ``ofile``.  Don't forget to drop the reference when done!  **Graduate**
  submissions should aim for this and should expect to lose a few points if
  the solution falls short of this level of minimal serialization.

* Serialize all calls within a process through the ``sysfile`` interfaces.
  As might be imagined, this is far simpler, if slower.  Again,
  ``acquire()`` will be insufficient for the serialization at hand, as file
  operations often block.  **Undergraduate** submissions will receive full
  credit for a correct solution of this flavor.

* Serialize ``sysfile`` calls *and* all manipulations of ``proc->sz`` in the
  same way, as very often the two interact.  This is slower still, since
  some pairs of operations, like ``sys_close()`` and ``sbrk()``, will now
  somewhat artificially serialize.

``proc->cwd`` can be simply managed by sprinkling some spinlocks in the
appropriate places; note that the only thing that needs to be guarded in
``sys_chdir`` is the *replacement* of ``proc->cwd`` with ``ip``.  If one
grabs the old value of ``proc->cwd`` prior to that replacement as ``oldip``,
locks may be dropped before operating on ``oldip``.

Threads
#######

Errant Behavior
===============

Upstream ``xv6`` kills a *process* when it misbehaves (e.g., page faults).
Your new thread-aware kernel should kill only a *thread* when it page
faults, not the whole process.

Thread Identifiers
==================

As before, *thread identifiers* and *process identifiers* are still in the
same namespace, and the first thread in a process will have a thread
identifier equal to its process identifier.

Kernel Stacks
=============

Each thread will require its own kernel stack.  Unfortunately, ``xv6`` at
present does not free a kstack until ``wait()``; that's sort of OK for
processes, but it's not OK now that we have ``tspawn()`` and ``texit()`` and
nothing like ``twait()``.  You will have to design a mechanism for
reclaiming kstacks *promptly*.  An ideal mechanism will bound the number of
stale kstacks in the system by some *constant* (even if I vary ``NTHR``).
(0 would be really good, but, while possible, is perhaps tricky to achieve.
1 is good and has a cute solution.  ``NCPU`` would be fine and might be
arguably better than 1 on a real system, which xv6 is not exactly.
``NPROC`` would be OK but strange.  ``NTHR`` may appear to be a constant,
but it isn't.)
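To show the flavor of one such family of mechanisms (a sketch only; the
field names, the thread states, and even the placement are assumptions, and
plenty of other designs are acceptable): defer the ``kfree()`` to a context
that is *not* running on the doomed stack, for example the scheduler, which
by the time ``swtch()`` returns is back on its own per-CPU stack.

.. code-block:: c

   // In scheduler(), just after swtch() has returned control to us and t
   // is the thread we just switched away from:
   if(t->state == T_ZOMBIE && t->kstack != 0){
     kfree(t->kstack);        // safe: we are on the per-CPU scheduler stack
     t->kstack = 0;
     t->state = T_UNUSED;     // the slot may now be reused by tspawn()/fork()
   }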
Allocations
===========

This is xv6: we haven't asked you to write an allocator, and we're not going
to give you one.  Your system should support ``NPROC`` (i.e. ``64``)
processes and a *global pool* of ``NTHR = 256`` threads, which are to be
allocated to processes on a first-come-first-served basis.  You should be
able to handle the case where one process has grabbed all 256 threads
(though admittedly, it would take a modified init to do that).  You must
gracefully handle the "out of threads" case in both ``fork()`` and
``tspawn()`` (i.e. must not ``panic()`` the system), but note that graceful
handling need not include attempting to get more!  (This is just in line
with ``fork()`` already handling the out-of-processes case!)

The Assignment Itself
#####################

Part 1: Study, Design, Log (15%)
================================

The usual request for a detailed ``README`` with your initial plans, your
software's evolution, and all that good stuff.

Part 2: xxvv66: xv6 with threads (70%)
======================================

Replace the shipped kernel blob we gave you with something that works for
your thread library and *behold all that you have made*.

Grade Breakdown
---------------

In an effort to reduce the surprise and potential stress of being graded,
this section's large impact will be broken up as follows:

* Automated testing: coming up to a shell and passing ``usertests`` on both
  1- and 2-CPU configurations will be worth 30% of this section (21% of the
  total grade).  Note that ``usertests`` does not exercise any threading at
  all, and upstream xv6 reliably passes it.  You should be regression
  testing early and often!

* Additional automated testing: passing all of ``318_ts1.c``, ``318_dmr.c``,
  and ``318_lots_thr.c``, on both 1- and 2-CPU configurations, will be worth
  40% of this section (or 28% of the total grade).  **Note that none of
  these tests use your thread library**!  However, you should be using your
  thread library and your new knowledge to test parts of the kernel:
  probing the scheduler with spinlocks is a good way to exercise
  ``yield()``.  Semaphores exercise ``desch()`` and ``mkrun()``.  Thread
  tests exercise a whole lot of pieces!

* Hidden test cases will count for another 10% of this section (7% total).
  **None of these tests will use your thread library either**!  They might
  use *my* thread library, but that's not your problem.

* Manual audit of the submission will count for the remaining 20% of this
  section (14% total).

Part 3: Test Cases! (15%)
=========================

This assignment is *so very complicated* that I am going to insist that you
write your own test cases.  The ones given to you are a good place to start,
and of course you're welcome to share test cases with each other, but you
must write **your own** tests, ideally ones that *meaningfully test
different things*.  Why yes, I might be crowd-sourcing a threaded xv6 test
harness, why do you ask?

Please **document** what each test is supposed to test in your ``README``
and/or at the top of each test file (maybe a quick summary in ``README`` and
a more verbose description in the file itself?).

Note that good tests come in a few flavors:

* Regression-like *limited* probes that touch minimal subsets of
  functionality and attempt to probe very specific conditions.  Ideally,
  these tests *reliably* hit their target.  (Lots of things in
  ``usertests``, and things like ``318_ts1.c`` or ``318_dmr.c``.)

* Stress tests, which do something to exhaust, or nearly exhaust, some
  resource(s) and then try to exercise some condition(s).
  (``usertests`` does this a few times)

* Reliability tests, sometimes also called Continuous Hours of Operation
  ("CHO") tests: fling lots of work at the system and see what happens.
  Keep it up for as long as possible.  Sometimes CHO tests try to recreate
  (sped-up?) real conditions; sometimes they're stress-test-like, keeping
  the system low on resources; sometimes they repeatedly take the system
  into and out of resource scarcity.

* Random testing: given a palette of possible operations, do a random
  sequence of them!  This includes things like "fuzzing" and "monkey
  testing".

In many of these cases, the goal is to probe at the *failure cases* of the
system.  The *success* paths get lots of exercise.  The *failure* paths, on
the other hand...

Please *additionally document* your contributions to the communal test pool
-- linking to posts on Piazza is probably sufficient.  Any tests posted to
Piazza **must** have your name(s) in a comment near the top; any code
derived from tests on Piazza **must leave the original authors' names there
and add your own**.

Do not feel obligated to use the TAPish test macros -- I certainly will not
judge you for not liking my style -- though you may find them convenient,
especially in combination with the ``runme`` driver.  A shell script could
get you pretty good regression testing!  (You are also welcome to develop
your own TAPish test "framework"!)

Incidentally, some collected ideas of things to test:

* Wild pointer handling.  Should kill a thread, not the process, and not
  the kernel!

* ``kill()`` a thread ID that isn't a process ID.  Does the right thing
  happen?

* ``yield()``!  Make sure that a few processes can cheerfully bounce back
  and forth "forever" ``yield()``-ing to each other.  Can you yield to
  yourself?  ( ``;)`` )

* Graduates: that RCU bucket brigade.  You can even write tests for this
  before you have any thread support at all -- just push processes around
  the ring at first.

* ``tspawn()`` until you can't any more, making sure that allocation
  failure is graceful!

* ``tspawn()`` until you can't any more, *then* (have another process)
  ``fork()``!  (That is: test what happens when there are free ``struct
  proc`` but no free ``struct thread``!)

* TOCTOU vulnerabilities and TLB shootdown: try to crash your kernel by
  rapidly wiggling ``sbrk()`` and having threads race the kernel!

* What *happens*, anyway, if you race a ``write()`` against a ``for``-loop
  paving over the same buffer (possibly repeatedly? backwards?)?  For best
  results, try with ``CPUS=2``.

* Memory leaks!  How big can you push ``sbrk()`` after the system has just
  started up?  Bring it back down and go do a bunch of things and wait for
  them all to finish and clean up.  Can you still push ``sbrk()`` that
  high?  No?  Something's gone missing!  Find it!

* "Continuous Hours of Operation" style tests: spawn a whole lot of tests
  at once and keep them going!  Keep the kernel and all CPUs as busy as
  possible with as many different tasks as you can think of!  Terrify the
  kernel with all of your userland-y demands!

One more clever trick someone once taught me, by the way, is to *change the
tick frequency*.  You'll get a lot of different interleavings of threads if
you switch at different speeds!
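To give a feel for the sort of raw, library-free test the automated tests
are (a sketch only, not one of the provided ``318_*.c`` files; the thread
count and stack size are arbitrary):

.. code-block:: c

   #include "types.h"
   #include "stat.h"
   #include "user.h"

   #define NT    4
   #define STKSZ 4096

   static volatile int done[NT];

   static void
   child(void *a)
   {
     volatile int *flag = a;
     texit((int *)flag, 1);    // atomically publish completion and die
   }

   int
   main(void)
   {
     int i, tids[NT];

     for(i = 0; i < NT; i++){
       char *stk = malloc(STKSZ);
       if(stk == 0){ printf(1, "FAIL: malloc\n"); exit(); }
       tids[i] = tspawn(stk + STKSZ, child, (void *)&done[i]);
       if(tids[i] < 0){ printf(1, "FAIL: tspawn\n"); exit(); }
     }
     for(i = 0; i < NT; i++)
       while(done[i] == 0)
         yield(tids[i]);        // hint the scheduler at the laggard
     printf(1, "ok: all %d threads exited\n", NT);
     exit();
   }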
Suggested Plan Of Attack
########################

1. Mechanically separate ``struct proc`` into ``struct proc`` and ``struct
   thread``.  Initially, allow each ``struct proc`` to have *exactly one*
   ``struct thread`` (and each ``struct thread`` to belong to only one
   ``struct proc``, but that part isn't surprising).  (A sketch of one
   possible split appears after this list.)

   You will probably want to revise the "per-CPU" data by either adding a
   notion of "current thread" or by replacing the existing "proc" with the
   same.  Note that if you wish to change the size of the per-CPU data,
   vm.c's ``seginit()`` function will need to know.

   Make the requisite changes to ``trap.c`` for errant behavior, even if
   just in skeletal form.  (Maybe fatal traps become infinite loops, since
   we don't know how to kill just the current thread yet?)  (Though while
   here, you can probably start to guess what's in ``texit()``.)

   Chase compilation failures through the whole system until you have
   something that reliably passes ``usertests``.  Congratulations, you have
   just made the foundation of all subsequent efforts!

2. Change the data structure!  Allow (but do not give a mechanism for
   creating!) more than one thread in a process.  You will encounter ample
   food for thought in this process; make note of every question you
   encounter!

   **Graduate** students probably want to change what ``scheduler()``
   scans!  **Undergrad** students can write an initial version of their
   two-tier scheduler now.

   Chase compilation failures again and then debug until ``usertests`` is
   happy.

   Either here or at the previous step, it will probably do you some good
   to make ``procdump()`` more than superficially aware of this new
   threading thing.  Calling ``procdump()`` at choice moments can be a huge
   boon to debugging!  (And hitting ``Ctrl-P`` on the keyboard is always
   satisfying!)

3. Sometime around now might be the right time to add ``desch`` and
   ``mkrun``.  ``318_dmr.c`` should help you get going.

4. Think about how to guard ``sz`` (and ``ofile``).  You can *add* those
   guards now and the system should continue to pass ``usertests``!  (The
   guards will be guarding against something that cannot happen yet!)

5. **Graduate** students: think about that RCU bucket brigade.  Get it
   working (maybe add some system calls and ``cprintf()``-s around the
   system to observe that things are behaving as you expect).  (This is an
   excellent opportunity to demonstrate your test-writing skills as *part
   of* the test-case part of the assignment.)

6. **Graduate** students: wire up the RCU bucket brigade where it needs to
   go.  The system should continue to reliably pass ``usertests``.

7. New kstack reclaim: design and move your kernel to use a new technique.
   As per usual, ``usertests`` should continue to pass!  The
   ``kfree(p->kstack)`` from ``wait()`` should likely *be gone* now.

8. The moment of truth!  Add ``tspawn()`` support!  ``318_ts1.c`` and
   ``318_lots_thr.c`` should help you get on your feet and believe that
   things are working!  (It's OK not to have ``texit()`` yet!)

9. Finish the assignment.  (That means ``texit()``, ``yield()``,
   ``gettid()``, and everything I might have left out.)

10. Test, test, and test some more.  (Rather like Portal!)

    While here, marvel at what you are doing: you have added *concurrent
    threads* (*parallel*, too, in the case of grad students) with
    *shared-by-default* memory to UNIX.  This, historically, is a big
    f****** deal, as Vice President Joe Biden might say.
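As promised in step 1, here is a sketch of one way the split could look
(every field name here is an assumption; your layout can and probably will
differ):

.. code-block:: c

   // Per-thread state: everything tied to a single flow of control.
   struct thread {
     enum tstate state;            // T_UNUSED, T_EMBRYO, T_RUNNABLE, ...
     int tid;
     char *kstack;                 // bottom of this thread's kernel stack
     struct trapframe *tf;         // trap frame for current syscall/interrupt
     struct context *context;      // swtch() here to run this thread
     void *chan;                   // if non-zero, sleeping on chan
     int killed;
     struct proc *proc;            // owning process
   };

   // Per-process state: everything shared by all of a process's threads.
   struct proc {
     pde_t *pgdir;                 // page table (shared by every thread)
     uint sz;                      // size of process memory (bytes)
     int pid;
     struct proc *parent;
     struct file *ofile[NOFILE];   // open files
     struct inode *cwd;            // current directory
     struct thread *threads;       // this process's threads, drawn from the
                                   // global pool of NTHR
     char name[16];
   };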
Good luck!
##########

Recall that one last bit of advice: if something really isn't working, get
up and stretch.  Go for a walk around the building.  Relax.  Do other work.
Come back later and you'll see things you didn't see before.

Changelog
#########

* Unlike the CMU original version, this version uses ``tspawn()`` instead of
  a much more raw ``thread_fork()``, which spawns a new thread with an
  *identical register file*, requiring the use of assembler in the analog of
  ``uthr_create()``.

* Updated to ``void texit(int *where, int what)`` in line with
  :doc:`osassign-uthreads`.

* Added commentary about ``proc->cwd``.