########################################################### Assignments for an xv6 OS Class: Miniature pthreads Library ########################################################### .. sidebar:: Contents .. contents:: :depth: 2 :local: This is a modified version of an assignment that ran first in Fall 2015 at JHU. See http://gaming.jhu.edu/~phf/2015/fall/cs318/assignment-uthreads.shtml and the `Changelog`_ below. If you are interested in using this assignment in your course, please contact me for reference implementations and test cases! The second half of this assignment is :doc:`osassign-kthreads`. Overview ######## This assignment is *entirely in userspace*. We *provide* for you a compiled, modified xv6 kernel (without sources) for you to run against. Despite that, this is a potentially big assignment! This is almost entirely taken from `Carnegie Mellon's Operating Systems class `_. *This assignment has been modified from its original version. It has been* `reformatted to fit this semester `_. **Words of Wisdom**: First, please **start early**! Second, if you're working as a pair, please **work together** and don't split up the tasks, you'll get yourself into trouble that way! Third, please make **minimal changes** to xv6; if you make it hard for us to grade, we'll dock points. .. You'll start with the xv6 from **nwf**'s git repository at commit .. ``6823e64109832d247d19acd0a120c47e2760f4d2`` for this assignment. (That's a .. different commit than the one we used before, and it includes a lot of .. handout code and test code; you'll be sad if you start from the wrong one!) .. Pull from https://github.com/nwf/xv6-public . What Are Threads? ################# Threads generalize the notion of an xv6 process. An xv6 process is, at once, several different things: * A description of an address space (which virtual addresses map to which physical addresses?) * A description of file descriptors (which fds are open? What files are they pointing to?) * A description of a CPU, sometimes called a "register file": what are the contents of each (user-level) CPU register right now? A *thread* is the name we give to *just the last* of these things. Obviously we need the other things when we're running instructions on xv6, so threads are said to be *in a process*: that is, a process may have one or more threads. Every xv6 process you have ever created to date has had exactly one. Let's change that! Changes To The xv6 Kernel ######################### *These changes have been made for you, though you do not have the source of the modified kernel*. It will, however, be beneficial for you to think about how the kernel might offer this functionality! Thread Identifiers ------------------ Since threads are things managed by the kernel, we need some way of identifying a particular thread, just like we identify a particular open file or a particular process. For this assignment, we put *process* and *thread* identifiers in the same name-space: the *process* identifier as returned by ``fork()`` will *also* be the *thread* identifier of the first thread in that process. In particular, thread identifiers, like process identifiers, are *globally unique*. That is, every thread on the system will have a unique thread identifier, just as every process on the system has a unique process identifier. The first thread in a process will have *equal* thread and process identifiers; subsequent threads spawned in a process will not. Scheduling ---------- * ``void yield(int tid)`` will invoke the scheduler again with a *hint* that it would be good for ``tid`` to be on core *soon* (ideally, *now*), giving up the remainder of this thread's timeslice. Note that if there are no other running threads, the kernel may return to this one without context switching. If ``tid`` is negative, the kernel will simply run the next ``RUNNABLE`` thread. * ``void desch(int *guard)`` will put the current thread to sleep (that is, suspend its execution) if ``*guard`` is nonzero. If the process ``exit()``-s or is subject to ``kill()`` while this thread remains so suspended, this thread will be destroyed, too. * ``int mkrun(int tid)`` should awaken a thread that is sleeping in ``desch()``. If there is no such thread, it will return a negative number; otherwise, it will return ``0``. A thread's invocation of ``desch()`` is *atomic* with respect to any ``mkrun()`` targeting that thread. That is, the decision to go to deschedule happens *entirely before* or *entirely after* any decision to wake up; dually, the decision to wake up happens *entirely before* or *entirely after* any decision to deschedule. Creating A New Thread --------------------- We defined a new system call, ``int tspawn(void *stktop, void (*f)(void *), void *a)`` which, creatively enough, spawns a new thread running the function ``f`` with argument ``a`` with stack built (`downwards `_, as this is x86, not, `for example `_, some ARM systems) at ``stktop``. ``stktop`` must point to (or immediately above, at least) previously allocated memory; the kernel will need two words to build the initial call frame for ``f``, but there should, in general, be more space available for the use of the function ``f``. ``tspawn`` returns the *thread identifier* of the spawned thread, or a negative number if it fails. .. note:: As with process identifiers on xv6, thread identifiers are monotonically increasing and we do not concern ourselves with wrap-around. This is something of an embarrassing situation which can be blushingly acknowledged with "educational OS". The kernel we ship you supports ``NTHR = 256`` threads, distributed in any way between the up-to-``NPROC`` (i.e. 64) processes. Your submission should, *ideally*, not depend upon either of these numbers (and especially not as *numbers*). Thread Exits ------------ The new system call, ``void texit(int *where, int what)``, is sort of like ``exit()`` except that it simply stops the execution of the *current thread* after writing ``what`` to ``where`` (atomically, using ``xchg``). If (and only if) there are no other threads in the process, then ``texit()`` will behave like ``exit()`` does in xv6: it will result in the process being torn down and its parent *process* being notified. If there are other threads, then only the invoking thread will be terminated. Put another way, ``wait()`` will only return a process identifier when *all* threads in that process have exited. ``wait()`` will never return a thread identifier that isn't also a process identifier! (This should make sense if you think about about ``fork()`` and ``wait()`` are supposed to work together.) Note that ``texit()`` is permitted to *return* if ``where`` is not a valid pointer. This is the *only* case in which ``texit()`` may return. If, on the other hand, *any* thread in a process calls ``exit()``, *all* threads in that process will be terminated (or scheduled for termination prior to their next return to userland). That is, ``texit()`` is the polite "I'm done, may I be excused." while ``exit()`` is the table-flipping "We're done here!" ((╯°□°)╯︵ ┻━┻). Similarly, when a *process identifier* is ``kill()``-ed, *all* threads in that process will be terminated (or scheduled for termination). It is an error to call ``kill()`` with a *thread identifier* that is not also a process identifier. Miscellany ---------- * The new system call ``int gettid(void)`` will return the invoking thread's identifier. The existing ``int getpid(void)`` will return the process identifier (i.e. the thread identifier of the first thread that was in this process, *even if that thread has subsequently exited*) whenever any thread therein asks. * File descriptors are shared by all threads within a process, so one could imagine building a multi-threaded application that worked with files. * ``fork()`` will fail if there are multiple threads in the invoking process, as it is not clear what should happen otherwise. (For the very curious, POSIX introduced an ``atfork()`` mechanism to help with these issues.) * ``exec()`` will fail if there are multiple threads in the invoking process. While it is clear what to do here, I do not imagine we will need the functionality and so would rather not implement it. ;) Part 1: Study, Design, Log (20%) ################################ There's not a whole lot of material to study, but do *carefully* review the diffs we have made to the user-visible xv6 header files. You can do this however you want, but ``git log -p`` may be of utility. I would suggest that you read this *entire assignment* at least twice before writing *any part of it*. Please keep a detailed log in ``README``, including your initial plans and what actually happened. :) Part 2: usem: Better Userspace Synchronization ############################################## Port Spinlocks -------------- Based on the in-kernel spinlocks, please write: * ``struct uspin { ... }``. Up to you what goes in the ``...``. This lives in ``uthrdefs.h``. * ``int uspin_init(struct uspin *sl);`` should get a spinlock ready (in its unlocked configuration, of course). Should getting one ready prove impossible, this function should return a negative number; otherwise, it should return ``0``. * ``void uspin_destroy(struct uspin *sl);`` should free any resources required by ``uspin_init``. You *do not need to handle* the case when there are threads making use of this spinlock and someone calls ``uspin_destroy``. By that, I mean, the onus is on the user of the API to make sure that doesn't happen. * ``void uspin_acquire(struct uspin *sl);`` will acquire the lock. That is, upon return, it will have arranged to prevent the return of any other thread's ``uspin_acquire(sl)`` calls until *any* thread calls... * ``void uspin_release(struct uspin *sl);`` drops the lock. After this call, another ``uspin_acquire(sl)`` can complete. You *do not need to handle* the case when a thread calls ``uspin_release()`` on an already-``released`` spinlock. By that, I mean, the onus is on the user of the API to make sure that doesn't happen. Calling ``uspin_acquire(sl)`` on an already acquired spinlock ``sl`` should never return unless some other thread calls ``uspin_release(sl)``. *Note that it may be a different thread that releases*. ``uspin_acquire(sl)`` must not use ``while(condition){;}`` to burn CPU time, but should instead use ``yield();``. You'll be happier, trust me, and I'll be happier too. All of these functions are given in skeletal form in ``uspin.c``. Semaphores ---------- In addition to spinlocks, we'll define a thing called a "counting semaphore". A counting semaphore is, essentially, a counter that refuses to go negative. If an attempt to decrement it would make it negative, that attempt will *wait* for the number to be big enough so that the decrement yields a non-negative value. Another way of thinking about this: there are *N* instances of some resource available; the semaphore's job is to make sure that there are only ever at most *N* instances claimed for use. As before, you will need to fill out a structure definition: * ``struct usem { ... }``. Your job to fill in the ``...``; whatever needs to go in there should go in there. This lives in ``uthrdefs.h``. The steady-state calls are rather like ``uspin``: * ``void usem_acquire(struct usem *sem);`` decrements the counter, if possible, or *blocks* until such time as the decrement is possible. Blocking is to be done by ``desch()`` rather than ``yield()``, ``sleep()``, or any other mechanism, for full credit. * ``void usem_release(struct usem *sem);`` increments the counter and notifies at least one sleeping thread of this condition. The constructor and destructor functions are: * ``int usem_init(struct usem *sem, unsigned int initial_count);`` should initialize a semaphore into a state so that ``initial_count``-many ``usem_acquire(sem)`` calls would succeed without waiting and the next would block. ``usem_init()`` should reject any initial count that is greater than ``INT_MAX`` (i.e. ``2147483647``, as defined in ``limits.h``). If getting a semaphore ready fails, this call should return a negative number; otherwise, it should return ``0``. * ``void usem_destroy(struct usem *sem);`` should destroy the semaphore (that is, release whatever resources were required for its construction). If there are threads sleeping on the semaphore, their fate is up to you, but be prepared to defend your decision. (Extreme policies are fair game!) The implementation need not be prepared to deal with the internal counter going above ``INT_MAX``; that is, it is, again, on the user to avoid that. *Note that we anticipate semaphores using spinlocks internally!* **STOP AND CONSIDER**: The use of ``desch()`` poses an interesting challenge: in ``usem_acquire()``, one may need to *atomically unlock and deschedule*. To be concrete: one may be holding a lock to guard the counter, observe that the counter is in a state where ``usem_acquire()`` must ``desch()``, and need to drop the lock before ``desch()``-ing (since holding the lock while asleep is unlikely to work out well!). But! After dropping the lock but before ``usem_acquire()`` calls ``desch()``, another thread could ``usem_release()``, invalidating the observation made of the counter! In other words, "Why *does* ``desch()`` take an argument, anyway?" In tabular form, the case to consider is something like +--------------------------+--------------------------+ | Thread 1 | Thread 2 | +==========================+==========================+ | Enter ``usem_acquire()`` | | +--------------------------+--------------------------+ | Lock internals | | +--------------------------+--------------------------+ | Observe counter too low | | +--------------------------+--------------------------+ | Drop lock | | +--------------------------+--------------------------+ | | Enter ``usem_release()`` | +--------------------------+--------------------------+ | | Lock, increment, unlock | +--------------------------+--------------------------+ | | Leave ``usem_release()`` | +--------------------------+--------------------------+ | ``desch`` anyway? ``:(`` | | +--------------------------+--------------------------+ A full-credit implementation of ``usem`` will, in addition to being *correct* as per the above API, be *fair* in a **FIFO ordering** sense: * If two threads, ``T1`` and ``T2``, call ``usem_acquire()`` on a ``usem`` whose counter is currently ``0``, with ``T1`` acquiring before ``T2``, then the next ``usem_release()`` will wake ``T1`` and not ``T2``. That is, the intended behavior, assuming the semaphore is initialized to ``0``, is: +------------------------------+------------------------------+--------------------------+ | Thread 1 | Thread 2 | Thread 3 | +==============================+==============================+==========================+ | ``usem_acquire()`` ``desch`` | | | +------------------------------+------------------------------+--------------------------+ | | ``usem_acquire()`` ``desch`` | | +------------------------------+------------------------------+--------------------------+ | | | ``usem_release()`` | +------------------------------+------------------------------+--------------------------+ | ``usem_acquire()`` finishes | | | +------------------------------+------------------------------+--------------------------+ * Moreover, in the above scenario, another thread calling ``usem_acquire()`` on this spinlock, even if after the ``usem_release()`` call indicated, will only return *after* ``T2`` returns. That is, this outcome, again assuming that the semaphore is initialized to ``0``, is *to be avoided*: +------------------------------+------------------------------+--------------------------+ | Thread 1 | Thread 2 | Thread 3 | +==============================+==============================+==========================+ | ``usem_acquire()`` ``desch`` | | | +------------------------------+------------------------------+--------------------------+ | | | ``usem_release()`` | +------------------------------+------------------------------+--------------------------+ | | Call ``usem_acquire()`` | | +------------------------------+------------------------------+--------------------------+ | | ``usem_acquire()`` returns | | +------------------------------+------------------------------+--------------------------+ All of these functions are given in skeletal form in ``usem.c``. Part 3: uthr: Thread Library Functions ###################################### ``uthr`` provides functionality akin to (a small subset of) the `POSIX threads `_ library functions ``pthread_create``, ``pthread_exit``, and ``pthread_join`` on a real UNIX system. It defines the following functions: * ``int uthr_init(unsigned int stack_size);`` initializes the thread library and sets the stack size of all created threads to ``stack_size`` bytes. *Unfortunately*, despite the symmetry of threads in general, the library as we are designing it is unable to ensure that the main thread (i.e. the caller of ``thr_init``) will have a stack of ``stack_size`` bytes, as its stack comes from the xv6 kernel. * ``void *uthr_malloc(int size);`` is a *thread-safe* wrapper around the existing ``umalloc`` ``malloc()`` function. * ``void uthr_free(void *what);`` is a *thread-safe* wrapper around ``free()``, by analogy. Because of the way ``umalloc`` is written, at most one thread per process can safely be manipulating its data structures; it is the job of these wrapper functions to ensure that this holds! * ``int uthr_create(void (*f)(void *), void *arg);`` creates a new thread, with its own stack, running the code ``f(arg)``. Returns to the parent the *thread identifier* of this thread, or a negative number on error (in which case, no thread was created). * **Undergraduates**: it is OK if you wish to require ``f`` to call ``uthr_exit(...)``, just like xv6 requires our program to call ``exit``. * **Graduates**: your submission must interpret a return from ``f`` as ``uthr_exit(0)`` (as well as handling any ``uthr_exit(...)`` calls that it may make!). * ``void uthr_exit(void *baton);`` ends the execution of the current thread, arranging for the ``baton`` to be passed to whichever thread calls... * ``int uthr_join(int tid, void **hand);``, which waits for the thread identified by ``tid`` to call ``uthr_exit(baton)`` and then arranges for ``baton`` to be placed in ``*hand``. You may, at your discretion, drop the ``baton`` if ``hand`` is ``0``. If the thread identified by ``tid`` has already been ``uthr_join()``-ed (regardless of whether or not it has exited and had its ``baton`` collected), ``uthr_join()`` returns a negative number and does not mutate ``*hand``. The effect of ``uthr_join(gettid(), ...)`` is *either* to hang forever or to return an error (negative result). The latter is more friendly to the API user, to be sure. .. **318** students *may* use ``uspin_acquire()`` or similar mechanisms (i.e. .. ``yield()``-based busy-waiting) to synchronize and wait for the targeted .. thread to ``thr_exit()``. **418** students **MUST** use a mechanism, .. *such as* ``desch()`` or ``usem_acquire()``, that causes the waiting .. thread to not be ``RUNNABLE`` while it is waiting. ``uthr_create()`` and ``uthr_exit()`` will need to use the ``tspawn`` and ``texit`` system calls introduced above. Note that there is no need for kernel-side support for ``uthr_join()``! Isn't that interesting... ``uthr_exit()`` and ``uthr_join()`` must not suffer from "the terminal irony" whereby a failure to allocate memory could imply that a thread could not exit or make progress through ``uthr_join()`` (and thereby free up memory). Because it may be a long while between ``uthr_join()`` and ``uthr_exit()``, ``uthr_join()`` *must not poll* (*spin until*) a thread exits. The kernel offers all the functionality required to avoid such inefficiencies. Moreover, if the target of ``uthr_join()`` has *already* ``uthr_exit()``-ed, ``uthr_join()`` must return *promptly*. The main thread is permitted to call ``uthr_exit()`` and must be ``uthr_join()``-able. That is, once the main thread has called ``uthr_init()``, it is *just another thread* as far as the library is concerned. Note that this deviates from some platforms' interpretation of threading, whereby the main thread shutting down is grounds to terminate the entire process. All of these functions are given in skeletal form in ``uthr.c``. *It is expected that the userland thread library will depend on the userland malloc() functionality; there is no need to write your own allocator for any of this*! (Though of course even here invasive data structures are possibly worth considering!) **BIG HINT**: You may wish to carefully consider the interaction of ``uthr``'s internal use of ``malloc`` and ``free``, if any, and the exported ``uthr_malloc`` and ``uthr_free`` wrappers! Thread Exits And Stacks ----------------------- Note that there is a tricky difficulty with ``free()``-ing the stack associated with a thread that is ``uthr_exit()``-ing. There is a reason that ``texit()`` takes the parameters it does. As a **big** hint, you may find yourself wishing for a ``uspin_release_and_texit()`` or even ``usem_release_and_texit()`` operation. Feel free to write it/them! Suggested Plan Of Attack ######################## As I said at the beginning, this assignment has a lot of pieces. Here's how I might (and mostly did) approach this problem: 1. User-level locking primitives (``uspin`` or ``usem`` API) can be designed and written before anything else, given the kernel we give out. You will probably want to look at ``318_usem1.c`` and ``2`` and ``3`` too; look at ``318_dmr.c`` for an example of using ``desch`` and ``mkrun``. 2. Review ``tspawn`` and ``texit`` calls. Look at ``318_ts1.c`` for an example of their usage. 3. Write a minimal ``uthr_create`` around ``tspawn``. ``318_tc1.c`` does not depend on any other ``uthr`` function, and can help make sure the basics are in a good place. Modifying ``318_usem3.c`` to use ``uthr_create`` will also give you additional test cases (though the continued use of ``texit`` rather than ``uthr_exit`` will make them somewhat rude, but that's OK!). 4. Now implement ``uthr_exit()`` and ``uthr_join()``. ``318_tcj1.c`` and ``318_tcj2.c`` depend on the entirety of the user thread library and might be termed "basic" and "advanced" handling tests. 5. ``318_main_te.c`` and ``318_join_main.c`` test the special case of the main (i.e. initial) thread exiting and being joined upon. 6. Write your own tests! Debug! Go back and fix things! 7. When the deadline hits, or when you decide you're happy with how things are, make your submission and go enjoy the outside. In case you're interested in line counts of my solution (just to estimate the workload, perhaps); all sloc counts are generated using David A. Wheeler's ``SLOCCount``: * a little library of helper functions I wrote is 33 sloc long (61 literal newlines) (I'll leave it to your imagination what might be in there.); * ``uthrdefs.h`` is 9 (13); * ``uspin.c`` is 19 (26); * ``usem.c`` is 49 (64); * and ``uthr.c`` is 188 (240). By comparison, all the tests given add up to 401 sloc (546)! README, Scoring, ... #################### As usual, please include a ``README`` file with discussion of your submission and its design, including rationale for design decisions and things you struggled with and all that other good stuff. I would appreciate, though you do not need to include, opinions of the assignment; I promise not to dock points if you tell me the assignment stinks. ;) The ``usem`` locking primitives will be worth about 20% of the grade, about evenly divided between functionality when subjected to tests and manual review. The ``uthr`` library will be roughly 60%, again about evenly divided. The remaining 20% is, of course, for the README. Good luck! ########## One last bit of advice: if something really isn't working, get up and stretch. Go for a walk around the building. Relax. Do other work. Come back later and you'll see things you didn't see before. Changelog ######### * Unlike the CMU original version, this version uses ``tspawn()`` instead of a much more raw ``thread_fork()`` which spawns a new thread with an *identical register file*, requiring the use of assembler in the analog of ``uthr_create()``. * Updated ``texit(void)`` to ``texit(int *where, int what)`` to indicate positive possession of a thread by the kernel. This alleviates the possible-but-tricky need for userland to write things like ``uspin_release_and_texit()`` in assembler. With the new API, ``uspin_release_and_texit()`` can be entirely in C. Given that this is a 1-week assignment, that seems entirely reasonable. * TODO: When and if mainline xv6 takes patches to fix the process ID wraparound issue, this assignment can be revised to address the thread ID wraparound. The proper fix, I believe, is to have threads linger until joined, only ``texit()``-ing once that happens. Relatedly, there is now some new prose about tids being global. For The Curious --------------- There are, of course, other threading xv6 assignments than this one. For examples, * http://pages.cs.wisc.edu/~remzi/Classes/537/Spring2011/Projects/p6.html * https://www.cs.uic.edu/bin/view/CS385/Homework6 * http://moss.cs.iit.edu/cs450/assign02-xv6-threading.html