###########################################################
Assignments for an xv6 OS Class: Miniature pthreads Library
###########################################################

.. sidebar:: Contents

   .. contents::
      :depth: 2
      :local:


This is a modified version of an assignment that ran first in Fall 2015 at
JHU.  See
http://gaming.jhu.edu/~phf/2015/fall/cs318/assignment-uthreads.shtml and the
`Changelog`_ below.  If you are interested in using this assignment in your
course, please contact me for reference implementations and test cases!

The second half of this assignment is :doc:`osassign-kthreads`.

Overview
########

This assignment is *entirely in userspace*.  We *provide* for you a
compiled, modified xv6 kernel (without sources) for you to run against.
Despite that, this is a potentially big assignment!

This is almost entirely taken from `Carnegie Mellon's Operating Systems
class <http://www.cs.cmu.edu/~410>`_.  *This assignment has been modified
from its original version.  It has been* `reformatted to fit this semester
<https://en.wikipedia.org/wiki/Film_modification>`_.

**Words of Wisdom**: First, please **start early**! Second, if you're
working as a pair, please **work together** and don't split up the tasks,
you'll get yourself into trouble that way! Third, please make **minimal
changes** to xv6; if you make it hard for us to grade, we'll dock points.

.. You'll start with the xv6 from **nwf**'s git repository at commit
.. ``6823e64109832d247d19acd0a120c47e2760f4d2`` for this assignment. (That's a
.. different commit than the one we used before, and it includes a lot of
.. handout code and test code; you'll be sad if you start from the wrong one!)
.. Pull from https://github.com/nwf/xv6-public .

What Are Threads?
#################

Threads generalize the notion of an xv6 process.  An xv6 process is, at
once, several different things:

* A description of an address space (which virtual addresses map to which
  physical addresses?)

* A description of file descriptors (which fds are open?  What files are
  they pointing to?)

* A description of a CPU, sometimes called a "register file": what are
  the contents of each (user-level) CPU register right now?

A *thread* is the name we give to *just the last* of these things.
Obviously we need the other things when we're running instructions on xv6,
so threads are said to be *in a process*: that is, a process may have one or
more threads.  Every xv6 process you have ever created to date has had
exactly one.  Let's change that!


Changes To The xv6 Kernel
#########################

*These changes have been made for you, though you do not have the source of
the modified kernel*.  It will, however, be beneficial for you to think
about how the kernel might offer this functionality!

Thread Identifiers
------------------

Since threads are things managed by the kernel, we need some way of
identifying a particular thread, just like we identify a particular open
file or a particular process.

For this assignment, we put *process* and *thread* identifiers in the same
name-space: the *process* identifier as returned by ``fork()`` will *also*
be the *thread* identifier of the first thread in that process.

In particular, thread identifiers, like process identifiers, are *globally
unique*.  That is, every thread on the system will have a unique thread
identifier, just as every process on the system has a unique process
identifier.  The first thread in a process will have *equal* thread and
process identifiers; subsequent threads spawned in a process will not.

Scheduling
----------

* ``void yield(int tid)`` will invoke the scheduler again with a *hint*
  that it would be good for ``tid`` to be on core *soon* (ideally, *now*),
  giving up the remainder of this thread's timeslice.  Note that if there
  are no other running threads, the kernel may return to this one without
  context switching.  If ``tid`` is negative, the kernel will simply run the
  next ``RUNNABLE`` thread.

* ``void desch(int *guard)`` will put the current thread to sleep
  (that is, suspend its execution) if ``*guard`` is nonzero.  If the process
  ``exit()``-s or is subject to ``kill()`` while this thread remains so
  suspended, this thread will be destroyed, too.

* ``int mkrun(int tid)`` should awaken a thread that is sleeping in
  ``desch()``.  If there is no such thread, it will return a negative
  number; otherwise, it will return ``0``.

  A thread's invocation of ``desch()`` is *atomic* with respect to any
  ``mkrun()`` targeting that thread.  That is, the decision to go to
  deschedule happens *entirely before* or *entirely after* any decision to
  wake up; dually, the decision to wake up happens *entirely before* or
  *entirely after* any decision to deschedule.

Creating A New Thread
---------------------

We defined a new system call, ``int tspawn(void *stktop, void (*f)(void *),
void *a)`` which, creatively enough, spawns a new thread running the
function ``f`` with argument ``a`` with stack built (`downwards
<http://www.stackgrowsdown.com/>`_, as this is x86, not, `for example
<http://stackgrowsup.com/>`_, some ARM systems) at ``stktop``.  ``stktop``
must point to (or immediately above, at least) previously allocated memory;
the kernel will need two words to build the initial call frame for ``f``,
but there should, in general, be more space available for the use of the
function ``f``.  ``tspawn`` returns the *thread identifier* of the spawned
thread, or a negative number if it fails.

.. note:: As with process identifiers on xv6, thread identifiers are
   monotonically increasing and we do not concern ourselves with
   wrap-around.  This is something of an embarrassing situation which
   can be blushingly acknowledged with "educational OS".

The kernel we ship you supports ``NTHR = 256`` threads, distributed in any
way between the up-to-``NPROC`` (i.e. 64) processes.  Your submission
should, *ideally*, not depend upon either of these numbers (and especially
not as *numbers*).

Thread Exits
------------

The new system call, ``void texit(int *where, int what)``, is sort of like
``exit()`` except that it simply stops the execution of the *current thread*
after writing ``what`` to ``where`` (with release atomic semantics).  If (and
only if) there are no other threads in the process, then ``texit()`` will
behave like ``exit()`` does in xv6: it will result in the process being torn
down and its parent *process* being notified.  If there are other threads,
then only the invoking thread will be terminated.  Put another way,
``wait()`` will only return a process identifier when *all* threads in that
process have exited.  ``wait()`` will never return a thread identifier that
isn't also a process identifier!  (This should make sense if you think about
about ``fork()`` and ``wait()`` are supposed to work together.)

Note that ``texit()`` is permitted to *return* if ``where`` is not a valid
pointer.  This is the *only* case in which ``texit()`` may return.

If, on the other hand, *any* thread in a process calls ``exit()``, *all*
threads in that process will be terminated (or scheduled for termination
prior to their next return to userland).  That is, ``texit()`` is the polite
"I'm done, may I be excused." while ``exit()`` is the table-flipping "We're
done here!" ((╯°□°)╯︵ ┻━┻).

Similarly, when a *process identifier* is ``kill()``-ed, *all* threads in
that process will be terminated (or scheduled for termination).  It is an
error to call ``kill()`` with a *thread identifier* that is not also a
process identifier.

Miscellany
----------

* The new system call ``int gettid(void)`` will return the invoking thread's
  identifier.  The existing ``int getpid(void)`` will return the process
  identifier (i.e. the thread identifier of the first thread that was in
  this process, *even if that thread has subsequently exited*) whenever any
  thread therein asks.

* File descriptors are shared by all threads within a process, so one could
  imagine building a multi-threaded application that worked with files.

* ``fork()`` will fail if there are multiple threads in the invoking
  process, as it is not clear what should happen otherwise.  (For the very
  curious, POSIX introduced an ``atfork()`` mechanism to help with these
  issues.)

* ``exec()`` will fail if there are multiple threads in the invoking
  process.  While it is clear what to do here, I do not imagine we will
  need the functionality and so would rather not implement it. ;)

Part 1: Study, Design, Log (20%)
################################

There's not a whole lot of material to study, but do *carefully* review the
diffs we have made to the user-visible xv6 header files.  You can do this
however you want, but ``git log -p`` may be of utility.

I would suggest that you read this *entire assignment* at least twice before
writing *any part of it*.

Please keep a detailed log in ``README``, including your initial plans and
what actually happened. :)

Part 2: usem: Better Userspace Synchronization
##############################################

Port Spinlocks
--------------

Based on the in-kernel spinlocks, please write:

* ``struct uspin { ... }``.  Up to you what goes in the ``...``.  This lives
  in ``uthrdefs.h``.

* ``int uspin_init(struct uspin *sl);`` should get a spinlock ready
  (in its unlocked configuration, of course).  Should getting one ready
  prove impossible, this function should return a negative number;
  otherwise, it should return ``0``.

* ``void uspin_destroy(struct uspin *sl);`` should free any resources
  required by ``uspin_init``.  You *do not need to handle* the case
  when there are threads making use of this spinlock and someone calls
  ``uspin_destroy``.  By that, I mean, the onus is on the user of
  the API to make sure that doesn't happen.

* ``void uspin_acquire(struct uspin *sl);`` will acquire the lock.  That is,
  upon return, it will have arranged to prevent the return of any other
  thread's ``uspin_acquire(sl)`` calls until *any* thread calls...

* ``void uspin_release(struct uspin *sl);`` drops the lock.  After this
  call, another ``uspin_acquire(sl)`` can complete.  You *do not need to
  handle* the case when a thread calls ``uspin_release()`` on an
  already-``released`` spinlock.  By that, I mean, the onus is on the user of
  the API to make sure that doesn't happen.

Calling ``uspin_acquire(sl)`` on an already acquired spinlock ``sl``
should never return unless some other thread calls ``uspin_release(sl)``.
*Note that it may be a different thread that releases*.

``uspin_acquire(sl)`` must not use ``while(condition){;}`` to burn CPU time,
but should instead use ``yield();``.  You'll be happier, trust me, and I'll
be happier too.

All of these functions are given in skeletal form in ``uspin.c``.

Semaphores
----------

In addition to spinlocks, we'll define a thing called a "counting
semaphore".  A counting semaphore is, essentially, a counter that refuses to
go negative.  If an attempt to decrement it would make it negative, that
attempt will *wait* for the number to be big enough so that the decrement
yields a non-negative value.  Another way of thinking about this: there are
*N* instances of some resource available; the semaphore's job is to make
sure that there are only ever at most *N* instances claimed for use.

As before, you will need to fill out a structure definition:

* ``struct usem { ... }``.  Your job to fill in the ``...``; whatever needs
  to go in there should go in there.  This lives in ``uthrdefs.h``.

The steady-state calls are rather like ``uspin``:

* ``void usem_acquire(struct usem *sem);`` decrements the counter, if possible,
  or *blocks* until such time as the decrement is possible.  Blocking is to
  be done by ``desch()`` rather than ``yield()``, ``sleep()``, or any other
  mechanism, for full credit.

* ``void usem_release(struct usem *sem);`` increments the counter and notifies
  at least one sleeping thread of this condition.

The constructor and destructor functions are:

* ``int usem_init(struct usem *sem, unsigned int initial_count);`` should
  initialize a semaphore into a state so that ``initial_count``-many
  ``usem_acquire(sem)`` calls would succeed without waiting and the
  next would block.  ``usem_init()`` should reject any initial count that is
  greater than ``INT_MAX`` (i.e. ``2147483647``, as defined in ``limits.h``).
  If getting a semaphore ready fails, this call should return a negative
  number; otherwise, it should return ``0``.

* ``void usem_destroy(struct usem *sem);`` should destroy the semaphore (that is,
  release whatever resources were required for its construction).  If there
  are threads sleeping on the semaphore, their fate is up to you, but be
  prepared to defend your decision.  (Extreme policies are fair game!)

The implementation need not be prepared to deal with the internal counter
going above ``INT_MAX``; that is, it is, again, on the user to avoid that.

*Note that we anticipate semaphores using spinlocks internally!*

**STOP AND CONSIDER**: The use of ``desch()`` poses an interesting
challenge: in ``usem_acquire()``, one may need to *atomically unlock and
deschedule*.  To be concrete: one may be holding a lock to guard the
counter, observe that the counter is in a state where ``usem_acquire()``
must ``desch()``, and need to drop the lock before ``desch()``-ing (since
holding the lock while asleep is unlikely to work out well!).  But!  After
dropping the lock but before ``usem_acquire()`` calls ``desch()``, another
thread could ``usem_release()``, invalidating the observation made of the
counter!  In other words, "Why *does* ``desch()`` take an argument, anyway?"
In tabular form, the case to consider is something like

+--------------------------+--------------------------+
| Thread 1                 | Thread 2                 |
+==========================+==========================+
| Enter ``usem_acquire()`` |                          |
+--------------------------+--------------------------+
| Lock internals           |                          |
+--------------------------+--------------------------+
| Observe counter too low  |                          |
+--------------------------+--------------------------+
| Drop lock                |                          |
+--------------------------+--------------------------+
|                          | Enter ``usem_release()`` |
+--------------------------+--------------------------+
|                          | Lock, increment, unlock  |
+--------------------------+--------------------------+
|                          | Leave ``usem_release()`` |
+--------------------------+--------------------------+
| ``desch`` anyway? ``:(`` |                          |
+--------------------------+--------------------------+

A full-credit implementation of ``usem`` will, in addition to being *correct*
as per the above API, be *fair* in a **FIFO ordering** sense:

* If two threads, ``T1`` and ``T2``, call ``usem_acquire()`` on a ``usem``
  whose counter is currently ``0``, with ``T1`` acquiring before ``T2``,
  then the next ``usem_release()`` will wake ``T1`` and not ``T2``.  That is,
  the intended behavior, assuming the semaphore is initialized to ``0``, is:

  +------------------------------+------------------------------+--------------------------+
  | Thread 1                     | Thread 2                     | Thread 3                 |
  +==============================+==============================+==========================+
  | ``usem_acquire()`` ``desch`` |                              |                          |
  +------------------------------+------------------------------+--------------------------+
  |                              | ``usem_acquire()`` ``desch`` |                          |
  +------------------------------+------------------------------+--------------------------+
  |                              |                              | ``usem_release()``       |
  +------------------------------+------------------------------+--------------------------+
  | ``usem_acquire()`` finishes  |                              |                          |
  +------------------------------+------------------------------+--------------------------+

* Moreover, in the above scenario, another thread calling ``usem_acquire()``
  on this spinlock, even if after the ``usem_release()`` call indicated,
  will only return *after* ``T2`` returns.  That is, this outcome, again assuming
  that the semaphore is initialized to ``0``, is *to be avoided*:

  +------------------------------+------------------------------+--------------------------+
  | Thread 1                     | Thread 2                     | Thread 3                 |
  +==============================+==============================+==========================+
  | ``usem_acquire()`` ``desch`` |                              |                          |
  +------------------------------+------------------------------+--------------------------+
  |                              |                              | ``usem_release()``       |
  +------------------------------+------------------------------+--------------------------+
  |                              | Call ``usem_acquire()``      |                          |
  +------------------------------+------------------------------+--------------------------+
  |                              | ``usem_acquire()`` returns   |                          |
  +------------------------------+------------------------------+--------------------------+


All of these functions are given in skeletal form in ``usem.c``.

Part 3: uthr: Thread Library Functions
######################################

``uthr`` provides functionality akin to (a small subset of) the `POSIX
threads <https://en.wikipedia.org/wiki/POSIX_Threads>`_ library functions
``pthread_create``, ``pthread_exit``, and ``pthread_join`` on a real UNIX
system.  It defines the following functions:

* ``int uthr_init(unsigned int stack_size);`` initializes the thread library
  and sets the stack size of all created threads to ``stack_size`` bytes.
  *Unfortunately*, despite the symmetry of threads in general, the library
  as we are designing it is unable to ensure that the main thread (i.e. the
  caller of ``thr_init``) will have a stack of ``stack_size`` bytes, as
  its stack comes from the xv6 kernel.

* ``void *uthr_malloc(int size);`` is a *thread-safe* wrapper around the
  existing ``umalloc`` ``malloc()`` function.

* ``void uthr_free(void *what);`` is a *thread-safe* wrapper around
  ``free()``, by analogy.

  Because of the way ``umalloc`` is written, at most one thread per process
  can safely be manipulating its data structures; it is the job of these
  wrapper functions to ensure that this holds!

* ``int uthr_create(void (*f)(void *), void *arg);`` creates a new thread,
  with its own stack, running the code ``f(arg)``.  Returns to the parent
  the *thread identifier* of this thread, or a negative number on error
  (in which case, no thread was created).
  
  * **Undergraduates**: it is OK if you wish to require ``f`` to call
    ``uthr_exit(...)``, just like xv6 requires our program to call ``exit``.

  * **Graduates**: your submission must interpret a return from ``f`` as
    ``uthr_exit(0)`` (as well as handling any ``uthr_exit(...)`` calls that
    it may make!).

* ``void uthr_exit(void *baton);`` ends the execution of the current thread,
  arranging for the ``baton`` to be passed to whichever thread calls...

* ``int uthr_join(int tid, void **hand);``, which waits for the thread
  identified by ``tid`` to call ``uthr_exit(baton)`` and then arranges for
  ``baton`` to be placed in ``*hand``.  You may, at your discretion, drop
  the ``baton`` if ``hand`` is ``0``.

  If the thread identified by ``tid`` has already been ``uthr_join()``-ed
  (regardless of whether or not it has exited and had its ``baton``
  collected), ``uthr_join()`` returns a negative number and does not mutate
  ``*hand``.

  The effect of ``uthr_join(gettid(), ...)`` is *either* to hang forever or
  to return an error (negative result).  The latter is more friendly to the
  API user, to be sure.

  .. **318** students *may* use ``uspin_acquire()`` or similar mechanisms (i.e.
  .. ``yield()``-based busy-waiting) to synchronize and wait for the targeted
  .. thread to ``thr_exit()``.  **418** students **MUST** use a mechanism,
  .. *such as* ``desch()`` or ``usem_acquire()``, that causes the waiting
  .. thread to not be ``RUNNABLE`` while it is waiting.

``uthr_create()`` and ``uthr_exit()`` will need to use the ``tspawn``
and ``texit`` system calls introduced above.  Note that there is no
need for kernel-side support for ``uthr_join()``!  Isn't that interesting...

``uthr_exit()`` and ``uthr_join()`` must not suffer from "the terminal
irony" whereby a failure to allocate memory could imply that a thread could
not exit or make progress through ``uthr_join()`` (and thereby free up memory).

Because it may be a long while between ``uthr_join()`` and ``uthr_exit()``,
``uthr_join()`` *must not poll* (*spin until*) a thread exits.  The kernel
offers all the functionality required to avoid such inefficiencies.  Moreover,
if the target of ``uthr_join()`` has *already* ``uthr_exit()``-ed,
``uthr_join()`` must return *promptly*.

The main thread is permitted to call ``uthr_exit()`` and must be
``uthr_join()``-able.  That is, once the main thread has called
``uthr_init()``, it is *just another thread* as far as the library is
concerned.  Note that this deviates from some platforms' interpretation of
threading, whereby the main thread shutting down is grounds to terminate the
entire process.

All of these functions are given in skeletal form in ``uthr.c``.

*It is expected that the userland thread library will depend on the userland
malloc() functionality; there is no need to write your own allocator for any
of this*!  (Though of course even here invasive data structures are possibly
worth considering!)  **BIG HINT**: You may wish to carefully consider the
interaction of ``uthr``'s internal use of ``malloc`` and ``free``, if any,
and the exported ``uthr_malloc`` and ``uthr_free`` wrappers!

Thread Exits And Stacks
-----------------------

Note that there is a tricky difficulty with ``free()``-ing the stack
associated with a thread that is ``uthr_exit()``-ing.  There is a reason
that ``texit()`` takes the parameters it does.  As a **big** hint, you may
find yourself wishing for a ``uspin_release_and_texit()`` or even
``usem_release_and_texit()`` operation.  Feel free to write it/them!

Suggested Plan Of Attack
########################

As I said at the beginning, this assignment has a lot of pieces.  Here's how
I might (and mostly did) approach this problem:

1. User-level locking primitives (``uspin`` or ``usem`` API) can be designed
   and written before anything else, given the kernel we give out.  You will
   probably want to look at ``318_usem1.c`` and ``2`` and ``3`` too; look at
   ``318_dmr.c`` for an example of using ``desch`` and ``mkrun``.

2. Review ``tspawn`` and ``texit`` calls.  Look at ``318_ts1.c``
   for an example of their usage.

3. Write a minimal ``uthr_create`` around ``tspawn``.  ``318_tc1.c``
   does not depend on any other ``uthr`` function, and can help make sure
   the basics are in a good place.  Modifying ``318_usem3.c`` to use
   ``uthr_create`` will also give you additional test cases (though the
   continued use of ``texit`` rather than ``uthr_exit`` will make them somewhat
   rude, but that's OK!).

4. Now implement ``uthr_exit()`` and ``uthr_join()``.  ``318_tcj1.c`` and
   ``318_tcj2.c`` depend on the entirety of the user thread library and might
   be termed "basic" and "advanced" handling tests.

5. ``318_main_te.c`` and ``318_join_main.c`` test the special case of the
   main (i.e. initial) thread exiting and being joined upon.

6. Write your own tests!  Debug!  Go back and fix things!

7. When the deadline hits, or when you decide you're happy with how things
   are, make your submission and go enjoy the outside.

In case you're interested in line counts of my solution (just to estimate
the workload, perhaps); all sloc counts are generated using David A.
Wheeler's ``SLOCCount``:

  * a little library of helper functions I wrote is 33 sloc long (61 literal newlines)
    (I'll leave it to your imagination what might be in there.);
  * ``uthrdefs.h`` is 9   (13);
  * ``uspin.c``    is 19  (26);
  * ``usem.c``     is 49  (64);
  * and ``uthr.c`` is 188 (240).

By comparison, all the tests given add up to 401 sloc (546)!

README, Scoring, ...
####################

As usual, please include a ``README`` file with discussion of your
submission and its design, including rationale for design decisions and
things you struggled with and all that other good stuff.  I would
appreciate, though you do not need to include, opinions of the assignment; I
promise not to dock points if you tell me the assignment stinks. ;)

The ``usem`` locking primitives will be worth about 20% of the grade, about
evenly divided between functionality when subjected to tests and manual
review.  The ``uthr`` library will be roughly 60%, again about evenly
divided.  The remaining 20% is, of course, for the README.

Good luck!
##########

One last bit of advice: if something really isn't working, get up and
stretch.  Go for a walk around the building.  Relax.  Do other work.  Come
back later and you'll see things you didn't see before.

Changelog
#########

* Unlike the CMU original version, this version uses ``tspawn()`` instead
  of a much more raw ``thread_fork()`` which spawns a new thread with
  an *identical register file*, requiring the use of assembler in the
  analog of ``uthr_create()``.

* Updated ``texit(void)`` to ``texit(int *where, int what)`` to indicate
  positive possession of a thread by the kernel.  This alleviates the
  possible-but-tricky need for userland to write things like
  ``uspin_release_and_texit()`` in assembler.  With the new API,
  ``uspin_release_and_texit()`` can be entirely in C.  Given that this
  is a 1-week assignment, that seems entirely reasonable.

* TODO:  When and if mainline xv6 takes patches to fix the process ID
  wraparound issue, this assignment can be revised to address the thread
  ID wraparound.  The proper fix, I believe, is to have threads linger
  until joined, only ``texit()``-ing once that happens.

  Relatedly, there is now some new prose about tids being global.

For The Curious
---------------

There are, of course, other threading xv6 assignments than this one.
For examples,

* http://pages.cs.wisc.edu/~remzi/Classes/537/Spring2011/Projects/p6.html

* https://www.cs.uic.edu/bin/view/CS385/Homework6

* http://moss.cs.iit.edu/cs450/assign02-xv6-threading.html