Dinghy index concept papers



Drift

This paper is part of a series. Each entry in the series presents and
evaluates a strategy towards a concept called Dinghy.

The strategy discussed in this paper is called Drift.

Drift is an operating system that is designed to run under the hypervisor
of a modern amd64 desktop system.


Model

    The user configures a fresh hypervisor and allocates two processor cores
    to it. The user points the hypervisor instance towards a disk image
    containing a build of Drift.

    There is a Drift Boot Loader. During startup, it coordinates the shift
    from real mode into protected/long mode and launches Kmain/C. Kmain/C
    launches Kmain/W.

        -------------------                      -------------------
        |Thread/C         |                      |Thread/W         |
        |(control)        |                      |(worker)         |
        |                 |                      |                 |
        | Kmain/C         |                      | Kmain/W         |
        | ..............  |                      | ............... |
        | .            .}-------------------------}.             . |
        | . event loop .  |  stream pair         | . event loop  . |
        | .            .{-------------------------{.             . |
        | ..............  |                      | ............... |
        | +memory map     |                      |      ^.         |
        | +int handlers   |                      |      .. stream pair
        | +tcp stack      |                      |      .v         |
        | +repl context   |                      | ............... |
        |                 |                      | .             . |
        |                 |                      | . application . |
        |                 |                      | . code        . |
        |                 |                      | .             . |
        |                 |                      | ............... |
        |                 |                      |                 |
        -------------------                      -------------------
      
        ------------------------------------------------------------
        | hypervisor                                               |
        ------------------------------------------------------------
      
        -------------------                      -------------------
        | logical core    |                      | logical core    |
        -------------------                      -------------------
    
    Drift runs one thread per logical core. Each hosts an event loop.
    
    Thread/C acts as a coordinator for the system. For example, it masters
    access to memory.
    
    Anything that is interrupt-heavy should happen in Thread/C. (The reasons
    for this are discussed below.)
    
    Thread/W is designed to host "application code" (see below) in a way that
    fits the model.

    Application Code is code that lives in Thread/W, but which is not part of
    core code. It runs in the same memory space and privilege level as
    Thread/W, and is driven by the Thread/W event loop.

    In general, application code does not make system calls. For most things,
    it will directly write to memory (e.g. VGA).

    There will be some things that application code does not want to do via
    direct memory access. For example, networking. For these needs, it shares
    a channel pair with the core of Thread/W.

    Thread/C hosts a shell/repl.


This design allows Thread/W to operate without ever needing to handle
interrupts.

    Motive: this gives us latitude to enable SIMD instructions at the
    (simulated) Ring 0 level in Thread/W.
    
    Mainstream operating systems avoid SIMD in their Ring-0 code, because the
    extra register state adds overhead to the cost of swapping out processes
    and handling interrupts.

    Thread/W does not need to swap out processes or handle interrupts.
    Hence, we enable SIMD registers in Thread/W, and application code can
    access them.
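
    As a rough sketch of what enabling SIMD involves on amd64: SSE is
    switched on by adjusting control-register bits (clear CR0.EM, set CR0.MP,
    set CR4.OSFXSR and CR4.OSXMMEXCPT). The helper names below are
    hypothetical and not taken from Drift. The privileged reads and writes of
    CR0 and CR4 are left out, so that the bit logic can be checked on its own.

```c
/* Sketch only. Helper names are hypothetical, not from the Drift source. */
#include <assert.h>
#include <stdint.h>

#define CR0_MP         (UINT64_C(1) << 1)   /* monitor coprocessor */
#define CR0_EM         (UINT64_C(1) << 2)   /* x87 emulation; must be 0 for SSE */
#define CR4_OSFXSR     (UINT64_C(1) << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT (UINT64_C(1) << 10)  /* OS handles SIMD FP exceptions */

/* Given the current CR0 value, return the value to write back. */
static uint64_t cr0_for_sse(uint64_t cr0)
{
    cr0 &= ~CR0_EM;               /* no emulation: run SIMD in hardware */
    cr0 |= CR0_MP;
    assert((cr0 & CR0_EM) == 0);  /* post-condition of the line above */
    return cr0;
}

/* Given the current CR4 value, return the value to write back. */
static uint64_t cr4_for_sse(uint64_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

    In a kernel, Thread/W would presumably read CR0 and CR4 during startup,
    pass them through helpers like these, and write the results back.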


Once the model is stable, we could extend it:

    Add more cores, and more instances of Thread/W.
    
    Let these instances communicate with one another over new channels.
    
For some domains, such a system should have access to better performance than
a conventional yacht platform.

This system dodges the overhead of preemptive multitasking. When you control
the whole codebase, cooperative multitasking is a workable alternative.


Q: What is the separation between kernel and application?

    Effectively, there is none. But we use the term /kernel/ to indicate the
    Drift platform consisting of Thread/C and Thread/W. The term
    /application/ refers to code hosted within Thread/W.

    When the code is built, it all gets put into a single binary, and it
    all runs in a single memory space at a single privilege level.

    Application code can directly change memory, including for devices.

    Application code communicates with the core of Thread/W using a private
    pair of channels. In this way, no system calls are necessary.


Q: How do the queues work?

    They are bilateral buffers, with pointers to indicate the current read
    and write points. They act as a ring. When the pointers reach the end
    of the allocation, they wrap around to the start.
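
    A minimal sketch of one such ring, assuming a single writer and a single
    reader per direction (a channel pair would then be two of these). The
    names drift_ring, ring_put and ring_get, and the capacity, are
    hypothetical and not taken from Drift.

```c
/* Sketch only. All names and sizes are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u   /* assumed size in bytes; one slot always stays empty */

typedef struct {
    _Atomic uint32_t read_pos;    /* advanced only by the reader */
    _Atomic uint32_t write_pos;   /* advanced only by the writer */
    uint8_t data[RING_SIZE];
} drift_ring;

/* Writer side. Returns 1 on success, 0 if the ring is full. */
static int ring_put(drift_ring *r, uint8_t byte)
{
    assert(r != NULL);
    uint32_t w = atomic_load_explicit(&r->write_pos, memory_order_relaxed);
    uint32_t next = (w + 1u) % RING_SIZE;   /* wrap at end of allocation */
    if (next == atomic_load_explicit(&r->read_pos, memory_order_acquire))
        return 0;                           /* full */
    r->data[w] = byte;
    /* Publish the byte before the reader can see the new write point. */
    atomic_store_explicit(&r->write_pos, next, memory_order_release);
    return 1;
}

/* Reader side. Returns 1 and stores a byte in *out, or 0 if empty. */
static int ring_get(drift_ring *r, uint8_t *out)
{
    assert(r != NULL && out != NULL);
    uint32_t rd = atomic_load_explicit(&r->read_pos, memory_order_relaxed);
    if (rd == atomic_load_explicit(&r->write_pos, memory_order_acquire))
        return 0;                           /* empty */
    *out = r->data[rd];
    atomic_store_explicit(&r->read_pos, (rd + 1u) % RING_SIZE,
                          memory_order_release);
    return 1;
}
```

    Each pointer is written by only one side, so no lock is needed. Leaving
    one slot empty is what lets "full" and "empty" be told apart.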


Q: Without interrupts, how does Thread/W know about incoming messages?

    Via polling with a back-off algorithm, and the potential for regular sleeps.
    Drift supplies sensible defaults, but allows configuration.
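
    One way to read this is polling with a pause that grows after each empty
    poll and resets when a message arrives. The sketch below assumes that
    reading. The names, the doubling rule and the busy-wait pause are all
    assumptions rather than Drift's actual defaults.

```c
/* Sketch only. All names and the doubling policy are hypothetical. */
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t min_spins;   /* pause after the first empty poll */
    uint32_t max_spins;   /* ceiling for the pause */
    uint32_t current;     /* pause to use after the next empty poll */
} drift_backoff;

static void backoff_init(drift_backoff *b, uint32_t min_spins,
                         uint32_t max_spins)
{
    assert(min_spins > 0 && min_spins <= max_spins);
    b->min_spins = min_spins;
    b->max_spins = max_spins;
    b->current = min_spins;
}

/* Call after a poll that found a message. */
static void backoff_reset(drift_backoff *b) { b->current = b->min_spins; }

/* Call after an empty poll. Returns the pause to take now, and doubles
   the next pause up to the ceiling. */
static uint32_t backoff_next(drift_backoff *b)
{
    uint32_t spins = b->current;
    b->current = (spins > b->max_spins / 2u) ? b->max_spins : spins * 2u;
    return spins;
}

/* Busy-wait stand-in; a real build might pause or sleep the core here. */
static void spin_pause(uint32_t spins)
{
    for (volatile uint32_t i = 0; i < spins; i++) { }
}

typedef int (*drift_poll_fn)(void *ctx);  /* returns 1 if a message was handled */

/* Poll max_polls times; returns how many messages were handled. */
static uint32_t poll_with_backoff(drift_poll_fn poll, void *ctx,
                                  drift_backoff *b, uint32_t max_polls)
{
    uint32_t handled = 0;
    for (uint32_t i = 0; i < max_polls; i++) {
        if (poll(ctx)) {
            handled++;
            backoff_reset(b);
        } else {
            spin_pause(backoff_next(b));
        }
    }
    return handled;
}
```

    The minimum and maximum pause are the kind of values that could be
    exposed as configuration.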

    The aversion to interrupts in Thread/W is not a hard ban. We could
    introduce them if we wanted. But the design of the system should give
    developers an option to run Thread/W entirely without them.


Q: How does Drift manage memory?

    Thread/C has a map of available memory. Thread/W requests pages,
    Thread/C issues and tracks them.

    There is an option to configure protection on pages, i.e. this is opt-in
    memory protection.
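
    A minimal sketch of the bookkeeping Thread/C might keep, assuming a flat
    table with one owner entry per page. The names, the tiny page count and
    the contiguous first-fit policy are assumptions for illustration only.

```c
/* Sketch only. All names, sizes and policies are hypothetical. */
#include <assert.h>
#include <stdint.h>

#define PAGE_COUNT 16u   /* assumed; tiny so the example is easy to follow */
#define OWNER_FREE 0u    /* owner id 0 means the page is unissued */

typedef struct {
    uint8_t owner[PAGE_COUNT];    /* which thread holds each page */
    uint8_t protect[PAGE_COUNT];  /* 1 if protection was requested (opt-in) */
} drift_page_map;

/* Issue `count` contiguous pages to `owner`.
   Returns the index of the first page, or -1 if no run is available. */
static int pages_issue(drift_page_map *m, uint8_t owner, uint32_t count,
                       int protect)
{
    assert(owner != OWNER_FREE && count > 0);
    uint32_t run = 0;   /* length of the current run of free pages */
    for (uint32_t i = 0; i < PAGE_COUNT; i++) {
        run = (m->owner[i] == OWNER_FREE) ? run + 1u : 0u;
        if (run == count) {
            uint32_t first = i + 1u - count;
            for (uint32_t j = first; j <= i; j++) {
                m->owner[j] = owner;
                m->protect[j] = protect ? 1u : 0u;
            }
            return (int)first;
        }
    }
    return -1;
}

/* Return every page held by `owner` to the free pool.
   Returns the number of pages reclaimed. */
static uint32_t pages_reclaim(drift_page_map *m, uint8_t owner)
{
    assert(owner != OWNER_FREE);
    uint32_t n = 0;
    for (uint32_t i = 0; i < PAGE_COUNT; i++) {
        if (m->owner[i] == owner) {
            m->owner[i] = OWNER_FREE;
            m->protect[i] = 0;
            n++;
        }
    }
    return n;
}
```

    Because Thread/C tracks the owner of every page, it can reclaim all of a
    thread's pages in one pass.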


Q: What to do if Thread/W misbehaves? (e.g. an endless loop)

    It will be possible for Thread/C to forcibly evict Thread/W. (I do
    not understand the mechanics of this at the time of writing.)

    When you have only two threads, as in the diagram at the top, it will
    usually be simpler to restart everything. If you were running a large
    number of Thread/W instances, at times you may want to evict one without
    losing state across the others.


Q: Do processes exist in this model?

    Not in a conventional sense.

    Each logical core in a Drift system has one execution thread.

    The way page allocation works is not consistent with the way that memory
    access works in a traditional yacht system.


Q: What is the platform position on programming languages?

    The core will be written in some mix of nasm and C++ to start with. C++ is
    useful for prototyping. That will probably evolve to nasm and C.

    Rust. It should be possible to implement application logic in Rust. I have
    had problems validating this. My issues are probably the result of quirks
    in recent nightly builds. There will need to be some enabling code written
    in C.

    Lua. This will be possible, again with enabling code written in C.

    Libraries may not translate easily. The memory allocation model of Drift
    is different to yacht conventions. Consider changes that would be
    necessary to allow a user to access the /new/ keyword in C++ code.


Q: Will it self-host?

    Not initially, maybe never.
    
    It is convenient to use a yacht to develop software, and then deploy it
    to the hypervisor.


Q: How does video interaction work?

    There is a convention of passing custody of memory ranges between threads.

    The VGA memory range will be one of these.

    The thread with custody can write directly to VGA RAM.
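
    A minimal sketch of the custody convention, assuming VGA text mode (16-bit
    cells, character in the low byte, attribute in the high byte). The names
    and integer thread ids are hypothetical. On real hardware the text buffer
    conventionally starts at 0xB8000; here the base pointer is a parameter so
    that the logic can be exercised against an ordinary array.

```c
/* Sketch only. All names and thread ids are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint16_t *base;  /* start of the memory range */
    size_t cells;             /* number of 16-bit cells in the range */
    int holder;               /* id of the thread that has custody */
} drift_range;

/* Pass custody from one thread to another.
   Returns 1 on success, 0 if `from` does not hold custody. */
static int range_transfer(drift_range *r, int from, int to)
{
    assert(r != NULL);
    if (r->holder != from)
        return 0;
    r->holder = to;
    return 1;
}

/* Write one text cell. Only the thread with custody may write.
   Returns 1 on success, 0 if refused. */
static int vga_put(drift_range *r, int thread, size_t cell, char ch,
                   uint8_t attr)
{
    assert(r != NULL && r->base != NULL);
    if (r->holder != thread || cell >= r->cells)
        return 0;
    r->base[cell] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)ch);
    return 1;
}
```

    The check in vga_put is a convention rather than enforcement: since all
    code shares one memory space, a thread without custody could still write
    to the range directly.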


Q: The Dinghy concept asks for sound. How could this work here?

    I don't have any experience with sound programming, but am confident that
    this will be straightforward, although interrupt-heavy. Hence, my
    expectation is that this should be done from Thread/C, with the
    application code instructing it via the message backbone.


Q: What about support for 3d graphics hardware?

    There are two parts to this:

        Accessing the native hardware from within the VM. This is essentially
        a solved problem, although still fiddly. It is now common for YouTube
        and Twitch types to coordinate their streaming from Linux, whilst
        showing a Windows desktop that is hosted within a VM, and speaking to
        a graphics card. I think there are extra complications if Windows is
        your host OS, related to the Hyper-V virtualisation platform. Also,
        Nvidia go out of their way to prevent you from doing it. AMD cards are
        fine once your kernel and Mesa have caught up to their driver level.

        Effectively communicating with the 3d card. I do not expect this to be
        feasible from the design above. Rather, we would need to create a
        "Thread/U", which would be a stripped-down Linux that can live
        alongside the other threads. In this way, we could leverage Linux
        driver support, and communicate with Thread/U over shmem channels.

    Routing 3d-card interaction through the message backbone will create
    latency. Will this be significant?

    What is the performance cost of needing to go via inter-core shmem
    channels in order to talk to 3d hardware?

    If anyone reading knows how to model this, please ping us in the channel.


Q: Could this run headless?

    Potentially, yes. You could have a pair of streams connecting from the
    hypervisor host to Thread/C in the guest. QEMU offers a mechanism for
    coordinating shared memory between hypervisor hosts and guests. Under
    Linux, the host presents devices under /dev.