# Mu: a human-scale computer
Mu is a minimal-dependency hobbyist computing stack (everything above the
processor and OS kernel).
Mu is not designed to operate in large clusters providing services for
millions of people. Mu is designed for _you_, to run one computer. (Or a few.)
Running the code you want to run, and nothing else.
```sh
$ git clone https://github.com/akkartik/mu
$ cd mu
$ ./translate_mu apps/ex2.mu # emits a.elf
$ ./a.elf # adds 3 and 4
$ echo $?
7
```
[![Build Status](https://api.travis-ci.org/akkartik/mu.svg?branch=master)](https://travis-ci.org/akkartik/mu)
Mu requires just a Unix-like OS and nothing else. The Mu translator is built
up from machine code. You can also bootstrap it from C++. Both C++ and
self-hosted versions emit identical binaries. The generated binaries require
just a Unix-like kernel and nothing else. (There are more details in [this paper](http://akkartik.name/akkartik-convivial-20200315.pdf).)
## Goals
In priority order:
* [Reward curiosity.](http://akkartik.name/about)
* Easy to build, easy to run. [Minimal dependencies](https://news.ycombinator.com/item?id=16882140#16882555),
so that installation is always painless.
* All design decisions comprehensible to a single individual. (On demand.)
* All design decisions comprehensible without needing to talk to anyone.
(I always love talking to you, but I try hard to make myself redundant.)
* [A globally comprehensible _codebase_ rather than locally clean code.](http://akkartik.name/post/readable-bad)
* Clear error messages over expressive syntax.
* Safe.
* Thorough test coverage. If you break something you should immediately see
an error message. If you can manually test for something you should be
able to write an automated test for it.
* Memory leaks over memory corruption.
* Teach the computer bottom-up.
## Non-goals
* Speed. Staying close to machine code should naturally keep Mu fast enough.
* Efficiency. Controlling the number of abstractions should naturally keep Mu
using far less than the gigabytes of memory modern computers have.
* Portability. Mu will run on any computer as long as it's x86. I will
enthusiastically contribute to support for other processors -- in separate
forks. Readers shouldn't have to think about processors they don't have.
* Compatibility. The goal is to get off mainstream stacks, not to perpetuate
them. Sometimes the right long-term solution is to [bump the major version number](http://akkartik.name/post/versioning).
* Syntax. Mu code is meant to be comprehended by [running, not just reading](http://akkartik.name/post/comprehension).
For now it's a thin veneer over machine code. I'm working on memory safety
before expressive syntax.
## Source Language
Mu's main source language is [still being built](http://akkartik.name/post/mu-2019-2).
When completed, it will be type- and memory-safe. At the moment it performs no
checks. Here's the program we translated above:
There are no expressions; functions consist of only statements that operate on
variables. Most statements in Mu translate to [a single machine code instruction](http://akkartik.github.io/mu/html/mu_instructions.html).
Variables reside in memory by default. Programs must specify registers when
they want to use them. Functions must return results in registers. Execution
begins at the function `main`, which always returns its result in register
`ebx`. [The paper](http://akkartik.name/akkartik-convivial-20200315.pdf) has
more details, and there's a [summary](mu_summary) of all supported instructions.
## SubX
Mu is written in [a notation for a subset of x86 machine code called SubX](http://akkartik.name/post/mu-2019-1).
Here's a SubX program (`apps/ex1.subx`) that returns 42:
```sh
bb/copy-to-ebx 0x2a/imm32 # 42 in hex
b8/copy-to-eax 1/imm32/exit
cd/syscall 0x80/imm8
```
You can generate tiny zero-dependency ELF binaries from SubX that run on Linux.
```sh
$ ./bootstrap translate init.linux apps/ex1.subx -o apps/ex1 # on Linux or BSD or Mac
$ ./apps/ex1 # only on Linux
$ echo $?
42
```
(Running `bootstrap` requires a C++ compiler, transparently invoking it as
necessary.)
You can run the generated binaries on an interpreter/VM for better error
messages.
```sh
$ ./bootstrap run apps/ex1 # on Linux or BSD or Mac
$ echo $?
42
```
Emulated runs can generate a trace that permits [time-travel debugging](https://github.com/akkartik/mu/blob/master/tools/browse_trace.readme.md).
```sh
$ ./bootstrap --debug translate init.linux apps/factorial.subx -o apps/factorial
saving address->label information to 'labels'
saving address->source information to 'source_lines'
$ ./bootstrap --trace run apps/factorial
saving trace to 'last_run'
$ tools/browse_trace last_run # text-mode debugger UI
```
You can write tests for your programs. The entire stack is thoroughly covered
by automated tests. SubX's tagline: tests before syntax.
```sh
$ ./bootstrap test
$ ./bootstrap run apps/factorial test
```
You can use SubX to translate itself. For example, running natively on Linux:
```sh
# generate translator phases using the C++ translator
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/hex.subx -o hex
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/survey.subx -o survey
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/pack.subx -o pack
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/assort.subx -o assort
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/dquotes.subx -o dquotes
$ ./bootstrap translate init.linux 0*.subx apps/subx-params.subx apps/tests.subx -o tests
$ chmod +x hex survey pack assort dquotes tests
# use the generated translator phases to translate SubX programs
$ cat init.linux apps/ex1.subx |./tests |./dquotes |./assort |./pack |./survey |./hex > a.elf
$ chmod +x a.elf
$ ./a.elf
$ echo $?
42
# or, automating the above steps
$ ./translate_subx init.linux apps/ex1.subx
$ ./a.elf
$ echo $?
42
```
Or, running in a VM on other platforms (much slower):
```sh
$ ./translate_subx_emulated init.linux apps/ex1.subx # generates identical a.elf to above
$ ./bootstrap run a.elf
$ echo $?
42
```
You can package up SubX binaries with the minimal hobbyist OS [Soso](https://github.com/ozkl/soso)
and run them on Qemu. (Requires graphics and sudo access. Currently doesn't
work on a cloud server.)
```sh
# dependencies
$ sudo apt install build-essential util-linux nasm xorriso # maybe also dosfstools and mtools
# package up a "hello world" program with a third-party kernel into mu_soso.iso
# requires sudo
$ tools/iso/soso init.soso apps/ex6.subx
# try it out
$ qemu-system-i386 -cdrom mu_soso.iso
```
You can also package up SubX binaries with a Linux kernel and run them on
either Qemu or [a cloud server that supports custom images](http://akkartik.name/post/iso-on-linode).
(Takes 12 minutes with 2GB RAM. Requires 12 million LoC of C for the Linux
kernel; that number will gradually go down.)
```sh
$ sudo apt install build-essential flex bison wget libelf-dev libssl-dev xorriso
$ tools/iso/linux init.linux apps/ex6.subx
$ qemu-system-x86_64 -m 256M -cdrom mu_linux.iso -boot d
```
## The syntax of SubX instructions
Here is the above SubX example again:
```sh
bb/copy-to-ebx 0x2a/imm32 # 42 in hex
b8/copy-to-eax 1/imm32/exit
cd/syscall 0x80/imm8
```
Every line contains at most one instruction. Instructions consist of words
separated by whitespace. Words may be _opcodes_ (defining the operation being
performed) or _arguments_ (specifying the data the operation acts on). Any
word can have extra _metadata_ attached to it after `/`. Some metadata is
required (like the `/imm32` and `/imm8` above), but unrecognized metadata is
silently skipped so you can attach comments to words (like the instruction
name `/copy-to-eax` above, or the `/exit` argument).
What do all these numbers mean? SubX supports a small subset of the 32-bit x86
instruction set that likely runs on your computer. (Think of the name as short
for "sub-x86".) The instruction set contains instructions like `89/copy`,
`01/add`, `3d/compare` and `51/push-ecx` which modify registers and a byte-addressable
memory. For a complete list of supported instructions, run `bootstrap help opcodes`.
The registers instructions operate on are as follows:
* Six general-purpose 32-bit registers: `0/eax`, `1/ebx`, `2/ecx`, `3/edx`,
`6/esi` and `7/edi`.
* Two additional 32-bit registers: `4/esp` and `5/ebp`. (I suggest you only
use these to manage the call stack.)
(SubX doesn't support floating-point registers yet. Intel processors support
an 8-bit mode, 16-bit mode and 64-bit mode. SubX will never support them.
There are also _many_ more instructions that SubX will never support.)
While SubX doesn't provide the usual mnemonics for opcodes, it _does_ provide
error-checking. If you miss an argument or accidentally add an extra argument,
you'll get a nice error. SubX won't arbitrarily interpret bytes of data as
instructions or vice versa.
It's worth distinguishing between an instruction's arguments and its _operands_.
Arguments are provided directly in instructions. Operands are pieces of data
in register or memory that are operated on by instructions.
Intel processors typically operate on no more than two operands, and at most
one of them (the 'reg/mem' operand) can access memory. The address of the
reg/mem operand is constructed by expressions of one of these forms:
* `%reg`: operate on just a register, not memory
* `*reg`: look up memory with the address in some register
* `*(reg + disp)`: add a constant to the address in some register
* `*(base + (index << scale) + disp)` where `base` and `index` are registers,
and `scale` and `disp` are 2- and 32-bit constants respectively.
Under the hood, SubX turns expressions of these forms into multiple arguments
with metadata in some complex ways. See [SubX-addressing-modes.md](SubX-addressing-modes.md).
That covers the complexities of the reg/mem operand. The second operand is
simpler. It comes from exactly one of the following argument types:
- `/r32`
- displacement: `/disp8` or `/disp32`
- immediate: `/imm8` or `/imm32`
Putting all this together, here's an example that adds the integer in `eax` to
the one at address `edx`:
```
01/add %edx 0/r32/eax
```
## The syntax of SubX programs
SubX programs map to the same ELF binaries that a conventional Linux system
uses. Linux ELF binaries consist of a series of _segments_. In particular, they
distinguish between code and data. Correspondingly, SubX programs consist of a
series of segments, each starting with a header line: `==` followed by a name
and approximate starting address.
All code must lie in a segment called 'code'.
Segments can be added to.
```sh
== code 0x09000000 # first mention requires starting address
...A...
== data 0x0a000000
...B...
== code # no address necessary when adding
...C...
```
The `code` segment now contains the instructions of `A` as well as `C`.
Within the `code` segment, each line contains a comment, label or instruction.
Comments start with a `#` and are ignored. Labels should always be the first
word on a line, and they end with a `:`.
Instructions can refer to labels in displacement or immediate arguments, and
they'll obtain a value based on the address of the label: immediate arguments
will contain the address directly, while displacement arguments will contain
the difference between the address and the address of the current instruction.
The latter is mostly useful for `jump` and `call` instructions.
Functions are defined using labels. By convention, labels internal to functions
(that must only be jumped to) start with a `$`. Any other labels must only be
called, never jumped to. All labels must be unique.
Functions are called using the following syntax:
```
(func arg1 arg2 ...)
```
Function arguments must be either literals (integers or strings) or a reg/mem
operand using the syntax in the previous section.
A special label is `Entry`, which can be used to specify/override the entry
point of the program. It doesn't have to be unique, and the latest definition
will override earlier ones.
(The `Entry` label, along with duplicate segment headers, allows programs to
be built up incrementally out of multiple [_layers_](http://akkartik.name/post/wart-layers).)
Another special pair of labels are the block delimiters `{` and `}`. They can
be nested, and jump instructions can take arguments `loop` or `break` that
jump to the enclosing `{` and `}` respectively.
The data segment consists of labels as before and byte values. Referring to
data labels in either `code` segment instructions or `data` segment values
yields their address.
Automatic tests are an important part of SubX, and there's a simple mechanism
to provide a test harness: all functions that start with `test-` are called in
turn by a special, auto-generated function called `run-tests`. How you choose
to call it is up to you.
I try to keep things simple so that there's less work to do when implementing
SubX in SubX. But there _is_ one convenience: instructions can provide a
string literal surrounded by quotes (`"`) in an `imm32` argument. SubX will
transparently copy it to the `data` segment and replace it with its address.
Strings are the only place where a SubX word is allowed to contain spaces.
That should be enough information for writing SubX programs. The `apps/`
directory provides some fodder for practice in the `apps/ex*.subx` files,
giving a more gradual introduction to SubX features. In particular, you should
work through `apps/factorial4.subx`, which demonstrates all the above ideas in
concert.
This repo includes binaries for all examples. At any commit, an example's
binary should be identical bit for bit with the result of translating the
corresponding `.subx` file. The binary should also be natively runnable on a
Linux system running on Intel x86 processors, either 32- or 64-bit. If either
of these invariants is broken it's a bug on my part.
## Running
`bootstrap` currently has the following sub-commands:
* `bootstrap help`: some helpful documentation to have at your fingertips.
* `bootstrap test`: runs all automated tests.
* `bootstrap translate -o