# Mu: a human-scale computer

Mu is a minimal-dependency hobbyist computing stack (everything above the
processor).

Mu is not designed to operate in large clusters providing services for
millions of people. Mu is designed for _you_, to run one computer. (Or a few.)
Running the code you want to run, and nothing else.

Here's the Mu computer running [Conway's Game of Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life).

```sh
git clone https://github.com/akkartik/mu
cd mu
./translate apps/life.mu  # emit a bootable code.img
qemu-system-i386 code.img
```

<img alt='screenshot of Game of Life running on the Mu computer' src='html/life.png'>

([Colorized sources.](http://akkartik.github.io/mu/html/apps/life.mu.html)
This is memory-safe code, and most statements map to a single instruction of
machine code.)

Rather than start from some syntax and introduce layers of translation to
implement it, Mu starts from the processor's instruction set and tries to get
to _some_ safe and clear syntax with as few layers of translation as possible.
The emphasis is on internal consistency at any point in time rather than
compatibility with the past. ([More details.](http://akkartik.name/akkartik-convivial-20200607.pdf))

Tests are a key mechanism here for creating a computer that others can make
their own. I want to encourage a style of active and interactive reading with
Mu. If something doesn't make sense, try changing it and see what tests break.
Any breaking change should cause a failure in some well-named test somewhere.

Mu requires a 32-bit x86 processor. It supports a short list of generic
hardware. There's no networking support yet. Development has slowed, but I
still care about it. Feedback, bug reports and other forms of contribution
continue to be appreciated.

## Goals

In priority order:

- [Reward curiosity.](http://akkartik.name/about)
  - Easy to build, easy to run. [Minimal dependencies](https://news.ycombinator.com/item?id=16882140#16882555),
    so that installation is always painless.
  - All design decisions comprehensible to a single individual. (On demand.)
  - All design decisions comprehensible without needing to talk to anyone.
    (I always love talking to you, but I try hard to make myself redundant.)
  - [A globally comprehensible _codebase_ rather than locally clean code.](http://akkartik.name/post/readable-bad)
  - Clear error messages over expressive syntax.
- Safe.
  - Thorough test coverage. If you break something you should immediately see
    an error message.
  - Memory leaks over memory corruption.
- Teach the computer bottom-up.

Thorough test coverage in particular deserves some elaboration. It implies
that any manual test should be easy to turn into a reproducible automated
test. Mu has some unconventional methods for providing this guarantee. It
exposes testable interfaces for hardware using dependency injection so that
tests can run on -- and make assertions against -- fake hardware. It also
performs [automated white-box testing](http://akkartik.name/post/tracing-tests)
which enables robust tests for performance, concurrency, fault-tolerance, etc.
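
To make the dependency-injection idea concrete, here is a minimal sketch in Python rather than Mu. The names `FakeScreen`, `put`, `row` and `draw_text` are hypothetical and are not Mu's library functions. The point is only that code which draws takes the screen as an argument, so a test can hand it an in-memory fake and make assertions against what was drawn.

```python
class FakeScreen:
    """In-memory stand-in for a text screen (hypothetical, not Mu's API)."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.cells = [[" "] * width for _ in range(height)]

    def put(self, x, y, ch):
        # Ignore writes that fall outside the screen.
        if 0 <= x < self.width and 0 <= y < self.height:
            self.cells[y][x] = ch

    def row(self, y):
        return "".join(self.cells[y])


def draw_text(screen, x, y, text):
    """Draw text left to right; the screen is injected by the caller."""
    for i, ch in enumerate(text):
        screen.put(x + i, y, ch)


def test_draw_text_clips_at_right_edge():
    screen = FakeScreen(width=5, height=2)
    draw_text(screen, 3, 0, "abc")
    assert screen.row(0) == "   ab"   # the 'c' fell off the right edge
    assert screen.row(1) == "     "   # the second row is untouched


test_draw_text_clips_at_right_edge()
```

A manual check ("does the text get cut off at the edge?") becomes a repeatable test because nothing in `draw_text` knows whether the screen is real.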

## Non-goals

- Speed. Staying close to machine code should naturally keep Mu fast enough.
- Efficiency. Controlling the number of abstractions should naturally keep Mu
  using far less than the gigabytes of memory modern computers have.
- Portability. Mu will run on any computer as long as it's x86. I will
  enthusiastically contribute to support for other processors -- in separate
  forks. Readers shouldn't have to think about processors they don't have.
- Compatibility. The goal is to get off mainstream stacks, not to perpetuate
  them. Sometimes the right long-term solution is to [bump the major version number](http://akkartik.name/post/versioning).
- Syntax. Mu code is meant to be comprehended by [running, not just reading](http://akkartik.name/post/comprehension).
  It will always be just a thin memory-safe veneer over machine code.
  I don't know how to make higher-level notations both fast and
  comprehensible, so they are likely to remain slow and comprehensible, useful
  for prototyping but invariably needing to be rewritten in statements that
  map 1:1 with machine code. The goal of a prototype should be a risk-free
  rewrite, thanks to tests that capture all the details of lessons learned.

## Toolchain

The Mu stack consists of:
- the Mu type-safe and memory-safe language;
- SubX, an unsafe notation for a subset of x86 machine code; and
- _bare_ SubX, a more rudimentary form of SubX without certain syntax sugar.

All Mu programs get translated through these layers into tiny zero-dependency
binaries that run natively. The translators for most levels are built out of
lower levels. The translator from Mu to SubX is written in SubX, and the
translator from SubX to bare SubX is built in bare SubX. There is also an
emulator for Mu's supported subset of x86, which is useful for [debugging SubX
programs](linux/subx_debugging.md).

Mu programs build natively either on Linux or on Windows using [WSL 2](https://docs.microsoft.com/en-us/windows/wsl/install-win10).
For Macs and other Unix-like systems, use the (much slower) emulator:

```sh
./translate_emulated apps/ex2.mu  # 2-5 minutes to emit code.img
```

(macOS may require either editing `translate_emulated` or installing GNU
coreutils. Look in the script if you get an error about `stat`.)

Mu programs can be written for two very different environments:

* At the top level, Mu programs emit a bootable image that runs without an OS
  (under emulation; I haven't tested on native hardware yet). There's rudimentary
  support for some core peripherals: a 1024x768 screen, a keyboard with some
  key-combinations, a PS/2 mouse that must be polled, a slow ATA disk drive.
  No hardware acceleration, no virtual memory, no process separation, no
  multi-tasking, no network. Boot always runs all tests, and only gets to
  `main` if all tests pass.

* The top level is built using tools created under the `linux/` sub-directory.
  This sub-directory contains an entirely separate set of libraries intended
  for building programs that run with just a Linux kernel, reading from stdin
  and writing to stdout. The Mu compiler is such a program, at `linux/mu.subx`.
  Individual programs typically run tests if given a command-line argument
  called `test`.
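
The two testing conventions above can be sketched as follows. This is Python, not Mu or SubX, and `run_tests`, `boot` and `linux_entry` are hypothetical names used only for illustration; the real behavior lives in Mu's boot code and in each program's entry point.

```python
def run_tests(tests):
    """Run each check function and return the names of those that failed."""
    failed = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failed.append(test.__name__)
    return failed


def boot(tests, main):
    """Bare-metal style: always run all tests, reach main only if all pass."""
    failed = run_tests(tests)
    if not failed:
        main()
    return failed


def linux_entry(argv, tests, main):
    """Linux style: run tests only when the first argument is `test`."""
    if argv[1:2] == ["test"]:
        return run_tests(tests)
    main()
    return []


def passing_check():
    assert 0x1 + 0x1 == 0x2


def failing_check():
    assert 0x1 + 0x1 == 0x3


ran = []
assert boot([passing_check], lambda: ran.append("main")) == []
assert ran == ["main"]
assert boot([passing_check, failing_check], lambda: ran.append("main")) == ["failing_check"]
assert ran == ["main"]  # main was not reached the second time
```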

The largest program built in Mu today is its prototyping environment for
writing slow, interpreted programs in a Lisp-based high-level language.

<img alt='screenshot of the Mu shell' src='html/20210624-shell.png'>

(For more details, see [the `shell/` directory.](https://github.com/akkartik/mu/tree/main/shell#readme))

While I currently focus on programs without an OS, the `linux/` sub-directory
is fairly ergonomic. There are a couple dozen example programs to try out
there. It is likely to be the option for a network stack in the foreseeable
future; I have no idea how to interact on the network without Linux.

## Syntax

The entire stack shares certain properties and conventions. Programs consist
of functions and functions consist of statements, each performing a single
operation. Operands to statements are always variables or constants. You can't
perform `a + b*c` in a single statement; you have to break it up into two.
Variables can live in memory or in registers. Registers must be explicitly
specified. There are some shared lexical rules. Comments always start with
'#'. Numbers are always written in hex. Many terms can have context-dependent
_metadata_ attached after '/'.
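
As a rough illustration of the one-operation-per-statement rule, here is `a + b*c` broken into two steps. The sketch is in Python, not Mu, and the variable names are made up; it only shows the shape of the decomposition, with numbers written in hex as they would be in Mu.

```python
# Numbers are written in hex, as they would be in Mu and SubX.
a, b, c = 0x2, 0x3, 0x10

# One operation per statement: first the multiply, then the add.
tmp = b * c        # tmp is 0x30
result = a + tmp   # result is 0x32

assert result == a + b * c == 0x32
```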

Here's an example program in Mu:

<img alt='ex2.mu' src='html/ex2.mu.png' width='400px'>

More resources on Mu:

* [Mu syntax reference](mu.md)

* [Library reference.](vocabulary.md) Mu programs can transparently call
  low-level functions written in SubX.

Here's an example program in SubX:

```sh
== code
Entry:
  # ebx = 1
  bb/copy-to-ebx  1/imm32
  # increment ebx
  43/increment-ebx
  # exit(ebx)
  e8/call  syscall_exit/disp32
```
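
To show how the words in this example read, here is a small Python sketch that splits a line into a datum and its `/`-separated metadata, then encodes the first instruction by hand. It is not how the SubX translator is implemented, and `parse_words` and `imm32_bytes` are hypothetical helpers. It assumes, going by the example above, that metadata such as `imm32` gives the width of a datum while metadata such as `copy-to-ebx` is documentation for the reader.

```python
def parse_words(line):
    """Split one SubX-style line into (datum, [metadata...]) pairs.

    Sketch only: drops a trailing '#' comment, splits on whitespace, then
    splits each word on '/'. String literals are not handled.
    """
    code = line.split("#", 1)[0]
    words = []
    for word in code.split():
        datum, *metadata = word.split("/")
        words.append((datum, metadata))
    return words


def imm32_bytes(datum):
    """Encode a hex datum as 4 little-endian bytes, as x86 stores a 32-bit immediate."""
    return int(datum, 16).to_bytes(4, "little")


words = parse_words("bb/copy-to-ebx  1/imm32  # ebx = 1")
assert words == [("bb", ["copy-to-ebx"]), ("1", ["imm32"])]

# Opcode byte 0xbb followed by the 32-bit immediate 1.
opcode = bytes([int(words[0][0], 16)])
encoded = opcode + imm32_bytes(words[1][0])
assert encoded.hex() == "bb01000000"
```

The five bytes `bb 01 00 00 00` are the x86 encoding for copying the 32-bit immediate 1 into `ebx`, which is why the SubX line can be read almost directly as machine code.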

More resources on SubX:

* [SubX syntax reference](subx.md)

* [Some starter exercises for learning SubX](https://github.com/akkartik/mu/pulls)
  (labelled `hello`). Feel free to [ping me](mailto:ak@akkartik.com) with any
  questions.

* The [list of x86 opcodes](subx_opcodes) supported in SubX: `linux/bootstrap/bootstrap help opcodes`.

* [Some tips for debugging SubX programs.](linux/subx_debugging.md)

## Mirrors and Forks

As of 2022-01, updates to Mu can be downloaded from the following mirrors:
* https://github.com/akkartik/mu
* https://repo.or.cz/mu.git
* https://codeberg.org/akkartik/mu
* https://tildegit.org/akkartik/mu
* https://git.tilde.institute/akkartik/mu
* https://git.sr.ht/~akkartik/mu

Forks of Mu are encouraged. If you don't like something about this repo, feel
free to make a fork. If you show it to me, I'll link to it here. I might even
pull features upstream!

- [uCISC](https://github.com/grokthis/ucisc): a 16-bit processor being
  designed from scratch by [Robert Butler](https://www.youtube.com/channel/UCh4OpfF7T7UtezGejRTLxCw)
  and programmed with a SubX-like syntax.
- [subv](https://git.s-ol.nu/subv): experimental SubX-like syntax by [s-ol
  bekic](https://mmm.s-ol.nu) for the RISC-V instruction set.
- [mu-x86\_64](https://git.sr.ht/~akkartik/mu-x86_64): experimental fork for
  64-bit x86 in collaboration with [Max Bernstein](https://bernsteinbear.com).
  It's brought up a few concrete open problems that I don't have good solutions
  for yet.
- [mu-normie](https://git.sr.ht/~akkartik/mu-normie): with a more standard
  build system for the `linux/bootstrap/` directory that organizes the repo by
  header files and compilation units. Stays in sync with this repo.

## Desiderata

If you're still reading, here are some more things to check out:

- [A slow guided tour of Mu.](tutorial/index.md)

- [How to get your text editor set up for Mu and SubX programs.](editor/editor.md)

- [Some videos demonstrating Mu programs and features.](https://archive.org/details/@kartik_agaram)

- [A summary](mu_instructions) of how the Mu compiler translates statements
  to SubX. Most Mu statements map to a single x86 instruction.
  ([colorized version](http://akkartik.github.io/mu/html/mu_instructions.html))

- A prototype live-updating programming environment for a postfix language
  that I might work on again one day:

  ```sh
  cd linux
  ./translate tile/*.mu
  ./a.elf screen
  ```

- Previous prototypes: [mu0](https://github.com/akkartik/mu0), [mu1](https://github.com/akkartik/mu1).

## Credits

Mu builds on many ideas that have come before, especially:

- [Peter Naur](http://akkartik.name/naur.pdf) for articulating the paramount
  problem of programming: communicating a codebase to others;
- [Christopher Alexander](http://www.amazon.com/Notes-Synthesis-Form-Harvard-Paperbacks/dp/0674627512)
  and [Richard Gabriel](https://www.dreamsongs.com/Files/PatternsOfSoftware.pdf) for
  the intellectual tools for reasoning about the higher order design of a
  codebase;
- [David Parnas](http://www.cs.umd.edu/class/spring2003/cmsc838p/Design/criteria.pdf)
  and others for highlighting the value of separating concerns and stepwise
  refinement;
- The folklore of debugging by print and the trace facility in many Lisp
  systems;
- Automated tests for showing the value of developing programs inside an
  elaborate harness.

On a more tactical level, this project has made progress in a series of bursts
as I discovered the following resources. In autobiographical order, with no
claims of completeness:
- [&ldquo;Bootstrapping a compiler from nothing&rdquo;](http://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html) by Edmund Grimley-Evans.
- [StoneKnifeForth](https://github.com/kragen/stoneknifeforth) by [Kragen Sitaker](http://canonical.org/~kragen),
  including [a tiny sketch of an ELF loader](https://github.com/kragen/stoneknifeforth/blob/master/386.c).
- [&ldquo;Creating tiny ELF executables&rdquo;](https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html) by Brian Raiter.
- [Single-page cheatsheet for the x86 ISA](https://net.cs.uni-bonn.de/fileadmin/user_upload/plohmann/x86_opcode_structure_and_instruction_overview.pdf)
  by Daniel Plohmann ([cached local copy](https://github.com/akkartik/mu/blob/main/cheatsheet.pdf)).
- [Minimal Linux Live](http://minimal.linux-bg.org) for teaching how to create
  a bootable disk image using the syslinux bootloader.
- [&ldquo;Writing a bootloader from scratch&rdquo;](https://www.cs.bham.ac.uk/~exr/lectures/opsys/10_11/lectures/os-dev.pdf)
  by Nick Blundell.
- Wikipedia on BIOS interfaces: [Int 10h](https://en.wikipedia.org/wiki/INT_10H), [Int 13h](https://en.wikipedia.org/wiki/INT_13H).
- [Some tips on programming bootloaders](https://stackoverflow.com/questions/43786251/int-13h-42h-doesnt-load-anything-in-bochs/43787939#43787939)
  by Michael Petch.
- [xv6, the port of Unix Version 6 to x86 processors](https://github.com/mit-pdos/xv6-public).
- Some tips on handling keyboard interrupts by [Alex Dzyoba](https://alex.dzyoba.com/blog/os-interrupts)
  and [Michael Petch](https://stackoverflow.com/questions/37618111/keyboard-irq-within-an-x86-kernel).