1 files changed, 223 insertions, 253 deletions
diff --git a/subx/Readme.md b/subx/Readme.md
index 612081dc..d9a6cde4 100644
--- a/subx/Readme.md
+++ b/subx/Readme.md
@@ -1,274 +1,227 @@
-## SubX: a simplistic assembly language
+## A minimalist assembly language
 
-SubX is a minimalist assembly language designed:
-* to explore ways to turn arbitrary manual tests into reproducible automated
-  tests,
-* to be easy to implement in itself, and
-* to help learn and teach the x86 instruction set.
+SubX is a simple, minimalist stack for programming your computer.
 
-```
-$ git clone https://github.com/akkartik/mu
-$ cd mu/subx
-$ ./subx  # print out a help message
-```
+  ```
+  $ git clone https://github.com/akkartik/mu
+  $ cd mu/subx
+  $ ./subx  # print out a help message
+  ```
 
 [![Build Status](https://api.travis-ci.org/akkartik/mu.svg)](https://travis-ci.org/akkartik/mu)
 
-Expanding on the first bullet, it hopes to support more comprehensive tests
-by:
-
-0. Running generated binaries in _emulated mode_. Emulated mode is slower than
-   native execution (which will also work), but there's more sanity checking,
-   and more descriptive error messages for common low-level problems.
-
-   ```sh
-   $ ./subx translate examples/ex1.subx -o examples/ex1
-   $ ./examples/ex1  # only on Linux
-   $ echo $?
-   42
-   $ ./subx run examples/ex1  # on Linux or BSD or OS X
-   $ echo $?
-   42
-   ```
-
-   The assembly syntax is designed so the assembler (`subx translate`) has
-   very little to do, making it feasible to reimplement in itself. Programmers
-   have to explicitly specify all opcodes and operands.
-
-   ```sh (just for syntax highlighting)
-   # exit(42)
-   bb/copy-to-EBX  0x2a/imm32  # 42 in hex
-   b8/copy-to-EAX  1/imm32/exit
-   cd/syscall  0x80/imm8
-   ```
-
-   To keep code readable you can add _metadata_ to any word after a `/`.
-   Metadata can be just comments for readers, and they'll be ignored. They can
-   also trigger checks. Here, tagging operands with the `imm32` type allows
-   SubX to check that instructions have precisely the operand types they
-   should. x86 instructions have 14 types of operands, and missing one causes
-   all future instructions to go off the rails, interpreting operands as
-   opcodes and vice versa. So this is a useful check.
-
-1. Designing testable wrappers for operating system interfaces. For example,
-   it can `read()` from or `write()` to fake in-memory files in tests. More
-   details [below](#subx-library). We are continuing to port syscalls from
-   [the old Mu VM in the parent directory](https://github.com/akkartik/mu).
-
-2. Supporting a special _trace_ stream in addition to the default `stdin`,
-   `stdout` and `stderr` streams. The trace stream is designed for programs to
-   emit structured facts they deduce about their domain as they execute. Tests
-   can then check the set of facts deduced in addition to the results of the
-   function under test. This form of _automated whitebox testing_ permits
-   writing tests for performance, fault tolerance, deadlock-freedom, memory
-   usage, etc. For example, if a sort function traces each swap, a performance
-   test could check that the number of swaps doesn't quadruple when the size
-   of the input doubles.
-
-The hypothesis is that designing the entire system to be testable from day 1
-and from the ground up would radically impact the culture of an eco-system in
-a way that no bolted-on tool or service at higher levels can replicate. It
-would make it easier to write programs that can be [easily understood by newcomers](http://akkartik.name/about).
-It would reassure authors that an app is free from regression if all automated
-tests pass. It would make the stack easy to rewrite and simplify by dropping
-features, without fear that a subset of targeted apps might break. As a result
-people might fork projects more easily, and also exchange code between
-disparate forks more easily (copy the tests over, then try copying code over
-and making tests pass, rewriting and polishing where necessary). The community
-would have in effect a diversified portfolio of forks, a “wavefront” of
-possible combinations of features and alternative implementations of features
-instead of the single trunk with monotonically growing complexity that we get
-today. Application writers who wrote thorough tests for their apps (something
-they just can’t do today) would be able to bounce around between forks more
-easily without getting locked in to a single one as currently happens.
-
-However, that vision is far away, and SubX is just a first, hesitant step.
-SubX supports a small, regular subset of the 32-bit x86 instruction set.
-(Think of the name as short for "sub-x86".)
-
-  - Only instructions that operate on the 32-bit integer E\*X registers, and a
-    couple of instructions for operating on 8-bit values. No floating-point
-    yet. Most legacy registers will never be supported.
-
-  - Only instructions that assume a flat address space; legacy instructions
-    that use segment registers will never be supported.
-
-  - No instructions that check the carry or parity flags; arithmetic operations
-    always operate on signed integers (while bitwise operations always operate
-    on unsigned integers).
-
-  - Only relative jump instructions (with 8-bit or 32-bit offsets).
-
-The (rudimentary, statically linked) ELF binaries SubX generates can be run
-natively on Linux, and they require only the Linux kernel.
-
-## Status
-
-I'm currently implementing SubX in SubX in 3 phases:
-
-  1. Converting ascii hex bytes to binary. (✓)
-  2. Packing bitfields for x86 instructions into bytes. (80% complete)
-  3. Replacing addresses with labels.
-
-In parallel, I'm designing testable wrappers for syscalls, particularly for
-scalably running blocking syscalls with a test harness concurrently monitoring
-their progress.
-
-## An example program
-
-In the interest of minimalism, SubX requires more knowledge than traditional
-assembly languages of the x86 instructions it supports. Here's an example
-SubX program, using one line per instruction:
+You can generate native ELF binaries with it that run on a bare Linux
+kernel. No other dependencies needed.
 
-<img alt='examples/ex3.subx' src='../html/subx/ex3.png'>
-
-This program sums the first 10 natural numbers. By convention I use horizontal
-tabstops to help read instructions, dots to help follow the long lines,
-comments before groups of instructions to describe their high-level purpose,
-and comments at the end of complex instructions to state the low-level
-operation they perform. Numbers are always in hexadecimal (base 16); the '0x'
-prefix is optional, and I tend to include it as a reminder when numbers look
-like decimal numbers or words.
+  ```sh
+  $ ./subx translate examples/ex1.subx -o examples/ex1
+  $ ./examples/ex1  # only on Linux
+  $ echo $?
+  42
+  ```
 
-As you can see, programming in SubX requires the programmer to know the (kinda
-complex) structure of x86 instructions, all the different operands that an
-instruction can have, their layout in bytes (for example, the `subop` and
-`r32` fields use the same bits, so an instruction can't have both; more on
-this below), the opcodes for supported instructions, and so on.
+You can emulate programs on an interpreter/VM for better error messages.
 
-While SubX syntax is fairly dumb, the error-checking is relatively smart. I
-try to provide clear error messages on instructions missing operands or having
-unexpected operands. Either case would otherwise cause instruction boundaries
-to diverge from what you expect, and potentially lead to errors far away. It's
-useful to catch such errors early.
+  ```sh
+  $ ./subx run examples/ex1  # on Linux or BSD or OS X
+  $ echo $?
+  42
+  ```
 
-Try running this example now:
+Emulated runs generate a trace that permits [time-travel debugging](https://github.com/akkartik/mu/blob/master/browse_trace/Readme.md).
 
-```sh
-$ ./subx translate examples/ex3.subx -o examples/ex3
-$ ./subx run examples/ex3
-$ echo $?
-55
-```
+  ```sh
+  $ ./subx --map translate examples/factorial.subx -o examples/factorial
+  $ ./subx --map --trace run examples/factorial
+  saving trace to 'last_run'
+  $ ../browse_trace/browse_trace last_run  # text-mode debugger UI
+  ```
 
-If you're on Linux you can also run it natively:
+You can write tests for your assembly programs. The entire stack is thoroughly
+covered by automated tests. SubX's tagline: tests before syntax.
 
-```sh
-$ ./examples/ex3
-$ echo $?
-55
-```
+  ```sh
+  $ ./subx test
+  $ ./subx run apps/factorial test
+  ```
 
-The rest of this Readme elaborates on the syntax for SubX programs, starting
-with a few prerequisites about the x86 instruction set.
+You can use it to learn about the x86 processor that (almost certainly) runs
+your computer. (See below.)
 
-## A quick tour of the x86 instruction set
+You can read its tiny zero-dependency internals and understand how they work.
+You can hack on it, and its thorough tests will raise the alarm when you break
+something.
 
-The [Intel processor manual](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf)
-is the final source of truth on the x86 instruction set, but it can be
-forbidding to make sense of, so here's a quick orientation. You will need
-familiarity with binary and hexadecimal encodings (starting with '0x') for
-numbers, and maybe a few other things. Email [me](mailto:mu@akkartik.com)
-any time if something isn't clear. I love explaining this stuff for as long as
-it takes.
+Eventually you will be able to program in higher-level notations. But you'll
+always have tests as guardrails and traces for inspecting runs. The entire
+stack will always be designed for others to comprehend. You'll always be
+empowered to understand how things work, and change what doesn't work for you.
+You'll always be expected to make small changes during upgrades.
 
-The x86 instructions SubX supports can take anywhere from 1 to 13 bytes. Early
-bytes affect what later bytes mean and where an instruction ends. Here's the
-big picture of a single x86 instruction from the Intel manual:
+## What it looks like
 
-<img alt='x86 instruction structure' src='../html/subx/encoding.png'>
+Here is the first example we ran above, a program that just returns 42:
 
-There's a lot here, so let's unpack it piece by piece:
+  ```sh
+  bb/copy-to-EBX  0x2a/imm32  # 42 in hex
+  b8/copy-to-EAX  1/imm32/exit
+  cd/syscall  0x80/imm8
+  ```
 
-* The prefix bytes are not used by SubX, so ignore them.
+Every line contains at most one instruction. Instructions consist of words
+separated by whitespace. Words may be _opcodes_ (defining the operation being
+performed) or _arguments_ (specifying the data the operation acts on). Any
+word can have extra _metadata_ attached to it after `/`. Some metadata is
+required (like the `/imm32` and `/imm8` above), but unrecognized metadata is
+silently skipped so you can attach comments to words (like the instruction
+name `/copy-to-EAX` above, or the `/exit` operand).
+
+SubX doesn't provide much syntax (there aren't even the usual mnemonics for
+opcodes), but it _does_ provide error-checking. If you miss an operand or
+accidentally add an extra operand you'll get a nice error. SubX won't arbitrarily
+interpret bytes of data as instructions or vice versa.
+
+So much for syntax. What do all these numbers actually _mean_? SubX supports a
+small subset of the 32-bit x86 instruction set that likely runs on your
+computer. (Think of the name as short for "sub-x86".) Instructions operate on
+a few registers:
+
+* 6 general-purpose 32-bit registers: EAX, EBX, ECX, EDX, ESI and EDI
+* 2 additional 32-bit registers: ESP and EBP. I suggest you only use these to
+  manage the call stack.
+* 3 single-bit _flag_ registers for conditional branching:
+  - zero/equal flag ZF
+  - sign flag SF
+  - overflow flag OF
+
+SubX programs consist of instructions like `89/copy`, `01/add`, `39/compare`
+and `52/push-ECX` which modify these registers as well as a byte-addressable
+memory. For a complete list of supported instructions, run `subx help opcodes`.
+
+(SubX doesn't support floating-point registers yet. Intel processors support
+an 8-bit mode, 16-bit mode and 64-bit mode. SubX will never support them.
+There are other flags. SubX will never support them. There are also _many_
+more instructions that SubX will never support.)
+
+It's worth distinguishing between an instruction's _operands_ and its _arguments_.
+Arguments are provided directly in instructions. Operands are pieces of data
+in registers or memory that are operated on by instructions. Intel processors
+determine operands from arguments in fairly complex ways.
+
+## Lengthy interlude: How x86 instructions compute operands
 
-* The opcode bytes encode the instruction used. Ignore their internal structure;
-  we'll just treat them as a sequence of whole bytes. The opcode sequences
-  SubX recognizes are enumerated by running `subx help opcodes`. For more
-  details on a specific opcode, consult html guides like https://c9x.me/x86 or
-  the Intel manual.
+The [Intel processor manual](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf)
+is the final source of truth on the x86 instruction set, but it can be
+forbidding to make sense of, so here's a quick orientation. You will need
+familiarity with binary numbers, and maybe a few other things. Email [me](mailto:mu@akkartik.com)
+any time if something isn't clear. I love explaining this stuff for as long as
+it takes. The bad news is that it takes some getting used to. The good news is
+that internalizing the next 500 words will give you a significantly deeper
+understanding of your computer.
+
+Most instructions operate on an operand in a register or memory ('reg/mem'), and
+a second operand in a register. The register operand is specified fairly
+directly using the `/r32` argument. The reg/mem operand, however, gets
+complex. It can be specified by 1-7 arguments, each ranging in size from 2
+bits to 4 bytes.
+
+The key argument that's always present for reg/mem operands is `/mod`, the
+_addressing mode_. This is a 2-bit argument that can take 4 possible values,
+and it determines what other arguments are required, and how to interpret
+them.
+
+* If `/mod` is `3`: the operand is the register described by the `/rm32` bits:
+    - 0 means register `EAX`
+    - 1 means register `ECX`
+    - 2 means register `EDX`
+    - 3 means register `EBX`
+    - 4 means register `ESP`
+    - 5 means register `EBP`
+    - 6 means register `ESI`
+    - 7 means register `EDI`
+
+* If `/mod` is `0`: the operand is at the address provided in the register
+  described by `/rm32`. That's `*rm32` in C syntax.
+
+* If `/mod` is `1`: the operand is at the address provided by adding the
+  (1-byte) displacement to the register in `/rm32`. That's `*(rm32 + disp8)`
+  in C syntax.
+
+* If `/mod` is `2`: the operand is at the address provided by adding the
+  (4-byte) displacement to the register in `/rm32`. That's `*(rm32 + disp32)`
+  in C syntax.
+
+In the last three cases, one exception occurs when the `/rm32` argument
+contains `4`. Rather than encoding register `ESP`, it means the address is
+provided by three _whole new_ arguments (`/base`, `/index` and `/scale`) in a
+_totally_ different way:
 
-* The addressing mode byte is used by all instructions that take an `rm32`
-  operand according to `subx help opcodes`. (That's most instructions.) The
-  `rm32` operand expresses how an instruction should load one 32-bit operand
-  from either a register or memory. It is configured by the addressing mode
-  byte and, optionally, the SIB (scale, index, base) byte as follows:
+  ```
+  reg/mem = *(base + index * 2^scale + displacement)
+  ```
 
-  - if the `mod` (mode) field is `11` (3): the `rm32` operand is the contents
-    of the register described by the `r/m` bits.
-    - `000` (0) means register `EAX`
-    - `001` (1) means register `ECX`
-    - `010` (2) means register `EDX`
-    - `011` (3) means register `EBX`
-    - `100` (4) means register `ESP`
-    - `101` (5) means register `EBP`
-    - `110` (6) means register `ESI`
-    - `111` (7) means register `EDI`
+(There are a couple more exceptions ☹; see [Table 2-2](modrm.pdf) and [Table 2-3](sib.pdf)
+of the Intel manual for the complete story.)
 
-  - if `mod` is `00` (0): `rm32` is the contents of the address provided in the
-    register provided by `r/m`. That's `*r/m` in C syntax.
+Phew, that was a lot to take in. Some examples to work through as you reread
+and digest it:
 
-  - if `mod` is `01` (1): `rm32` is the contents of the address provided by
-    adding the register in `r/m` with the (1-byte) displacement. That's
-    `*(r/m + disp8)` in C syntax.
+1. To read directly from the EAX register, `/mod` must be `3` (direct mode),
+   and `/rm32` must be `0`. There must be no `/base`, `/index` or `/scale`
+   arguments.
 
-  - if `mod` is `10` (2): `rm32` is the contents of the address provided by
-    adding the register in `r/m` with the (4-byte) displacement. That's
-    `*(r/m + disp32)` in C syntax.
+1. To read from `*EAX` (in C syntax), `/mod` must be `0` (indirect mode), and
+   the `/rm32` argument must be `0`. There must be no `/base`, `/index` or
+   `/scale` arguments.
 
-  In the last 3 cases, one exception occurs when the `r/m` field contains
-  `010` (4). Rather than encoding register ESP, that means the address is
-  provided by a SIB byte next:
+1. To read from `*(EAX+4)`, `/mod` must be `1` (indirect + disp8 mode),
+   `/rm32` must be `0`, there must be no `/base`, `/index` or `/scale`
+   arguments, and there must be a single displacement byte containing `4`.
 
-  ```
-  base + index * 2^scale + displacement
-  ```
+1. To read from `*(EAX+ECX+4)`, one approach would be to set `/mod` to `1` as
+   above, `/rm32` to `4` (`/base`, `/index` and `/scale` arguments required),
+   `/base` to `0`, `/index` to `1` (ECX) and a single displacement byte to `4`.
+   (What should the `/scale` bits be? Can you think of another approach?)
 
-  (There are a couple more exceptions ☹; see [Table 2-2](modrm.pdf) and [Table 2-3](sib.pdf)
-  of the Intel manual for the complete story.)
+1. To read from `*(EAX+ECX+1000)`, one approach would be:
+   - `/mod`: `2` (indirect + disp32)
+   - `/rm32`: `4` (`/base`, `/index` and `/scale` arguments required)
+   - `/base`: `0` (EAX)
+   - `/index`: `1` (ECX)
+   - `displacement`: 4 bytes containing `1000`
 
-  Phew, that was a lot to take in. Some examples to work through as you reread
-  and digest it:
+## Putting it all together
 
-  1. To read directly from the EAX register, `mod` must be `11` (direct mode),
-     and the `r/m` bits must be `000` (EAX). There must be no SIB byte.
+Here's a meatier example:
 
-  1. To read from `*EAX` in C syntax, `mod` must be `00` (indirect mode), and
-     the `r/m` bits must be `000`. There must be no SIB byte.
+<img alt='examples/ex3.subx' src='../html/subx/ex3.png'>
 
-  1. To read from `*(EAX+4)`, `mod` must be `01` (indirect + disp8 mode),
-     `r/m` must be `000`, there must be no SIB byte, and there must be a
-     single displacement byte containing `00000010` (4).
+This program sums the first 10 natural numbers. By convention I use horizontal
+tabstops to help read instructions, dots to help follow the long lines,
+comments before groups of instructions to describe their high-level purpose,
+and comments at the end of complex instructions to state the low-level
+operation they perform. Numbers are always in hexadecimal (base 16); the '0x'
+prefix is optional, and I tend to include it as a reminder when numbers look
+like decimal numbers or words.
 
-  1. To read from `*(EAX+ECX+4)`, one approach would be to set `mod` to `01`,
-     `r/m` to `100` (SIB byte next), `base` to `000`, `index` to `001` (ECX)
-     and a single displacement byte to 4. What should the `scale` bits be? Can
-     you think of another approach?
+Try running this example now:
 
-  1. To read from `*(EAX+ECX+0x00f00000)`, one approach would be:
-     - `mod`: `10` (indirect + disp32)
-     - `r/m`: `100` (SIB byte)
-     - `base`: `000` (EAX)
-     - `index`: `001` (ECX)
-     - `displacement`: 4 bytes containing 0x00f00000
+```sh
+$ ./subx translate examples/ex3.subx -o examples/ex3
+$ ./subx run examples/ex3
+$ echo $?
+55
+```
 
-* Back to the instruction picture. We've already covered the SIB byte and most
-  of the addressing mode byte. Instructions can also provide a second operand
-  as either a displacement or immediate value (the two are distinct because
-  some instructions use a displacement as part of `rm32` and an immediate for
-  the other operand).
+If you're on Linux you can also run it natively:
 
-* Finally, the `reg` bits in the addressing mode byte can also encode the
-  second operand. Sometimes they can also be part of the opcode bits. For
-  example, an operand byte of `ff` and `reg` bits of `001` means "increment
-  rm32". (Notice that instructions that use the `reg` bits as a "sub-opcode"
-  cannot also use it as a second operand.)
+```sh
+$ ./examples/ex3
+$ echo $?
+55
+```
 
-That concludes our quick tour. By this point it's probably clear to you that
-the x86 instruction set is overly complicated. Many simpler instruction sets
-exist. However, your computer right now likely runs x86 instructions and not
-them. Internalizing the last 750 words may allow you to program your computer
-fairly directly, with only minimal-going-on-zero reliance on a C compiler.
+Refer back to this example as you follow the more complete tour of SubX syntax below.
 
 ## The syntax of SubX programs
 
@@ -299,23 +252,13 @@ Within the `code` segment, each line contains a comment, label or instruction.
 Comments start with a `#` and are ignored. Labels should always be the first
 word on a line, and they end with a `:`.
 
-Instructions consist of a sequence of words. As mentioned above, each word can
-contain _metadata_ after a `/`. Metadata can be either required by SubX or act
-as a comment for the reader; SubX silently ignores unrecognized metadata. A
-single word can contain multiple pieces of metadata, each starting with a `/`.
-
-The words in an instruction consist of 1-3 opcode bytes, and different kinds
-of operands corresponding to the bitfields in an x86 instruction listed above.
-For error checking, these operands must be tagged with one of the following
-bits of metadata:
-  - `mod`
-  - `rm32` ("r/m" in the x86 instruction diagram above, but we can't use `/`
-    in metadata tags)
-  - `r32` ("reg" in the x86 diagram)
-  - `subop` (for when "reg" in the x86 diagram encodes a sub-opcode rather
-    than an operand)
-  - displacement: `disp8`, `disp16` or `disp32`
-  - immediate: `imm8` or `imm32`
+Instruction arguments must specify their type, from:
+  - `/mod`
+  - `/rm32`
+  - `/r32`
+  - `/subop` (sometimes the `/r32` bits in an instruction are used as an extra opcode)
+  - displacement: `/disp8` or `/disp32`
+  - immediate: `/imm8` or `/imm32`
 
 Different instructions (opcodes) require different operands. SubX will
 validate each instruction in your programs, and raise an error anytime you
@@ -324,10 +267,11 @@ miss or spuriously add an operand.
 I recommend you order operands consistently in your programs. SubX allows
 operands in any order, but only because that's simplest to explain/implement.
 Switching order from instruction to instruction is likely to add to the
-reader's burden. Here's the order I've been using:
+reader's burden. Here's the order I've been using after opcodes:
 
 ```
-/subop  /mod /rm32  /base /index /scale  /r32  /displacement  /immediate
+        |<--------- reg/mem --------->|        |<- reg/mem? ->|
+/subop  /mod /rm32  /base /index /scale  /r32   /displacement   /immediate
 ```
 
 Instructions can refer to labels in displacement or immediate operands, and
@@ -371,6 +315,29 @@ translating the corresponding `.subx` file. The binary should also be natively
 runnable on a Linux system running on Intel x86 processors, either 32- or
 64-bit. If either of these invariants is broken it's a bug on my part.
 
+## Roadmap and status
+
+* Bootstrapping a SubX-\>ELF translator in SubX
+
+  1. [Converting ascii hex bytes to binary.](http://akkartik.github.io/mu/html/subx/apps/hex.subx.html) (✓)
+  2. [Packing bitfields for x86 instructions into bytes.](http://akkartik.github.io/mu/html/subx/apps/pack.subx.html) (✓)
+  3. [Combining segments with the same name.](apps/assort.subx) (30% complete)
+  4. Replacing labels with addresses.
+  5. Support for string literals.
+
+* Testable, dependency-injected vocabulary of primitives
+  - Streams: `read()`, `write()`. (✓)
+  - `exit()` (✓)
+  - Sockets
+  - Files
+  - Concurrency, and a framework for testing blocking code
+
+* Using the trace in [white-box tests](https://git.sr.ht/~akkartik/basic-whitebox-test/tree/master/Readme.md)
+  for performance, fault tolerance, etc.
+
+* Higher-level notations. Like programming languages, but with thinner
+  implementations that you can -- and are expected to! -- modify.
+
 ## Running
 
 Running `subx` will transparently compile it as necessary.
@@ -494,6 +461,9 @@ rudimentary but hopefully still workable toolkit:
   layer. It makes the trace a lot more verbose and a lot less dense, necessitating
   a lot more scrolling around, so I keep it turned off most of the time.
 
+* If the trace seems overwhelming, try [browsing it](https://github.com/akkartik/mu/blob/master/browse_trace/Readme.md)
+  in the 'time-travel debugger'.
+
 Hopefully these hints are enough to get you started. The main thing to
 remember is to not be afraid of modifying the sources. A good debugging
 session gets into a nice rhythm of generating a trace, staring at it for a