author Kartik Agaram <vc@akkartik.com> 2018-09-30 22:29:43 -0700
committer Kartik Agaram <vc@akkartik.com> 2018-09-30 22:29:43 -0700
commit 2e693f723de721fff35f74ef1194132d48222615
tree 2f98be9274e2aebaf41faf0ad9d5955e659ebc9d /subx
parent b515f3b70417c502991e3c6443badf32cded6a20
download mu-2e693f723de721fff35f74ef1194132d48222615.tar.gz
4623
Diffstat (limited to 'subx')
-rw-r--r-- subx/Readme.md 162
-rw-r--r-- subx/html/ex1.png bin 140501 -> 0 bytes
2 files changed, 89 insertions, 73 deletions
diff --git a/subx/Readme.md b/subx/Readme.md
index b909f20c..67a7e5e1 100644
--- a/subx/Readme.md
+++ b/subx/Readme.md
@@ -8,8 +8,7 @@ C++ compiler and runtime.)
 
 ## Thin layer of abstraction over machine code, isn't that just an assembler?
 
-Assemblers try to hide the precise instructions emitted from the programmer.
-Consider these instructions in Assembly language:
+Compare some code in Assembly:
 
 ```
 add EBX, ECX
@@ -17,70 +16,84 @@ copy EBX, 0
 copy ECX, 1
 ```
 
-Here are the same instructions in SubX, just a list of numbers (opcodes and
-operands) with metadata 'comments' after a `/`:
+..with the same instructions in SubX:
 
 ```
 01/add 3/mod/direct 3/rm32/ebx 1/r32/ecx
-bb/copy 0/imm32
-b9/copy 1/imm32
+bb/copy-EBX 0/imm32
+b9/copy-ECX 1/imm32
 ```
 
-Notice that a single instruction, say 'copy', maps to multiple opcodes.
-That's just the tip of the iceberg of complexity that Assembly languages deal
-with.
-
-SubX doesn't shield the programmer from these details. Words always contain
-the actual bits or bytes for machine code. But they also can contain metadata
-after slashes, and SubX will run cross-checks and give good error messages
-when there's a discrepancy between code and metadata.
-
-## But why not use an assembler?
-
-The long-term goal is to make programming in machine language ergonomic enough
-that I (or someone else) can build a compiler for a high-level language in it.
-That is, building a compiler without needing a compiler, anywhere among its
-prerequisites.
-
-Assemblers today are complex enough that they're built in a high-level
-language, and need a compiler to build. They also tend to be designed to fit
-into a larger toolchain, to be a back-end for a compiler. Their output is in
-turn often passed to other tools like a linker. The formats that all these
-tools use to talk to each other have grown increasingly complex in the face of
-decades of evolution, usage and backwards-compatibility constraints. All these
-considerations add to the burden of the assembler developer. Building the
-assembler in a high-level language helps face up to them.
-
-Assemblers _do_ often accept a far simpler language, just a file format
-really, variously called 'flat' or 'binary', which gives the programmer
-complete control over the precise bytes in an executable. SubX is basically
-trying to be a more ergonomic flat assembler that will one day be bootstrapped
-from machine code.
-
-## Why in the world?
-
-1. It seems wrong-headed that our computers look polished but are plagued by
-   foundational problems of security and reliability. I'd like to learn to
-   walk before I try to run. The plan: start out using the computer only to
-   check my program for errors rather than to hide low-level details. Force
-   myself to think about security by living with raw machine code for a while.
-   Reintroduce high level languages (HLLs) only after confidence is regained
-   in the foundations (and when the foundations are ergonomic enough to
-   support developing a compiler in them). Delegate only when I can verify
-   with confidence.
-
-2. The software in our computers has grown incomprehensible. Nobody
-   understands it all, not even experts. Even simple programs written by a
-   single author require lots of time for others to comprehend. Compilers are
-   a prime example, growing so complex that programmers have to choose to
-   either program them or use them. I think they may also contribute to the
-   incomprehensibility of the stack above them. I'd like to explore how much
-   of a HLL I can build without a monolithic optimizing compiler, and see if
-   deconstructing the work of the compiler can make the stack as a whole more
-   comprehensible to others.
-
-3. I want to learn about the internals of the infrastructure we all rely on in
-   our lives.
+Assembly is pretty low-level, but SubX makes Assembly look like the gleaming
+chrome of the Starship Enterprise. Opcodes for instructions are explicit, as
+are addressing modes and the precise bit fields used to encode them. There is
+no portability. Only a subset of x86 is supported, so there's no backwards
+compatibility either, zero interoperability with existing libraries. Only
+statically linked libraries are supported, so the kernel will inefficiently
+juggle multiple copies of the same libraries in RAM.
+
+In exchange for these drawbacks, SubX will hopefully be simpler to implement.
+Ideally in itself.
+
+I'm also hoping that SubX will be simpler to program in, that it will fit a
+programmer's head better in spite of the lack of syntax. Modern Assembly
+supports 50+ years of accretions in the x86 ISA and 40+ years of accumulated
+cruft in the toolchain (standard library, ELF format, binutils, linker,
+loader).
+
+You may say I just don't understand the toolchain well enough. And that's the
+point. I tried, and I failed. Each package above has only a piece of the
+puzzle. Learning each of the above tools takes time; figuring out how they all
+work together is not a well-supported activity.
+
+My hypothesis is that _it's easier to understand a coherent system written in
+machine code than an incoherent system in a high-level language._ To test this
+hypothesis, I plan to take a hatchet to [anything I don't understand](https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence),
+but to take full ownership of what's left. Not just how it runs, but the
+experience of programming with it. A few basic mechanisms can hopefully be put
+together into a more self-explanatory system:
+
+a) Metadata. In the example above, words after a slash (`/`) act as metadata
+that doesn't make it into the final binary. Metadata can act as comments for
+readers, or as directives for tools operating on SubX code. Programmers will
+be encouraged to create new tools of their own.
+
+b) Checks. While SubX doesn't provide syntax, it tries to provide good
+guardrails for invalid programs. Metadata specifies which field of an instruction
+each operand is intended for. Missing operands are caught before they can
+silently mislead instruction decoding. Instructions with unexpected operand
+types are immediately flagged. SubX includes an emulator for a subset of x86,
+which provides better error messages than native execution for certain kinds
+of bad binaries.
+
+c) A test harness. SubX includes automated tests from the start, and the
+entire stack is designed to be easy to test. We will provide wrappers for OS
+syscalls that allow fakes to be _dependency-injected_ in, expanding the kinds
+of tests that can be written. See [the earlier Mu interpreter](https://github.com/akkartik/mu#readme)
+for more details.
+
+d) Traces of execution. Writing good error messages for a compiler is a hard
+problem, and it can add complexity. We'd like to keep things ergonomic with a
+minimum of code, so we will provide a _trace browser_ that allows programmers
+to scan the trace of events emitted by SubX leading up to an error message,
+drilling down into details as needed. Traces will also be available in tests,
+enabling testing for cross-cutting concerns like performance, race conditions,
+precise error messages displayed on screen, and so on. The effect is again to
+expand the kinds of tests that can be written. [More details.](http://akkartik.name/about)
+
+e) Incremental construction. SubX programs are translated into monolithic ELF
+binaries, but you will be able to build just a subset of their code (denominated
+in _layers_), and get a running program that passes all its automated tests.
+[More details.](https://akkartik.name/post/wart-layers)
+
+It seems wrong-headed that our computers look polished but are plagued by
+foundational problems of security and reliability. I'd like to learn to walk
+before I try to run. The plan: start out using the computer only to check my
+program for errors rather than to hide low-level details. Force myself to
+think about security by living with raw machine code for a while. Reintroduce
+high level languages (HLLs) only after confidence is regained in the foundations
+(and when the foundations are ergonomic enough to support developing a
+compiler in them). Delegate only when I can verify with confidence.
 
 ## Running
 
@@ -107,31 +120,34 @@ Running `subx` will transparently compile it as necessary.
 
 Putting them together, build and run one of the example programs:
 
-<img alt='examples/ex1.1.subx' src='html/ex1.png'>
+<img alt='apps/factorial.subx' src='../html/subx/factorial.png'>
 
 ```
-$ ./subx translate examples/ex1.1.subx examples/ex1
-$ ./subx run examples/ex1
+$ ./subx translate apps/factorial.subx apps/factorial
+$ ./subx run apps/factorial  # returns the factorial of 5
+$ echo $?
+120  
 ```
 
-If you're running on Linux, `ex1` will also be runnable directly:
+If you're running on Linux, `factorial` will also be runnable directly:
 ```
-$ examples/ex1
+$ apps/factorial
 ```
 
-There are a few such example programs in the examples/ directory. At any
-commit an example's binary should be identical bit for bit with the output of
-translating the .subx file. The binary should also be natively runnable on a
-32-bit Linux system. If either of these invariants is broken it's a bug on my
-part. The binary should also be runnable on a 64-bit Linux system. I can't
-guarantee it, but I'd appreciate hearing if it doesn't run.
+The `examples/` directory shows some simpler programs giving a more gradual
+introduction to SubX features. The repo includes the binary for all examples.
+At any commit an example's binary should be identical bit for bit with the
+result of translating the .subx file. The binary should also be natively
+runnable on a 32-bit Linux system. If either of these invariants is broken
+it's a bug on my part. The binary should also be runnable on a 64-bit Linux
+system. I can't guarantee it, but I'd appreciate hearing if it doesn't run.
 
 However, not all 32-bit Linux binaries are guaranteed to be runnable by
 `subx`. I'm not building general infrastructure here for all of the x86 ISA
 and ELF format. SubX is about programming with a small, regular subset of
 32-bit x86:
 
-* Only instructions that operate on the 32-bit E\*X registers. (No
+* Only instructions that operate on the 32-bit integer E\*X registers. (No
   floating-point yet.)
 * Only instructions that assume a flat address space; no instructions that use
   segment registers.
diff --git a/subx/html/ex1.png b/subx/html/ex1.png
deleted file mode 100644
index c491c471..00000000
--- a/subx/html/ex1.png
+++ /dev/null
Binary files differ
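
The Readme text added in this commit says that words after a slash are metadata that never reaches the binary, and that metadata names the instruction field each operand is meant for. As a rough illustration of that idea, here is a minimal Python sketch that turns the three example lines into bytes. It is not SubX's actual translator: the function name `translate`, the `MODRM_FIELDS` table, and the error messages are hypothetical, and it assumes that values are written in hexadecimal and that only the tags `mod`, `rm32`, `r32` and `imm32` carry meaning, with every other tag treated as a comment.

```python
import struct

# Hypothetical table: ModR/M field name -> width in bits.
MODRM_FIELDS = {"mod": 2, "r32": 3, "rm32": 3}

def translate(line):
    """Translate one line of SubX-style words into machine-code bytes."""
    opcode, modrm, imm = bytearray(), {}, bytearray()
    for word in line.split():
        value_text, *metadata = word.split("/")
        value = int(value_text, 16)  # assumption: values are hexadecimal
        field = next((m for m in metadata if m in MODRM_FIELDS), None)
        if field is not None:
            # Operand destined for a bit field of the ModR/M byte.
            if value >= 1 << MODRM_FIELDS[field]:
                raise ValueError(f"{word!r}: value too large for {field}")
            modrm[field] = value
        elif "imm32" in metadata:
            # 32-bit immediate, stored little-endian.
            imm += struct.pack("<I", value)
        else:
            # Anything else is an opcode byte; its tags are only comments.
            opcode.append(value)
    if modrm:
        # A simple check: all three ModR/M fields must be present.
        missing = sorted(set(MODRM_FIELDS) - set(modrm))
        if missing:
            raise ValueError(f"missing ModR/M operands: {missing}")
        opcode.append(modrm["mod"] << 6 | modrm["r32"] << 3 | modrm["rm32"])
    return bytes(opcode + imm)

for line in [
    "01/add 3/mod/direct 3/rm32/ebx 1/r32/ecx",
    "bb/copy-EBX 0/imm32",
    "b9/copy-ECX 1/imm32",
]:
    print(translate(line).hex(" "))
# 01 cb
# bb 00 00 00 00
# b9 01 00 00 00
```

Dropping an operand, for example `translate("01/add 3/mod/direct 3/rm32/ebx")`, raises a `ValueError` instead of silently emitting a shorter instruction, which is the kind of guardrail the Readme describes under "Checks".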