## What is this?

A suite of tools for directly programming in (32-bit x86) machine code without
a compiler. The generated ELF binaries require just a Unix-like kernel to run.
(It isn't self-hosted yet, so generating the binaries requires a C++ compiler
and runtime.)

## Why in the world?

1. It seems wrong-headed that our computers look polished but are plagued by
   foundational problems of security and reliability. I'd like to learn to
   walk before I try to run. The plan: start out using the computer only to
   check my program for errors rather than to hide low-level details. Force
   myself to think about security by living with raw machine code for a while.
   Reintroduce high-level languages (HLLs) only after confidence is regained
   in the foundations (and when the foundations are ergonomic enough to
   support developing a compiler in them). Delegate only when I can verify
   with confidence.

2. The software in our computers has grown incomprehensible. Nobody
   understands it all, not even experts. Even simple programs written by a
   single author require lots of time for others to comprehend. Compilers are
   a prime example, growing so complex that programmers have to choose to
   either program them or use them. I think they may also contribute to the
   incomprehensibility of the stack above them. I'd like to explore how much
   of an HLL I can build without a monolithic optimizing compiler, and see if
   deconstructing the work of the compiler can make the stack as a whole more
   comprehensible to others.

3. I want to learn about the internals of the infrastructure we all rely on in
   our lives.

## Running

```
$ git clone https://github.com/akkartik/mu
$ cd mu/subx
$ ./subx
```

Running `subx` will transparently compile it as necessary.

## Usage

`subx` currently has the following sub-commands:

* `subx test`: runs all automated tests.

* `subx translate <input file> <output ELF binary>`: translates a text file
  containing hex bytes and macros into an executable ELF binary.

* `subx run <ELF binary>`: simulates running the ELF binaries emitted by `subx
  translate`. Useful for debugging, and also enables more thorough testing of
  `translate`.

Putting them together, build and run one of the example programs:

<img alt='ex1.1.subx' src='html/ex1.png'>

```
$ ./subx translate ex1.1.subx ex1
$ ./subx run ex1
```

If you're running on Linux, `ex1` will also be runnable directly:
```
$ chmod +x ex1
$ ./ex1
```

There are a few such example programs here. At any commit an example's binary
should be identical bit for bit with the output of translating the .subx file.
The binary should also be natively runnable on a 32-bit Linux system. If
either of these invariants is broken it's a bug on my part. The binary should
also be runnable on a 64-bit Linux system. I can't guarantee it, but I'd
appreciate hearing if it doesn't run.
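
The bit-for-bit invariant is easy to check by hand. Here is a small Python
sketch, assuming you have first translated an example to a scratch file (the
name `ex1.fresh` below is hypothetical) next to the committed `ex1`:

```python
from pathlib import Path


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def check_files(committed: str, fresh: str) -> bool:
    """Compare two files byte for byte and report the first mismatch, if any."""
    offset = first_difference(Path(committed).read_bytes(),
                              Path(fresh).read_bytes())
    if offset is not None:
        print(f"{committed} and {fresh} differ at byte offset {offset}")
    return offset is None
```

After running `./subx translate ex1.1.subx ex1.fresh`, calling
`check_files("ex1", "ex1.fresh")` should return `True` if the invariant holds.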

There are also a few more binaries in the `teensy/` directory, but they are not
guaranteed to be runnable by `subx`. I'm not building general infrastructure
here for all of the x86 ISA and ELF format. SubX is about programming with a
small, regular subset of 32-bit x86:

* Only instructions that operate on the 32-bit E\*X registers. (No
  floating-point yet.)
* Only instructions that assume a flat address space; no instructions that use
  segment registers.
* No instructions that check the carry or parity flags; arithmetic operations
  always operate on signed integers (while bitwise operations always operate
  on unsigned integers).
* Only relative jump instructions (with 8-bit or 16-bit offsets).
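
Relative jumps take some getting used to when writing raw bytes: on x86 the
offset is a signed number added to the address of the *next* instruction, not
the jump itself. A Python sketch of the arithmetic follows; the function names
and the example addresses are mine, chosen only for illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def jump_target(instruction_address: int, instruction_length: int,
                raw_offset: int, offset_bits: int = 8) -> int:
    """Compute where a relative jump lands in a 32-bit address space."""
    next_instruction = instruction_address + instruction_length
    return (next_instruction + sign_extend(raw_offset, offset_bits)) & 0xFFFFFFFF


# `eb 05` at a made-up address: skip 5 bytes past the end of the 2-byte jump.
forward = jump_target(0x08048054, 2, 0x05)
# `eb fe`: offset -2 lands back on the jump itself, i.e. an infinite loop.
loop = jump_target(0x08048054, 2, 0xFE)
```

Here `forward` is `0x0804805b` and `loop` is `0x08048054`.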

The ELF binaries generated are statically linked and missing a lot of advanced
ELF features as well. But they will run.
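
To show how little a statically linked ELF binary needs, here is a Python
sketch that wraps a few instruction bytes in a 52-byte ELF32 header and a
single 32-byte program header. This is not the layout `subx translate` emits;
the load address, the single read+execute segment, and the sample code (an
`exit(42)` via `int 0x80`) are all assumptions made for illustration.

```python
import struct

LOAD_ADDRESS = 0x08048000  # assumed; a conventional load address on i386 Linux
EHDR_SIZE = 52             # size of the ELF32 file header
PHDR_SIZE = 32             # size of one ELF32 program header


def build_elf(code: bytes) -> bytes:
    """Wrap raw 32-bit x86 code in a minimal ELF executable image."""
    entry = LOAD_ADDRESS + EHDR_SIZE + PHDR_SIZE  # code follows both headers
    total = EHDR_SIZE + PHDR_SIZE + len(code)
    # Magic, then 32-bit class, little-endian data, ELF version 1, padding.
    e_ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    ehdr = struct.pack(
        "<16sHHIIIIIHHHHHH",
        e_ident,
        2,          # e_type: ET_EXEC
        3,          # e_machine: EM_386
        1,          # e_version
        entry,      # e_entry
        EHDR_SIZE,  # e_phoff: program header directly after the file header
        0,          # e_shoff: no section headers
        0,          # e_flags
        EHDR_SIZE,  # e_ehsize
        PHDR_SIZE,  # e_phentsize
        1,          # e_phnum
        0, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIIIIIII",
        1,             # p_type: PT_LOAD
        0,             # p_offset: map the file from its start
        LOAD_ADDRESS,  # p_vaddr
        LOAD_ADDRESS,  # p_paddr
        total,         # p_filesz
        total,         # p_memsz
        5,             # p_flags: read + execute
        0x1000,        # p_align
    )
    return ehdr + phdr + code


# Hypothetical payload: EBX = 42, EAX = 1 (exit), int 0x80.
code = bytes.fromhex("bb2a000000 b801000000 cd80")
image = build_elf(code)
```

Writing `image` to a file and marking it executable should give a 96-byte
program that exits with status 42 on a Linux kernel that can run 32-bit
binaries, though I haven't compared it against anything `subx` produces.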

For more details on programming in this subset, consult the online help:
```
$ ./subx help
```

## Resources

* [Single-page cheatsheet for the x86 ISA](https://net.cs.uni-bonn.de/fileadmin/user_upload/plohmann/x86_opcode_structure_and_instruction_overview.pdf)
  (pdf; [cached local copy](https://github.com/akkartik/mu/blob/master/subx/cheatsheet.pdf))
* [Concise reference for the x86 ISA](https://c9x.me/x86)
* [Intel programming manual](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf) (pdf)

## Inspirations

* [“Creating tiny ELF executables”](https://www.muppetlabs.com/~breadbox/software/tiny/teensy.html)
* [“Bootstrapping a compiler from nothing”](http://web.archive.org/web/20061108010907/http://www.rano.org/bcompiler.html)
* Forth implementations like [StoneKnifeForth](https://github.com/kragen/stoneknifeforth)