author Kartik Agaram <vc@akkartik.com> 2018-07-24 16:06:43 -0700
committer Kartik Agaram <vc@akkartik.com> 2018-07-24 16:06:43 -0700
commit d57d39cb3f867d6e33ffa8e4cdc7d0ca749cb75c (patch)
tree 1800345f57b37202c18218a78b1bb21496460a0f
parent 1faa9790960043f5fbad482cd4b2cf4a6c50efd8 (diff)
download mu-d57d39cb3f867d6e33ffa8e4cdc7d0ca749cb75c.tar.gz
4395
-rw-r--r-- subx/010vm.cc 6
-rw-r--r-- subx/011parse.cc 3
-rw-r--r-- subx/021translate.cc 26
-rw-r--r-- subx/022check_instruction.cc 39
4 files changed, 64 insertions, 10 deletions
diff --git a/subx/010vm.cc b/subx/010vm.cc
index 81a8c12b..d2f39e90 100644
--- a/subx/010vm.cc
+++ b/subx/010vm.cc
@@ -1,4 +1,8 @@
 //: Core data structures for simulating the SubX VM (subset of an x86 processor)
+//:
+//: At the lowest level ("level 1") of abstraction, SubX executes x86
+//: instructions provided in the form of an array of bytes, loaded into memory
+//: starting at a specific address.
 
 //:: registers
 //: assume segment registers are hard-coded to 0
@@ -32,7 +36,7 @@ cerr << "  registers\n";
 :(before "End Help Texts")
 put(Help, "registers",
   "SubX currently supports eight 32-bit integer registers: R0 to R7.\n"
-  "R4 contains the top of the stack.\n"
+  "R4 (ESP) contains the top of the stack.\n"
   "There's also a register for the address of the currently executing instruction. It is modified by jumps.\n"
   "Various instructions modify one or more of three 1-bit 'flag' registers, as a side-effect:\n"
   "- the sign flag (SF): usually set if an arithmetic result is negative, or reset if not.\n"
diff --git a/subx/011parse.cc b/subx/011parse.cc
index b37eb1cf..189f369d 100644
--- a/subx/011parse.cc
+++ b/subx/011parse.cc
@@ -10,6 +10,9 @@ put(Help, "syntax",
   "A good rule of thumb is to try to start the first segment at the default address of 0x08048000, and to start each subsequent segment at least 0x1000 (most common page size) bytes after the last.\n"
   "If a segment occupies than 0x1000 bytes you'll need to push subsequent segments further down.\n"
   "Currently only the first segment contains executable code (because it gets annoying to have to change addresses in later segments every time an earlier one changes length; one of those finicky requirements).\n"
+  "\n"
+  "Lines consist of a series of words. Words can contain arbitrary metadata after a '/', but they can never contain whitespace. Metadata has no effect at runtime, but can be handy when rewriting macros.\n"
+  "\n"
   "Check out some examples in this directory (ex*.subx)\n"
   "Programming in machine code can be annoying, but let's see if we can make it nice enough to be able to write a compiler in it.\n"
 );
diff --git a/subx/021translate.cc b/subx/021translate.cc
index 630681f5..bae946ec 100644
--- a/subx/021translate.cc
+++ b/subx/021translate.cc
@@ -1,11 +1,23 @@
-//: Beginnings of a nicer way to build SubX programs.
-//: We're going to question every notion, including "Assembly language" and
-//: "compiler".
-//: Motto: Abstract nothing, check everything.
+//: The bedrock level 1 of abstraction is now done, and we're going to start
+//: building levels above it that make programming in x86 machine code a
+//: little more ergonomic.
 //:
-//: Workflow: read 'source' file. Run a series of transforms on it, each
-//: passing through what it doesn't understand. The final program should be
-//: just machine code, suitable to write to an ELF binary.
+//: Higher levels will be in later layers. Since we can stop at any layer, we
+//: can execute levels from bedrock up to any level.
+//:
+//: All levels will be "pass through by default". Whatever they don't
+//: understand they will silently pass through to lower levels.
+//:
+//: Since raw hex bytes of machine code are always possible to inject, SubX is
+//: not a language, and we aren't building a compiler. This is something
+//: deliberately leakier. Levels are more for improving auditing, checks and
+//: error messages rather than for hiding low-level details.
+
+//: Translator workflow: read 'source' file. Run a series of transforms on it,
+//: each passing through what it doesn't understand. The final program should
+//: be just machine code, suitable to write to an ELF binary.
+//:
+//: Higher levels usually transform code on the basis of metadata.
 
 :(before "End Main")
 if (is_equal(argv[1], "translate")) {
diff --git a/subx/022check_instruction.cc b/subx/022check_instruction.cc
index 0544b168..5fd4760c 100644
--- a/subx/022check_instruction.cc
+++ b/subx/022check_instruction.cc
@@ -1,9 +1,44 @@
-//: Catch instructions with the wrong size or type (metadata) of operands.
+//: Beginning of "level 2": tagging bytes with metadata around what field of
+//: an x86 instruction they're for.
+//:
+//: The x86 instruction set is variable-length, and how a byte is interpreted
+//: affects later instruction boundaries. A lot of the pain in programming machine code
+//: stems from computer and programmer going out of sync on what a byte
+//: means. The miscommunication is usually not immediately caught, and
+//: metastasizes at runtime into kilobytes of misinterpreted instructions.
+//: Tagging bytes with what the programmer expects them to be interpreted as
+//: helps the computer catch miscommunication immediately.
+//:
+//: This is one way SubX is going to be different from a 'language': we
+//: typically think of languages as less verbose than machine code. Here we're
+//: making machine code *more* verbose.
+//:
+//: ---
+//:
+//: While we're here, we'll also improve a couple of other things:
+//:
+//: a) Machine code often packs logically separate operands into bitfields of
+//: a single byte. We'll start writing out each operand separately, and the
+//: translator will construct the right bytes out of operands.
+//:
+//: SubX now gets still more verbose. What used to be a single byte, say 'c3',
+//: can now expand to '3/mod 0/subop 3/rm32'.
+//:
+//: b) Since each operand is tagged, we can loosen ordering restrictions and
+//: allow writing out the operands in any order, like keyword arguments.
+//:
+//: c) Operand values can be expressed in either decimal or hex (when prefixed
+//: with '0x'). Raw 2-character hex bytes without the '0x' are only valid when
+//: tagged without any operand metadata. (This may be a bad idea.)
+//:
+//: Coda: the actual opcodes (1-3 bytes) will continue to be at the start of
+//: each line, in hex, and untagged. The x86 instruction set is a mess, and
+//: instructions don't admit good names.
 
 :(before "End Help Texts")
 put(Help, "instructions",
   "Each x86 instruction consists of an instruction or opcode and some number of operands.\n"
-  "Each operand has a type. An instruction won't have more than one of any type.\n"
+  "Each operand has a type. An instruction won't have more than one operand of any type.\n"
   "Each instruction has some set of allowed operand types. It'll reject others.\n"
   "The complete list of operand types: mod, subop, r32 (register), rm32 (register or memory), scale, index, base, disp8, disp16, disp32, imm8, imm32.\n"
   "Each of these has its own help page. Try reading 'subx help mod' next.\n"
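
The "level 2" comments in subx/022check_instruction.cc describe writing each operand as a separate tagged word and letting the translator build the packed byte, with 'c3' expanding to '3/mod 0/subop 3/rm32'. A minimal sketch of that idea follows. It is not taken from the SubX sources: `Word`, `parse_word` and `pack_modrm` are hypothetical names, only the mod, subop and rm32 tags are handled, and metadata is assumed to be everything after the first '/'.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical type (an assumption, not SubX's own): one word of a line,
// split into its value and the metadata that follows the first '/'.
struct Word {
  std::string value;     // e.g. "3" or "0x3"
  std::string metadata;  // e.g. "mod"; empty when the word has no '/'
};

// Split a whitespace-free word at its first '/'.
Word parse_word(const std::string& word) {
  const std::string::size_type slash = word.find('/');
  if (slash == std::string::npos) return Word{word, ""};
  return Word{word.substr(0, slash), word.substr(slash + 1)};
}

// Operand values are decimal unless prefixed with "0x".
int parse_value(const std::string& value) {
  const bool is_hex = value.compare(0, 2, "0x") == 0;
  return std::stoi(value, nullptr, is_hex ? 16 : 10);
}

// Pack tagged operands into one ModR/M byte: mod in bits 7-6, subop in
// bits 5-3, rm32 in bits 2-0. Because each operand carries its own tag,
// the operands may appear in any order.
uint8_t pack_modrm(const std::string& operands) {
  int mod = 0, subop = 0, rm = 0;
  std::istringstream in(operands);
  std::string raw;
  while (in >> raw) {
    const Word w = parse_word(raw);
    const int value = parse_value(w.value);
    if (w.metadata == "mod") mod = value;
    else if (w.metadata == "subop") subop = value;
    else if (w.metadata == "rm32") rm = value;
    else throw std::invalid_argument("unexpected operand: " + raw);
  }
  return static_cast<uint8_t>((mod & 0x3) << 6 | (subop & 0x7) << 3 | (rm & 0x7));
}
```

Under these assumptions, `pack_modrm("3/mod 0/subop 3/rm32")` and `pack_modrm("3/rm32 0/subop 3/mod")` both produce the byte 0xc3, which matches the expansion given in the commit's comments. Rejecting untagged or unknown words in this sketch mirrors the stated goal of catching miscommunication early rather than hiding low-level details.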