author     Kartik Agaram <vc@akkartik.com>  2020-01-02 01:28:24 -0800
committer  Kartik Agaram <vc@akkartik.com>  2020-01-02 01:28:24 -0800
commit     d02aa9ac0b9e1130ffcd5a27aa1304e80eee08d9
tree       bf592fba4275002cbbde420cef8c806bb9b1b45f
parent     01013f2ad2132dd945c6ceb168b85dc52e18882c
download   mu-d02aa9ac0b9e1130ffcd5a27aa1304e80eee08d9.tar.gz
5863
Just clarified for myself why `subx translate` and `subx run` need to share
code: emulation supports the tests first and foremost.

In the process we clean up our architecture for levels of layers. It's
a good idea but unused once we reconceive of "level 1" as just part of
the test harness.
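A minimal sketch of the point this message makes, under simplified assumptions: once all layers are loaded, both the test helper `run` and `subx translate` parse the input and run the transforms, and they differ only in what happens to the resulting machine code. Every function below is a hypothetical stand-in that just records which stage ran; `execute` and `write_elf` are invented names. In the real code `run` also checks `trace_contains_errors()` after each stage, and its `transform(p)` call is spliced in by the `:(before "Running Test Program")` directive in 032---operands.cc.

```cpp
// Simplified sketch, not SubX's actual code: each stage only records its name.
#include <cassert>
#include <string>
#include <vector>

struct program { std::vector<std::string> lines; };  // stand-in for SubX's 'program'

std::vector<std::string> Stages;  // names of the stages that ran, in order

void parse(const std::string& text, program& p) { p.lines.push_back(text); Stages.push_back("parse"); }
void transform(program&) { Stages.push_back("transform"); }  // shared by both paths
void load(const program& p) {                                 // into emulated memory
  assert(!p.lines.empty());  // nothing parsed means nothing to load
  Stages.push_back("load");
}
void execute() { Stages.push_back("execute"); }                // emulate the x86 subset
void write_elf(const program&) { Stages.push_back("write_elf"); }  // hypothetical name

// Test helper: the emulator exists first and foremost to run tests like this.
void run(const std::string& text) {
  program p;
  parse(text, p);
  transform(p);  // in SubX, spliced in by a later layer
  load(p);
  execute();
}

// 'subx translate': same front half, but the result goes to a binary instead.
void translate(const std::string& text) {
  program p;
  parse(text, p);
  transform(p);
  write_elf(p);
}
```

Calling `run` records parse, transform, load, execute; calling `translate` records parse, transform, write_elf. The shared prefix is why the two commands share code.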
-rw-r--r--  010---vm.cc          10
-rw-r--r--  011run.cc            20
-rw-r--r--  030---translate.cc   21
-rw-r--r--  031transforms.cc     69
-rw-r--r--  032---operands.cc     7
5 files changed, 25 insertions, 102 deletions
diff --git a/010---vm.cc b/010---vm.cc
index 6675cab9..d67667f7 100644
--- a/010---vm.cc
+++ b/010---vm.cc
@@ -1,11 +1,5 @@
-//: Core data structures for simulating the SubX VM (subset of an x86 processor)
-//:
-//: At the lowest level ("level 1") of abstraction, SubX executes x86
-//: instructions provided in the form of an array of bytes, loaded into memory
-//: starting at a specific address.
-//:
-//: SubX is fundamentally a translator. But having a VM to execute its
-//: translations affords greater confidence in it.
+//: Core data structures for simulating the SubX VM (subset of an x86 processor),
+//: either in tests or debug aids.
 
 //:: registers
 //: assume segment registers are hard-coded to 0
diff --git a/011run.cc b/011run.cc
index 585f9930..e4194687 100644
--- a/011run.cc
+++ b/011run.cc
@@ -78,15 +78,14 @@ void test_copy_imm32_to_EAX() {
   );
 }
 
-// top-level helper for scenarios: parse the input, transform any macros, load
-// the final hex bytes into memory, run it
+// top-level helper for tests: parse the input, load the hex bytes into memory, run
 void run(const string& text_bytes) {
   program p;
   istringstream in(text_bytes);
+  // Loading Test Program
   parse(in, p);
   if (trace_contains_errors()) return;  // if any stage raises errors, stop immediately
-  transform(p);
-  if (trace_contains_errors()) return;
+  // Running Test Program
   load(p);
   if (trace_contains_errors()) return;
   // convenience to keep tests concise: 'Entry' label need not be provided
@@ -244,19 +243,6 @@ void test_detect_duplicate_segments() {
   );
 }
 
-//:: transform
-
-:(before "End Types")
-typedef void (*transform_fn)(program&);
-:(before "End Globals")
-vector<transform_fn> Transform;
-
-:(code)
-void transform(program& p) {
-  for (int t = 0;  t < SIZE(Transform);  ++t)
-    (*Transform.at(t))(p);
-}
-
 //:: load
 
 void load(const program& p) {
diff --git a/030---translate.cc b/030---translate.cc
index 9737834e..b950fce7 100644
--- a/030---translate.cc
+++ b/030---translate.cc
@@ -1,20 +1,9 @@
-//: The bedrock level 1 of abstraction is now done, and we're going to start
-//: building levels above it that make programming in x86 machine code a
-//: little more ergonomic.
-//:
-//: All levels will be "pass through by default". Whatever they don't
-//: understand they will silently pass through to lower levels.
-//:
-//: Since raw hex bytes of machine code are always possible to inject, SubX is
-//: not a language, and we aren't building a compiler. This is something
-//: deliberately leakier. Levels are more for improving auditing, checks and
-//: error messages rather than for hiding low-level details.
+//: After that lengthy prelude to define an x86 emulator, we are now ready to
+//: start translating SubX notation.
 
 //: Translator workflow: read 'source' file. Run a series of transforms on it,
 //: each passing through what it doesn't understand. The final program should
-//: be just machine code, suitable to write to an ELF binary.
-//:
-//: Higher levels usually transform code on the basis of metadata.
+//: be just machine code, suitable to emulate, or to write to an ELF binary.
 
 :(before "End Main")
 if (is_equal(argv[1], "translate")) {
@@ -69,6 +58,10 @@ if (is_equal(argv[1], "translate")) {
 }
 
 :(code)
+void transform(program& p) {
+  // End transform(program& p)
+}
+
 void print_translate_usage() {
   cerr << "Usage: subx translate file1 file2 ... -o output\n";
 }
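The empty `transform` added to 030---translate.cc is a hook: its comment `// End transform(program& p)` is a label that later layers fill in with `:(before "End transform(program& p)")`, as 031transforms.cc does. The following is a hypothetical, much-simplified model of such a splice, assuming a `before` fragment is inserted just above the first line that contains the label. It is not Mu's tangle tool, and it ignores indentation and other directives such as `:(after ...)`.

```cpp
// Hypothetical model of ':(before "LABEL")': insert a fragment's lines just
// above the first line of 'base' that contains LABEL.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Lines = std::vector<std::string>;

// Returns false (leaving 'base' unchanged) if no line contains 'label'.
bool splice_before(Lines& base, const std::string& label, const Lines& fragment) {
  assert(!label.empty());
  for (std::size_t i = 0; i < base.size(); ++i) {
    if (base[i].find(label) != std::string::npos) {
      base.insert(base.begin() + static_cast<std::ptrdiff_t>(i),
                  fragment.begin(), fragment.end());
      return true;
    }
  }
  return false;
}
```

Splicing the two-line loop from 031transforms.cc into the three lines of `transform` puts the loop above the label comment. The label itself survives, so a fragment from a still later layer for the same label lands after the loop, keeping fragments in the order their layers were loaded.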
diff --git a/031transforms.cc b/031transforms.cc
index a6e12502..5f13b697 100644
--- a/031transforms.cc
+++ b/031transforms.cc
@@ -1,64 +1,11 @@
-//: Ordering transforms is a well-known hard problem when building compilers.
-//: In our case we also have the additional notion of layers. The ordering of
-//: layers can have nothing in common with the ordering of transforms when
-//: SubX is tangled and run. This can be confusing for readers, particularly
-//: if later layers start inserting transforms at arbitrary points between
-//: transforms introduced earlier. Over time adding transforms can get harder
-//: and harder, having to meet the constraints of everything that's come
-//: before. It's worth thinking about organization up-front so the ordering is
-//: easy to hold in our heads, and it's obvious where to add a new transform.
-//: Some constraints:
-//:
-//:   1. Layers force us to build SubX bottom-up; since we want to be able to
-//:   build and run SubX after stopping loading at any layer, the overall
-//:   organization has to be to introduce primitives before we start using
-//:   them.
-//:
-//:   2. Transforms usually need to be run top-down, converting high-level
-//:   representations to low-level ones so that low-level layers can be
-//:   oblivious to them.
-//:
-//:   3. When running we'd often like new representations to be checked before
-//:   they are transformed away. The whole reason for new representations is
-//:   often to add new kinds of automatic checking for our machine code
-//:   programs.
-//:
-//: Putting these constraints together, we'll use the following broad
-//: organization:
-//:
-//:   a) We'll divide up our transforms into "levels", each level consisting
-//:   of multiple transforms, and dealing in some new set of representational
-//:   ideas. Levels will be added in reverse order to the one their transforms
-//:   will be run in.
-//:
-//:     To run all transforms:
-//:       Load transforms for level n
-//:       Load transforms for level n-1
-//:       ...
-//:       Load transforms for level 2
-//:       Run code at level 1
-//:
-//:   b) *Within* a level we'll usually introduce transforms in the order
-//:   they're run in.
-//:
-//:     To run transforms for level n:
-//:       Perform transform of layer l
-//:       Perform transform of layer l+1
-//:       ...
-//:
-//:   c) Within a level it's often most natural to introduce a new
-//:   representation by showing how it's transformed to the level below. To
-//:   make such exceptions more obvious checks usually won't be first-class
-//:   transforms; instead code that keeps the program unmodified will run
-//:   within transforms before they mutate the program. As an example:
-//:
-//:     Layer l introduces a transform
-//:     Layer l+1 adds precondition checks for the transform
-//:
-//: This may all seem abstract, but will hopefully make sense over time. The
-//: goals are basically to always have a working program after any layer, to
-//: have the order of layers make narrative sense, and to order transforms
-//: correctly at runtime.
+:(before "End Types")
+typedef void (*transform_fn)(program&);
+:(before "End Globals")
+vector<transform_fn> Transform;
+
+:(before "End transform(program& p)")
+for (int t = 0;  t < SIZE(Transform);  ++t)
+  (*Transform.at(t))(p);
 
 :(before "End One-time Setup")
 // Begin Transforms
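The loop spliced into `transform` calls every function pointer in `Transform` in the order it appears there; with nothing registered, `transform` does nothing, which is what the layers before 031transforms.cc get. A self-contained sketch of that registry, assuming a stand-in `program` and two hypothetical passes (SubX's real transforms rewrite segments of machine code):

```cpp
// Minimal sketch of the transform registry; 'program' and the passes are stand-ins.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct program { std::vector<std::string> log; };

typedef void (*transform_fn)(program&);
std::vector<transform_fn> Transform;  // filled once at startup

void transform(program& p) {
  for (std::size_t t = 0; t < Transform.size(); ++t) {
    assert(Transform.at(t) != nullptr);  // a null entry would be a registration bug
    (*Transform.at(t))(p);
  }
}

// Hypothetical passes that only record that they ran.
void first_pass(program& p) { p.log.push_back("first"); }
void second_pass(program& p) { p.log.push_back("second"); }
```

Registering `first_pass` and then `second_pass` and calling `transform` yields a log of "first" followed by "second": registration order is execution order. In SubX the registrations presumably happen in the One-time Setup block whose `// Begin Transforms` marker ends this hunk.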
diff --git a/032---operands.cc b/032---operands.cc
index 5203201e..5d434319 100644
--- a/032---operands.cc
+++ b/032---operands.cc
@@ -1,5 +1,4 @@
-//: Beginning of "level 2": tagging bytes with metadata around what field of
-//: an x86 instruction they're for.
+//: Metadata for fields of an x86 instruction.
 //:
 //: The x86 instruction set is variable-length, and how a byte is interpreted
 //: affects later instruction boundaries. A lot of the pain in programming
@@ -27,6 +26,10 @@ put_new(Help, "instructions",
 :(before "End Help Contents")
 cerr << "  instructions\n";
 
+:(before "Running Test Program")
+transform(p);
+if (trace_contains_errors()) return;
+
 :(code)
 void test_pack_immediate_constants() {
   run(