010---vm.cc « subx - mu - Soul of a tiny new machine. More thorough tests → More comprehensible and rewrite-friendly software → More resilient society.

## SubX: a simplistic assembly language

SubX is a minimalist assembly language designed:
* to explore ways to turn arbitrary manual tests into reproducible automated
  tests,
* to be easy to implement in itself, and
* to help learn and teach the x86 instruction set.

```
$ git clone https://github.com/akkartik/mu
$ cd mu/subx
$ ./subx  # print out a help message
```

[![Build Status](https://api.travis-ci.org/akkartik/mu.svg)](https://travis-ci.org/akkartik/mu)

Expanding on the first bullet, it hopes to support more comprehensive tests
by:

0. Running generated binaries in _emulated mode_. Emulated mode is slower than
   native execution (which will also work), but there's more sanity checking,
   and more descriptive error messages for common low-level problems.

   ```
   $ ./subx translate examples/ex1.subx -o examples/ex1
   $ ./examples/ex1  # only on Linux
   $ echo $?
   42
   $ ./subx run examples/ex1  # on Linux or BSD or OS X
   $ echo $?
   42
   ```

   The assembly syntax is designed so the assembler (`subx translate`) has
   very little to do, making it feasible to reimplement in itself. Programmers
   have to explicitly specify all opcodes and operands.

   ```
   # exit(42)
   bb/copy-to-EBX  0x2a/imm32  # 42 in hex
   b8/copy-to-EAX  1/imm32/exit
   cd/syscall  0x80/imm8
   ```

   To keep code readable you can add _metadata_ to any word after a `/`.
   Metadata can be just comments for readers, and they'll be ignored. They can
   also trigger checks. Here, tagging operands with the `imm32` type allows
   SubX to check that instructions have precisely the operand types they
   should. x86 instructions have 14 types of operands, and missing one causes
   all future instructions to go off the rails, interpreting operands as
   opcodes and vice versa. So this is a useful check.

1. Designing testable wrappers for operating system interfaces. For example,
   it can `read()` from or `write()` to fake in-memory files in tests. More
   details [below](#subx-library). We are continuing to port syscalls from
   [the old Mu VM in the parent directory](https://github.com/akkartik/mu).

2. Supporting a special _trace_ stream in addition to the default `stdin`,
   `stdout` and `stderr` streams. The trace stream is designed for programs to
   emit structured facts they deduce about their domain as they execute. Tests
   can then check the set of facts deduced in addition to the results of the
   function under test. This form of _automated whitebox testing_ permits
   writing tests for performance, fault tolerance, deadlock-freedom, memory
   usage, etc. For example, if a sort function traces each swap, a performance
   test could check that the number of swaps doesn't quadruple when the size
   of the input doubles.
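
To make the sort example above concrete, here is a minimal sketch in C++ of a trace-based performance check. It is not SubX's trace API: the global `Trace`, `traced_sort` and `swaps_for_reversed_input` are hypothetical names invented for illustration, and the "trace stream" is just a vector of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a trace stream: a plain list of emitted facts.
std::vector<std::string> Trace;

// Bubble sort that emits one fact to the trace for every swap it performs.
void traced_sort(std::vector<int>& v) {
  for (std::size_t i = 0; i < v.size(); ++i) {
    for (std::size_t j = 0; j + 1 < v.size() - i; ++j) {
      if (v[j] > v[j + 1]) {
        std::swap(v[j], v[j + 1]);
        Trace.push_back("swap");
      }
    }
  }
}

// Sort a reversed input of size n and report how many swaps were traced.
std::size_t swaps_for_reversed_input(int n) {
  std::vector<int> v;
  for (int i = n; i > 0; --i) v.push_back(i);
  Trace.clear();
  traced_sort(v);
  return Trace.size();
}
```

On a reversed input, bubble sort performs n(n-1)/2 swaps, so this sketch traces 28 swaps at size 8 and 120 at size 16. A performance test asserting that `swaps_for_reversed_input(16)` is less than four times `swaps_for_reversed_input(8)` would therefore fail, which is exactly the kind of regression such a test is meant to flag.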

The hypothesis is that designing the entire system to be testable from day 1
and from the ground up would radically impact the culture of an ecosystem in
a way that no bolted-on tool or service at higher levels can replicate. It
would make it easier to write programs that can be [easily understood by newcomers](http://akkartik.name/about).
It would reassure authors that an app is free from regression if all automated
tests pass. It would make the stack easy to rewrite and simplify by dropping
features, without fear that a subset of targeted apps might break. As a result
people might fork projects more easily, and also exchange code between
disparate forks more easily (copy the tests over, then try copying code over
and making tests pass, rewriting and polishing where necessary). The community
would have in effect a diversified portfolio of forks, a “wavefront” of
possible combinations of features and alternative implementations of features
instead of the single trunk with monotonically growing complexity that we get
today. Application writers who wrote thorough tests for their apps (something
they just can’t do today) would be able to bounce around between forks more
easily without getting locked in to a single one as currently happens.

However, that vision is far away, and SubX is just a first, hesitant step.
SubX supports a small, regular subset of the 32-bit x86 instruction set.
(Think of the name as short for "sub-x86".)

  - Only instructions that operate on the 32-bit integer E\*X registers, and a
    couple of instructions for operating on 8-bit values. No floating-point
    yet. Most legacy registers will never be supported.

  - Only instructions that assume a flat address space; legacy instructions
    that use segment registers will never be supported.

  - No instructions that check the carry or parity flags; arithmetic operations
    always operate on signed integers (while bitwise operations always operate
    on unsigned integers).

  - Only relative jump instructions (with 8-bit or 32-bit offsets).
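
As an illustration of the signed-arithmetic restriction above, here is a hedged sketch of how a 32-bit signed addition could update a sign, zero and overflow flag without consulting carry or parity. The `Flags` struct and `add_and_flag` function are hypothetical names for this example only; they are not part of SubX.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical bundle of the three flags discussed in this document.
struct Flags {
  bool sf;  // sign flag: result is negative
  bool zf;  // zero flag: result is zero
  bool of;  // overflow flag: exact result doesn't fit in 32 bits
};

// Add b to a as 32-bit signed integers, wrapping on overflow, and report flags.
Flags add_and_flag(int32_t& a, int32_t b) {
  // Compute the exact result in 64 bits so overflow can be detected.
  int64_t wide = static_cast<int64_t>(a) + b;
  // Keep the low 32 bits, then reinterpret them as a signed value.
  uint32_t bits = static_cast<uint32_t>(wide);
  int32_t result;
  std::memcpy(&result, &bits, sizeof(result));
  a = result;
  Flags f;
  f.sf = (result < 0);
  f.zf = (result == 0);
  f.of = (wide != result);
  return f;
}
```

For instance, adding 1 to `0x7fffffff` wraps around to the most negative 32-bit value, so the sketch reports both the sign and overflow flags as set, while adding 1 to -1 sets only the zero flag.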

The (rudimentary, statically linked) ELF binaries SubX generates can be run
natively on Linux, and they require only the Linux kernel.

## Status

I'm currently implementing SubX in SubX in 3 phases:

  1. Converting ascii hex bytes to binary. (✓)
  2. Packing bitfields for x86 instructions into bytes.
  3. Replacing addresses with labels.

In parallel, I'm designing testable wrappers for syscalls, particularly for
scalably running blocking syscalls with a test harness concurrently monitoring
their progress.

## An example program

In the interest of minimalism, SubX requires more knowledge than traditional
assembly languages of the x86 instructions it supports. Here's an example
SubX program, using one line per instruction:

<img alt='examples/ex3.subx' src='../html/subx/ex3.png'>

This program sums the first 10 natural numbers. By convention I use horizontal
tabstops to help read instructions, dots to help follow the long lines,
comments before groups of instructions to describe their high-level purpose,
and comments at the end of complex instructions to state the low-level
operation they perform. Numbers are always shown in hexadecimal (base 16).

As you can see, programming in SubX requires the programmer to know the (kinda
complex) structure of x86 instructions, all the different operands that an
instruction can have, their layout in bytes (for example, the `subop` and
`r32` fields use the same bits, so an instruction can't have both; more on
this below), the opcodes for supported instructions, and so on.

While SubX syntax is fairly dumb, the error-checking is relatively smart. I
try to provide clear error messages on instructions missing operands or having
unexpected operands. Either case would otherwise cause instruction boundaries
to diverge from what you expect, and potentially lead to errors far away. It's
useful to catch such errors early.
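
Checks like these start from the metadata attached to each word after a `/`. As a rough sketch of that first step (not the actual SubX implementation; `Word`, `parse_word` and `has_metadata` are hypothetical names), a word such as `1/imm32/exit` could be split into its datum and metadata like this:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation of one word of a SubX instruction.
struct Word {
  std::string datum;                  // the part before the first '/'
  std::vector<std::string> metadata;  // every '/'-separated tag after it
};

// Split text on '/': the first piece is the datum, the rest are metadata.
Word parse_word(const std::string& text) {
  Word result;
  std::size_t start = 0;
  bool first = true;
  while (true) {
    std::size_t slash = text.find('/', start);
    std::size_t len =
        (slash == std::string::npos) ? std::string::npos : slash - start;
    std::string piece = text.substr(start, len);
    if (first) {
      result.datum = piece;
      first = false;
    } else {
      result.metadata.push_back(piece);
    }
    if (slash == std::string::npos) break;
    start = slash + 1;
  }
  return result;
}

// Does the word carry the given metadata tag (e.g. "imm32")?
bool has_metadata(const Word& w, const std::string& tag) {
  for (std::size_t i = 0; i < w.metadata.size(); ++i)
    if (w.metadata[i] == tag) return true;
  return false;
}
```

With this split in hand, a checker could compare the tags found on an instruction's words (such as `imm32` or `imm8`) against the operand types its opcode expects, and report a missing or unexpected operand before the bytes are ever emitted.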

Try running this example now:

```
$ ./subx translate examples/ex3.subx -o examples/ex3
$ ./subx run examples/ex3
$ echo $?
55
```

If you're on Linux you can also run it natively:

```
$ ./examples/ex3
$ echo $?
55
```

The rest of this Readme elaborates on the syntax for SubX programs, starting
with a few prerequisites about the x86 instruction set.

## A quick tour of the x86 instruction set

The [Intel processor manual](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf)
is the final source of truth on the x86 instruction set, but it can be
forbidding to make sense of, so here's a quick orientation. You will need
familiarity with binary and hexadecimal encodings (starting with '0x') for
numbers, and maybe a few other things. Email [me](mailto:mu@akkartik.com)
any time if something isn't clear. I love explaining this stuff for as long as
it takes.

The x86 instructions SubX supports can take anywhere from 1 to 13 bytes. Early
bytes affect what later bytes mean and where an instruction ends. Here's the
big picture of a single x86 instruction from the Intel manual:

<img alt='x86 instruction structure' src='../html/subx/encoding.png'>

There's a lot here, so let's unpack it piece by piece:

* The prefix bytes are not used by SubX, so ignore them.

* The opcode bytes encode the instruction used. Ignore their internal structure;
  we'll just treat them as a sequence of whole bytes. The opcode sequences
  SubX recognizes are enumerated by running `subx help opcodes`. For more
  details on a specific opcode, consult html guides like https://c9x.me/x86 or
  the Intel manual.

* The addressing mode byte is used by all instructions that take an `rm32`
  operand according to `subx help opcodes`. (That's most instructions.) The
  `rm32` operand expresses how an instruction should load one 32-bit operand
  from either a register or memory. It is configured by the addressing mode
  byte and, optionally, the SIB (scale, index, base) byte as follows:

  - if the `mod` (mode) field is `11` (3): the `rm32` operand is the contents
    of the register described by the `r/m` bits.
    - `000` (0) means register `EAX`
    - `001` (1) means register `ECX`
    - `010` (2) means register `EDX`
    - `011` (3) means register `EBX`
    - `100` (4) means register `ESP`
    - `101` (5) means register `EBP`
    - `110` (6) means register `ESI`
    - `111` (7) means register `EDI`

  - if `mod` is `00` (0): `rm32` is the c

//: Core data structures for simulating the SubX VM (subset of an x86 processor)
//:
//: At the lowest level ("level 1") of abstraction, SubX executes x86
//: instructions provided in the form of an array of bytes, loaded into memory
//: starting at a specific address.

//:: registers
//: assume segment registers are hard-coded to 0
//: no floating-point, MMX, etc. yet

:(before "End Types")
enum {
  EAX,
  ECX,
  EDX,
  EBX,
  ESP,
  EBP,
  ESI,
  EDI,
  NUM_INT_REGISTERS,
};
union reg {
  int32_t i;
  uint32_t u;
};
:(before "End Globals")
reg Reg[NUM_INT_REGISTERS] = { {0} };
uint32_t EIP = 1;  // preserve null pointer
:(before "End Reset")
bzero(Reg, sizeof(Reg));
EIP = 1;  // preserve null pointer

:(before "End Help Contents")
cerr << "  registers\n";
:(before "End Help Texts")
put_new(Help, "registers",
  "SubX currently supports eight 32-bit integer registers. From 0 to 7, they are:\n"
  "  EAX ECX EDX EBX ESP EBP ESI EDI\n"
  "ESP contains the top of the stack.\n"
  "\n"
  "-- 8-bit registers\n"
  "Some instructions operate on eight *overlapping* 8-bit registers.\n"
  "From 0 to 7, they are:\n"
  "  AL CL DL BL AH CH DH BH\n"
  "The 8-bit registers overlap with the 32-bit ones. AL is the lowest significant byte\n"
  "of EAX, AH is the second lowest significant byte, and so on.\n"
  "\n"
  "For example, if EBX contains 0x11223344, then BL contains 0x44, and BH contains 0x33.\n"
  "\n"
  "There is no way to access bytes within ESP, EBP, ESI or EDI.\n"
  "\n"
  "For complete details consult the IA-32 software developer's manual, volume 2,\n"
  "table 2-2, \"32-bit addressing forms with the ModR/M byte\".\n"
  "It is included in this repository as 'modrm.pdf'.\n"
  "The register encodings are described in the top row of the table, but you'll need\n"
  "to spend some time with it.\n"
  "\n"
  "-- flag registers\n"
  "Various instructions (particularly 'compare') modify one or more of three 1-bit 'flag'\n"
  "registers, as a side-effect:\n"
  "- the sign flag (SF): usually set if an arithmetic result is negative, or\n"
  "  reset if not.\n"
  "- the zero flag (ZF): usually set if a result is zero, or reset if not.\n"
  "- the overflow flag (OF): usually set if an arithmetic result overflows.\n"
  "The flag bits are read by conditional jumps.\n"
  "\n"
  "For complete details on how different instructions update the flags, consult the IA-32\n"
  "manual (volume 2). There are various versions of it online, such as https://c9x.me/x86,\n"
  "though of course you'll need to be careful to ignore instructions and flag registers\n"
  "that SubX doesn't support.\n"
  "\n"
  "It isn't simple, but if this is the processor you have running on your computer,\n"
  "might as well get good at it.\n"
);

:(before "End Globals")
// the subset of x86 flag registers we care about
bool SF = false;  // sign flag
bool ZF = false;  // zero flag
bool OF = false;  // overflow flag
:(before "End Reset")
SF = ZF = OF = false;

//: how the flag registers are updated after each instruction

:(before "End Includes")
// Combine 'arg1' and 'arg2' with arithmetic operation 'op' and store the
// result in 'arg1', then update flags.
// beware: no side-effects in args
#define BINARY_ARITHMETIC_OP(op, arg1, arg2) { \
  /* arg1 and arg2 must be signed */ \
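  /* widen before applying 'op' so tmp holds the exact, un-truncated result */ \
  int64_t tmp = static_cast<int64_t>(arg1) op arg2; \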
  arg1 = arg1 op arg2; \
  trace(90, "run") << "storing 0x" << HEXWORD << arg1 << end(); \
  SF = (arg1 < 0); \
  ZF = (arg1 == 0); \
  OF = (arg1 != tmp); \
}

// Combine 'arg1' and 'arg2' with bitwise operation 'op' and store the result
// in 'arg1', then update flags.
#define BINARY_BITWISE_OP(op, arg1, arg2) { \
  /* arg1 and arg2 must be unsigned */ \
  arg1 = arg1 op arg2; \
  trace(90, "run") << "storing 0x" << HEXWORD << arg1 << end(); \
  SF = (arg1 >> 31); \
  ZF = (arg1 == 0); \
  OF = false; \
}

//:: simulated RAM

:(before "End Types")
const uint32_t SEGMENT_ALIGNMENT = 0x1000000;
inline uint32_t align_upwards(uint32_t x, uint32_t align) {
  return (x+align-1) & -(align);
}

// Like in real-world Linux, we'll allocate RAM for our programs in disjoint
// slabs called VMAs or Virtual Memory Areas.
struct vma {
  uint32_t start;  // inclusive
  uint32_t end;  // exclusive
  vector<uint8_t> _data;
  vma(uint32_t s, uint32_t e) :start(s), end(e) {}
  vma(uint32_t s) :start(s), end(align_upwards(s+1, SEGMENT_ALIGNMENT)) {}
  bool match(uint32_t a) {
    return a >= start && a < end;
  }
  bool match32(uint32_t a) {
    return a >= start && a+4 <= end;
  }
  uint8_t& data(uint32_t a) {
    assert(match(a));
    uint32_t result_index = a-start;
    if (_data.size() <= result_index) {
      const int align = 0x1000;
      uint32_t result_size = result_index + 1;  // size needed for result_index to be valid
      uint32_t new_size = align_upwards(result_size, align);
      // grow at least 2x to maintain some amortized complexity guarantees
      if (new_size < _data.size() * 2)
        new_size = _data.size() * 2;
      // never grow past the stated limit
      if (new_size > end-start)
        new_size = end-start;
      _data.resize(new_size);
    }
    return _data.at(result_index);
  }
  void grow_until(uint32_t new_end_address) {
    if (new_end_address < end) return;
    // Ugly: vma knows about the global Memory list of vmas
    void sanity_check(uint32_t start, uint32_t end);
    sanity_check(start, new_end_address);
    end = new_end_address;
  }
  // End vma Methods
};
:(code)
void sanity_check(uint32_t start, uint32_t end) {
  bool dup_found = false;
  for (int i = 0;  i < SIZE(Mem);  ++i) {
    const vma& curr = Mem.at(i);
    if (curr.start == start) {
      assert(!dup_found);
      dup_found = true;
    }
    else if (curr.start > start) {
      assert(curr.start > end);
    }
    else if (curr.start < start) {
      assert(curr.end < start);
    }
  }
}

:(before "End Globals")
// RAM is made of VMAs.
vector<vma> Mem;
:(code)
// The first 3 VMAs are special. When loading ELF binaries in later layers,
// we'll assume that the first VMA is for code, the second is for data
// (including the heap), and the third for the stack.
void grow_code_segment(uint32_t new_end_address) {
  assert(!Mem.empty());
  Mem.at(0).grow_until(new_end_address);
}
void grow_data_segment(uint32_t new_end_address) {
  assert(SIZE(Mem) > 1);
  Mem.at(1).grow_until(new_end_address);
}
:(before "End Globals")
uint32_t End_of_program = 0;  // when the program executes past this address in tests we'll stop the test
// The stack grows downward. Can't increase its size for now.
:(before "End Reset")
Mem.clear();
End_of_program = 0;
:(code)
// These helpers depend on Mem being laid out contiguously (so you can't use a
// map, etc.) and on the host also being little-endian.
inline uint8_t read_mem_u8(uint32_t addr) {
  uint8_t* handle = mem_addr_u8(addr);  // error messages get printed here
  return handle ? *handle : 0;
}
inline int8_t read_mem_i8(uint32_t addr) {
  return static_cast<int8_t>(read_mem_u8(addr));
}
inline uint32_t read_mem_u32(uint32_t addr) {
  uint32_t* handle = mem_addr_u32(addr);  // error messages get printed here
  return handle ? *handle : 0;
}
inline int32_t read_mem_i32(uint32_t addr) {
  return static_cast<int32_t>(read_mem_u32(addr));
}

inline uint8_t* mem_addr_u8(uint32_t addr) {
  uint8_t* result = NULL;
  for (int i = 0;  i < SIZE(Mem);  ++i) {
    if (Mem.at(i).match(addr)) {
      if (result)
        raise << "address 0x" << HEXWORD << addr << " is in two segments\n" << end();
      result = &Mem.at(i).data(addr);
    }
  }
  if (result == NULL)
    raise << "Tried to access uninitialized memory at address 0x" << HEXWORD << addr << '\n' << end();
  return result;
}
inline int8_t* mem_addr_i8(uint32_t addr) {
  return reinterpret_cast<int8_t*>(mem_addr_u8(addr));
}
inline uint32_t* mem_addr_u32(uint32_t addr) {
  uint32_t* result = NULL;
  for (int i = 0;  i < SIZE(Mem);  ++i) {
    if (Mem.at(i).match32(addr)) {
      if (result)
        raise << "address 0x" << HEXWORD << addr << " is in two segments\n" << end();
      result = reinterpret_cast<uint32_t*>(&Mem.at(i).data(addr));
    }
  }
  if (result == NULL) {
    raise << "Tried to access uninitialized memory at address 0x" << HEXWORD << addr << '\n' << end();
    raise << "The entire 4-byte word should be initialized and lie in a single segment.\n" << end();
  }
  return result;
}
inline int32_t* mem_addr_i32(uint32_t addr) {
  return reinterpret_cast<int32_t*>(mem_addr_u32(addr));
}
// helper for some syscalls. But read-only.
inline const char* mem_addr_kernel_string(uint32_t addr) {
  return reinterpret_cast<const char*>(mem_addr_u8(addr));
}
inline string mem_addr_string(uint32_t addr, uint32_t size) {
  ostringstream out;
  for (size_t i = 0;  i < size;  ++i)
    out << read_mem_u8(addr+i);
  return out.str();
}


inline void write_mem_u8(uint32_t addr, uint8_t val) {
  uint8_t* handle = mem_addr_u8(addr);
  if (handle != NULL) *handle = val;
}
inline void write_mem_i8(uint32_t addr, int8_t val) {
  int8_t* handle = mem_addr_i8(addr);
  if (handle != NULL) *handle = val;
}
inline void write_mem_u32(uint32_t addr, uint32_t val) {
  uint32_t* handle = mem_addr_u32(addr);
  if (handle != NULL) *handle = val;
}
inline void write_mem_i32(uint32_t addr, int32_t val) {
  int32_t* handle = mem_addr_i32(addr);
  if (handle != NULL) *handle = val;
}

inline bool already_allocated(uint32_t addr) {
  bool result = false;
  for (int i = 0;  i < SIZE(Mem);  ++i) {
    if (Mem.at(i).match(addr)) {
      if (result)
        raise << "address 0x" << HEXWORD << addr << " is in two segments\n" << end();
      result = true;
    }
  }
  return result;
}

//:: core interpreter loop

:(code)
// skeleton of how x86 instructions are decoded
void run_one_instruction() {
  uint8_t op=0, op2=0, op3=0;
  // Run One Instruction
  trace(90, "run") << "inst: 0x" << HEXWORD << EIP << end();
  op = next();
  if (Dump_trace) {
    cerr << "opcode: " << HEXBYTE << NUM(op) << '\n';
    cerr << "registers at start: ";
    dump_registers();
//?     dump_stack();  // for debugging; not defined until later layer
  }
  switch (op) {
  case 0xf4:  // hlt
    EIP = End_of_program;
    break;
  // End Single-Byte Opcodes
  case 0x0f:
    switch(op2 = next()) {
    // End Two-Byte Opcodes Starting With 0f
    default:
      cerr << "unrecognized second opcode after 0f: " << HEXBYTE << NUM(op2) << '\n';
      DUMP("");
      exit(1);
    }
    break;
  case 0xf2:
    switch(op2 = next()) {
    // End Two-Byte Opcodes Starting With f2
    case 0x0f:
      switch(op3 = next()) {
      // End Three-Byte Opcodes Starting With f2 0f
      default:
        cerr << "unrecognized third opcode after f2 0f: " << HEXBYTE << NUM(op3) << '\n';
        DUMP("");
        exit(1);
      }
      break;
    default:
      cerr << "unrecognized second opcode after f2: " << HEXBYTE << NUM(op2) << '\n';
      DUMP("");
      exit(1);
    }
    break;
  case 0xf3:
    switch(op2 = next()) {
    // End Two-Byte Opcodes Starting With f3
    case 0x0f:
      switch(op3 = next()) {
      // End Three-Byte Opcodes Starting With f3 0f
      default:
        cerr << "unrecognized third opcode after f3 0f: " << HEXBYTE << NUM(op3) << '\n';
        DUMP("");
        exit(1);
      }
      break;
    default:
      cerr << "unrecognized second opcode after f3: " << HEXBYTE << NUM(op2) << '\n';
      DUMP("");
      exit(1);
    }
    break;
  default:
    cerr << "unrecognized opcode: " << HEXBYTE << NUM(op) << '\n';
    DUMP("");
    exit(1);
  }
}

inline uint8_t next() {
  return read_mem_u8(EIP++);
}

void dump_registers() {
  for (int i = 0;  i < NUM_INT_REGISTERS;  ++i) {
    if (i > 0) cerr << "; ";
    cerr << "  " << i << ": " << std::hex << std::setw(8) << std::setfill('_') << Reg[i].u;
  }
  cerr << " -- SF: " << SF << "; ZF: " << ZF << "; OF: " << OF << '\n';
}

//: start tracking supported opcodes
:(before "End Globals")
map</*op*/string, string> Name;
map</*op*/string, string> Name_0f;
map</*op*/string, string> Name_f3;
map</*op*/string, string> Name_f3_0f;
:(before "End One-time Setup")
init_op_names();
:(code)
void init_op_names() {
  put(Name, "f4", "halt (hlt)");
  // End Initialize Op Names
}

:(before "End Help Special-cases(key)")
if (key == "opcodes") {
  cerr << "Opcodes currently supported by SubX:\n";
  for (map<string, string>::iterator p = Name.begin();  p != Name.end();  ++p)
    cerr << "  " << p->first << ": " << p->second << '\n';
  for (map<string, string>::iterator p = Name_0f.begin();  p != Name_0f.end();  ++p)
    cerr << "  0f " << p->first << ": " << p->second << '\n';
  for (map<string, string>::iterator p = Name_f3.begin();  p != Name_f3.end();  ++p)
    cerr << "  f3 " << p->first << ": " << p->second << '\n';
  for (map<string, string>::iterator p = Name_f3_0f.begin();  p != Name_f3_0f.end();  ++p)
    cerr << "  f3 0f " << p->first << ": " << p->second << '\n';
  cerr << "Run `subx help instructions` for details on words like 'r32' and 'disp8'.\n"
          "For complete details on these instructions, consult the IA-32 manual (volume 2).\n"
          "There are various versions of it online, such as https://c9x.me/x86.\n"
          "The mnemonics in brackets will help you locate each instruction.\n";
  return 0;
}
:(before "End Help Contents")
cerr << "  opcodes\n";

:(before "End Includes")
#include <iomanip>
#define HEXBYTE  std::hex << std::setw(2) << std::setfill('0')
#define HEXWORD  std::hex << std::setw(8) << std::setfill('0')
// ugly that iostream doesn't print uint8_t as an integer
#define NUM(X) static_cast<int>(X)
#include <stdint.h>