From de2990880ab7ac98db79747a2eaedc3442fd4afc Mon Sep 17 00:00:00 2001
From: Kartik Agaram
Date: Sat, 16 Feb 2019 22:55:12 -0800
Subject: 4976 - recommend that operand order be fixed

I've been allowing operands in any order just because it simplifies
implementation. I don't actually rely on this flexibility; all the .subx
programs in this repo consistently use a single ordering.

Why is a hard-coded canonical order hard to implement? The order that
seems most logical to me is complicated by the "reg" bits in the ModR/M
byte:

- In instructions that interpret it as an `/r32` operand, it needs to be
  deemphasized because it refers to a different argument of the
  instruction than the `/mod`, `/rm32`, `/base`, `/index` and `/scale`
  operands that capture the bulk of instruction decoding complexity and
  so should be emphasized. `/r32` can also be unused, which strengthens
  the case for deemphasizing it.

- In instructions that interpret the "reg" bits as a `/subop` operand, it
  should be colocated with the opcode because it performs the same
  function: specifying the *operation* the instruction performs.

In both cases, the bits in the `reg` bitfield are conceptually unrelated
to the other bitfields in the same byte. But they sometimes want to be
close to the opcode bytes on the left, and at other times need to be
deemphasized rightward.

Fixing both these possibilities seems complicated and stateful,
particularly since all operands are optional in general. On the other
hand, just pulling operands you need to create each byte, regardless of
where in the instruction they occur, that's nicely stateless.
---
 subx/Readme.md | 35 +++++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/subx/Readme.md b/subx/Readme.md
index a0fff7ca..404f4dc6 100644
--- a/subx/Readme.md
+++ b/subx/Readme.md
@@ -303,15 +303,15 @@ Within the code segment, each line contains a comment, label or instruction.
 Comments start with a `#` and are ignored. Labels should always be the first
 word on a line, and they end with a `:`.
 
-Instructions consist of a sequence of opcode bytes and their operands. As
-mentioned above, each opcode and operand can contain _metadata_ after a `/`.
-Metadata can be either for SubX or act as a comment for the reader; SubX
-silently ignores unrecognized metadata. A single word can contain multiple
-pieces of metadata, each starting with a `/`.
-
-SubX uses metadata to express instruction encoding and get decent error
-messages. You must tag each instruction operand with the appropriate operand
-type:
+Instructions consist of a sequence of words. As mentioned above, each word can
+contain _metadata_ after a `/`. Metadata can be either required by SubX or act
+as a comment for the reader; SubX silently ignores unrecognized metadata. A
+single word can contain multiple pieces of metadata, each starting with a `/`.
+
+The words in an instruction consist of 1-3 opcode bytes, and different kinds
+of operands corresponding to the bitfields in an x86 instruction listed above.
+For error checking, these operands must be tagged with one of the following
+bits of metadata:
 - `mod`
 - `rm32` ("r/m" in the x86 instruction diagram above, but we can't use `/` in
   metadata tags)
@@ -321,11 +321,18 @@ type:
 - displacement: `disp8`, `disp16` or `disp32`
 - immediate: `imm8` or `imm32`
 
-You don't need to remember what order instruction operands are in,
-or pack bitfields by hand. SubX will do all that for you. If you get the types
-wrong, giving an instruction an incorrect operand or forgetting an operand,
-you should get a clear error message. Remember, don't use `subop` (sub-operand
-above) and `r32` (reg in the x86 figure above) in a single instruction.
+Different instructions (opcodes) require different operands. SubX will
+validate each instruction in your programs, and raise an error anytime you
+miss or spuriously add an operand.
+
+I recommend you order operands consistently in your programs. SubX allows
+operands in any order, but only because that's simplest to explain/implement.
+Switching order from instruction to instruction is likely to add to the
+reader's burden. Here's the order I've been using:
+
+```
+/subop /mod /rm32 /base /index /scale /r32 /displacement /immediate
+```
 
 Instructions can refer to labels in displacement or immediate operands, and
 they'll obtain a value based on the address of the label: immediate operands
-- 
cgit 1.4.1-2-gfad0
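
The commit message's contrast between "stateless" byte packing and a fixed operand order can be illustrated with a small sketch. This is Python written for illustration only, not SubX's actual implementation (which this patch doesn't show). The helper names `parse_word`, `find_operand`, `modrm_byte` and `in_recommended_order` are hypothetical, the sample instruction words are made up, and the sketch assumes operand values are written in hexadecimal.

```python
# Hypothetical sketch: pack a ModR/M byte by looking operands up by tag,
# regardless of where they occur in the instruction.

# Rank of each tag in the order recommended by the patch:
# /subop /mod /rm32 /base /index /scale /r32 /displacement /immediate
RANK = {
    "subop": 0, "mod": 1, "rm32": 2, "base": 3, "index": 4, "scale": 5,
    "r32": 6,
    "disp8": 7, "disp16": 7, "disp32": 7,
    "imm8": 8, "imm32": 8,
}


def parse_word(word):
    """Split 'value/tag1/tag2' into (value, [tag1, tag2])."""
    value, *tags = word.split("/")
    return value, tags


def find_operand(words, tag):
    """Return the value of the first word carrying `tag`, or None if absent."""
    for word in words:
        value, tags = parse_word(word)
        if tag in tags:
            return int(value, 16)  # assumption: values are hex
    return None


def modrm_byte(words):
    """Pack mod (2 bits), reg (3 bits) and r/m (3 bits) into one byte.

    The reg bits come from /subop if present, else from /r32, else 0.
    No position in the instruction is consulted, so no state is needed.
    """
    mod = find_operand(words, "mod") or 0
    reg = find_operand(words, "subop")
    if reg is None:
        reg = find_operand(words, "r32") or 0
    rm = find_operand(words, "rm32") or 0
    return (mod << 6) | (reg << 3) | rm


def in_recommended_order(words):
    """Check that tagged operands follow the recommended left-to-right order."""
    ranks = []
    for word in words:
        _, tags = parse_word(word)
        ranks.extend(RANK[t] for t in tags if t in RANK)
    return ranks == sorted(ranks)


ordered = ["01/add", "3/mod/direct", "3/rm32/EBX", "1/r32/ECX"]
shuffled = ["01/add", "1/r32/ECX", "3/rm32/EBX", "3/mod/direct"]
print(hex(modrm_byte(ordered)), hex(modrm_byte(shuffled)))
print(in_recommended_order(ordered), in_recommended_order(shuffled))
```

Both orderings pack to the same byte, which is why accepting any order is the simple thing to implement. A consistency check like `in_recommended_order` is a separate pass, matching the patch's choice to recommend an order rather than enforce one.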