diff options
Diffstat (limited to 'transect/compiler3')
-rw-r--r-- | transect/compiler3 | 73 |
1 files changed, 73 insertions, 0 deletions
diff --git a/transect/compiler3 b/transect/compiler3 new file mode 100644 index 00000000..6bc6bf85 --- /dev/null +++ b/transect/compiler3 @@ -0,0 +1,73 @@ +== Goal + +A memory-safe language with a simple translator to x86 that can be feasibly written in x86. + +== Definitions of terms + +Memory-safe: it should be impossible to: + a) create a pointer out of arbitrary data, or + b) to access heap memory after it's been freed. + +Simple: do all the work in a 2-pass translator: + Pass 1: check each instruction's types in isolation. + Pass 2: emit code for each instruction in isolation. + +== Implications + +=> Each instruction matches a pattern and yields a template to emit. +=> There's a 1-to-1 mapping between instructions in the source language and x86 machine code. + Zero runtime. +=> Programmers have to decide how to use registers. +=> Translator can't insert any instructions that write to registers. (We don't know if a register is in use.) + +== Lessons from Mu + +1. For easy bounds checking, never advance pointers to arrays or heap allocations. No pointer arithmetic. +2. Store the array length with the array. +3. Store an allocation id with heap allocations. Allocation id goes monotonically up, never gets reused. When it wraps around to zero the program panics. +4. Heap pointers also carry around allocation id. +5. When dereferencing a heap pointer, first ensure its alloc id matches the alloc id of the payload. This ensures some other copy of the pointer didn't get freed (and potentially reused) + +== Problem 1 + +How to index into an array? + + The array has a length that needs to be checked. + Its elements have a type T. + The base will be in memory, either on the stack or the heap. + The index may be in the register, stack or heap. + +That's too much work to do in a single instruction. + +So arrays have to take multiple steps. And we have to guard against the steps +being misused in unsafe ways. + +To index into an array with elements of type T, starting with the size of the +array in bytes: + + step 1: get the offset the index is at + <reg offset> : (index T) <- index <reg/mem idx> : int, <literal> : (size T) + step 2: convert the array to address-of-element + <reg x> : (address T) <- advance <reg/mem A> : (array T), <reg offset> : (index T) + implicitly compares the offset with the size, panic if greater + => + compare <reg offset> : (index T), <reg/mem> : (array T) + jge panic + step 3: use the address to the element + ... + +(index T) is a special type. You can do only two things with it: + - pass it to the advance instruction + - convert it to a number (but no converting back) + +(address T) is a short-term pointer. You can't store addresses in structs, you +can't define global variables of that type, and you can't pass the type to the +memory allocator to save to the heap. Only place you can store an (address T) +is on the stack or a register. + +[But you can still be holding an address in a long-lived stack frame after +it's been freed?!] + +== Problem 2 + +How to dereference a heap allocation? |