about summary refs log blame commit diff stats
path: root/subx/050_write.subx
blob: 0d6b8152b0b45f2d40bf3039addd38862b94f76e (plain) (tree)
<
== Goal

A memory-safe language with a simple translator to x86 that can be feasibly written in x86.

== Definitions of terms

Memory-safe: it should be impossible to:
  a) create a pointer out of arbitrary data, or
  b) to access heap memory after it's been freed.

Simple: do all the work in a 2-pass translator:
  Pass 1: check each instruction's types in isolation.
  Pass 2: emit code for each instruction in isolation.

== Implications

=> Each instruction matches a pattern and yields a template to emit.
=> There's a 1-to-1 mapping between instructions in the source language and x86 machine code.
  Zero runtime.
=> Programmers have to decide how to use registers.
=> Translator can't insert any instructions that write to registers. (We don't know if a register is in use.)

== Lessons from Mu

1. For easy bounds checking, never advance pointers to arrays or heap allocations. No pointer arithmetic.
2. Store the array length with the array.
3. Store an allocation id with heap allocations. Allocation id goes monotonically up, never gets reused. When it wraps around to zero the program panics.
4. Heap pointers also carry around allocation id.
5. When dereferencing a heap pointer, first ensure its alloc id matches the alloc id of the payload. This ensures some other copy of the pointer didn't get freed (and potentially reused)

== Problem 1

How to index into an array?

  The array has a length that needs to be checked.
  Its elements have a type T.
  The base will be in memory, either on the stack or the heap.
  The index may be in the register, stack or heap.

That's too much work to do in a single instruction.

So arrays have to take multiple steps. And we have to guard against the steps
being misused in unsafe ways.

To index into an array with elements of type T, starting with the size of the
array in bytes:

  step 1: get the offset the index is at
    <reg offset> : (index T) <- index <reg/mem idx> : int, <literal> : (size T)
  step 2: convert the array to address-of-element
    <reg x> : (address T) <- advance <reg/mem A> : (array T), <reg offset> : (index T)
    implicitly compares the offset with the size, panic if greater
    =>
      compare <reg offset> : (index T), <reg/mem> : (array T)
      jge panic
  step 3: use the address to the element
    ...

(index T) is a special type. You can do only two things with it:
  - pass it to the advance instruction
  - convert it to a number (but no converting back)

(address T) is a short-term pointer. You can't store addresses in structs, you
can't define global variables of that type, and you can't pass the type to the
memory allocator to save to the heap. Only place you can store an (address T)
is on the stack or a register.

[But you can still be holding an address in a long-lived stack frame after
it's been freed?!]

== Problem 2

How to dereference a heap allocation?
me/subx/050_write.subx?h=hlt&id=e9661581f092f3e210b7bd900af058d8b8c4369e'>^
e59a74ab ^


cf02c20b ^
ee9a9237 ^
6030d7e2 ^



ee9a9237 ^
6030d7e2 ^


4224ec81 ^
e59a74ab ^







ee9a9237 ^
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
                                         
 
       


                                                                                                                                                 
 

                                               
                           
                                
                         
 
                                                       
              

                                                                                                                                                                       
                      




                                               
                
                                                                                                                                                                            
                        
                                                                                                                                                                             
                                                                                                                                                                  
                       
                                                                                                                                                                             
                                                                                                                                                                        
               

                                 


                                              
            
                         



                 
              


                                                                                                                                                                       
 







                                                                                                       
                            
# _write: write to a file descriptor (fd)

== code
#   instruction                     effective address                                                   register    displacement    immediate
# . op          subop               mod             rm32          base        index         scale       r32
# . 1-3 bytes   3 bits              2 bits          3 bits        3 bits      3 bits        2 bits      2 bits      0/1/2/4 bytes   0/1/2/4 bytes

Entry:  # just exit; can't test _write just yet
    # . syscall(exit, 0)
    bb/copy-to-EBX  0/imm32
    b8/copy-to-EAX  1/imm32/exit
    cd/syscall  0x80/imm8

_write:  # fd : int, s : (address array byte) -> <void>
    # . prolog
    55/push-EBP
    89/copy                         3/mod/direct    5/rm32/EBP    .           .             .           4/r32/ESP   .               .                 # copy ESP to EBP
    # . save registers
    50/push-EAX
    51/push-ECX
    52/push-EDX
    53/push-EBX
    # syscall(write, fd, (data) s+4, (size) *s)
    # . fd : EBX
    8b/copy                         1/mod/*+disp8   5/rm32/EBP    .           .             .           3/r32/EBX   8/disp8         .                 # copy *(EBP+8) to EBX
    # . data : ECX = s+4
    8b/copy                         1/mod/*+disp8   5/rm32/EBP    .           .             .           1/r32/ECX   0xc/disp8       .                 # copy *(EBP+12) to ECX
    81          0/subop/add         3/mod/direct    1/rm32/ECX    .           .             .           .           .               4/imm32           # add to ECX
    # . size : EDX = *s
    8b/copy                         1/mod/*+disp8   5/rm32/EBP    .           .             .           2/r32/EDX   0xc/disp8       .                 # copy *(EBP+12) to EDX
    8b/copy                         0/mod/indirect  2/rm32/EDX    .           .             .           2/r32/EDX   .               .                 # copy *EDX to EDX
    # . syscall
    b8/copy-to-EAX  4/imm32/write
    cd/syscall  0x80/imm8
    # if (EAX < 0) abort
    3d/compare-EAX-with  0/imm32
    0f 8c/jump-if-lesser  $_write:abort/disp32
$_write:end:
    # . restore registers
    5b/pop-to-EBX
    5a/pop-to-EDX
    59/pop-to-ECX
    58/pop-to-EAX
    # . epilog
    89/copy                         3/mod/direct    4/rm32/ESP    .           .             .           5/r32/EBP   .               .                 # copy EBP to ESP
    5d/pop-to-EBP
    c3/return

$_write:abort:
    # can't write a message here for risk of an infinite loop, so we'll use a special exit code instead
    # . syscall(exit, 255)
    bb/copy-to-EBX  0xff/imm32
    b8/copy-to-EAX  1/imm32/exit
    cd/syscall  0x80/imm8
    # never gets here

# . . vim:nowrap:textwidth=0