mu - Soul of a tiny new machine. More thorough tests → More comprehensible and rewrite-friendly software → More resilient society.

Commit: 5801 - move `tangle` to `tools/` dir | Author: Kartik Agaram | Date: 2019-12-07 | Files: 1 | Lines: -0/+112


Mu programs are lists of functions. Each function has the following form:

  fn _name_ _inouts_with_types_ -> _outputs_with_types_ {
    _instructions_
  }

Each function has a header line, and some number of instructions, each on a separate line.

Instructions may be primitives or function calls. Either way, all instructions have one of the following forms:

  # defining variables
  var _name_: _type_
  var _name_/_register_: _type_

  # doing things with variables
  _operation_ _inouts_
  _outputs_ <- _operation_ _inouts_

Instructions and functions may have inouts and outputs. Both inouts and outputs are variables.

As seen above, variables can be defined to live in a register, like this:

  n/eax

Variables not assigned a register live on the stack.

Function inouts must always be on the stack, and outputs must always be in registers. A function call must always write to the exact registers its definition requires. For example:

  fn foo -> x/eax: int {
    ...
  }
  fn main {
    a/eax <- foo  # ok
    a/ebx <- foo  # wrong
  }

Primitive inouts may be on the stack or in registers, but outputs must always be in registers.

Functions can contain nested blocks inside { and }. Variables defined in a block don't exist outside it.

  {
    _instructions_
    {
      _more instructions_
    }
  }

Blocks can be named like so:

  $name: {
    _instructions_
  }

## Primitive instructions

Primitive instructions currently supported in Mu ('n' indicates a literal integer rather than a variable, and 'var/reg' indicates a variable in a register):

  var/reg <- increment
  increment var
  var/reg <- decrement
  decrement var

  var1/reg1 <- add var2/reg2
  var/reg <- add var2
  add-to var1, var2/reg
  var/reg <- add n
  add-to var, n

  var1/reg1 <- sub var2/reg2
  var/reg <- sub var2
  sub-from var1, var2/reg
  var/reg <- sub n
  sub-from var, n

  var1/reg1 <- and var2/reg2
  var/reg <- and var2
  and-with var1, var2/reg
  var/reg <- and n
  and-with var, n

  var1/reg1 <- or var2/reg2
  var/reg <- or var2
  or-with var1, var2/reg
  var/reg <- or n
  or-with var, n

  var1/reg1 <- xor var2/reg2
  var/reg <- xor var2
  xor-with var1, var2/reg
  var/reg <- xor n
  xor-with var, n

  var1/reg1 <- copy var2/reg2
  copy-to var1, var2/reg
  var/reg <- copy var2
  var/reg <- copy n
  copy-to var, n

  compare var1, var2/reg
  compare var1/reg, var2
  compare var/eax, n
  compare var, n

  var/reg <- multiply var2

Notice that there are no primitive instructions operating on two variables in memory. That's a restriction of the underlying x86 processor.

Any instruction above that takes a variable in memory can be replaced with a dereference (`*`) of an address variable in a register. But you can't dereference variables in memory.

## Primitive jump instructions

There are two kinds of jumps, both with many variations: `break` and `loop`. `break` instructions jump to the end of the containing block. `loop` instructions jump to the beginning of the containing block.

Jumps can take an optional label starting with '$':

  loop $foo

This instruction jumps to the beginning of the block called $foo. It must lie somewhere inside such a block. Jumps are only legal to containing blocks. Use named blocks with restraint; jumps to places far away can get confusing.
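
As a sketch of how functions, registers, nested blocks and labelled jumps fit together, here is a small example that uses only instruction forms listed in this document. The names `count-to-ten` and `$count` are made up for illustration, and `break-if->=` is one of the conditional jumps. It also assumes that a named output such as `result/eax` can be used like any other register variable inside the function body.

```
fn count-to-ten -> result/eax: int {
  result <- copy 0
  $count: {
    {
      compare result, 10
      break-if->= $count   # done: jump to the end of the outer block
      result <- increment
    }
    loop $count            # jump back to the beginning of $count
  }
}

fn main {
  var n/eax: int
  n <- count-to-ten        # ok: count-to-ten writes its output to eax
}
```

Both jumps target `$count`, a block that contains them, so they satisfy the rule that jumps are only legal to containing blocks. The `compare` sits directly before the conditional jump that depends on it.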
There are two unconditional jumps:

  loop
  loop label
  break
  break label

The remaining jump instructions are all conditional. Conditional jumps rely on the result of the most recently executed `compare` instruction. (To keep programs easy to read, keep compare instructions close to the jump that uses them.)

  break-if-=
  break-if-= label
  break-if-!=
  break-if-!= label

Inequalities are similar, but have unsigned and signed variants. We assume unsigned variants are only ever used to compare addresses.

  break-if-<
  break-if-< label
  break-if->
  break-if-> label
  break-if-<=
  break-if-<= label
  break-if->=
  break-if->= label

  break-if-addr<
  break-if-addr< label
  break-if-addr>
  break-if-addr> label
  break-if-addr<=
  break-if-addr<= label
  break-if-addr>=
  break-if-addr>= label

Similarly, conditional loops:

  loop-if-=
  loop-if-= label
  loop-if-!=
  loop-if-!= label

  loop-if-<
  loop-if-< label
  loop-if->
  loop-if-> label
  loop-if-<=
  loop-if-<= label
  loop-if->=
  loop-if->= label

  loop-if-addr<
  loop-if-addr< label
  loop-if-addr>
  loop-if-addr> label
  loop-if-addr<=
  loop-if-addr<= label
  loop-if-addr>=
  loop-if-addr>= label

## Address operations

  var/reg: (addr T) <- address var: T  # var must be in mem (on the stack)

## Array operations

  var/reg: int <- length arr/reg: (addr array T)
  var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
  var/reg: (addr T) <- index arr: (array T sz), idx/reg: int
  var/reg: (addr T) <- index arr/reg: (addr array T), n
  var/reg: (addr T) <- index arr: (array T sz), n

  var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int  # arr can be in reg or mem
  var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int      # arr can be in reg or mem
  var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)

## User-defined types

  var/reg: (addr T_f) <- get var/reg: (addr T), f
    where record (product) type T has elements a, b, c, ... of types T_a, T_b, T_c, ...
  var/reg: (addr T_f) <- get var: T, f

## Handles for safe access to the heap

Say we created a handle like this on the stack (it can't be in a register):

  var x: (handle T)
  allocate Heap, T, x

You can copy handles to another variable on the stack like this:

  var y: (handle T)
  copy-handle-to y, x

You can also save handles inside other user-defined types like this:

  var y/reg: (addr handle T_f) <- get var: (addr T), f
  copy-handle-to *y, x

Or this:

  var y/reg: (addr handle T) <- index arr: (addr array handle T), n
  copy-handle-to *y, x

Handles can be converted into addresses like this:

  var y/reg: (addr T) <- lookup x

It's illegal to continue to use this addr after calling a function that reclaims heap memory. You have to repeat the lookup.
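
The handle rules above can be sketched end to end as follows. This is only an illustration built from the forms in this section: `cleanup` is a hypothetical function standing in for any call that might reclaim heap memory, and calling a function with no inouts or outputs by writing just its name is an assumption, not something this document states.

```
fn main {
  var h: (handle int)      # handles live on the stack, never in a register
  allocate Heap, int, h

  var p/eax: (addr int)
  p <- lookup h            # convert the handle into an address
  copy-to *p, 34           # write through the address

  cleanup                  # hypothetical: may reclaim heap memory

  p <- lookup h            # the old address can't be trusted; look it up again
  var n/ecx: int
  n <- copy *p
}
```

The second `lookup` is the point of the example: the address in `p` is only used between a lookup and the next call that might reclaim memory, while the handle `h` is what gets kept around.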