path: root/mu_summary



Mu programs are lists of functions. Each function has the following form:

  fn _name_ _inouts_with_types_ -> _outputs_with_types_ {
    _instructions_
  }

Each function has a header line, and some number of instructions, each on a
separate line.

Instructions may be primitives or function calls. Either way, all instructions
have one of the following forms:

  # defining variables
  var _name_: _type_
  var _name_/_register_: _type_

  # doing things with variables
  _operation_ _inouts_
  _outputs_ <- _operation_ _inouts_

Instructions and functions may have inouts and outputs. Both inouts and
outputs are variables.

As seen above, variables can be defined to live in a register, like this:

  n/eax

Variables not assigned a register live in the stack.

Function inouts must always be on the stack, and outputs must always be in
registers. A function call must always write to the exact registers its
definition requires. For example:

  fn foo -> x/eax: int {
    ...
  }
  fn main {
    a/eax <- foo  # ok
    a/ebx <- foo  # wrong
  }

Primitive inouts may be on the stack or in registers, but outputs must always
be in registers.

Functions can contain nested blocks inside { and }. Variables defined in a
block don't exist outside it.

  {
    _instructions_
    {
      _more instructions_
    }
  }

Blocks can be named like so:

  $name: {
    _instructions_
  }

## Primitive instructions

Primitive instructions currently supported in Mu ('n' indicates a literal
integer rather than a variable, and 'var/reg' indicates a variable in a
register):

  var/reg <- increment
  increment var
  var/reg <- decrement
  decrement var
  var1/reg1 <- add var2/reg2
  var/reg <- add var2
  add-to var1, var2/reg
  var/reg <- add n
  add-to var, n

  var1/reg1 <- sub var2/reg2
  var/reg <- sub var2
  sub-from var1, var2/reg
  var/reg <- sub n
  sub-from var, n

  var1/reg1 <- and var2/reg2
  var/reg <- and var2
  and-with var1, var2/reg
  var/reg <- and n
  and-with var, n

  var1/reg1 <- or var2/reg2
  var/reg <- or var2
  or-with var1, var2/reg
  var/reg <- or n
  or-with var, n

  var1/reg1 <- xor var2/reg2
  var/reg <- xor var2
  xor-with var1, var2/reg
  var/reg <- xor n
  xor-with var, n

  var/reg <- copy var2/reg2
  copy-to var1, var2/reg
  var/reg <- copy var2
  var/reg <- copy n
  copy-to var, n

  compare var1, var2/reg
  compare var1/reg, var2
  compare var/eax, n
  compare var, n

  var/reg <- multiply var2

Notice that there are no primitive instructions operating on two variables in
memory. That's a restriction of the underlying x86 processor.

Any instruction above that takes a variable in memory can be replaced with a
dereference (`*`) of an address variable in a register. But you can't dereference
variables in memory.

## Byte operations

A special-case is variables of type 'byte'. Mu is a 32-bit platform so for the
most part only supports types that are multiples of 32 bits. However, we do
want to support strings in ASCII and UTF-8, which will be arrays of bytes.

Since most x86 instructions implicitly load 32 bits at a time from memory,
variables of type 'byte' are only allowed in registers, not on the stack. Here
are the possible instructions for reading bytes to/from memory:

  var/reg <- copy-byte var2/reg2      # var: byte, var2: byte
  var/reg <- copy-byte *var2/reg2     # var: byte, var2: (addr byte)
  copy-byte-to *var1/reg1, var2/reg2  # var1: (addr byte), var2: byte

In addition, variables of type 'byte' are restricted to (the lowest bytes of)
just 4 registers: eax, ecx, edx and ebx.

## Primitive jump instructions

There are two kinds of jumps, both with many variations: `break` and `loop`.
`break` instructions jump to the end of the containing block. `loop` instructions
jump to the beginning of the containing block.

Jumps can take an optional label starting with '$':

  loop $foo

This instruction jumps to the beginning of the block called $foo. It must lie
somewhere inside such a block. Jumps are only legal to containing blocks. Use
named blocks with restraint; jumps to places far away can get confusing.

There are two unconditional jumps:

  loop
  loop label
  break
  break label

The remaining jump instructions are all conditional. Conditional jumps rely on
the result of the most recently executed `compare` instruction. (To keep
programs easy to read, keep compare instructions close to the jump that uses
them.)

  break-if-=
  break-if-= label
  break-if-!=
  break-if-!= label

Inequalities are similar, but have unsigned and signed variants. We assume
unsigned variants are only ever used to compare addresses.

  break-if-<
  break-if-< label
  break-if->
  break-if-> label
  break-if-<=
  break-if-<= label
  break-if->=
  break-if->= label

  break-if-addr<
  break-if-addr< label
  break-if-addr>
  break-if-addr> label
  break-if-addr<=
  break-if-addr<= label
  break-if-addr>=
  break-if-addr>= label

Similarly, conditional loops:

  loop-if-=
  loop-if-= label
  loop-if-!=
  loop-if-!= label

  loop-if-<
  loop-if-< label
  loop-if->
  loop-if-> label
  loop-if-<=
  loop-if-<= label
  loop-if->=
  loop-if->= label

  loop-if-addr<
  loop-if-addr< label
  loop-if-addr>
  loop-if-addr> label
  loop-if-addr<=
  loop-if-addr<= label
  loop-if-addr>=
  loop-if-addr>= label

## Address operations

  var/reg: (addr T) <- address var: T         # var must be in mem (on the stack)

## Array operations

  var/reg: int <- length arr/reg: (addr array T)
  var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
  var/reg: (addr T) <- index arr: (array T sz), idx/reg: int
  var/reg: (addr T) <- index arr/reg: (addr array T), n
  var/reg: (addr T) <- index arr: (array T sz), n

  var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int     # arr can be in reg or mem
  var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int         # arr can be in reg or mem
  var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)

## User-defined types

  var/reg: (addr T_f) <- get var/reg: (addr T), f
    where record (product) type T has elements a, b, c, ... of types T_a, T_b, T_c, ...
  var/reg: (addr T_f) <- get var: T, f

## Handles for safe access to the heap

Say we created a handle like this on the stack (it can't be in a register)
  var x: (handle T)
  allocate Heap, T, x

You can copy handles to another variable on the stack like this:
  var y: (handle T)
  copy-handle-to y, x

You can also save handles inside other user-defined types like this:
  var y/reg: (addr handle T_f) <- get var: (addr T), f
  copy-handle-to *y, x

Or this:
  var y/reg: (addr handle T) <- index arr: (addr array handle T), n
  copy-handle-to *y, x

Handles can be converted into addresses like this:
  var y/reg: (addr T) <- lookup x

It's illegal to continue to use this addr after a function that reclaims heap
memory. You have to repeat the lookup.
Mu programs are lists of functions. Each function has the following form:

  fn _name_ _inouts_with_types_ -> _outputs_with_types_ {
    _instructions_
  }

Each function has a header line, and some number of instructions, each on a
separate line.

Instructions may be primitives or function calls. Either way, all instructions
have one of the following forms:

  # defining variables
  var _name_: _type_
  var _name_/_register_: _type_

  # doing things with variables
  _operation_ _inouts_
  _outputs_ <- _operation_ _inouts_

Instructions and functions may have inouts and outputs. Both inouts and
outputs are variables.

As seen above, variables can be defined to live in a register, like this:

  n/eax

Variables not assigned a register live in the stack.

Function inouts must always be on the stack, and outputs must always be in
registers. A function call must always write to the exact registers its
definition requires. For example:

  fn foo -> x/eax: int {
    ...
  }
  fn main {
    a/eax <- foo  # ok
    a/ebx <- foo  # wrong
  }

Primitive inouts may be on the stack or in registers, but outputs must always
be in registers.

Functions can contain nested blocks inside { and }. Variables defined in a
block don't exist outside it.

  {
    _instructions_
    {
      _more instructions_
    }
  }

Blocks can be named like so:

  $name: {
    _instructions_
  }

## Primitive instructions

Primitive instructions currently supported in Mu ('n' indicates a literal
integer rather than a variable, and 'var/reg' indicates a variable in a
register):

  var/reg <- increment
  increment var
  var/reg <- decrement
  decrement var
  var1/reg1 <- add var2/reg2
  var/reg <- add var2
  add-to var1, var2/reg
  var/reg <- add n
  add-to var, n

  var1/reg1 <- sub var2/reg2
  var/reg <- sub var2
  sub-from var1, var2/reg
  var/reg <- sub n
  sub-from var, n

  var1/reg1 <- and var2/reg2
  var/reg <- and var2
  and-with var1, var2/reg
  var/reg <- and n
  and-with var, n

  var1/reg1 <- or var2/reg2
  var/reg <- or var2
  or-with var1, var2/reg
  var/reg <- or n
  or-with var, n

  var1/reg1 <- xor var2/reg2
  var/reg <- xor var2
  xor-with var1, var2/reg
  var/reg <- xor n
  xor-with var, n

  var/reg <- copy var2/reg2
  copy-to var1, var2/reg
  var/reg <- copy var2
  var/reg <- copy n
  copy-to var, n

  compare var1, var2/reg
  compare var1/reg, var2
  compare var/eax, n
  compare var, n

  var/reg <- multiply var2

Notice that there are no primitive instructions operating on two variables in
memory. That's a restriction of the underlying x86 processor.

Any instruction above that takes a variable in memory can be replaced with a
dereference (`*`) of an address variable in a register. But you can't dereference
variables in memory.

## Byte operations

A special-case is variables of type 'byte'. Mu is a 32-bit platform so for the
most part only supports types that are multiples of 32 bits. However, we do
want to support strings in ASCII and UTF-8, which will be arrays of bytes.

Since most x86 instructions implicitly load 32 bits at a time from memory,
variables of type 'byte' are only allowed in registers, not on the stack. Here
are the possible instructions for reading bytes to/from memory:

  var/reg <- copy-byte var2/reg2      # var: byte, var2: byte
  var/reg <- copy-byte *var2/reg2     # var: byte, var2: (addr byte)
  copy-byte-to *var1/reg1, var2/reg2  # var1: (addr byte), var2: byte

In addition, variables of type 'byte' are restricted to (the lowest bytes of)
just 4 registers: eax, ecx, edx and ebx.

## Primitive jump instructions

There are two kinds of jumps, both with many variations: `break` and `loop`.
`break` instructions jump to the end of the containing block. `loop` instructions
jump to the beginning of the containing block.

Jumps can take an optional label starting with '$':

  loop $foo

This instruction jumps to the beginning of the block called $foo. It must lie
somewhere inside such a block. Jumps are only legal to containing blocks. Use
named blocks with restraint; jumps to places far away can get confusing.

There are two unconditional jumps:

  loop
  loop label
  break
  break label

The remaining jump instructions are all conditional. Conditional jumps rely on
the result of the most recently executed `compare` instruction. (To keep
programs easy to read, keep compare instructions close to the jump that uses
them.)

  break-if-=
  break-if-= label
  break-if-!=
  break-if-!= label

Inequalities are similar, but have unsigned and signed variants. We assume
unsigned variants are only ever used to compare addresses.

  break-if-<
  break-if-< label
  break-if->
  break-if-> label
  break-if-<=
  break-if-<= label
  break-if->=
  break-if->= label

  break-if-addr<
  break-if-addr< label
  break-if-addr>
  break-if-addr> label
  break-if-addr<=
  break-if-addr<= label
  break-if-addr>=
  break-if-addr>= label

Similarly, conditional loops:

  loop-if-=
  loop-if-= label
  loop-if-!=
  loop-if-!= label

  loop-if-<
  loop-if-< label
  loop-if->
  loop-if-> label
  loop-if-<=
  loop-if-<= label
  loop-if->=
  loop-if->= label

  loop-if-addr<
  loop-if-addr< label
  loop-if-addr>
  loop-if-addr> label
  loop-if-addr<=
  loop-if-addr<= label
  loop-if-addr>=
  loop-if-addr>= label

## Address operations

  var/reg: (addr T) <- address var: T         # var must be in mem (on the stack)

## Array operations

  var/reg: int <- length arr/reg: (addr array T)
  var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: int
  var/reg: (addr T) <- index arr: (array T sz), idx/reg: int
  var/reg: (addr T) <- index arr/reg: (addr array T), n
  var/reg: (addr T) <- index arr: (array T sz), n

  var/reg: (offset T) <- compute-offset arr: (addr array T), idx/reg: int     # arr can be in reg or mem
  var/reg: (offset T) <- compute-offset arr: (addr array T), idx: int         # arr can be in reg or mem
  var/reg: (addr T) <- index arr/reg: (addr array T), idx/reg: (offset T)

## User-defined types

  var/reg: (addr T_f) <- get var/reg: (addr T), f
    where record (product) type T has elements a, b, c, ... of types T_a, T_b, T_c, ...
  var/reg: (addr T_f) <- get var: T, f

## Handles for safe access to the heap

Say we created a handle like this on the stack (it can't be in a register)
  var x: (handle T)
  allocate Heap, T, x

You can copy handles to another variable on the stack like this:
  var y: (handle T)
  copy-handle-to y, x

You can also save handles inside other user-defined types like this:
  var y/reg: (addr handle T_f) <- get var: (addr T), f
  copy-handle-to *y, x

Or this:
  var y/reg: (addr handle T) <- index arr: (addr array handle T), n
  copy-handle-to *y, x

Handles can be converted into addresses like this:
  var y/reg: (addr T) <- lookup x

It's illegal to continue to use this addr after a function that reclaims heap
memory. You have to repeat the lookup.