# Mu Syntax

Here are two valid statements in Mu:

```
increment x
y <- increment
```

Understanding when to use one vs the other is the critical idea in Mu. In
short, the former increments a value in memory, while the latter increments a
value in a register.

Most languages start from some syntax and do what it takes to implement it.
Mu, however, is designed as a safe way to program in [a regular subset of
32-bit x86 machine code](subx.md), _satisficing_ rather than optimizing for a
clean syntax. To keep the mapping to machine code lightweight, Mu exclusively
uses statements. Most statements map to a single instruction of machine code.

Since the x86 instruction set restricts how many memory locations an
instruction can use, Mu makes registers explicit as well. Variables must be
explicitly mapped to specific registers; otherwise they live in memory. While
you have to do your own register allocation, Mu will helpfully point out when
you get it wrong.

Statements consist of 3 parts: the operation, optional _inouts_ and optional
_outputs_. Outputs come before the operation name and `<-`.

Outputs are always registers; memory locations that need to be modified are
passed in by reference in inouts.

So Mu programmers need to make two new categories of decisions: whether to
define variables in registers or memory, and whether to put variables to the
left or right. There's always exactly one way to write any given operation. In
return for this overhead you get a lightweight and future-proof stack. And Mu
will provide good error messages to support you.

Further down, this page enumerates all available primitives in Mu, and [a
separate page](http://akkartik.github.io/mu/html/mu_instructions.html)
describes how each primitive is translated to machine code. There is also a
useful list of pre-defined functions (implemented in unsafe machine code) in
[400.mu](http://akkartik.github.io/mu/html/400.mu.html) and
[vocabulary.md](vocabulary.md).
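
To make the opening distinction concrete, here is a minimal sketch that uses
both forms of `increment` side by side. The function name `example` and the
choice of register `eax` are arbitrary and purely illustrative; the `var`
statements are the two variants described under "Local variables".

```
fn example {
  var x: int                  # no register given, so `x` lives in memory
  increment x                 # memory form: `x` is an inout
  var y/eax: int <- copy 0    # `y` is mapped to register eax
  y <- increment              # register form: `y` is an output
}
```

Swapping the two forms (`x <- increment` or `increment y`) is the kind of
mistake the page says Mu will point out.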

## Functions and calls

Zooming out from single statements, here's a complete sample program in Mu:

[Image: ex2.mu]

Mu programs are lists of functions. Each function has the following form:

```
fn _name_ _inout_ ... -> _output_ ... {
  _statement_
  _statement_
  ...
}
```

Each function has a header line, and some number of statements, each on a
separate line. Headers describe inouts and outputs. Inouts can't be registers,
and outputs _must_ be registers (specified using metadata after a `/`).
Outputs can't take names.

The above program also demonstrates a function call (to the function
`do-add`). Function calls look the same as primitive statements: they can
return (multiple) outputs in registers, and modify inouts passed in by
reference. In addition, there's one more constraint: output registers must
match the function header. For example:

```
fn f -> _/eax: int {
  ...
}
fn g {
  a/eax <- f  # ok
  a/ebx <- f  # wrong; `a` must be in register `eax`
}
```

You can exit a function at any time with the `return` instruction. Give it the
right number of arguments, and it'll assign them respectively to the
function's outputs before jumping back to the caller.

The function `main` is special; it is where the program starts running. It
must always return a single int in register `ebx` (as the exit status of the
process). It can also optionally accept an array of strings as input (from the
shell command-line). To be precise, `main` must have one of the following two
signatures:

- `fn main -> _/ebx: int`
- `fn main args: (addr array (addr array byte)) -> _/ebx: int`

(The name of the inout is flexible.)

Mu encloses multi-word types in parentheses, and types can get quite
expressive. For example, you read `main`'s inout type as "an address to an
array of addresses to arrays of bytes." Since addresses to arrays of bytes are
almost always strings in Mu, you'll quickly learn to mentally shorten this
type to "an address to an array of strings".

Mu currently has no way to name magic constants.
Instead, document integer literals using metadata after a `/`. For example:

```
var x/eax: int <- copy 3/margin-left
```

Here we use metadata in two ways: to specify a register for the variable `x`
(checked), and to give a name to the constant `3` (unchecked; purely for
documentation).

Variables can't currently accept unchecked metadata for documentation.
(Perhaps this should change.)

## Blocks

Blocks are useful for grouping related statements. They're delimited by `{`
and `}`, each alone on a line.

Blocks can nest:

```
{
  _statements_
  {
    _more statements_
  }
}
```

Blocks can be named (with the name ending in a `:` on the same line as the
`{`):

```
$name: {
  _statements_
}
```

Further down we'll see primitive statements for skipping or repeating blocks.
Besides control flow, the other use for blocks is...

## Local variables

Functions can define new variables at any time with the keyword `var`. There
are two variants of the `var` statement, for defining variables in registers
or memory.

```
var name: type
var name/reg: type <- ...
```

Variables on the stack are never explicitly initialized. (They're always
implicitly zeroed out.) Variables in registers are always initialized.

Register variables can go in 6 integer registers (`eax`, `ebx`, `ecx`, `edx`,
`esi`, `edi`) or 8 floating-point registers (`xmm0`, `xmm1`, `xmm2`, `xmm3`,
`xmm4`, `xmm5`, `xmm6`, `xmm7`).

Defining a variable in a register either clobbers the previous variable (if it
was defined in the same block) or shadows it temporarily (if it was defined in
an outer block).

Variables exist from their definition until the end of their containing block.
Register variables may also die earlier if their register is clobbered by a
new variable.

Variables on the stack can be of many types (but not `byte`). Integer
registers can only contain 32-bit values: `int`, `byte`, `boolean`,
`(addr ...)`. Floating-point registers can only contain values of type
`float`.
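
The rules on clobbering and shadowing are easier to follow in a concrete
function. The following is a sketch only: the function and variable names are
made up for illustration, and `add-to` is one of the integer primitives listed
on this page.

```
fn example {
  var total: int                  # on the stack; implicitly zeroed
  var n/ecx: int <- copy 5        # in register ecx; must be initialized
  {
    var tmp/ecx: int <- copy 1    # outer-block `n` is shadowed until the `}`
    add-to total, tmp
  }
  add-to total, n                 # `n` is visible again here
  var m/ecx: int <- copy 2        # same block as `n`, so `n` is clobbered
  add-to total, m                 # from here on only `m` can be used
}
```

Reusing one register this way keeps the others free, at the cost of having to
track which variable currently owns `ecx`.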

## Integer primitives

Here is the list of arithmetic primitive operations supported by Mu. The name
`n` indicates a literal integer rather than a variable, and `var/reg`
indicates a variable in a register, though that's not always valid Mu syntax.

```
var/reg <- increment
increment var
var/reg <- decrement
decrement var

var1/reg1 <- add var2/reg2
var/reg <- add var2
add-to var1, var2/reg
var/reg <- add n
add-to var, n

var1/reg1 <- subtract var2/reg2
var/reg <- subtract var2
subtract-from var1, var2/reg
var/reg <- subtract n
subtract-from var, n

var1/reg1 <- and var2/reg2
var/reg <- and var2
and-with var1, var2/reg
var/reg <- and n
and-with var, n

var1/reg1 <- or var2/reg2
var/reg <- or var2
or-with var1, var2/reg
var/reg <- or n
or-with var, n

var1/reg1 <- xor var2/reg2
var/reg <- xor var2
xor-with var1, var2/reg
var/reg <- xor n
xor-with var, n

var1/reg1 <- negate
negate var

var/reg <- copy var2/reg2
copy-to var1, var2/reg
var/reg <- copy var2
var/reg <- copy n
copy-to var, n

compare var1, var2/reg
compare var1/reg, var2
compare var/eax, n
compare var, n

var/reg <- shift-left n
var/reg <- shift-right n
var/reg <- shift-right-signed n
shift-left var, n
shift-right var, n
shift-right-signed var, n

var/reg <- multiply var2
```

Any statement above that takes a variable in memory can be replaced with a
dereference (`*`) of an address variable (of type `(addr ...)`) in a register.
You can't dereference variables in memory. You have to load them into a
register first.

Excluding dereferences, the above statements must operate on non-address
values with primitive types: `int`, `boolean` or `byte`. (Booleans are really
just `int`s, and Mu assumes any value but `0` is true.) You can copy addresses
to int variables, but not the other way around.

## Floating-point primitives

These instructions may use the floating-point registers `xmm0` ... `xmm7`
(denoted by `/xreg2` or `/xrm32`). They also use integer values on occasion
(`/rm32` and `/r32`).

```
var/xreg <- add var2/xreg2
var/xreg <- add var2
var/xreg <- add *var2/reg2

var/xreg <- subtract var2/xreg2
var/xreg <- subtract var2
var/xreg <- subtract *var2/reg2

var/xreg <- multiply var2/xreg2
var/xreg <- multiply var2
var/xreg <- multiply *var2/reg2

var/xreg <- divide var2/xreg2
var/xreg <- divide var2
var/xreg <- divide *var2/reg2

var/xreg <- reciprocal var2/xreg2
var/xreg <- reciprocal var2
var/xreg <- reciprocal *var2/reg2

var/xreg <- square-root var2/xreg2
var/xreg <- square-root var2
var/xreg <- square-root *var2/reg2

var/xreg <- inverse-square-root var2/xreg2
var/xreg <- inverse-square-root var2
var/xreg <- inverse-square-root *var2/reg2

var/xreg <- min var2/xreg2
var/xreg <- min var2
var/xreg <- min *var2/reg2

var/xreg <- max var2/xreg2
var/xreg <- max var2
var/xreg <- max *var2/reg2
```

Remember, when these instructions use indirect mode, they still use an integer
register. Floating-point registers can't hold addresses.

Two instructions in the above list are approximate. According to the Intel
manual, `reciprocal` and `inverse-square-root` [go off the rails around the
fourth decimal place](x86_approx.md). If you need more precision, use `divide`
separately.

Most instructions operate exclusively on integer or floating-point operands.
The only exceptions are the instructions for converting between integers and
floating-point numbers.

```
var/xreg <- convert var2/reg2
var/xreg <- convert var2
var/xreg <- convert *var2/reg2

var/reg <- convert var2/xreg2
var/reg <- convert var2
var/reg <- convert *var2/reg2

var/reg <- truncate var2/xreg2
var/reg <- truncate var2
var/reg <- truncate *var2/reg2
```

There are no instructions accepting floating-point literals. To obtain integer
literals in floating-point registers, copy them to general-purpose registers
and then convert them to floating-point.

The floating-point instructions above always write to registers. The only
instructions that can write floats to memory are `copy` instructions.

```
var/xreg <- copy var2/xreg2
copy-to var1, var2/xreg
var/xreg <- copy var2
var/xreg <- copy *var2/reg2
```

Finally, there are floating-point comparisons. They must always put a register
on the left-hand side:

```
compare var1/xreg1, var2/xreg2
compare var1/xreg1, var2
```

## Operating on individual bytes

A special case is variables of type `byte`. Mu is a 32-bit platform so for the
most part only supports types th

## Reference documentation on available primitives

### Data Structures

For memory safety, the following data structures are opaque and only modified
using functions described further down. I still find it useful to understand
how they work under the hood.

- Handles: addresses to objects allocated on the heap. They're augmented with
  book-keeping to guarantee memory-safety, and so cannot be stored in registers.
  See [mu.md](mu.md) for details, but in brief:
    - You need `addr` values to access data they point to.
    - You can't store `addr` values in other types. They're temporary.
    - You can store `handle` values in other types.
    - To convert `handle` to `addr`, use `lookup`.
    - Reclaiming memory (currently unimplemented) invalidates all `addr`
      values.

- Arrays: size-prefixed regions of memory containing multiple elements of a
  single type. Contents are preceded by 4 bytes (32 bits) containing the
  `size` of the array in bytes.

- Slices: a pair of 32-bit addresses denoting a [half-open](https://en.wikipedia.org/wiki/Interval_(mathematics))
  \[`start`, `end`) interval of live memory with a consistent lifetime.

  Invariant: `start` <= `end`

- Streams: strings prefixed by 32-bit `write` and `read` indexes that the next
  write or read goes to, respectively.

  - offset 0: write index
  - offset 4: read index
  - offset 8: size of array (in bytes)
  - offset 12: start of array data

  Invariant: 0 <= `read` <= `write` <= `size`

  Writes to a stream abort if it's full. Reads from a stream abort if it's
  empty.

- Graphemes: 32-bit fragments of UTF-8 that encode a single Unicode code-point.
- Code-points: 32-bit integers representing a Unicode character.

### Functions

These are the most useful functions from 400.mu and later .mu files. Look for
definitions (using `ctags`) to see type signatures.

- `abort`: prints a message in red on the bottom left of the screen and halts.

#### assertions for tests

- `check`: fails current test if given boolean is false (`= 0`).
- `check-not`: fails current test if given boolean isn't false (`!= 0`).
- `check-ints-equal`: fails current test if given ints aren't equal.
- `check-strings-equal`: fails current test if given strings have different bytes.
- `check-stream-equal`: fails current test if stream's data doesn't match
  string in its entirety. Ignores the stream's read index.
- `check-array-equal`: fails current test if an array's elements don't match
  what's written in a whitespace-separated string.
- `check-next-stream-line-equal`: fails current test if next line of stream
  until newline doesn't match string.
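
As a rough illustration of how these assertions are used, here is a sketch of
a test for the `add` primitive. Two things in it are assumptions rather than
facts taken from this page: that tests are ordinary functions whose names
start with `test-`, and that `check-ints-equal` takes the two ints followed by
a failure message. Check the actual signature with `ctags` before relying on
it.

```
fn test-add-literal {
  var x/eax: int <- copy 3
  x <- add 4
  # assumed argument order: actual, expected, message shown on failure
  check-ints-equal x, 7, "F - test-add-literal"
}
```

The message string is only for the person reading a failure, so naming the
test in it makes a failing assertion easy to locate.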

#### predicates

-