author: Kartik Agaram <vc@akkartik.com> 2020-07-05 15:28:37 -0700
committer: Kartik Agaram <vc@akkartik.com> 2020-07-06 01:05:10 -0700
commit: 9a524793ee01ce47f3963768559a0d6c348631c5
tree: aa1045424609f1a14a7a8e5bfaa1517101acd894 /vocabulary.md
parent: 3ccb2c83280e22ad5b7f7c47d7bd95748b969521
6618 - new docs
Diffstat (limited to 'vocabulary.md')
-rw-r--r-- vocabulary.md 208
1 files changed, 208 insertions, 0 deletions
diff --git a/vocabulary.md b/vocabulary.md
new file mode 100644
index 00000000..426ed590
--- /dev/null
+++ b/vocabulary.md
@@ -0,0 +1,208 @@
+## Reference documentation on available primitives
+
+### Data Structures
+
+- Kernel strings: null-terminated regions of memory. Unsafe and to be avoided,
+  but needed for interacting with the kernel.
+
+- Arrays: length-prefixed regions of memory containing multiple elements of a
+  single type. Contents are preceded by 4 bytes (32 bits) containing the
+  `length` of the array in bytes.
+
+- Slices: a pair of 32-bit addresses denoting a [half-open](https://en.wikipedia.org/wiki/Interval_(mathematics))
+  \[`start`, `end`) interval of live memory with a consistent lifetime.
+
+  Invariant: `start` <= `end`
+
+- Streams: strings prefixed by 32-bit `write` and `read` indexes that the next
+  write or read goes to, respectively.
+
+  - offset 0: write index
+  - offset 4: read index
+  - offset 8: length of array (in bytes)
+  - offset 12: start of array data
+
+  Invariant: 0 <= `read` <= `write` <= `length`
+
+- File descriptors (fd): Low-level 32-bit integers that the kernel uses to
+  track files opened by the program.
+
+- File: 32-bit value containing either an fd or an address to a stream (fake
+  file).
+
+- Buffered files (buffered-file): Contain a file descriptor and a stream for
+  buffering reads/writes. Each `buffered-file` must exclusively perform either
+  reads or writes.
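To make the array and stream layouts above concrete, here is a minimal Python sketch that packs them into raw bytes. It assumes little-endian byte order (as on x86), and the helper names `pack_array`, `pack_stream` and `unpack_stream` are invented for illustration; they are not part of Mu or SubX.

```python
import struct

def pack_array(data: bytes) -> bytes:
    """Length-prefixed array: a 4-byte length (in bytes), then the contents."""
    return struct.pack("<I", len(data)) + data

def pack_stream(write: int, read: int, data: bytes) -> bytes:
    """Stream: write index, read index, then a length-prefixed array."""
    if not 0 <= read <= write <= len(data):
        raise ValueError("invariant violated: 0 <= read <= write <= length")
    return struct.pack("<II", write, read) + pack_array(data)

def unpack_stream(raw: bytes):
    """Recover (write, read, length, data) using offsets 0, 4, 8 and 12."""
    write, read, length = struct.unpack_from("<III", raw, 0)
    data = raw[12:12 + length]
    return write, read, length, data

# A stream with room for 8 bytes, 3 of them written and 1 already read.
raw = pack_stream(write=3, read=1, data=b"abc" + bytes(5))
print(unpack_stream(raw))
```

The invariant check in `pack_stream` mirrors the one stated for streams; how the real primitives enforce it is not specified here.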
+
+### 'system calls'
+
+As I said at the top, a primary design goal of SubX (and Mu more broadly) is
+to explore ways to turn arbitrary manual tests into reproducible automated
+tests. SubX aims for this goal by baking testable interfaces deep into the
+stack, at the OS syscall level. The idea is that every syscall that interacts
+with hardware (and so the environment) should be *dependency injected* so that
+it's possible to insert fake hardware in tests.
+
+But those are big goals. Here are the syscalls I have so far:
+
+- `write`: takes two arguments, a file `f` and an address to array `s`.
+
+  Comparing this interface with the Unix `write()` syscall shows two benefits:
+
+  1. SubX can handle 'fake' file descriptors in tests.
+
+  1. `write()` accepts a buffer and its length as separate arguments, which
+     forces callers to manage the two separately and so can be error-prone.
+     SubX's wrapper keeps the two together, making it less likely that we
+     accidentally go out of array bounds.
+
+- `read`: takes two arguments, a file `f` and an address to stream `s`. Reads
+  as much data from `f` as can fit in (the free space of) `s`.
+
+  Like the `write` wrapper, this wrapper around the Unix `read()` syscall adds
+  the ability to handle 'fake' file descriptors in tests, and reduces the
+  chances of clobbering memory outside array bounds.
+
+  One bit of weirdness here: in tests we do a redundant copy from one stream
+  to another. See [the comments before the implementation](http://akkartik.github.io/mu/html/060read.subx.html)
+  for a discussion of alternative interfaces.
+
+- `stop`: takes two arguments:
+  - `ed` is an address to an _exit descriptor_. Exit descriptors allow us to
+    `exit()` the program in production, but return to the test harness within
+    tests. That allows tests to make assertions about when `exit()` is called.
+  - `value` is the status code to `exit()` with.
+
+  For more details on exit descriptors and how to create one, see [the
+  comments before the implementation](http://akkartik.github.io/mu/html/059stop.subx.html).
+
+- `new-segment`
+
+  Allocates a whole new segment of memory for the program, discontiguous with
+  both existing code and data (heap) segments. Just a more opinionated form of
+  [`mmap`](http://man7.org/linux/man-pages/man2/mmap.2.html).
+
+- `allocate`: takes two arguments, an address to an allocation descriptor `ad`
+  and an integer `n`.
+
+  Allocates a contiguous range of memory that is guaranteed to be exclusively
+  available to the caller. Returns the starting address to the range in `eax`.
+
+  An allocation descriptor tracks allocated vs available addresses in some
+  contiguous range of memory. The integer `n` specifies the number of bytes to
+  allocate.
+
+  Explicitly passing in an allocation descriptor allows for nested memory
+  management, where a sub-system gets a chunk of memory and further parcels it
+  out to individual allocations. Particularly helpful for (surprise) tests.
+
+- ... _(to be continued)_
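The nested memory management described for `allocate` can be pictured with a small Python model. This is a sketch under assumptions: it treats an allocation descriptor as a `curr`/`limit` pair and allocates by bumping `curr`, which the text above does not confirm. The helper `allocate_region` and the use of `MemoryError` on exhaustion are likewise hypothetical.

```python
class AllocationDescriptor:
    """Tracks allocated vs available addresses in one contiguous range."""
    def __init__(self, start: int, size: int):
        self.curr = start           # next free address (assumed representation)
        self.limit = start + size   # one past the end of the range

def allocate(ad: AllocationDescriptor, n: int) -> int:
    """Reserve n bytes from ad and return the starting address."""
    if ad.curr + n > ad.limit:
        raise MemoryError("allocation descriptor exhausted")
    addr = ad.curr
    ad.curr += n
    return addr

def allocate_region(ad: AllocationDescriptor, n: int) -> AllocationDescriptor:
    """Hypothetical helper: carve n bytes out of ad and manage them separately."""
    return AllocationDescriptor(allocate(ad, n), n)

heap = AllocationDescriptor(start=0x1000, size=0x100)
test_arena = allocate_region(heap, 0x40)   # a sub-system gets its own chunk
a = allocate(test_arena, 16)
b = allocate(test_arena, 16)
print(hex(a), hex(b), hex(heap.curr))
```

A test can hand a routine a small private arena like `test_arena` and check exactly how much of it was consumed, without touching the rest of the heap.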
+
+I will continue to import syscalls over time from [the old Mu VM in the parent
+directory](https://github.com/akkartik/mu), which has experimented with
+interfaces for the screen, keyboard, mouse, disk and network.
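Returning to `write` and `stop` above, the following Python sketch shows how dependency injection at this level lets a test observe both output and exit. It is only a model: the `Stream` and `ExitDescriptor` classes, the `ReturnToHarness` exception, and the `main` routine are hypothetical stand-ins, and telling a real fd from a fake file with `isinstance` is an assumption about how the distinction could be made, not how SubX does it.

```python
import os

class Stream:
    """Fake file: a fixed-capacity byte buffer with write and read indexes."""
    def __init__(self, capacity: int):
        self.data = bytearray(capacity)
        self.write = 0
        self.read = 0

def write(f, s: bytes) -> None:
    """Write s to f, which is either a real fd (int) or a Stream (fake file)."""
    if isinstance(f, Stream):
        if f.write + len(s) > len(f.data):
            raise SystemExit("write: stream out of room")  # models aborting
        f.data[f.write:f.write + len(s)] = s
        f.write += len(s)
    else:
        os.write(f, s)

class ReturnToHarness(Exception):
    """Raised instead of exiting when running under a test."""

class ExitDescriptor:
    def __init__(self, in_test: bool):
        self.in_test = in_test
        self.value = None   # status code recorded for test assertions

def stop(ed: ExitDescriptor, value: int) -> None:
    """Exit in production; return to the test harness within tests."""
    if ed.in_test:
        ed.value = value
        raise ReturnToHarness()
    raise SystemExit(value)

def main(ed: ExitDescriptor, out) -> None:
    """Hypothetical program under test: report a problem and exit."""
    write(out, b"fatal: bad input\n")
    stop(ed, 1)

screen = Stream(64)
ed = ExitDescriptor(in_test=True)
try:
    main(ed, screen)
except ReturnToHarness:
    pass
print(bytes(screen.data[:screen.write]), ed.value)
```

In production the same `main` would be called with a real file descriptor and an `ExitDescriptor(in_test=False)`, so the code under test does not change between the two settings.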
+
+### primitives built atop system calls
+
+_(Compound arguments are usually passed in by reference. Where the results are
+compound objects that don't fit in a register, the caller usually passes in
+allocated memory for it.)_
+
+#### assertions for tests
+- `check-ints-equal`: fails current test if given ints aren't equal
+- `check-stream-equal`: fails current test if stream doesn't match string
+- `check-next-stream-line-equal`: fails current test if the next line of the
+  stream (up to the newline) doesn't match string
+
+#### error handling
+- `error`: takes three arguments, an exit-descriptor, a file and a string (message)
+
+  Prints out the message to the file and then exits using the provided
+  exit-descriptor.
+
+- `error-byte`: like `error` but takes an extra byte value that it prints out
+  at the end of the message.
+
+#### predicates
+- `kernel-string-equal?`: compares a kernel string with a string
+- `string-equal?`: compares two strings
+- `stream-data-equal?`: compares a stream with a string
+- `next-stream-line-equal?`: compares the next line in a stream, from the
+  `read` index to the newline, with a string
+
+- `slice-empty?`: checks if the `start` and `end` of a slice are equal
+- `slice-equal?`: compares a slice with a string
+- `slice-starts-with?`: compares the start of a slice with a string
+- `slice-ends-with?`: compares the end of a slice with a string
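As a rough illustration of the slice predicates above, here is a Python model in which memory is a `bytes` object and a slice is a pair of indexes into it. The extra `mem` parameter and the Python function names are assumptions needed because Python has no raw addresses; the real primitives take a slice and a string.

```python
def slice_empty(start: int, end: int) -> bool:
    """A slice is empty when its start and end coincide."""
    return start == end

def slice_equal(mem: bytes, start: int, end: int, s: bytes) -> bool:
    """Compare the whole slice [start, end) with s."""
    return mem[start:end] == s

def slice_starts_with(mem: bytes, start: int, end: int, s: bytes) -> bool:
    """Compare the first len(s) bytes of the slice with s."""
    return end - start >= len(s) and mem[start:start + len(s)] == s

def slice_ends_with(mem: bytes, start: int, end: int, s: bytes) -> bool:
    """Compare the last len(s) bytes of the slice with s."""
    return end - start >= len(s) and mem[end - len(s):end] == s

mem = b"xx main.subx yy"
start, end = 3, 12            # the half-open interval covering b"main.subx"
print(slice_equal(mem, start, end, b"main.subx"),
      slice_starts_with(mem, start, end, b"main"),
      slice_ends_with(mem, start, end, b".subx"))
```

The length checks keep the comparisons inside the half-open interval, so a string longer than the slice never matches.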
+
+#### writing to disk
+- `write`: string -> file
+  - Can also be used to cat a string into a stream.
+  - Will abort the entire program if destination is a stream and doesn't have
+    enough room.
+- `write-stream`: stream -> file
+  - Can also be used to cat one stream into another.
+  - Will abort the entire program if destination is a stream and doesn't have
+    enough room.
+- `write-slice`: slice -> stream
+  - Will abort the entire program if there isn't enough room in the
+    destination stream.
+- `append-byte`: int -> stream
+  - Will abort the entire program if there isn't enough room in the
+    destination stream.
+- `append-byte-hex`: int -> stream
+  - textual representation in hex, no '0x' prefix
+  - Will abort the entire program if there isn't enough room in the
+    destination stream.
+- `print-int32`: int -> stream
+  - textual representation in hex, including '0x' prefix
+  - Will abort the entire program if there isn't enough room in the
+    destination stream.
+- `write-buffered`: string -> buffered-file
+- `write-slice-buffered`: slice -> buffered-file
+- `flush`: buffered-file
+- `write-byte-buffered`: int -> buffered-file
+- `print-byte-buffered`: int -> buffered-file
+  - textual representation in hex, no '0x' prefix
+- `print-int32-buffered`: int -> buffered-file
+  - textual representation in hex, including '0x' prefix
+
+#### reading from disk
+- `read`: file -> stream
+  - Can also be used to cat one stream into another.
+  - Will silently stop reading when destination runs out of space.
+- `read-byte-buffered`: buffered-file -> byte
+- `read-line-buffered`: buffered-file -> stream
+  - Will abort the entire program if there isn't enough room.
+
+#### non-IO operations on streams
+- `new-stream`: allocates space for a stream of `n` elements, each occupying
+  `b` bytes.
+  - Will abort the entire program if `n*b` requires more than 32 bits.
+- `clear-stream`: resets everything in the stream to `0` (except its `length`).
+- `rewind-stream`: resets the read index of the stream to `0` without modifying
+  its contents.
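The difference between `write` (aborts when the destination stream is full) and `read` (silently stops), together with `clear-stream` and `rewind-stream`, can be modeled as follows. This is a hedged sketch: the `Stream` class is a stand-in for the real layout, `SystemExit` stands in for aborting the program, and returning the number of bytes copied from `read` is an assumption borrowed from the Unix `read()` convention.

```python
class Stream:
    """Fixed-capacity byte buffer with write and read indexes."""
    def __init__(self, capacity: int):
        self.data = bytearray(capacity)
        self.write = 0
        self.read = 0

def write(dest: Stream, s: bytes) -> None:
    """Append s to dest; abort the program if it does not fit."""
    if dest.write + len(s) > len(dest.data):
        raise SystemExit("write: destination stream out of room")
    dest.data[dest.write:dest.write + len(s)] = s
    dest.write += len(s)

def read(f: Stream, s: Stream) -> int:
    """Copy as much unread data from f as fits in the free space of s."""
    n = min(f.write - f.read, len(s.data) - s.write)
    s.data[s.write:s.write + n] = f.data[f.read:f.read + n]
    f.read += n
    s.write += n
    return n  # assumption: report the byte count, as Unix read() does

def clear_stream(s: Stream) -> None:
    """Reset indexes and contents to 0, leaving the length unchanged."""
    s.write = 0
    s.read = 0
    s.data[:] = bytes(len(s.data))

def rewind_stream(s: Stream) -> None:
    """Reset only the read index; contents stay as they are."""
    s.read = 0

src = Stream(8)
write(src, b"abcdef")
dst = Stream(4)
copied = read(src, dst)       # dst only has room for 4 of the 6 bytes
print(copied, bytes(dst.data), src.read)
```

After `rewind_stream(src)` the same six bytes can be read again, whereas `clear_stream(dst)` discards them while keeping the 4-byte capacity.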
+
+#### reading/writing hex representations of integers
+- `is-hex-int?`: takes a slice argument, returns boolean result in `eax`
+- `parse-hex-int`: takes a slice argument, returns int result in `eax`
+- `is-hex-digit?`: takes a 32-bit word containing a single byte, returns
+  boolean result in `eax`.
+- `from-hex-char`: takes a hexadecimal digit character in `eax`, returns its
+  numeric value in `eax`
+- `to-hex-char`: takes a single-digit numeric value in `eax`, returns its
+  corresponding hexadecimal character in `eax`
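The hex helpers above compose naturally, as this Python sketch suggests. It makes several assumptions the list does not state: only lowercase digits are accepted, there is no `0x` prefix or sign handling, slices are modeled as Python strings, and the result is masked to 32 bits to imitate a value returned in `eax`. The real routines may behave differently on these points.

```python
HEX_DIGITS = "0123456789abcdef"

def is_hex_digit(c: str) -> bool:
    """True for a single lowercase hexadecimal digit character."""
    return len(c) == 1 and c in HEX_DIGITS

def from_hex_char(c: str) -> int:
    """Numeric value of one hexadecimal digit character."""
    if not is_hex_digit(c):
        raise ValueError("not a hex digit: %r" % c)
    return HEX_DIGITS.index(c)

def to_hex_char(n: int) -> str:
    """Hexadecimal character for a single-digit numeric value."""
    if not 0 <= n <= 15:
        raise ValueError("not a single hex digit value: %r" % n)
    return HEX_DIGITS[n]

def is_hex_int(s: str) -> bool:
    """True if s is non-empty and consists only of hex digits."""
    return len(s) > 0 and all(is_hex_digit(c) for c in s)

def parse_hex_int(s: str) -> int:
    """Accumulate digits left to right, 4 bits at a time."""
    if not is_hex_int(s):
        raise ValueError("not a hex int: %r" % s)
    result = 0
    for c in s:
        result = result * 16 + from_hex_char(c)
    return result & 0xFFFFFFFF  # assumption: keep only what fits in 32 bits

print(parse_hex_int("ff"), to_hex_char(10), from_hex_char("f"))
```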
+
+#### tokenization
+
+from a stream:
+- `next-token`: stream, delimiter byte -> slice
+- `skip-chars-matching`: stream, delimiter byte
+- `skip-chars-not-matching`: stream, delimiter byte
+
+from a slice:
+- `next-token-from-slice`: start, end, delimiter byte -> slice
+  - Given a slice and a delimiter byte, returns a new slice inside the input
+    that ends at the delimiter byte.
+
+- `skip-chars-matching-in-slice`: curr, end, delimiter byte -> new-curr (in `eax`)
+- `skip-chars-not-matching-in-slice`: curr, end, delimiter byte -> new-curr (in `eax`)
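The slice-based tokenization primitives can be sketched in Python as below. Memory is modeled as a `bytes` object and addresses as indexes into it, so each function takes an extra `mem` parameter that the real primitives do not have. Building `next_token_from_slice` out of the two skip functions, so that leading delimiters are skipped first, is an assumption about how the pieces fit together.

```python
def skip_chars_matching_in_slice(mem: bytes, curr: int, end: int,
                                 delimiter: int) -> int:
    """Advance curr past bytes equal to delimiter, never beyond end."""
    while curr < end and mem[curr] == delimiter:
        curr += 1
    return curr

def skip_chars_not_matching_in_slice(mem: bytes, curr: int, end: int,
                                     delimiter: int) -> int:
    """Advance curr up to the next delimiter byte, never beyond end."""
    while curr < end and mem[curr] != delimiter:
        curr += 1
    return curr

def next_token_from_slice(mem: bytes, start: int, end: int, delimiter: int):
    """Return (tok_start, tok_end), a slice inside [start, end)."""
    tok_start = skip_chars_matching_in_slice(mem, start, end, delimiter)
    tok_end = skip_chars_not_matching_in_slice(mem, tok_start, end, delimiter)
    return tok_start, tok_end

mem = b"  ab cd"
space = ord(" ")
pos = 0
tokens = []
while True:
    tok_start, tok_end = next_token_from_slice(mem, pos, len(mem), space)
    if tok_start == tok_end:      # empty slice: no more tokens
        break
    tokens.append(mem[tok_start:tok_end])
    pos = tok_end
print(tokens)
```

Because the result is a slice inside the input, no bytes are copied; the loop only moves a pair of indexes along the original buffer.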