## Reference documentation on available primitives

### Data Structures

For memory safety, the following data structures are opaque and can only be
modified using the functions described further down. I still find it useful to
understand how they work under the hood.

- Handles: addresses to objects allocated on the heap. They're augmented with
  book-keeping to guarantee memory-safety, and so cannot be stored in registers.
  See [mu.md](mu.md) for details, but in brief:
    - You need `addr` values to access data they point to.
    - You can't store `addr` values in other types. They're temporary.
    - You can store `handle` values in other types.
    - To convert `handle` to `addr`, use `lookup`.
    - Reclaiming memory (currently unimplemented) invalidates all `addr`
      values.

- Kernel strings: null-terminated regions of memory. Unsafe and to be avoided,
  but needed for interacting with the kernel.

- Arrays: size-prefixed regions of memory containing multiple elements of a
  single type. Contents are preceded by 4 bytes (32 bits) containing the
  `size` of the array in bytes.

- Slices: a pair of 32-bit addresses denoting a [half-open](https://en.wikipedia.org/wiki/Interval_(mathematics))
  \[`start`, `end`) interval to live memory with a consistent lifetime.

  Invariant: `start` <= `end`

- Streams: arrays prefixed by 32-bit `write` and `read` indexes, which point at
  where the next write or read goes, respectively.

  - offset 0: write index
  - offset 4: read index
  - offset 8: size of array (in bytes)
  - offset 12: start of array data

  Invariant: 0 <= `read` <= `write` <= `size`

- File descriptors (fd): Low-level 32-bit integers that the kernel uses to
  track files opened by the program.

- File: 32-bit value containing either a fd or an address to a stream (fake
  file).

- Buffered files (buffered-file): Contain a file descriptor and a stream for
  buffering reads/writes. Each `buffered-file` must exclusively perform either
  reads or writes.

- UTF-8 code-points (`code-point-utf8`): 32-bit values containing the utf-8
  encoding of a single Unicode code-point.

- Code-points: 32-bit integers representing a Unicode character.
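
To make the stream layout concrete, here is a minimal Python model of a stream's raw bytes, assuming little-endian 32-bit fields as on x86. Everything from offset 8 onward is an ordinary array: a size followed by the data. The helper names (`make_stream`, `header`, `stream_write`, `stream_read_byte`) are hypothetical; real programs should only touch streams through the functions described further down.

```python
import struct

HEADER_SIZE = 12  # write index, read index, array size: three 32-bit ints

def make_stream(size):
    """Return the raw bytes of an empty stream with room for `size` data bytes."""
    s = bytearray(HEADER_SIZE + size)
    struct.pack_into("<III", s, 0, 0, 0, size)  # offsets 0, 4, 8
    return s

def header(s):
    """Return (write, read, size) from offsets 0, 4 and 8."""
    return struct.unpack_from("<III", s, 0)

def stream_write(s, data):
    """Append data at the write index, failing if it doesn't fit."""
    write, read, size = header(s)
    if write + len(data) > size:
        raise OverflowError("not enough room in stream")
    start = HEADER_SIZE + write  # array data begins at offset 12
    s[start : start + len(data)] = data
    struct.pack_into("<I", s, 0, write + len(data))

def stream_read_byte(s):
    """Return the byte at the read index and advance it; -1 once read == write."""
    write, read, size = header(s)
    if read >= write:
        return -1
    struct.pack_into("<I", s, 4, read + 1)
    return s[HEADER_SIZE + read]
```

Both operations preserve the invariant 0 <= `read` <= `write` <= `size`: writes refuse to run past `size`, and reads stop at `write`.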

### 'system calls'

Low-level testable primitives for unsafe SubX code.

- `write`: takes two arguments, a file `f` and an address to array `s`.

  Comparing this interface with the Unix `write()` syscall shows two benefits:

  1. SubX can handle 'fake' file descriptors in tests.

  1. `write()` accepts a buffer and its size in separate arguments, which
     requires callers to manage the two separately and so can be error-prone.
     SubX's wrapper keeps the two together to reduce the chance of ever
     accidentally going out of array bounds.

- `read`: takes two arguments, a file `f` and an address to stream `s`. Reads
  as much data from `f` as can fit in (the free space of) `s`.

  As with `write()`, this wrapper around the Unix `read()` syscall adds the
  ability to handle 'fake' file descriptors in tests, and reduces the chances
  of clobbering outside array bounds.

  One bit of weirdness here: in tests we do a redundant copy from one stream
  to another. See [the comments before the implementation](http://akkartik.github.io/mu/html/linux/111read.subx.html)
  for a discussion of alternative interfaces.

- `stop`: takes two arguments:
  - `ed` is an address to an _exit descriptor_. Exit descriptors allow us to
    `exit()` the program in production, but return to the test harness within
    tests. That allows tests to make assertions about when `exit()` is called.
  - `value` is the status code to `exit()` with.

  For more details on exit descriptors and how to create one, see [the
  comments before the implementation](http://akkartik.github.io/mu/html/linux/110stop.subx.html).

- `allocate`: takes two arguments, an address to allocation-descriptor `ad`
  and an integer `n`.

  Allocates a contiguous range of `n` bytes that is guaranteed to be
  exclusively available to the caller. Returns the starting address of the
  range in `eax`.

  An allocation descriptor tracks allocated vs available addresses in some
  contiguous range of memory.

  Explicitly passing in an allocation descriptor allows for nested memory
  management, where a sub-system gets a chunk of memory and further parcels it
  out to individual allocations. Particularly helpful for (surprise) tests.
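
The nesting idea can be sketched in Python, assuming an allocation descriptor behaves like a simple bump pointer over a range of addresses. The `AllocationDescriptor` class and its `curr`/`end` fields are hypothetical, not Mu's actual layout, and raising `MemoryError` on exhaustion is this sketch's choice rather than documented behavior.

```python
class AllocationDescriptor:
    """Hypothetical allocation descriptor: a bump pointer over [curr, end)."""

    def __init__(self, start, end):
        assert start <= end
        self.curr = start  # next address to hand out
        self.end = end     # one past the last usable address

def allocate(ad, n):
    """Reserve n bytes from ad and return their starting address.

    Mu returns this address in eax; raising MemoryError when the range is
    exhausted is an assumption of this sketch.
    """
    if n < 0 or ad.curr + n > ad.end:
        raise MemoryError("allocation descriptor exhausted")
    result = ad.curr
    ad.curr += n
    return result

# Nested management: a sub-system (here, a test) gets its own chunk of the heap.
heap = AllocationDescriptor(0x1000, 0x2000)
a = allocate(heap, 16)                   # 0x1000
chunk = allocate(heap, 256)              # 0x1010
test_heap = AllocationDescriptor(chunk, chunk + 256)
b = allocate(test_heap, 8)               # 0x1010, carved out of chunk
c = allocate(heap, 4)                    # 0x1110, never overlaps test_heap
```

Because the test only ever allocates from `test_heap`, it can't disturb the rest of the heap, and the parent keeps handing out addresses past the chunk it gave away.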

### Functions

The most useful functions from 400.mu and later .mu files. Look for definitions
(using `ctags`) to see type signatures.

_(Compound arguments are usually passed in by reference. Where the results are
compound objects that don't fit in a register, the caller usually passes in
allocated memory for them.)_

#### assertions for tests

- `check`: fails current test if given boolean is false (`= 0`).
- `check-not`: fails current test if given boolean isn't false (`!= 0`).
- `check-ints-equal`: fails current test if given ints aren't equal
- `check-array-equal`: fails current test if an array of ints doesn't match the
  expected values, which are passed in as a whitespace-separated string.
- `check-stream-equal`: fails current test if stream doesn't match string
- `check-next-stream-line-equal`: fails current test if next line of stream
  until newline doesn't match string

Every Mu computer has a global trace that programs can write to, and that
tests can make assertions on.

- `clear-trace-stream`
- `check-trace-contains`
- `check-trace-scans-to`: like `check-trace-contains` but with an implicit,
  stateful start index
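
The difference between the two trace checks comes down to a cursor. Here is a hypothetical Python model; it assumes trace lines are compared for exact equality and that clearing the trace also resets the cursor, and neither detail is taken from Mu's implementation.

```python
class Trace:
    """Hypothetical in-memory trace: a list of lines plus a scan cursor."""

    def __init__(self):
        self.lines = []
        self.cursor = 0  # where the next scans_to call starts looking

    def write(self, line):
        self.lines.append(line)

    def clear(self):
        """Like clear-trace-stream (assumed to reset the cursor too)."""
        self.lines.clear()
        self.cursor = 0

    def contains(self, line):
        """Like check-trace-contains: search the whole trace."""
        return line in self.lines

    def scans_to(self, line):
        """Like check-trace-scans-to: search from the cursor, then move past the match."""
        for i in range(self.cursor, len(self.lines)):
            if self.lines[i] == line:
                self.cursor = i + 1
                return True
        return False
```

A sequence of `scans_to` calls therefore checks that lines appear in a particular order, which `contains` alone can't express.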

#### error handling

- `error`: takes three arguments, an exit-descriptor, a file and a string (message)

  Prints out the message to the file and then exits using the provided
  exit-descriptor.

- `error-byte`: like `error` but takes an extra byte value that it prints out
  at the end of the message.

#### numbers

- `abs`
- `repeated-shift-left`, since Mu only supports bit-shifts by constant values
- `repeated-shift-right`
- `shift-left-bytes`: shift left by `n*8` bits
- `integer-divide`
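
A shift by a variable amount can be assembled from constant shifts. The Python sketch below illustrates the idea behind `repeated-shift-left` and `shift-left-bytes`, assuming 32-bit wraparound; the one-bit-per-iteration loop is illustrative and not necessarily how Mu implements it.

```python
MASK32 = 0xFFFFFFFF  # Mu ints are 32 bits wide

def repeated_shift_left(x, n):
    """Shift x left by n bits, using one constant 1-bit shift per iteration."""
    for _ in range(n):
        x = (x << 1) & MASK32  # bits shifted past bit 31 are dropped
    return x

def shift_left_bytes(x, n):
    """shift-left-bytes: shift left by n*8 bits."""
    return repeated_shift_left(x, n * 8)
```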

Floating point constructors, since x86 doesn't support immediate floats and Mu
doesn't yet parse floating-point literals:

- `rational`: int, int -> float
- `fill-in-rational`: int, int, (addr float)
- `fill-in-sqrt`: int, (addr float)

#### arrays and strings

- `populate`: allocates space for `n` objects of the appropriate type.
- `copy-array-object`: allocates enough space and writes out a copy of an
  array of some type.
- `slice-to-string`: allocates space for an array of bytes and copies the
  slice into it.

- `array-equal?`
- `substring`: string, start, length -> string
- `split-string`: string, delimiter -> array of strings

#### predicates

- `kernel-string-equal?`: compares a kernel string with a string
- `string-equal?`: compares two strings
- `stream-data-equal?`: compares a stream with a string
- `next-stream-line-equal?`: compares the next line in a stream (from the
  `read` index up to a newline) with a string

- `slice-empty?`: checks if the `start` and `end` of a slice are equal
- `slice-equal?`: compares a slice with a string
- `slice-starts-with?`: compares the start of a slice with a string
- `slice-ends-with?`: compares the end of a slice with a string
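
All the slice predicates operate on the half-open interval [`start`, `end`). A rough Python model follows; it uses indices into a `bytes` object where Mu uses raw addresses, and while the function names mirror the Mu ones, the signatures are hypothetical.

```python
def slice_empty(start, end):
    """slice-empty?: [start, end) contains no bytes exactly when start == end."""
    return start == end

def slice_equal(mem, start, end, s):
    """slice-equal?: do the bytes in [start, end) match s exactly?"""
    return mem[start:end] == s

def slice_starts_with(mem, start, end, prefix):
    """slice-starts-with?: does the slice begin with prefix?"""
    return mem[start:end].startswith(prefix)

def slice_ends_with(mem, start, end, suffix):
    """slice-ends-with?: does the slice finish with suffix?"""
    return mem[start:end].endswith(suffix)
```

For example, in `b"foo bar"` the slice [4, 7) is `bar`: `end` points one past the last byte, so no separate length is needed.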

#### writing to disk

- `write`: string -> file
  - Can also be used to cat a string into a stream.
- `write-stream`: stream -> file
  - Can also be used to cat one stream into another.
- `write-stream-data`: stream -> file
  - Like `write-stream` but ignores read index.
- `write-slice`: slice -> stream
- `append-byte`: int -> stream
- `append-byte-hex`: int -> stream
  - textual representation in hex, no '0x' prefix

- `write-int`: int -> stream
  - write number to stream
- `write-int32-hex`: int -> stream
  - textual representation in hex, including '0x' prefix
- `write-int32-hex-buffered`: int -> buffered-file
- `write-int32-decimal`
- `write-int32-decimal-buffered`
- `write-buffered`: string -> buffered-file
- `write-slice-buffered`: slice -> buffered-file
- `flush`: buffered-file
- `write-byte-buffered`: int -> buffered-file
- `write-byte-hex-buffered`: int -> buffered-file
  - textual representation in hex, no '0x' prefix
- `print-int32-buffered`: int -> buffered-file
  - textual representation in hex, including '0x' prefix

- `write-code-point-utf8`: code-point-utf8 -> stream
- `to-utf8`: code-point -> code-point-utf8

- `write-float-decimal-approximate`: float, precision: int -> stream

- `new-buffered-file`
- `populate-buffered-file-containing`: string -> buffered-file

Unless otherwise stated, writes to a stream will abort the entire program if
there isn't enough room in the destination stream.

#### reading from disk

- `read`: file -> stream
  - Can also be used to cat one stream into another.
  - Will silently stop reading when destination runs out of space.
- `read-byte-buffered`: buffered-file -> byte
- `read-line-buffered`: buffered-file -> stream
  - Will abort the entire program if there isn't enough room.

- `read-code-point-utf8`: stream -> code-point-utf8
- `read-code-point-utf8-buffered`: buffered-file -> code-point-utf8

- `read-lines`: buffered-file -> array of strings
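
The 'silently stop' behavior of `read` is easiest to see with two fake files. The following Python sketch models reading from one stream into another. The `Stream` class, the `read_into` name and its byte-count return value are assumptions made for illustration, as is the detail that the source's read index advances.

```python
class Stream:
    """Hypothetical model of a Mu stream: a fixed-size buffer plus write/read indexes."""

    def __init__(self, size, contents=b""):
        assert len(contents) <= size
        self.data = bytearray(size)
        self.data[: len(contents)] = contents
        self.write = len(contents)
        self.read = 0

def read_into(src, dest):
    """Copy unread bytes of src into the free space of dest.

    Copies only as much as fits and returns the byte count; anything left
    over stays unread in src instead of causing an error.
    """
    n = min(src.write - src.read, len(dest.data) - dest.write)
    dest.data[dest.write : dest.write + n] = src.data[src.read : src.read + n]
    src.read += n
    dest.write += n
    return n
```

Contrast this with the writing functions above, which abort the program when the destination is full.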

#### non-IO operations on streams

- `populate-stream`: allocates space in a stream for `n` objects of the
  appropriate type.
  - Will abort the entire program if `n` times the size of each object
    requires more than 32 bits.
- `clear-stream`: resets everything in the stream to `0` (except its `size`).
- `rewind-stream`: resets the read index of the stream to `0` without modifying
  its contents.
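
The difference between `clear-stream` and `rewind-stream` can be sketched in Python with a hypothetical `Stream` model, in which the length of the buffer stands in for the stream's `size`.

```python
class Stream:
    """Hypothetical model of a Mu stream: a fixed-size buffer plus write/read indexes."""

    def __init__(self, size):
        self.data = bytearray(size)  # len(self.data) plays the role of `size`
        self.write = 0
        self.read = 0

def clear_stream(s):
    """Zero both indexes and all contents; the size is untouched."""
    s.write = 0
    s.read = 0
    s.data[:] = bytes(len(s.data))

def rewind_stream(s):
    """Reset only the read index, so existing contents can be read again."""
    s.read = 0
```

Rewinding is useful when the same data must be consumed twice; clearing is for reusing the stream's storage for new data.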

#### reading/writing hex representations of integers

- `is-hex-int?`: slice -> boolean
- `parse-hex-int`: string -> int
- `parse-hex-int-from-slice`: slice -> int
- `is-hex-digit?`: byte -> boolean

- `parse-array-of-ints`
- `parse-array-of-decimal-ints`
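
A minimal Python sketch of the hex predicates and parser follows. The accepted syntax (an optional `-` sign, an optional `0x` prefix, and hex digits in either case) is an assumption for illustration; check the implementations for the exact rules Mu enforces. The sketch also ignores 32-bit overflow.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def is_hex_digit(c):
    """Model of is-hex-digit?: is c a single hex digit?"""
    return len(c) == 1 and c in HEX_DIGITS

def _split_prefix(s):
    """Peel off an optional '-' and an optional '0x' (assumed syntax)."""
    negative = s.startswith("-")
    if negative:
        s = s[1:]
    if s.startswith("0x"):
        s = s[2:]
    return negative, s

def is_hex_int(s):
    """Model of is-hex-int?: at least one digit, and only hex digits."""
    _, digits = _split_prefix(s)
    return digits != "" and all(is_hex_digit(c) for c in digits)

def parse_hex_int(s):
    """Model of parse-hex-int; assumes is_hex_int(s) already holds."""
    negative, digits = _split_prefix(s)
    result = 0
    for c in digits:
        result = result * 16 + int(c, 16)
    return -result if negative else result
```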

#### printing to screen

All screen primitives require a screen object, which can be either the real
screen on the computer or a fake screen for tests. Mu supports a subset of
Unix terminal properties supported by almost all modern terminal emulators.

- `enable-screen-type-mode` (default)
- `enable-screen-grid-mode`

- `clear-screen`
- `screen-size`

- `move-cursor`
- `hide-cursor`
- `show-cursor`

- `print-string`: string -> screen
- `print-stream`
- `print-code-point-utf8`
- `print-code-point`
- `print-int32-hex`
- `print-int32-decimal`
- `print-int32-decimal-right-justified`
- `print-array-of-ints-in-decimal`

- `print-float-hex`
- `print-float-decimal-approximate`: up to some precision

Printing to screen is stateful: formatting persists across prints until it is
explicitly changed.

- `reset-formatting`
- `start-color`: adjusts foreground and background
- `start-bold`
- `start-underline`
- `start-reverse-video`
- `start-blinking`

Assertions for tests:

- `screen-code-point-utf8-at`
- `screen-color-at`
- `screen-background-color-at`
- `screen-bold-at?`
- `screen-underline-at?`
- `screen-reverse-at?`
- `screen-blink-at?`

- `check-screen-row`
- `check-screen-row-from`
- `check-screen-row-in-color`
- `check-screen-row-in-color-from`
- `check-screen-row-in-background-color`
- `check-screen-row-in-background-color-from`
- `check-screen-row-in-bold`
- `check-screen-row-in-bold-from`
- `check-screen-row-in-underline`
- `check-screen-row-in-underline-from`
- `check-screen-row-in-reverse`
- `check-screen-row-in-reverse-from`
- `check-screen-row-in-blinking`
- `check-screen-row-in-blinking-from`

#### keyboard

- `enable-keyboard-type-mode`: process keystrokes on `enter` (default mode)
- `read-line-from-real-keyboard`

- `enable-keyboard-immediate-mode`: process keystrokes as they're typed
- `read-key-from-real-keyboard`

#### tokenization

from a stream:
- `next-token`: stream, delimiter byte -> slice
- `skip-chars-matching`: stream, delimiter byte
- `skip-chars-not-matching`: stream, delimiter byte

from a slice:
- `next-token-from-slice`: start, end, delimiter byte -> slice
  - Given a slice and a delimiter byte, returns a new slice inside the input
    that ends at the delimiter byte.

- `skip-chars-matching-in-slice`: curr, end, delimiter byte -> new-curr (in `eax`)
- `skip-chars-not-matching-in-slice`: curr, end, delimiter byte -> new-curr (in `eax`)
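
Tokenizing a stream can be sketched by composing the two skip functions, assuming `next-token` first skips any leading delimiters and then reads up to the next one. In this hypothetical Python model the stream is just its written bytes plus a read index, and the returned slice is a half-open (start, end) pair of indices.

```python
class Stream:
    """Hypothetical model: the bytes written so far plus a read index."""

    def __init__(self, contents):
        self.data = contents
        self.read = 0

def skip_chars_matching(s, delim):
    """Advance the read index past any run of `delim` bytes."""
    while s.read < len(s.data) and s.data[s.read] == delim:
        s.read += 1

def skip_chars_not_matching(s, delim):
    """Advance the read index up to (not past) the next `delim` byte."""
    while s.read < len(s.data) and s.data[s.read] != delim:
        s.read += 1

def next_token(s, delim):
    """Return (start, end) of the next token as a half-open slice into s.data."""
    skip_chars_matching(s, delim)
    start = s.read
    skip_chars_not_matching(s, delim)
    return start, s.read
```

Once the stream is exhausted, the returned slice is empty (`start == end`), which callers can detect with `slice-empty?`.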

#### miscellaneous sensors and actuators

- `open`: filename, write? -> buffered-file

- `time`: returns the time in seconds since the epoch.

- `ntime`: returns the number of nanoseconds since some arbitrary point.
  Saturates at 32 bits. Useful for fine-grained measurements over relatively
  short durations.

- `sleep`: sleep for some number of whole seconds and some fraction of a
  second expressed in nanoseconds. Not having decimal literals can be awkward
  here.
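
Without decimal literals, a duration like 1.5 seconds has to be passed to `sleep` as 1 second plus 500000000 nanoseconds. The hypothetical Python helper below shows the split, assuming the duration starts out as a whole number of milliseconds.

```python
def sleep_args_from_millis(ms):
    """Hypothetical helper: turn milliseconds into sleep's (seconds, nanoseconds) pair."""
    seconds, leftover_ms = divmod(ms, 1000)
    nanoseconds = leftover_ms * 1_000_000  # always below one billion
    return seconds, nanoseconds
```

Keeping the nanoseconds part below one billion means the whole-second part carries everything that fits in it.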