mu - Soul of a tiny new machine. More thorough tests → More comprehensible and rewrite-friendly software → More resilient society.

	Commit message	Author	Age	Files	Lines
*	7526	Kartik Agaram	2021-01-16	1	-0/+0
*	7329 - snapshot: advent day 4 part 2	Kartik Agaram	2020-12-04	1	-0/+0
	I've found two bugs in SubX libraries:
	  1. next-word had an out-of-bounds read.
	  2. next-word was skipping comments, because that's what I need during bootstrapping.
	I've created a new variant called next-raw-word that doesn't skip comments. These really need better names.
	We're now at the point where 4b.mu has the right structure and returns results identical to 4a.mu.
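A minimal Python sketch of the difference between the two tokenizers. This is not the SubX library code: `next_word`, `next_raw_word`, and the convention that a comment runs from `#` to the end of the line are assumptions for illustration. Both functions return the next whitespace-delimited word as a `(start, end)` slice and check bounds before every read, which is the guard an out-of-bounds read is missing.

```python
from __future__ import annotations


def skip_spaces(line: str, pos: int) -> int:
    # Advance past spaces and tabs without ever reading past the end of the line.
    while pos < len(line) and line[pos] in " \t":
        pos += 1
    return pos


def next_raw_word(line: str, pos: int) -> tuple[int, int]:
    """Return (start, end) of the next word; a comment comes back as a word."""
    start = skip_spaces(line, pos)
    end = start
    while end < len(line) and line[end] not in " \t":
        end += 1
    return start, end


def next_word(line: str, pos: int) -> tuple[int, int]:
    """Like next_raw_word, but a word starting with '#' consumes the rest of the line."""
    start, end = next_raw_word(line, pos)
    if start < len(line) and line[start] == "#":
        # Skip the comment: report an empty slice at the end of the line.
        return len(line), len(line)
    return start, end


if __name__ == "__main__":
    line = "89/copy # comment"
    print(next_word(line, 7), next_raw_word(line, 7))  # (17, 17) (8, 9)
```

Keeping the raw variant as the primitive and layering comment-skipping on top means callers that need to see comments never pay for logic they don't want.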
*	7238 - mu.subx: final restrictions on 'addr'	Kartik Agaram	2020-11-15	1	-0/+0
	I had to tweak one app that wasn't following the rules.
*	7225	Kartik Agaram	2020-11-11	1	-0/+0
	Both manual tests described in commit 7222 now work.
	To make them work I had to figure out how to copy a file. It requires a dependency on a new syscall: lseek.
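The commit doesn't say how lseek fits into copying a file. One common pattern, sketched here in Python with the standard `os` wrappers around the same syscalls (an assumption, not the Mu implementation), is to seek to the end to learn the file's size, then seek back to the start before reading. `copy_file` is a hypothetical name.

```python
import os
import tempfile


def copy_file(src: str, dst: str) -> int:
    """Copy src to dst through raw file descriptors; return the bytes copied."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        # Seeking to the end reports the file's size; seek back before reading.
        size = os.lseek(in_fd, 0, os.SEEK_END)
        os.lseek(in_fd, 0, os.SEEK_SET)
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            copied = 0
            while copied < size:
                chunk = os.read(in_fd, min(4096, size - copied))
                if not chunk:  # the file shrank while we were copying
                    break
                view = memoryview(chunk)
                while view:  # os.write may write fewer bytes than requested
                    view = view[os.write(out_fd, view):]
                copied += len(chunk)
            return copied
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "a"), os.path.join(d, "b")
        with open(src, "wb") as f:
            f.write(b"hello\n")
        print(copy_file(src, dst))  # 6
```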
*	7173	Kartik Agaram	2020-11-03	1	-0/+0
	All tests passing again.
*	7138 - type-check array 'length' instruction	Kartik Agaram	2020-10-29	1	-0/+0
*	7101 - tile: remove quotes when evaluating strings	Kartik Agaram	2020-10-25	1	-0/+0
	This found several bugs due to me not checking for null strings.
*	6946 - print floats somewhat intuitively in hex	Kartik Agaram	2020-10-04	1	-0/+0
*	6908 - compiling all floating-point operations	Kartik Agaram	2020-09-30	1	-0/+0
	We don't yet support emulating these instructions in `bootstrap`. But generated binaries containing them run natively just fine.
*	6783	Kartik Agaram	2020-09-16	1	-0/+0
	An extra test that should have been in commit 6781.
*	6781 - new app: RPN (postfix) calculator	Kartik Agaram	2020-09-15	1	-0/+0
	This was surprisingly hard; bugs discovered all over the place.
*	6733 - read utf-8 'grapheme' from byte stream	Kartik Agaram	2020-08-28	1	-0/+0
	No support for combining characters. Graphemes are currently just utf-8 encodings of a single Unicode code-point.
	No support for code-points that require more than 32 bits in utf-8.
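A minimal Python sketch of reading one such "grapheme" from a byte stream: the leading byte's high bits give the length of the UTF-8 sequence (1 to 4 bytes), and the raw bytes are packed into a 32-bit int. Packing the first byte into the lowest-order position is an assumption about the layout, as is the name `read_grapheme`; like the commit, it treats each code point as its own grapheme.

```python
import io


def read_grapheme(stream) -> int:
    """Read one UTF-8 encoded code point from a binary stream.

    Returns the raw UTF-8 bytes packed into an int (first byte lowest),
    or -1 at end of stream. Combining characters are not handled.
    """
    first = stream.read(1)
    if not first:
        return -1
    b = first[0]
    # The leading byte's high bits encode the total length of the sequence.
    if b < 0x80:
        n = 1
    elif (b & 0xE0) == 0xC0:
        n = 2
    elif (b & 0xF0) == 0xE0:
        n = 3
    elif (b & 0xF8) == 0xF0:
        n = 4
    else:
        raise ValueError(f"invalid UTF-8 leading byte 0x{b:02x}")
    result = b
    for i in range(1, n):
        nxt = stream.read(1)
        # Every continuation byte must look like 10xxxxxx.
        if not nxt or (nxt[0] & 0xC0) != 0x80:
            raise ValueError("truncated or malformed UTF-8 sequence")
        result |= nxt[0] << (8 * i)
    return result


if __name__ == "__main__":
    s = io.BytesIO("aé".encode("utf-8"))
    print(hex(read_grapheme(s)), hex(read_grapheme(s)))  # 0x61 0xa9c3
```

Since valid UTF-8 never exceeds 4 bytes, every result fits in 32 bits.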
*	6719 - error-checking for 'index' instructions	Kartik Agaram	2020-08-21	1	-0/+0
	1000+ LoC spent; just 300+ excluding tests.
	Still one known gap: we don't check the entirety of an array's element type if it's a compound. So far we just check if, say, both sides start with 'addr'. Obviously that's not good enough.
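The gap is easiest to see side by side. A minimal Python sketch, assuming a hypothetical representation where a type is either a name or a tuple headed by a constructor (the translator's real representation differs): the head-only check accepts `(addr int)` against `(addr byte)`, while a structural check rejects it.

```python
# Hypothetical representation: a type is a name ("int") or a tuple whose first
# element is a constructor, e.g. ("addr", "int") for (addr int).


def head(t):
    return t[0] if isinstance(t, tuple) else t


def head_only_match(a, b) -> bool:
    """The partial check: compare only the outermost constructor, e.g. 'addr'."""
    return head(a) == head(b)


def full_match(a, b) -> bool:
    """Structural check: every component of a compound type must agree."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(full_match(x, y) for x, y in zip(a, b))
    return a == b


if __name__ == "__main__":
    a, b = ("addr", "int"), ("addr", "byte")
    print(head_only_match(a, b), full_match(a, b))  # True False
```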
*	6622 - new syscalls: time and ntime	Kartik Agaram	2020-07-08	1	-0/+0
	As a side effect, I find that my Linode can print ~100k chars/s. At 50 rows and 200 columns per screen, that's 10 frames/s.
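The frame rate follows directly from the two measurements: characters printed per second divided by characters per screen.

```latex
\text{frames/s}
  = \frac{100{,}000\ \text{chars/s}}{50\ \text{rows} \times 200\ \text{columns}}
  = \frac{100{,}000}{10{,}000}
  = 10
```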
*	6604 - new app	Kartik Agaram	2020-07-01	1	-0/+0
	https://archive.org/details/akkartik-2min-2020-07-01
	In the process I found a bug, added a new syscall, and 'emulated' it.
*	6597	Kartik Agaram	2020-06-29	1	-0/+0
*	6596	Kartik Agaram	2020-06-29	1	-0/+0
*	6595	Kartik Agaram	2020-06-29	1	-0/+0
*	6594 - start standardizing the meaning of 'print'	Kartik Agaram	2020-06-29	1	-0/+0
*	6528	Kartik Agaram	2020-06-15	1	-0/+0
*	6520 - new app: parse-int	Kartik Agaram	2020-06-14	1	-0/+0
	Several bugs fixed in the process, and expectation of further bugs is growing.
	I'd somehow started assuming I don't need to have separate cases for rm32 as a register vs mem. That's not right. We might need more reg-reg Primitives.
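Whether rm32 names a register or memory is decided by the x86 ModR/M byte: its top two bits (`mod`) equal `11` for a register operand, and any other value selects a memory operand. A minimal Python sketch of that decoding; it ignores the memory-mode special cases (such as `rm = 100` introducing a SIB byte), and the function names are hypothetical.

```python
from typing import NamedTuple


class ModRM(NamedTuple):
    mod: int  # bits 7-6: addressing mode
    reg: int  # bits 5-3: register operand (or opcode extension)
    rm: int   # bits 2-0: register, or base of a memory operand


def decode_modrm(byte: int) -> ModRM:
    return ModRM(mod=(byte >> 6) & 0b11, reg=(byte >> 3) & 0b111, rm=byte & 0b111)


def rm32_is_register(byte: int) -> bool:
    """mod == 0b11 means rm32 is a register; anything else is a memory operand."""
    return decode_modrm(byte).mod == 0b11


if __name__ == "__main__":
    print(rm32_is_register(0xC8), rm32_is_register(0x08))  # True False
```

For example, `0xC8` is `11 001 000` (register-to-register with ecx and eax), while `0x08` is `00 001 000`, whose rm32 operand is the memory at `[eax]`.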
*	6508 - support null exit-descriptor	Kartik Agaram	2020-06-10	1	-0/+0
*	6507 - use syscall names everywhere	Kartik Agaram	2020-06-10	1	-0/+0
*	6409 - primitives for text-mode UIs	Kartik Agaram	2020-05-27	1	-0/+0
*	6406 - primitive 'copy-handle'	Kartik Agaram	2020-05-25	1	-0/+0
*	6382 - re-enable mu.subx in CI	Kartik Agaram	2020-05-22	1	-0/+0
	I thought I'd done this in the previous commit, but I hadn't.
	And, what's more, there was a bug that seemed pretty tough for a time. Turns out my self-hosted translator doesn't support '.' comment tokens in data segments.
	Hopefully I'm past the valley of the shadow of death now.
	"I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS."
	  -- James Mickens (https://www.usenix.org/system/files/1311_05-08_mickens.pdf)
*	update binaries	Kartik Agaram	2020-05-22	1	-0/+0
	CI should start passing again now.
*	handle nulls in lookup	Kartik Agaram	2020-05-18	1	-0/+0
	Cleaner abstraction, but adds 3 instructions to our overhead for handles, including one potentially hard-to-predict jump :/
	I wish I could have put the alloc id in eax for the comparison as well, to save a few bytes of instruction space. But that messes up the non-null case.
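A Python model of a null-tolerant `lookup`, loosely following the handle scheme the commit alludes to. The following are assumptions, not taken from the commit: a handle is an `(alloc_id, address)` pair, the null handle is `(0, 0)`, and each payload begins with a 4-byte copy of its alloc id, which `lookup` compares before returning the address just past it. The `Heap` class and its `free` method are hypothetical. The early return for a null handle plays the role of the extra branch the commit counts against handle overhead.

```python
import struct

NULL_HANDLE = (0, 0)  # (alloc_id, address)


class Heap:
    def __init__(self, size: int = 1024):
        self.mem = bytearray(size)
        self.next_addr = 4  # keep address 0 unused so it can mean null
        self.next_alloc_id = 1

    def allocate(self, nbytes: int) -> tuple:
        addr, alloc_id = self.next_addr, self.next_alloc_id
        # Each payload starts with the alloc id of the handle that owns it.
        struct.pack_into("<I", self.mem, addr, alloc_id)
        self.next_addr += 4 + nbytes
        self.next_alloc_id += 1
        return (alloc_id, addr)

    def free(self, handle: tuple) -> None:
        # Clobber the stored id so stale handles no longer match.
        struct.pack_into("<I", self.mem, handle[1], 0)

    def lookup(self, handle: tuple):
        if handle == NULL_HANDLE:  # null in, null out
            return None
        alloc_id, addr = handle
        (stored,) = struct.unpack_from("<I", self.mem, addr)
        if stored != alloc_id:
            raise RuntimeError("use after free: alloc id mismatch")
        return addr + 4  # address of the payload proper


if __name__ == "__main__":
    heap = Heap()
    h = heap.allocate(8)
    print(heap.lookup(h), heap.lookup(NULL_HANDLE))  # 8 None
```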
*	support 'fake' handles allocated statically	Kartik Agaram	2020-05-18	1	-0/+0
	Mystery solved of why the syntax sugar phases don't work even though they don't use any functions whose signatures changed in the migration to handles.
	The answer: they use the Registers table, and it needs to use handles rather than raw strings.
*	support 'fake' handles allocated statically	Kartik Agaram	2020-05-18	1	-0/+0
	Mystery solved of why the syntax sugar phases don't work even though they don't use any functions whose signatures changed in the migration to handles.
	The answer: they use the Registers table, and it currently doesn't use handles.
	Rather than create a whole new set of functions that operate on addresses, I'm going to create fake handles that are never intended to be reclaimed.
	Which raises the question of the best way to do that. I'd like to continue using string syntax, so I'm going to use a prefix in the payload that can also be rendered as a string. But all the printable characters are 0x20 or above, and we don't currently have escape sequences for null or any other non-printable characters. I _could_ use newlines, but that seems overly clever.
	So instead I'll once again not worry about some hypothetical problem with running out of alloc-ids, and just carve out half of the id space that can't be used for real alloc ids. ASCII doesn't use the most significant bit of bytes, so it seems like a natural separation.
*	Rebuild phases of self-hosted SubX translator	Kartik Agaram	2020-05-18	1	-0/+0
	For this one commit we need to bootstrap ourselves with subx_translate_debug.
*	6208	Kartik Agaram	2020-04-22	1	-0/+0
*	6182 - start of support for safe handles	Kartik Agaram	2020-04-03	1	-0/+0
	So far it's unclear how to do this in a series of small commits. Still nibbling around the edges.
	In this commit we standardize some terminology:
	  The length of an array or stream is denominated in the high-level elements.
	  The _size_ is denominated in bytes.
	The thing we encode into the type is always the size, not the length.
	There's still an open question of what to do about the Mu `length` operator. I'd like to modify it to provide the length. Currently it provides the size. If I can't fix that I'll rename it.
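The terminology above in a minimal Python sketch: an array whose header stores its size in bytes, from which the length in elements is derived. The 4-byte little-endian header, the restriction to 1- and 4-byte elements, and the function names are assumptions for illustration.

```python
import struct


def make_array(elements: list, elem_size: int) -> bytes:
    """Encode ints behind a 4-byte header that holds the *size* in bytes."""
    size = len(elements) * elem_size
    fmt = {1: "B", 4: "I"}[elem_size]  # only 1- and 4-byte elements here
    return struct.pack("<I", size) + struct.pack(f"<{len(elements)}{fmt}", *elements)


def array_size(arr: bytes) -> int:
    (size,) = struct.unpack_from("<I", arr, 0)
    return size  # denominated in bytes


def array_length(arr: bytes, elem_size: int) -> int:
    return array_size(arr) // elem_size  # denominated in elements


if __name__ == "__main__":
    arr = make_array([10, 20, 30], 4)
    print(array_size(arr), array_length(arr, 4))  # 12 3
```

For byte arrays the two numbers coincide, which makes it easy to conflate them until an array of wider elements comes along.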
*	6181	Kartik Agaram	2020-04-03	1	-0/+0
*	6153 - switch 'main' to use Mu strings	Kartik Agaram	2020-03-15	1	-0/+0
	At the SubX level we have to put up with null-terminated kernel strings for commandline args. But so far we haven't done much with them. Rather than try to support them, we'll just convert them transparently to standard length-prefixed strings.
	In the process I realized that it's not quite right to treat the combination of argc and argv as an array of kernel strings. Argc counts the number of elements, whereas the length of an array is usually denominated in bytes.
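The conversion described above, as a minimal Python sketch: find the terminating null, then emit a length prefix followed by the bytes without the terminator. The 4-byte little-endian prefix and the name `kernel_string_to_mu` are assumptions; the commit doesn't spell out the layout here.

```python
import struct


def kernel_string_to_mu(buf: bytes, start: int = 0) -> bytes:
    """Convert the null-terminated string at buf[start:] to a length-prefixed one.

    The result is a 4-byte little-endian byte count followed by the bytes
    themselves, without the terminating null.
    """
    end = buf.index(b"\x00", start)  # raises ValueError if there's no terminator
    data = buf[start:end]
    return struct.pack("<I", len(data)) + data


if __name__ == "__main__":
    print(kernel_string_to_mu(b"prog\x00--flag\x00", 5))  # b'\x06\x00\x00\x00--flag'
```

The scan for the null is the only cost; after conversion every later operation can read the length directly instead of rescanning.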
*	6094 - new 'compute-offset' instruction	Kartik Agaram	2020-03-07	1	-0/+0
	If indexing into a type with power-of-2-sized elements we can access them in one instruction:
	  x/reg1: (addr int) <- index A/reg2: (addr array int), idx/reg3: int
	This translates to a single instruction because x86 instructions support an addressing mode with left-shifts.
	For non-powers-of-2, however, we need a multiply. To keep things type-safe, it is performed like this:
	  x/reg1: (offset T) <- compute-offset A: (addr array T), idx: int
	  y/reg2: (addr T) <- index A, x
	An offset is just an int that is guaranteed to be a multiple of size-of(T). Offsets can only be used in index instructions, and the types will eventually be required to line up.
	In the process, I have to expand Input-size because mu.subx is growing big.
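x86's scaled-index addressing multiplies the index register by 1, 2, 4, or 8 inside a single instruction, which is why those power-of-2 element sizes need no separate multiply. A Python sketch of the two address computations the commit contrasts; the function names mirror the Mu instructions but are hypothetical, and the 4-byte array header is an assumption.

```python
HEADER = 4  # assumption: an array starts with a 4-byte size field


def compute_offset(idx: int, elem_size: int) -> int:
    """Return an 'offset': an int guaranteed to be a multiple of elem_size."""
    return idx * elem_size  # the multiply that scaled addressing can't do


def element_address(base: int, idx: int, elem_size: int) -> int:
    """Address of element idx of an array whose size field starts at base."""
    if elem_size in (1, 2, 4, 8):
        # One x86 instruction: base + (idx << scale) + displacement.
        return base + (idx << (elem_size.bit_length() - 1)) + HEADER
    # Two steps otherwise: 'compute-offset' multiplies, then 'index' just adds.
    offset = compute_offset(idx, elem_size)
    return base + offset + HEADER


if __name__ == "__main__":
    print(element_address(1000, 3, 4), element_address(1000, 3, 12))  # 1016 1040
```

Making the offset its own type means the multiply can't be skipped or done twice: `index` only accepts a value that has already been scaled.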
*	6085	Kartik Agaram	2020-03-06	1	-0/+0
	Support parsing ints from strings rather than slices.
*	6083	Kartik Agaram	2020-03-06	1	-0/+0
*	6070	Kartik Agaram	2020-02-29	1	-0/+0
*	6064	Kartik Agaram	2020-02-27	1	-0/+0
	Fix CI.
*	6000 - clean up after no-local branches	Kartik Agaram	2020-02-09	1	-0/+0
*	5999	Kartik Agaram	2020-02-09	1	-0/+0
	Fix CI. apps/survey was running out of space in the trace segment when translating apps/mu.subx.
*	5948 - branching to named blocks	Kartik Agaram	2020-01-29	1	-0/+0
*	5933	Kartik Agaram	2020-01-27	1	-0/+0
	Expand some buffer sizes to continue building mu.subx natively.
*	5898 - strengthen slice-empty? check	Kartik Agaram	2020-01-19	1	-0/+0
	Anytime we create a slice, the first check tends to be whether it's empty. If we handle ill-formed slices here where start > end, that provides a measure of safety.
	In the Mu translator (mu.subx) we often check for a trailing ':' or ',' and decrement slice->end to ignore it. But that could conceivably yield ill-formed slices if the slice started out empty.
	Now we make sure we never operate on such ill-formed slices.
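A minimal Python sketch of why the emptiness check should use `>=` rather than `==`: decrementing `end` on a slice that started out empty produces `start > end`, and the stronger check still reports it as empty. `Slice` and `drop_last_byte` are hypothetical stand-ins for the SubX structures.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    start: int  # address of the first byte
    end: int    # address one past the last byte


def slice_empty(s: Slice) -> bool:
    # '>=' rather than '==' also reports ill-formed slices (start > end) as empty.
    return s.start >= s.end


def drop_last_byte(s: Slice) -> Slice:
    """Ignore a trailing ':' or ',' by decrementing end, without any check."""
    return Slice(s.start, s.end - 1)


if __name__ == "__main__":
    bad = drop_last_byte(Slice(5, 5))  # started out empty
    print(bad, slice_empty(bad))  # Slice(start=5, end=4) True
```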
*	5887 - reorganize library	Kartik Agaram	2020-01-14	1	-0/+0
	Layers 0-89 are used in self-hosting SubX.
	Layers 90-99 are not needed for self-hosting SubX, and therefore could use transitional levels of syntax sugar.
	Layers 100 and up use all SubX syntax sugar.
*	5847 - literal inputs	Kartik Agaram	2019-12-31	1	-0/+0
*	5804	Kartik Agaram	2019-12-08	1	-0/+0
	Try to make the comments consistent with the type system we'll eventually have.
*	5803	Kartik Agaram	2019-12-07	1	-0/+0
*	5792	Kartik Agaram	2019-12-05	1	-0/+0
	Fix a bug in one test: it checks eax when the component under test returns nothing. It's just been passing accidentally all these months.

```nim
import std/strutils

proc toggle[T](s: var set[T], t: T): bool =
  result = t notin s
  if result:
    s.incl(t)
  else:
    s.excl(t)

type BracketState = enum
  bsNone, bsInBracketRef, bsInBracket, bsAfterBracket, bsInParen, bsInImage, bsInTag

proc getId(line: openArray[char]): string =
  result = ""
  var i = 0
  var bs = bsNone
  var escape = false
  while i < line.len:
    let c = line[i]
    if bs == bsInParen:
      if escape:
        escape = false
        inc i
        continue
      if c == ')':
        bs = bsNone
      elif c == '\\':
        escape = true
      inc i
      continue
    case c
    of 'A'..'Z': result &= char(int(c) - int('A') + int('a'))
    of 'a'..'z', '-', '_', '.': result &= c
    of ' ': result &= '-'
    of '[':
      if bs != bsNone:
        bs = bsInBracket
    of ']':
      if bs == bsInBracket:
        bs = bsAfterBracket
    of '(':
      if bs == bsAfterBracket:
        bs = bsInParen
    else: discard
    inc i

type InlineState = enum
  isItalic, isBold, isCode, isComment, isDel

const AsciiAlphaNumeric = {'0'..'9', 'A'..'Z', 'a'..'z'}

func startsWithScheme(s: string): bool =
  for i, c in s:
    if i > 0 and c == ':':
      return true
    if c notin AsciiAlphaNumeric:
      break
  false

proc parseInline(line: openArray[char]) =
  var state: set[InlineState] = {}
  var bs = bsNone
  var i = 0
  var bracketChars = ""
  var quote = false
  var image = false
  template append(s: untyped) =
    if bs in {bsInBracketRef, bsInBracket}:
      bracketChars &= s
    else:
      stdout.write(s)
  while i < line.len:
    let c = line[i]
    if bs == bsAfterBracket and c != '(':
      stdout.write("[" & bracketChars & "]")
      bracketChars = ""
      bs = bsNone
      image = false
    if quote:
      append c
    elif isComment in state:
      if i + 2 < line.len and line.toOpenArray(i, i + 2) == "-->":
        state.excl(isComment)
        append "-->"
        i += 2
      else:
        append c
    elif bs == bsInTag:
      if c == '>': # done
        if bracketChars.startsWithScheme(): # link
          var linkChars = ""
          for c in bracketChars:
            if c == '\'':
              linkChars &= "&apos"
            else:
              linkChars &= c
          stdout.write("<A HREF='" & linkChars & "'>" & bracketChars & "</A>")
        else: # tag
          stdout.write('<' & bracketChars & '>')
        bracketChars = ""
        bs = bsNone
      elif c == '<':
        stdout.write('<' & bracketChars)
        bracketChars = ""
      else:
        bracketChars &= c
    elif isCode in state:
      case c
      of '<': append "&lt;"
      of '>': append "&gt;"
      of '"': append "&quot;"
      of '\'': append "&apos;"
      of '&': append "&amp;"
      of '`':
        append "</CODE>"
        state.excl(isCode)
      else: append c
    elif c == '\\':
      quote = true
    elif c == '*' or c == '_' and (i == 0 or line[i - 1] notin AsciiAlphaNumeric or i + 1 >= line.len or line[i + 1] notin AsciiAlphaNumeric + {'_'}):
      if i + 1 < line.len and line[i + 1] == c:
        if state.toggle(isBold):
          append ""
        else:
          append ""
        inc i
      else:
        if state.toggle(isItalic):
          stdout.write("")
        else:
          stdout.write("")
    elif c == '`':
      state.incl(isCode)
      append "<CODE>"
    elif c == '~' and i + 1 < line.len and line[i + 1] == '~':
      if state.toggle(isDel):
        append "<DEL>"
      else:
        append "</DEL>"
      inc i
    elif c == '!' and bs == bsNone and i + 1 < line.len and line[i + 1] == '[':
      image = true
    elif c == '[' and bs == bsNone:
      bs = bsInBracket
      if i + 1 < line.len and line[i + 1] == '^':
        inc i
        bs = bsInBracketRef
    elif c == ']' and bs == bsInBracketRef:
      let id = bracketChars.getId()
      stdout.write("<A HREF='#" & id & "'>" & bracketChars & "</A>")
      bracketChars = ""
    elif c == ']' and bs == bsInBracket:
      bs = bsAfterBracket
    elif c == '(' and bs == bsAfterBracket:
      if image:
        stdout.write("<IMG SRC='")
      else:
        stdout.write("<A HREF='")
      bs = bsInParen
    elif c == ')' and bs == bsInParen:
      if image:
        stdout.write("' ALT='" & bracketChars & "'>")
      else:
        stdout.write("'>" & bracketChars & "</A>")
      image = false
      bracketChars = ""
      bs = bsNone
    elif c == '\'' and bs == bsInParen:
      stdout.write("&apos;")
    elif c == '<' and bs == bsNone:
      bs = bsInTag
      bracketChars = ""
    elif i + 4 < line.len and line.toOpenArray(i, i + 3) == "<!--":
# [... text missing from the source ...]
  if i != -1:
    stdout.write(line.substr(0, i + 2))
    state.blockType = btNone
    line.substr(i + 3).parseInline()
  else:
    stdout.write(line & "\n")

proc main() =
  var line: string
  var state = ParseState(listDepth: -1)
  while state.reprocess or stdin.readLine(line):
    state.reprocess = false
    case state.blockType
    of btNone: state.parseNone(line)
    of btPre: state.parsePre(line)
    of btTabPre: state.parseTabPre(line)
    of btSpacePre: state.parseSpacePre(line)
    of btList: state.parseList(line)
    of btPar: state.parsePar(line)
    of btHTML: state.parseHTML(line)
    of btHTMLPre: state.parseHTMLPre(line)
    of btComment: state.parseComment(line)
  state.blockData.parseInline()

main()
```