path: root/shell/tokenize.mu
Commit message  Author  Age  Files  Lines
* rename grapheme to code-point-utf8  Kartik K. Agaram  2021-11-09  1  -50/+50
|
| Longer name, but it doesn't lie. We have no data structure right now for
| combining multiple code points. And it makes no sense for the notion of a
| grapheme to conflate its Unicode encoding.
|
* shell: support loading 128x128px images  Kartik K. Agaram  2021-07-28  1  -1/+1
|
| I'm loading them in uncompressed ASCII format, and all streams and gap
| buffers all over the place need to get massively scaled up to 256KB
| capacity. But the tests don't yet run out of RAM, so I'll keep going.
|
* shell: second notation for string literals  Kartik K. Agaram  2021-07-28  1  -5/+126
|
| I've always been dissatisfied with the notion of escaping. It introduces a
| special-case meta-notation within the tokenizer, and the conventional
| approach leads to exponential "leaning toothpick syndrome" with each level
| of escaping.
|
| One potential "correct" solution is to keep string terminals
| parameterizable:
|
|   [abc]              => abc
|   [=]                => =
|   [=[abc]=]          => abc
|   [=[a]bc]=]         => a]bc
|   [==[a]=]bc]==]     => a]=]bc
|
| ..and so on. Basically the terminals grow linearly as the number of
| escapings grow.
|
| While this is workable, I'd like to wait until I actually need it, and
| then gauge whether the need is a sign of the stack growing too complex,
| with too many layers of notation/parsing. Mu's goal is just 3 notations,
| and it's going to require constant vigilance to keep that from growing.
|
| Therefore, for now, there are two notations for string literals, one
| symmetric and one balanced:
|
|   "abc"              => abc
|   [abc]              => abc
|
| The balancing notation permits nested brackets as long as they balance.
|
|   [abc [def]]        => abc [def]
|
| If you need unbalanced square brackets, use the symmetric terminals:
|
|   "abc [def"         => abc [def
|
| If you need double quotes inside strings, use the balanced notation:
|
|   [abc "def]         => abc "def
|
| If you need _both_ square brackets (whether balanced or unbalanced) and
| double quotes, you're currently shit outta luck.
|
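The two notations described in the message above can be sketched as a small reader. This is an illustration in Python, not the Mu code from shell/tokenize.mu; the function name `read_string_literal` and its return shape are assumptions made for the example.

```python
def read_string_literal(src: str, pos: int = 0) -> tuple[str, int]:
    """Read one string literal starting at src[pos].

    Hypothetical helper for illustration only.
    Returns (contents, index just past the closing terminal).
    """
    if pos >= len(src):
        raise ValueError("no literal at end of input")
    opener = src[pos]
    if opener == '"':
        # Symmetric notation: contents run up to the next double quote.
        # Square brackets inside need not balance.
        end = src.find('"', pos + 1)
        if end == -1:
            raise ValueError("unterminated double-quoted literal")
        return src[pos + 1:end], end + 1
    if opener == "[":
        # Balanced notation: track nesting depth so inner [...] pairs
        # become part of the contents. Double quotes are ordinary here.
        depth = 1
        i = pos + 1
        while i < len(src):
            c = src[i]
            if c == "[":
                depth += 1
            elif c == "]":
                depth -= 1
                if depth == 0:
                    return src[pos + 1:i], i + 1
            i += 1
        raise ValueError("unbalanced bracketed literal")
    raise ValueError(f"not a string literal: {opener!r}")


if __name__ == "__main__":
    for text in ['"abc"', "[abc]", "[abc [def]]", '"abc [def"', '[abc "def]']:
        print(text, "=>", read_string_literal(text)[0])
```

Neither branch has an escape mechanism, which matches the limitation the message ends on: a string containing both a double quote and an unbalanced square bracket cannot be written in either notation.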
* reading from streams  Kartik K. Agaram  2021-07-03  1  -1/+1
|
| The Mu shell has no string literals, only streams. No random access, only
| sequential access. But I've been playing fast and loose with its read
| pointer until now. Hopefully things are cleaned up now.
|
* one more bug, and documentation for infix  Kartik K. Agaram  2021-06-23  1  -23/+16
|
| One error message gets a bit worse.
|
* beginnings of tokenization within symbols  Kartik K. Agaram  2021-06-22  1  -1/+1
|
| We're now down to 4 failing tests. But these will require surgery.
|
* clean up lexical categories  Kartik K. Agaram  2021-06-22  1  -44/+42
|
* start implementing infix  Kartik K. Agaram  2021-06-21  1  -281/+3
|
| First step: undo operator support in tokenization.
|
* new macro: with  Kartik K. Agaram  2021-06-20  1  -2/+4
|
* start guessing parentheses based on indentation  Kartik K. Agaram  2021-06-20  1  -56/+28
|
* snapshot  Kartik K. Agaram  2021-06-20  1  -0/+55
|
| This is going better than expected; just 3 failing tests among the new
| ones.
|
* start emitting indent tokens  Kartik K. Agaram  2021-06-18  1  -1/+155
|
* redo next-token in more high-level terms  Kartik K. Agaram  2021-06-18  1  -73/+93
|
* .  Kartik K. Agaram  2021-06-18  1  -30/+30
|
* .  Kartik K. Agaram  2021-06-18  1  -49/+49
|
* start emitting token for newline  Kartik K. Agaram  2021-06-18  1  -5/+12
|
* newlines are now a token  Kartik K. Agaram  2021-06-18  1  -1/+11
|
* start implementing indent-sensitivity  Kartik K. Agaram  2021-06-18  1  -2/+26
|
| General plan:
|   stop skipping newlines during tokenization
|   introduce a new indent token, initially skip it transparently
|   start doing cleverer things
|
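The first two steps of the plan above can be sketched as follows. This is a Python illustration under stated assumptions, not the Mu implementation: the token kinds (`indent`, `newline`, `paren`, `word`) and the names `tokenize` and `skip_layout` are hypothetical.

```python
import re

# Hypothetical lexical rule: a paren, or a run of non-space, non-paren chars.
TOKEN = re.compile(r"[()]|[^\s()]+")


def tokenize(text: str) -> list[tuple[str, object]]:
    """Emit layout tokens instead of silently skipping whitespace."""
    tokens: list[tuple[str, object]] = []
    lines = text.split("\n")
    for n, line in enumerate(lines):
        body = line.lstrip(" ")
        if body:
            # One indent token per non-blank line, carrying its width.
            tokens.append(("indent", len(line) - len(body)))
            for lexeme in TOKEN.findall(body):
                kind = "paren" if lexeme in "()" else "word"
                tokens.append((kind, lexeme))
        if n < len(lines) - 1:
            # Newlines are no longer skipped; they become tokens.
            tokens.append(("newline", None))
    return tokens


def skip_layout(tokens: list[tuple[str, object]]) -> list[tuple[str, object]]:
    """Drop layout tokens so downstream code sees the old token stream."""
    return [t for t in tokens if t[0] not in ("indent", "newline")]


if __name__ == "__main__":
    stream = tokenize("(a\n  b)")
    print(stream)
    print(skip_layout(stream))
```

Filtering with `skip_layout` corresponds to "initially skip it transparently": existing consumers keep working while later changes start to use the indent widths.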
* .  Kartik K. Agaram  2021-06-18  1  -6/+2
|
* .  Kartik K. Agaram  2021-06-18  1  -8/+6
|
* shell: stop punning tokens as cells  Kartik K. Agaram  2021-06-18  1  -80/+107
|
* shell: support negative integer literals  Kartik K. Agaram  2021-06-06  1  -1/+96
|
| We still don't support _any_ fractional literals, positive or negative.
|
* .  Kartik K. Agaram  2021-06-05  1  -36/+84
|
* .  Kartik K. Agaram  2021-05-30  1  -3/+3
|
* .  Kartik K. Agaram  2021-05-30  1  -1/+1
|
* .  Kartik K. Agaram  2021-05-29  1  -0/+1
|
* shell: non-stream tokens are now small  Kartik K. Agaram  2021-05-29  1  -3/+11
|
* .  Kartik K. Agaram  2021-05-29  1  -6/+4
|
* .  Kartik K. Agaram  2021-05-29  1  -9/+10
|
* .  Kartik K. Agaram  2021-05-29  1  -4/+4
|
* .  Kartik K. Agaram  2021-05-29  1  -2/+0
|
* .  Kartik K. Agaram  2021-05-29  1  -10/+10
|
* .  Kartik K. Agaram  2021-05-29  1  -12/+12
|
* shell: start reducing the waste in tokenize  Kartik K. Agaram  2021-05-29  1  -11/+11
|
* a second place with lousy storage management  Kartik K. Agaram  2021-05-19  1  -0/+1
|
* disallow null traces  Kartik K. Agaram  2021-05-19  1  -7/+28
|
| We now use traces everywhere for error-checking. Null traces introduce the
| possibility of changing a function's error response, and therefore its
| semantics.
|
* give up on nested backquotes for now  Kartik K. Agaram  2021-05-07  1  -6/+2
|
* first passing test for macroexpand  Kartik K. Agaram  2021-05-06  1  -24/+28
|
| In the process I spent a long time tracking down a stray TODO in
| 108write.subx that I thought would abort but didn't since the switch to
| baremetal. Then after I reintroduced that assertion I had to go track down
| a bunch of buffer sizes. Stream sizes continue to be a huge mess.
|
* belatedly migrate stale example definitions  Kartik K. Agaram  2021-05-06  1  -1/+5
|
| Also bare-bones syntax highlighting for .limg files. Doesn't work when
| .limg file is first file opened with Vim.
|
* reading and printing backquotes and unquotes  Kartik K. Agaram  2021-05-03  1  -25/+178
|
* shell: comments  Kartik K. Agaram  2021-04-29  1  -0/+47
|
* .  Kartik K. Agaram  2021-04-29  1  -1/+0
|
* load large definitions  Kartik K. Agaram  2021-04-29  1  -1/+1
|
* shell: bugfix for stream literals  Kartik K. Agaram  2021-04-28  1  -2/+33
|
| I was forgetting that callers sometimes reuse outputs between successive
| tokens.
|
* shell: stream literals  Kartik K. Agaram  2021-04-27  1  -0/+7
|
* shell: tokenizing stream (string) literals  Kartik K. Agaram  2021-04-27  1  -1/+78
|
| We're calling them streams since they support appending.
|
* .  Kartik K. Agaram  2021-04-27  1  -8/+1
|
* shell: dot token  Kartik K. Agaram  2021-04-15  1  -0/+58
|
* .  Kartik K. Agaram  2021-04-15  1  -5/+9
|
* shell: quote  Kartik K. Agaram  2021-04-06  1  -0/+14
|