about summary refs log tree commit diff stats
path: root/src/utils/wordbreak.nim
Commit message (Collapse)AuthorAgeFilesLines
* utils: add twtunibptato2024-09-081-19/+17
| | | | | | | | | | | | | | | | | | | std/unicode has the following issues: * Rune is an int32, which implies overflow checking. Also, it is distinct, so you have to convert it manually to do arithmetic. * QJS libunicode and Chagashi work with uint32, interfacing with these required pointless type conversions. * fastRuneAt is a template, meaning it's pasted into every call site. Also, it decodes to UCS-4, so it generates two branches that aren't even used. Overall this lead to quite some code bloat. * fastRuneAt and lastRune have frustratingly different interfaces. Writing code to handle both cases is error prone. * On older Nim versions which we still support, std/unicode takes strings, not openArray[char]'s. Replace it with "twtuni", which includes some improved versions of the few procedures from std/unicode that we actually use.
* luwrap: use separate context (+ various cleanups)bptato2024-05-101-16/+28
| | | | | | Use a LUContext to only load required CharRanges once per pager. Also, add kana & hangul vi word break categories for convenience.
* luwrap: replace Nim unicode maps with libunicodebptato2024-05-091-0/+33
Instead of using the built-in (and outdated, and buggy) tables, we now use libunicode from QJS. This shaves some bytes off the executable, though far less than I had imagined it would. Also, a surprising effect of this change: because libunicode's tables aren't glitched out, kanji properly gets classified as alpha. I found this greatly annoying because `w' in Japanese text would now jump through whole sentences. As a band-aid solution I added an extra Han category, but I wish we had a more robust solution that could differentiate between *all* scripts. TODO: I suspect that separately loading the tables for every rune in breaksViWordCat is rather inefficient. Using some context object (at least per operation) would probably be beneficial.