author Kartik K. Agaram <vc@akkartik.com> 2021-08-29 22:16:34 -0700
committer Kartik K. Agaram <vc@akkartik.com> 2021-08-29 22:20:09 -0700
commit 6e05a8fa27139ddf75a029ad94d44b48a92785b2
tree 8d04ae5d057030246305c9dc4b46fb2fe176f643 /403unicode.mu
parent 4b90a26d71513f3b908b7f7ec651996ddf6460d6
fix bad terminology: grapheme -> code point

Unix text-mode terminals transparently support utf-8 these days, and so
I treat utf-8 sequences (which I call graphemes in Mu) as fundamental.

I then blindly carried over this state of affairs to bare-metal Mu,
where it makes no sense. If you don't have a terminal handling
font-rendering for you, fonts are most often indexed by code points and
not utf-8 sequences.
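
To make the distinction concrete, here is a minimal Python sketch (not Mu
code, and not part of this commit) of the two representations for a single
character. The function names are hypothetical. The little-endian packing in
`pack_utf8_in_word` is only an assumption about how a utf-8 sequence of up to
4 bytes might sit in a 32-bit word; it illustrates why such a value is
generally not usable as a font index.

```python
def utf8_sequence_to_code_point(seq: bytes) -> int:
    """Decode exactly one utf-8 sequence (1 to 4 bytes) into its code point."""
    text = seq.decode("utf-8")
    if len(text) != 1:
        raise ValueError("expected exactly one code point")
    return ord(text)


def code_point_to_utf8_sequence(code_point: int) -> bytes:
    """Encode one code point as its utf-8 sequence."""
    return chr(code_point).encode("utf-8")


def pack_utf8_in_word(seq: bytes) -> int:
    """Pack a utf-8 sequence into a 32-bit word, first byte lowest.

    Assumption for illustration only: the byte order is hypothetical.
    """
    if not 1 <= len(seq) <= 4:
        raise ValueError("a utf-8 sequence is 1 to 4 bytes long")
    return int.from_bytes(seq.ljust(4, b"\x00"), "little")


if __name__ == "__main__":
    for ch in ("A", "é", "€"):
        seq = ch.encode("utf-8")
        print(ch, hex(utf8_sequence_to_code_point(seq)), hex(pack_utf8_in_word(seq)))
```

For ASCII the two values coincide, which is why the confusion is easy to
miss. For anything else they differ, and a font table indexed by code point
needs the decoded value.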
Diffstat (limited to '403unicode.mu')
-rw-r--r-- 403unicode.mu 15
1 files changed, 9 insertions, 6 deletions
diff --git a/403unicode.mu b/403unicode.mu
index 6ec30c3d..be002311 100644
--- a/403unicode.mu
+++ b/403unicode.mu
@@ -7,18 +7,21 @@
 # Graphemes may consist of multiple code points.
 #
 # Mu graphemes are always represented in utf-8, and they are required to fit
-# in 4 bytes.
+# in 4 bytes. (This can be confusing if you focus just on ASCII, where Mu's
+# graphemes and code-points are identical.)
 #
 # Mu doesn't currently support combining code points, or graphemes made of
 # multiple code points. One day we will.
-# We also don't currently support code points that translate into multiple
-# or wide graphemes. (In particular, Tab will never be supported.)
+#   https://en.wikipedia.org/wiki/Combining_character
+
+fn to-code-point in: grapheme -> _/eax: code-point {
+  var g/eax: grapheme <- copy in
+  var result/eax: code-point <- copy g  # TODO: support non-ASCII
+  return result
+}
 
 # transliterated from tb_utf8_unicode_to_char in https://github.com/nsf/termbox
 # https://wiki.tcl-lang.org/page/UTF%2D8+bit+by+bit explains the algorithm
-#
-# The day we want to support combining characters, this function will need to
-# take multiple code points. Or something.
 fn to-grapheme in: code-point -> _/eax: grapheme {
   var c/eax: int <- copy in
   var num-trailers/ecx: int <- copy 0