author: Kartik Agaram <vc@akkartik.com> 2020-08-02 15:31:56 -0700
committer: Kartik Agaram <vc@akkartik.com> 2020-08-02 15:50:19 -0700
commit: 89c9ed80f9f7f4d4d40fea44c6e08362cfde50c7
tree: 2ab8f044346695447468303e1f74372e5f158d76 /apps/mu.subx
parent: 0f5d0ec519c5b6fbb36ace912426e6a3fb8aa8ec
6706 - support utf-8
For example:

  fn main -> r/ebx: int {
    var x/eax: grapheme <- copy 0x9286e2  # code point 0x2192 in utf-8
    print-grapheme-to-real-screen x
    print-string-to-real-screen "\n"
  }

Graphemes must fit in 4 bytes (21 bits for code points). Unclear what we
should do for longer clusters since graphemes are a fixed-size type at
the moment.
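
The literal 0x9286e2 in the example above can be checked by hand: code point 0x2192 encodes to the UTF-8 bytes e2 86 92, and reading those bytes as a little-endian 32-bit integer gives 0x9286e2. The sketch below reproduces that arithmetic in Python rather than Mu. The helper names pack_grapheme and unpack_grapheme are hypothetical and are not part of the Mu codebase; the little-endian packing is inferred from the example literal, not from the compiler source.

```python
def pack_grapheme(code_point: int) -> int:
    """Encode one code point as UTF-8 and pack the bytes, first byte lowest.

    Hypothetical helper: mirrors how the literal 0x9286e2 relates to
    code point 0x2192 in the commit message's example.
    """
    encoded = chr(code_point).encode("utf-8")  # 1 to 4 bytes
    return int.from_bytes(encoded, "little")


def unpack_grapheme(grapheme: int) -> int:
    """Inverse of pack_grapheme: recover the code point from the packed int."""
    length = max(1, (grapheme.bit_length() + 7) // 8)
    return ord(grapheme.to_bytes(length, "little").decode("utf-8"))


# Code point 0x2192 (rightwards arrow) is e2 86 92 in UTF-8.
arrow = pack_grapheme(0x2192)

# The largest code point, 0x10FFFF, needs 4 UTF-8 bytes, so every single
# code point fits in a 4-byte grapheme.
largest = pack_grapheme(0x10FFFF)

# A cluster of two regional-indicator code points (a flag) needs 8 bytes,
# which is the "longer clusters" case a fixed 4-byte type cannot hold.
flag_bytes = len("\U0001F1FA\U0001F1F8".encode("utf-8"))
```

Here arrow equals 0x9286e2, matching the literal in the example, and flag_bytes is 8, which shows why multi-code-point clusters do not fit the current fixed-size type.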
Diffstat (limited to 'apps/mu.subx')
-rw-r--r-- apps/mu.subx | 2
1 file changed, 2 insertions, 0 deletions
diff --git a/apps/mu.subx b/apps/mu.subx
index 20f59336..912b2b1f 100644
--- a/apps/mu.subx
+++ b/apps/mu.subx
@@ -414,6 +414,8 @@ Type-id:  # (stream (addr array byte))
   "slice"/imm32  # 12
   "code-point"/imm32  # 13; smallest scannable unit from a text stream
   "grapheme"/imm32  # 14; smallest printable unit; will eventually be composed of multiple code-points, but currently corresponds 1:1
+                    # only 4-byte graphemes in utf-8 are currently supported;
+                    # unclear how we should deal with larger clusters.
   # Keep Primitive-type-ids in sync if you add types here.
                                                           0/imm32
   0/imm32 0/imm32 0/imm32 0/imm32 0/imm32 0/imm32 0/imm32 0/imm32