<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <meta http-equiv="content-type" content="text/html; charset=UTF-8"> <title>Mu - baremetal/403unicode.mu</title> <meta name="Generator" content="Vim/8.1"> <meta name="plugin-version" content="vim8.1_v1"> <meta name="syntax" content="none"> <meta name="settings" content="number_lines,use_css,pre_wrap,no_foldcolumn,expand_tabs,line_ids,prevent_copy="> <meta name="colorscheme" content="minimal-light"> <style type="text/css"> <!-- pre { white-space: pre-wrap; font-family: monospace; color: #000000; background-color: #c6c6c6; } body { font-size:12pt; font-family: monospace; color: #000000; background-color: #c6c6c6; } a { color:inherit; } * { font-size:12pt; font-size: 1em; } .PreProc { color: #c000c0; } .LineNr { } .Delimiter { color: #c000c0; } .SpecialChar { color: #d70000; } .Constant { color: #008787; } .muFunction { color: #af5f00; text-decoration: underline; } .muComment { color: #005faf; } --> </style> <script type='text/javascript'> <!-- /* function to open any folds containing a jumped-to line before jumping to it */ function JumpToLine() { var lineNum; lineNum = window.location.hash; lineNum = lineNum.substr(1); /* strip off '#' */ if (lineNum.indexOf('L') == -1) { lineNum = 'L'+lineNum; } var lineElem = document.getElementById(lineNum); /* Always jump to new location even if the line was hidden inside a fold, or * we corrected the raw number to a line ID. 
*/ if (lineElem) { lineElem.scrollIntoView(true); } return true; } if ('onhashchange' in window) { window.onhashchange = JumpToLine; } --> </script> </head> <body onload='JumpToLine();'> <a href='https://github.com/akkartik/mu/blob/main/baremetal/403unicode.mu'>https://github.com/akkartik/mu/blob/main/baremetal/403unicode.mu</a> <pre id='vimCodeElement'> <span id="L1" class="LineNr"> 1 </span><span class="muComment"># Helpers for Unicode.</span> <span id="L2" class="LineNr"> 2 </span><span class="muComment">#</span> <span id="L3" class="LineNr"> 3 </span><span class="muComment"># Mu has no characters, only code points and graphemes.</span> <span id="L4" class="LineNr"> 4 </span><span class="muComment"># Code points are the indivisible atoms of text streams.</span> <span id="L5" class="LineNr"> 5 </span><span class="muComment"># <a href="https://en.wikipedia.org/wiki/Code_point">https://en.wikipedia.org/wiki/Code_point</a></span> <span id="L6" class="LineNr"> 6 </span><span class="muComment"># Graphemes are the smallest self-contained unit of text.</span> <span id="L7" class="LineNr"> 7 </span><span class="muComment"># Graphemes may consist of multiple code points.</span> <span id="L8" class="LineNr"> 8 </span><span class="muComment">#</span> <span id="L9" class="LineNr"> 9 </span><span class="muComment"># Mu graphemes are always represented in utf-8, and they are required to fit</span> <span id="L10" class="LineNr"> 10 </span><span class="muComment"># in 4 bytes.</span> <span id="L11" class="LineNr"> 11 </span><span class="muComment">#</span> <span id="L12" class="LineNr"> 12 </span><span class="muComment"># Mu doesn't currently support combining code points, or graphemes made of</span> <span id="L13" class="LineNr"> 13 </span><span class="muComment"># multiple code points. 
One day we will.</span> <span id="L14" class="LineNr"> 14 </span><span class="muComment"># We also don't currently support code points that translate into multiple</span> <span id="L15" class="LineNr"> 15 </span><span class="muComment"># or wide graphemes. (In particular, Tab will never be supported.)</span> <span id="L16" class="LineNr"> 16 </span> <span id="L17" class="LineNr"> 17 </span><span class="muComment"># transliterated from tb_utf8_unicode_to_char in <a href="https://github.com/nsf/termbox">https://github.com/nsf/termbox</a></span> <span id="L18" class="LineNr"> 18 </span><span class="muComment"># <a href="https://wiki.tcl-lang.org/page/UTF%2D8+bit+by+bit">https://wiki.tcl-lang.org/page/UTF%2D8+bit+by+bit</a> explains the algorithm</span> <span id="L19" class="LineNr"> 19 </span><span class="muComment">#</span> <span id="L20" class="LineNr"> 20 </span><span class="muComment"># The day we want to support combining characters, this function will need to</span> <span id="L21" class="LineNr"> 21 </span><span class="muComment"># take multiple code points. 
Or something.</span> <span id="L22" class="LineNr"> 22 </span><span class="PreProc">fn</span> <span class="muFunction"><a href='403unicode.mu.html#L22'>to-grapheme</a></span> in: code-point<span class="PreProc"> -> </span>_/<span class="Constant">eax</span>: grapheme <span class="Delimiter">{</span> <span id="L23" class="LineNr"> 23 </span> <span class="PreProc">var</span> c/<span class="Constant">eax</span>: int <span class="SpecialChar"><-</span> copy in <span id="L24" class="LineNr"> 24 </span> <span class="PreProc">var</span> num-trailers/<span class="Constant">ecx</span>: int <span class="SpecialChar"><-</span> copy <span class="Constant">0</span> <span id="L25" class="LineNr"> 25 </span> <span class="PreProc">var</span> first/<span class="Constant">edx</span>: int <span class="SpecialChar"><-</span> copy <span class="Constant">0</span> <span id="L26" class="LineNr"> 26 </span> $to-grapheme:compute-length: <span class="Delimiter">{</span> <span id="L27" class="LineNr"> 27 </span> <span class="muComment"># single byte: just return it</span> <span id="L28" class="LineNr"> 28 </span> compare c, <span class="Constant">0x7f</span> <span id="L29" class="LineNr"> 29 </span> <span class="Delimiter">{</span> <span id="L30" class="LineNr"> 30 </span> <span class="PreProc">break-if-></span> <span id="L31" class="LineNr"> 31 </span> <span class="PreProc">var</span> g/<span class="Constant">eax</span>: grapheme <span class="SpecialChar"><-</span> copy c <span id="L32" class="LineNr"> 32 </span> <span class="PreProc">return</span> g <span id="L33" class="LineNr"> 33 </span> <span class="Delimiter">}</span> <span id="L34" class="LineNr"> 34 </span> <span class="muComment"># 2 bytes</span> <span id="L35" class="LineNr"> 35 </span> compare c, <span class="Constant">0x7ff</span> <span id="L36" class="LineNr"> 36 </span> <span class="Delimiter">{</span> <span id="L37" class="LineNr"> 37 </span> <span class="PreProc">break-if-></span> <span id="L38" class="LineNr"> 38 </span> 
num-trailers <span class="SpecialChar"><-</span> copy <span class="Constant">1</span> <span id="L39" class="LineNr"> 39 </span> first <span class="SpecialChar"><-</span> copy <span class="Constant">0xc0</span> <span id="L40" class="LineNr"> 40 </span> <span class="PreProc">break</span> $to-grapheme:compute-length <span id="L41" class="LineNr"> 41 </span> <span class="Delimiter">}</span> <span id="L42" class="LineNr"> 42 </span> <span class="muComment"># 3 bytes</span> <span id="L43" class="LineNr"> 43 </span> compare c, <span class="Constant">0xffff</span> <span id="L44" class="LineNr"> 44 </span> <span class="Delimiter">{</span> <span id="L45" class="LineNr"> 45 </span> <span class="PreProc">break-if-></span> <span id="L46" class="LineNr"> 46 </span> num-trailers <span class="SpecialChar"><-</span> copy <span class="Constant">2</span> <span id="L47" class="LineNr"> 47 </span> first <span class="SpecialChar"><-</span> copy <span class="Constant">0xe0</span> <span id="L48" class="LineNr"> 48 </span> <span class="PreProc">break</span> $to-grapheme:compute-length <span id="L49" class="LineNr"> 49 </span> <span class="Delimiter">}</span> <span id="L50" class="LineNr"> 50 </span> <span class="muComment"># 4 bytes</span> <span id="L51" class="LineNr"> 51 </span> compare c, <span class="Constant">0x1fffff</span> <span id="L52" class="LineNr"> 52 </span> <span class="Delimiter">{</span> <span id="L53" class="LineNr"> 53 </span> <span class="PreProc">break-if-></span> <span id="L54" class="LineNr"> 54 </span> num-trailers <span class="SpecialChar"><-</span> copy <span class="Constant">3</span> <span id="L55" class="LineNr"> 55 </span> first <span class="SpecialChar"><-</span> copy <span class="Constant">0xf0</span> <span id="L56" class="LineNr"> 56 </span> <span class="PreProc">break</span> $to-grapheme:compute-length <span id="L57" class="LineNr"> 57 </span> <span class="Delimiter">}</span> <span id="L58" class="LineNr"> 58 </span> <span class="muComment"># more than 4 
bytes: unsupported</span> <span id="L59" class="LineNr"> 59 </span> <span class="muComment"># TODO: print error message to stderr</span> <span id="L60" class="LineNr"> 60 </span> compare c, <span class="Constant">0x1fffff</span> <span id="L61" class="LineNr"> 61 </span> <span class="Delimiter">{</span> <span id="L62" class="LineNr"> 62 </span> <span class="PreProc">break-if-<=</span> <span id="L63" class="LineNr"> 63 </span> <span class="PreProc">return</span> <span class="Constant">0</span> <span id="L64" class="LineNr"> 64 </span> <span class="Delimiter">}</span> <span id="L65" class="LineNr"> 65 </span> <span class="Delimiter">}</span> <span id="L66" class="LineNr"> 66 </span> <span class="muComment"># emit trailer bytes, 6 bits from 'in', first two bits '10'</span> <span id="L67" class="LineNr"> 67 </span> <span class="PreProc">var</span> result/<span class="Constant">edi</span>: grapheme <span class="SpecialChar"><-</span> copy <span class="Constant">0</span> <span id="L68" class="LineNr"> 68 </span> <span class="Delimiter">{</span> <span id="L69" class="LineNr"> 69 </span> compare num-trailers, <span class="Constant">0</span> <span id="L70" class="LineNr"> 70 </span> <span class="PreProc">break-if-<=</span> <span id="L71" class="LineNr"> 71 </span> <span class="PreProc">var</span> tmp/<span class="Constant">esi</span>: int <span class="SpecialChar"><-</span> copy c <span id="L72" class="LineNr"> 72 </span> tmp <span class="SpecialChar"><-</span> and <span class="Constant">0x3f</span> <span id="L73" class="LineNr"> 73 </span> tmp <span class="SpecialChar"><-</span> or <span class="Constant">0x80</span> <span id="L74" class="LineNr"> 74 </span> result <span class="SpecialChar"><-</span> shift-left <span class="Constant">8</span> <span id="L75" class="LineNr"> 75 </span> result <span class="SpecialChar"><-</span> or tmp <span id="L76" class="LineNr"> 76 </span> <span class="muComment"># update loop state</span> <span id="L77" class="LineNr"> 77 </span> c <span 
class="SpecialChar"><-</span> shift-right <span class="Constant">6</span> <span id="L78" class="LineNr"> 78 </span> num-trailers <span class="SpecialChar"><-</span> decrement <span id="L79" class="LineNr"> 79 </span> <span class="PreProc">loop</span> <span id="L80" class="LineNr"> 80 </span> <span class="Delimiter">}</span> <span id="L81" class="LineNr"> 81 </span> <span class="muComment"># emit engine</span> <span id="L82" class="LineNr"> 82 </span> result <span class="SpecialChar"><-</span> shift-left <span class="Constant">8</span> <span id="L83" class="LineNr"> 83 </span> result <span class="SpecialChar"><-</span> or c <span id="L84" class="LineNr"> 84 </span> result <span class="SpecialChar"><-</span> or first <span id="L85" class="LineNr"> 85 </span> <span class="muComment">#</span> <span id="L86" class="LineNr"> 86 </span> <span class="PreProc">return</span> result <span id="L87" class="LineNr"> 87 </span><span class="Delimiter">}</span> <span id="L88" class="LineNr"> 88 </span> <span id="L89" class="LineNr"> 89 </span><span class="muComment"># TODO: bring in tests once we have check-ints-equal</span> <span id="L90" class="LineNr"> 90 </span> <span id="L91" class="LineNr"> 91 </span><span class="muComment"># read the next grapheme from a stream of bytes</span> <span id="L92" class="LineNr"> 92 </span><span class="PreProc">fn</span> <span class="muFunction"><a href='403unicode.mu.html#L92'>read-grapheme</a></span> in: (addr stream byte)<span class="PreProc"> -> </span>_/<span class="Constant">eax</span>: grapheme <span class="Delimiter">{</span> <span id="L93" class="LineNr"> 93 </span> <span class="muComment"># if at eof, return EOF</span> <span id="L94" class="LineNr"> 94 </span> <span class="Delimiter">{</span> <span id="L95" class="LineNr"> 95 </span> <span class="PreProc">var</span> eof?/<span class="Constant">eax</span>: boolean <span class="SpecialChar"><-</span> <a href='309stream.subx.html#L6'>stream-empty?</a> in <span id="L96" class="LineNr"> 96 
</span> compare eof?, <span class="Constant">0</span>/false <span id="L97" class="LineNr"> 97 </span> <span class="PreProc">break-if-=</span> <span id="L98" class="LineNr"> 98 </span> <span class="PreProc">return</span> <span class="Constant">0xffffffff</span> <span id="L99" class="LineNr"> 99 </span> <span class="Delimiter">}</span> <span id="L100" class="LineNr">100 </span> <span class="PreProc">var</span> c/<span class="Constant">eax</span>: byte <span class="SpecialChar"><-</span> <a href='112read-byte.subx.html#L13'>read-byte</a> in <span id="L101" class="LineNr">101 </span> <span class="PreProc">var</span> num-trailers/<span class="Constant">ecx</span>: int <span class="SpecialChar"><-</span> copy <span class="Constant">0</span> <span id="L102" class="LineNr">102 </span> $read-grapheme:compute-length: <span class="Delimiter">{</span> <span id="L103" class="LineNr">103 </span> <span class="muComment"># single byte: just return it</span> <span id="L104" class="LineNr">104 </span> compare c, <span class="Constant">0xc0</span> <span id="L105" class="LineNr">105 </span> <span class="Delimiter">{</span> <span id="L106" class="LineNr">106 </span> <span class="PreProc">break-if->=</span> <span id="L107" class="LineNr">107 </span> <span class="PreProc">var</span> g/<span class="Constant">eax</span>: grapheme <span class="SpecialChar"><-</span> copy c <span id="L108" class="LineNr">108 </span> <span class="PreProc">return</span> g <span id="L109" class="LineNr">109 </span> <span class="Delimiter">}</span> <span id="L110" class="LineNr">110 </span> compare c, <span class="Constant">0xfe</span> <span id="L111" class="LineNr">111 </span> <span class="Delimiter">{</span> <span id="L112" class="LineNr">112 </span> <span class="PreProc">break-if-<</span> <span id="L113" class="LineNr">113 </span> <span class="PreProc">var</span> g/<span class="Constant">eax</span>: grapheme <span class="SpecialChar"><-</span> copy c <span id="L114" class="LineNr">114 </span> <span 
class="PreProc">return</span> g <span id="L115" class="LineNr">115 </span> <span class="Delimiter">}</span> <span id="L116" class="LineNr">116 </span> <span class="muComment"># 2 bytes</span> <span id="L117" class="LineNr">117 </span> compare c, <span class="Constant">0xe0</span> <span id="L118" class="LineNr">118 </span> <span class="Delimiter">{</span> <span id="L119" class="LineNr">119 </span> <span class="PreProc">break-if->=</span> <span id="L120" class="LineNr">120 </span> num-trailers <span class="SpecialChar"><-</span> copy <span class="Constant">1</span> <span id="L121" class="LineNr">121 </span> <span class="PreProc">break</span> $read-grapheme:compute-length <span id="L122" class="LineNr">122 </span> <span class="Delimiter">}</span> <span id="L123" class="LineNr">123 </span> <span class="muComment"># 3 bytes</span> <span id="L124" class="LineNr">124 </span> compare c, <span class="Constant">0xf0</span> <span id="L125" class="LineNr">125 </span> <span class="Delimiter">{</span> <span id="L126" class="LineNr">126 </span> <span class="PreProc">break-if->=</span> <span id="L127" class="LineNr">127 </span> num-trailers <span class="SpecialChar"><-</span> copy <span class="Constant">2</span> <span id="L128" class="LineNr">128 </span> <span class="PreProc">break</span> $read-grapheme:compute-length <span id="L129" class="LineNr">129 </span> <span class="Delimiter">}</span> <span id="L130" class="LineNr">130 </span> <span class="muComment"># 4 bytes</span> <span id="L131" class="LineNr">131 </span> compare c, <span class="Constant">0xf8</span> <span id="L132" class="LineNr">132 </span> <span class="Delimiter">{</span> <span id="L133" class="LineNr">133 </span> <span class="PreProc">break-if->=</span> <span id="L134" class="LineNr">134 </span> num-trailers <span class="SpecialChar"><-</span> copy <span class="Constant">3</span> <span id="L135" class="LineNr">135 </span> <span class="PreProc">break</span> $read-grapheme:compute-length <span id="L136" 
class="LineNr">136 </span> <span class="Delimiter">}</span> <span id="L137" class="LineNr">137 </span> <span class="muComment"># TODO: print error message</span> <span id="L138" class="LineNr">138 </span> <span class="PreProc">return</span> <span class="Constant">0</span> <span id="L139" class="LineNr">139 </span> <span class="Delimiter">}</span> <span id="L140" class="LineNr">140 </span> <span class="muComment"># prepend trailer bytes</span> <span id="L141" class="LineNr">141 </span> <span class="PreProc">var</span> result/<span class="Constant">edi</span>: grapheme <span class="SpecialChar"><-</span> copy c <span id="L142" class="LineNr">142 </span> <span class="PreProc">var</span> num-byte-shifts/<span class="Constant">edx</span>: int <span class="SpecialChar"><-</span> copy <span class="Constant">1</span> <span id="L143" class="LineNr">143 </span> <span class="Delimiter">{</span> <span id="L144" class="LineNr">144 </span> compare num-trailers, <span class="Constant">0</span> <span id="L145" class="LineNr">145 </span> <span class="PreProc">break-if-<=</span> <span id="L146" class="LineNr">146 </span> <span class="PreProc">var</span> tmp/<span class="Constant">eax</span>: byte <span class="SpecialChar"><-</span> <a href='112read-byte.subx.html#L13'>read-byte</a> in <span id="L147" class="LineNr">147 </span> <span class="PreProc">var</span> tmp2/<span class="Constant">eax</span>: int <span class="SpecialChar"><-</span> copy tmp <span id="L148" class="LineNr">148 </span> tmp2 <span class="SpecialChar"><-</span> <a href='403unicode.mu.html#L159'>shift-left-bytes</a> tmp2, num-byte-shifts <span id="L149" class="LineNr">149 </span> result <span class="SpecialChar"><-</span> or tmp2 <span id="L150" class="LineNr">150 </span> <span class="muComment"># update loop state</span> <span id="L151" class="LineNr">151 </span> num-byte-shifts <span class="SpecialChar"><-</span> increment <span id="L152" class="LineNr">152 </span> num-trailers <span class="SpecialChar"><-</span> 
decrement <span id="L153" class="LineNr">153 </span> <span class="PreProc">loop</span> <span id="L154" class="LineNr">154 </span> <span class="Delimiter">}</span> <span id="L155" class="LineNr">155 </span> <span class="PreProc">return</span> result <span id="L156" class="LineNr">156 </span><span class="Delimiter">}</span> <span id="L157" class="LineNr">157 </span> <span id="L158" class="LineNr">158 </span><span class="muComment"># needed because available primitives only shift by a literal/constant number of bits</span> <span id="L159" class="LineNr">159 </span><span class="PreProc">fn</span> <span class="muFunction"><a href='403unicode.mu.html#L159'>shift-left-bytes</a></span> n: int, k: int<span class="PreProc"> -> </span>_/<span class="Constant">eax</span>: int <span class="Delimiter">{</span> <span id="L160" class="LineNr">160 </span> <span class="PreProc">var</span> i/<span class="Constant">ecx</span>: int <span class="SpecialChar"><-</span> copy <span class="Constant">0</span> <span id="L161" class="LineNr">161 </span> <span class="PreProc">var</span> result/<span class="Constant">eax</span>: int <span class="SpecialChar"><-</span> copy n <span id="L162" class="LineNr">162 </span> <span class="Delimiter">{</span> <span id="L163" class="LineNr">163 </span> compare i, k <span id="L164" class="LineNr">164 </span> <span class="PreProc">break-if->=</span> <span id="L165" class="LineNr">165 </span> compare i, <span class="Constant">4</span> <span class="muComment"># only 4 bytes in 32 bits</span> <span id="L166" class="LineNr">166 </span> <span class="PreProc">break-if->=</span> <span id="L167" class="LineNr">167 </span> result <span class="SpecialChar"><-</span> shift-left <span class="Constant">8</span> <span id="L168" class="LineNr">168 </span> i <span class="SpecialChar"><-</span> increment <span id="L169" class="LineNr">169 </span> <span class="PreProc">loop</span> <span id="L170" class="LineNr">170 </span> <span class="Delimiter">}</span> <span id="L171" 
class="LineNr">171 </span> <span class="PreProc">return</span> result <span id="L172" class="LineNr">172 </span><span class="Delimiter">}</span> <span id="L173" class="LineNr">173 </span> <span id="L174" class="LineNr">174 </span><span class="muComment"># write a grapheme to a stream of bytes</span> <span id="L175" class="LineNr">175 </span><span class="muComment"># this is like write-to-stream, except we skip leading 0 bytes</span> <span id="L176" class="LineNr">176 </span><span class="PreProc">fn</span> <span class="muFunction"><a href='403unicode.mu.html#L176'>write-grapheme</a></span> out: (addr stream byte), g: grapheme <span class="Delimiter">{</span> <span id="L177" class="LineNr">177 </span>$write-grapheme:body: <span class="Delimiter">{</span> <span id="L178" class="LineNr">178 </span> <span class="PreProc">var</span> c/<span class="Constant">eax</span>: int <span class="SpecialChar"><-</span> copy g <span id="L179" class="LineNr">179 </span> <a href='115write-byte.subx.html#L12'>append-byte</a> out, c <span class="muComment"># first byte is always written</span> <span id="L180" class="LineNr">180 </span> c <span class="SpecialChar"><-</span> shift-right <span class="Constant">8</span> <span id="L181" class="LineNr">181 </span> compare c, <span class="Constant">0</span> <span id="L182" class="LineNr">182 </span> <span class="PreProc">break-if-=</span> $write-grapheme:body <span id="L183" class="LineNr">183 </span> <a href='115write-byte.subx.html#L12'>append-byte</a> out, c <span id="L184" class="LineNr">184 </span> c <span class="SpecialChar"><-</span> shift-right <span class="Constant">8</span> <span id="L185" class="LineNr">185 </span> compare c, <span class="Constant">0</span> <span id="L186" class="LineNr">186 </span> <span class="PreProc">break-if-=</span> $write-grapheme:body <span id="L187" class="LineNr">187 </span> <a href='115write-byte.subx.html#L12'>append-byte</a> out, c <span id="L188" class="LineNr">188 </span> c <span 
class="SpecialChar"><-</span> shift-right <span class="Constant">8</span> <span id="L189" class="LineNr">189 </span> compare c, <span class="Constant">0</span> <span id="L190" class="LineNr">190 </span> <span class="PreProc">break-if-=</span> $write-grapheme:body <span id="L191" class="LineNr">191 </span> <a href='115write-byte.subx.html#L12'>append-byte</a> out, c <span id="L192" class="LineNr">192 </span><span class="Delimiter">}</span> <span id="L193" class="LineNr">193 </span><span class="Delimiter">}</span> </pre> </body> </html> <!-- vim: set foldmethod=manual : -->