author | bptato <nincsnevem662@gmail.com> | 2023-07-14 18:51:15 +0200 |
---|---|---|
committer | bptato <nincsnevem662@gmail.com> | 2023-07-14 18:51:38 +0200 |
commit | 96fceba76a218799d49652befab8b99289b07133 (patch) | |
tree | fa597339a887cf2bdbc40772add6b2dec49318e2 /src/html | |
parent | 20aac96068c9b4e41d2571132fcf6558ec2eb487 (diff) | |
download | chawan-96fceba76a218799d49652befab8b99289b07133.tar.gz |
htmlparser: correct outdated comment
Diffstat (limited to 'src/html')
-rw-r--r-- | src/html/htmlparser.nim | 12 |
1 files changed, 3 insertions, 9 deletions
diff --git a/src/html/htmlparser.nim b/src/html/htmlparser.nim
index 202d3860..41bcf66b 100644
--- a/src/html/htmlparser.nim
+++ b/src/html/htmlparser.nim
@@ -87,25 +87,19 @@ type
  ## the stack. (e.g. say `charsets = @[CHARSET_UTF_16_LE, CHARSET_UTF_8]`,
  ## then utf-16-le is tried before utf-8.)
  ## * BOM sniffing is attempted. If successful, confidence is set to
- ## certain and the resulting charset is pushed on top of the charset
- ## stack. (Continuing the previous example: if BOM sniffing determines
- ## the character encoding to be UTF-8, then utf-8 will be tried before
- ## utf-16-le.)
+ ## certain and the resulting charset is used (i.e. other character
+ ## sets will not be tried for decoding this document.)
  ## * If the charset stack is empty, UTF-8 is pushed on top.
  ## * Attempt to parse the document with the first charset on top of
  ## the stack.
  ## * If BOM sniffing was unsuccessful, and a <meta charset=...> tag
  ## is encountered, parsing is restarted with the specified charset.
- ## No further attempts are be made to detect the encoding, and decoder
+ ## No further attempts are made to detect the encoding, and decoder
  ## errors are signaled by U+FFFD replacement characters.
  ## * Otherwise, each charset on the charset stack is tried until either no
  ## decoding errors are encountered, or only one charset is left. For
  ## the last charset, decoder errors are signaled by U+FFFD replacement
  ## characters.
- ## TODO: changing the charset after a successful BOM sniffing probably
- ## makes no sense whatsoever, as almost all supported encodings are
- ## ASCII-compatible (and would thus error out on leading high bytes
- ## anyways).
  ctx*: Option[Handle]
  ## Context element for fragment parsing. When set to some Handle,
  ## the fragment case is used while parsing.
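The charset selection order described by the updated comment can be sketched roughly as follows. This is an illustration in Python, not Chawan's Nim implementation: `decode_document`, `BOMS` and `META_RE` are hypothetical names, only three BOMs are assumed, and the `<meta charset=...>` tag is found by scanning the raw bytes rather than during parsing, which is a simplification.

```python
import codecs
import re

# Assumption: only these three byte order marks are sniffed.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplification: look for <meta charset=...> in the raw bytes instead of
# finding it while parsing; this only works for ASCII-compatible input.
META_RE = re.compile(
    rb"""<meta\s+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def decode_document(data, charsets=()):
    """Return (text, charset_name) for the given bytes (hypothetical helper)."""
    # 1. BOM sniffing: on success the charset is certain, nothing else is tried.
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name, errors="replace"), name

    # 2. An empty charset stack falls back to UTF-8.
    stack = list(charsets) or ["utf-8"]

    # 3. A <meta charset=...> tag restarts decoding with that charset;
    #    errors become U+FFFD and no further detection happens.
    match = META_RE.search(data)
    if match:
        name = match.group(1).decode("ascii").lower()
        try:
            codecs.lookup(name)
        except LookupError:
            pass  # assumption: unknown labels are ignored
        else:
            return data.decode(name, errors="replace"), name

    # 4. Otherwise try each charset in order; only the last may emit U+FFFD.
    for name in stack[:-1]:
        try:
            return data.decode(name, errors="strict"), name
        except UnicodeDecodeError:
            continue
    last = stack[-1]
    return data.decode(last, errors="replace"), last
```

Returning early on a BOM match mirrors the point of this commit: once a BOM is found, the charset stack is never consulted.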