path: root/src/html/htmlparser.nim
author bptato <nincsnevem662@gmail.com> 2023-07-14 18:51:15 +0200
committer bptato <nincsnevem662@gmail.com> 2023-07-14 18:51:38 +0200
commit 96fceba76a218799d49652befab8b99289b07133
tree fa597339a887cf2bdbc40772add6b2dec49318e2 /src/html/htmlparser.nim
parent 20aac96068c9b4e41d2571132fcf6558ec2eb487
htmlparser: correct outdated comment
Diffstat (limited to 'src/html/htmlparser.nim')
-rw-r--r-- src/html/htmlparser.nim | 12
1 files changed, 3 insertions, 9 deletions
diff --git a/src/html/htmlparser.nim b/src/html/htmlparser.nim
index 202d3860..41bcf66b 100644
--- a/src/html/htmlparser.nim
+++ b/src/html/htmlparser.nim
@@ -87,25 +87,19 @@ type
     ##   the stack. (e.g. say `charsets = @[CHARSET_UTF_16_LE, CHARSET_UTF_8]`,
     ##   then utf-16-le is tried before utf-8.)
     ## * BOM sniffing is attempted. If successful, confidence is set to
-    ##   certain and the resulting charset is pushed on top of the charset
-    ##   stack. (Continuing the previous example: if BOM sniffing determines
-    ##   the character encoding to be UTF-8, then utf-8 will be tried before
-    ##   utf-16-le.)
+    ##   certain and the resulting charset is used (i.e. other character
+    ##   sets will not be tried for decoding this document.)
     ## * If the charset stack is empty, UTF-8 is pushed on top.
     ## * Attempt to parse the document with the first charset on top of
     ##   the stack.
     ## * If BOM sniffing was unsuccessful, and a <meta charset=...> tag
     ##   is encountered, parsing is restarted with the specified charset.
-    ##   No further attempts are be made to detect the encoding, and decoder
+    ##   No further attempts are made to detect the encoding, and decoder
     ##   errors are signaled by U+FFFD replacement characters.
     ## * Otherwise, each charset on the charset stack is tried until either no
     ##   decoding errors are encountered, or only one charset is left. For
     ##   the last charset, decoder errors are signaled by U+FFFD replacement
     ##   characters.
-    ## TODO: changing the charset after a successful BOM sniffing probably
-    ## makes no sense whatsoever, as almost all supported encodings are
-    ## ASCII-compatible (and would thus error out on leading high bytes
-    ## anyways).
     ctx*: Option[Handle]
     ## Context element for fragment parsing. When set to some Handle,
     ## the fragment case is used while parsing.
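
The charset selection order described in the updated comment can be sketched roughly as follows. This is an illustration only, not the code from htmlparser.nim: the `Charset` enum is a local stand-in that reuses the two names from the comment's example, and `Confidence`, `DecodePlan`, `sniffBom`, `planDecoding` and `replacesErrors` are hypothetical names. The `<meta charset=...>` restart is left out.

```nim
# Illustrative sketch; every name here is a stand-in, not Chawan's API.
import std/options
import std/strutils

type
  Charset = enum
    CHARSET_UTF_8, CHARSET_UTF_16_LE, CHARSET_UTF_16_BE
  Confidence = enum
    ccTentative, ccCertain
  DecodePlan = tuple[order: seq[Charset], confidence: Confidence]

proc sniffBom(data: string): Option[Charset] =
  ## Return the charset indicated by a leading byte order mark, if any.
  if data.startsWith("\xEF\xBB\xBF"):
    return some(CHARSET_UTF_8)
  if data.startsWith("\xFF\xFE"):
    return some(CHARSET_UTF_16_LE)
  if data.startsWith("\xFE\xFF"):
    return some(CHARSET_UTF_16_BE)
  return none(Charset)

proc planDecoding(data: string, charsets: seq[Charset]): DecodePlan =
  ## Decide which charsets to try, in order, and with what confidence.
  let bom = sniffBom(data)
  if bom.isSome:
    # Successful BOM sniffing: the charset is certain, nothing else is tried.
    return (@[bom.get], ccCertain)
  if charsets.len == 0:
    # Empty charset stack: fall back to UTF-8.
    return (@[CHARSET_UTF_8], ccTentative)
  # Otherwise the caller's charsets are tried first to last.
  return (charsets, ccTentative)

proc replacesErrors(plan: DecodePlan, i: int): bool =
  ## Only the last remaining charset signals decoder errors with U+FFFD;
  ## earlier candidates are abandoned on the first decoding error instead.
  i == plan.order.high

when isMainModule:
  let plan = planDecoding("\xEF\xBB\xBFhello",
                          @[CHARSET_UTF_16_LE, CHARSET_UTF_8])
  echo plan.order, " ", plan.confidence
```

After a successful BOM sniff the plan holds exactly one charset, so that charset is also the last one and its decoder errors become replacement characters. That matches the comment's statement that no other character sets are tried.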