Commit message  Author  Age  Files  Lines
...
| * Get rid of tagNameEquals, reduce getLocalName use  bptato  2023-12-30  2  -22/+12
| |   Compare token tag names where we can get away with it, since that's faster.
| * Remove unnecessary getNamespace calls  bptato  2023-12-30  1  -4/+2
| |   getTagType already checks for it
| * Pass all tree construction tests  bptato  2023-12-30  6  -191/+383
| |   Many bugfixes:
| |   * getTagType now always returns TAG_UNKNOWN for non-HTML namespaced elements
| |   * Fix doctype public identifiers being compared case-sensitively
| |   * Fix adoption agency algorithm iteration count (again :D)
| |   * Add <font color> etc. to foreign content accepted element list
| |   * Fix pushInTemplate parser option
| |   * Fix SVG/MathML tags being used in table scope
| |   * Use table scope where appropriate (IN_TABLE)
| |   * minidom: fix parseHTMLFragment for non-HTML namespaces
| |   * minidom: fix UTF-8 validator/converter/whatever
| |   * Also, fix some test case parsing bugs/omissions (so they actually run :P)
| |   * Update readme
| * tests21  bptato  2023-12-28  3  -15/+17
| |   * Fix CDATA section bracket state bug
| |   * Fix peekStr bug
| |   * Simplify peekStrNoCase
| |   * Replace toUpperAscii calls with toLowerAscii
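A case-insensitive peek of the kind this commit touches can be sketched as follows. This is a minimal Python illustration, not the library's Nim code: `to_lower_ascii` and `peek_str_nocase` are hypothetical names, and the sketch assumes the literal being matched is already lowercase, so only the input needs folding.

```python
def to_lower_ascii(c: str) -> str:
    """Lowercase a single character, touching only ASCII A-Z."""
    if "A" <= c <= "Z":
        return chr(ord(c) + 32)
    return c


def peek_str_nocase(buf: str, pos: int, literal: str) -> bool:
    """Return True if buf at pos matches literal, ignoring ASCII case.

    Assumes `literal` is already lowercase. Nothing is consumed.
    """
    if pos + len(literal) > len(buf):
        return False  # not enough input left to match
    return all(
        to_lower_ascii(buf[pos + i]) == literal[i]
        for i in range(len(literal))
    )
```

Folding only ASCII letters keeps non-ASCII input untouched, which is what HTML's "ASCII case-insensitive" matching asks for.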
| * tests 18 .. 20  bptato  2023-12-28  5  -23/+61
| |   * Implement template stuff in minidom
| |   * Foreign content fixes
| |   * Fix </tbody> in "in row" switching to "in body" instead of "in table body"
| * Fix tests 10 .. 17, add todo  bptato  2023-12-28  5  -48/+139
| |
| * htmltokenizer: refactor peek_str, peek_str_nocase  bptato  2023-12-28  1  -41/+33
| |   Now they are functions. Also, slightly reduce the number of nested templates.
| * htmltokenizer: use static assertions  bptato  2023-12-28  1  -8/+11
| |
| * Update readme  bptato  2023-12-28  1  -6/+4
| |
| * htmltokenizer: null -> \0  bptato  2023-12-28  1  -28/+26
| |   aesthetics
| * Comment out function overload for testing  bptato  2023-12-28  1  -0/+3
| |
| * Fixes for test9.dat  bptato  2023-12-28  9  -175/+419
| |   Now it runs without errors.
| * Add string interning for attribute names  bptato  2023-12-27  9  -132/+249
| |
| * Add string interning support  bptato  2023-12-27  12  -484/+694
| |   WIP
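For readers unfamiliar with string interning, the general idea is sketched below in Python. This is an illustration only, not the library's Nim implementation: the `Interner` class and its method names are hypothetical. Each distinct string gets a small integer id once, so later equality checks (for tag or attribute names, say) become integer comparisons.

```python
class Interner:
    """Maps each distinct string to a small integer id."""

    def __init__(self) -> None:
        self._ids = {}      # string -> id
        self._strings = []  # id -> string

    def intern(self, s: str) -> int:
        """Return the id for s, allocating a new one on first sight."""
        existing = self._ids.get(s)
        if existing is not None:
            return existing
        new_id = len(self._strings)
        self._ids[s] = new_id
        self._strings.append(s)
        return new_id

    def lookup(self, ident: int) -> str:
        """Recover the original string from its id."""
        return self._strings[ident]
```

The hash lookup is paid once per occurrence at interning time. Every comparison after that is a cheap integer check.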
| * htmltokenizer: refactor EOF handling  bptato  2023-12-26  1  -267/+303
| |   Now it is done outside of the main loop.
| * tests/tokenizer: remove unused param  bptato  2023-12-26  1  -5/+5
| |
| * Re-add chakasu for tests  bptato  2023-12-26  1  -0/+2
| |   doesn't work otherwise
| * Remove chakasu from nimble file  bptato  2023-12-20  1  -1/+0
| |   not a mandatory dependency anymore
| * add missing tests/shared folder  bptato  2023-12-20  1  -0/+262
| |   Needed for tree tests, but gitignore blocked it.
| * htmlparser: fix bug in reconstructActiveFormatting  bptato  2023-12-20  1  -1/+1
| |   this needs isNone, not isSome
| * Separate out character encoding support from htmlparser  bptato  2023-12-20  14  -730/+584
| |   This removes Chakasu as a hard dependency. Now users of this library must either implement encoding support themselves, or use minidom_cs (which still depends on Chakasu).
| * htmlparser: add getDocument getter to DOMBuilder  bptato  2023-12-20  2  -17/+29
| |   This replaces DOMBuilder.document with a getter function, mainly for consistency and flexibility. (Also, it removes the need to convert back DOMBuilder.document into a document node after parsing has finished.)
| * htmlparser: add callbacks for rewinding the input stream  bptato  2023-12-18  1  -5/+27
| |
| * htmlparser: remove superfluous setPosition calls  bptato  2023-12-18  1  -2/+0
| |   These were made disregarding canReinterpret and would crash any parseHTML call with a stream that cannot be re-interpreted.
| * Fix typos  bptato  2023-12-05  1  -6/+6
| |
| * htmlparser: take Option[Handle] for `before' in insertText  bptato  2023-12-04  2  -4/+5
| |   had to be fixed too
| * Update readme  bptato  2023-12-03  1  -7/+11
| |
| * htmlparser: take Option[Handle] for `before' in insertBefore  bptato  2023-12-03  2  -14/+15
| |   Passing `nil' there was an unfortunate mistake that requires an API breakage to fix.
| * Version 0.13.0  bptato  2023-12-03  2  -2/+2
| |
| * tests/tree: add tests 4-8  bptato  2023-12-03  1  -17/+73
| |
| * Various fixes & improvements in all modules  bptato  2023-12-03  3  -58/+136
| |   minidom:
| |   * add fragment parsing algorithms
| |   * document parseHTML
| |   htmlparser:
| |   * fix table body/in caption being mixed up in resetInsertionMode
| |   * fix frameset-ok not being initialized to true
| |   * fix opts.ctx not being used
| |   * naively parse tags in `match' instead of using the tokenizer
| |   htmltokenizer:
| |   * remove special-cased compile-time tokenizer mode
| |   * change sbuf to an array (from a seq), and store length in a separate variable instead of constantly resizing it
| |   * do not check for eof in emit_current (it never occurs)
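The sbuf change described above (fixed array plus a separate length, rather than a growable sequence that is resized on every use) follows a common pattern, sketched here in Python. This is an assumption-laden illustration: `FixedBuffer`, its capacity, and its method names are hypothetical and not taken from the tokenizer.

```python
class FixedBuffer:
    """Fixed-capacity byte buffer that tracks its own logical length."""

    def __init__(self, capacity: int) -> None:
        self._data = bytearray(capacity)  # allocated once, never resized
        self._len = 0                     # number of valid bytes

    def push(self, byte: int) -> None:
        """Append one byte; fails loudly instead of growing."""
        if self._len >= len(self._data):
            raise OverflowError("buffer full")
        self._data[self._len] = byte
        self._len += 1

    def clear(self) -> None:
        """Reset in O(1); stale bytes are simply overwritten later."""
        self._len = 0

    def view(self) -> bytes:
        """Copy out only the valid prefix."""
        return bytes(self._data[:self._len])
```

Clearing the buffer only resets a counter, so reusing it for the next token involves no allocation.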
| * entity: use pre-generated file  bptato  2023-11-20  4  -13/+1087
| |   Nim's JSON parser is slow, in nimvm even more so. Use a pre-generated entity_gen.nim file instead.
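The pre-generation approach can be sketched as a small script that reads the entity JSON once and writes out a source file containing a literal table, so nothing has to parse JSON at build time. The Python sketch below is illustrative only: the project's actual generator and the layout of entity_gen.nim are not shown in this log, `generate_entity_module` is a hypothetical name, and the input shape (entity name mapped to an object with a `characters` field) is an assumption modeled on the WHATWG entities.json file. For simplicity it emits Python source rather than Nim.

```python
import json

# Tiny stand-in for the real entity list (assumed shape, two real entities).
SAMPLE = (
    '{"&amp;": {"codepoints": [38], "characters": "&"},'
    ' "&lt;": {"codepoints": [60], "characters": "<"}}'
)


def generate_entity_module(entities_json: str) -> str:
    """Turn the entity JSON into source text holding a literal lookup table."""
    entities = json.loads(entities_json)
    lines = ["# Generated file; do not edit by hand.", "ENTITY_MAP = {"]
    for name in sorted(entities):
        chars = entities[name]["characters"]
        # repr() produces a correctly escaped string literal for each value.
        lines.append(f"    {name!r}: {chars!r},")
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The generated text is checked into the repository and regenerated only when the entity list changes, which moves the parsing cost out of every build.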
| * tests/tree: add tests2, tests3  bptato  2023-11-19  1  -3/+9
| |
| * minidom: fix insertText if before is first in parent  bptato  2023-11-19  1  -1/+5
| |
| * htmlparser, tests: make tests1.dat run without errors  bptato  2023-11-19  3  -99/+147
| |   * Fix several bugs in adoptionAgencyAlgorithm, and factor out several "find index" operations
| |   * Fix some frameset, table col related bugs
| |   * minidom: simplify moveChildren, assert on adding children with an existing parent
| * tests/tree: fix comment handling, log data  bptato  2023-11-18  1  -26/+16
| |
| * htmltokenizer: format  bptato  2023-11-18  1  -2/+2
| |
| * htmlparser: adoption agency algorithm fixes  bptato  2023-11-18  1  -13/+20
| |   * Fix misunderstanding: the stack grows *downwards*.
| |   * Add some comments
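The "stack grows downwards" point is easy to get backwards, so here is a small Python sketch of what it means for one step of the adoption agency algorithm. It follows the HTML standard's wording (the furthest block is the topmost node in the stack of open elements that is lower in the stack than the formatting element and is in the special category). It is not the parser's code: `furthest_block` and the `is_special` callback are hypothetical names.

```python
def furthest_block(open_elements, formatting_index, is_special):
    """Index of the topmost special element below formatting_index, or None.

    open_elements[0] is the top of the stack (the first element pushed);
    open_elements[-1] is the bottom (the current node). "Lower in the
    stack" therefore means a larger index.
    """
    for i in range(formatting_index + 1, len(open_elements)):
        if is_special(open_elements[i]):
            return i  # first hit scanning downwards is the topmost one
    return None
```

With the stack `["html", "body", "b", "p", "div"]` and the formatting element `b` at index 2, the scan runs towards the end of the list and stops at `p`, not `div`. Scanning in the opposite direction is exactly the kind of mix-up the commit message describes.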
| * tests: incomplete support for tree builder tests  bptato  2023-11-18  1  -0/+275
| |
| * Update chakasu  bptato  2023-11-18  1  -1/+1
| |
| * tokenizer: move flush_chars into a proc  bptato  2023-10-27  1  -28/+28
| |
| * Add null character token type  bptato  2023-10-27  3  -47/+42
| |   So that we do not have to replace it in the parser.
| * Version 0.12.0  bptato  2023-10-23  2  -3/+3
| |
| * Add pushInTemplate for fragment parsing  bptato  2023-10-23  1  -0/+5
| |
| * Reduce nil usage for Handles  bptato  2023-10-23  1  -9/+13
| |   Still not nil-free, because insertBefore & insertText need nil.
| * htmlparser: add openElementsInit, formInit to opts  bptato  2023-10-23  1  -1/+12
| |   Makes it possible to set an initial value for openElements and the form pointer, as required by the HTML fragment parsing algorithm.
| * parser: add initial tokenizer state option; tokenizer: allow any kind of stream  bptato  2023-10-22  4  -27/+61
| |   Use this to enable the unicodeCharsProblematic test, by importing runestream.
| * update chakasu  bptato  2023-10-22  1  -1/+1
| |
| * Version 0.11.2  bptato  2023-09-30  2  -2/+2
| |
| * Fix potential OOB seq access in peek_char  bptato  2023-09-30  1  -1/+2
| |   Call consume() so that the buffer is filled if we are not at EOF yet (through checkBufLen).
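The class of bug fixed here (indexing a buffer that has been fully read but not yet refilled) can be shown with a simplified Python sketch. It is a variant, not the tokenizer's code: the commit makes peek_char go through consume(), whereas this sketch has the peek call the refill check directly. `BufferedPeeker` and its members are hypothetical names.

```python
import io


class BufferedPeeker:
    """Reads a text stream in chunks and supports one-character peeking."""

    def __init__(self, stream, chunk_size=4):
        self._stream = stream
        self._chunk_size = chunk_size
        self._buf = ""
        self._pos = 0
        self._eof = False

    def _check_buf_len(self):
        """Refill the buffer if it is exhausted and the stream has more."""
        if self._pos >= len(self._buf) and not self._eof:
            chunk = self._stream.read(self._chunk_size)
            if chunk == "":
                self._eof = True
            else:
                self._buf = chunk
                self._pos = 0

    def peek_char(self):
        """Return the next character without consuming it, or None at EOF."""
        self._check_buf_len()  # without this, buf[pos] may be out of bounds
        if self._pos >= len(self._buf):
            return None
        return self._buf[self._pos]

    def consume(self):
        """Return the next character and advance, or None at EOF."""
        c = self.peek_char()
        if c is not None:
            self._pos += 1
        return c


reader = BufferedPeeker(io.StringIO("abcdE"), chunk_size=4)
```

The interesting moment is the peek right after the fourth character: the position equals the buffer length, the stream is not at EOF, and only the refill check keeps the index in bounds.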