Commit message (Author, Age, Files, Lines)
...
* Add string interning support (bptato, 2023-12-27, 12 files, -484/+694)
|   WIP
* htmltokenizer: refactor EOF handling (bptato, 2023-12-26, 1 file, -267/+303)
|   Now it is done outside of the main loop.
* tests/tokenizer: remove unused param (bptato, 2023-12-26, 1 file, -5/+5)
|
* Re-add chakasu for tests (bptato, 2023-12-26, 1 file, -0/+2)
|   doesn't work otherwise
* Remove chakasu from nimble file (bptato, 2023-12-20, 1 file, -1/+0)
|   not a mandatory dependency anymore
* add missing tests/shared folder (bptato, 2023-12-20, 1 file, -0/+262)
|   Needed for tree tests, but gitignore blocked it.
* htmlparser: fix bug in reconstructActiveFormatting (bptato, 2023-12-20, 1 file, -1/+1)
|   this needs isNone, not isSome
* Separate out character encoding support from htmlparser (bptato, 2023-12-20, 14 files, -730/+584)
|   This removes Chakasu as a hard dependency. Now users of this library
|   must either implement encoding support themselves, or use minidom_cs
|   (which still depends on Chakasu).
* htmlparser: add getDocument getter to DOMBuilder (bptato, 2023-12-20, 2 files, -17/+29)
|   This replaces DOMBuilder.document with a getter function, mainly for
|   consistency and flexibility. (Also, it removes the need to convert
|   back DOMBuilder.document into a document node after parsing has
|   finished.)
* htmlparser: add callbacks for rewinding the input stream (bptato, 2023-12-18, 1 file, -5/+27)
|
* htmlparser: remove superfluous setPosition calls (bptato, 2023-12-18, 1 file, -2/+0)
|   These were made disregarding canReinterpret and would crash any
|   parseHTML call with a stream that cannot be re-interpreted.
* Fix typos (bptato, 2023-12-05, 1 file, -6/+6)
|
* htmlparser: take Option[Handle] for `before' in insertText (bptato, 2023-12-04, 2 files, -4/+5)
|   had to be fixed too
* Update readme (bptato, 2023-12-03, 1 file, -7/+11)
|
* htmlparser: take Option[Handle] for `before' in insertBefore (bptato, 2023-12-03, 2 files, -14/+15)
|   Passing `nil' there was an unfortunate mistake that requires an API
|   breakage to fix.
* Version 0.13.0 (bptato, 2023-12-03, 2 files, -2/+2)
|
* tests/tree: add tests 4-8 (bptato, 2023-12-03, 1 file, -17/+73)
|
* Various fixes & improvements in all modules (bptato, 2023-12-03, 3 files, -58/+136)
|   minidom:
|   * add fragment parsing algorithms
|   * document parseHTML
|   htmlparser:
|   * fix table body/in caption being mixed up in resetInsertionMode
|   * fix frameset-ok not being initialized to true
|   * fix opts.ctx not being used
|   * naively parse tags in `match' instead of using the tokenizer
|   htmltokenizer:
|   * remove special-cased compile-time tokenizer mode
|   * change sbuf to an array (from a seq), and store length in a
|     separate variable instead of constantly resizing it
|   * do not check for eof in emit_current (it never occurs)
* entity: use pre-generated file (bptato, 2023-11-20, 4 files, -13/+1087)
|   Nim's JSON parser is slow, in nimvm even more so. Use a pre-generated
|   entity_gen.nim file instead.
* tests/tree: add tests2, tests3 (bptato, 2023-11-19, 1 file, -3/+9)
|
* minidom: fix insertText if before is first in parent (bptato, 2023-11-19, 1 file, -1/+5)
|
* htmlparser, tests: make tests1.dat run without errors (bptato, 2023-11-19, 3 files, -99/+147)
|   * Fix several bugs in adoptionAgencyAlgorithm, and factor out several
|     "find index" operations
|   * Fix some frameset, table col related bugs
|   * minidom: simplify moveChildren, assert on adding children with an
|     existing parent
* tests/tree: fix comment handling, log data (bptato, 2023-11-18, 1 file, -26/+16)
|
* htmltokenizer: format (bptato, 2023-11-18, 1 file, -2/+2)
|
* htmlparser: adoption agency algorithm fixes (bptato, 2023-11-18, 1 file, -13/+20)
|   * Fix misunderstanding: the stack grows *downwards*.
|   * Add some comments
* tests: incomplete support for tree builder tests (bptato, 2023-11-18, 1 file, -0/+275)
|
* Update chakasu (bptato, 2023-11-18, 1 file, -1/+1)
|
* tokenizer: move flush_chars into a proc (bptato, 2023-10-27, 1 file, -28/+28)
|
* Add null character token type (bptato, 2023-10-27, 3 files, -47/+42)
|   So that we do not have to replace it in the parser.
* Version 0.12.0 (bptato, 2023-10-23, 2 files, -3/+3)
|
* Add pushInTemplate for fragment parsing (bptato, 2023-10-23, 1 file, -0/+5)
|
* Reduce nil usage for Handles (bptato, 2023-10-23, 1 file, -9/+13)
|   Still not nil-free, because insertBefore & insertText need nil.
* htmlparser: add openElementsInit, formInit to opts (bptato, 2023-10-23, 1 file, -1/+12)
|   Makes it possible to set an initial value for openElements and the
|   form pointer, as required by the HTML fragment parsing algorithm.
* parser: add initial tokenizer state option; tokenizer: allow any kind of stream (bptato, 2023-10-22, 4 files, -27/+61)
|   Use this to enable the unicodeCharsProblematic test, by importing
|   runestream.
* update chakasu (bptato, 2023-10-22, 1 file, -1/+1)
|
* Version 0.11.2 (bptato, 2023-09-30, 2 files, -2/+2)
|
* Fix potential OOB seq access in peek_char (bptato, 2023-09-30, 1 file, -1/+2)
|   Call consume() so that the buffer is filled if we are not at EOF yet
|   (through checkBufLen).
* tolower -> toLowerAscii (bptato, 2023-09-24, 1 file, -1/+1)
|
* twtstr: remove unused functions (bptato, 2023-09-24, 1 file, -307/+0)
|
* Version 0.11.1 (bptato, 2023-09-24, 2 files, -2/+2)
|
* remove unused functions (bptato, 2023-09-24, 1 file, -8/+1)
|
* update chakasu (bptato, 2023-09-24, 1 file, -1/+1)
|
* Version 0.11.0 (bptato, 2023-09-19, 2 files, -3/+3)
|
* tags: clean up (bptato, 2023-09-19, 1 file, -72/+1)
|   * InputType, ButtonType have nothing to do with the parser.
|   * Neither do many categories included in the module, these have been
|     removed too. (Many of these are remnants of the previous HTML
|     parser.)
* Version 0.10.1 (bptato, 2023-09-14, 2 files, -2/+2)
|
* htmlparser: add whitespace handling to text & in table states (bptato, 2023-09-14, 1 file, -2/+2)
|   a rather problematic omission
* Version 0.10.0 (bptato, 2023-09-14, 2 files, -3/+3)
|
* htmlparser: check for moveChildren not being nil (bptato, 2023-09-14, 1 file, -0/+1)
|
* Update chakasu (bptato, 2023-09-14, 2 files, -2/+3)
|
* tests: disable unicodeCharsProblematic (bptato, 2023-09-03, 1 file, -2/+10)
|   This really just won't work with what we have right now.