Commit message | Author | Age | Files | Lines | ||
---|---|---|---|---|---|---|
... | ||||||
* | Add string interning support | bptato | 2023-12-27 | 12 | -484/+694 | |
| | | | | WIP | |||||
* | htmltokenizer: refactor EOF handling | bptato | 2023-12-26 | 1 | -267/+303 | |
| | | | | Now it is done outside of the main loop. | |||||
* | tests/tokenizer: remove unused param | bptato | 2023-12-26 | 1 | -5/+5 | |
| | ||||||
* | Re-add chakasu for tests | bptato | 2023-12-26 | 1 | -0/+2 | |
| | | | | doesn't work otherwise | |||||
* | Remove chakasu from nimble file | bptato | 2023-12-20 | 1 | -1/+0 | |
| | | | | not a mandatory dependency anymore | |||||
* | add missing tests/shared folder | bptato | 2023-12-20 | 1 | -0/+262 | |
| | | | | Needed for tree tests, but gitignore blocked it. | |||||
* | htmlparser: fix bug in reconstructActiveFormatting | bptato | 2023-12-20 | 1 | -1/+1 | |
| | | | | this needs isNone, not isSome | |||||
* | Separate out character encoding support from htmlparser | bptato | 2023-12-20 | 14 | -730/+584 | |
| | | | | | | | This removes Chakasu as a hard dependency. Now users of this library must either implement encoding support themselves, or use minidom_cs (which still depends on Chakasu). | |||||
* | htmlparser: add getDocument getter to DOMBuilder | bptato | 2023-12-20 | 2 | -17/+29 | |
| | | | | | | This replaces DOMBuilder.document with a getter function, mainly for consistency and flexibility. (Also, it removes the need to convert DOMBuilder.document back into a document node after parsing has finished.) | |||||
* | htmlparser: add callbacks for rewinding the input stream | bptato | 2023-12-18 | 1 | -5/+27 | |
| | ||||||
* | htmlparser: remove superfluous setPosition calls | bptato | 2023-12-18 | 1 | -2/+0 | |
| | | | | | These were made disregarding canReinterpret and would crash any parseHTML call with a stream that cannot be re-interpreted. | |||||
* | Fix typos | bptato | 2023-12-05 | 1 | -6/+6 | |
| | ||||||
* | htmlparser: take Option[Handle] for `before' in insertText | bptato | 2023-12-04 | 2 | -4/+5 | |
| | | | | had to be fixed too | |||||
* | Update readme | bptato | 2023-12-03 | 1 | -7/+11 | |
| | ||||||
* | htmlparser: take Option[Handle] for `before' in insertBefore | bptato | 2023-12-03 | 2 | -14/+15 | |
| | | | | | Passing `nil' there was an unfortunate mistake that requires an API breakage to fix. | |||||
* | Version 0.13.0 | bptato | 2023-12-03 | 2 | -2/+2 | |
| | ||||||
* | tests/tree: add tests 4-8 | bptato | 2023-12-03 | 1 | -17/+73 | |
| | ||||||
* | Various fixes & improvements in all modules | bptato | 2023-12-03 | 3 | -58/+136 | |
| | | | | | | | | | | | | | | | | | | minidom: add fragment parsing algorithms; document parseHTML. htmlparser: fix table body/in caption being mixed up in resetInsertionMode; fix frameset-ok not being initialized to true; fix opts.ctx not being used; naively parse tags in `match' instead of using the tokenizer. htmltokenizer: remove special-cased compile-time tokenizer mode; change sbuf to an array (from a seq), and store length in a separate variable instead of constantly resizing it; do not check for eof in emit_current (it never occurs). | |||||
* | entity: use pre-generated file | bptato | 2023-11-20 | 4 | -13/+1087 | |
| | | | | | Nim's JSON parser is slow, in nimvm even more so. Use a pre-generated entity_gen.nim file instead. | |||||
* | tests/tree: add tests2, tests3 | bptato | 2023-11-19 | 1 | -3/+9 | |
| | ||||||
* | minidom: fix insertText if before is first in parent | bptato | 2023-11-19 | 1 | -1/+5 | |
| | ||||||
* | htmlparser, tests: make tests1.dat run without errors | bptato | 2023-11-19 | 3 | -99/+147 | |
| | | | | | | | Fix several bugs in adoptionAgencyAlgorithm, and factor out several "find index" operations; fix some frameset and table col related bugs; minidom: simplify moveChildren, assert on adding children with an existing parent. | |||||
* | tests/tree: fix comment handling, log data | bptato | 2023-11-18 | 1 | -26/+16 | |
| | ||||||
* | htmltokenizer: format | bptato | 2023-11-18 | 1 | -2/+2 | |
| | ||||||
* | htmlparser: adoption agency algorithm fixes | bptato | 2023-11-18 | 1 | -13/+20 | |
| | | | | | Fix misunderstanding: the stack grows *downwards*. Add some comments. | |||||
* | tests: incomplete support for tree builder tests | bptato | 2023-11-18 | 1 | -0/+275 | |
| | ||||||
* | Update chakasu | bptato | 2023-11-18 | 1 | -1/+1 | |
| | ||||||
* | tokenizer: move flush_chars into a proc | bptato | 2023-10-27 | 1 | -28/+28 | |
| | ||||||
* | Add null character token type | bptato | 2023-10-27 | 3 | -47/+42 | |
| | | | | So that we do not have to replace it in the parser. | |||||
* | Version 0.12.0 | bptato | 2023-10-23 | 2 | -3/+3 | |
| | ||||||
* | Add pushInTemplate for fragment parsing | bptato | 2023-10-23 | 1 | -0/+5 | |
| | ||||||
* | Reduce nil usage for Handles | bptato | 2023-10-23 | 1 | -9/+13 | |
| | | | | Still not nil-free, because insertBefore & insertText need nil. | |||||
* | htmlparser: add openElementsInit, formInit to opts | bptato | 2023-10-23 | 1 | -1/+12 | |
| | | | | | Makes it possible to set an initial value for openElements and the form pointer, as required by the HTML fragment parsing algorithm. | |||||
* | parser: add initial tokenizer state option; tokenizer: allow any kind of stream | bptato | 2023-10-22 | 4 | -27/+61 | |
| | | | | | Use this to enable the unicodeCharsProblematic test, by importing runestream. | |||||
* | update chakasu | bptato | 2023-10-22 | 1 | -1/+1 | |
| | ||||||
* | Version 0.11.2 | bptato | 2023-09-30 | 2 | -2/+2 | |
| | ||||||
* | Fix potential OOB seq access in peek_char | bptato | 2023-09-30 | 1 | -1/+2 | |
| | | | | | Call consume() so that the buffer is filled if we are not at EOF yet (through checkBufLen). | |||||
* | tolower -> toLowerAscii | bptato | 2023-09-24 | 1 | -1/+1 | |
| | ||||||
* | twtstr: remove unused functions | bptato | 2023-09-24 | 1 | -307/+0 | |
| | ||||||
* | Version 0.11.1 | bptato | 2023-09-24 | 2 | -2/+2 | |
| | ||||||
* | remove unused functions | bptato | 2023-09-24 | 1 | -8/+1 | |
| | ||||||
* | update chakasu | bptato | 2023-09-24 | 1 | -1/+1 | |
| | ||||||
* | Version 0.11.0 | bptato | 2023-09-19 | 2 | -3/+3 | |
| | ||||||
* | tags: clean up | bptato | 2023-09-19 | 1 | -72/+1 | |
| | | | | | | InputType and ButtonType have nothing to do with the parser. Neither do many of the categories included in the module; these have been removed too. (Many of these are remnants of the previous HTML parser.) | |||||
* | Version 0.10.1 | bptato | 2023-09-14 | 2 | -2/+2 | |
| | ||||||
* | htmlparser: add whitespace handling to text & in table states | bptato | 2023-09-14 | 1 | -2/+2 | |
| | | | | a rather problematic omission | |||||
* | Version 0.10.0 | bptato | 2023-09-14 | 2 | -3/+3 | |
| | ||||||
* | htmlparser: check for moveChildren not being nil | bptato | 2023-09-14 | 1 | -0/+1 | |
| | ||||||
* | Update chakasu | bptato | 2023-09-14 | 2 | -2/+3 | |
| | ||||||
* | tests: disable unicodeCharsProblematic | bptato | 2023-09-03 | 1 | -2/+10 | |
| | | | | This really just won't work with what we have right now. |