chawan - Chawan - a web browser for your terminal (mirror)

	Commit message (Collapse)	Author	Age	Files	Lines
...
\| *	de-iteratorize tokenizer	bptato	2024-01-21	3	-257/+287
\| \|
\| *	Fix copy-paste error attempt 2	bptato	2024-01-21	1	-1/+1
\| \|
\| *	Fix copy-paste error	bptato	2024-01-21	1	-1/+1
\| \|	whoops
\| *	htmlparser: set script already started on <script>	bptato	2024-01-21	1	-1/+2
\| \|	We already have the callback, so this is less confusing than having to special-case it in consumer code.
\| *	Get rid of radixtree	bptato	2024-01-18	7	-230/+154
\| \|	entityMap was a global variable initialized at runtime; plus it wasn't even efficient because radixtree used a heap-allocated tree, Nim strings as keys, a closure for searching, etc. We could have just brute forced the problem using a hash table, but a) we would need to store hashes + dummy entries for prefixes; that's a waste of bytes, and b) with std tables at least, we would need to re-hash for every char consumed.
\| \|	Instead, we just use what used to be called entityTable (renamed to entityMap), and pseudo-linear search it:
\| \|	* Use a jump table to find the starting entry's index
\| \|	* Go back in entityMap until either there are no more entries with the desired character or a matching entry is found.
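The lookup described in this commit can be sketched as follows. This is a minimal illustration in Python rather than the project's Nim, and it is not taken from the actual source: `ENTITY_MAP`, `build_jump_table` and `match_entity` are hypothetical names, the table holds only a handful of entries, and the sketch matches a whole string at once instead of consuming one character at a time as a tokenizer would.

```python
# Hypothetical, tiny stand-in for the entity table. It must be sorted by name.
ENTITY_MAP = [
    ("amp", "&"),
    ("amp;", "&"),
    ("gt", ">"),
    ("gt;", ">"),
    ("lt", "<"),
    ("lt;", "<"),
    ("not", "\u00ac"),
    ("not;", "\u00ac"),
    ("notin;", "\u2209"),
]


def build_jump_table(entries):
    """Map each first character to the index of the LAST entry starting with it."""
    table = {}
    for i, (name, _) in enumerate(entries):
        table[name[0]] = i  # later entries overwrite earlier ones
    return table


JUMP_TABLE = build_jump_table(ENTITY_MAP)


def match_entity(text):
    """Return (name, value) for the longest entity name that prefixes text."""
    if not text or text[0] not in JUMP_TABLE:
        return None
    i = JUMP_TABLE[text[0]]
    # Walk backwards while entries still start with the desired character.
    while i >= 0 and ENTITY_MAP[i][0][0] == text[0]:
        name, value = ENTITY_MAP[i]
        if text.startswith(name):
            return name, value
        i -= 1
    return None
```

Because the table is sorted and a shorter name always sorts before any longer name it prefixes, the first hit found while walking backwards is also the longest match, so `match_entity("notit;")` falls through `notin;` and `not;` and settles on `not`.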
\| *	tags: remove NodeType and various sets	bptato	2024-01-15	7	-123/+95
\| \|	* NodeType is somewhat convenient, but adds a 1-word overhead to each node and makes object construction more error-prone. If needed, library users can still add it without us defining the enum.
\| \|	* Now that we have atoms, the tagType function is useless.
\| \|	* SpecialElements is only used in the specification's parser section, and is not even complete because it should also contain non-HTML tags. Moved to htmlparser.
\| \|	* AllTagTypes can be expressed in a simpler way.
\| *	Avoid having to import htmltokenizer	bptato	2024-01-15	4	-45/+61
\| \|	Just export the types needed for htmlparser to work. Also, create the initial Tokens inside htmlparser, so it does not become part of the interface.
\| *	tags: add NAMESPACE_UNKNOWN, PREFIX_UNKNOWN	bptato	2024-01-14	1	-0/+2
\| \|	Null values for when consumer code needs it. (NAMESPACE_UNKNOWN in particular could be returned from getNamespaceImpl for user-defined namespaces.)
\| *	htmlparseriface: split out <html> element creation	bptato	2024-01-14	3	-37/+51
\| \|	Simplifies createElementForToken; this way, we no longer have to pass an Option[Handle] just for a single special case. While we're at it, also remove some dead generic functions.
\| *	htmlparser: make it possible to implement custom elements	bptato	2024-01-06	3	-18/+40
\| \|	Intended parent must be passed to createElement, or it becomes impossible to get intended parent's node document.
\| *	Update todo, readme	bptato	2024-01-03	2	-2/+0
\| \|	* Removing getParentNode seems to be too much effort that would yield dubious benefits (if any) in a corner case.
\| \|	* As far as I can tell, custom elements can be implemented using the existing set of hooks in htmlparser, so no extra work in Chame is necessary.
\| *	Update readme	bptato	2024-01-02	1	-3/+1
\| \|
\| *	minidom: UTF8 validate strings in strToAtom	bptato	2024-01-02	1	-2/+1
\| \|	If we UTF8 validate, then let's go all the way. Doing this in atomToStr would be horrible, because then different atoms could potentially be represented by the same string. To avoid this problem, we validate in strToAtom instead. Since htmltokenizer/htmlparser does not keep a record of strings used to produce atoms, and never compares the string representation of atoms, this is safe to do. (In fact, no atom stringifier is required or used by the parser at all.)
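A minimal sketch of why validating on the way in is the safer choice, written in Python rather than Nim. `AtomTable` and its methods are hypothetical names, not minidom's API, and replacing invalid bytes with U+FFFD is an assumption; the commit only says that strings are validated.

```python
class AtomTable:
    """Hypothetical string interner that sanitizes input before interning."""

    def __init__(self):
        self._ids = {}   # validated string -> atom id
        self._strs = []  # atom id -> validated string

    def str_to_atom(self, raw: bytes) -> int:
        # Validate first (assumption: invalid sequences become U+FFFD), so
        # the interned key is always well-formed UTF-8.
        s = raw.decode("utf-8", errors="replace")
        atom = self._ids.get(s)
        if atom is None:
            atom = len(self._strs)
            self._ids[s] = atom
            self._strs.append(s)
        return atom

    def atom_to_str(self, atom: int) -> str:
        # No validation needed here: every stored string is already valid,
        # and distinct atoms always map to distinct strings.
        return self._strs[atom]
```

With this arrangement two different invalid inputs such as `b"a\xff"` and `b"a\xfe"` collapse into one atom, whereas validating in the stringifier would have kept two atoms that print identically.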
\| *	Do not use dynamic dispatch for optional hooks	bptato	2024-01-02	8	-78/+113
\| \|	Now we use `when compiles' for statically checking if implementations exist for these. minidom still uses dynamic dispatch for setEncodingImpl so that minidom_cs can override minidom's implementation.
\| *	htmlparseriface: make addAttrsIfMissingImpl non-optional	bptato	2024-01-02	2	-18/+15
\| \|	The "kind of functional implementation" angle has already been dropped by making getTemplateContentImpl non-optional, so it makes no sense to provide a broken default implementation for addAttrsIfMissingImpl.
\| *	htmlparseriface: use converter instead of manual casting	bptato	2024-01-02	1	-72/+4
\| \|	This way, we will be able to replace the method hack with static dispatch. (Not done yet, because I have to think of a way to provide interface definitions for optional procs. Maybe just a comment will suffice...)
\| *	Reduce func use, reduce dead code	bptato	2024-01-02	1	-37/+30
\| \|	* Remove func from procs that call external code
\| \|	* Remove unused findLastActiveFormatting overload
\| *	Separate HTML attributes from adjusted XML attrs	bptato	2024-01-02	7	-177/+153
\| \|	* Use a table in tokenizer for attributes
\| \|	* Pass a separate xmlAttrs seq for adjusted XML attributes
\| \|	This way, deduplication is easier, and atoms only need a hash function instead of a cmp function. Also, the number of copies is reduced in the most common case (which is elements with html-only attributes). Since the HTML standard does not specify the ordering of attributes, it seems wiser to just output the table we use for de-duplication and leave any further sorting to the consumer than to implement it inside the library.
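The split described in this commit might look roughly like the sketch below, written in Python rather than Nim. `collect_attrs`, `split_foreign_attrs` and the two-entry `ADJUSTED` map are hypothetical; only the first-duplicate-wins rule and the `xlink:href` / `xml:lang` adjustments come from the HTML standard, and how the library actually passes the two collections around is not shown here.

```python
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical excerpt of the "adjust foreign attributes" mapping:
# attribute name -> (prefix, local name, namespace).
ADJUSTED = {
    "xlink:href": ("xlink", "href", XLINK_NS),
    "xml:lang": ("xml", "lang", XML_NS),
}


def collect_attrs(pairs):
    """De-duplicate attributes with a hash table; the first occurrence wins."""
    attrs = {}
    for name, value in pairs:
        attrs.setdefault(name, value)  # later duplicates are ignored
    return attrs


def split_foreign_attrs(attrs):
    """Separate plain HTML attributes from adjusted XML attributes."""
    html_attrs = {}
    xml_attrs = []
    for name, value in attrs.items():
        if name in ADJUSTED:
            prefix, local, ns = ADJUSTED[name]
            xml_attrs.append((prefix, ns, local, value))
        else:
            html_attrs[name] = value
    return html_attrs, xml_attrs
```

For an element with HTML-only attributes, `xml_attrs` stays empty and the de-duplicated table can be handed to the consumer as is, which is the common case the commit message refers to.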
\| *	htmlparser: use popElementsIncl more	bptato	2024-01-01	1	-4/+3
\| \|
\| *	htmlparser: get rid of some todos	bptato	2024-01-01	2	-12/+2
\| \|	* No namespace is correct.
\| \|	* Queue a microtask can be implemented in elementPoppedImpl, because the stack of open elements is not visible to scripts.
\| \|	* We can just change <image> tag name and reprocess, no problem
\| \|	* Update todo file
\| *	htmltokenizer: fix some todos	bptato	2024-01-01	1	-41/+33
\| \|	* Only check for start tag attributes where a start tag with attributes could be emitted
\| \|	* Do not unnecessarily hash tag names for is_appropriate_end_tag_token
\| \|	* Unify code paths for the numeric char ref end tag state
\| *	htmltokenizer: remove large internal buffer	bptato	2024-01-01	6	-128/+138
\| \|	We still store characters retrieved for peek in a separate buffer; otherwise, the getChar routine is used (which is provided by DOMBuilder).
\| *	New interface part 2	bptato	2023-12-31	7	-182/+178
\| \|	Now we have a somewhat hacky solution for defining optional procs. Still WIP...
\| *	New interface part 1	bptato	2023-12-30	9	-426/+455
\| \|	* Use mixins for mandatory functions (through htmlparseriface)
\| \|	* Get rid of AtomFactory; instead, pass DOMBuilder to tokenizer and specify atomToStr etc. on DOMBuilder
\| \|	* Call atomToTagType with dombuilder as a param
\| *	Do not emit EOF token in tokenizer	bptato	2023-12-30	2	-4/+4
\| \|	Instead, pretend that one had been emitted after the main constructTree loop. (This way we can get rid of an unnecessary test.)
\| *	Get rid of tagNameEquals, reduce getLocalName use	bptato	2023-12-30	2	-22/+12
\| \|	Compare token tag names where we can get away with it, since that's faster.
\| *	Remove unnecessary getNamespace calls	bptato	2023-12-30	1	-4/+2
\| \|	getTagType already checks for it
\| *	Pass all tree construction tests	bptato	2023-12-30	6	-191/+383
\| \|	Many bugfixes:
\| \|	* getTagType now always returns TAG_UNKNOWN for non-HTML namespaced elements
\| \|	* Fix doctype public identifiers being compared case-sensitively
\| \|	* Fix adoption agency algorithm iteration count (again :D)
\| \|	* Add <font color> etc. to foreign content accepted element list
\| \|	* Fix pushInTemplate parser option
\| \|	* Fix SVG/MathML tags being used in table scope
\| \|	* Use table scope where appropriate (IN_TABLE)
\| \|	* minidom: fix parseHTMLFragment for non-HTML namespaces
\| \|	* minidom: fix UTF-8 validator/converter/whatever
\| \|	* Also, fix some test case parsing bugs/omissions (so they actually run :P)
\| \|	* Update readme
\| *	tests21	bptato	2023-12-28	3	-15/+17
\| \|	* Fix CDATA section bracket state bug
\| \|	* Fix peekStr bug
\| \|	* Simplify peekStrNoCase
\| \|	* Replace toUpperAscii calls with toLowerAscii
\| *	tests 18 .. 20	bptato	2023-12-28	5	-23/+61
\| \|	* Implement template stuff in minidom
\| \|	* Foreign content fixes
\| \|	* Fix </tbody> in "in row" switching to "in body" instead of "in table body"
\| *	Fix tests 10 .. 17, add todo	bptato	2023-12-28	5	-48/+139
\| \|
\| *	htmltokenizer: refactor peek_str, peek_str_nocase	bptato	2023-12-28	1	-41/+33
\| \|	Now they are functions. Also, slightly reduce the number of nested templates.
\| *	htmltokenizer: use static assertions	bptato	2023-12-28	1	-8/+11
\| \|
\| *	Update readme	bptato	2023-12-28	1	-6/+4
\| \|
\| *	htmltokenizer: null -> \0	bptato	2023-12-28	1	-28/+26
\| \|	aesthetics
\| *	Comment out function overload for testing	bptato	2023-12-28	1	-0/+3
\| \|
\| *	Fixes for test9.dat	bptato	2023-12-28	9	-175/+419
\| \|	Now it runs without errors.
\| *	Add string interning for attribute names	bptato	2023-12-27	9	-132/+249
\| \|
\| *	Add string interning support	bptato	2023-12-27	12	-484/+694
\| \|	WIP
\| *	htmltokenizer: refactor EOF handling	bptato	2023-12-26	1	-267/+303
\| \|	Now it is done outside of the main loop.
\| *	tests/tokenizer: remove unused param	bptato	2023-12-26	1	-5/+5
\| \|
\| *	Re-add chakasu for tests	bptato	2023-12-26	1	-0/+2
\| \|	doesn't work otherwise
\| *	Remove chakasu from nimble file	bptato	2023-12-20	1	-1/+0
\| \|	not a mandatory dependency anymore
\| *	add missing tests/shared folder	bptato	2023-12-20	1	-0/+262
\| \|	Needed for tree tests, but gitignore blocked it.
\| *	htmlparser: fix bug in reconstructActiveFormatting	bptato	2023-12-20	1	-1/+1
\| \|	this needs isNone, not isSome
\| *	Separate out character encoding support from htmlparser	bptato	2023-12-20	14	-730/+584
\| \|	This removes Chakasu as a hard dependency. Now users of this library must either implement encoding support themselves, or use minidom_cs (which still depends on Chakasu).
\| *	htmlparser: add getDocument getter to DOMBuilder	bptato	2023-12-20	2	-17/+29
\| \|	This replaces DOMBuilder.document with a getter function, mainly for consistency and flexibility. (Also, it removes the need to convert DOMBuilder.document back into a document node after parsing has finished.)
\| *	htmlparser: add callbacks for rewinding the input stream	bptato	2023-12-18	1	-5/+27
\| \|
\| *	htmlparser: remove superfluous setPosition calls	bptato	2023-12-18	1	-2/+0
\| \|	These were made disregarding canReinterpret and would crash any parseHTML call with a stream that cannot be re-interpreted.
\| *	Fix typos	bptato	2023-12-05	1	-6/+6
\| \|