| Commit message | Author | Age | Files | Lines |
whoops
|
We already have the callback, so this is less confusing than having
to special-case it in consumer code.
|
entityMap was a global variable initialized at runtime; plus it wasn't
even efficient because radixtree used a heap-allocated tree, Nim
strings as keys, a closure for searching, etc.
We could have just brute forced the problem using a hash table, but
a) we would need to store hashes + dummy entries for prefixes; that's
a waste of bytes, and
b) with std tables at least, we would need to re-hash for every char
consumed.
Instead, we just use what used to be called entityTable (renamed to
entityMap), and pseudo-linear search it:
* Use a jump table to find the starting entry's index
* Go back in entityMap until either there are no more entries with the
desired character or a matching entry is found.
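The two steps above can be sketched roughly as follows. This is a Python illustration, not the library's Nim code: the tiny ENTITY_MAP, the JUMP dict, and find_entity are all hypothetical names, and the sketch assumes the table is sorted by name and that the jump table points at the last entry for each first character. Under that assumption, scanning backwards meets longer names before their shorter prefixes, so the first hit is the longest match.

```python
# Hypothetical, tiny stand-in for the real entity table:
# (name, replacement) pairs sorted by name.
ENTITY_MAP = sorted([
    ("amp", "&"), ("amp;", "&"),
    ("gt", ">"), ("gt;", ">"),
    ("lt", "<"), ("lt;", "<"),
    ("not", "\u00ac"), ("not;", "\u00ac"), ("notin;", "\u2209"),
])

# Jump table (assumption): first character -> index of the LAST entry
# whose name starts with that character.
JUMP = {}
for i, (name, _) in enumerate(ENTITY_MAP):
    JUMP[name[0]] = i


def find_entity(s):
    """Return (name, replacement) for the longest entity name that is a
    prefix of s, or None if no entry matches."""
    if not s or s[0] not in JUMP:
        return None
    i = JUMP[s[0]]
    # Walk backwards while entries still start with the desired character.
    while i >= 0 and ENTITY_MAP[i][0][0] == s[0]:
        name, value = ENTITY_MAP[i]
        if s.startswith(name):
            return name, value
        i -= 1
    return None
```

No hashing is involved here: the lookup costs one table index plus a short linear walk over the entries that share the first character.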
|
* NodeType is somewhat convenient, but adds a 1-word overhead to
each node and makes object construction more error-prone. If needed,
library users can still add it without us defining the enum.
* Now that we have atoms, the tagType function is useless.
* SpecialElements is only used in the specification's parser section,
and is not even complete because it should also contain non-HTML
tags. Moved to htmlparser.
* AllTagTypes can be expressed in a simpler way.
|
Just export the types needed for htmlparser to work.
Also, create the initial Tokens inside htmlparser, so it does not
become part of the interface.
|
Null values for when consumer code needs them. (NAMESPACE_UNKNOWN in
particular could be returned from getNamespaceImpl for user-defined
namespaces.)
|
Simplifies createElementForToken; this way, we no longer have to pass
an Option[Handle] just for a single special case.
While we're at it, also remove some dead generic functions.
|
Intended parent must be passed to createElement, or it becomes
impossible to get intended parent's node document.
|
* Removing getParentNode seems to be too much effort that would yield
dubious benefits (if any) in a corner case.
* As far as I can tell, custom elements can be implemented using the existing
set of hooks in htmlparser, so no extra work in Chame is necessary.
|
If we UTF8 validate, then let's go all the way.
Doing this in atomToStr would be horrible, because then different
atoms could potentially be represented by the same string.
To avoid this problem, we validate in strToAtom instead. Since
htmltokenizer/htmlparser does not keep record of strings used to
produce atoms, and never compares the string representation of atoms,
this is safe to do. (In fact, no atom stringifier is required or used
by the parser at all.)
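A minimal sketch of why validating at interning time keeps the mapping one-to-one. This is Python, not the library's code: AtomFactory, str_to_atom and atom_to_str are illustrative stand-ins, and replacing invalid sequences with U+FFFD is an assumption about what "validate" means here.

```python
class AtomFactory:
    """Hypothetical atom interner that sanitizes input before interning."""

    def __init__(self):
        self._atoms = {}    # sanitized string -> atom id
        self._strings = []  # atom id -> sanitized string

    def str_to_atom(self, raw: bytes) -> int:
        # Validate up front (assumption: invalid UTF-8 becomes U+FFFD),
        # so two inputs that sanitize to the same string share one atom.
        s = raw.decode("utf-8", errors="replace")
        atom = self._atoms.get(s)
        if atom is None:
            atom = len(self._strings)
            self._atoms[s] = atom
            self._strings.append(s)
        return atom

    def atom_to_str(self, atom: int) -> str:
        # No validation needed here: stored strings are already valid.
        return self._strings[atom]
```

If validation happened in atom_to_str instead, b"a\xff" and b"a\xfe" would be interned as two different atoms that both stringify to the same text, which is the problem described above.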
|
Now we use `when compiles' for statically checking if implementations
exist for these.
minidom still uses dynamic dispatch for setEncodingImpl so that
minidom_cs can override minidom's implementation.
|
The "kind of functional implementation" angle has already been dropped
by making getTemplateContentImpl non-optional, so it makes no sense
to provide a broken default implementation for addAttrsIfMissingImpl.
|
This way, we will be able to replace the method hack with static
dispatch. (Not done yet, because I have to think of a way to provide
interface definitions for optional procs. Maybe just a comment will
suffice...)
|
* Remove func from procs that call external code
* Remove unused findLastActiveFormatting overload
|
* Use a table in tokenizer for attributes
* Pass a separate xmlAttrs seq for adjusted XML attributes
This way, deduplication is easier, and atoms only need a hash function
instead of a cmp function. Also, the number of copies is reduced in
the most common case (which is elements with html-only attributes).
Since the HTML standard does not specify the ordering of attributes,
it seems wiser to just output the table we use for de-duplication and
leave any further sorting to the consumer than to implement it inside
the library.
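As a rough illustration of the de-duplication point, here is a Python sketch (not the tokenizer's actual code). Atoms are modeled as plain integers and add_attr is a hypothetical helper; the only property it relies on is that keys are hashable. Keeping the first occurrence of a duplicated attribute follows the HTML standard, which drops later duplicates.

```python
def add_attr(attrs, name_atom, value):
    """Insert an attribute unless one with the same name already exists.

    Returns True if the attribute was added, False if it was a duplicate.
    """
    if name_atom in attrs:
        return False  # duplicate: the first occurrence wins
    attrs[name_atom] = value
    return True


# Hypothetical atoms for "href" and "class".
HREF, CLASS = 1, 2

attrs = {}
add_attr(attrs, HREF, "/a")
add_attr(attrs, CLASS, "x")
add_attr(attrs, HREF, "/b")  # ignored
```

With a table, the duplicate check is a single hashed lookup; a seq-based approach would need either a linear scan or a sort, and the latter is what requires a cmp function.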
|
* No namespace is correct.
* Queue a microtask can be implemented in elementPoppedImpl, because
the stack of open elements is not visible to scripts.
* We can just change the <image> tag name and reprocess, no problem
* Update todo file
|
* Only check for start tag attributes where a start tag with attributes
could be emitted
* Do not unnecessarily hash tag names for is_appropriate_end_tag_token
* Unify code paths for the numeric char ref end tag state
|
We still store characters retrieved for peek in a separate buffer;
otherwise, the getChar routine is used (which is provided by
DOMBuilder).
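The split between the peek buffer and getChar might look something like this. It is a Python sketch with hypothetical names (CharReader, get_char, peek_str, consume); the real routine is provided by DOMBuilder and its signature is not shown here.

```python
from collections import deque


class CharReader:
    """Reads from a get_char callback, buffering only peeked characters."""

    def __init__(self, get_char):
        # Hypothetical callback: returns the next character, or None at EOF.
        self.get_char = get_char
        self.peek_buf = deque()

    def consume(self):
        # Characters fetched by an earlier peek are returned first;
        # otherwise we go straight to get_char.
        if self.peek_buf:
            return self.peek_buf.popleft()
        return self.get_char()

    def peek_str(self, expected):
        # Pull in just enough characters to compare, without consuming them.
        while len(self.peek_buf) < len(expected):
            c = self.get_char()
            if c is None:
                break
            self.peek_buf.append(c)
        return "".join(list(self.peek_buf)[:len(expected)]) == expected


source = iter("<!DOCTYPE")
reader = CharReader(lambda: next(source, None))
```

In the common case where nothing is peeked, no buffering happens at all and every character comes directly from the callback.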
|
Now we have a somewhat hacky solution for defining optional procs.
Still WIP...
|
* Use mixins for mandatory functions (through htmlparseriface)
* Get rid of AtomFactory; instead, pass DOMBuilder to tokenizer
and specify atomToStr etc. on DOMBuilder
* Call atomToTagType with dombuilder as a param
|
Instead, pretend that one had been emitted after the main constructTree
loop. (This way we can get rid of an unnecessary test.)
|
Compare token tag names where we can get away with it, since that's
faster.
|
getTagType already checks for it
|
Many bugfixes:
* getTagType now always returns TAG_UNKNOWN for non-HTML namespaced
elements
* Fix doctype public identifiers being compared case-sensitively
* Fix adoption agency algorithm iteration count (again :D)
* Add <font color> etc. to foreign content accepted element list
* Fix pushInTemplate parser option
* Fix SVG/MathML tags being used in table scope
* Use table scope where appropriate (IN_TABLE)
* minidom: fix parseHTMLFragment for non-HTML namespaces
* minidom: fix UTF-8 validator/converter/whatever
* Also, fix some test case parsing bugs/omissions (so they actually
run :P)
* Update readme
|
* Fix CDATA section bracket state bug
* Fix peekStr bug
* Simplify peekStrNoCase
* Replace toUpperAscii calls with toLowerAscii
|
* Implement template stuff in minidom
* Foreign content fixes
* Fix </tbody> in "in row" switching to "in body" instead of
"in table body"
|
Now they are functions.
Also, slightly reduce the number of nested templates.
|
aesthetics
|
Now it runs without errors.
|
WIP
|
Now it is done outside of the main loop.
|
doesn't work otherwise
|
not a mandatory dependency anymore
|
Needed for tree tests, but gitignore blocked it.
|
this needs isNone, not isSome
|
This removes Chakasu as a hard dependency.
Now users of this library must either implement encoding support
themselves, or use minidom_cs (which still depends on Chakasu).
|
This replaces DOMBuilder.document with a getter function, mainly
for consistency and flexibility. (Also, it removes the need to
convert DOMBuilder.document back into a document node after parsing
has finished.)
|
These were made disregarding canReinterpret and would crash any
parseHTML call with a stream that cannot be re-interpreted.
|