path: root/lib/pure/collections/sets.nim
Commit message  Author  Age  Files  Lines
* Add `hashWangYi1` (#13823)  c-blake  2020-04-15  1  -3/+2

    * Unwind just the "pseudorandom probing" (whole hash-code-keyed variable stride double hashing) part of recent sets & tables changes, which has still been causing bugs over a month later (e.g., two days ago, https://github.com/nim-lang/Nim/issues/13794) and still has several "figure this out" implementation question comments in it (see just the diffs of this PR).

      This topic has been discussed in many places:
        https://github.com/nim-lang/Nim/issues/13393
        https://github.com/nim-lang/Nim/pull/13418
        https://github.com/nim-lang/Nim/pull/13440
        https://github.com/nim-lang/Nim/issues/13794

      Alternative/non-mandatory stronger integer hashes (or vice-versa opt-in identity hashes) are a better solution that is more general (no illusion of one hard-coded sequence solving all problems) while retaining the virtues of linear probing such as cache obliviousness and age-less tables under delete-heavy workloads (still untested after a month of this change).

      The only real solution for truly adversarial keys is a hash keyed off of data unobservable to attackers. That all fits better with a few families of user-pluggable/define-switchable hashes which can be provided in a separate PR more about `hashes.nim`.

      This PR carefully preserves the better (but still hard coded!) probing of the `intsets` and other recent fixes like `move` annotations, hash order invariant tests, `intsets.missingOrExcl` fixing, and the move of `rightSize` into `hashcommon.nim`.

    * Fix `data.len` -> `dataLen` problem.

    * This is an alternate resolution to https://github.com/nim-lang/Nim/issues/13393 (which arguably could be resolved outside the stdlib).

      Add version1 of Wang Yi's hash specialized to 8 byte integers. This gives simple help to users having trouble with overly colliding hash(key)s. I.e., A) `import hashes; proc hash(x: myInt): Hash = hashWangYi1(int(x))` in the instantiation context of a `HashSet` or `Table` or B) more globally, compile with `nim c -d:hashWangYi1`.

      No hash can be all things to all use cases, but this one is A) vetted to scramble well by the SMHasher test suite (a necessarily limited but far more thorough test than prior proposals here), B) only a few ALU ops on many common CPUs, and C) possesses an easy fallback, via "grade school multi-digit multiplication", for weaker deployment contexts.

      Some people might want to stampede ahead unbridled, but my view is that a good plan is to A) include this in the stdlib for a release or three to let people try it on various key sets nim-core could realistically never access/test (maybe mentioning it in the changelog so people actually try it out), B) have them report problems (if any), C) if all seems good, make the stdlib more novice friendly by adding `hashIdentity(x)=x` and changing the default `hash() = hashWangYi1` with some `when defined` rearranging so users can `-d:hashIdentity` if they want the old behavior back.

      This plan is compatible with any number of competing integer hashes if people want to add them. I would strongly recommend they all *at least* pass the SMHasher suite since the idea here is to become more friendly to novices who do not generally understand hashing failure modes.

    * Re-organize to work around `when nimvm` limitations; Add some tests; Add a changelog.md entry.

    * Add less than 64-bit CPU when fork.

    * Fix decl instead of call typo.

    * First attempt at fixing range error on 32-bit platforms; Still do the arithmetic in doubled up 64-bit, but truncate the hash to the lower 32-bits, but then still return `uint64` to be the same. So, type correct but truncated hash value. Update `thashes.nim` as well.

    * A second try at making 32-bit mode CI work.

    * Use a more systematic identifier convention than Wang Yi's code.

    * Fix test that was wrong for as long as `toHashSet` used `rightSize` (a very long time, I think). `$a`/`$b` depend on iteration order which varies with table range reduced hash order which varies with range for some `hash()`. With 3 elements, 3!=6 is small and we've just gotten lucky with past experimental `hash()` changes. An alternate fix here would be to not stringify but use the HashSet operators, but it is not clear that doesn't alter the "spirit" of the test.

    * Fix another stringified test depending upon hash order.

    * Oops - revert the string-keyed test.

    * Fix another stringify test depending on hash order.

    * Add a better than always zero `defined(js)` branch.

    * It turns out to be easy to just work all in `BigInt` inside JS and thus guarantee the same low order bits of output hashes (for `isSafeInteger` input numbers). Since `hashWangYi1` output bits are equally random in all their bits, this means that tables will be safely scrambled for table sizes up to 2**32 or 4 gigaentries which is probably fine, as long as the integer keys are all < 2**53 (also likely fine). (I'm unsure why the infidelity with C/C++ back ends cut off is 32, not 53 bits.)

      Since HashSet & Table only use the low order bits, a quick corollary of this is that `$` on most int-keyed sets/tables will be the same in all the various back ends which seems a nice-to-have trait.

    * These string hash tests fail for me locally. Maybe this is what causes the CI hang for testament pcat collections?

    * Oops. That failure was from me manually patching string hash in hashes. Revert.

    * Import more test improvements from https://github.com/nim-lang/Nim/pull/13410

    * Fix bug where I swapped order when reverting the test. Ack.

    * Oh, just accept either order like more and more hash tests.

    * Iterate in the same order.

    * `return` inside `emit` made us skip `popFrame` causing weird troubles.

    * Oops - do Windows branch also.

    * `nimV1hash` -> multiply-mnemonic, type-scoped `nimIntHash1` (mnemonic resolutions are "1 == identity", 1 for Nim Version 1, 1 for first/simplest/fastest in a series of possibilities. Should be very easy to remember.)

    * Re-organize `when nimvm` logic to be a strict `when`-`else`.

    * Merge other changes.

    * Lift constants to a common area.

    * Fall back to identity hash when `BigInt` is unavailable.

    * Increase timeout slightly (probably just real-time perturbation of CI system performance).
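
Option A from the message above (a per-type `hash` overload visible where the `HashSet` is instantiated) might look roughly like the following. This is only a sketch: `MyInt` and its keys are hypothetical, and it assumes a Nim version whose `hashes` module exports `hashWangYi1` as this commit describes.

```nim
import hashes, sets

type
  MyInt = distinct int   # hypothetical key type, for illustration only

# A distinct type needs its own equality before it can be a set key.
proc `==`(a, b: MyInt): bool {.borrow.}

# Route hashing of MyInt through hashWangYi1 rather than the default
# integer hash, following the pattern quoted in the commit message.
proc hash(x: MyInt): Hash = hashWangYi1(int(x))

var s = initHashSet[MyInt]()
for i in 0 ..< 8:
  s.incl MyInt(i * 1024)   # keys that all share the same low-order bits

doAssert s.len == 8
doAssert MyInt(2048) in s
doAssert MyInt(3) notin s
```

The overload only affects containers keyed by `MyInt`; other integer-keyed sets and tables in the same program keep whatever default `hash` the stdlib provides.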
* Unwind just the "pseudorandom probing" part of recent sets,tables changes (#13816)  c-blake  2020-03-31  1  -8/+6

    * Unwind just the "pseudorandom probing" (whole hash-code-keyed variable stride double hashing) part of recent sets & tables changes, which has still been causing bugs over a month later (e.g., two days ago, https://github.com/nim-lang/Nim/issues/13794) and still has several "figure this out" implementation question comments in it (see just the diffs of this PR).

      This topic has been discussed in many places:
        https://github.com/nim-lang/Nim/issues/13393
        https://github.com/nim-lang/Nim/pull/13418
        https://github.com/nim-lang/Nim/pull/13440
        https://github.com/nim-lang/Nim/issues/13794

      Alternative/non-mandatory stronger integer hashes (or vice-versa opt-in identity hashes) are a better solution that is more general (no illusion of one hard-coded sequence solving all problems) while retaining the virtues of linear probing such as cache obliviousness and age-less tables under delete-heavy workloads (still untested after a month of this change).

      The only real solution for truly adversarial keys is a hash keyed off of data unobservable to attackers. That all fits better with a few families of user-pluggable/define-switchable hashes which can be provided in a separate PR more about `hashes.nim`.

      This PR carefully preserves the better (but still hard coded!) probing of the `intsets` and other recent fixes like `move` annotations, hash order invariant tests, `intsets.missingOrExcl` fixing, and the move of `rightSize` into `hashcommon.nim`.

    * Fix `data.len` -> `dataLen` problem.
* fixes hash(HashSet) which was wrong as it didn't respect tombstones; refs #13649  Araq  2020-03-18  1  -1/+2
* [backport] pseudorandom probing for hash collision (#13418)  Timothee Cour  2020-02-19  1  -21/+19
* fix several typos in documentation and comments (#12553)  Nindaleth  2019-10-30  1  -1/+1
* Fix word wrapping  Jjp137  2019-10-22  1  -4/+4
* Fix many broken links  Jjp137  2019-10-22  1  -2/+2

    Note that contrary to what docgen.rst currently says, the ids have to match exactly or else most web browsers will not jump to the intended symbol.
* fixes #11764, faster hashing of (u)int (#12407)  Miran  2019-10-15  1  -2/+2
* [other] prettify collections (#11695)  Miran  2019-07-09  1  -7/+7
* [refactoring] refactor the compiler and stdlib to deprecation warnings (#11419)  Arne Döring  2019-06-11  1  -2/+2
* Render deprecated pragmas (#8886)  LemonBoy  2019-06-03  1  -7/+2

    * Render deprecated pragmas
    * fix the expected html
    * clean up the documentation regarding deprecations
    * fix typo
    * fix system.nim
    * fix random
* sets: minor documentation fixes [ci skip] (#11377)  Jjp137  2019-06-02  1  -3/+3

    Mainly replace a backslash, which was supposed to represent set difference, with the Unicode symbol for set difference (U+2216). The backslash did not appear in the output since it is used to escape characters in reST.

    Fix a few typos as well.
* Initialized collections (#11094)  Miran  2019-04-29  1  -313/+160

    * tables: initialized by default
    * sets: initialized by default
    * DRY: extract shared functionality
    * add a changelog entry
    * fix errors
    * don't test include files
    * make it work for sharedtables
    * fix discovered bugs
    * add exhaustive tests
* make sets.nim useful for selective 'from import's  Araq  2019-04-05  1  -57/+54
* stdlib: use system.default if it exists  Andreas Rumpf  2019-03-05  1  -9/+6
* use `initHashSet` and `toHashSet`, fixes #10730 (#10736)  Miran  2019-02-25  1  -86/+94
* Better docs for sets and intsets (#10362)  Miran  2019-01-22  1  -534/+779

    * better docs: sets
    * better docs: intsets
* Remove long deprecated stuff (#10332)  Miran  2019-01-18  1  -7/+0
* make the stdlib work with the changed docgen  Araq  2019-01-11  1  -1/+1
* removes deprecated T/P types  Araq  2018-11-16  1  -4/+0
* Fix OrderedSet.excl (#9287)  Oscar Nihlgård  2018-10-11  1  -34/+29
* make more things compile without isNil  Araq  2018-08-22  1  -1/+1
* even more strict isNil handling for strings/seqs in order to detect bugs  Araq  2018-08-22  1  -1/+1
* add sets.pop procedure (analogue to python) (#8383)  skilchen  2018-07-21  1  -0/+12
* Modify hash for HashSet to use `xor` to mix hash of items.  Lolo Iccl  2018-05-09  1  -5/+2
* Modify previous commit and add tests  Lolo Iccl  2018-05-09  1  -2/+5
* Modify previous commit  Lolo Iccl  2018-05-09  1  -4/+8

    Modify previous commit to use data[h].hcode in proc hash for HashSet and for OrderedSet.

* Add proc hash for HashSet and for OrderedSet  Lolo Iccl  2018-05-09  1  -0/+10

    close #7772
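
The `xor` mixing introduced in the series above matters because a set's hash should not depend on iteration order. A minimal sketch of that idea follows; `xorMix` is a hypothetical helper written for illustration, not the stdlib implementation, and the final two assertions assume the stdlib `hash` for `HashSet` combines item hashes in an order-independent way as these commits describe.

```nim
import hashes, sets

# Hypothetical helper, for illustration only: combine per-item hashes with
# `xor`. Because `xor` is commutative and associative, the combined value
# is the same no matter in which order the items are visited.
proc xorMix(items: openArray[int]): Hash =
  for x in items:
    result = result xor hash(x)
  result = !$result   # finalize the combined value

doAssert xorMix([1, 2, 3]) == xorMix([3, 1, 2])

# Two equal sets built in different insertion orders should hash alike.
let a = toHashSet([1, 2, 3])
let b = toHashSet([3, 2, 1])
doAssert a == b
doAssert hash(a) == hash(b)
```

An order-sensitive combiner (for example, chaining `!&` over the items) would make the result depend on slot layout, which is the failure mode an order-independent mix avoids.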
* Fix documentation link for set type (#7465)  Roman Ovseitsev  2018-04-03  1  -1/+1
* Improved collection-to-string behavior (#6825)  Fabian Keller  2017-12-14  1  -1/+1
* fix ordered set equality (#6791)  andri lim  2017-11-24  1  -5/+21
* Sets enhancements, fixes #2467 (#6158)  GULPF  2017-09-20  1  -4/+94
* Add counterpart to containsOrIncl for excl (#6360)  superfunc  2017-09-15  1  -11/+29
* Added clear() function for OrderedSet and HashSet. (#5545)  GrundleTrundle  2017-03-16  1  -0/+25
* More workarounds for #5098  Yuriy Glukhov  2016-12-07  1  -1/+3
* expr and stmt are now deprecated  Andreas Rumpf  2016-07-30  1  -2/+2
* stdlib and compiler don't use .immediate anymore  Andreas Rumpf  2016-07-29  1  -2/+2
* Update sets examples so they work again.  Matthew Baulch  2016-07-06  1  -3/+3
* attempt to fix a critical memory leak in Nim's collections  Andreas Rumpf  2016-06-15  1  -0/+4
* Removed unused import of 'os' module from module 'sets'  Rostyslav Dzinko  2016-03-04  1  -1/+1
* Don't expect all keys in hashsets to have $ defined  Samantha Doran  2016-03-01  1  -1/+5
* nimrod -> nim  Erik Johansson Andersson  2016-02-05  1  -1/+1
* updated the compiler and tester to use getOrDefault  Araq  2015-10-13  1  -2/+2
* Merge branch 'mget' of https://github.com/def-/Nim into def--mget  Araq  2015-10-13  1  -1/+8
|\
| |   Conflicts:
| |     lib/pure/collections/critbits.nim
| |     lib/pure/collections/tables.nim
| |     lib/pure/xmltree.nim
| |     lib/system/sets.nim
| |     tests/collections/ttables.nim
| |     tests/collections/ttablesref.nim
| * Rename mget to `[]`  def  2015-03-31  1  -1/+8
| |
| |   - In sets, tables, strtabs, critbits, xmltree
| |   - This uses the new var parameter overloading
| |   - mget variants still exist, but are deprecated in favor of `[]`
| |   - Includes tests and fixed tests and usages of mget
| |   - The non-var `[]` now throws an exception instead of returning binary 0 or an empty string
* | lib/pure/a-c - Dropped 'T' from types  pdw  2015-06-04  1  -25/+25
* | Don't run non-test code when defined(testing)  Oleh Prypin  2015-04-21  1  -1/+2
* | Use more Natural and Positive numbers in proc parameters  def  2015-04-06  1  -1/+1
| |
| |   - Didn't go through all modules, only the main ones I thought of
| |   - Building the compiler and tests still work
* | Fix warning about sets.testModule() not used.  ReneSac  2015-04-04  1  -174/+175
|/
* assignment -> shallowCopy for efficiency.  Charles Blake  2015-02-13  1  -1/+1