Merge: #20647: Update dictobject.c comments to account for randomized string hashes.

R David Murray 2016-07-10 12:40:03 -04:00
commit ce85acff3a
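
Reader's note (not part of the commit): the comment touched by this diff argues that taking the low-order i bits of a hash as the initial index into a table of size 2**i gives no collisions for a contiguous range of ints. A minimal Python sketch of that claim follows. The helper name initial_slots is hypothetical, and the masking only illustrates the initial index described in the comment, not the full probing logic in dictobject.c.

```python
def initial_slots(keys, i):
    """Return the initial table index of each key for a table of size 2**i."""
    mask = (1 << i) - 1  # keeps only the low-order i bits of the hash
    return [hash(k) & mask for k in keys]


# Small non-negative ints hash to themselves, so a contiguous range
# lands in distinct slots.
int_slots = initial_slots(range(8), 3)
print(int_slots)

# String hashes are randomized per process (unless PYTHONHASHSEED is fixed),
# so these slots differ between runs and may collide.
str_slots = initial_slots(["namea", "nameb", "namec", "named"], 3)
print(str_slots)
```

The second print is deliberately left without an expected value, since it depends on the per-process hash seed.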

@@ -88,20 +88,17 @@ it's USABLE_FRACTION (currently two-thirds) full.
 /*
 Major subtleties ahead: Most hash schemes depend on having a "good" hash
 function, in the sense of simulating randomness. Python doesn't: its most
-important hash functions (for strings and ints) are very regular in common
+important hash functions (for ints) are very regular in common
 cases:
 
->>> map(hash, (0, 1, 2, 3))
+>>>[hash(i) for i in range(4)]
 [0, 1, 2, 3]
->>> map(hash, ("namea", "nameb", "namec", "named"))
-[-1658398457, -1658398460, -1658398459, -1658398462]
->>>
 
 This isn't necessarily bad! To the contrary, in a table of size 2**i, taking
 the low-order i bits as the initial table index is extremely fast, and there
-are no collisions at all for dicts indexed by a contiguous range of ints.
-The same is approximately true when keys are "consecutive" strings. So this
-gives better-than-random behavior in common cases, and that's very desirable.
+are no collisions at all for dicts indexed by a contiguous range of ints. So
+this gives better-than-random behavior in common cases, and that's very
+desirable.
 
 OTOH, when collisions occur, the tendency to fill contiguous slices of the
 hash table makes a good collision resolution strategy crucial. Taking only