public interface Hash
Warning: the following comments are here for historical reasons,
and apply only to the double-hash classes that can be optionally generated.
The standard fastutil distribution since 6.1.0 uses linear-probing hash
tables, and tables are always sized as powers of two.
The classes in fastutil are built around open-addressing hashing
implemented via double hashing. Following Knuth's suggestions in the third volume of The Art of Computer
Programming, we use as the table size a prime p such that
p-2 is also prime. In this way, primary hashing is computed modulo p,
and secondary hashing modulo p-2.
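The probe sequence induced by this choice can be illustrated with a short sketch. It is not fastutil's code: the class `DoubleHashProbe` and the method `probeSequence` are hypothetical names, and the sketch assumes a non-negative hash value `h` and a table size `p` such that `p` and `p - 2` are both prime. Because the step lies in [1, p - 2] and `p` is prime, the sequence visits every slot exactly once.

```java
/**
 * Illustrative sketch only (hypothetical, not fastutil code): the probe sequence of
 * double hashing with a twin-prime table size p (p and p - 2 both prime).
 * Assumes a non-negative hash value h.
 */
final class DoubleHashProbe {
    private DoubleHashProbe() {}

    /** Returns the probe sequence i_0, ..., i_(p-1) induced by the hash h. */
    static int[] probeSequence(int h, int p) {
        int[] seq = new int[p];
        int i = h % p;               // primary hash: modulo p
        int step = h % (p - 2) + 1;  // secondary hash (modulo p - 2) plus one: in [1, p - 2]
        for (int j = 0; j < p; j++) {
            seq[j] = i;
            i = (i + step) % p;      // add the step modulo p
        }
        return seq;
    }
}
```

For example, with p = 13 (11 is also prime) and h = 20, the step is 10 and the sequence starts 7, 4, 1, 11, and covers all 13 slots.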
Entries in a table can be in three states: FREE, OCCUPIED or REMOVED.
The naive handling of removed entries requires that a search treat them as if they were occupied, continuing until it finds a FREE entry. However,
fastutil implements two useful optimizations, based on the following invariant:
Let $i_0, i_1, \ldots, i_{p-1}$ be the permutation of the table indices induced by the key k, that is, $i_0$ is the hash of k and each following index is obtained by adding (modulo p) the secondary hash plus one. (The step lies between 1 and p-2 and p is prime, so the sequence visits every index exactly once.) If there is an OCCUPIED
entry with key k, its index in the sequence above comes before the indices of any REMOVED
entries with key k.
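The invariant can be stated operationally as a check over a table. The following is a hypothetical sketch, not fastutil code: it assumes a toy table of non-negative `int` keys whose primary hash is the key itself, with local constants standing in for the FREE, OCCUPIED and REMOVED states and a table size p such that p and p - 2 are prime. The class `InvariantCheck` and method `holds` are illustrative names.

```java
/**
 * Hypothetical checker for the invariant on a toy table (not fastutil code).
 * Assumes non-negative int keys, primary hash = key, and p = key.length with
 * p and p - 2 both prime. REMOVED slots keep their old key in this toy model.
 */
final class InvariantCheck {
    // Local stand-ins for the three entry states; FREE = 0 is the default array value.
    static final byte FREE = 0, OCCUPIED = 1, REMOVED = 2;

    private InvariantCheck() {}

    /** True if, for every OCCUPIED key k, no REMOVED entry with key k precedes it in k's probe sequence. */
    static boolean holds(int[] key, byte[] state) {
        int p = key.length;
        for (int slot = 0; slot < p; slot++) {
            if (state[slot] != OCCUPIED) continue;
            int k = key[slot];
            int i = k % p, step = k % (p - 2) + 1;
            // Walk k's probe sequence until we reach the OCCUPIED slot holding k;
            // the sequence is a permutation, so this loop terminates.
            while (i != slot) {
                if (state[i] == REMOVED && key[i] == k) return false; // a REMOVED copy comes first
                i = (i + step) % p;
            }
        }
        return true;
    }
}
```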
When we search for the key k, we scan the entries in the
sequence $i_0, i_1, \ldots, i_{p-1}$ and stop when we find an entry (OCCUPIED or REMOVED) with key k,
when we have exhausted the sequence, or when we find a FREE entry. Note
that the correctness of this procedure is not completely trivial. Indeed,
when we stop at a REMOVED entry with key k, we must rely
on the invariant to be sure that no OCCUPIED entry with the same
key can appear later. If the same entries are frequently inserted and removed,
this optimization can be very effective (note, however, that when using
objects as keys or values, deleted entries are set to a special fixed value to
optimize garbage collection).
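The search procedure can be sketched on the same kind of toy table. This is a hypothetical illustration, not fastutil's implementation: `ProbeSearch` and `find` are made-up names, keys are non-negative `int`s whose primary hash is the key itself, REMOVED slots are assumed to retain their old key, and the table size p satisfies that p and p - 2 are prime.

```java
/**
 * Hypothetical toy lookup following the search procedure (not fastutil code).
 * Assumes non-negative int keys, primary hash = key, p = key.length with p and
 * p - 2 both prime, and REMOVED slots that still hold their old key.
 */
final class ProbeSearch {
    static final byte FREE = 0, OCCUPIED = 1, REMOVED = 2; // local stand-ins for the entry states

    private ProbeSearch() {}

    /** Returns the slot holding k if it is OCCUPIED, or -1 if k is absent. */
    static int find(int[] key, byte[] state, int k) {
        int p = key.length;
        int i = k % p, step = k % (p - 2) + 1;
        for (int n = 0; n < p; n++) {          // at most p probes: the whole sequence
            if (state[i] == FREE) return -1;   // a FREE entry ends the search
            if (key[i] == k) {
                // An entry with key k: if it is REMOVED, the invariant guarantees that
                // no OCCUPIED entry with key k appears later, so we can stop here.
                return state[i] == OCCUPIED ? i : -1;
            }
            i = (i + step) % p;                // next index in k's probe sequence
        }
        return -1;                             // sequence exhausted
    }
}
```

Without the invariant, a REMOVED entry with key k would have to be skipped like any other occupied slot, and the search could stop only at a FREE entry.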
Moreover, during the probe we keep the index of the first REMOVED
entry we meet. If we actually have to insert a new element, we use that
entry if we can, thus avoiding polluting another FREE entry. Since this position comes
a fortiori before any REMOVED entries with the same key, we also keep the invariant true.
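Insertion with reuse of the first REMOVED entry can be sketched as follows. Again this is a hypothetical toy, not fastutil code: `ProbeInsert` and `insert` are illustrative names, the assumptions are the same as for the lookup sketch (non-negative `int` keys, primary hash = key, twin-prime size p), and no resizing is performed, so a full table simply reports failure.

```java
/**
 * Hypothetical toy insertion that reuses the first REMOVED entry met during the
 * probe (not fastutil code). Assumes non-negative int keys, primary hash = key,
 * p = key.length with p and p - 2 both prime. No resizing is performed.
 */
final class ProbeInsert {
    static final byte FREE = 0, OCCUPIED = 1, REMOVED = 2; // local stand-ins for the entry states

    private ProbeInsert() {}

    /** Inserts k and returns its slot; returns the existing slot if k is present, or -1 if no room. */
    static int insert(int[] key, byte[] state, int k) {
        int p = key.length;
        int i = k % p, step = k % (p - 2) + 1;
        int firstRemoved = -1;  // index of the first REMOVED entry met during the probe
        int freeSlot = -1;      // FREE entry that ended the probe, if any
        for (int n = 0; n < p; n++) {
            if (state[i] == FREE) { freeSlot = i; break; }
            if (state[i] == OCCUPIED && key[i] == k) return i;  // already present
            if (state[i] == REMOVED) {
                if (firstRemoved < 0) firstRemoved = i;         // remember the first one
                if (key[i] == k) break;                         // by the invariant, k is absent
            }
            i = (i + step) % p;
        }
        // Prefer the first REMOVED entry: it precedes any REMOVED entry with key k,
        // so writing k there keeps the invariant true.
        int slot = firstRemoved >= 0 ? firstRemoved : freeSlot;
        if (slot < 0) return -1;  // no room; a real table would have grown earlier
        key[slot] = k;
        state[slot] = OCCUPIED;
        return slot;
    }
}
```

In the usage below (size 7), key 3 probes slots 3, 0, 4, …; slot 3 is REMOVED, so the new key lands there instead of in the FREE slot 4.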
Modifier and Type | Interface and Description |
---|---|
static interface | `Hash.Strategy<K>`: A generic hash strategy. |
Modifier and Type | Field and Description |
---|---|
static int | `DEFAULT_GROWTH_FACTOR`: The default growth factor of a hash table. |
static int | `DEFAULT_INITIAL_SIZE`: The initial default size of a hash table. |
static float | `DEFAULT_LOAD_FACTOR`: The default load factor of a hash table. |
static float | `FAST_LOAD_FACTOR`: The load factor for a (usually small) table that is meant to be particularly fast. |
static byte | `FREE`: The state of a free hash table entry. |
static byte | `OCCUPIED`: The state of an occupied hash table entry. |
static int[] | `PRIMES`: A list of primes to be used as table sizes. |
static byte | `REMOVED`: The state of a hash table entry freed by a deletion. |
static float | `VERY_FAST_LOAD_FACTOR`: The load factor for a (usually very small) table that is meant to be extremely fast. |
static final int DEFAULT_INITIAL_SIZE
static final float DEFAULT_LOAD_FACTOR
static final float FAST_LOAD_FACTOR
static final float VERY_FAST_LOAD_FACTOR
static final int DEFAULT_GROWTH_FACTOR
static final byte FREE
static final byte OCCUPIED
static final byte REMOVED
static final int[] PRIMES