Class GOV3Function<T>
- java.lang.Object
-
- it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction<T>
-
- it.unimi.dsi.sux4j.mph.GOV3Function<T>
-
- All Implemented Interfaces:
it.unimi.dsi.fastutil.Function<T,java.lang.Long>
,it.unimi.dsi.fastutil.objects.Object2LongFunction<T>
,it.unimi.dsi.fastutil.Size64
,java.io.Serializable
,java.util.function.Function<T,java.lang.Long>
,java.util.function.ToLongFunction<T>
public class GOV3Function<T> extends it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction<T> implements java.io.Serializable, it.unimi.dsi.fastutil.Size64
An immutable function stored quasi-succinctly using the Genuzio-Ottaviano-Vigna method to solve F2-linear systems.Instances of this class store a function from keys to values. Keys are provided by an iterable object (whose iterators must return elements in a consistent order), whereas values are provided by a
LongIterable
. If you do not specify values, each key will be assigned its rank (e.g., its position in iteration order starting from zero).For convenience, this class provides a main method that reads from standard input a (possibly
gzip
'd) sequence of newline-separated strings, and writes a serialised function mapping each element of the list to its position, or to a given list of values.Signing
Optionally, it is possible to sign a
GOV3Function
. Signing is possible if no list of values has been specified (otherwise, there is no way to associate a key with its signature). A w-bit signature will be associated with each key, so thatgetLong(Object)
will return a default return value (by default, -1) on strings that are not in the original key set. As usual, false positives are possible with probability 2-w.If you're not interested in the rank of a key, but just to know whether the key was in the original set, you can turn the function into an approximate dictionary. In this case, the value associated by the function with a key is exactly its signature, which means that the only space used by the function is that occupied by signatures: this is one of the fastest and most compact way of storing a static approximate dictionary. In this case, the only returned value is one, and the default return value is set to zero.
Building a function
This class provides a great amount of flexibility when creating a new function; such flexibility is exposed through the builder. To exploit the various possibilities, you must understand some details of the construction.
In a first phase, we build a
BucketedHashStore
containing hashes of the keys. By default, the store will associate each hash with the rank of the key. If you specify values, the store will associate with each hash the corresponding value.However, if you further require an indirect construction the store will associate again each hash with the rank of the corresponding key, and access randomly the values (which must be either a
LongList
or aLongBigList
). Indirect construction is useful only in complex, multi-layer hashes (such as anLcpMonotoneMinimalPerfectHashFunction
) in which we want to reuse a checkedBucketedHashStore
. Storing values in theBucketedHashStore
is extremely scalable because the values must just be aLongIterable
that will be scanned sequentially during the store construction. On the other hand, if you have already a store that associates ordinal positions, and you want to build a new function for which aLongList
orLongBigList
of values needs little space (e.g., because it is described implicitly), you can opt for an indirect construction using the already built store.Note that if you specify a store it will be used before building a new one (possibly because of a
DuplicateException
), with obvious benefits in terms of performance. If the store is not checked, and aDuplicateException
is thrown, the constructor will try to rebuild the store, but this requires, of course, that the keys, and possibly the values, are available. Note that it is your responsibility to pass a correct store.Multithreading
This implementation is multithreaded: each bucket returned by the
BucketedHashStore
is processed independently. By default, this class usesRuntime.availableProcessors()
parallel threads, but by default no more than 4. If you wish to set a specific number of threads, you can do so through the system property "it.unimi.dsi.sux4j.mph.threads".Implementation Details
The detail of the data structure can be found in “Fast Scalable Construction of (Minimal Perfect Hash) Functions”, by Marco Genuzio, Giuseppe Ottaviano and Sebastiano Vigna, 15th International Symposium on Experimental Algorithms — SEA 2016, Lecture Notes in Computer Science, Springer, 2016. We generate a random 3-regular linear system on F2, where the known term of the k-th equation is the output value for the k-th key. Then, we solve it and store the solution. Since the system must have ≈10% more variables than equations to be solvable, an r-bit
GOV3Function
on n keys requires 1.1rn bits.Optionally, you may require a compacted function, that is, a function storing nonzero values in a separate array and using a ranked marker array to record the positions of nonzero values. In this case, the function requires just (1.1 + r)n bits (plus the bits that are necessary for the ranking structure; the current implementation uses
Rank16
), but has slightly slower lookups.- Since:
- 4.0.0
- Author:
- Sebastiano Vigna
- See Also:
GOV4Function
, Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
GOV3Function.Builder<T>
A builder class forGOV3Function
.
-
Field Summary
Fields Modifier and Type Field Description static int
BUCKET_SIZE
The expected bucket size.static double
C
The ratio between variables and equations.protected it.unimi.dsi.fastutil.longs.LongBigList
data
The final magick—the list of values that define the output of the function.protected long
globalSeed
The seed used to generate the initial signature.protected long
m
The number of variables.protected it.unimi.dsi.bits.LongArrayBitVector
marker
protected long
n
The number of keys.static java.lang.String
NUMBER_OF_THREADS_PROPERTY
The system property used to set the number of parallel threads.protected long[]
offsetAndSeed
A long containing the start offset of each bucket in the lower 56 bits, and the local seed of each bucket in the upper 8 bits.protected Rank16
rank
The ranking structure onmarker
.protected long
signatureMask
The mask to compare signatures, or zero for no signatures.protected it.unimi.dsi.fastutil.longs.LongBigList
signatures
The signatures.protected it.unimi.dsi.bits.TransformationStrategy<? super T>
transform
The transformation strategy to turn objects of typeT
into bit vectors.protected int
width
The data width.
-
Constructor Summary
Constructors Modifier Constructor Description protected
GOV3Function(java.lang.Iterable<? extends T> keys, it.unimi.dsi.bits.TransformationStrategy<? super T> transform, int signatureWidth, it.unimi.dsi.fastutil.longs.LongIterable values, int dataWidth, boolean compacted, java.io.File tempDir, BucketedHashStore<T> bucketedHashStore, boolean indirect)
Creates a new function for the given keys and values.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Deprecated Methods Modifier and Type Method Description boolean
containsKey(java.lang.Object o)
void
dump(java.lang.String file)
long
getLong(java.lang.Object o)
long
getLongBySignature(long[] signature)
Low-level access to the output of this function.static void
main(java.lang.String[] arg)
long
numBits()
Returns the number of bits used by this structure.int
size()
Deprecated.long
size64()
Returns the number of keys in the function domain.-
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction
defaultReturnValue, defaultReturnValue
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface it.unimi.dsi.fastutil.objects.Object2LongFunction
andThen, andThenByte, andThenChar, andThenDouble, andThenFloat, andThenInt, andThenLong, andThenObject, andThenReference, andThenShort, applyAsLong, composeByte, composeChar, composeDouble, composeFloat, composeInt, composeLong, composeObject, composeReference, composeShort, get, getOrDefault, getOrDefault, put, put, remove, removeLong
-
-
-
-
Field Detail
-
C
public static double C
The ratio between variables and equations.
-
NUMBER_OF_THREADS_PROPERTY
public static final java.lang.String NUMBER_OF_THREADS_PROPERTY
The system property used to set the number of parallel threads.- See Also:
- Constant Field Values
-
BUCKET_SIZE
public static final int BUCKET_SIZE
The expected bucket size.- See Also:
- Constant Field Values
-
n
protected final long n
The number of keys.
-
m
protected final long m
The number of variables.
-
width
protected final int width
The data width.
-
globalSeed
protected final long globalSeed
The seed used to generate the initial signature.
-
offsetAndSeed
protected final long[] offsetAndSeed
A long containing the start offset of each bucket in the lower 56 bits, and the local seed of each bucket in the upper 8 bits.
-
data
protected final it.unimi.dsi.fastutil.longs.LongBigList data
The final magick—the list of values that define the output of the function.
-
marker
protected final it.unimi.dsi.bits.LongArrayBitVector marker
-
transform
protected final it.unimi.dsi.bits.TransformationStrategy<? super T> transform
The transformation strategy to turn objects of typeT
into bit vectors.
-
signatureMask
protected final long signatureMask
The mask to compare signatures, or zero for no signatures.
-
signatures
protected final it.unimi.dsi.fastutil.longs.LongBigList signatures
The signatures.
-
-
Constructor Detail
-
GOV3Function
protected GOV3Function(java.lang.Iterable<? extends T> keys, it.unimi.dsi.bits.TransformationStrategy<? super T> transform, int signatureWidth, it.unimi.dsi.fastutil.longs.LongIterable values, int dataWidth, boolean compacted, java.io.File tempDir, BucketedHashStore<T> bucketedHashStore, boolean indirect) throws java.io.IOException
Creates a new function for the given keys and values.- Parameters:
keys
- the keys in the domain of the function, ornull
.transform
- a transformation strategy for the keys.signatureWidth
- a positive number for a signature width, 0 for no signature, a negative value for a self-signed function; if nonzero,values
must benull
andwidth
must be -1.values
- values to be assigned to each element, in the same order of the iterator returned bykeys
; ifnull
, the assigned value will the ordinal number of each element.dataWidth
- the bit width of thevalues
, or -1 ifvalues
isnull
.compacted
- if true, the coefficients will be compacted.tempDir
- a temporary directory for the store files, ornull
for the standard temporary directory.bucketedHashStore
- a bucketed hash store containing the keys associated with their ranks (if there are no values, orindirect
is true) or values, ornull
; the store can be unchecked, but in this casekeys
andtransform
must be non-null
.indirect
- if true,bucketedHashStore
contains ordinal positions, andvalues
is aLongIterable
that must be accessed to retrieve the actual values.- Throws:
java.io.IOException
-
-
Method Detail
-
getLong
public long getLong(java.lang.Object o)
- Specified by:
getLong
in interfaceit.unimi.dsi.fastutil.objects.Object2LongFunction<T>
-
getLongBySignature
public long getLongBySignature(long[] signature)
Low-level access to the output of this function.This method makes it possible to build several kind of functions on the same
BucketedHashStore
and then retrieve the resulting values by generating a single signature. The methodTwoStepsGOV3Function.getLong(Object)
is a good example of this technique.- Parameters:
signature
- a signature generated as documented inBucketedHashStore
.- Returns:
- the output of the function.
-
size64
public long size64()
Returns the number of keys in the function domain.- Specified by:
size64
in interfaceit.unimi.dsi.fastutil.Size64
- Returns:
- the number of the keys in the function domain.
-
size
@Deprecated public int size()
Deprecated.- Specified by:
size
in interfaceit.unimi.dsi.fastutil.Function<T,java.lang.Long>
- Specified by:
size
in interfaceit.unimi.dsi.fastutil.Size64
-
numBits
public long numBits()
Returns the number of bits used by this structure.- Returns:
- the number of bits used by this structure.
-
containsKey
public boolean containsKey(java.lang.Object o)
- Specified by:
containsKey
in interfaceit.unimi.dsi.fastutil.Function<T,java.lang.Long>
-
dump
public void dump(java.lang.String file) throws java.io.IOException
- Throws:
java.io.IOException
-
main
public static void main(java.lang.String[] arg) throws java.lang.NoSuchMethodException, java.io.IOException, com.martiansoftware.jsap.JSAPException
- Throws:
java.lang.NoSuchMethodException
java.io.IOException
com.martiansoftware.jsap.JSAPException
-
-