Package it.unimi.dsi.sux4j.mph
Class TwoStepsGOV3Function<T>
- java.lang.Object
-
- it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction<K>
-
- it.unimi.dsi.sux4j.mph.AbstractHashFunction<T>
-
- it.unimi.dsi.sux4j.mph.TwoStepsGOV3Function<T>
-
- All Implemented Interfaces:
it.unimi.dsi.fastutil.Function<T,java.lang.Long>
,it.unimi.dsi.fastutil.objects.Object2LongFunction<T>
,it.unimi.dsi.fastutil.Size64
,java.io.Serializable
,java.util.function.Function<T,java.lang.Long>
,java.util.function.ToLongFunction<T>
public class TwoStepsGOV3Function<T> extends AbstractHashFunction<T> implements java.io.Serializable, it.unimi.dsi.fastutil.Size64
A function stored using two GOV3Functions—one for frequent values, and one for infrequent values. This naive idea turns out to be very effective in reducing the function size when the distribution of values is skewed (e.g., as it happens in aTwoStepsLcpMonotoneMinimalPerfectHashFunction
).To create an instance, we perform a pre-scan of the values to be assigned. If possible, we finds the best possible r such that the 2r − 1 most frequent values can be stored in a
GOV3Function
and suitably remapped when read. The function uses 2r − 1 as an escape symbol for all other values, which are stored in a separate function.Warning: during the construction phase, a filter will be set on the
BucketedHashStore
used to store the keys. If you are passing a store, you will have to reset it to its previous state.- Since:
- 4.0
- Author:
- Sebastiano Vigna
- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
TwoStepsGOV3Function.Builder<T>
A builder class forTwoStepsGOV3Function
.
-
Field Summary
Fields Modifier and Type Field Description protected int
escape
The escape value returned byfirstFunction
to suggest thatsecondFunction
should be queried instead, provided that there is a first function.protected GOV3Function<T>
firstFunction
The first function, ornull
.protected long
n
The number of keys.protected double
rankMean
The mean of the rank distribution.protected long[]
remap
A mapping from values of the first function to actual values, provided that there is a first function.protected GOV3Function<T>
secondFunction
The second function.protected long
seed
The seed to be used when converting keys to signatures.static long
serialVersionUID
protected it.unimi.dsi.bits.TransformationStrategy<? super T>
transform
The transformation strategy to turn objects of typeT
into bit vectors.protected int
width
The width of the output of this function, in bits.
-
Constructor Summary
Constructors Modifier Constructor Description protected
TwoStepsGOV3Function(java.lang.Iterable<? extends T> keys, it.unimi.dsi.bits.TransformationStrategy<? super T> transform, it.unimi.dsi.fastutil.longs.LongBigList values, java.io.File tempDir, BucketedHashStore<T> bucketedHashStore)
Creates a new two-step function for the given keys and values.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description long
getLong(java.lang.Object o)
long
getLongBySignature(long[] signature)
static void
main(java.lang.String[] arg)
long
numBits()
Returns the number of bits used by this structure.long
size64()
-
Methods inherited from class it.unimi.dsi.sux4j.mph.AbstractHashFunction
containsKey, size
-
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction
defaultReturnValue, defaultReturnValue
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface it.unimi.dsi.fastutil.objects.Object2LongFunction
andThen, andThenByte, andThenChar, andThenDouble, andThenFloat, andThenInt, andThenLong, andThenObject, andThenReference, andThenShort, applyAsLong, composeByte, composeChar, composeDouble, composeFloat, composeInt, composeLong, composeObject, composeReference, composeShort, get, getOrDefault, getOrDefault, put, put, remove, removeLong
-
-
-
-
Field Detail
-
serialVersionUID
public static final long serialVersionUID
- See Also:
- Constant Field Values
-
n
protected final long n
The number of keys.
-
transform
protected final it.unimi.dsi.bits.TransformationStrategy<? super T> transform
The transformation strategy to turn objects of typeT
into bit vectors.
-
firstFunction
protected final GOV3Function<T> firstFunction
The first function, ornull
. The special output valueescape
denotes thatsecondFunction
should be queried instead.
-
secondFunction
protected final GOV3Function<T> secondFunction
The second function. All queries for whichfirstFunction
returnsescape
(or simply all queries, iffirstFunction
isnull
) will be rerouted here.
-
remap
protected final long[] remap
A mapping from values of the first function to actual values, provided that there is a first function.
-
escape
protected final int escape
The escape value returned byfirstFunction
to suggest thatsecondFunction
should be queried instead, provided that there is a first function.
-
seed
protected long seed
The seed to be used when converting keys to signatures.
-
width
protected final int width
The width of the output of this function, in bits.
-
rankMean
protected final double rankMean
The mean of the rank distribution.
-
-
Constructor Detail
-
TwoStepsGOV3Function
protected TwoStepsGOV3Function(java.lang.Iterable<? extends T> keys, it.unimi.dsi.bits.TransformationStrategy<? super T> transform, it.unimi.dsi.fastutil.longs.LongBigList values, java.io.File tempDir, BucketedHashStore<T> bucketedHashStore) throws java.io.IOException
Creates a new two-step function for the given keys and values.- Parameters:
keys
- the keys in the domain of the function.transform
- a transformation strategy for the keys.values
- values to be assigned to each key, in the same order of the iterator returned bykeys
; ifnull
, the assigned value will the ordinal number of each key.tempDir
- a temporary directory for the store files, ornull
for the standard temporary directory.bucketedHashStore
- a bucketed hash store containing the keys associated with their rank, ornull
; the store can be unchecked, but in this casekeys
andtransform
must be non-null
.- Throws:
java.io.IOException
-
-
Method Detail
-
getLong
public long getLong(java.lang.Object o)
- Specified by:
getLong
in interfaceit.unimi.dsi.fastutil.objects.Object2LongFunction<T>
-
getLongBySignature
public long getLongBySignature(long[] signature)
-
size64
public long size64()
- Specified by:
size64
in interfaceit.unimi.dsi.fastutil.Size64
- Overrides:
size64
in classAbstractHashFunction<T>
-
numBits
public long numBits()
Returns the number of bits used by this structure.- Returns:
- the number of bits used by this structure.
-
main
public static void main(java.lang.String[] arg) throws java.lang.NoSuchMethodException, java.io.IOException, com.martiansoftware.jsap.JSAPException
- Throws:
java.lang.NoSuchMethodException
java.io.IOException
com.martiansoftware.jsap.JSAPException
-
-