Class TwoStepsGOV3Function<T>

  • All Implemented Interfaces:
    it.unimi.dsi.fastutil.Function<T,​java.lang.Long>, it.unimi.dsi.fastutil.objects.Object2LongFunction<T>, it.unimi.dsi.fastutil.Size64, java.io.Serializable, java.util.function.Function<T,​java.lang.Long>, java.util.function.ToLongFunction<T>

    public class TwoStepsGOV3Function<T>
    extends AbstractHashFunction<T>
    implements java.io.Serializable, it.unimi.dsi.fastutil.Size64
    A function stored using two GOV3Functions—one for frequent values, and one for infrequent values. This naive idea turns out to be very effective in reducing the function size when the distribution of values is skewed (e.g., as it happens in a TwoStepsLcpMonotoneMinimalPerfectHashFunction).

    To create an instance, we perform a pre-scan of the values to be assigned. If possible, we finds the best possible r such that the 2r − 1 most frequent values can be stored in a GOV3Function and suitably remapped when read. The function uses 2r − 1 as an escape symbol for all other values, which are stored in a separate function.

    Warning: during the construction phase, a filter will be set on the BucketedHashStore used to store the keys. If you are passing a store, you will have to reset it to its previous state.

    Since:
    4.0
    Author:
    Sebastiano Vigna
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected int escape
      The escape value returned by firstFunction to suggest that secondFunction should be queried instead, provided that there is a first function.
      protected GOV3Function<T> firstFunction
      The first function, or null.
      protected long n
      The number of keys.
      protected double rankMean
      The mean of the rank distribution.
      protected long[] remap
      A mapping from values of the first function to actual values, provided that there is a first function.
      protected GOV3Function<T> secondFunction
      The second function.
      protected long seed
      The seed to be used when converting keys to signatures.
      static long serialVersionUID  
      protected it.unimi.dsi.bits.TransformationStrategy<? super T> transform
      The transformation strategy to turn objects of type T into bit vectors.
      protected int width
      The width of the output of this function, in bits.
      • Fields inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction

        defRetValue
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected TwoStepsGOV3Function​(java.lang.Iterable<? extends T> keys, it.unimi.dsi.bits.TransformationStrategy<? super T> transform, it.unimi.dsi.fastutil.longs.LongBigList values, java.io.File tempDir, BucketedHashStore<T> bucketedHashStore)
      Creates a new two-step function for the given keys and values.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      long getLong​(java.lang.Object o)  
      long getLongBySignature​(long[] signature)  
      static void main​(java.lang.String[] arg)  
      long numBits()
      Returns the number of bits used by this structure.
      long size64()  
      • Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction

        defaultReturnValue, defaultReturnValue
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface it.unimi.dsi.fastutil.Function

        apply, clear
      • Methods inherited from interface java.util.function.Function

        compose
      • Methods inherited from interface it.unimi.dsi.fastutil.objects.Object2LongFunction

        andThen, andThenByte, andThenChar, andThenDouble, andThenFloat, andThenInt, andThenLong, andThenObject, andThenReference, andThenShort, applyAsLong, composeByte, composeChar, composeDouble, composeFloat, composeInt, composeLong, composeObject, composeReference, composeShort, get, getOrDefault, getOrDefault, put, put, remove, removeLong
      • Methods inherited from interface it.unimi.dsi.fastutil.Size64

        size
    • Field Detail

      • n

        protected final long n
        The number of keys.
      • transform

        protected final it.unimi.dsi.bits.TransformationStrategy<? super T> transform
        The transformation strategy to turn objects of type T into bit vectors.
      • firstFunction

        protected final GOV3Function<T> firstFunction
        The first function, or null. The special output value escape denotes that secondFunction should be queried instead.
      • remap

        protected final long[] remap
        A mapping from values of the first function to actual values, provided that there is a first function.
      • seed

        protected long seed
        The seed to be used when converting keys to signatures.
      • width

        protected final int width
        The width of the output of this function, in bits.
      • rankMean

        protected final double rankMean
        The mean of the rank distribution.
    • Constructor Detail

      • TwoStepsGOV3Function

        protected TwoStepsGOV3Function​(java.lang.Iterable<? extends T> keys,
                                       it.unimi.dsi.bits.TransformationStrategy<? super T> transform,
                                       it.unimi.dsi.fastutil.longs.LongBigList values,
                                       java.io.File tempDir,
                                       BucketedHashStore<T> bucketedHashStore)
                                throws java.io.IOException
        Creates a new two-step function for the given keys and values.
        Parameters:
        keys - the keys in the domain of the function.
        transform - a transformation strategy for the keys.
        values - values to be assigned to each key, in the same order of the iterator returned by keys; if null, the assigned value will the ordinal number of each key.
        tempDir - a temporary directory for the store files, or null for the standard temporary directory.
        bucketedHashStore - a bucketed hash store containing the keys associated with their rank, or null; the store can be unchecked, but in this case keys and transform must be non-null.
        Throws:
        java.io.IOException
    • Method Detail

      • getLong

        public long getLong​(java.lang.Object o)
        Specified by:
        getLong in interface it.unimi.dsi.fastutil.objects.Object2LongFunction<T>
      • getLongBySignature

        public long getLongBySignature​(long[] signature)
      • size64

        public long size64()
        Specified by:
        size64 in interface it.unimi.dsi.fastutil.Size64
        Overrides:
        size64 in class AbstractHashFunction<T>
      • numBits

        public long numBits()
        Returns the number of bits used by this structure.
        Returns:
        the number of bits used by this structure.
      • main

        public static void main​(java.lang.String[] arg)
                         throws java.lang.NoSuchMethodException,
                                java.io.IOException,
                                com.martiansoftware.jsap.JSAPException
        Throws:
        java.lang.NoSuchMethodException
        java.io.IOException
        com.martiansoftware.jsap.JSAPException