Class MWHCFunction<T>

  • All Implemented Interfaces:
    it.unimi.dsi.fastutil.Function<T,​java.lang.Long>, it.unimi.dsi.fastutil.objects.Object2LongFunction<T>, it.unimi.dsi.fastutil.Size64, java.io.Serializable, java.util.function.Function<T,​java.lang.Long>, java.util.function.ToLongFunction<T>

    @Deprecated
    public class MWHCFunction<T>
    extends it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction<T>
    implements java.io.Serializable, it.unimi.dsi.fastutil.Size64
    Deprecated.
    Please a GOV3Function or a GOV4Function.
    An immutable function stored quasi-succinctly using the Majewski-Wormald-Havas-Czech 3-hypergraph technique.

    Instances of this class store a function from keys to values. Keys are provided by an iterable object (whose iterators must return elements in a consistent order), whereas values are provided by a LongIterable. If you do nost specify values, each key will be assigned its rank (e.g., its position in iteration order starting from zero).

    For convenience, this class provides a main method that reads from standard input a (possibly gzip'd) sequence of newline-separated strings, and writes a serialised function mapping each element of the list to its position, or to a given list of values.

    Signing

    Optionally, it is possible to sign an MWHCFunction. Signing is possible if no list of values has been specified (otherwise, there is no way to associate a key with its signature). A w-bit signature will be associated with each key, so that getLong(Object) will return a default return value (by default, -1) on strings that are not in the original key set. As usual, false positives are possible with probability 2-w.

    If you're not interested in the rank of a key, but just to know whether the key was in the original set, you can turn the function into an approximate dictionary. In this case, the value associated by the function with a key is exactly its signature, which means that the only space used by the function is that occupied by signatures: this is one of the fastest and most compact way of storing a static approximate dictionary. In this case, the only returned value is one, and the default return value is set to zero.

    Building a function

    This class provides a great amount of flexibility when creating a new function; such flexibility is exposed through the builder. To exploit the various possibilities, you must understand some details of the construction.

    In a first phase, we build a ChunkedHashStore containing hashes of the keys. By default, the store will associate each hash with the rank of the key. If you specify values, the store will associate with each hash the corresponding value.

    However, if you further require an indirect construction the store will associate again each hash with the rank of the corresponding key, and access randomly the values (which must be either a LongList or a LongBigList). Indirect construction is useful only in complex, multi-layer hashes (such as an LcpMonotoneMinimalPerfectHashFunction) in which we want to reuse a checked ChunkedHashStore. Storing values in the ChunkedHashStore is extremely scalable because the values must just be a LongIterable that will be scanned sequentially during the store construction. On the other hand, if you have already a store that associates ordinal positions, and you want to build a new function for which a LongList or LongBigList of values needs little space (e.g., because it is described implicitly), you can opt for an indirect construction using the already built store.

    Note that if you specify a store it will be used before building a new one (possibly because of a ChunkedHashStore.DuplicateException), with obvious benefits in terms of performance. If the store is not checked, and a ChunkedHashStore.DuplicateException is thrown, the constructor will try to rebuild the store, but this requires, of course, that the keys, and possibly the values, are available. Note that it is your responsibility to pass a correct store.

    Implementation Details

    After generating a random 3-hypergraph, we sort its 3-hyperedges to that a distinguished vertex in each 3-hyperedge, the hinge, never appeared before. We then assign to each vertex a value in such a way that for each 3-hyperedge the XOR of the three values associated to its vertices is the required value for the corresponding element of the function domain (this is the standard Majewski-Wormald-Havas-Czech construction).

    Then, we measure whether it is favourable to compact the function, that is, to store nonzero values in a separate array, using a ranked marker array to record the positions of nonzero values.

    A non-compacted, r-bit MWHCFunction on n keys requires γrn bits, whereas the compacted version takes just (γ + r)n bits (plus the bits that are necessary for the ranking structure; the current implementation uses Rank16). This class will transparently choose the most space-efficient method.

    Since:
    0.2
    Author:
    Sebastiano Vigna
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected it.unimi.dsi.fastutil.longs.LongBigList data
      Deprecated.
      The final magick—the list of modulo-3 values that define the output of the minimal hash function.
      protected long globalSeed
      Deprecated.
      The seed used to generate the initial hash triple.
      static int LOG2_CHUNK_SIZE
      Deprecated.
      The logarithm of the desired chunk size.
      protected long m
      Deprecated.
      The number of vertices of the intermediate hypergraph.
      protected it.unimi.dsi.bits.LongArrayBitVector marker
      Deprecated.
      Optionally, a rank structure built on this bit array is used to mark positions containing non-zero value; indexing in data is made by ranking if this field is non-null.
      protected long n
      Deprecated.
      The number of keys.
      protected long[] offset
      Deprecated.
      The start offset of each block.
      protected Rank16 rank
      Deprecated.
      The ranking structure on marker.
      protected long[] seed
      Deprecated.
      The seed of the underlying 3-hypergraphs.
      protected long signatureMask
      Deprecated.
      The mask to compare signatures, or zero for no signatures.
      protected it.unimi.dsi.fastutil.longs.LongBigList signatures
      Deprecated.
      The signatures.
      protected it.unimi.dsi.bits.TransformationStrategy<? super T> transform
      Deprecated.
      The transformation strategy to turn objects of type T into bit vectors.
      protected int width
      Deprecated.
      The data width.
      • Fields inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction

        defRetValue
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected MWHCFunction​(java.lang.Iterable<? extends T> keys, it.unimi.dsi.bits.TransformationStrategy<? super T> transform, int signatureWidth, it.unimi.dsi.fastutil.longs.LongIterable values, int dataWidth, java.io.File tempDir, ChunkedHashStore<T> chunkedHashStore, boolean indirect)
      Deprecated.
      Creates a new function for the given keys and values.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      boolean containsKey​(java.lang.Object o)
      Deprecated.
       
      long getLong​(java.lang.Object o)
      Deprecated.
       
      long getLongByTriple​(long[] triple)
      Deprecated.
      Low-level access to the output of this function.
      static void main​(java.lang.String[] arg)
      Deprecated.
       
      long numBits()
      Deprecated.
      Returns the number of bits used by this structure.
      int size()
      Deprecated.
      long size64()
      Deprecated.
      Returns the number of keys in the function domain.
      • Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction

        defaultReturnValue, defaultReturnValue
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface it.unimi.dsi.fastutil.Function

        apply, clear
      • Methods inherited from interface java.util.function.Function

        compose
      • Methods inherited from interface it.unimi.dsi.fastutil.objects.Object2LongFunction

        andThen, andThenByte, andThenChar, andThenDouble, andThenFloat, andThenInt, andThenLong, andThenObject, andThenReference, andThenShort, applyAsLong, composeByte, composeChar, composeDouble, composeFloat, composeInt, composeLong, composeObject, composeReference, composeShort, get, getOrDefault, getOrDefault, put, put, remove, removeLong
    • Field Detail

      • LOG2_CHUNK_SIZE

        public static final int LOG2_CHUNK_SIZE
        Deprecated.
        The logarithm of the desired chunk size.
        See Also:
        Constant Field Values
      • n

        protected final long n
        Deprecated.
        The number of keys.
      • m

        protected final long m
        Deprecated.
        The number of vertices of the intermediate hypergraph.
      • width

        protected final int width
        Deprecated.
        The data width.
      • globalSeed

        protected final long globalSeed
        Deprecated.
        The seed used to generate the initial hash triple.
      • seed

        protected final long[] seed
        Deprecated.
        The seed of the underlying 3-hypergraphs.
      • offset

        protected final long[] offset
        Deprecated.
        The start offset of each block.
      • data

        protected final it.unimi.dsi.fastutil.longs.LongBigList data
        Deprecated.
        The final magick—the list of modulo-3 values that define the output of the minimal hash function.
      • marker

        protected final it.unimi.dsi.bits.LongArrayBitVector marker
        Deprecated.
        Optionally, a rank structure built on this bit array is used to mark positions containing non-zero value; indexing in data is made by ranking if this field is non-null.
      • rank

        protected final Rank16 rank
        Deprecated.
        The ranking structure on marker.
      • transform

        protected final it.unimi.dsi.bits.TransformationStrategy<? super T> transform
        Deprecated.
        The transformation strategy to turn objects of type T into bit vectors.
      • signatureMask

        protected final long signatureMask
        Deprecated.
        The mask to compare signatures, or zero for no signatures.
      • signatures

        protected final it.unimi.dsi.fastutil.longs.LongBigList signatures
        Deprecated.
        The signatures.
    • Constructor Detail

      • MWHCFunction

        protected MWHCFunction​(java.lang.Iterable<? extends T> keys,
                               it.unimi.dsi.bits.TransformationStrategy<? super T> transform,
                               int signatureWidth,
                               it.unimi.dsi.fastutil.longs.LongIterable values,
                               int dataWidth,
                               java.io.File tempDir,
                               ChunkedHashStore<T> chunkedHashStore,
                               boolean indirect)
                        throws java.io.IOException
        Deprecated.
        Creates a new function for the given keys and values.
        Parameters:
        keys - the keys in the domain of the function, or null.
        transform - a transformation strategy for the keys.
        signatureWidth - a positive number for a signature width, 0 for no signature, a negative value for a self-signed function; if nonzero, values must be null and width must be -1.
        values - values to be assigned to each element, in the same order of the iterator returned by keys; if null, the assigned value will the ordinal number of each element.
        dataWidth - the bit width of the values, or -1 if values is null.
        tempDir - a temporary directory for the store files, or null for the standard temporary directory.
        chunkedHashStore - a chunked hash store containing the keys associated with their ranks (if there are no values, or indirect is true) or values, or null; the store can be unchecked, but in this case keys and transform must be non-null.
        indirect - if true, chunkedHashStore contains ordinal positions, and values is a LongIterable that must be accessed to retrieve the actual values.
        Throws:
        java.io.IOException
    • Method Detail

      • getLong

        public long getLong​(java.lang.Object o)
        Deprecated.
        Specified by:
        getLong in interface it.unimi.dsi.fastutil.objects.Object2LongFunction<T>
      • getLongByTriple

        public long getLongByTriple​(long[] triple)
        Deprecated.
        Low-level access to the output of this function.

        This method makes it possible to build several kind of functions on the same ChunkedHashStore and then retrieve the resulting values by generating a single triple of hashes. The method TwoStepsMWHCFunction.getLong(Object) is a good example of this technique.

        Parameters:
        triple - a triple generated as documented in ChunkedHashStore.
        Returns:
        the output of the function.
      • size64

        public long size64()
        Deprecated.
        Returns the number of keys in the function domain.
        Specified by:
        size64 in interface it.unimi.dsi.fastutil.Size64
        Returns:
        the number of the keys in the function domain.
      • size

        @Deprecated
        public int size()
        Deprecated.
        Specified by:
        size in interface it.unimi.dsi.fastutil.Function<T,​java.lang.Long>
        Specified by:
        size in interface it.unimi.dsi.fastutil.Size64
      • numBits

        public long numBits()
        Deprecated.
        Returns the number of bits used by this structure.
        Returns:
        the number of bits used by this structure.
      • containsKey

        public boolean containsKey​(java.lang.Object o)
        Deprecated.
        Specified by:
        containsKey in interface it.unimi.dsi.fastutil.Function<T,​java.lang.Long>
      • main

        public static void main​(java.lang.String[] arg)
                         throws java.lang.NoSuchMethodException,
                                java.io.IOException,
                                com.martiansoftware.jsap.JSAPException
        Deprecated.
        Throws:
        java.lang.NoSuchMethodException
        java.io.IOException
        com.martiansoftware.jsap.JSAPException