Class ChunkedHashStore<T>

  • All Implemented Interfaces:
    it.unimi.dsi.io.SafelyCloseable, java.io.Closeable, java.io.Serializable, java.lang.AutoCloseable, java.lang.Iterable<ChunkedHashStore.Chunk>

    @Deprecated
    public class ChunkedHashStore<T>
    extends java.lang.Object
    implements java.io.Serializable, it.unimi.dsi.io.SafelyCloseable, java.lang.Iterable<ChunkedHashStore.Chunk>
    Deprecated.
    Please use a BucketedHashStore, which provides accurate bucket sizing.
    A temporary store of hash triples virtually divided into chunks.

    A chunked hash store accumulates elements (objects of type T) by turning them into bit vectors (using a provided TransformationStrategy) and then hashing such vectors into a triple of longs (i.e., overall we get a hash of 192 bits). Elements can be added one by one or in batches. Elements must be distinct, or, more precisely, they must be transformed into distinct bit vectors.

    Besides the hashes, we store some data associated with each element: if no data is specified, we store the rank of each element added (the first element added has rank 0, the second one has rank 1, and so on), unless you specified at construction time a nonzero hash width: in that case, the value stored by add(Object) will be given the lowest bits of the first hash of the triple associated with the object (the hash width is the number of bits stored). This feature makes it possible, for example, to implement a static dictionary using a GOV3Function.

    Once all elements have been added, they can be gathered into chunks whose tentative size can be set by calling log2Chunks(int). More precisely, if the latter method is called with argument k, then each chunk will be formed by grouping triples by the k most significant bits of their first hash.

    To obtain triples, one calls iterator(), which returns chunks one at a time (in their natural order); triples within each chunk are returned by increasing hash. Actually, the iterator provided by a chunk returns a quadruple whose last element is the data associated with the element that generated the triple.

    It is possible (albeit very unlikely) that different elements generate the same hash. This event is detected during chunk iteration (not while accumulating hashes), and it will throw a ChunkedHashStore.DuplicateException. At that point, the caller must handle the exception by resetting the store and trying again from scratch. Note that after a few (say, three) exceptions you can safely assume that there are duplicate elements. If you need to force a check on the whole store you can call check(). If all your elements come from an Iterable, checkAndRetry(Iterable, LongIterable) will try three times to build a checked chunked hash store.

    Every reset(long) changes the seed used by the store to generate triples. So, if this seed has to be stored this must happen after the last call to reset(long). To help tracking this fact, a call to seed() will lock the store; any further call to reset(long) will throw an IllegalStateException. In case the store needs to be reused, you can call clear(), that will bring back the store to after-creation state.

    When you have finished using a chunked hash store, you should close() it. This class implements SafelyCloseable, and thus provides a safety-net finalizer.

    Filtering

    You can at any time set a predicate that will filter the triples returned by the store.

    Computing frequencies

    If you specify so at construction time, a chunked hash store will compute for you a a map from values to their frequency.

    Implementation details

    Internally, a chunked hash store has a notion of disk chunk: triples are stored on disk using a fixed number of bits. Once the user chooses a chunk size, the store exhibits the data on disk by grouping disk chunks or splitting them in a suitable way. This process is transparent to the user.

    An instance of this class will save triples into DISK_CHUNKS disk chunks. Triples have to be loaded into memory only chunk by chunk, so to be sorted and tested for uniqueness. As long as DISK_CHUNKS is larger than eight, the store will need less than one bit per element of main memory. DISK_CHUNKS can be increased arbitrarily at compile time, but each store will open DISK_CHUNKS files at the same time. (For the same reason, it is strongly suggested that you close your stores as soon as you do not need them).

    Intended usage

    Chunked hash stores should be built by classes that need to manipulate elements in chunks of approximate given size without needing access to the elements themselves, but just to their triples, a typical example being GOV3Function, which uses the triples to compute a 3-hyperedge. Once a chunked hash store is built, it can be passed on to further substructures, reducing greatly the computation time (as the original collection need not to be scanned again).

    To compute the chunk corresponding to given element, use

     final long[] h = new long[3];
     Hashes.spooky4(transform.toBitVector(key), seed, h);
     final int chunk = chunkShift == Long.SIZE ? 0 : (int)(h[0] >>> chunkShift);
     
    where seed is the store seed, and chunkShift is the return value of log2Chunks(int) and should be stored by the caller. Note that you do not need the chunked hash store to compute these data, but just its seed and the chunk shift.
    Since:
    1.0.4
    Author:
    Sebastiano Vigna
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int BUFFER_SIZE
      Deprecated.
      The size of the output buffers.
      static int DISK_CHUNKS
      Deprecated.
      The number of physical disk chunks.
      static int DISK_CHUNKS_SHIFT
      Deprecated.
      The shift for physical disk chunks.
      protected long filteredSize
      Deprecated.
      The number of elements that pass the current filter, or -1 we it must be recomputed.
      static int LOG2_DISK_CHUNKS
      Deprecated.
      The logarithm of the number of physical disk chunks.
      protected long seed
      Deprecated.
      The seed used to generate the hash triples.
      static long serialVersionUID
      Deprecated.
       
      protected long size
      Deprecated.
      The number of elements ever added.
    • Constructor Summary

      Constructors 
      Constructor Description
      ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform)
      Deprecated.
      Creates a chunked hash store with given transformation strategy.
      ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform, it.unimi.dsi.logging.ProgressLogger pl)
      Deprecated.
      Creates a chunked hash store with given transformation strategy.
      ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform, java.io.File tempDir)
      Deprecated.
      Creates a chunked hash store with given transformation strategy and temporary file directory.
      ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform, java.io.File tempDir, int hashWidthOrCountValues, it.unimi.dsi.logging.ProgressLogger pl)
      Deprecated.
      Creates a chunked hash store with given transformation strategy, hash width and progress logger.
      ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform, java.io.File tempDir, it.unimi.dsi.logging.ProgressLogger pl)
      Deprecated.
      Creates a chunked hash store with given transformation strategy and progress logger.
    • Method Summary

      All Methods Instance Methods Concrete Methods Deprecated Methods 
      Modifier and Type Method Description
      void add​(T o)
      Deprecated.
      Adds an element to this store, associating it with its ordinal position.
      void add​(T o, long value)
      Deprecated.
      Adds an element to this store, associating it with a specified value.
      void addAll​(java.util.Iterator<? extends T> elements)
      Deprecated.
      Adds the elements returned by an iterator to this store, associating them with their ordinal position.
      void addAll​(java.util.Iterator<? extends T> elements, it.unimi.dsi.fastutil.longs.LongIterator values)
      Deprecated.
      Adds the elements returned by an iterator to this store, associating them with specified values.
      void addAll​(java.util.Iterator<? extends T> elements, it.unimi.dsi.fastutil.longs.LongIterator values, boolean requiresValue2CountMap)
      Deprecated.
      Adds the elements returned by an iterator to this store, associating them with specified values, possibly building the associated value frequency map.
      void check()
      Deprecated.
      Checks that this store has no duplicate triples, throwing an exception if this fails to happen.
      void checkAndRetry​(java.lang.Iterable<? extends T> iterable)
      Deprecated.
      Checks that this store has no duplicate triples, and try to rebuild if this fails to happen.
      void checkAndRetry​(java.lang.Iterable<? extends T> iterable, it.unimi.dsi.fastutil.longs.LongIterable values)
      Deprecated.
      Checks that this store has no duplicate triples, and try to rebuild if this fails to happen.
      void clear()
      Deprecated.
      Clears this store.
      void close()
      Deprecated.
      Closes this store, disposing all associated resources.
      void filter​(org.apache.commons.collections4.Predicate<long[]> filter)
      Deprecated.
      Sets a filter for this store.
      protected void finalize()
      Deprecated.
       
      java.util.Iterator<ChunkedHashStore.Chunk> iterator()
      Deprecated.
      Returns an iterator over the chunks of this chunked hash store.
      int log2Chunks​(int log2chunks)
      Deprecated.
      Sets the number of chunks.
      void reset​(long seed)
      Deprecated.
      Resets this store using a new seed.
      long seed()
      Deprecated.
      Return the current seed of this chunked hash store.
      it.unimi.dsi.fastutil.longs.LongBigList signatures​(int signatureWidth, it.unimi.dsi.logging.ProgressLogger pl)
      Deprecated.
      Generate a list of signatures using the lowest bits of the first hash in this store.
      long size()
      Deprecated.
      Returns the size of this store.
      java.io.File tempDir()
      Deprecated.
      Return the temporary directory of this chunked hash store, or null.
      it.unimi.dsi.bits.TransformationStrategy<? super T> transform()
      Deprecated.
      Return the transformation strategy provided at construction time.
      it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap value2FrequencyMap()
      Deprecated.
      Return the current value frequency map.
      • Methods inherited from class java.lang.Object

        clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.lang.Iterable

        forEach, spliterator
    • Field Detail

      • serialVersionUID

        public static final long serialVersionUID
        Deprecated.
        See Also:
        Constant Field Values
      • BUFFER_SIZE

        public static final int BUFFER_SIZE
        Deprecated.
        The size of the output buffers.
        See Also:
        Constant Field Values
      • LOG2_DISK_CHUNKS

        public static final int LOG2_DISK_CHUNKS
        Deprecated.
        The logarithm of the number of physical disk chunks.
        See Also:
        Constant Field Values
      • DISK_CHUNKS

        public static final int DISK_CHUNKS
        Deprecated.
        The number of physical disk chunks.
        See Also:
        Constant Field Values
      • DISK_CHUNKS_SHIFT

        public static final int DISK_CHUNKS_SHIFT
        Deprecated.
        The shift for physical disk chunks.
        See Also:
        Constant Field Values
      • size

        protected long size
        Deprecated.
        The number of elements ever added.
      • filteredSize

        protected long filteredSize
        Deprecated.
        The number of elements that pass the current filter, or -1 we it must be recomputed.
      • seed

        protected long seed
        Deprecated.
        The seed used to generate the hash triples.
    • Constructor Detail

      • ChunkedHashStore

        public ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform)
                         throws java.io.IOException
        Deprecated.
        Creates a chunked hash store with given transformation strategy.
        Parameters:
        transform - a transformation strategy for the elements.
        Throws:
        java.io.IOException
      • ChunkedHashStore

        public ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform,
                                java.io.File tempDir)
                         throws java.io.IOException
        Deprecated.
        Creates a chunked hash store with given transformation strategy and temporary file directory.
        Parameters:
        transform - a transformation strategy for the elements.
        tempDir - a temporary directory for the store files, or null for the current directory.
        Throws:
        java.io.IOException
      • ChunkedHashStore

        public ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform,
                                it.unimi.dsi.logging.ProgressLogger pl)
                         throws java.io.IOException
        Deprecated.
        Creates a chunked hash store with given transformation strategy.
        Parameters:
        transform - a transformation strategy for the elements.
        pl - a progress logger, or null.
        Throws:
        java.io.IOException
      • ChunkedHashStore

        public ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform,
                                java.io.File tempDir,
                                it.unimi.dsi.logging.ProgressLogger pl)
                         throws java.io.IOException
        Deprecated.
        Creates a chunked hash store with given transformation strategy and progress logger.
        Parameters:
        transform - a transformation strategy for the elements.
        tempDir - a temporary directory for the store files, or null for the current directory.
        pl - a progress logger, or null.
        Throws:
        java.io.IOException
      • ChunkedHashStore

        public ChunkedHashStore​(it.unimi.dsi.bits.TransformationStrategy<? super T> transform,
                                java.io.File tempDir,
                                int hashWidthOrCountValues,
                                it.unimi.dsi.logging.ProgressLogger pl)
                         throws java.io.IOException
        Deprecated.
        Creates a chunked hash store with given transformation strategy, hash width and progress logger.
        Parameters:
        transform - a transformation strategy for the elements.
        tempDir - a temporary directory for the store files, or null for the current directory.
        hashWidthOrCountValues - if positive, no associated data is saved in the store: ChunkedHashStore.Chunk.data(long) will return this many lower bits of the first of the three hashes associated with the key; zero, values are stored; if negative, values are stored and a map from values to their frequency is computed.
        pl - a progress logger, or null.
        Throws:
        java.io.IOException
    • Method Detail

      • seed

        public long seed()
        Deprecated.
        Return the current seed of this chunked hash store. After calling this method, no reset(long) will be allowed (unless the store is cleared).
        Returns:
        the current seed of this chunked hash store.
      • tempDir

        public java.io.File tempDir()
        Deprecated.
        Return the temporary directory of this chunked hash store, or null.
        Returns:
        the temporary directory of this chunked hash store, or null.
      • transform

        public it.unimi.dsi.bits.TransformationStrategy<? super T> transform()
        Deprecated.
        Return the transformation strategy provided at construction time.
        Returns:
        the transformation strategy provided at construction time.
      • add

        public void add​(T o,
                        long value)
                 throws java.io.IOException
        Deprecated.
        Adds an element to this store, associating it with a specified value.
        Parameters:
        o - the element to be added.
        value - the associated value.
        Throws:
        java.io.IOException
      • add

        public void add​(T o)
                 throws java.io.IOException
        Deprecated.
        Adds an element to this store, associating it with its ordinal position.
        Parameters:
        o - the element to be added.
        Throws:
        java.io.IOException
      • addAll

        public void addAll​(java.util.Iterator<? extends T> elements,
                           it.unimi.dsi.fastutil.longs.LongIterator values,
                           boolean requiresValue2CountMap)
                    throws java.io.IOException
        Deprecated.
        Adds the elements returned by an iterator to this store, associating them with specified values, possibly building the associated value frequency map.
        Parameters:
        elements - an iterator returning elements.
        values - an iterator on values parallel to elements.
        requiresValue2CountMap - whether to build the value frequency map (associating with each value its frequency).
        Throws:
        java.io.IOException
      • addAll

        public void addAll​(java.util.Iterator<? extends T> elements,
                           it.unimi.dsi.fastutil.longs.LongIterator values)
                    throws java.io.IOException
        Deprecated.
        Adds the elements returned by an iterator to this store, associating them with specified values.
        Parameters:
        elements - an iterator returning elements.
        values - an iterator on values parallel to elements.
        Throws:
        java.io.IOException
      • addAll

        public void addAll​(java.util.Iterator<? extends T> elements)
                    throws java.io.IOException
        Deprecated.
        Adds the elements returned by an iterator to this store, associating them with their ordinal position.
        Parameters:
        elements - an iterator returning elements.
        Throws:
        java.io.IOException
      • size

        public long size()
                  throws java.io.IOException
        Deprecated.
        Returns the size of this store. Note that if you set up a filter, the first call to this method will require a scan to the whole store.
        Returns:
        the number of (possibly filtered) triples of this store.
        Throws:
        java.io.IOException
      • clear

        public void clear()
                   throws java.io.IOException
        Deprecated.
        Clears this store. After a call to this method, the store can be reused.
        Throws:
        java.io.IOException
      • value2FrequencyMap

        public it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap value2FrequencyMap()
        Deprecated.
        Return the current value frequency map.
        Returns:
        the current value frequency map.
        Throws:
        java.lang.IllegalStateException - if this chunked hash store does not contain a value frequency map.
      • finalize

        protected void finalize()
                         throws java.lang.Throwable
        Deprecated.
        Overrides:
        finalize in class java.lang.Object
        Throws:
        java.lang.Throwable
      • close

        public void close()
                   throws java.io.IOException
        Deprecated.
        Closes this store, disposing all associated resources.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • reset

        public void reset​(long seed)
                   throws java.io.IOException
        Deprecated.
        Resets this store using a new seed. All accumulated data are cleared, and a new seed is reinstated.
        Parameters:
        seed - the new seed.
        Throws:
        java.lang.IllegalStateException - if this store was locked by a call to seed(), and never cleared thereafter.
        java.io.IOException
      • checkAndRetry

        public void checkAndRetry​(java.lang.Iterable<? extends T> iterable,
                                  it.unimi.dsi.fastutil.longs.LongIterable values)
                           throws java.io.IOException
        Deprecated.
        Checks that this store has no duplicate triples, and try to rebuild if this fails to happen.
        Parameters:
        iterable - the elements with which the store will be refilled if there are duplicate triples.
        values - the values that will be associated with the elements returned by iterable.
        Throws:
        java.lang.IllegalArgumentException - if after a few trials the store still contains duplicate triples.
        java.io.IOException
      • checkAndRetry

        public void checkAndRetry​(java.lang.Iterable<? extends T> iterable)
                           throws java.io.IOException
        Deprecated.
        Checks that this store has no duplicate triples, and try to rebuild if this fails to happen.

        Warning: the actions are executed exactly in the specified order—first check, then retry. If you invoke this method on an empty store you'll get a checked empty store.

        Parameters:
        iterable - the elements with which the store will be refilled if there are duplicate triples.
        Throws:
        java.lang.IllegalArgumentException - if after a few trials the store still contains duplicate triples.
        java.io.IOException
      • signatures

        public it.unimi.dsi.fastutil.longs.LongBigList signatures​(int signatureWidth,
                                                                  it.unimi.dsi.logging.ProgressLogger pl)
                                                           throws java.io.IOException
        Deprecated.
        Generate a list of signatures using the lowest bits of the first hash in this store.

        For this method to work, this store must contain ranks.

        Parameters:
        signatureWidth - the width in bits of the signatures.
        pl - a progress logger.
        Throws:
        java.io.IOException
      • log2Chunks

        public int log2Chunks​(int log2chunks)
        Deprecated.
        Sets the number of chunks.

        Once the store is filled, you must call this method to set the number of chunks. The store will take care of merging or fragmenting disk chunks to get exactly the desired chunks.

        Parameters:
        log2chunks - the base-2 logarithm of the number of chunks.
        Returns:
        the shift to be applied to the first hash of a triple to get the chunk number (see the introduction).
      • filter

        public void filter​(org.apache.commons.collections4.Predicate<long[]> filter)
        Deprecated.
        Sets a filter for this store.
        Parameters:
        filter - a predicate that will be used to filter triples.
      • iterator

        public java.util.Iterator<ChunkedHashStore.Chunk> iterator()
        Deprecated.
        Returns an iterator over the chunks of this chunked hash store.

        Note that at each iteration part of the state of this chunked hash store is reused. Thus, after each call to next() the previously returned ChunkedHashStore.Chunk will be no longer valid. Please use the provided copy constructor if you need to process in parallel several chunks.

        Specified by:
        iterator in interface java.lang.Iterable<T>
        Returns:
        an iterator over the chunks of this chunked hash store.