Class ImmutableExternalPrefixMap

  • All Implemented Interfaces:
    PrefixMap<MutableString>, StringMap<MutableString>, it.unimi.dsi.fastutil.Function<java.lang.CharSequence,​java.lang.Long>, it.unimi.dsi.fastutil.objects.Object2LongFunction<java.lang.CharSequence>, it.unimi.dsi.fastutil.Size64, java.io.Serializable, java.util.function.Function<java.lang.CharSequence,​java.lang.Long>, java.util.function.ToLongFunction<java.lang.CharSequence>

    public class ImmutableExternalPrefixMap
    extends AbstractPrefixMap
    implements java.io.Serializable
    An immutable prefix map mostly stored in external memory.
    Since:
    2.0
    Author:
    Sebastiano Vigna
    See Also:
    ImmutableExternalPrefixMap, Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected long[][] blockOffset
      A big array parallel to blockStart giving the offset, in blocks, in the dump file of the corresponding word in blockStart.
      protected long blockSize
      The block size of this map (in bits).
      protected long[][] blockStart
      The index of the first word in each block, plus an additional entry containing Function.size().
      protected it.unimi.dsi.fastutil.chars.Char2IntOpenHashMap char2symbol
      A map from characters to symbols of the coder.
      protected Decoder decoder
      A decoder used to read data from the dump stream.
      protected InputBitStream dumpStream
      A reference to the dump stream.
      protected ImmutableBinaryTrie<java.lang.CharSequence> intervalApproximator
      The in-memory data structure used to approximate intervals.
      protected boolean iteratorIsUsable
      If true, the creation of the last DumpStreamIterator was not followed by a call to any get method.
      protected boolean selfContained
      Whether this map is self-contained.
      static long serialVersionUID  
      protected long size
      The number of terms in this map.
      static int STD_BLOCK_SIZE
      The standard block size (in bytes).
      protected char[] symbol2char
      A map (given by an array) from symbols in the coder to characters.
      • Fields inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction

        defRetValue
    • Constructor Summary

      Constructors 
      Constructor Description
      ImmutableExternalPrefixMap​(java.lang.Iterable<? extends java.lang.CharSequence> terms)
      Creates an external prefix map with block size STD_BLOCK_SIZE.
      ImmutableExternalPrefixMap​(java.lang.Iterable<? extends java.lang.CharSequence> terms, int blockSizeInBytes)
      Creates an external prefix map with specified block size.
      ImmutableExternalPrefixMap​(java.lang.Iterable<? extends java.lang.CharSequence> terms, int blockSizeInBytes, java.lang.CharSequence dumpStreamFilename)
      Creates an external prefix map with specified block size and dump stream.
      ImmutableExternalPrefixMap​(java.lang.Iterable<? extends java.lang.CharSequence> terms, java.lang.CharSequence dumpStreamFilename)
      Creates an external prefix map with block size STD_BLOCK_SIZE and specified dump stream.
    • Method Summary

      Methods 
      Modifier and Type Method Description
      boolean containsKey​(java.lang.Object term)  
      LongInterval getInterval​(java.lang.CharSequence prefix)
      Returns the range of strings having a given prefix.
      long getLong​(java.lang.Object o)  
      protected MutableString getTerm​(long index, MutableString s)
      Writes a string specified by index into a MutableString.
      it.unimi.dsi.fastutil.objects.ObjectIterator<java.lang.CharSequence> iterator()
      Returns an iterator over the map.
      static void main​(java.lang.String[] arg)  
      void setDumpStream​(InputBitStream dumpStream)
      Sets the dump stream of this external prefix map to a given input bit stream.
      void setDumpStream​(java.lang.CharSequence dumpStreamFilename)
      Sets the dump stream of this external prefix map to a given filename.
      long size64()
      Returns the intended number of keys in this function, or -1 if no such number exists.
      • Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction

        defaultReturnValue, defaultReturnValue
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface it.unimi.dsi.fastutil.Function

        apply, clear
      • Methods inherited from interface java.util.function.Function

        compose
      • Methods inherited from interface it.unimi.dsi.fastutil.objects.Object2LongFunction

        andThen, andThenByte, andThenChar, andThenDouble, andThenFloat, andThenInt, andThenLong, andThenObject, andThenReference, andThenShort, applyAsLong, composeByte, composeChar, composeDouble, composeFloat, composeInt, composeLong, composeObject, composeReference, composeShort, defaultReturnValue, defaultReturnValue, get, getOrDefault, getOrDefault, put, put, remove, removeLong
      • Methods inherited from interface it.unimi.dsi.fastutil.Size64

        size
      • Methods inherited from interface it.unimi.dsi.big.util.StringMap

        size
    • Field Detail

      • STD_BLOCK_SIZE

        public static final int STD_BLOCK_SIZE
        The standard block size (in bytes).
        See Also:
        Constant Field Values
      • intervalApproximator

        protected final ImmutableBinaryTrie<java.lang.CharSequence> intervalApproximator
        The in-memory data structure used to approximate intervals.
      • blockSize

        protected final long blockSize
        The block size of this map (in bits).
      • decoder

        protected final Decoder decoder
        A decoder used to read data from the dump stream.
      • symbol2char

        protected final char[] symbol2char
        A map (given by an array) from symbols in the coder to characters.
      • char2symbol

        protected final it.unimi.dsi.fastutil.chars.Char2IntOpenHashMap char2symbol
        A map from characters to symbols of the coder.
      • size

        protected final long size
        The number of terms in this map.
      • blockStart

        protected final long[][] blockStart
        The index of the first word in each block, plus an additional entry containing Function.size().
      • blockOffset

        protected final long[][] blockOffset
        A big array parallel to blockStart giving the offset, in blocks, in the dump file of the corresponding word in blockStart. If there are no overflows, this will just be an initial segment of the natural numbers, but overflows cause jumps.
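        As a rough illustration of how blockStart, blockOffset and blockSize could work together, the sketch below finds the block holding a given term index and converts that block's offset into a bit position in the dump file. It is not the library's code: plain long[] arrays stand in for the big arrays, the names BlockLookupSketch, blockOf and bitPosition and all sample values are hypothetical, and it assumes that a block's bit position is its offset in blocks multiplied by the block size in bits.

```java
final class BlockLookupSketch {
    /**
     * Returns the block containing the term with the given index.
     * blockStart[i] is the index of the first term in block i; the last
     * entry is the total number of terms and acts as a sentinel.
     */
    static int blockOf(long[] blockStart, long index) {
        long size = blockStart[blockStart.length - 1];
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index " + index + " is not in [0.." + size + ")");
        }
        // Binary search for the last block whose first term index is <= index.
        int lo = 0, hi = blockStart.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (blockStart[mid] <= index) lo = mid;
            else hi = mid - 1;
        }
        return lo;
    }

    /** Assumed conversion: offset in blocks times block size in bits. */
    static long bitPosition(long[] blockOffset, long blockSizeInBits, int block) {
        return blockOffset[block] * blockSizeInBits;
    }

    public static void main(String[] args) {
        // Hypothetical map with 10 terms in three blocks; the second block
        // overflowed, so the third block starts at offset 3 rather than 2.
        long[] blockStart = {0, 3, 7, 10};
        long[] blockOffset = {0, 1, 3};
        long blockSizeInBits = 1024 * 8L; // hypothetical block of 1024 bytes
        int block = blockOf(blockStart, 5);
        System.out.println(block + " " + bitPosition(blockOffset, blockSizeInBits, block));
    }
}
```

        The jump from offset 1 to offset 3 in the sample data mirrors the overflow case described above: without overflows, blockOffset would simply be 0, 1, 2.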
      • selfContained

        protected final boolean selfContained
        Whether this map is self-contained.
      • iteratorIsUsable

        protected transient boolean iteratorIsUsable
        If true, the creation of the last DumpStreamIterator was not followed by a call to any get method.
      • dumpStream

        protected transient InputBitStream dumpStream
        A reference to the dump stream.
    • Constructor Detail

      • ImmutableExternalPrefixMap

        public ImmutableExternalPrefixMap​(java.lang.Iterable<? extends java.lang.CharSequence> terms,
                                          int blockSizeInBytes,
                                          java.lang.CharSequence dumpStreamFilename)
                                   throws java.io.IOException
        Creates an external prefix map with specified block size and dump stream.

        This constructor does not assume that the CharSequence instances returned by terms.iterator() will be distinct objects (the iterator may return the same mutable instance each time). Thus, it can be safely used with FileLinesMutableStringIterable.

        Parameters:
        terms - an iterable whose iterator will enumerate in lexicographical order the terms for the map.
        blockSizeInBytes - the block size (in bytes).
        dumpStreamFilename - the name of the dump stream, or null for a self-contained map.
        Throws:
        java.io.IOException
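        The remark about distinct instances matters when an iterator refills and returns one mutable buffer on every call. The sketch below, which uses only the standard library and is not the library's code, contrasts a consumer that copies each term as it arrives with one that merely keeps the references. The names ReusedBufferSketch, reusing, copyEach and keepReferences are hypothetical, and a StringBuilder stands in for a reused mutable string.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReusedBufferSketch {
    /** Returns an iterable whose iterator hands back the same StringBuilder, refilled with each term. */
    static Iterable<CharSequence> reusing(String... terms) {
        return () -> new Iterator<CharSequence>() {
            private final StringBuilder buffer = new StringBuilder();
            private int position;

            @Override
            public boolean hasNext() {
                return position < terms.length;
            }

            @Override
            public CharSequence next() {
                if (!hasNext()) throw new NoSuchElementException();
                buffer.setLength(0);
                buffer.append(terms[position++]);
                return buffer; // same object on every call
            }
        };
    }

    /** Safe consumer: copies the content of each term before advancing. */
    static List<String> copyEach(Iterable<? extends CharSequence> terms) {
        List<String> out = new ArrayList<>();
        for (CharSequence t : terms) out.add(t.toString());
        return out;
    }

    /** Unsafe consumer: stores references, so every entry aliases the last term. */
    static List<CharSequence> keepReferences(Iterable<? extends CharSequence> terms) {
        List<CharSequence> out = new ArrayList<>();
        for (CharSequence t : terms) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(copyEach(reusing("ant", "bee")));
        System.out.println(keepReferences(reusing("ant", "bee")));
    }
}
```

        A constructor that behaves like copyEach, consuming each term before asking for the next, is unaffected by the reuse; one that behaves like keepReferences would not be.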
      • ImmutableExternalPrefixMap

        public ImmutableExternalPrefixMap​(java.lang.Iterable<? extends java.lang.CharSequence> terms,
                                          java.lang.CharSequence dumpStreamFilename)
                                   throws java.io.IOException
        Creates an external prefix map with block size STD_BLOCK_SIZE and specified dump stream.

        This constructor does not assume that the CharSequence instances returned by terms.iterator() will be distinct objects (the iterator may return the same mutable instance each time). Thus, it can be safely used with FileLinesMutableStringIterable.

        Parameters:
        terms - an iterable whose iterator will enumerate in lexicographical order the terms for the map.
        dumpStreamFilename - the name of the dump stream, or null for a self-contained map.
        Throws:
        java.io.IOException
      • ImmutableExternalPrefixMap

        public ImmutableExternalPrefixMap​(java.lang.Iterable<? extends java.lang.CharSequence> terms,
                                          int blockSizeInBytes)
                                   throws java.io.IOException
        Creates an external prefix map with specified block size.

        This constructor does not assume that the CharSequence instances returned by terms.iterator() will be distinct objects (the iterator may return the same mutable instance each time). Thus, it can be safely used with FileLinesMutableStringIterable.

        Parameters:
        terms - an iterable whose iterator will enumerate in lexicographical order the terms for the map.
        blockSizeInBytes - the block size (in bytes).
        Throws:
        java.io.IOException
      • ImmutableExternalPrefixMap

        public ImmutableExternalPrefixMap​(java.lang.Iterable<? extends java.lang.CharSequence> terms)
                                   throws java.io.IOException
        Creates an external prefix map with block size STD_BLOCK_SIZE.

        This constructor does not assume that the CharSequence instances returned by terms.iterator() will be distinct objects (the iterator may return the same mutable instance each time). Thus, it can be safely used with FileLinesMutableStringIterable.

        Parameters:
        terms - an iterable whose iterator will enumerate in lexicographical order the terms for the map.
        Throws:
        java.io.IOException
    • Method Detail

      • setDumpStream

        public void setDumpStream​(java.lang.CharSequence dumpStreamFilename)
                           throws java.io.FileNotFoundException
        Sets the dump stream of this external prefix map to a given filename.

        This method sets the dump file used by this map, and should only be called after deserialisation, providing exactly the file generated at creation time. Essentially anything can happen if you do not follow the rules.

        Note that this method will attempt to close the old stream, if present.

        Parameters:
        dumpStreamFilename - the name of the dump file.
        Throws:
        java.io.FileNotFoundException
        See Also:
        setDumpStream(InputBitStream)
      • setDumpStream

        public void setDumpStream​(InputBitStream dumpStream)
        Sets the dump stream of this external prefix map to a given input bit stream.

        This method sets the dump file used by this map, and should only be called after deserialisation, providing a repositionable stream containing exactly the file generated at creation time. Essentially anything can happen if you do not follow the rules.

        Using this method you can load an external prefix map into core memory, keeping the compactness of the data structure while gaining much more speed.

        Note that this method will attempt to close the old stream, if present.

        Parameters:
        dumpStream - a repositionable input bit stream containing exactly the dump stream generated at creation time.
        See Also:
        setDumpStream(CharSequence)
      • getInterval

        public LongInterval getInterval​(java.lang.CharSequence prefix)
        Description copied from class: AbstractPrefixMap
        Returns the range of strings having a given prefix.
        Specified by:
        getInterval in class AbstractPrefixMap
        Parameters:
        prefix - a prefix.
        Returns:
        the corresponding range of strings as an interval.
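        To make the notion of a prefix interval concrete, here is a minimal in-memory sketch: over a lexicographically sorted list of terms, the terms sharing a prefix occupy one contiguous range of positions. This is not how this class computes intervals (it works on the dump stream with the help of intervalApproximator). The names PrefixIntervalSketch, lowerBound and interval are hypothetical, and an inclusive {left, right} pair of a long[] stands in for LongInterval, with an empty array meaning that no term matches.

```java
import java.util.Arrays;
import java.util.List;

final class PrefixIntervalSketch {
    /** Returns the position of the first term that is >= key in lexicographical order. */
    static int lowerBound(List<String> sortedTerms, String key) {
        int lo = 0, hi = sortedTerms.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedTerms.get(mid).compareTo(key) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    /** Returns {left, right} (inclusive) for the terms starting with prefix, or an empty array. */
    static long[] interval(List<String> sortedTerms, String prefix) {
        int left = lowerBound(sortedTerms, prefix);
        int right = left;
        // Terms sharing the prefix are contiguous, so scan until the first mismatch.
        while (right < sortedTerms.size() && sortedTerms.get(right).startsWith(prefix)) right++;
        return right == left ? new long[0] : new long[] {left, right - 1};
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("car", "card", "care", "cat", "dog");
        System.out.println(Arrays.toString(interval(terms, "car")));
        System.out.println(Arrays.toString(interval(terms, "x")));
    }
}
```

        The linear scan for the right end keeps the sketch short; a second binary search would serve the same purpose on long ranges.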
      • containsKey

        public boolean containsKey​(java.lang.Object term)
        Specified by:
        containsKey in interface it.unimi.dsi.fastutil.Function<java.lang.CharSequence,​java.lang.Long>
      • getLong

        public long getLong​(java.lang.Object o)
        Specified by:
        getLong in interface it.unimi.dsi.fastutil.objects.Object2LongFunction<java.lang.CharSequence>
      • iterator

        public it.unimi.dsi.fastutil.objects.ObjectIterator<java.lang.CharSequence> iterator()
        Returns an iterator over the map.

        The iterator returned by this method scans the dump stream directly.

        Note that the returned iterator uses the same stream as all get methods. Calling such methods while the iterator is being used will produce an IllegalStateException.

        Returns:
        an iterator over the map that just scans the dump stream.
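        The interaction between the iterator and the get methods can be pictured with a toy class in which both share a single cursor, in the spirit of the iteratorIsUsable field. This is only a sketch under stated assumptions, not the library's implementation: SharedCursorSketch and its members are hypothetical, an array stands in for the dump stream, and the check is assumed to happen when the iterator is next used.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

final class SharedCursorSketch implements Iterable<String> {
    private final String[] terms;
    private int cursor;               // stand-in for the position of the single shared stream
    private boolean iteratorIsUsable; // cleared by any call to get()

    SharedCursorSketch(String... terms) {
        this.terms = terms.clone();
    }

    /** Random access: moves the shared cursor, which invalidates a running iterator. */
    String get(int index) {
        iteratorIsUsable = false;
        cursor = index;
        return terms[cursor];
    }

    @Override
    public Iterator<String> iterator() {
        cursor = 0;
        iteratorIsUsable = true;
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                if (!iteratorIsUsable) {
                    throw new IllegalStateException("get() was called after this iterator was created");
                }
                return cursor < terms.length;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return terms[cursor++];
            }
        };
    }

    public static void main(String[] args) {
        SharedCursorSketch map = new SharedCursorSketch("ant", "bee", "cat");
        Iterator<String> it = map.iterator();
        System.out.println(it.next());
        map.get(2); // moves the shared cursor
        try {
            it.next();
        } catch (IllegalStateException e) {
            System.out.println("iterator invalidated: " + e.getMessage());
        }
    }
}
```

        In practice this means finishing an iteration, or discarding the iterator, before calling any get method on the same map.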
      • size64

        public long size64()
        Description copied from interface: StringMap
        Returns the intended number of keys in this function, or -1 if no such number exists.

        Most function implementations will have some knowledge of the intended number of keys in their domain. In some cases, however, this might not be possible. This default implementation, in particular, returns -1.

        Specified by:
        size64 in interface it.unimi.dsi.fastutil.Size64
        Specified by:
        size64 in interface StringMap<MutableString>
        Returns:
        the intended number of keys in this function, or -1 if that number is not available.
      • main

        public static void main​(java.lang.String[] arg)
                         throws java.lang.ClassNotFoundException,
                                java.io.IOException,
                                com.martiansoftware.jsap.JSAPException,
                                java.lang.SecurityException,
                                java.lang.NoSuchMethodException
        Throws:
        java.lang.ClassNotFoundException
        java.io.IOException
        com.martiansoftware.jsap.JSAPException
        java.lang.SecurityException
        java.lang.NoSuchMethodException