Class FrontCodedStringList

  • All Implemented Interfaces:
    it.unimi.dsi.fastutil.objects.ObjectCollection<MutableString>, it.unimi.dsi.fastutil.objects.ObjectIterable<MutableString>, it.unimi.dsi.fastutil.objects.ObjectList<MutableString>, it.unimi.dsi.fastutil.Stack<MutableString>, java.io.Serializable, java.lang.Comparable<java.util.List<? extends MutableString>>, java.lang.Iterable<MutableString>, java.util.Collection<MutableString>, java.util.List<MutableString>, java.util.RandomAccess

    public class FrontCodedStringList
    extends it.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString>
    implements java.util.RandomAccess, java.io.Serializable
    Compact storage of strings using front-coding compression (also known as compression by prefix omission).

    This class stores a list of strings using front-coding (also known as prefix-omission) compression; the compression will be reasonable only if the list is sorted, but you could also use instances of this class just as a handy way to manage a large amount of strings. It implements an immutable ObjectList that returns the i-th string (as a MutableString) when the get(int) method is called with argument i. The returned mutable string may be freely modified.

    As a commodity, this class provides a main method that reads from standard input a sequence of newline-separated strings, and writes a corresponding serialized front-coded string list.

    Implementation Details

    To store the list of strings, we use either a UTF-8 coded ByteArrayFrontCodedList, or a CharArrayFrontCodedList, depending on the value of the utf8 parameter at creation time. In the first case, if the strings are ASCII-oriented the resulting array will be much smaller, but access times will increase manifold, as each string must be UTF-8 decoded before being returned.

    See Also:
    Serialized Form
    • Nested Class Summary

      • Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList

        it.unimi.dsi.fastutil.objects.AbstractObjectList.ObjectRandomAccessSubList<K extends java.lang.Object>, it.unimi.dsi.fastutil.objects.AbstractObjectList.ObjectSubList<K extends java.lang.Object>
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList byteFrontCodedList
      The underlying ByteArrayFrontCodedList, or null.
      protected it.unimi.dsi.fastutil.chars.CharArrayFrontCodedList charFrontCodedList
      The underlying CharArrayFrontCodedList, or null.
      static long serialVersionUID  
      protected boolean utf8
      Whether this front-coded list is UTF-8 encoded.
    • Constructor Summary

      Constructors 
      Constructor Description
      FrontCodedStringList​(java.util.Collection<? extends java.lang.CharSequence> c, int ratio, boolean utf8)
      Creates a new front-coded string list containing the character sequences contained in the given collection.
      FrontCodedStringList​(java.util.Iterator<? extends java.lang.CharSequence> words, int ratio, boolean utf8)
      Creates a new front-coded string list containing the character sequences returned by the given iterator.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      protected static char[] byte2Char​(byte[] a, char[] s)  
      protected static int countUTF8Chars​(byte[] a)  
      MutableString get​(int index)
      Returns the element at the specified position in this front-coded string list as a mutable string.
      void get​(int index, MutableString s)
      Returns the element at the specified position in this front-coded string list by storing it in a mutable string.
      it.unimi.dsi.fastutil.objects.ObjectListIterator<MutableString> listIterator​(int k)  
      static void main​(java.lang.String[] arg)  
      int ratio()
      Returns the ratio of the underlying front-coded list.
      int size()  
      boolean utf8()
      Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.
      • Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList

        add, add, addAll, addAll, addElements, addElements, clear, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, forEach, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, peek, pop, push, remove, removeElements, set, setElements, size, subList, toArray, toArray, top, toString
      • Methods inherited from class java.util.AbstractCollection

        containsAll, isEmpty, remove, removeAll, retainAll
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.util.Collection

        parallelStream, removeIf, stream, toArray
      • Methods inherited from interface java.util.List

        containsAll, isEmpty, remove, removeAll, replaceAll, retainAll
      • Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectCollection

        spliterator
      • Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectList

        addAll, addAll, setElements, setElements, sort, spliterator, unstableSort
      • Methods inherited from interface it.unimi.dsi.fastutil.Stack

        isEmpty
    • Field Detail

      • byteFrontCodedList

        protected final it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList byteFrontCodedList
        The underlying ByteArrayFrontCodedList, or null.
      • charFrontCodedList

        protected final it.unimi.dsi.fastutil.chars.CharArrayFrontCodedList charFrontCodedList
        The underlying CharArrayFrontCodedList, or null.
      • utf8

        protected final boolean utf8
        Whether this front-coded list is UTF-8 encoded.
    • Constructor Detail

      • FrontCodedStringList

        public FrontCodedStringList​(java.util.Iterator<? extends java.lang.CharSequence> words,
                                    int ratio,
                                    boolean utf8)
        Creates a new front-coded string list containing the character sequences returned by the given iterator.
        Parameters:
        words - an iterator returning character sequences.
        ratio - the desired ratio.
        utf8 - if true, the strings will be stored as UTF-8 byte arrays.
      • FrontCodedStringList

        public FrontCodedStringList​(java.util.Collection<? extends java.lang.CharSequence> c,
                                    int ratio,
                                    boolean utf8)
        Creates a new front-coded string list containing the character sequences contained in the given collection.
        Parameters:
        c - a collection containing character sequences.
        ratio - the desired ratio.
        utf8 - if true, the strings will be stored as UTF-8 byte arrays.
    • Method Detail

      • utf8

        public boolean utf8()
        Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.
        Returns:
        true if this front-coded string list is keeping its data as an array of UTF-8 encoded bytes.
      • ratio

        public int ratio()
        Returns the ratio of the underlying front-coded list.
        Returns:
        the ratio of the underlying front-coded list.
      • get

        public MutableString get​(int index)
        Returns the element at the specified position in this front-coded string list as a mutable string.
        Specified by:
        get in interface java.util.List<MutableString>
        Parameters:
        index - an index in the list.
        Returns:
        a MutableString that will contain the string at the specified position. The string may be freely modified.
      • get

        public void get​(int index,
                        MutableString s)
        Returns the element at the specified position in this front-coded string list by storing it in a mutable string.
        Parameters:
        index - an index in the list.
        s - a mutable string that will contain the string at the specified position.
      • countUTF8Chars

        protected static int countUTF8Chars​(byte[] a)
      • byte2Char

        protected static char[] byte2Char​(byte[] a,
                                          char[] s)
      • listIterator

        public it.unimi.dsi.fastutil.objects.ObjectListIterator<MutableString> listIterator​(int k)
        Specified by:
        listIterator in interface java.util.List<MutableString>
        Specified by:
        listIterator in interface it.unimi.dsi.fastutil.objects.ObjectList<MutableString>
        Overrides:
        listIterator in class it.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString>
      • size

        public int size()
        Specified by:
        size in interface java.util.Collection<MutableString>
        Specified by:
        size in interface java.util.List<MutableString>
        Specified by:
        size in class java.util.AbstractCollection<MutableString>
      • main

        public static void main​(java.lang.String[] arg)
                         throws java.io.IOException,
                                com.martiansoftware.jsap.JSAPException,
                                java.lang.NoSuchMethodException
        Throws:
        java.io.IOException
        com.martiansoftware.jsap.JSAPException
        java.lang.NoSuchMethodException