Class PermutedFrontCodedStringList

  • All Implemented Interfaces:
    it.unimi.dsi.fastutil.objects.ObjectCollection<java.lang.CharSequence>, it.unimi.dsi.fastutil.objects.ObjectIterable<java.lang.CharSequence>, it.unimi.dsi.fastutil.objects.ObjectList<java.lang.CharSequence>, it.unimi.dsi.fastutil.Stack<java.lang.CharSequence>, java.io.Serializable, java.lang.Comparable<java.util.List<? extends java.lang.CharSequence>>, java.lang.Iterable<java.lang.CharSequence>, java.util.Collection<java.lang.CharSequence>, java.util.List<java.lang.CharSequence>

    public class PermutedFrontCodedStringList
    extends it.unimi.dsi.fastutil.objects.AbstractObjectList<java.lang.CharSequence>
    implements java.io.Serializable
    A FrontCodedStringList whose indices are permuted.

    A list of strings may compress very well using front coding even though alphabetical order is not the order in which the strings must be accessed. Instances of this class wrap an instance of FrontCodedStringList together with a permutation π: a request for index i returns the string at index π(i) of the underlying list.
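
    The index mapping described above can be sketched with plain JDK collections. This is only an illustration: the class name PermutedViewSketch is hypothetical, and an ordinary sorted List<String> stands in for the FrontCodedStringList.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in: a read-only view that maps index i to sorted.get(permutation[i]). */
public class PermutedViewSketch extends AbstractList<String> {
    private final List<String> sorted;  // plays the role of the front-coded (sorted) list
    private final int[] permutation;    // permutation[i] = position in sorted of the i-th string

    public PermutedViewSketch(List<String> sorted, int[] permutation) {
        if (sorted.size() != permutation.length)
            throw new IllegalArgumentException("List and permutation must have the same length");
        this.sorted = sorted;
        this.permutation = permutation;
    }

    @Override
    public String get(int index) {
        // A request for index i is answered with the string at index permutation[i].
        return sorted.get(permutation[index]);
    }

    @Override
    public int size() {
        return sorted.size();
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("apple", "banana", "cherry");
        int[] permutation = { 2, 0, 1 }; // the desired order is cherry, apple, banana
        System.out.println(new PermutedViewSketch(sorted, permutation)); // prints [cherry, apple, banana]
    }
}
```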

    If you start from a newline-delimited, unsorted list of UTF-8 strings, the simplest way to build an instance of this class is to obtain a front-coded string list and a permutation with a simple UN*X pipe (which also avoids storing the sorted strings):

     nl -v0 -nln | sort -k2 | tee >(cut -f1 >perm.txt) \
            | cut -f2 | java it.unimi.dsi.util.FrontCodedStringList tmp-lex.fcl
     
    The above command reads a list of strings from standard input, writes the original index of each string, in sorted order, to perm.txt, and creates a front-coded string list tmp-lex.fcl containing the sorted strings.

    Important: you must use the byte-by-byte collation order; in UN*X, make sure that LC_COLLATE=C. Failure to do so will result in sorting that is an order of magnitude slower and in worse compression.
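
    To make "byte-by-byte collation order" concrete, the following sketch compares strings by their UTF-8 bytes taken as unsigned values, which is how sort is expected to behave under LC_COLLATE=C. The class and method names are hypothetical, and Arrays.compareUnsigned requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteOrderSketch {
    /** Compares two strings by their UTF-8 encodings, byte by byte, treating bytes as unsigned. */
    public static int compareBytewise(String a, String b) {
        return Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8),
            b.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String[] words = { "apple", "Zebra", "banana" };
        Arrays.sort(words, ByteOrderSketch::compareBytewise);
        // Uppercase letters have smaller byte values than lowercase ones,
        // so "Zebra" sorts first; a dictionary-style locale would put it last.
        System.out.println(Arrays.toString(words)); // prints [Zebra, apple, banana]
    }
}
```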

    perm.txt now contains the permutation that you must pass to this class, provided that you use the -i option. The last step is just

     java it.unimi.dsi.util.PermutedFrontCodedStringList -i -t tmp-lex.fcl perm.txt your.fcl
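
    The relation between perm.txt and the permutation used for lookups can be checked on a toy input. The sketch below assumes that -i asks the tool to invert the permutation it reads; that reading is an assumption, not something stated above. The pipe stores, for each sorted position, the original line number, whereas a lookup needs, for each original line number, the sorted position. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class SortedIndexSketch {
    /** For each sorted position j, returns the original index of the string placed there (the content of perm.txt). */
    public static int[] sortedIndices(String[] lines) {
        Integer[] idx = new Integer[lines.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // String.compareTo agrees with byte order on ASCII input, which is all this sketch uses.
        Arrays.sort(idx, (a, b) -> lines[a].compareTo(lines[b]));
        int[] result = new int[idx.length];
        for (int j = 0; j < result.length; j++) result[j] = idx[j];
        return result;
    }

    /** Inverts a permutation: if perm[j] == i, then the result maps i to j. */
    public static int[] invert(int[] perm) {
        int[] inverse = new int[perm.length];
        for (int j = 0; j < perm.length; j++) inverse[perm[j]] = j;
        return inverse;
    }

    public static void main(String[] args) {
        String[] lines = { "cherry", "apple", "banana" };
        int[] perm = sortedIndices(lines);
        System.out.println(Arrays.toString(perm));         // prints [1, 2, 0]
        System.out.println(Arrays.toString(invert(perm))); // prints [2, 0, 1]
    }
}
```

    With the inverted permutation [2, 0, 1], index 0 maps to sorted position 2, which holds "cherry", the first line of the original input.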
     
    See Also:
    Serialized Form
    • Nested Class Summary

      • Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList

        it.unimi.dsi.fastutil.objects.AbstractObjectList.ObjectRandomAccessSubList<K extends java.lang.Object>, it.unimi.dsi.fastutil.objects.AbstractObjectList.ObjectSubList<K extends java.lang.Object>
    • Constructor Summary

      Constructors 
      Constructor Description
      PermutedFrontCodedStringList(FrontCodedStringList frontCodedStringList, int[] permutation)
      Creates a new permuted front-coded string list using a given front-coded string list and permutation.
    • Method Summary

      Modifier and Type Method Description
      MutableString get(int index)
      void get(int index, MutableString s)
      Returns the element at the specified position in this front-coded list by storing it in a mutable string.
      it.unimi.dsi.fastutil.objects.ObjectListIterator<java.lang.CharSequence> listIterator(int k)
      static void main(java.lang.String[] arg)
      int size()
      • Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList

        add, add, addAll, addAll, addElements, addElements, clear, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, forEach, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, peek, pop, push, remove, removeElements, set, setElements, size, subList, toArray, toArray, top, toString
      • Methods inherited from class java.util.AbstractCollection

        containsAll, isEmpty, remove, removeAll, retainAll
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.util.Collection

        parallelStream, removeIf, stream, toArray
      • Methods inherited from interface java.util.List

        containsAll, isEmpty, remove, removeAll, replaceAll, retainAll
      • Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectCollection

        spliterator
      • Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectList

        addAll, addAll, setElements, setElements, sort, spliterator, unstableSort
      • Methods inherited from interface it.unimi.dsi.fastutil.Stack

        isEmpty
    • Field Detail

      • frontCodedStringList

        protected final FrontCodedStringList frontCodedStringList
        The underlying front-coded string list.
      • permutation

        protected final int[] permutation
        The permutation.
    • Constructor Detail

      • PermutedFrontCodedStringList

        public PermutedFrontCodedStringList(FrontCodedStringList frontCodedStringList,
                                            int[] permutation)
        Creates a new permuted front-coded string list using a given front-coded string list and permutation.
        Parameters:
        frontCodedStringList - the underlying front-coded string list.
        permutation - the underlying permutation.
    • Method Detail

      • get

        public MutableString get(int index)
        Specified by:
        get in interface java.util.List<java.lang.CharSequence>
      • get

        public void get(int index,
                        MutableString s)
        Returns the element at the specified position in this front-coded list by storing it in a mutable string.
        Parameters:
        index - an index in the list.
        s - a mutable string that will contain the string at the specified position.
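        A plausible use of this two-argument form is to reuse one buffer across many lookups instead of allocating a new string each time. The sketch below shows that pattern with JDK types only: StringBuilder stands in for MutableString, a sorted array stands in for the front-coded list, and the class name is hypothetical.

```java
public class ReusableBufferSketch {
    /** Stores the string at the given (permuted) position into s, replacing its previous content. */
    public static void get(String[] sorted, int[] permutation, int index, StringBuilder s) {
        s.setLength(0);                       // discard whatever the buffer held before
        s.append(sorted[permutation[index]]); // copy the requested string into the buffer
    }

    public static void main(String[] args) {
        String[] sorted = { "apple", "banana", "cherry" };
        int[] permutation = { 2, 0, 1 };
        StringBuilder buffer = new StringBuilder(); // allocated once, reused for every lookup
        for (int i = 0; i < permutation.length; i++) {
            get(sorted, permutation, i, buffer);
            System.out.println(buffer); // prints cherry, then apple, then banana
        }
    }
}
```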
      • size

        public int size()
        Specified by:
        size in interface java.util.Collection<java.lang.CharSequence>
        Specified by:
        size in interface java.util.List<java.lang.CharSequence>
        Specified by:
        size in class java.util.AbstractCollection<java.lang.CharSequence>
      • listIterator

        public it.unimi.dsi.fastutil.objects.ObjectListIterator<java.lang.CharSequence> listIterator(int k)
        Specified by:
        listIterator in interface java.util.List<java.lang.CharSequence>
        Specified by:
        listIterator in interface it.unimi.dsi.fastutil.objects.ObjectList<java.lang.CharSequence>
        Overrides:
        listIterator in class it.unimi.dsi.fastutil.objects.AbstractObjectList<java.lang.CharSequence>
      • main

        public static void main(java.lang.String[] arg)
                         throws java.io.IOException,
                                java.lang.ClassNotFoundException,
                                com.martiansoftware.jsap.JSAPException
        Throws:
        java.io.IOException
        java.lang.ClassNotFoundException
        com.martiansoftware.jsap.JSAPException