Class PermutedFrontCodedStringList

java.lang.Object
java.util.AbstractCollection<CharSequence>
it.unimi.dsi.fastutil.objects.AbstractObjectCollection<CharSequence>
it.unimi.dsi.fastutil.objects.AbstractObjectList<CharSequence>
it.unimi.dsi.util.PermutedFrontCodedStringList
All Implemented Interfaces:
it.unimi.dsi.fastutil.objects.ObjectCollection<CharSequence>, it.unimi.dsi.fastutil.objects.ObjectIterable<CharSequence>, it.unimi.dsi.fastutil.objects.ObjectList<CharSequence>, it.unimi.dsi.fastutil.Stack<CharSequence>, Serializable, Comparable<List<? extends CharSequence>>, Iterable<CharSequence>, Collection<CharSequence>, List<CharSequence>, SequencedCollection<CharSequence>

public class PermutedFrontCodedStringList extends it.unimi.dsi.fastutil.objects.AbstractObjectList<CharSequence> implements Serializable
A FrontCodedStringList whose indices are permuted.

It may happen that a list of strings compresses very well using front coding, but alphabetical order is not the order in which you need the strings. Instances of this class wrap an instance of FrontCodedStringList together with a permutation π: a request for index i actually returns the string with index π(i) in the underlying list.
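The index translation described above can be illustrated with a minimal sketch that uses plain arrays instead of a real FrontCodedStringList. The class name PermutedLookupSketch and its get helper are hypothetical and are not part of the library; they only mirror the idea of returning the element of the sorted list at position π(i).

```java
public class PermutedLookupSketch {
    /** Hypothetical helper: returns the string at permuted position i, that is, sorted[permutation[i]]. */
    public static String get(String[] sorted, int[] permutation, int i) {
        return sorted[permutation[i]];
    }

    public static void main(String[] args) {
        // Assumed original (unsorted) order of the strings: "pear", "apple", "fig".
        // The array below stands in for the sorted, front-coded list.
        String[] sorted = { "apple", "fig", "pear" };
        // permutation[i] is the position in the sorted array of the i-th original string.
        int[] permutation = { 2, 0, 1 };
        for (int i = 0; i < permutation.length; i++) {
            System.out.println(i + " -> " + get(sorted, permutation, i));
        }
    }
}
```

With these sample values, index 0 resolves to "pear", index 1 to "apple" and index 2 to "fig", which is the original order of the strings.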

If you start from a newline-delimited, unsorted list of UTF-8 strings, the simplest way to build an instance of this class is to obtain a front-coded string list and a permutation with a simple UN*X pipe (which also avoids storing the sorted strings):

 nl -v0 -nln | sort -k2 | tee >(cut -f1 >perm.txt) \
        | cut -f2 | java it.unimi.dsi.util.FrontCodedStringList tmp-lex.fcl
 
The above command reads a list of strings from standard input, writes to perm.txt the original index of each string in sorted order, and creates a front-coded string list tmp-lex.fcl containing the sorted list of strings.

Important: you must use the byte-by-byte collation order; in UN*X, make sure that LC_COLLATE=C. Otherwise, sorting will be an order of magnitude slower and compression will be worse.

The file perm.txt now contains the permutation that you have to pass to this class, provided that you use the -i option. The last step is:

 java it.unimi.dsi.util.PermutedFrontCodedStringList -i -t tmp-lex.fcl perm.txt your.fcl
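The role of the -i option can be understood with a small sketch. It assumes, based on the pipeline above, that perm.txt maps each sorted position to the original line number, so the permutation has to be inverted before it can map an original index to a position in the sorted list. The class InvertPermutationSketch and its methods are hypothetical illustrations, not library code, and String.compareTo is used as a stand-in for byte-by-byte order (the two agree on ASCII input).

```java
import java.util.Arrays;

public class InvertPermutationSketch {
    /** Mimics the pipeline: for each sorted position, returns the original index of the string found there. */
    public static int[] sortedToOriginal(String[] strings) {
        Integer[] idx = new Integer[strings.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort the indices by the strings they point to.
        Arrays.sort(idx, (a, b) -> strings[a].compareTo(strings[b]));
        int[] result = new int[idx.length];
        for (int i = 0; i < idx.length; i++) result[i] = idx[i];
        return result;
    }

    /** Inverts a permutation: if perm[j] == i, then the result maps i to j. */
    public static int[] invert(int[] perm) {
        int[] inverse = new int[perm.length];
        for (int j = 0; j < perm.length; j++) inverse[perm[j]] = j;
        return inverse;
    }

    public static void main(String[] args) {
        String[] original = { "pear", "apple", "fig" };
        int[] perm = sortedToOriginal(original); // plays the role of perm.txt
        int[] inverse = invert(perm);            // what inversion would produce
        System.out.println(Arrays.toString(perm));    // prints [1, 2, 0]
        System.out.println(Arrays.toString(inverse)); // prints [2, 0, 1]
    }
}
```

Here the inverse maps original index 0 ("pear") to position 2 of the sorted list, which is the mapping needed to look up strings by their original position.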
 
  • Nested Class Summary

    Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList

    it.unimi.dsi.fastutil.objects.AbstractObjectList.ObjectRandomAccessSubList<K>, it.unimi.dsi.fastutil.objects.AbstractObjectList.ObjectSubList<K>
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected final FrontCodedStringList
    frontCodedStringList
    The underlying front-coded string list.
    protected final int[]
    permutation
    The permutation.
    static final long
    serialVersionUID
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    PermutedFrontCodedStringList(FrontCodedStringList frontCodedStringList, int[] permutation)
    Creates a new permuted front-coded string list using a given front-coded string list and permutation.
  • Method Summary

    Modifier and Type
    Method
    Description
    get(int index)
     
    void
    get(int index, MutableString s)
    Returns the element at the specified position in this front-coded list by storing it in a mutable string.
    it.unimi.dsi.fastutil.objects.ObjectListIterator<CharSequence>
    listIterator(int k)
     
    static void
    main(String[] arg)
     
    int
    size()
     

    Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList

    add, add, addAll, addAll, addElements, addElements, clear, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, forEach, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, peek, pop, push, remove, removeElements, set, setElements, size, subList, toArray, toArray, top, toString

    Methods inherited from class java.util.AbstractCollection

    containsAll, isEmpty, remove, removeAll, retainAll

    Methods inherited from class java.lang.Object

    clone, finalize, getClass, notify, notifyAll, wait, wait, wait

    Methods inherited from interface java.util.Collection

    parallelStream, removeIf, stream, toArray

    Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectCollection

    spliterator

    Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectList

    addAll, addAll, setElements, setElements, sort, spliterator, unstableSort

    Methods inherited from interface it.unimi.dsi.fastutil.Stack

    isEmpty
  • Field Details

    • serialVersionUID

      public static final long serialVersionUID
    • frontCodedStringList

      protected final FrontCodedStringList frontCodedStringList
      The underlying front-coded string list.
    • permutation

      protected final int[] permutation
      The permutation.
  • Constructor Details

    • PermutedFrontCodedStringList

      public PermutedFrontCodedStringList(FrontCodedStringList frontCodedStringList, int[] permutation)
      Creates a new permuted front-coded string list using a given front-coded string list and permutation.
      Parameters:
      frontCodedStringList - the underlying front-coded string list.
      permutation - the underlying permutation.
  • Method Details