Class PermutedFrontCodedStringList
- java.lang.Object
-
- java.util.AbstractCollection<K>
-
- it.unimi.dsi.fastutil.objects.AbstractObjectCollection<K>
-
- it.unimi.dsi.fastutil.objects.AbstractObjectList<java.lang.CharSequence>
-
- it.unimi.dsi.util.PermutedFrontCodedStringList
-
- All Implemented Interfaces:
it.unimi.dsi.fastutil.objects.ObjectCollection<java.lang.CharSequence>
,it.unimi.dsi.fastutil.objects.ObjectIterable<java.lang.CharSequence>
,it.unimi.dsi.fastutil.objects.ObjectList<java.lang.CharSequence>
,it.unimi.dsi.fastutil.Stack<java.lang.CharSequence>
,java.io.Serializable
,java.lang.Comparable<java.util.List<? extends java.lang.CharSequence>>
,java.lang.Iterable<java.lang.CharSequence>
,java.util.Collection<java.lang.CharSequence>
,java.util.List<java.lang.CharSequence>
public class PermutedFrontCodedStringList extends it.unimi.dsi.fastutil.objects.AbstractObjectList<java.lang.CharSequence> implements java.io.Serializable
A FrontCodedStringList whose indices are permuted.
It may happen that a list of strings compresses very well using front coding, but unfortunately alphabetical order is not the right order for the strings in the list. Instances of this class wrap an instance of FrontCodedStringList together with a permutation π: inquiries with index i will actually return the string with index π(i).
In case you start from a newline-delimited non-sorted list of UTF-8 strings, the simplest way to build an instance of this class is obtaining a front-coded string list and a permutation with a simple UN*X pipe (which also avoids storing the sorted strings):
nl -v0 -nln | sort -k2 | tee >(cut -f1 >perm.txt) \
    | cut -f2 | java it.unimi.dsi.util.FrontCodedStringList tmp-lex.fcl
The above command will read a list of strings from standard input, write the list of their indices, in sorted order, to perm.txt, and create a tmp-lex.fcl front-coded string list containing the sorted list of strings.
Important: you must be sure to be using the byte-by-byte collation order; in UN*X, be sure that LC_COLLATE=C. Failure to do so will result in an order-of-magnitude-slower sorting and worse compression.
Now, in perm.txt you will find the permutation that you have to pass to this class (provided that you use the -i option). So the last step is just
java it.unimi.dsi.util.PermutedFrontCodedStringList -i -t tmp-lex.fcl perm.txt your.fcl
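The relationship between the sorted list, the content of perm.txt, and the permutation used for lookups can be sketched with plain arrays. This is only an illustration, not dsiutils code: the class PermutedLookupSketch and its methods are hypothetical, and it assumes that the -i option inverts the permutation read from perm.txt, which the text above implies but does not state. The comparator mimics byte-by-byte collation on UTF-8 and needs Java 9 or later for Arrays.compareUnsigned.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical stand-in illustrating the pipeline with plain arrays; not part of dsiutils. */
class PermutedLookupSketch {

    /** For each position j in byte-wise sorted order, returns the original index of that string. */
    static int[] sortedToOriginal(String[] original) {
        Integer[] idx = new Integer[original.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Compare UTF-8 bytes as unsigned values, mimicking sort under LC_COLLATE=C.
        Comparator<Integer> byBytes = (a, b) -> Arrays.compareUnsigned(
                original[a].getBytes(StandardCharsets.UTF_8),
                original[b].getBytes(StandardCharsets.UTF_8));
        Arrays.sort(idx, byBytes);
        int[] result = new int[idx.length];
        for (int j = 0; j < result.length; j++) result[j] = idx[j];
        return result;
    }

    /** Inverts a permutation, so that inverse[p[j]] == j for every j. */
    static int[] invert(int[] p) {
        int[] inverse = new int[p.length];
        for (int j = 0; j < p.length; j++) inverse[p[j]] = j;
        return inverse;
    }

    /** Looks up the string with original index i through the sorted list. */
    static String get(String[] sorted, int[] pi, int i) {
        return sorted[pi[i]];
    }

    public static void main(String[] args) {
        String[] original = { "pear", "apple", "fig" };
        int[] perm = sortedToOriginal(original); // plays the role of perm.txt
        String[] sorted = new String[original.length];
        for (int j = 0; j < perm.length; j++) sorted[j] = original[perm[j]];
        int[] pi = invert(perm); // maps original index to sorted position
        for (int i = 0; i < original.length; i++) {
            System.out.println(i + " -> " + get(sorted, pi, i));
        }
    }
}
```

In this sketch the sort step yields the mapping from sorted position to original index, while a lookup by original index needs the opposite direction, which is why the permutation is inverted before use.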
- See Also:
- Serialized Form
-
-
Field Summary
Fields
Modifier and Type                 Field                   Description
protected FrontCodedStringList    frontCodedStringList    The underlying front-coded string list.
protected int[]                   permutation             The permutation.
static long                       serialVersionUID
-
Constructor Summary
Constructors
Constructor    Description
PermutedFrontCodedStringList(FrontCodedStringList frontCodedStringList, int[] permutation)    Creates a new permuted front-coded string list using a given front-coded string list and permutation.
-
Method Summary
Modifier and Type    Method    Description
MutableString    get(int index)
void    get(int index, MutableString s)    Returns the element at the specified position in this front-coded list by storing it in a mutable string.
it.unimi.dsi.fastutil.objects.ObjectListIterator<java.lang.CharSequence>    listIterator(int k)
static void    main(java.lang.String[] arg)
int    size()
-
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
add, add, addAll, addAll, addElements, addElements, clear, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, forEach, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, peek, pop, push, remove, removeElements, set, setElements, size, subList, toArray, toArray, top, toString
-
Methods inherited from class java.util.AbstractCollection
containsAll, isEmpty, remove, removeAll, retainAll
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.util.List
containsAll, isEmpty, remove, removeAll, replaceAll, retainAll
-
-
-
-
Field Detail
-
serialVersionUID
public static final long serialVersionUID
- See Also:
- Constant Field Values
-
frontCodedStringList
protected final FrontCodedStringList frontCodedStringList
The underlying front-coded string list.
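As a reminder of why sorted order matters for the underlying list, the principle of front coding can be sketched as follows: each string is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. The class FrontCodingSketch and its textual "length:suffix" encoding are hypothetical and chosen for readability; they do not reproduce the actual representation used by FrontCodedStringList.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of front coding; not the dsiutils representation. */
class FrontCodingSketch {

    /** Length, in chars, of the longest common prefix of a and b. */
    static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    /** Encodes each string as "<shared prefix length>:<suffix>" relative to its predecessor. */
    static List<String> encode(List<String> strings) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String s : strings) {
            int p = commonPrefix(prev, s);
            out.add(p + ":" + s.substring(p));
            prev = s;
        }
        return out;
    }

    /** Rebuilds the strings from the encoding produced by encode(). */
    static List<String> decode(List<String> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String e : encoded) {
            int colon = e.indexOf(':'); // first colon ends the numeric prefix length
            int p = Integer.parseInt(e.substring(0, colon));
            String s = prev.substring(0, p) + e.substring(colon + 1);
            out.add(s);
            prev = s;
        }
        return out;
    }
}
```

Adjacent strings in sorted order tend to share long prefixes, so the stored suffixes are short; in an arbitrary order the shared prefixes are usually much shorter and the encoding saves little.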
-
permutation
protected final int[] permutation
The permutation.
-
-
Constructor Detail
-
PermutedFrontCodedStringList
public PermutedFrontCodedStringList(FrontCodedStringList frontCodedStringList, int[] permutation)
Creates a new permuted front-coded string list using a given front-coded string list and permutation.
- Parameters:
frontCodedStringList - the underlying front-coded string list.
permutation - the underlying permutation.
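The documentation above does not say whether the constructor validates its arguments. Assuming the array is expected to contain each index of the underlying list exactly once, a caller could check it beforehand with a helper like the following; PermutationCheck is a hypothetical name and is not part of dsiutils.

```java
/** Hypothetical helper; not part of dsiutils. */
class PermutationCheck {

    /** Returns true if p contains each of 0, 1, ..., p.length - 1 exactly once. */
    static boolean isPermutation(int[] p) {
        boolean[] seen = new boolean[p.length];
        for (int v : p) {
            // Reject out-of-range values and duplicates.
            if (v < 0 || v >= p.length || seen[v]) return false;
            seen[v] = true;
        }
        return true;
    }
}
```

A caller would typically also compare the array length with the size of the front-coded string list before constructing the permuted list.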
-
-
Method Detail
-
get
public MutableString get(int index)
- Specified by:
get
in interface java.util.List<java.lang.CharSequence>
-
get
public void get(int index, MutableString s)
Returns the element at the specified position in this front-coded list by storing it in a mutable string.
- Parameters:
index - an index in the list.
s - a mutable string that will contain the string at the specified position.
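The point of a get method that fills a caller-supplied string is that one buffer can be reused across many lookups. The sketch below shows that pattern with java.lang.StringBuilder standing in for MutableString and plain arrays standing in for the front-coded list; BufferReuseSketch is a hypothetical class, not the dsiutils implementation.

```java
/** Hypothetical stand-in using StringBuilder in place of MutableString; not part of dsiutils. */
class BufferReuseSketch {
    private final String[] sorted;
    private final int[] permutation;

    BufferReuseSketch(String[] sorted, int[] permutation) {
        this.sorted = sorted;
        this.permutation = permutation;
    }

    /** Fills the caller-supplied buffer with the string at the given index, replacing its previous content. */
    void get(int index, StringBuilder s) {
        s.setLength(0);
        s.append(sorted[permutation[index]]);
    }

    int size() {
        return permutation.length;
    }

    /** Sums the lengths of all elements while reusing a single buffer. */
    static int totalLength(BufferReuseSketch list) {
        StringBuilder buffer = new StringBuilder();
        int total = 0;
        for (int i = 0; i < list.size(); i++) {
            list.get(i, buffer); // overwrites the buffer instead of creating a new string
            total += buffer.length();
        }
        return total;
    }
}
```

Because the buffer is overwritten on every call, a caller that needs to keep a value must copy it before the next lookup.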
-
size
public int size()
- Specified by:
size
in interface java.util.Collection<java.lang.CharSequence>
- Specified by:
size
in interface java.util.List<java.lang.CharSequence>
- Specified by:
size
in class java.util.AbstractCollection<java.lang.CharSequence>
-
listIterator
public it.unimi.dsi.fastutil.objects.ObjectListIterator<java.lang.CharSequence> listIterator(int k)
- Specified by:
listIterator
in interface java.util.List<java.lang.CharSequence>
- Specified by:
listIterator
in interface it.unimi.dsi.fastutil.objects.ObjectList<java.lang.CharSequence>
- Overrides:
listIterator
in class it.unimi.dsi.fastutil.objects.AbstractObjectList<java.lang.CharSequence>
-
main
public static void main(java.lang.String[] arg) throws java.io.IOException, java.lang.ClassNotFoundException, com.martiansoftware.jsap.JSAPException
- Throws:
java.io.IOException
java.lang.ClassNotFoundException
com.martiansoftware.jsap.JSAPException
-
-