Class FrontCodedStringList
- java.lang.Object
-
- java.util.AbstractCollection<K>
-
- it.unimi.dsi.fastutil.objects.AbstractObjectCollection<K>
-
- it.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString>
-
- it.unimi.dsi.util.FrontCodedStringList
-
- All Implemented Interfaces:
it.unimi.dsi.fastutil.objects.ObjectCollection<MutableString>
,it.unimi.dsi.fastutil.objects.ObjectIterable<MutableString>
,it.unimi.dsi.fastutil.objects.ObjectList<MutableString>
,it.unimi.dsi.fastutil.Stack<MutableString>
,java.io.Serializable
,java.lang.Comparable<java.util.List<? extends MutableString>>
,java.lang.Iterable<MutableString>
,java.util.Collection<MutableString>
,java.util.List<MutableString>
,java.util.RandomAccess
public class FrontCodedStringList extends it.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString> implements java.util.RandomAccess, java.io.Serializable
Compact storage of strings using front-coding compression (also known as compression by prefix omission).This class stores a list of strings using front-coding (also known as prefix-omission) compression; the compression will be reasonable only if the list is sorted, but you could also use instances of this class just as a handy way to manage a large amount of strings. It implements an immutable
ObjectList
that returns the i-th string (as aMutableString
) when theget(int)
method is called with argument i. The returned mutable string may be freely modified.As a commodity, this class provides a main method that reads from standard input a sequence of newline-separated strings, and writes a corresponding serialized front-coded string list.
Implementation Details
To store the list of strings, we use either a UTF-8 coded
ByteArrayFrontCodedList
, or aCharArrayFrontCodedList
, depending on the value of theutf8
parameter at creation time. In the first case, if the strings are ASCII-oriented the resulting array will be much smaller, but access times will increase manifold, as each string must be UTF-8 decoded before being returned.- See Also:
- Serialized Form
-
-
Field Summary
Fields Modifier and Type Field Description protected it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList
byteFrontCodedList
The underlyingByteArrayFrontCodedList
, ornull
.protected it.unimi.dsi.fastutil.chars.CharArrayFrontCodedList
charFrontCodedList
The underlyingCharArrayFrontCodedList
, ornull
.static long
serialVersionUID
protected boolean
utf8
Whether this front-coded list is UTF-8 encoded.
-
Constructor Summary
Constructors Constructor Description FrontCodedStringList(java.util.Collection<? extends java.lang.CharSequence> c, int ratio, boolean utf8)
Creates a new front-coded string list containing the character sequences contained in the given collection.FrontCodedStringList(java.util.Iterator<? extends java.lang.CharSequence> words, int ratio, boolean utf8)
Creates a new front-coded string list containing the character sequences returned by the given iterator.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected static char[]
byte2Char(byte[] a, char[] s)
protected static int
countUTF8Chars(byte[] a)
MutableString
get(int index)
Returns the element at the specified position in this front-coded string list as a mutable string.void
get(int index, MutableString s)
Returns the element at the specified position in this front-coded string list by storing it in a mutable string.it.unimi.dsi.fastutil.objects.ObjectListIterator<MutableString>
listIterator(int k)
static void
main(java.lang.String[] arg)
int
ratio()
Returns the ratio of the underlying front-coded list.int
size()
boolean
utf8()
Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.-
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
add, add, addAll, addAll, addElements, addElements, clear, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, forEach, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, peek, pop, push, remove, removeElements, set, setElements, size, subList, toArray, toArray, top, toString
-
Methods inherited from class java.util.AbstractCollection
containsAll, isEmpty, remove, removeAll, retainAll
-
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface java.util.List
containsAll, isEmpty, remove, removeAll, replaceAll, retainAll
-
-
-
-
Field Detail
-
serialVersionUID
public static final long serialVersionUID
- See Also:
- Constant Field Values
-
byteFrontCodedList
protected final it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList byteFrontCodedList
The underlyingByteArrayFrontCodedList
, ornull
.
-
charFrontCodedList
protected final it.unimi.dsi.fastutil.chars.CharArrayFrontCodedList charFrontCodedList
The underlyingCharArrayFrontCodedList
, ornull
.
-
utf8
protected final boolean utf8
Whether this front-coded list is UTF-8 encoded.
-
-
Constructor Detail
-
FrontCodedStringList
public FrontCodedStringList(java.util.Iterator<? extends java.lang.CharSequence> words, int ratio, boolean utf8)
Creates a new front-coded string list containing the character sequences returned by the given iterator.- Parameters:
words
- an iterator returning character sequences.ratio
- the desired ratio.utf8
- if true, the strings will be stored as UTF-8 byte arrays.
-
FrontCodedStringList
public FrontCodedStringList(java.util.Collection<? extends java.lang.CharSequence> c, int ratio, boolean utf8)
Creates a new front-coded string list containing the character sequences contained in the given collection.- Parameters:
c
- a collection containing character sequences.ratio
- the desired ratio.utf8
- if true, the strings will be stored as UTF-8 byte arrays.
-
-
Method Detail
-
utf8
public boolean utf8()
Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes.- Returns:
- true if this front-coded string list is keeping its data as an array of UTF-8 encoded bytes.
-
ratio
public int ratio()
Returns the ratio of the underlying front-coded list.- Returns:
- the ratio of the underlying front-coded list.
-
get
public MutableString get(int index)
Returns the element at the specified position in this front-coded string list as a mutable string.- Specified by:
get
in interfacejava.util.List<MutableString>
- Parameters:
index
- an index in the list.- Returns:
- a
MutableString
that will contain the string at the specified position. The string may be freely modified.
-
get
public void get(int index, MutableString s)
Returns the element at the specified position in this front-coded string list by storing it in a mutable string.- Parameters:
index
- an index in the list.s
- a mutable string that will contain the string at the specified position.
-
countUTF8Chars
protected static int countUTF8Chars(byte[] a)
-
byte2Char
protected static char[] byte2Char(byte[] a, char[] s)
-
listIterator
public it.unimi.dsi.fastutil.objects.ObjectListIterator<MutableString> listIterator(int k)
- Specified by:
listIterator
in interfacejava.util.List<MutableString>
- Specified by:
listIterator
in interfaceit.unimi.dsi.fastutil.objects.ObjectList<MutableString>
- Overrides:
listIterator
in classit.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString>
-
size
public int size()
- Specified by:
size
in interfacejava.util.Collection<MutableString>
- Specified by:
size
in interfacejava.util.List<MutableString>
- Specified by:
size
in classjava.util.AbstractCollection<MutableString>
-
main
public static void main(java.lang.String[] arg) throws java.io.IOException, com.martiansoftware.jsap.JSAPException, java.lang.NoSuchMethodException
- Throws:
java.io.IOException
com.martiansoftware.jsap.JSAPException
java.lang.NoSuchMethodException
-
-