Package morfologik.fsa.builders
Class CFSA2Serializer
- java.lang.Object
-
- morfologik.fsa.builders.CFSA2Serializer
-
- All Implemented Interfaces:
FSASerializer
public final class CFSA2Serializer extends java.lang.Object implements FSASerializer
Serializes in-memoryFSA
graphs toCFSA2
.It is possible to serialize the automaton with numbers required for perfect hashing. See
withNumbers()
method.- See Also:
CFSA2
-
-
Field Summary
Fields Modifier and Type Field Description private static java.util.EnumSet<FSAFlags>
flags
Supported flags.private byte[]
labelsIndex
The most frequent labels for integrating with the flags field.private int[]
labelsInvIndex
Inverted index of labels to be integrated with flags field.private java.util.logging.Logger
logger
private static int
NO_STATE
No-state id.private com.carrotsearch.hppc.IntIntHashMap
numbers
A hash map of [state, right-language-count] pairs.private com.carrotsearch.hppc.IntIntHashMap
offsets
A hash map of [state, offset] pairs.private byte[]
scratch
Scratch array for serializing vints.private boolean
withNumbers
true
if we should serialize with numbers.
-
Constructor Summary
Constructors Constructor Description CFSA2Serializer()
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private int[]
computeFirstStates(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount)
Compute the set of states that should be linearized first to minimize other states goto length.private com.carrotsearch.hppc.IntIntHashMap
computeInlinkCount(FSA fsa)
Compute in-link count for each state.private void
computeLabelsIndex(FSA fsa)
Compute a set of labels to be integrated with the flags field.private int
emitArc(java.io.OutputStream os, int flags, byte label, int targetOffset)
private int
emitNodeArcs(FSA fsa, java.io.OutputStream os, int state, int nextState)
Emit all arcs of a single node.private int
emitNodeData(java.io.OutputStream os, int number)
private int
emitNodes(FSA fsa, java.io.OutputStream os, com.carrotsearch.hppc.IntArrayList linearized)
Update arc offsets assuming the given goto length.java.util.Set<FSAFlags>
getFlags()
Return supported flags.private com.carrotsearch.hppc.IntArrayList
linearize(FSA fsa)
Linearization of states.private int
linearizeAndCalculateOffsets(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets)
Linearize all states, puttingstates
in front of the automaton and calculating stable state offsets.private void
linearizeState(FSA fsa, com.carrotsearch.hppc.IntStack nodes, com.carrotsearch.hppc.IntArrayList linearized, java.util.BitSet visited, int node)
Add a state to linearized list.private void
log(java.util.logging.Level level, java.lang.String msg, java.lang.Object... args)
<T extends java.io.OutputStream>
Tserialize(FSA fsa, T os)
CFSA2Serializer
withAnnotationSeparator(byte annotationSeparator)
Sets the annotation separator (only ifFSASerializer.getFlags()
returnsFSAFlags.SEPARATORS
).CFSA2Serializer
withFiller(byte filler)
Sets the filler separator (only ifFSASerializer.getFlags()
returnsFSAFlags.SEPARATORS
).CFSA2Serializer
withNumbers()
Serialize the automaton with the number of right-language sequences in each node.(package private) static int
writeVInt(byte[] array, int offset, int value)
Write a v-int to a byte array.
-
-
-
Field Detail
-
logger
private final java.util.logging.Logger logger
-
flags
private static final java.util.EnumSet<FSAFlags> flags
Supported flags.
-
NO_STATE
private static final int NO_STATE
No-state id.- See Also:
- Constant Field Values
-
withNumbers
private boolean withNumbers
true
if we should serialize with numbers.- See Also:
withNumbers()
-
offsets
private com.carrotsearch.hppc.IntIntHashMap offsets
A hash map of [state, offset] pairs.
-
numbers
private com.carrotsearch.hppc.IntIntHashMap numbers
A hash map of [state, right-language-count] pairs.
-
scratch
private final byte[] scratch
Scratch array for serializing vints.
-
labelsIndex
private byte[] labelsIndex
The most frequent labels for integrating with the flags field.
-
labelsInvIndex
private int[] labelsInvIndex
Inverted index of labels to be integrated with flags field. A label at indexi
has the index or zero (no integration).
-
-
Method Detail
-
withNumbers
public CFSA2Serializer withNumbers()
Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.- Specified by:
withNumbers
in interfaceFSASerializer
- Returns:
- Returns the same object for easier call chaining.
-
serialize
public <T extends java.io.OutputStream> T serialize(FSA fsa, T os) throws java.io.IOException
- Specified by:
serialize
in interfaceFSASerializer
- Type Parameters:
T
- A subclass ofOutputStream
, returned for chaining.- Parameters:
fsa
- The automaton to serialize.os
- The output stream to serialize to.- Returns:
- Returns
os
for chaining. - Throws:
java.io.IOException
- Rethrown if an I/O error occurs.- See Also:
withNumbers()
-
computeLabelsIndex
private void computeLabelsIndex(FSA fsa)
Compute a set of labels to be integrated with the flags field.
-
getFlags
public java.util.Set<FSAFlags> getFlags()
Return supported flags.- Specified by:
getFlags
in interfaceFSASerializer
- Returns:
- Returns the set of flags supported by the serializer (and the output automaton).
-
linearize
private com.carrotsearch.hppc.IntArrayList linearize(FSA fsa) throws java.io.IOException
Linearization of states.- Throws:
java.io.IOException
-
log
private void log(java.util.logging.Level level, java.lang.String msg, java.lang.Object... args)
-
linearizeAndCalculateOffsets
private int linearizeAndCalculateOffsets(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets) throws java.io.IOException
Linearize all states, puttingstates
in front of the automaton and calculating stable state offsets.- Throws:
java.io.IOException
-
linearizeState
private void linearizeState(FSA fsa, com.carrotsearch.hppc.IntStack nodes, com.carrotsearch.hppc.IntArrayList linearized, java.util.BitSet visited, int node)
Add a state to linearized list.
-
computeFirstStates
private int[] computeFirstStates(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount)
Compute the set of states that should be linearized first to minimize other states goto length.
-
computeInlinkCount
private com.carrotsearch.hppc.IntIntHashMap computeInlinkCount(FSA fsa)
Compute in-link count for each state.
-
emitNodes
private int emitNodes(FSA fsa, java.io.OutputStream os, com.carrotsearch.hppc.IntArrayList linearized) throws java.io.IOException
Update arc offsets assuming the given goto length.- Throws:
java.io.IOException
-
emitNodeArcs
private int emitNodeArcs(FSA fsa, java.io.OutputStream os, int state, int nextState) throws java.io.IOException
Emit all arcs of a single node.- Throws:
java.io.IOException
-
emitArc
private int emitArc(java.io.OutputStream os, int flags, byte label, int targetOffset) throws java.io.IOException
- Throws:
java.io.IOException
-
emitNodeData
private int emitNodeData(java.io.OutputStream os, int number) throws java.io.IOException
- Throws:
java.io.IOException
-
withFiller
public CFSA2Serializer withFiller(byte filler)
Description copied from interface:FSASerializer
Sets the filler separator (only ifFSASerializer.getFlags()
returnsFSAFlags.SEPARATORS
).- Specified by:
withFiller
in interfaceFSASerializer
- Parameters:
filler
- The filler separator byte.- Returns:
- Returns
this
for call chaining.
-
withAnnotationSeparator
public CFSA2Serializer withAnnotationSeparator(byte annotationSeparator)
Description copied from interface:FSASerializer
Sets the annotation separator (only ifFSASerializer.getFlags()
returnsFSAFlags.SEPARATORS
).- Specified by:
withAnnotationSeparator
in interfaceFSASerializer
- Parameters:
annotationSeparator
- The filler separator byte.- Returns:
- Returns
this
for call chaining.
-
writeVInt
static int writeVInt(byte[] array, int offset, int value)
Write a v-int to a byte array.
-
-