Package morfologik.fsa.builders
Class CFSA2Serializer
java.lang.Object
morfologik.fsa.builders.CFSA2Serializer
- All Implemented Interfaces:
FSASerializer
Serializes in-memory FSA graphs to CFSA2.
It is possible to serialize the automaton with the numbers required for perfect hashing. See the withNumbers() method.
-
Field Summary
Fields
Modifier and Type, Field, Description
flags
    Supported flags.
private byte[] labelsIndex
    The most frequent labels for integrating with the flags field.
private int[] labelsInvIndex
    Inverted index of labels to be integrated with flags field.
private final Logger logger
private static final int NO_STATE
    No-state id.
private com.carrotsearch.hppc.IntIntHashMap numbers
    A hash map of [state, right-language-count] pairs.
private com.carrotsearch.hppc.IntIntHashMap offsets
    A hash map of [state, offset] pairs.
private final byte[] scratch
    Scratch array for serializing vints.
private boolean withNumbers
    true if we should serialize with numbers.
-
Constructor Summary
Constructors
CFSA2Serializer()
-
Method Summary
Modifier and Type, Method, Description
private int[] computeFirstStates(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount)
    Compute the set of states that should be linearized first to minimize the goto length of other states.
private com.carrotsearch.hppc.IntIntHashMap computeInlinkCount(FSA fsa)
    Compute in-link count for each state.
private void computeLabelsIndex(FSA fsa)
    Compute a set of labels to be integrated with the flags field.
private int emitArc(OutputStream os, int flags, byte label, int targetOffset)
private int emitNodeArcs(FSA fsa, OutputStream os, int state, int nextState)
    Emit all arcs of a single node.
private int emitNodeData(OutputStream os, int number)
private int emitNodes(FSA fsa, OutputStream os, com.carrotsearch.hppc.IntArrayList linearized)
    Update arc offsets assuming the given goto length.
getFlags()
    Return supported flags.
private com.carrotsearch.hppc.IntArrayList linearize
    Linearization of states.
private int linearizeAndCalculateOffsets(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets)
    Linearize all states, putting the given states (the states argument) at the front of the automaton and calculating stable state offsets.
private void linearizeState(FSA fsa, com.carrotsearch.hppc.IntStack nodes, com.carrotsearch.hppc.IntArrayList linearized, BitSet visited, int node)
    Add a state to the linearized list.
private void log
<T extends OutputStream> T serialize
withAnnotationSeparator(byte annotationSeparator)
    Sets the annotation separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
withFiller(byte filler)
    Sets the filler separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
withNumbers()
    Serialize the automaton with the number of right-language sequences in each node.
(package private) static int writeVInt(byte[] array, int offset, int value)
    Write a v-int to a byte array.
-
Field Details
-
logger
-
flags
Supported flags.
-
NO_STATE
private static final int NO_STATE
No-state id.
-
withNumbers
private boolean withNumbers
true if we should serialize with numbers.
-
offsets
private com.carrotsearch.hppc.IntIntHashMap offsets
A hash map of [state, offset] pairs.
-
numbers
private com.carrotsearch.hppc.IntIntHashMap numbers
A hash map of [state, right-language-count] pairs.
-
scratch
private final byte[] scratch
Scratch array for serializing vints.
-
labelsIndex
private byte[] labelsIndex
The most frequent labels for integrating with the flags field.
-
labelsInvIndex
private int[] labelsInvIndex
Inverted index of labels to be integrated with the flags field. The entry for label i holds that label's index, or zero if the label is not integrated.
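The relationship between labelsIndex and labelsInvIndex can be illustrated with a small standalone sketch. It is not the serializer's code: the class LabelsIndexSketch, its methods, and the maxLabels parameter are hypothetical names, and the choice to reserve slot 0 for "no integration" is an assumption drawn from the field description above.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of morfologik. */
final class LabelsIndexSketch {
  /**
   * Picks up to maxLabels of the most frequent labels.
   * labelCounts must have 256 entries (one per byte value).
   * Slot 0 of the result is left unused so that index 0 can mean
   * "label not integrated" (an assumption of this sketch).
   */
  static byte[] buildLabelsIndex(int[] labelCounts, int maxLabels) {
    Integer[] order = new Integer[256];
    for (int i = 0; i < 256; i++) {
      order[i] = i;
    }
    // Most frequent first; ties broken by label value for determinism.
    Arrays.sort(order, (a, b) -> {
      int c = Integer.compare(labelCounts[b], labelCounts[a]);
      return c != 0 ? c : Integer.compare(a, b);
    });
    int used = 0;
    while (used < maxLabels && used < 256 && labelCounts[order[used]] > 0) {
      used++;
    }
    byte[] labelsIndex = new byte[1 + used];
    for (int i = 0; i < used; i++) {
      labelsIndex[i + 1] = order[i].byteValue();
    }
    return labelsIndex;
  }

  /** Builds the inverted index: label value (0..255) to slot, or 0 if absent. */
  static int[] invert(byte[] labelsIndex) {
    int[] inverted = new int[256];
    for (int i = 1; i < labelsIndex.length; i++) {
      inverted[labelsIndex[i] & 0xFF] = i;
    }
    return inverted;
  }
}
```

With such a pair of arrays, a writer can look up a label in the inverted array in constant time and fall back to emitting the label byte separately whenever the lookup yields zero.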
-
-
Constructor Details
-
CFSA2Serializer
public CFSA2Serializer()
-
-
Method Details
-
withNumbers
Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.
- Specified by:
withNumbers in interface FSASerializer
- Returns:
Returns the same object for easier call chaining.
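To illustrate why per-node right-language counts are enough for perfect hashing, here is a minimal standalone sketch. It uses a plain trie rather than a minimized automaton and does not touch the morfologik API; the class PerfectHashSketch and all of its members are hypothetical names introduced for this example.

```java
import java.util.TreeMap;

/** Hypothetical illustration only; a trie stands in for the automaton. */
final class PerfectHashSketch {
  static final class Node {
    final TreeMap<Character, Node> children = new TreeMap<>();
    boolean terminal; // true if a sequence ends at this node
    int number;       // count of sequences accepted from this node onward
  }

  final Node root = new Node();

  void add(String word) {
    Node n = root;
    for (char c : word.toCharArray()) {
      n = n.children.computeIfAbsent(c, k -> new Node());
    }
    n.terminal = true;
  }

  /** Fills in Node.number bottom-up and returns the count for n. */
  static int computeNumbers(Node n) {
    int total = n.terminal ? 1 : 0;
    for (Node child : n.children.values()) {
      total += computeNumbers(child);
    }
    n.number = total;
    return total;
  }

  /** Returns the zero-based rank of word in sorted order, or -1 if absent. */
  int indexOf(String word) {
    int index = 0;
    Node n = root;
    for (char c : word.toCharArray()) {
      if (n.terminal) {
        index++; // a shorter sequence ending here sorts before this one
      }
      Node next = n.children.get(c);
      if (next == null) {
        return -1;
      }
      // Skip every sequence that branches off through a smaller label.
      for (Node smaller : n.children.headMap(c).values()) {
        index += smaller.number;
      }
      n = next;
    }
    return n.terminal ? index : -1;
  }
}
```

After adding "a", "ab", "b" and "ba" and calling computeNumbers(root), indexOf assigns them 0, 1, 2 and 3, which matches their sorted order. That is the order-preserving property mentioned in the description above.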
-
serialize
- Specified by:
serialize
in interfaceFSASerializer
- Type Parameters:
T
- A subclass ofOutputStream
, returned for chaining.- Parameters:
fsa
- The automaton to serialize.os
- The output stream to serialize to.- Returns:
- Returns
os
for chaining. - Throws:
IOException
- Rethrown if an I/O error occurs.- See Also:
-
computeLabelsIndex
Compute a set of labels to be integrated with the flags field.
-
getFlags
Return supported flags.
- Specified by:
getFlags in interface FSASerializer
- Returns:
Returns the set of flags supported by the serializer (and the output automaton).
-
linearize
Linearization of states.
- Throws:
IOException
-
log
-
linearizeAndCalculateOffsets
private int linearizeAndCalculateOffsets(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets) throws IOException
Linearize all states, putting the given states (the states argument) at the front of the automaton and calculating stable state offsets.
- Throws:
IOException
-
linearizeState
private void linearizeState(FSA fsa, com.carrotsearch.hppc.IntStack nodes, com.carrotsearch.hppc.IntArrayList linearized, BitSet visited, int node)
Add a state to the linearized list.
-
computeFirstStates
private int[] computeFirstStates(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount)
Compute the set of states that should be linearized first to minimize the goto length of other states.
-
computeInlinkCount
Compute in-link count for each state.
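The idea behind computeInlinkCount and computeFirstStates can be sketched on a toy graph. This is an illustrative approximation under stated assumptions, not the library's implementation: the class InlinkSketch is hypothetical, states are plain array indices, the graph is an adjacency array instead of an FSA, and the selection rule (highest in-link count first, ties by state id) is one plausible reading of the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the serializer's code. */
final class InlinkSketch {
  /** arcs[s] lists the target states of state s; returns in-links per state. */
  static int[] computeInlinkCount(int[][] arcs) {
    int[] inlinks = new int[arcs.length];
    for (int[] targets : arcs) {
      for (int target : targets) {
        inlinks[target]++;
      }
    }
    return inlinks;
  }

  /**
   * Picks at most maxStates states that have at least minInlinkCount
   * in-links, the most referenced first.
   */
  static int[] computeFirstStates(int[] inlinkCount, int maxStates, int minInlinkCount) {
    List<Integer> candidates = new ArrayList<>();
    for (int s = 0; s < inlinkCount.length; s++) {
      if (inlinkCount[s] >= minInlinkCount) {
        candidates.add(s);
      }
    }
    // Highest in-link count first; ties broken by state id.
    candidates.sort((a, b) -> {
      int c = Integer.compare(inlinkCount[b], inlinkCount[a]);
      return c != 0 ? c : Integer.compare(a, b);
    });
    int n = Math.min(maxStates, candidates.size());
    int[] result = new int[n];
    for (int i = 0; i < n; i++) {
      result[i] = candidates.get(i);
    }
    return result;
  }
}
```

The rationale for placing heavily referenced states first is that they then receive small offsets, so the many arcs pointing at them can encode their target address in fewer bytes when a variable-length encoding is used.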
-
emitNodes
private int emitNodes(FSA fsa, OutputStream os, com.carrotsearch.hppc.IntArrayList linearized) throws IOException
Update arc offsets assuming the given goto length.
- Throws:
IOException
-
emitNodeArcs
Emit all arcs of a single node.
- Throws:
IOException
-
emitArc
- Throws:
IOException
-
emitNodeData
- Throws:
IOException
-
withFiller
Description copied from interface: FSASerializer
Sets the filler separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- Specified by:
withFiller in interface FSASerializer
- Parameters:
filler - The filler separator byte.
- Returns:
Returns this for call chaining.
-
withAnnotationSeparator
Description copied from interface: FSASerializer
Sets the annotation separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- Specified by:
withAnnotationSeparator in interface FSASerializer
- Parameters:
annotationSeparator - The annotation separator byte.
- Returns:
Returns this for call chaining.
-
writeVInt
static int writeVInt(byte[] array, int offset, int value)
Write a v-int to a byte array.
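The exact byte layout is not documented on this page. As a rough sketch, a common v-int scheme stores seven payload bits per byte, least significant group first, and sets the high bit on every byte except the last. The class VIntSketch below is hypothetical, readVInt is added only for the round trip, and the assumption that the method returns the offset just past the written bytes is not confirmed by the description above.

```java
/** Hypothetical illustration of one common v-int layout. */
final class VIntSketch {
  /**
   * Writes value at array[offset...] using 7 bits per byte.
   * Returns the offset just past the last byte written (assumed contract).
   */
  static int writeVInt(byte[] array, int offset, int value) {
    if (value < 0) {
      throw new IllegalArgumentException("Negative values are not supported: " + value);
    }
    while (value > 0x7F) {
      // High bit set: more bytes follow.
      array[offset++] = (byte) (0x80 | (value & 0x7F));
      value >>>= 7;
    }
    array[offset++] = (byte) value; // last byte, high bit clear
    return offset;
  }

  /** Reads back a value written by writeVInt, starting at offset. */
  static int readVInt(byte[] array, int offset) {
    int value = 0;
    int shift = 0;
    byte b;
    do {
      b = array[offset++];
      value |= (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return value;
  }
}
```

Under this layout a non-negative int needs at most five bytes, so a five-byte scratch array would suffice for a single value. Small values such as offsets of states placed near the start of the output fit in one byte.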
-