Package morfologik.fsa.builders
Class FSA5Serializer
- java.lang.Object
  - morfologik.fsa.builders.FSA5Serializer
- All Implemented Interfaces:
FSASerializer
public final class FSA5Serializer extends java.lang.Object implements FSASerializer
Serializes in-memory FSA graphs to a binary format compatible with the FSA5 format of Jan Daciuk's fsa package.
It is possible to serialize the automaton with numbers required for perfect hashing. See the withNumbers() method.
- See Also:
FSA5, FSA.read(java.io.InputStream)
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- byte annotationByte
- byte fillerByte
- private static java.util.EnumSet<FSAFlags> flags: Supported flags.
- private static int MAX_ARC_SIZE: Maximum number of bytes for a serialized arc.
- private static int MAX_NODE_DATA_SIZE: Maximum number of bytes for per-node data.
- private com.carrotsearch.hppc.IntIntHashMap numbers: A hash map of [state, right-language-count] pairs.
- private com.carrotsearch.hppc.IntIntHashMap offsets: A hash map of [state, offset] pairs.
- private static int SIZEOF_FLAGS: Number of bytes for the arc's flags header (arc representation without the goto address).
- private boolean withNumbers: true if we should serialize with numbers.
-
Constructor Summary
Constructors (Constructor, Description):
- FSA5Serializer()
-
Method Summary
Methods (Modifier and Type, Method, Description):
- private int emitArc(java.nio.ByteBuffer bb, java.io.OutputStream os, int gtl, int flags, byte label, int targetOffset)
- private boolean emitArcs(FSA fsa, java.io.OutputStream os, int[] linearized, int gtl, int nodeDataLength): Update arc offsets assuming the given goto length.
- private int emitNodeData(java.nio.ByteBuffer bb, java.io.OutputStream os, int nodeDataLength, int number)
- java.util.Set<FSAFlags> getFlags(): Return supported flags.
- private int[] linearize(FSA fsa): Linearization of states.
- <T extends java.io.OutputStream> T serialize(FSA fsa, T os): Serialize the automaton to an output stream in FSA5 format.
- FSA5Serializer withAnnotationSeparator(byte annotationSeparator): Sets the annotation separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- FSA5Serializer withFiller(byte filler): Sets the filler separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- FSA5Serializer withNumbers(): Serialize the automaton with the number of right-language sequences in each node.
-
-
-
Field Detail
-
MAX_ARC_SIZE
private static final int MAX_ARC_SIZE
Maximum number of bytes for a serialized arc.
- See Also:
- Constant Field Values
-
MAX_NODE_DATA_SIZE
private static final int MAX_NODE_DATA_SIZE
Maximum number of bytes for per-node data.
- See Also:
- Constant Field Values
-
SIZEOF_FLAGS
private static final int SIZEOF_FLAGS
Number of bytes for the arc's flags header (arc representation without the goto address).
- See Also:
- Constant Field Values
-
flags
private static final java.util.EnumSet<FSAFlags> flags
Supported flags.
-
fillerByte
public byte fillerByte
- See Also:
FSA5.filler
-
annotationByte
public byte annotationByte
- See Also:
FSA5.annotation
-
withNumbers
private boolean withNumbers
true if we should serialize with numbers.
- See Also:
withNumbers()
-
offsets
private com.carrotsearch.hppc.IntIntHashMap offsets
A hash map of [state, offset] pairs.
-
numbers
private com.carrotsearch.hppc.IntIntHashMap numbers
A hash map of [state, right-language-count] pairs.
-
-
Method Detail
-
withNumbers
public FSA5Serializer withNumbers()
Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.
- Specified by:
withNumbers in interface FSASerializer
- Returns:
Returns the same object for easier call chaining.
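The idea behind node numbers can be illustrated on a plain trie. The following is a minimal sketch, not the library's implementation: NumberedTrieSketch, Node, build, count and indexOf are hypothetical names, and an uncompressed trie stands in for the minimized automaton. Each node stores the size of its right language, and a lookup sums the counts of the subtrees that sort before the path taken, which yields the position of the sequence in sorted order.

```java
import java.util.TreeMap;

/** Hypothetical illustration only; none of these names belong to morfologik. */
final class NumberedTrieSketch {
  static final class Node {
    final TreeMap<Character, Node> arcs = new TreeMap<>(); // arcs sorted by label
    boolean terminal; // true if an input sequence ends at this node
    int number;       // number of sequences accepted from this node (right-language size)
  }

  /** Builds a trie from the given words and fills in the per-node counts. */
  static Node build(String... words) {
    Node root = new Node();
    for (String w : words) {
      Node n = root;
      for (char c : w.toCharArray()) {
        n = n.arcs.computeIfAbsent(c, k -> new Node());
      }
      n.terminal = true;
    }
    count(root);
    return root;
  }

  /** Computes the right-language size of every node, bottom-up. */
  static int count(Node n) {
    int total = n.terminal ? 1 : 0;
    for (Node child : n.arcs.values()) {
      total += count(child);
    }
    n.number = total;
    return total;
  }

  /** Returns the index of the word in sorted order, or -1 if it is absent. */
  static int indexOf(Node root, String word) {
    int index = 0;
    Node n = root;
    for (char c : word.toCharArray()) {
      if (n.terminal) {
        index++; // a shorter sequence ending here sorts before the current one
      }
      Node next = n.arcs.get(c);
      if (next == null) {
        return -1;
      }
      // Skip every sequence reachable through arcs with a smaller label.
      for (Node sibling : n.arcs.headMap(c).values()) {
        index += sibling.number;
      }
      n = next;
    }
    return n.terminal ? index : -1;
  }

  public static void main(String[] args) {
    Node root = build("b", "a", "ab");
    for (String w : new String[] {"a", "ab", "b", "c"}) {
      System.out.println(w + " -> " + indexOf(root, w));
    }
  }
}
```

Because the index is derived from label order, it is unique per accepted sequence and dense in the range from 0 to the number of sequences minus one, which is the property a perfect hash needs.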
-
withFiller
public FSA5Serializer withFiller(byte filler)
Sets the filler separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- Specified by:
withFiller in interface FSASerializer
- Parameters:
filler - The filler separator byte.
- Returns:
Returns this for call chaining.
-
withAnnotationSeparator
public FSA5Serializer withAnnotationSeparator(byte annotationSeparator)
Sets the annotation separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
- Specified by:
withAnnotationSeparator in interface FSASerializer
- Parameters:
annotationSeparator - The annotation separator byte.
- Returns:
Returns this for call chaining.
-
serialize
public <T extends java.io.OutputStream> T serialize(FSA fsa, T os) throws java.io.IOException
Serialize the automaton to an output stream in FSA5 format.
- Specified by:
serialize in interface FSASerializer
- Type Parameters:
T - A subclass of OutputStream, returned for chaining.
- Parameters:
fsa - The automaton to serialize.
os - The output stream to serialize to.
- Returns:
Returns os for chaining.
- Throws:
java.io.IOException - Rethrown if an I/O error occurs.
- See Also:
withNumbers()
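The generic return type lets a caller keep the concrete stream type after the call. The sketch below shows only that signature pattern, under stated assumptions: ChainedStreamSketch, writeTo and toBytes are hypothetical stand-ins, and writeTo copies a byte array instead of serializing an automaton.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical illustration only; not part of morfologik. */
final class ChainedStreamSketch {
  /** Stand-in with the same generic shape as serialize(FSA, T): writes, then returns the stream. */
  static <T extends OutputStream> T writeTo(byte[] payload, T os) throws IOException {
    os.write(payload);
    return os; // the same stream instance, with its concrete type preserved
  }

  /** Chains the call so the result can be read back without a cast. */
  static byte[] toBytes(byte[] payload) {
    try {
      // The call returns ByteArrayOutputStream here, so toByteArray() is available directly.
      return writeTo(payload, new ByteArrayOutputStream()).toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With a plain OutputStream return type the caller would need either a local variable for the stream or a cast before calling toByteArray().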
-
getFlags
public java.util.Set<FSAFlags> getFlags()
Return supported flags.
- Specified by:
getFlags in interface FSASerializer
- Returns:
Returns the set of flags supported by the serializer (and the output automaton).
-
linearize
private int[] linearize(FSA fsa)
Linearization of states.
-
emitArcs
private boolean emitArcs(FSA fsa, java.io.OutputStream os, int[] linearized, int gtl, int nodeDataLength) throws java.io.IOException
Update arc offsets assuming the given goto length.
- Throws:
java.io.IOException
-
emitArc
private int emitArc(java.nio.ByteBuffer bb, java.io.OutputStream os, int gtl, int flags, byte label, int targetOffset) throws java.io.IOException
- Throws:
java.io.IOException
-
emitNodeData
private int emitNodeData(java.nio.ByteBuffer bb, java.io.OutputStream os, int nodeDataLength, int number) throws java.io.IOException
- Throws:
java.io.IOException
-
-