Package morfologik.fsa
Class FSA5
- java.lang.Object
-
- morfologik.fsa.FSA
-
- morfologik.fsa.FSA5
-
- All Implemented Interfaces:
java.lang.Iterable<java.nio.ByteBuffer>
public final class FSA5 extends FSA
FSA binary format implementation for version 5.Version 5 indicates the dictionary was built with these flags:
FSAFlags.FLEXIBLE
,FSAFlags.STOPBIT
andFSAFlags.NEXTBIT
. The internal representation of the FSA must therefore follow this description (please note this format describes only a single transition (arc), not the entire dictionary file).---- this node header present only if automaton was compiled with NUMBERS option. Byte +-+-+-+-+-+-+-+-+\ 0 | | | | | | | | | \ LSB +-+-+-+-+-+-+-+-+ + 1 | | | | | | | | | | number of strings recognized +-+-+-+-+-+-+-+-+ +----- by the automaton starting : : : : : : : : : | from this node. +-+-+-+-+-+-+-+-+ + ctl-1 | | | | | | | | | / MSB +-+-+-+-+-+-+-+-+/ ---- remaining part of the node Byte +-+-+-+-+-+-+-+-+\ 0 | | | | | | | | | +------ label +-+-+-+-+-+-+-+-+/ +------------- node pointed to is next | +----------- the last arc of the node | | +--------- the arc is final | | | +-----------+ | | | | | ___+___ | | | | / \ | | | | MSB LSB | 7 6 5 4 3 2 1 0 | +-+-+-+-+-+-+-+-+ | 1 | | | | | | | | | \ \ +-+-+-+-+-+-+-+-+ \ \ LSB +-+-+-+-+-+-+-+-+ + 2 | | | | | | | | | | +-+-+-+-+-+-+-+-+ | 3 | | | | | | | | | +----- target node address (in bytes) +-+-+-+-+-+-+-+-+ | (not present except for the byte : : : : : : : : : | with flags if the node pointed to +-+-+-+-+-+-+-+-+ + is next) gtl | | | | | | | | | / MSB +-+-+-+-+-+-+-+-+ / gtl+1 (gtl = gotoLength)
-
-
Field Summary
Fields Modifier and Type Field Description static int
ADDRESS_OFFSET
An offset in the arc structure, where the address and flags field begins.byte
annotation
Annotation character.byte[]
arcs
An array of bytes with the internal representation of the automaton.static int
BIT_FINAL_ARC
Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.static int
BIT_LAST_ARC
Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.static int
BIT_TARGET_NEXT
Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).static byte
DEFAULT_ANNOTATION
Default annotation byte.static byte
DEFAULT_FILLER
Default filler byte.byte
filler
Filler character.private java.util.Set<FSAFlags>
flags
Flags for this automaton version.int
gtl
Number of bytes each address takes in full, expanded form (goto length).int
nodeDataLength
The length of the node header structure (if the automaton was compiled withNUMBERS
option).static byte
VERSION
Automaton version as in the file header.
-
Constructor Summary
Constructors Constructor Description FSA5(java.io.InputStream stream)
Read and wrap a binary automaton in FSA version 5.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description (package private) static int
decodeFromBytes(byte[] arcs, int start, int n)
Returns an n-byte integer encoded in byte-packed representation.int
getArc(int node, byte label)
byte
getArcLabel(int arc)
(package private) int
getDestinationNodeOffset(int arc)
Returns the address of the node pointed to by this arc.int
getEndNode(int arc)
int
getFirstArc(int node)
java.util.Set<FSAFlags>
getFlags()
int
getNextArc(int arc)
int
getRightLanguageCount(int node)
Returns the number encoded at the given node.int
getRootNode()
Returns the start node of this automaton.boolean
isArcFinal(int arc)
boolean
isArcLast(int arc)
Returnstrue
if this arc hasNEXT
bit set.boolean
isArcTerminal(int arc)
boolean
isNextSet(int arc)
private int
skipArc(int offset)
Read the arc's layout and skip as many bytes, as needed.-
Methods inherited from class morfologik.fsa.FSA
getArcCount, getSequences, getSequences, iterator, read, read, readRemaining, visitAllStates, visitInPostOrder, visitInPostOrder, visitInPreOrder, visitInPreOrder
-
-
-
-
Field Detail
-
DEFAULT_FILLER
public static final byte DEFAULT_FILLER
Default filler byte.- See Also:
- Constant Field Values
-
DEFAULT_ANNOTATION
public static final byte DEFAULT_ANNOTATION
Default annotation byte.- See Also:
- Constant Field Values
-
VERSION
public static final byte VERSION
Automaton version as in the file header.- See Also:
- Constant Field Values
-
BIT_FINAL_ARC
public static final int BIT_FINAL_ARC
Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.- See Also:
- Constant Field Values
-
BIT_LAST_ARC
public static final int BIT_LAST_ARC
Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.- See Also:
- Constant Field Values
-
BIT_TARGET_NEXT
public static final int BIT_TARGET_NEXT
Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).- See Also:
- Constant Field Values
-
ADDRESS_OFFSET
public static final int ADDRESS_OFFSET
An offset in the arc structure, where the address and flags field begins. In version 5 of FSA automata, this value is constant (1, skip label).- See Also:
- Constant Field Values
-
arcs
public final byte[] arcs
An array of bytes with the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.
-
nodeDataLength
public final int nodeDataLength
The length of the node header structure (if the automaton was compiled withNUMBERS
option). Otherwise zero.
-
flags
private java.util.Set<FSAFlags> flags
Flags for this automaton version.
-
gtl
public final int gtl
Number of bytes each address takes in full, expanded form (goto length).
-
filler
public final byte filler
Filler character.
-
annotation
public final byte annotation
Annotation character.
-
-
Method Detail
-
getRootNode
public int getRootNode()
Returns the start node of this automaton.- Specified by:
getRootNode
in classFSA
- Returns:
- Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
-
getFirstArc
public final int getFirstArc(int node)
- Specified by:
getFirstArc
in classFSA
- Parameters:
node
- Identifier of the node.- Returns:
- Returns the identifier of the first arc leaving
node
or 0 if the node has no outgoing arcs.
-
getNextArc
public final int getNextArc(int arc)
- Specified by:
getNextArc
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Returns the identifier of the next arc after
arc
and leavingnode
. Zero is returned if no more arcs are available for the node.
-
getArc
public int getArc(int node, byte label)
-
getEndNode
public int getEndNode(int arc)
- Specified by:
getEndNode
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Return the end node pointed to by a given
arc
. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.
-
getArcLabel
public byte getArcLabel(int arc)
- Specified by:
getArcLabel
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Return the label associated with a given
arc
.
-
isArcFinal
public boolean isArcFinal(int arc)
- Specified by:
isArcFinal
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Returns
true
if the destination node at the end of thisarc
corresponds to an input sequence created when building this automaton.
-
isArcTerminal
public boolean isArcTerminal(int arc)
- Specified by:
isArcTerminal
in classFSA
- Parameters:
arc
- The arc's identifier.- Returns:
- Returns
true
if thisarc
does not have a terminating node (@linkFSA.getEndNode(int)
will throw an exception). ImpliesFSA.isArcFinal(int)
.
-
getRightLanguageCount
public int getRightLanguageCount(int node)
Returns the number encoded at the given node. The number equals the count of the set of suffixes reachable fromnode
(called its right language).- Overrides:
getRightLanguageCount
in classFSA
- Parameters:
node
- Identifier of the node.- Returns:
- Returns the number of sequences reachable from the given state if
the automaton was compiled with
FSAFlags.NUMBERS
. The size of the right language of the state, in other words.
-
getFlags
public java.util.Set<FSAFlags> getFlags()
For this automaton version, an additional
FSAFlags.NUMBERS
flag may be set to indicate the automaton contains extra fields for each node.
-
isArcLast
public boolean isArcLast(int arc)
Returnstrue
if this arc hasNEXT
bit set.- Parameters:
arc
- The node's arc identifier.- Returns:
- Returns true if the argument is the last arc of a node.
- See Also:
BIT_LAST_ARC
-
isNextSet
public boolean isNextSet(int arc)
- Parameters:
arc
- The node's arc identifier.- Returns:
- Returns true if
BIT_TARGET_NEXT
is set for this arc. - See Also:
BIT_TARGET_NEXT
-
decodeFromBytes
static final int decodeFromBytes(byte[] arcs, int start, int n)
Returns an n-byte integer encoded in byte-packed representation.
-
getDestinationNodeOffset
final int getDestinationNodeOffset(int arc)
Returns the address of the node pointed to by this arc.
-
skipArc
private int skipArc(int offset)
Read the arc's layout and skip as many bytes, as needed.
-
-