Class FSA5

  • All Implemented Interfaces:
    java.lang.Iterable<java.nio.ByteBuffer>

    public final class FSA5
    extends FSA
    FSA binary format implementation for version 5.

    Version 5 indicates the dictionary was built with these flags: FSAFlags.FLEXIBLE, FSAFlags.STOPBIT and FSAFlags.NEXTBIT. The internal representation of the FSA must therefore follow this description (please note this format describes only a single transition (arc), not the entire dictionary file).

     ---- this node header present only if automaton was compiled with NUMBERS option.
     Byte
            +-+-+-+-+-+-+-+-+\
          0 | | | | | | | | | \  LSB
            +-+-+-+-+-+-+-+-+  +
          1 | | | | | | | | |  |      number of strings recognized
            +-+-+-+-+-+-+-+-+  +----- by the automaton starting
            : : : : : : : : :  |      from this node.
            +-+-+-+-+-+-+-+-+  +
      ctl-1 | | | | | | | | | /  MSB
            +-+-+-+-+-+-+-+-+/
            
     ---- remaining part of the node
     
     Byte
           +-+-+-+-+-+-+-+-+\
         0 | | | | | | | | | +------ label
           +-+-+-+-+-+-+-+-+/
     
                      +------------- node pointed to is next
                      | +----------- the last arc of the node
                      | | +--------- the arc is final
                      | | |
                 +-----------+
                 |    | | |  |
             ___+___  | | |  |
            /       \ | | |  |
           MSB           LSB |
            7 6 5 4 3 2 1 0  |
           +-+-+-+-+-+-+-+-+ |
         1 | | | | | | | | | \ \
           +-+-+-+-+-+-+-+-+  \ \  LSB
           +-+-+-+-+-+-+-+-+     +
         2 | | | | | | | | |     |
           +-+-+-+-+-+-+-+-+     |
         3 | | | | | | | | |     +----- target node address (in bytes)
           +-+-+-+-+-+-+-+-+     |      (not present except for the byte
           : : : : : : : : :     |       with flags if the node pointed to
           +-+-+-+-+-+-+-+-+     +       is next)
       gtl | | | | | | | | |    /  MSB
           +-+-+-+-+-+-+-+-+   /
     gtl+1                           (gtl = gotoLength)
     
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int ADDRESS_OFFSET
      An offset in the arc structure, where the address and flags field begins.
      byte annotation
      Annotation character.
      byte[] arcs
      An array of bytes with the internal representation of the automaton.
      static int BIT_FINAL_ARC
      Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.
      static int BIT_LAST_ARC
      Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.
      static int BIT_TARGET_NEXT
      Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).
      static byte DEFAULT_ANNOTATION
      Default annotation byte.
      static byte DEFAULT_FILLER
      Default filler byte.
      byte filler
      Filler character.
      private java.util.Set<FSAFlags> flags
      Flags for this automaton version.
      int gtl
      Number of bytes each address takes in full, expanded form (goto length).
      int nodeDataLength
      The length of the node header structure (if the automaton was compiled with NUMBERS option).
      static byte VERSION
      Automaton version as in the file header.
    • Constructor Summary

      Constructors 
      Constructor Description
      FSA5​(java.io.InputStream stream)
      Read and wrap a binary automaton in FSA version 5.
    • Field Detail

      • DEFAULT_FILLER

        public static final byte DEFAULT_FILLER
        Default filler byte.
        See Also:
        Constant Field Values
      • DEFAULT_ANNOTATION

        public static final byte DEFAULT_ANNOTATION
        Default annotation byte.
        See Also:
        Constant Field Values
      • VERSION

        public static final byte VERSION
        Automaton version as in the file header.
        See Also:
        Constant Field Values
      • BIT_FINAL_ARC

        public static final int BIT_FINAL_ARC
        Bit indicating that an arc corresponds to the last character of a sequence available when building the automaton.
        See Also:
        Constant Field Values
      • BIT_LAST_ARC

        public static final int BIT_LAST_ARC
        Bit indicating that an arc is the last one of the node's list and the following one belongs to another node.
        See Also:
        Constant Field Values
      • BIT_TARGET_NEXT

        public static final int BIT_TARGET_NEXT
        Bit indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).
        See Also:
        Constant Field Values
      • ADDRESS_OFFSET

        public static final int ADDRESS_OFFSET
        An offset in the arc structure, where the address and flags field begins. In version 5 of FSA automata, this value is constant (1, skip label).
        See Also:
        Constant Field Values
      • arcs

        public final byte[] arcs
        An array of bytes with the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.
      • nodeDataLength

        public final int nodeDataLength
        The length of the node header structure (if the automaton was compiled with NUMBERS option). Otherwise zero.
      • flags

        private java.util.Set<FSAFlags> flags
        Flags for this automaton version.
      • gtl

        public final int gtl
        Number of bytes each address takes in full, expanded form (goto length).
      • filler

        public final byte filler
        Filler character.
      • annotation

        public final byte annotation
        Annotation character.
    • Constructor Detail

      • FSA5

        FSA5​(java.io.InputStream stream)
        throws java.io.IOException
        Read and wrap a binary automaton in FSA version 5.
        Throws:
        java.io.IOException
    • Method Detail

      • getRootNode

        public int getRootNode()
        Returns the start node of this automaton.
        Specified by:
        getRootNode in class FSA
        Returns:
        Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
      • getFirstArc

        public final int getFirstArc​(int node)
        Specified by:
        getFirstArc in class FSA
        Parameters:
        node - Identifier of the node.
        Returns:
        Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.
      • getNextArc

        public final int getNextArc​(int arc)
        Specified by:
        getNextArc in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns the identifier of the next arc after arc and leaving node. Zero is returned if no more arcs are available for the node.
      • getArc

        public int getArc​(int node,
                          byte label)
        Specified by:
        getArc in class FSA
        Parameters:
        node - Identifier of the node.
        label - The arc's label.
        Returns:
        Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.
      • getEndNode

        public int getEndNode​(int arc)
        Specified by:
        getEndNode in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Return the end node pointed to by a given arc. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.
      • getArcLabel

        public byte getArcLabel​(int arc)
        Specified by:
        getArcLabel in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Return the label associated with a given arc.
      • isArcFinal

        public boolean isArcFinal​(int arc)
        Specified by:
        isArcFinal in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
      • isArcTerminal

        public boolean isArcTerminal​(int arc)
        Specified by:
        isArcTerminal in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns true if this arc does not have a terminating node (@link FSA.getEndNode(int) will throw an exception). Implies FSA.isArcFinal(int).
      • getRightLanguageCount

        public int getRightLanguageCount​(int node)
        Returns the number encoded at the given node. The number equals the count of the set of suffixes reachable from node (called its right language).
        Overrides:
        getRightLanguageCount in class FSA
        Parameters:
        node - Identifier of the node.
        Returns:
        Returns the number of sequences reachable from the given state if the automaton was compiled with FSAFlags.NUMBERS. The size of the right language of the state, in other words.
      • getFlags

        public java.util.Set<FSAFlags> getFlags()

        For this automaton version, an additional FSAFlags.NUMBERS flag may be set to indicate the automaton contains extra fields for each node.

        Specified by:
        getFlags in class FSA
        Returns:
        Returns a set of flags for this FSA instance.
      • isArcLast

        public boolean isArcLast​(int arc)
        Returns true if this arc has NEXT bit set.
        Parameters:
        arc - The node's arc identifier.
        Returns:
        Returns true if the argument is the last arc of a node.
        See Also:
        BIT_LAST_ARC
      • isNextSet

        public boolean isNextSet​(int arc)
        Parameters:
        arc - The node's arc identifier.
        Returns:
        Returns true if BIT_TARGET_NEXT is set for this arc.
        See Also:
        BIT_TARGET_NEXT
      • decodeFromBytes

        static final int decodeFromBytes​(byte[] arcs,
                                         int start,
                                         int n)
        Returns an n-byte integer encoded in byte-packed representation.
      • getDestinationNodeOffset

        final int getDestinationNodeOffset​(int arc)
        Returns the address of the node pointed to by this arc.
      • skipArc

        private int skipArc​(int offset)
        Read the arc's layout and skip as many bytes, as needed.