Class CFSA2

  • All Implemented Interfaces:
    java.lang.Iterable<java.nio.ByteBuffer>

    public final class CFSA2
    extends FSA
    CFSA (Compact Finite State Automaton) binary format implementation, version 2:
    • BIT_TARGET_NEXT applicable on all arcs, not necessarily the last one.
    • v-coded goto field
    • v-coded perfect hashing numbers, if any
    • 31 most frequent labels integrated with flags byte

    The encoding of automaton body is as follows.

     ---- CFSA header
     Byte                            Description 
           +-+-+-+-+-+-+-+-+\
         0 | | | | | | | | | +------ '\'
           +-+-+-+-+-+-+-+-+/
           +-+-+-+-+-+-+-+-+\
         1 | | | | | | | | | +------ 'f'
           +-+-+-+-+-+-+-+-+/
           +-+-+-+-+-+-+-+-+\
         2 | | | | | | | | | +------ 's'
           +-+-+-+-+-+-+-+-+/
           +-+-+-+-+-+-+-+-+\
         3 | | | | | | | | | +------ 'a'
           +-+-+-+-+-+-+-+-+/
           +-+-+-+-+-+-+-+-+\
         4 | | | | | | | | | +------ version (fixed 0xc6)
           +-+-+-+-+-+-+-+-+/
           +-+-+-+-+-+-+-+-+\
         5 | | | | | | | | | +----\
           +-+-+-+-+-+-+-+-+/      \ flags [MSB first]
           +-+-+-+-+-+-+-+-+\      /
         6 | | | | | | | | | +----/
           +-+-+-+-+-+-+-+-+/
           +-+-+-+-+-+-+-+-+\
         7 | | | | | | | | | +------ label lookup table size
           +-+-+-+-+-+-+-+-+/
           +-+-+-+-+-+-+-+-+\
      8-32 | | | | | | | | | +------ label value lookup table 
           : : : : : : : : : |
           +-+-+-+-+-+-+-+-+/
     
     ---- Start of a node; only if automaton was compiled with NUMBERS option.
     
     Byte
            +-+-+-+-+-+-+-+-+\
          0 | | | | | | | | | \  
            +-+-+-+-+-+-+-+-+  +
          1 | | | | | | | | |  |      number of strings recognized
            +-+-+-+-+-+-+-+-+  +----- by the automaton starting
            : : : : : : : : :  |      from this node. v-coding
            +-+-+-+-+-+-+-+-+  +
            | | | | | | | | | /  
            +-+-+-+-+-+-+-+-+/
    
     ---- A vector of this node's arcs. An arc's layout depends on the combination of flags.
     
     1) NEXT bit set, mapped arc label. 
     
            +----------------------- node pointed to is next
            | +--------------------- the last arc of the node
            | | +------------------- this arc leads to a final state (acceptor)
            | | |  _______+--------- arc's label; indexed if M > 0, otherwise explicit label follows
            | | | / | | | |
           +-+-+-+-+-+-+-+-+\
         0 |N|L|F|M|M|M|M|M| +------ flags + (M) index of the mapped label.
           +-+-+-+-+-+-+-+-+/
           +-+-+-+-+-+-+-+-+\
         1 | | | | | | | | | +------ optional label if M == 0
           +-+-+-+-+-+-+-+-+/
           : : : : : : : : :
           +-+-+-+-+-+-+-+-+\
           |A|A|A|A|A|A|A|A| +------ v-coded goto address
           +-+-+-+-+-+-+-+-+/
     
    • Field Detail

      • VERSION

        public static final byte VERSION
        Automaton header version value.
        See Also:
        Constant Field Values
      • BIT_TARGET_NEXT

        public static final int BIT_TARGET_NEXT
        The target node of this arc follows the last arc of the current state (no goto field).
        See Also:
        Constant Field Values
      • BIT_LAST_ARC

        public static final int BIT_LAST_ARC
        The arc is the last one from the current node's arcs list.
        See Also:
        Constant Field Values
      • BIT_FINAL_ARC

        public static final int BIT_FINAL_ARC
        The arc corresponds to the last character of a sequence available when building the automaton (acceptor transition).
        See Also:
        Constant Field Values
      • LABEL_INDEX_BITS

        static final int LABEL_INDEX_BITS
        The count of bits assigned to storing an indexed label.
        See Also:
        Constant Field Values
      • LABEL_INDEX_MASK

        static final int LABEL_INDEX_MASK
        Masks only the M bits of a flag byte.
        See Also:
        Constant Field Values
      • LABEL_INDEX_SIZE

        public static final int LABEL_INDEX_SIZE
        Maximum size of the labels index.
        See Also:
        Constant Field Values
      • arcs

        public byte[] arcs
        An array of bytes with the internal representation of the automaton. Please see the documentation of this class for more information on how this structure is organized.
      • flags

        private final java.util.EnumSet<FSAFlags> flags
        Flags for this automaton version.
      • labelMapping

        public final byte[] labelMapping
        Label mapping for M-indexed labels.
      • hasNumbers

        private final boolean hasNumbers
        If true states are prepended with numbers.
    • Constructor Detail

      • CFSA2

        CFSA2​(java.io.InputStream stream)
        throws java.io.IOException
        Reads an automaton from a byte stream.
        Throws:
        java.io.IOException
    • Method Detail

      • getRootNode

        public int getRootNode()
        Specified by:
        getRootNode in class FSA
        Returns:
        Returns the identifier of the root node of this automaton. Returns 0 if the start node is also the end node (the automaton is empty).
      • getFirstArc

        public final int getFirstArc​(int node)
        Specified by:
        getFirstArc in class FSA
        Parameters:
        node - Identifier of the node.
        Returns:
        Returns the identifier of the first arc leaving node or 0 if the node has no outgoing arcs.
      • getNextArc

        public final int getNextArc​(int arc)
        Specified by:
        getNextArc in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns the identifier of the next arc after arc and leaving node. Zero is returned if no more arcs are available for the node.
      • getArc

        public int getArc​(int node,
                          byte label)
        Specified by:
        getArc in class FSA
        Parameters:
        node - Identifier of the node.
        label - The arc's label.
        Returns:
        Returns the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.
      • getEndNode

        public int getEndNode​(int arc)
        Specified by:
        getEndNode in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Return the end node pointed to by a given arc. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.
      • getArcLabel

        public byte getArcLabel​(int arc)
        Specified by:
        getArcLabel in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Return the label associated with a given arc.
      • getRightLanguageCount

        public int getRightLanguageCount​(int node)
        Overrides:
        getRightLanguageCount in class FSA
        Parameters:
        node - Identifier of the node.
        Returns:
        Returns the number of sequences reachable from the given state if the automaton was compiled with FSAFlags.NUMBERS. The size of the right language of the state, in other words.
      • isArcFinal

        public boolean isArcFinal​(int arc)
        Specified by:
        isArcFinal in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
      • isArcTerminal

        public boolean isArcTerminal​(int arc)
        Specified by:
        isArcTerminal in class FSA
        Parameters:
        arc - The arc's identifier.
        Returns:
        Returns true if this arc does not have a terminating node (@link FSA.getEndNode(int) will throw an exception). Implies FSA.isArcFinal(int).
      • isArcLast

        public boolean isArcLast​(int arc)
        Returns true if this arc has NEXT bit set.
        Parameters:
        arc - The node's arc identifier.
        Returns:
        Returns true if the argument is the last arc of a node.
        See Also:
        BIT_LAST_ARC
      • isNextSet

        public boolean isNextSet​(int arc)
        Parameters:
        arc - The node's arc identifier.
        Returns:
        Returns true if BIT_TARGET_NEXT is set for this arc.
        See Also:
        BIT_TARGET_NEXT
      • getFlags

        public java.util.Set<FSAFlags> getFlags()
        Specified by:
        getFlags in class FSA
        Returns:
        Returns a set of flags for this FSA instance.
      • getDestinationNodeOffset

        final int getDestinationNodeOffset​(int arc)
        Returns the address of the node pointed to by this arc.
      • skipArc

        private int skipArc​(int offset)
        Read the arc's layout and skip as many bytes, as needed, to skip it.
      • readVInt

        static int readVInt​(byte[] array,
                            int offset)
        Read a v-int.
      • vIntLength

        static int vIntLength​(int value)
        Return the byte-length of a v-coded int.
      • skipVInt

        private int skipVInt​(int offset)
        Skip a v-int.