Class FSA5Serializer

  • All Implemented Interfaces:
    FSASerializer

    public final class FSA5Serializer
    extends java.lang.Object
    implements FSASerializer
    Serializes in-memory FSA graphs to a binary format compatible with the FSA5 format of Jan Daciuk's fsa package.

    The automaton can optionally be serialized with the numbers required for perfect hashing. See the withNumbers() method.

    See Also:
    FSA5, FSA.read(java.io.InputStream)
    • Field Summary

      Fields 
      Modifier and Type Field Description
      byte annotationByte  
      byte fillerByte  
      private static java.util.EnumSet<FSAFlags> flags
      Supported flags.
      private static int MAX_ARC_SIZE
      Maximum number of bytes for a serialized arc.
      private static int MAX_NODE_DATA_SIZE
      Maximum number of bytes for per-node data.
      private com.carrotsearch.hppc.IntIntHashMap numbers
      A hash map of [state, right-language-count] pairs.
      private com.carrotsearch.hppc.IntIntHashMap offsets
      A hash map of [state, offset] pairs.
      private static int SIZEOF_FLAGS
      Number of bytes for the arc's flags header (arc representation without the goto address).
      private boolean withNumbers
      true if we should serialize with numbers.
    • Constructor Summary

      Constructors 
      Constructor Description
      FSA5Serializer()  
    • Field Detail

      • MAX_ARC_SIZE

        private static final int MAX_ARC_SIZE
        Maximum number of bytes for a serialized arc.
        See Also:
        Constant Field Values
      • MAX_NODE_DATA_SIZE

        private static final int MAX_NODE_DATA_SIZE
        Maximum number of bytes for per-node data.
        See Also:
        Constant Field Values
      • SIZEOF_FLAGS

        private static final int SIZEOF_FLAGS
        Number of bytes for the arc's flags header (arc representation without the goto address).
        See Also:
        Constant Field Values
      • flags

        private static final java.util.EnumSet<FSAFlags> flags
        Supported flags.
      • fillerByte

        public byte fillerByte
        See Also:
        FSA5.filler
      • withNumbers

        private boolean withNumbers
        true if we should serialize with numbers.
        See Also:
        withNumbers()
      • offsets

        private com.carrotsearch.hppc.IntIntHashMap offsets
        A hash map of [state, offset] pairs.
      • numbers

        private com.carrotsearch.hppc.IntIntHashMap numbers
        A hash map of [state, right-language-count] pairs.
    • Constructor Detail

      • FSA5Serializer

        public FSA5Serializer()
    • Method Detail

      • withNumbers

        public FSA5Serializer withNumbers()
        Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.
        Specified by:
        withNumbers in interface FSASerializer
        Returns:
        Returns the same object for easier call chaining.
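The role of right-language counts in perfect hashing can be illustrated with a minimal, standard-library-only sketch. It does not use this library or the FSA5 layout: the `RightLanguageCounts` class, its `Node` type and the `build`, `assignNumbers` and `indexOf` methods are hypothetical names introduced for illustration, and the automaton is simplified to a trie with terminal nodes. Each node stores how many sequences can be completed from it, and a lookup adds up the counts of all branches it skips, so every accepted sequence maps to its position in the sorted input.

```java
import java.util.TreeMap;

// Hypothetical sketch: a plain trie stands in for the automaton.
class RightLanguageCounts {
    static final class Node {
        // Outgoing arcs, kept in label order.
        final TreeMap<Character, Node> arcs = new TreeMap<>();
        boolean terminal; // true if a sequence ends in this node
        int number;       // size of the right language of this node
    }

    // Builds the trie and then assigns the count to every node.
    static Node build(String... words) {
        Node root = new Node();
        for (String w : words) {
            Node n = root;
            for (char c : w.toCharArray()) {
                n = n.arcs.computeIfAbsent(c, k -> new Node());
            }
            n.terminal = true;
        }
        assignNumbers(root);
        return root;
    }

    // Post-order traversal: a node's count is the sum of its children's
    // counts, plus one if a sequence ends in the node itself.
    static int assignNumbers(Node n) {
        int total = n.terminal ? 1 : 0;
        for (Node child : n.arcs.values()) {
            total += assignNumbers(child);
        }
        n.number = total;
        return total;
    }

    // Returns the zero-based position of word in sorted order, or -1.
    static int indexOf(Node root, String word) {
        int index = 0;
        Node n = root;
        for (char c : word.toCharArray()) {
            if (n.terminal) {
                index++; // a shorter sequence ending here sorts first
            }
            Node next = n.arcs.get(c);
            if (next == null) {
                return -1;
            }
            // Skip every sequence reachable through smaller labels.
            for (Node skipped : n.arcs.headMap(c).values()) {
                index += skipped.number;
            }
            n = next;
        }
        return n.terminal ? index : -1;
    }

    public static void main(String[] args) {
        Node root = build("ab", "abc", "b", "ba");
        System.out.println(indexOf(root, "ab"));  // 0
        System.out.println(indexOf(root, "abc")); // 1
        System.out.println(indexOf(root, "b"));   // 2
        System.out.println(indexOf(root, "ba"));  // 3
        System.out.println(indexOf(root, "a"));   // -1, not in the set
    }
}
```

Because the counts are stored with the nodes, a lookup touches only the arcs along one path and needs no separate table from sequences to indices.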
      • serialize

        public <T extends java.io.OutputStream> T serialize​(FSA fsa,
                                                            T os)
                                                     throws java.io.IOException
        Serialize the automaton fsa, starting from its root state, to an output stream in FSA5 format.
        Specified by:
        serialize in interface FSASerializer
        Type Parameters:
        T - A subclass of OutputStream, returned for chaining.
        Parameters:
        fsa - The automaton to serialize.
        os - The output stream to serialize to.
        Returns:
        Returns os for chaining.
        Throws:
        java.io.IOException - Rethrown if an I/O error occurs.
        See Also:
        withNumbers()
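The generic return type in the signature above lets the caller keep the concrete stream type after the call. The following sketch shows only that pattern, using the standard library: `ChainingSketch` and `writeTo` are hypothetical stand-ins, and the payload is arbitrary bytes rather than real FSA5 output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class ChainingSketch {
    // Hypothetical stand-in for a serializer method: writes the payload
    // and returns the stream with its concrete type T preserved.
    static <T extends OutputStream> T writeTo(byte[] payload, T os) throws IOException {
        os.write(payload);
        return os;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {1, 2, 3};
        // toByteArray() is callable without a cast because T is inferred
        // as ByteArrayOutputStream.
        byte[] out = writeTo(payload, new ByteArrayOutputStream()).toByteArray();
        System.out.println(out.length); // 3
    }
}
```

Had the method been declared to return a plain `OutputStream`, the caller would need a cast or a separate local variable to reach `toByteArray()`.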
      • getFlags

        public java.util.Set<FSAFlags> getFlags()
        Return supported flags.
        Specified by:
        getFlags in interface FSASerializer
        Returns:
        Returns the set of flags supported by the serializer (and the output automaton).
      • linearize

        private int[] linearize​(FSA fsa)
        Linearization of states.
      • emitArcs

        private boolean emitArcs​(FSA fsa,
                                 java.io.OutputStream os,
                                 int[] linearized,
                                 int gtl,
                                 int nodeDataLength)
                          throws java.io.IOException
        Update arc offsets assuming the given goto length.
        Throws:
        java.io.IOException
      • emitArc

        private int emitArc​(java.nio.ByteBuffer bb,
                            java.io.OutputStream os,
                            int gtl,
                            int flags,
                            byte label,
                            int targetOffset)
                     throws java.io.IOException
        Throws:
        java.io.IOException
      • emitNodeData

        private int emitNodeData​(java.nio.ByteBuffer bb,
                                 java.io.OutputStream os,
                                 int nodeDataLength,
                                 int number)
                          throws java.io.IOException
        Throws:
        java.io.IOException