Class CFSA2Serializer

  • All Implemented Interfaces:
    FSASerializer

    public final class CFSA2Serializer
    extends java.lang.Object
    implements FSASerializer
    Serializes in-memory FSA graphs to CFSA2.

    It is possible to serialize the automaton with numbers required for perfect hashing. See withNumbers() method.

    See Also:
    CFSA2
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static java.util.EnumSet<FSAFlags> flags
      Supported flags.
      private byte[] labelsIndex
      The most frequent labels for integrating with the flags field.
      private int[] labelsInvIndex
      Inverted index of labels to be integrated with flags field.
      private java.util.logging.Logger logger  
      private static int NO_STATE
      No-state id.
      private com.carrotsearch.hppc.IntIntHashMap numbers
      A hash map of [state, right-language-count] pairs.
      private com.carrotsearch.hppc.IntIntHashMap offsets
      A hash map of [state, offset] pairs.
      private byte[] scratch
      Scratch array for serializing vints.
      private boolean withNumbers
      true if we should serialize with numbers.
    • Constructor Summary

      Constructors 
      Constructor Description
      CFSA2Serializer()  
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private int[] computeFirstStates​(com.carrotsearch.hppc.IntIntHashMap inlinkCount, int maxStates, int minInlinkCount)
      Compute the set of states that should be linearized first to minimize other states goto length.
      private com.carrotsearch.hppc.IntIntHashMap computeInlinkCount​(FSA fsa)
      Compute in-link count for each state.
      private void computeLabelsIndex​(FSA fsa)
      Compute a set of labels to be integrated with the flags field.
      private int emitArc​(java.io.OutputStream os, int flags, byte label, int targetOffset)  
      private int emitNodeArcs​(FSA fsa, java.io.OutputStream os, int state, int nextState)
      Emit all arcs of a single node.
      private int emitNodeData​(java.io.OutputStream os, int number)  
      private int emitNodes​(FSA fsa, java.io.OutputStream os, com.carrotsearch.hppc.IntArrayList linearized)
      Update arc offsets assuming the given goto length.
      java.util.Set<FSAFlags> getFlags()
      Return supported flags.
      private com.carrotsearch.hppc.IntArrayList linearize​(FSA fsa)
      Linearization of states.
      private int linearizeAndCalculateOffsets​(FSA fsa, com.carrotsearch.hppc.IntArrayList states, com.carrotsearch.hppc.IntArrayList linearized, com.carrotsearch.hppc.IntIntHashMap offsets)
      Linearize all states, putting states in front of the automaton and calculating stable state offsets.
      private void linearizeState​(FSA fsa, com.carrotsearch.hppc.IntStack nodes, com.carrotsearch.hppc.IntArrayList linearized, java.util.BitSet visited, int node)
      Add a state to linearized list.
      private void log​(java.util.logging.Level level, java.lang.String msg, java.lang.Object... args)  
      <T extends java.io.OutputStream>
      T
      serialize​(FSA fsa, T os)
      Serializes any FSA to CFSA2 stream.
      CFSA2Serializer withAnnotationSeparator​(byte annotationSeparator)
      Sets the annotation separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
      CFSA2Serializer withFiller​(byte filler)
      Sets the filler separator (only if FSASerializer.getFlags() returns FSAFlags.SEPARATORS).
      CFSA2Serializer withNumbers()
      Serialize the automaton with the number of right-language sequences in each node.
      (package private) static int writeVInt​(byte[] array, int offset, int value)
      Write a v-int to a byte array.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • logger

        private final java.util.logging.Logger logger
      • flags

        private static final java.util.EnumSet<FSAFlags> flags
        Supported flags.
      • withNumbers

        private boolean withNumbers
        true if we should serialize with numbers.
        See Also:
        withNumbers()
      • offsets

        private com.carrotsearch.hppc.IntIntHashMap offsets
        A hash map of [state, offset] pairs.
      • numbers

        private com.carrotsearch.hppc.IntIntHashMap numbers
        A hash map of [state, right-language-count] pairs.
      • scratch

        private final byte[] scratch
        Scratch array for serializing vints.
      • labelsIndex

        private byte[] labelsIndex
        The most frequent labels for integrating with the flags field.
      • labelsInvIndex

        private int[] labelsInvIndex
        Inverted index of labels to be integrated with flags field. A label at index i has the index or zero (no integration).
    • Constructor Detail

      • CFSA2Serializer

        public CFSA2Serializer()
    • Method Detail

      • withNumbers

        public CFSA2Serializer withNumbers()
        Serialize the automaton with the number of right-language sequences in each node. This is required to implement perfect hashing. The numbering also preserves the order of input sequences.
        Specified by:
        withNumbers in interface FSASerializer
        Returns:
        Returns the same object for easier call chaining.
      • serialize

        public <T extends java.io.OutputStream> T serialize​(FSA fsa,
                                                            T os)
                                                     throws java.io.IOException
        Serializes any FSA to CFSA2 stream.
        Specified by:
        serialize in interface FSASerializer
        Type Parameters:
        T - A subclass of OutputStream, returned for chaining.
        Parameters:
        fsa - The automaton to serialize.
        os - The output stream to serialize to.
        Returns:
        Returns os for chaining.
        Throws:
        java.io.IOException - Rethrown if an I/O error occurs.
        See Also:
        withNumbers()
      • computeLabelsIndex

        private void computeLabelsIndex​(FSA fsa)
        Compute a set of labels to be integrated with the flags field.
      • getFlags

        public java.util.Set<FSAFlags> getFlags()
        Return supported flags.
        Specified by:
        getFlags in interface FSASerializer
        Returns:
        Returns the set of flags supported by the serializer (and the output automaton).
      • linearize

        private com.carrotsearch.hppc.IntArrayList linearize​(FSA fsa)
                                                      throws java.io.IOException
        Linearization of states.
        Throws:
        java.io.IOException
      • log

        private void log​(java.util.logging.Level level,
                         java.lang.String msg,
                         java.lang.Object... args)
      • linearizeAndCalculateOffsets

        private int linearizeAndCalculateOffsets​(FSA fsa,
                                                 com.carrotsearch.hppc.IntArrayList states,
                                                 com.carrotsearch.hppc.IntArrayList linearized,
                                                 com.carrotsearch.hppc.IntIntHashMap offsets)
                                          throws java.io.IOException
        Linearize all states, putting states in front of the automaton and calculating stable state offsets.
        Throws:
        java.io.IOException
      • linearizeState

        private void linearizeState​(FSA fsa,
                                    com.carrotsearch.hppc.IntStack nodes,
                                    com.carrotsearch.hppc.IntArrayList linearized,
                                    java.util.BitSet visited,
                                    int node)
        Add a state to linearized list.
      • computeFirstStates

        private int[] computeFirstStates​(com.carrotsearch.hppc.IntIntHashMap inlinkCount,
                                         int maxStates,
                                         int minInlinkCount)
        Compute the set of states that should be linearized first to minimize other states goto length.
      • computeInlinkCount

        private com.carrotsearch.hppc.IntIntHashMap computeInlinkCount​(FSA fsa)
        Compute in-link count for each state.
      • emitNodes

        private int emitNodes​(FSA fsa,
                              java.io.OutputStream os,
                              com.carrotsearch.hppc.IntArrayList linearized)
                       throws java.io.IOException
        Update arc offsets assuming the given goto length.
        Throws:
        java.io.IOException
      • emitNodeArcs

        private int emitNodeArcs​(FSA fsa,
                                 java.io.OutputStream os,
                                 int state,
                                 int nextState)
                          throws java.io.IOException
        Emit all arcs of a single node.
        Throws:
        java.io.IOException
      • emitArc

        private int emitArc​(java.io.OutputStream os,
                            int flags,
                            byte label,
                            int targetOffset)
                     throws java.io.IOException
        Throws:
        java.io.IOException
      • emitNodeData

        private int emitNodeData​(java.io.OutputStream os,
                                 int number)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • writeVInt

        static int writeVInt​(byte[] array,
                             int offset,
                             int value)
        Write a v-int to a byte array.