Class AhoCorasickDoubleArrayTrie<V>

  • All Implemented Interfaces:
    java.io.Serializable

    public class AhoCorasickDoubleArrayTrie<V>
    extends java.lang.Object
    implements java.io.Serializable
    An implementation of Aho Corasick algorithm based on Double Array Trie
    See Also:
    Serialized Form
    • Field Detail

      • check

        protected int[] check
        check array of the Double Array Trie structure
      • base

        protected int[] base
        base array of the Double Array Trie structure
      • fail

        protected int[] fail
        fail table of the Aho Corasick automata
      • output

        protected int[][] output
        output table of the Aho Corasick automata
      • v

        protected V[] v
        outer value array
      • l

        protected int[] l
        the length of every key
      • size

        protected int size
        the size of base and check array
    • Constructor Detail

      • AhoCorasickDoubleArrayTrie

        public AhoCorasickDoubleArrayTrie()
    • Method Detail

      • parseText

        public java.util.List<AhoCorasickDoubleArrayTrie.Hit<V>> parseText​(java.lang.CharSequence text)
        Parse text
        Parameters:
        text - The text
        Returns:
        a list of outputs
      • parseText

        public void parseText​(java.lang.CharSequence text,
                              AhoCorasickDoubleArrayTrie.IHit<V> processor)
        Parse text
        Parameters:
        text - The text
        processor - A processor which handles the output
      • parseText

        public void parseText​(char[] text,
                              AhoCorasickDoubleArrayTrie.IHit<V> processor)
        Parse text
        Parameters:
        text - The text
        processor - A processor which handles the output
      • parseText

        public void parseText​(char[] text,
                              AhoCorasickDoubleArrayTrie.IHitFull<V> processor)
        Parse text
        Parameters:
        text - The text
        processor - A processor which handles the output
      • matches

        public boolean matches​(java.lang.String text)
        Checks that string contains at least one substring
        Parameters:
        text - source text to check
        Returns:
        true if string contains at least one substring
      • findFirst

        public AhoCorasickDoubleArrayTrie.Hit<V> findFirst​(java.lang.String text)
        Search first match in string
        Parameters:
        text - source text to check
        Returns:
        first match or null if there are no matches
      • save

        public void save​(java.io.ObjectOutputStream out)
                  throws java.io.IOException
        Save
        Parameters:
        out - An ObjectOutputStream object
        Throws:
        java.io.IOException - Some IOException
      • load

        public void load​(java.io.ObjectInputStream in)
                  throws java.io.IOException,
                         java.lang.ClassNotFoundException
        Load
        Parameters:
        in - An ObjectInputStream object
        Throws:
        java.io.IOException
        java.lang.ClassNotFoundException
      • get

        public V get​(java.lang.String key)
        Get value by a String key, just like a map.get() method
        Parameters:
        key - The key
        Returns:
      • set

        public boolean set​(java.lang.String key,
                           V value)
        Update a value corresponding to a key
        Parameters:
        key - the key
        value - the value
        Returns:
        successful or not(failure if there is no key)
      • get

        public V get​(int index)
        Pick the value by index in value array
        Notice that to be more efficiently, this method DO NOT check the parameter
        Parameters:
        index - The index
        Returns:
        The value
      • getState

        private int getState​(int currentState,
                             char character)
        transmit state, supports failure function
        Parameters:
        currentState -
        character -
        Returns:
      • storeEmits

        private void storeEmits​(int position,
                                int currentState,
                                java.util.List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits)
        store output
        Parameters:
        position -
        currentState -
        collectedEmits -
      • transition

        protected int transition​(int current,
                                 char c)
        transition of a state
        Parameters:
        current -
        c -
        Returns:
      • transitionWithRoot

        protected int transitionWithRoot​(int nodePos,
                                         char c)
        transition of a state, if the state is root and it failed, then returns the root
        Parameters:
        nodePos -
        c -
        Returns:
      • build

        public void build​(java.util.Map<java.lang.String,​V> map)
        Build a AhoCorasickDoubleArrayTrie from a map
        Parameters:
        map - a map containing key-value pairs
      • exactMatchSearch

        public int exactMatchSearch​(java.lang.String key)
        match exactly by a key
        Parameters:
        key - the key
        Returns:
        the index of the key, you can use it as a perfect hash function
      • exactMatchSearch

        private int exactMatchSearch​(java.lang.String key,
                                     int pos,
                                     int len,
                                     int nodePos)
        match exactly by a key
        Parameters:
        key -
        pos -
        len -
        nodePos -
        Returns:
      • exactMatchSearch

        private int exactMatchSearch​(char[] keyChars,
                                     int pos,
                                     int len,
                                     int nodePos)
        match exactly by a key
        Parameters:
        keyChars - the char array of the key
        pos - the begin index of char array
        len - the length of the key
        nodePos - the starting position of the node for searching
        Returns:
        the value index of the key, minus indicates null
      • size

        public int size()
        Get the size of the keywords
        Returns: