Package com.hankcs.algorithm
Class AhoCorasickDoubleArrayTrie<V>
- java.lang.Object
-
- com.hankcs.algorithm.AhoCorasickDoubleArrayTrie<V>
-
- All Implemented Interfaces:
java.io.Serializable
public class AhoCorasickDoubleArrayTrie<V> extends java.lang.Object implements java.io.Serializable
An implementation of Aho Corasick algorithm based on Double Array Trie- See Also:
- Serialized Form
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private class
AhoCorasickDoubleArrayTrie.Builder
A builder to build the AhoCorasickDoubleArrayTriestatic class
AhoCorasickDoubleArrayTrie.Hit<V>
A result outputstatic interface
AhoCorasickDoubleArrayTrie.IHit<V>
Processor handles the output when hit a keywordstatic interface
AhoCorasickDoubleArrayTrie.IHitCancellable<V>
Callback that allows to cancel the search process.static interface
AhoCorasickDoubleArrayTrie.IHitFull<V>
Processor handles the output when hit a keyword, with more detail
-
Field Summary
Fields Modifier and Type Field Description protected int[]
base
base array of the Double Array Trie structureprotected int[]
check
check array of the Double Array Trie structureprotected int[]
fail
fail table of the Aho Corasick automataprotected int[]
l
the length of every keyprotected int[][]
output
output table of the Aho Corasick automataprotected int
size
the size of base and check arrayprotected V[]
v
outer value array
-
Constructor Summary
Constructors Constructor Description AhoCorasickDoubleArrayTrie()
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
build(java.util.Map<java.lang.String,V> map)
Build a AhoCorasickDoubleArrayTrie from a mapprivate int
exactMatchSearch(char[] keyChars, int pos, int len, int nodePos)
match exactly by a keyint
exactMatchSearch(java.lang.String key)
match exactly by a keyprivate int
exactMatchSearch(java.lang.String key, int pos, int len, int nodePos)
match exactly by a keyAhoCorasickDoubleArrayTrie.Hit<V>
findFirst(java.lang.String text)
Search first match in stringV
get(int index)
Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameterV
get(java.lang.String key)
Get value by a String key, just like a map.get() methodprivate int
getState(int currentState, char character)
transmit state, supports failure functionvoid
load(java.io.ObjectInputStream in)
Loadboolean
matches(java.lang.String text)
Checks that string contains at least one substringvoid
parseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
Parse textvoid
parseText(char[] text, AhoCorasickDoubleArrayTrie.IHitFull<V> processor)
Parse textjava.util.List<AhoCorasickDoubleArrayTrie.Hit<V>>
parseText(java.lang.CharSequence text)
Parse textvoid
parseText(java.lang.CharSequence text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
Parse textvoid
parseText(java.lang.CharSequence text, AhoCorasickDoubleArrayTrie.IHitCancellable<V> processor)
Parse textvoid
save(java.io.ObjectOutputStream out)
Saveboolean
set(java.lang.String key, V value)
Update a value corresponding to a keyint
size()
Get the size of the keywordsprivate void
storeEmits(int position, int currentState, java.util.List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits)
store outputprotected int
transition(int current, char c)
transition of a stateprotected int
transitionWithRoot(int nodePos, char c)
transition of a state, if the state is root and it failed, then returns the root
-
-
-
Field Detail
-
check
protected int[] check
check array of the Double Array Trie structure
-
base
protected int[] base
base array of the Double Array Trie structure
-
fail
protected int[] fail
fail table of the Aho Corasick automata
-
output
protected int[][] output
output table of the Aho Corasick automata
-
v
protected V[] v
outer value array
-
l
protected int[] l
the length of every key
-
size
protected int size
the size of base and check array
-
-
Method Detail
-
parseText
public java.util.List<AhoCorasickDoubleArrayTrie.Hit<V>> parseText(java.lang.CharSequence text)
Parse text- Parameters:
text
- The text- Returns:
- a list of outputs
-
parseText
public void parseText(java.lang.CharSequence text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
parseText
public void parseText(java.lang.CharSequence text, AhoCorasickDoubleArrayTrie.IHitCancellable<V> processor)
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
parseText
public void parseText(char[] text, AhoCorasickDoubleArrayTrie.IHit<V> processor)
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
parseText
public void parseText(char[] text, AhoCorasickDoubleArrayTrie.IHitFull<V> processor)
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
matches
public boolean matches(java.lang.String text)
Checks that string contains at least one substring- Parameters:
text
- source text to check- Returns:
true
if string contains at least one substring
-
findFirst
public AhoCorasickDoubleArrayTrie.Hit<V> findFirst(java.lang.String text)
Search first match in string- Parameters:
text
- source text to check- Returns:
- first match or
null
if there are no matches
-
save
public void save(java.io.ObjectOutputStream out) throws java.io.IOException
Save- Parameters:
out
- An ObjectOutputStream object- Throws:
java.io.IOException
- Some IOException
-
load
public void load(java.io.ObjectInputStream in) throws java.io.IOException, java.lang.ClassNotFoundException
Load- Parameters:
in
- An ObjectInputStream object- Throws:
java.io.IOException
java.lang.ClassNotFoundException
-
get
public V get(java.lang.String key)
Get value by a String key, just like a map.get() method- Parameters:
key
- The key- Returns:
-
set
public boolean set(java.lang.String key, V value)
Update a value corresponding to a key- Parameters:
key
- the keyvalue
- the value- Returns:
- successful or not(failure if there is no key)
-
get
public V get(int index)
Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameter- Parameters:
index
- The index- Returns:
- The value
-
getState
private int getState(int currentState, char character)
transmit state, supports failure function- Parameters:
currentState
-character
-- Returns:
-
storeEmits
private void storeEmits(int position, int currentState, java.util.List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits)
store output- Parameters:
position
-currentState
-collectedEmits
-
-
transition
protected int transition(int current, char c)
transition of a state- Parameters:
current
-c
-- Returns:
-
transitionWithRoot
protected int transitionWithRoot(int nodePos, char c)
transition of a state, if the state is root and it failed, then returns the root- Parameters:
nodePos
-c
-- Returns:
-
build
public void build(java.util.Map<java.lang.String,V> map)
Build a AhoCorasickDoubleArrayTrie from a map- Parameters:
map
- a map containing key-value pairs
-
exactMatchSearch
public int exactMatchSearch(java.lang.String key)
match exactly by a key- Parameters:
key
- the key- Returns:
- the index of the key, you can use it as a perfect hash function
-
exactMatchSearch
private int exactMatchSearch(java.lang.String key, int pos, int len, int nodePos)
match exactly by a key- Parameters:
key
-pos
-len
-nodePos
-- Returns:
-
exactMatchSearch
private int exactMatchSearch(char[] keyChars, int pos, int len, int nodePos)
match exactly by a key- Parameters:
keyChars
- the char array of the keypos
- the begin index of char arraylen
- the length of the keynodePos
- the starting position of the node for searching- Returns:
- the value index of the key, minus indicates null
-
size
public int size()
Get the size of the keywords- Returns:
-
-