Package com.hankcs.algorithm
Class AhoCorasickDoubleArrayTrie<V>
java.lang.Object
com.hankcs.algorithm.AhoCorasickDoubleArrayTrie<V>
- All Implemented Interfaces:
Serializable
An implementation of Aho Corasick algorithm based on Double Array Trie
- See Also:
-
Nested Class Summary
Nested ClassesModifier and TypeClassDescriptionprivate class
A builder to build the AhoCorasickDoubleArrayTriestatic class
A result outputstatic interface
Processor handles the output when hit a keywordstatic interface
Callback that allows to cancel the search process.static interface
Processor handles the output when hit a keyword, with more detail -
Field Summary
FieldsModifier and TypeFieldDescriptionprotected int[]
base array of the Double Array Trie structureprotected int[]
check array of the Double Array Trie structureprotected int[]
fail table of the Aho Corasick automataprotected int[]
the length of every keyprotected int[][]
output table of the Aho Corasick automataprotected int
the size of base and check arrayprotected V[]
outer value array -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionvoid
Build a AhoCorasickDoubleArrayTrie from a mapprivate int
exactMatchSearch
(char[] keyChars, int pos, int len, int nodePos) match exactly by a keyint
exactMatchSearch
(String key) match exactly by a keyprivate int
exactMatchSearch
(String key, int pos, int len, int nodePos) match exactly by a keySearch first match in stringget
(int index) Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameterGet value by a String key, just like a map.get() methodprivate int
getState
(int currentState, char character) transmit state, supports failure functionvoid
Loadboolean
Checks that string contains at least one substringvoid
parseText
(char[] text, AhoCorasickDoubleArrayTrie.IHit<V> processor) Parse textvoid
parseText
(char[] text, AhoCorasickDoubleArrayTrie.IHitFull<V> processor) Parse textparseText
(CharSequence text) Parse textvoid
parseText
(CharSequence text, AhoCorasickDoubleArrayTrie.IHit<V> processor) Parse textvoid
parseText
(CharSequence text, AhoCorasickDoubleArrayTrie.IHitCancellable<V> processor) Parse textvoid
save
(ObjectOutputStream out) Saveboolean
Update a value corresponding to a keyint
size()
Get the size of the keywordsprivate void
storeEmits
(int position, int currentState, List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits) store outputprotected int
transition
(int current, char c) transition of a stateprotected int
transitionWithRoot
(int nodePos, char c) transition of a state, if the state is root and it failed, then returns the root
-
Field Details
-
check
protected int[] checkcheck array of the Double Array Trie structure -
base
protected int[] basebase array of the Double Array Trie structure -
fail
protected int[] failfail table of the Aho Corasick automata -
output
protected int[][] outputoutput table of the Aho Corasick automata -
v
outer value array -
l
protected int[] lthe length of every key -
size
protected int sizethe size of base and check array
-
-
Constructor Details
-
AhoCorasickDoubleArrayTrie
public AhoCorasickDoubleArrayTrie()
-
-
Method Details
-
parseText
Parse text- Parameters:
text
- The text- Returns:
- a list of outputs
-
parseText
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
parseText
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
parseText
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
parseText
Parse text- Parameters:
text
- The textprocessor
- A processor which handles the output
-
matches
Checks that string contains at least one substring- Parameters:
text
- source text to check- Returns:
true
if string contains at least one substring
-
findFirst
Search first match in string- Parameters:
text
- source text to check- Returns:
- first match or
null
if there are no matches
-
save
Save- Parameters:
out
- An ObjectOutputStream object- Throws:
IOException
- Some IOException
-
load
Load- Parameters:
in
- An ObjectInputStream object- Throws:
IOException
ClassNotFoundException
-
get
Get value by a String key, just like a map.get() method- Parameters:
key
- The key- Returns:
-
set
Update a value corresponding to a key- Parameters:
key
- the keyvalue
- the value- Returns:
- successful or not(failure if there is no key)
-
get
Pick the value by index in value array
Notice that to be more efficiently, this method DO NOT check the parameter- Parameters:
index
- The index- Returns:
- The value
-
getState
private int getState(int currentState, char character) transmit state, supports failure function- Parameters:
currentState
-character
-- Returns:
-
storeEmits
private void storeEmits(int position, int currentState, List<AhoCorasickDoubleArrayTrie.Hit<V>> collectedEmits) store output- Parameters:
position
-currentState
-collectedEmits
-
-
transition
protected int transition(int current, char c) transition of a state- Parameters:
current
-c
-- Returns:
-
transitionWithRoot
protected int transitionWithRoot(int nodePos, char c) transition of a state, if the state is root and it failed, then returns the root- Parameters:
nodePos
-c
-- Returns:
-
build
Build a AhoCorasickDoubleArrayTrie from a map- Parameters:
map
- a map containing key-value pairs
-
exactMatchSearch
match exactly by a key- Parameters:
key
- the key- Returns:
- the index of the key, you can use it as a perfect hash function
-
exactMatchSearch
match exactly by a key- Parameters:
key
-pos
-len
-nodePos
-- Returns:
-
exactMatchSearch
private int exactMatchSearch(char[] keyChars, int pos, int len, int nodePos) match exactly by a key- Parameters:
keyChars
- the char array of the keypos
- the begin index of char arraylen
- the length of the keynodePos
- the starting position of the node for searching- Returns:
- the value index of the key, minus indicates null
-
size
public int size()Get the size of the keywords- Returns:
-