Class QGram
java.lang.Object
info.debatty.java.stringsimilarity.ShingleBased
info.debatty.java.stringsimilarity.QGram
- All Implemented Interfaces:
StringDistance, Serializable
Q-gram distance, as defined by Ukkonen in "Approximate string-matching with
q-grams and maximal matches". The distance between two strings is defined as
the L1 norm of the difference of their profiles (the number of occurrences of
each q-gram): SUM( |V1_i - V2_i| ). Q-gram distance yields a lower bound on
Levenshtein distance: a single edit changes it by at most 2q, so dividing it
by 2q never exceeds the Levenshtein distance. It can be computed in O(m + n),
where Levenshtein requires O(m * n).
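The definition above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class name QGramSketch and its helper methods are hypothetical, and the library may preprocess its input differently before building profiles.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; not part of the library.
final class QGramSketch {

    // Build the profile of s: how often each substring of length k occurs.
    static Map<String, Integer> profile(String s, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= s.length(); i++) {
            counts.merge(s.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    // L1 norm of the difference of two profiles: SUM( |V1_i - V2_i| ).
    static double distance(Map<String, Integer> p1, Map<String, Integer> p2) {
        Set<String> grams = new HashSet<>(p1.keySet());
        grams.addAll(p2.keySet());
        int sum = 0;
        for (String gram : grams) {
            sum += Math.abs(p1.getOrDefault(gram, 0) - p2.getOrDefault(gram, 0));
        }
        return sum;
    }

    // Convenience overload that builds both profiles first.
    static double distance(String s1, String s2, int k) {
        return distance(profile(s1, k), profile(s2, k));
    }

    public static void main(String[] args) {
        // For k = 2 the profiles are {AB, BC, CD} and {AB, BC, CE};
        // they differ only in CD and CE, one occurrence each.
        System.out.println(distance("ABCD", "ABCE", 2)); // prints 2.0
    }
}
```

In this example the Q-gram distance is 2 while the Levenshtein distance is 1 (one substitution), which shows why the lower bound only holds after dividing by 2q: 2 / (2 * 2) = 0.5, which does not exceed 1.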
-
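Constructor Summary
Constructors
QGram() - Q-gram similarity and distance, with the default k of 3.
QGram(int k) - Q-gram similarity and distance.
-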
Method Summary
Modifier and Type | Method | Description
final double | distance(s1, s2) | The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurrences of each k-shingle).
final double | distance(profile1, profile2) | Compute QGram distance using precomputed profiles.
Methods inherited from class info.debatty.java.stringsimilarity.ShingleBased
getK, getProfile
-
Constructor Details
-
QGram
public QGram(int k)
Q-gram similarity and distance. Defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches", http://www.sciencedirect.com/science/article/pii/0304397592901434
The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurrences of each k-shingle). Q-gram distance yields a lower bound on Levenshtein distance (after division by 2k), and it can be computed in O(|A| + |B|), where Levenshtein requires O(|A| * |B|).
- Parameters:
k - The length of the shingles (q-grams).
-
QGram
public QGram()
Q-gram similarity and distance. Defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches", http://www.sciencedirect.com/science/article/pii/0304397592901434
The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurrences of each k-shingle). Q-gram distance yields a lower bound on Levenshtein distance (after division by 2k), and it can be computed in O(|A| + |B|), where Levenshtein requires O(|A| * |B|).
Default k is 3.
-
-
Method Details
-
distance
The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurrences of each k-shingle).
- Specified by:
distance in interface StringDistance
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
The computed Q-gram distance.
- Throws:
NullPointerException - if s1 or s2 is null.
-
distance
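Compute QGram distance using precomputed profiles.
- Parameters:
profile1 - The precomputed profile of the first string.
profile2 - The precomputed profile of the second string.
- Returns:
The Q-gram distance between the two profiles.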
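Precomputed profiles pay off when one string is compared against many others, because its profile is built only once. The following is a minimal sketch of that pattern under stated assumptions: the class ProfileReuseSketch and its methods profile, distance, and closest are hypothetical stand-ins written for illustration, not the library's own code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the library.
final class ProfileReuseSketch {

    // Count every substring of length k in s.
    static Map<String, Integer> profile(String s, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= s.length(); i++) {
            counts.merge(s.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    // L1 distance between two profiles that were computed beforehand.
    static double distance(Map<String, Integer> p1, Map<String, Integer> p2) {
        int sum = 0;
        // Shingles present in p1 (and possibly in p2).
        for (Map.Entry<String, Integer> e : p1.entrySet()) {
            sum += Math.abs(e.getValue() - p2.getOrDefault(e.getKey(), 0));
        }
        // Shingles present only in p2.
        for (Map.Entry<String, Integer> e : p2.entrySet()) {
            if (!p1.containsKey(e.getKey())) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    // Return the candidate closest to query; the query profile is built once.
    static String closest(String query, List<String> candidates, int k) {
        Map<String, Integer> queryProfile = profile(query, k);
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (String candidate : candidates) {
            double d = distance(queryProfile, profile(candidate, k));
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Distances to "ABCD" for k = 2: "XYZW" gives 6, "ABCE" gives 2.
        System.out.println(closest("ABCD", Arrays.asList("XYZW", "ABCE"), 2)); // prints ABCE
    }
}
```

With the library itself, the same idea would presumably combine the inherited getProfile method with this distance overload, so that each string's profile is computed once and then reused across comparisons.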