Class NGram
java.lang.Object
info.debatty.java.stringsimilarity.NGram
- All Implemented Interfaces:
NormalizedStringDistance, StringDistance, Serializable
N-Gram Similarity as defined by Kondrak, "N-Gram Similarity and Distance",
String Processing and Information Retrieval, Lecture Notes in Computer
Science Volume 3772, 2005, pp 115-126.
The algorithm uses affixing with special character '\n' to increase the
weight of first characters. The normalization is achieved by dividing the
total similarity score by the original length of the longest word.
http://webdocs.cs.ualberta.ca/~kondrak/papers/spire05.pdf
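The description above can be made concrete with a minimal sketch of a positional n-gram edit distance that uses prefix affixing and length normalization. It is not the library's source. The class name NGramDistanceSketch, the static method signature, and the PAD constant are assumptions made for illustration. The affixing details are simplified, so numeric results may differ from those of NGram.distance.

```java
import java.util.Arrays;
import java.util.Objects;

final class NGramDistanceSketch {

    // Special character used for affixing (assumed to be absent from inputs).
    private static final char PAD = '\n';

    /** Illustrative n-gram distance, normalized to the range [0, 1]. */
    static double distance(String s0, String s1, int n) {
        Objects.requireNonNull(s0, "s0 must not be null");
        Objects.requireNonNull(s1, "s1 must not be null");
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        if (s0.equals(s1)) {
            return 0.0;
        }

        int len0 = s0.length();
        int len1 = s1.length();

        // Prefix each string with n - 1 copies of PAD so that every original
        // character ends an n-gram; this gives extra weight to first characters.
        char[] pad = new char[n - 1];
        Arrays.fill(pad, PAD);
        String a = new String(pad) + s0;
        String b = new String(pad) + s1;

        // Two-row edit-distance table over n-grams.
        double[] prev = new double[len1 + 1];
        double[] curr = new double[len1 + 1];
        for (int j = 0; j <= len1; j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= len0; i++) {
            curr[0] = i;
            for (int j = 1; j <= len1; j++) {
                // Substitution cost: fraction of mismatching positions
                // between the i-th n-gram of a and the j-th n-gram of b.
                int mismatches = 0;
                for (int k = 0; k < n; k++) {
                    if (a.charAt(i - 1 + k) != b.charAt(j - 1 + k)) {
                        mismatches++;
                    }
                }
                double substitution = prev[j - 1] + (double) mismatches / n;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), substitution);
            }
            double[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        // Normalize by the original length of the longer string.
        return prev[len1] / Math.max(len0, len1);
    }

    public static void main(String[] args) {
        System.out.println(distance("ABCD", "ABTUIO", 2));
    }
}
```

Dividing by the longer original length keeps the result in [0, 1], because the unnormalized edit cost can never exceed the number of n-grams in the longer string.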
Field Details
-
DEFAULT_N
private static final int DEFAULT_N
-
n
private final int n
-
-
Constructor Details
-
NGram
public NGram(int n)
Instantiate with given value for n-gram length.
- Parameters:
n - The n-gram length.
-
NGram
public NGram()
Instantiate with default value for n-gram length (2).
-
-
Method Details
-
distance
Compute n-gram distance.
- Specified by:
distance in interface StringDistance
- Parameters:
s0 - The first string to compare.
s1 - The second string to compare.
- Returns:
The computed n-gram distance in the range [0, 1].
- Throws:
NullPointerException - if s0 or s1 is null.
-