Class NGram
- java.lang.Object
  - info.debatty.java.stringsimilarity.NGram
- All Implemented Interfaces:
NormalizedStringDistance, StringDistance, java.io.Serializable
@Immutable public class NGram extends java.lang.Object implements NormalizedStringDistance
N-Gram Similarity as defined by Kondrak, "N-Gram Similarity and Distance", String Processing and Information Retrieval, Lecture Notes in Computer Science Volume 3772, 2005, pp 115-126. The algorithm uses affixing with the special character '\n' to increase the weight of the first characters. Normalization is achieved by dividing the total similarity score by the original length of the longest word. http://webdocs.cs.ualberta.ca/~kondrak/papers/spire05.pdf
- See Also:
- Serialized Form
-
-
Method Summary
Modifier and Type: double
Method: distance(java.lang.String s0, java.lang.String s1)
Description: Compute n-gram distance.
-
-
-
Field Detail
-
DEFAULT_N
private static final int DEFAULT_N
- See Also:
- Constant Field Values
-
n
private final int n
-
-
Method Detail
-
distance
public final double distance(java.lang.String s0, java.lang.String s1)
Compute n-gram distance.
- Specified by:
distance in interface StringDistance
- Parameters:
s0 - The first string to compare.
s1 - The second string to compare.
- Returns:
- The computed n-gram distance in the range [0, 1]
- Throws:
java.lang.NullPointerException - if s0 or s1 is null.
-
-
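The class description mentions two ideas: affixing each string with '\n' so that the first characters carry more weight, and normalizing by the length of the longer string. The following is a minimal, self-contained sketch of those two ideas only. It is not the library's implementation: the class name NGramSketch, the helper gramCost, and the explicit n parameter are hypothetical names introduced for illustration, and the handling of padding and of empty strings is an assumption.

```java
import java.util.Objects;

/** Illustrative sketch only; NOT the library's NGram implementation. */
class NGramSketch {
    // Special character assumed to pad the start of each string.
    private static final char PAD = '\n';

    /**
     * Hypothetical helper: cost in [0, 1] of aligning the n-gram of s that
     * ends at position i with the n-gram of t that ends at position j
     * (positions are 1-based; n-grams are left-padded with PAD).
     */
    static double gramCost(String s, int i, String t, int j, int n) {
        int mismatches = 0;
        int compared = 0;
        for (int k = 0; k < n; k++) {
            int si = i - n + k; // a negative index falls inside the padding
            int tj = j - n + k;
            if (si < 0 && tj < 0) {
                continue; // padding aligned with padding is not counted
            }
            compared++;
            char a = si < 0 ? PAD : s.charAt(si);
            char b = tj < 0 ? PAD : t.charAt(tj);
            if (a != b) {
                mismatches++;
            }
        }
        // compared >= 1 because the last slot of each n-gram is a real character.
        return (double) mismatches / compared;
    }

    /** Normalized n-gram edit distance in [0, 1] (sketch, assumed behavior). */
    static double distance(String s0, String s1, int n) {
        Objects.requireNonNull(s0, "s0 must not be null");
        Objects.requireNonNull(s1, "s1 must not be null");
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        if (s0.equals(s1)) {
            return 0.0;
        }
        if (s0.isEmpty() || s1.isEmpty()) {
            return 1.0; // assumption: an empty string is maximally distant
        }
        int sl = s0.length();
        int tl = s1.length();
        double[] prev = new double[sl + 1];
        double[] cur = new double[sl + 1];
        for (int i = 0; i <= sl; i++) {
            prev[i] = i; // cost of deleting the first i characters of s0
        }
        for (int j = 1; j <= tl; j++) {
            cur[0] = j; // cost of inserting the first j characters of s1
            for (int i = 1; i <= sl; i++) {
                double substitute = prev[i - 1] + gramCost(s0, i, s1, j, n);
                double insertOrDelete = Math.min(cur[i - 1], prev[i]) + 1.0;
                cur[i] = Math.min(substitute, insertOrDelete);
            }
            double[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = cur;
            cur = tmp;
        }
        // Normalize by the length of the longer string.
        return prev[sl] / Math.max(sl, tl);
    }

    public static void main(String[] args) {
        System.out.println(distance("ABABABAB", "ABCABCABCABC", 2));
        System.out.println(distance("abc", "abc", 2));
    }
}
```

Because each substitution costs at most 1 and the total is divided by the longer length, the result of this sketch stays within [0, 1], matching the range documented for distance.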