Class JaroWinkler
- java.lang.Object
-
- info.debatty.java.stringsimilarity.JaroWinkler
-
- All Implemented Interfaces:
NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, java.io.Serializable
@Immutable public class JaroWinkler extends java.lang.Object implements NormalizedStringSimilarity, NormalizedStringDistance
The Jaro-Winkler distance metric is designed for, and best suited to, short strings such as person names, and for detecting typos. It is (roughly) a variation of Damerau-Levenshtein in which the substitution of two close characters is considered less important than the substitution of two characters that are far from each other. Jaro-Winkler was developed in the area of record linkage (duplicate detection) (Winkler, 1990). Both the similarity and the distance lie in the interval [0.0, 1.0]; the distance is computed as 1 - Jaro-Winkler similarity.
- See Also:
- Serialized Form
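For orientation, the usual textbook form of the metric can be written out as below. This is a sketch under stated assumptions, not a statement of what this class computes internally: the symbols m (matching characters), t (transpositions), \ell (length of the common prefix) and p (prefix scaling factor) are introduced here for illustration, and the values p = 0.1 and a prefix cap of 4 are the conventional choices rather than values given on this page. The worked example uses the classic pair MARTHA / MARHTA.

```latex
% Jaro similarity (defined as 0 when m = 0)
\mathrm{sim}_j = \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right)

% Winkler prefix bonus (assumed: p = 0.1, \ell \le 4)
\mathrm{sim}_w = \mathrm{sim}_j + \ell \, p \, (1 - \mathrm{sim}_j)

% Distance as described above
d_w = 1 - \mathrm{sim}_w

% Worked example: s_1 = \text{MARTHA},\ s_2 = \text{MARHTA},\ m = 6,\ t = 1,\ \ell = 3
\mathrm{sim}_j = \frac{1}{3}\left(\frac{6}{6} + \frac{6}{6} + \frac{5}{6}\right) = \frac{17}{18} \approx 0.9444

\mathrm{sim}_w = \frac{17}{18} + 3 \cdot 0.1 \cdot \frac{1}{18} \approx 0.9611

d_w \approx 0.0389
```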
-
-
Field Summary
Fields
Modifier and Type, Field, Description
private static double DEFAULT_THRESHOLD
private static double JW_COEF
private static int THREE
private double threshold
-
Constructor Summary
Constructors
Constructor, Description
JaroWinkler()
    Instantiate with default threshold (0.7).
JaroWinkler(double threshold)
    Instantiate with given threshold to determine when Winkler bonus should be used.
-
Method Summary
All Methods, Instance Methods, Concrete Methods
Modifier and Type, Method, Description
double distance(java.lang.String s1, java.lang.String s2)
    Return 1 - similarity.
double getThreshold()
    Returns the current value of the threshold used for adding the Winkler bonus.
private int[] matches(java.lang.String s1, java.lang.String s2)
double similarity(java.lang.String s1, java.lang.String s2)
    Compute Jaro-Winkler similarity.
-
-
-
Field Detail
-
DEFAULT_THRESHOLD
private static final double DEFAULT_THRESHOLD
- See Also:
- Constant Field Values
-
THREE
private static final int THREE
- See Also:
- Constant Field Values
-
JW_COEF
private static final double JW_COEF
- See Also:
- Constant Field Values
-
threshold
private final double threshold
-
-
Constructor Detail
-
JaroWinkler
public JaroWinkler()
Instantiate with default threshold (0.7).
-
JaroWinkler
public JaroWinkler(double threshold)
Instantiate with given threshold to determine when Winkler bonus should be used. Set threshold to a negative value to get the Jaro distance.
- Parameters:
threshold
-
-
-
Method Detail
-
getThreshold
public final double getThreshold()
Returns the current value of the threshold used for adding the Winkler bonus. The default value is 0.7.
- Returns:
- the current value of the threshold
-
similarity
public final double similarity(java.lang.String s1, java.lang.String s2)
Compute Jaro-Winkler similarity.
- Specified by:
similarity in interface StringSimilarity
- Parameters:
s1
- The first string to compare.
s2
- The second string to compare.
- Returns:
- The Jaro-Winkler similarity in the range [0, 1]
- Throws:
java.lang.NullPointerException
- if s1 or s2 is null.
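To illustrate the kind of computation this method describes, here is a minimal, self-contained sketch of a textbook Jaro-Winkler similarity with a threshold gate. It is not this class's source code. The class name JaroWinklerSketch, the constants SCALING (0.1) and MAX_PREFIX (4), the matching window, and the rule that the bonus is added only when the Jaro score exceeds the threshold are all assumptions made for the example.

```java
// Hypothetical illustration class; not part of info.debatty.java.stringsimilarity.
final class JaroWinklerSketch {
    // Assumed values: the conventional Winkler scaling factor and prefix cap.
    static final double SCALING = 0.1;
    static final int MAX_PREFIX = 4;

    // Plain Jaro similarity in [0, 1]; throws NullPointerException on null input.
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) {
            return 1.0;
        }
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }
        // Characters match only if they are equal and at most 'window' positions apart.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk the matched characters of both strings in order and count mismatched pairs.
        int k = 0;
        int outOfOrder = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[k]) {
                k++;
            }
            if (s1.charAt(i) != s2.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2.0; // each transposition is counted twice above
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    // Jaro-Winkler similarity; the prefix bonus is added only above the threshold (assumed rule).
    static double similarity(String s1, String s2, double threshold) {
        double j = jaro(s1, s2);
        if (j <= threshold) {
            return j;
        }
        int limit = Math.min(MAX_PREFIX, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * SCALING * (1.0 - j);
    }

    // Distance as 1 - similarity, mirroring the relationship stated on this page.
    static double distance(String s1, String s2, double threshold) {
        return 1.0 - similarity(s1, s2, threshold);
    }
}
```

Under these assumptions, JaroWinklerSketch.similarity("MARTHA", "MARHTA", 0.7) evaluates to roughly 0.961. Passing null for either string raises a NullPointerException, in line with the Throws clause above.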
-
distance
public final double distance(java.lang.String s1, java.lang.String s2)
Return 1 - similarity.
- Specified by:
distance in interface StringDistance
- Parameters:
s1
- The first string to compare.
s2
- The second string to compare.
- Returns:
- 1 - similarity.
- Throws:
java.lang.NullPointerException
- if s1 or s2 is null.
-
matches
private int[] matches(java.lang.String s1, java.lang.String s2)
-
-