Class JaroWinkler

java.lang.Object
info.debatty.java.stringsimilarity.JaroWinkler
All Implemented Interfaces:
NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable

@Immutable public class JaroWinkler extends Object implements NormalizedStringSimilarity, NormalizedStringDistance
The Jaro-Winkler distance metric is designed for, and best suited to, short strings such as person names, and for detecting typos. It is (roughly) a variation of Damerau-Levenshtein in which the substitution of two close characters is considered less important than the substitution of two characters that are far from each other. Jaro-Winkler was developed in the area of record linkage (duplicate detection) (Winkler, 1990). It returns a value in the interval [0.0, 1.0]. The distance is computed as 1 - Jaro-Winkler similarity.
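To make the description above concrete, the following is a minimal, self-contained sketch of the textbook Jaro-Winkler computation. It is not this class's source code. The class name JaroWinklerSketch, the constants SCALING (0.1) and MAX_PREFIX (4), and the rule that the bonus is added only when the Jaro score exceeds the threshold are assumptions taken from the commonly cited definition. The library's own constants and edge-case handling may differ.

```java
final class JaroWinklerSketch {

    // Textbook Winkler constants (assumptions, not read from the library).
    static final double SCALING = 0.1;
    static final int MAX_PREFIX = 4;

    /** Plain Jaro similarity in [0, 1]. */
    static double jaro(String s1, String s2) {
        if (s1.equals(s2)) {
            return 1.0;
        }
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 || len2 == 0) {
            return 0.0;
        }
        // Characters match only if they are equal and not too far apart.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(len2 - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[k]) {
                k++;
            }
            if (s1.charAt(i) != s2.charAt(k)) {
                outOfOrder++;
            }
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2; // transpositions = half the out-of-order matches
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    /** Jaro score plus the common-prefix bonus when the score exceeds threshold. */
    static double similarity(String s1, String s2, double threshold) {
        double j = jaro(s1, s2);
        if (j <= threshold) {
            return j; // assumed rule: no bonus at or below the threshold
        }
        int limit = Math.min(MAX_PREFIX, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return j + prefix * SCALING * (1.0 - j);
    }

    /** Distance defined as 1 - similarity, as stated in the class description. */
    static double distance(String s1, String s2, double threshold) {
        return 1.0 - similarity(s1, s2, threshold);
    }

    public static void main(String[] args) {
        System.out.println(similarity("MARTHA", "MARHTA", 0.7));
        System.out.println(distance("MARTHA", "MARHTA", 0.7));
    }
}
```

For the classic pair "MARTHA" and "MARHTA", the sketch finds six matching characters and one transposition, giving a Jaro score of about 0.944. The three-character common prefix "MAR" then raises the similarity to about 0.961. A transposition of two adjacent characters therefore costs very little, which is why the metric suits typo detection in short strings.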
  • Field Details

  • Constructor Details

    • JaroWinkler

      public JaroWinkler()
      Instantiate with default threshold (0.7).
    • JaroWinkler

      public JaroWinkler(double threshold)
      Instantiate with the given threshold, which determines when the Winkler bonus should be used. Set threshold to a negative value to get the Jaro distance.
      Parameters:
      threshold - the threshold that determines when the Winkler bonus is used
  • Method Details

    • getThreshold

      public final double getThreshold()
      Returns the current value of the threshold used for adding the Winkler bonus. The default value is 0.7.
      Returns:
      the current value of the threshold
    • similarity

      public final double similarity(String s1, String s2)
      Compute Jaro-Winkler similarity.
      Specified by:
      similarity in interface StringSimilarity
      Parameters:
      s1 - The first string to compare.
      s2 - The second string to compare.
      Returns:
      The Jaro-Winkler similarity in the range [0, 1]
      Throws:
      NullPointerException - if s1 or s2 is null.
    • distance

      public final double distance(String s1, String s2)
      Return 1 - similarity.
      Specified by:
      distance in interface StringDistance
      Parameters:
      s1 - The first string to compare.
      s2 - The second string to compare.
      Returns:
      1 - similarity.
      Throws:
      NullPointerException - if s1 or s2 is null.
    • matches

      private int[] matches(String s1, String s2)