Class Levenshtein

java.lang.Object
info.debatty.java.stringsimilarity.Levenshtein
All Implemented Interfaces:
MetricStringDistance, StringDistance, Serializable

@Immutable public class Levenshtein extends Object implements MetricStringDistance
The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one string into the other.
See Also:
  • Constructor Summary

    Constructors
    Constructor
    Description
     
  • Method Summary

    Modifier and Type
    Method
    Description
    final double
    Equivalent to distance(s1, s2, Integer.MAX_VALUE).
    final double
    distance(String s1, String s2, int limit)
    The Levenshtein distance, or edit distance, between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Levenshtein

      public Levenshtein()
  • Method Details

    • distance

      public final double distance(String s1, String s2)
      Equivalent to distance(s1, s2, Integer.MAX_VALUE).
      Specified by:
      distance in interface MetricStringDistance
      Specified by:
      distance in interface StringDistance
      Parameters:
      s1 -
      s2 -
      Returns:
    • distance

      public final double distance(String s1, String s2, int limit)
      The Levenshtein distance, or edit distance, between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. http://en.wikipedia.org/wiki/Levenshtein_distance It is always at least the difference of the sizes of the two strings. It is at most the length of the longer string. It is zero if and only if the strings are equal. If the strings are the same size, the Hamming distance is an upper bound on the Levenshtein distance. The Levenshtein distance verifies the triangle inequality (the distance between two strings is no greater than the sum Levenshtein distances from a third string). Implementation uses dynamic programming (Wagner–Fischer algorithm), with only 2 rows of data. The space requirement is thus O(m) and the algorithm runs in O(mn).
      Parameters:
      s1 - The first string to compare.
      s2 - The second string to compare.
      limit - The maximum result to compute before stopping. This means that the calculation can terminate early if you only care about strings with a certain similarity. Set this to Integer.MAX_VALUE if you want to run the calculation to completion in every case.
      Returns:
      The computed Levenshtein distance.
      Throws:
      NullPointerException - if s1 or s2 is null.