Class Levenshtein

  • All Implemented Interfaces:
    MetricStringDistance, StringDistance, java.io.Serializable

    @Immutable
    public class Levenshtein
    extends java.lang.Object
    implements MetricStringDistance
    The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one string into the other.
    See Also:
    Serialized Form
    • Constructor Summary

      Constructors 
      Constructor Description
      Levenshtein()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      double distance​(java.lang.String s1, java.lang.String s2)
      Equivalent to distance(s1, s2, Integer.MAX_VALUE).
      double distance​(java.lang.String s1, java.lang.String s2, int limit)
      The Levenshtein distance, or edit distance, between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • Levenshtein

        public Levenshtein()
    • Method Detail

      • distance

        public final double distance​(java.lang.String s1,
                                     java.lang.String s2)
        Equivalent to distance(s1, s2, Integer.MAX_VALUE).
        Specified by:
        distance in interface MetricStringDistance
        Specified by:
        distance in interface StringDistance
        Returns:
      • distance

        public final double distance​(java.lang.String s1,
                                     java.lang.String s2,
                                     int limit)
        The Levenshtein distance, or edit distance, between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. http://en.wikipedia.org/wiki/Levenshtein_distance It is always at least the difference of the sizes of the two strings. It is at most the length of the longer string. It is zero if and only if the strings are equal. If the strings are the same size, the Hamming distance is an upper bound on the Levenshtein distance. The Levenshtein distance verifies the triangle inequality (the distance between two strings is no greater than the sum Levenshtein distances from a third string). Implementation uses dynamic programming (Wagner–Fischer algorithm), with only 2 rows of data. The space requirement is thus O(m) and the algorithm runs in O(mn).
        Parameters:
        s1 - The first string to compare.
        s2 - The second string to compare.
        limit - The maximum result to compute before stopping. This means that the calculation can terminate early if you only care about strings with a certain similarity. Set this to Integer.MAX_VALUE if you want to run the calculation to completion in every case.
        Returns:
        The computed Levenshtein distance.
        Throws:
        java.lang.NullPointerException - if s1 or s2 is null.