distance
public final double distance(java.lang.String s1,
java.lang.String s2,
int limit)
The Levenshtein distance, or edit distance, between two words is the
minimum number of single-character edits (insertions, deletions or
substitutions) required to change one word into the other.
http://en.wikipedia.org/wiki/Levenshtein_distance
It is always at least the difference of the sizes of the two strings.
It is at most the length of the longer string.
It is zero if and only if the strings are equal.
If the strings are the same size, the Hamming distance is an upper bound
on the Levenshtein distance.
The Levenshtein distance verifies the triangle inequality (the distance
between two strings is no greater than the sum Levenshtein distances from
a third string).
Implementation uses dynamic programming (Wagner–Fischer algorithm), with
only 2 rows of data. The space requirement is thus O(m) and the algorithm
runs in O(mn).
- Parameters:
s1
- The first string to compare.
s2
- The second string to compare.
limit
- The maximum result to compute before stopping. This
means that the calculation can terminate early if you
only care about strings with a certain similarity.
Set this to Integer.MAX_VALUE if you want to run the
calculation to completion in every case.
- Returns:
- The computed Levenshtein distance.
- Throws:
java.lang.NullPointerException
- if s1 or s2 is null.