Class QGram

java.lang.Object
info.debatty.java.stringsimilarity.ShingleBased
info.debatty.java.stringsimilarity.QGram
All Implemented Interfaces:
StringDistance, Serializable

@Immutable public class QGram extends ShingleBased implements StringDistance
Q-gram distance, as defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches". The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurences of each n-gram): SUM( |V1_i - V2_i| ). Q-gram distance is a lower bound on Levenshtein distance, but can be computed in O(m + n), where Levenshtein requires O(m.n).
See Also:
  • Constructor Details

    • QGram

      public QGram(int k)
      Q-gram similarity and distance. Defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches", http://www.sciencedirect.com/science/article/pii/0304397592901434 The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurences of each k-shingle). Q-gram distance is a lower bound on Levenshtein distance, but can be computed in O(|A| + |B|), where Levenshtein requires O(|A|.|B|)
      Parameters:
      k -
    • QGram

      public QGram()
      Q-gram similarity and distance. Defined by Ukkonen in "Approximate string-matching with q-grams and maximal matches", http://www.sciencedirect.com/science/article/pii/0304397592901434 The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurence of each k-shingle). Q-gram distance is a lower bound on Levenshtein distance, but can be computed in O(|A| + |B|), where Levenshtein requires O(|A|.|B|) Default k is 3.
  • Method Details

    • distance

      public final double distance(String s1, String s2)
      The distance between two strings is defined as the L1 norm of the difference of their profiles (the number of occurence of each k-shingle).
      Specified by:
      distance in interface StringDistance
      Parameters:
      s1 - The first string to compare.
      s2 - The second string to compare.
      Returns:
      The computed Q-gram distance.
      Throws:
      NullPointerException - if s1 or s2 is null.
    • distance

      public final double distance(Map<String,Integer> profile1, Map<String,Integer> profile2)
      Compute QGram distance using precomputed profiles.
      Parameters:
      profile1 -
      profile2 -
      Returns: