Class SorensenDice

java.lang.Object
info.debatty.java.stringsimilarity.ShingleBased
info.debatty.java.stringsimilarity.SorensenDice
All Implemented Interfaces:
NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable

@Immutable public class SorensenDice extends ShingleBased implements NormalizedStringDistance, NormalizedStringSimilarity
Similar to the Jaccard index, but here the similarity is computed as 2 * |V1 inter V2| / (|V1| + |V2|), where V1 and V2 are the sets of k-shingles of the two strings. Distance is computed as 1 - similarity.
  • Constructor Details

    • SorensenDice

      public SorensenDice(int k)
      Sorensen-Dice coefficient, also known as the Sørensen index, Dice's coefficient, or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality.
      Parameters:
      k - The shingle size, i.e. the number of characters in each shingle.
    • SorensenDice

      public SorensenDice()
      Sorensen-Dice coefficient, also known as the Sørensen index, Dice's coefficient, or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality. The default k is 3.
  • Method Details

    • similarity

      public final double similarity(String s1, String s2)
      Similarity is computed as 2 * |A inter B| / (|A| + |B|).
      Specified by:
      similarity in interface StringSimilarity
      Parameters:
      s1 - The first string to compare.
      s2 - The second string to compare.
      Returns:
      The computed Sorensen-Dice similarity.
      Throws:
      NullPointerException - if s1 or s2 is null.
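
The formula above can be illustrated with a small standalone sketch. It is not the library's implementation: the class name DiceSimilaritySketch, the helper shingles, the explicit k argument, and the handling of strings shorter than k are all assumptions made for illustration, and only the JDK is used.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; names and edge-case handling are hypothetical.
public class DiceSimilaritySketch {

    // Collect the distinct substrings of length k (the k-shingles) of s.
    // Throws NullPointerException if s is null.
    public static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // 2 * |A inter B| / (|A| + |B|), where A and B are the shingle sets.
    public static double similarity(String s1, String s2, int k) {
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        if (a.isEmpty() && b.isEmpty()) {
            // Assumption of this sketch: two inputs without any shingle
            // are treated as identical to avoid dividing by zero.
            return 1.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b); // set intersection
        return 2.0 * common.size() / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // "ABCDE" -> {AB, BC, CD, DE}, "ABCDF" -> {AB, BC, CD, DF}
        // 3 shared shingles: 2 * 3 / (4 + 4) = 0.75
        System.out.println(similarity("ABCDE", "ABCDF", 2)); // prints 0.75
    }
}
```

Because shingles are stored in a set, a shingle that occurs several times in a string is counted once, which matches the "boolean sets" wording in the constructor notes.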
    • distance

      public final double distance(String s1, String s2)
      Returns 1 - similarity.
      Specified by:
      distance in interface StringDistance
      Parameters:
      s1 - The first string to compare.
      s2 - The second string to compare.
      Returns:
      The Sorensen-Dice distance, computed as 1.0 minus the similarity of s1 and s2.
      Throws:
      NullPointerException - if s1 or s2 is null.
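
The constructor notes state that the Sorensen-Dice distance does not satisfy the triangle inequality. A minimal counterexample, assuming k = 1 and a standalone reimplementation of the formula (the class DiceTriangleSketch and its helpers are hypothetical and are not part of the library), might look like this:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; not the library's code.
public class DiceTriangleSketch {

    // Distinct substrings of length k of s.
    public static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // distance = 1 - 2 * |A inter B| / (|A| + |B|).
    // Assumes at least one input has a shingle, so the denominator is non-zero.
    public static double distance(String s1, String s2, int k) {
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b); // set intersection
        return 1.0 - 2.0 * common.size() / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // With k = 1 the shingle sets are {a}, {b} and {a, b}.
        double direct = distance("a", "b", 1);            // 1 - 0/2 = 1
        double detour = distance("a", "ab", 1)            // 1 - 2/3 = 1/3
                      + distance("ab", "b", 1);           // 1 - 2/3 = 1/3
        // The direct distance exceeds the sum of the two legs,
        // so the triangle inequality is violated.
        System.out.println(direct > detour); // prints true
    }
}
```

In practice this means the distance should not be handed to data structures or algorithms that rely on metric properties, such as metric trees, without checking that they tolerate non-metric inputs.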