Class SorensenDice
java.lang.Object
info.debatty.java.stringsimilarity.ShingleBased
info.debatty.java.stringsimilarity.SorensenDice
- All Implemented Interfaces:
NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable
@Immutable
public class SorensenDice
extends ShingleBased
implements NormalizedStringDistance, NormalizedStringSimilarity
Similar to the Jaccard index, but here the similarity is computed as 2 * |V1
inter V2| / (|V1| + |V2|). Distance is computed as 1 - similarity.
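The formula above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not the library's implementation: the class name DiceSketch and its helper methods are hypothetical, V1 and V2 are taken to be the sets of distinct k-character substrings of each string, and the handling of inputs shorter than k is a choice made for this sketch.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Illustrative sketch only; DiceSketch is a hypothetical name, not a library class.
final class DiceSketch {

    // Collect the distinct substrings of length k (the k-shingles) of s.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // 2 * |V1 inter V2| / (|V1| + |V2|) over the two shingle sets.
    static double similarity(String s1, String s2, int k) {
        Objects.requireNonNull(s1, "s1 must not be null");
        Objects.requireNonNull(s2, "s2 must not be null");
        Set<String> v1 = shingles(s1, k);
        Set<String> v2 = shingles(s2, k);
        int total = v1.size() + v2.size();
        if (total == 0) {
            // Assumption of this sketch: both inputs are shorter than k, so avoid 0/0.
            return s1.equals(s2) ? 1.0 : 0.0;
        }
        Set<String> inter = new HashSet<>(v1);
        inter.retainAll(v2); // keep only the shingles present in both sets
        return 2.0 * inter.size() / total;
    }

    // Distance is the complement of the similarity.
    static double distance(String s1, String s2, int k) {
        return 1.0 - similarity(s1, s2, k);
    }

    public static void main(String[] args) {
        // Shingles: {AB, BC, CD, DE} and {AB, BC, CD, DF}; three are shared.
        System.out.println(similarity("ABCDE", "ABCDF", 2)); // 2 * 3 / (4 + 4) = 0.75
        System.out.println(distance("ABCDE", "ABCDF", 2));   // 1 - 0.75 = 0.25
    }
}
```

With k = 2, the strings "ABCDE" and "ABCDF" each yield four shingles and share three of them, which gives a similarity of 0.75 and a distance of 0.25.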
Constructor Summary
Constructors
SorensenDice()
- Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index.
SorensenDice(int k)
- Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index.
-
Method Summary
Methods inherited from class info.debatty.java.stringsimilarity.ShingleBased
getK, getProfile
-
Constructor Details
-
SorensenDice
public SorensenDice(int k)
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality.
- Parameters:
k
- The shingle length (number of characters per shingle).
-
SorensenDice
public SorensenDice()
Sorensen-Dice coefficient, aka Sørensen index, Dice's coefficient or Czekanowski's binary (non-quantitative) index. The strings are first converted to boolean sets of k-shingles (sequences of k characters), then the similarity is computed as 2 * |A inter B| / (|A| + |B|). Attention: Sorensen-Dice distance (and similarity) does not satisfy the triangle inequality. The default k is 3.
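The warning about the triangle inequality can be checked with a small counterexample. The sketch below is self-contained and does not call the library; TriangleSketch and its helpers are hypothetical names, and the shingling (distinct substrings of length k) is an assumption made for illustration. It uses k = 1 and the strings "a", "b" and "ab".

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; TriangleSketch is a hypothetical name, not a library class.
final class TriangleSketch {

    // Collect the distinct substrings of length k (the k-shingles) of s.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // 1 - 2 * |A inter B| / (|A| + |B|); assumes at least one shingle set is non-empty.
    static double distance(String s1, String s2, int k) {
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return 1.0 - 2.0 * inter.size() / (a.size() + b.size());
    }

    public static void main(String[] args) {
        // Going directly from "a" to "b": no shared shingles, so the distance is 1.
        double direct = distance("a", "b", 1);
        // Going through "ab": each leg shares one shingle, so each leg is 1 - 2/3.
        double detour = distance("a", "ab", 1) + distance("ab", "b", 1);
        System.out.println(direct > detour); // true: the direct path is longer than the detour
    }
}
```

Here the direct distance is 1, while the two legs through "ab" sum to about 0.67, so d(a, b) > d(a, ab) + d(ab, b). A practical consequence is that this distance should not be used with index structures that rely on metric properties.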
-
-
Method Details
-
similarity
Similarity is computed as 2 * |A inter B| / (|A| + |B|).
- Specified by:
similarity in interface StringSimilarity
- Parameters:
s1
- The first string to compare.
s2
- The second string to compare.
- Returns:
- The computed Sorensen-Dice similarity.
- Throws:
NullPointerException
- if s1 or s2 is null.
-
distance
Returns 1 - similarity.
- Specified by:
distance in interface StringDistance
- Parameters:
s1
- The first string to compare.
s2
- The second string to compare.
- Returns:
- 1.0 - the computed similarity
- Throws:
NullPointerException
- if s1 or s2 is null.
-