Class Jaccard
java.lang.Object
info.debatty.java.stringsimilarity.ShingleBased
info.debatty.java.stringsimilarity.Jaccard
- All Implemented Interfaces:
MetricStringDistance, NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable
@Immutable
public class Jaccard
extends ShingleBased
implements MetricStringDistance, NormalizedStringDistance, NormalizedStringSimilarity
Each input string is converted into a set of n-grams; the Jaccard index is
then computed as |V1 inter V2| / |V1 union V2|.
As with Q-Gram distance, the input strings are first converted into sets of
n-grams (sequences of n characters, also called k-shingles), but this time
the number of occurrences of each n-gram is not taken into account.
Distance is computed as 1 - similarity, where similarity is the Jaccard index.
The Jaccard distance is a metric distance.
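The computation described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: the class name JaccardSketch and its helper methods are hypothetical, k is passed explicitly instead of being held by the object, and returning 1.0 when neither string yields a shingle is an assumption made here for the sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of info.debatty.java.stringsimilarity.
class JaccardSketch {

    // Collect the distinct k-shingles (substrings of length k) of s.
    // Using a Set means repeated shingles are counted only once.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // Jaccard index: |A inter B| / |A union B| over the two shingle sets.
    static double similarity(String s1, String s2, int k) {
        if (s1 == null || s2 == null) {
            throw new NullPointerException("s1 and s2 must not be null");
        }
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);

        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            // Assumption of this sketch: inputs with no shingles count as identical.
            return 1.0;
        }

        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // {ABC, BCD, CDE} vs {ABC, BCD, CDF}: 2 shared shingles out of 4 distinct.
        System.out.println(similarity("ABCDE", "ABCDF", 3)); // prints 0.5
        // Both strings reduce to the single shingle {AA}, so repeats do not matter.
        System.out.println(similarity("AAAA", "AAAAAA", 2)); // prints 1.0
    }
}
```

The second call shows why occurrence counts are irrelevant here: both inputs collapse to the same one-element shingle set, so the index is 1 even though the strings differ in length.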
-
Constructor Summary
Constructors
Constructor and Description
Jaccard()
The strings are first transformed into sets of k-shingles (sequences of k characters), then Jaccard index is computed as |A inter B| / |A union B|.
Jaccard(int k)
The strings are first transformed into sets of k-shingles (sequences of k characters), then Jaccard index is computed as |A inter B| / |A union B|.
-
Method Summary
Methods inherited from class info.debatty.java.stringsimilarity.ShingleBased
getK, getProfile
-
Constructor Details
-
Jaccard
public Jaccard(int k)
The strings are first transformed into sets of k-shingles (sequences of k characters), then Jaccard index is computed as |A inter B| / |A union B|. The default value of k is 3.
- Parameters:
k
-
-
Jaccard
public Jaccard()
The strings are first transformed into sets of k-shingles (sequences of k characters), then Jaccard index is computed as |A inter B| / |A union B|. The default value of k is 3.
-
-
Method Details
-
similarity
Compute Jaccard index: |A inter B| / |A union B|.
- Specified by:
similarity in interface StringSimilarity
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
The Jaccard index in the range [0, 1]
- Throws:
NullPointerException - if s1 or s2 is null.
-
distance
Distance is computed as 1 - similarity.
- Specified by:
distance in interface MetricStringDistance
- Specified by:
distance in interface StringDistance
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
1 - the Jaccard similarity.
- Throws:
NullPointerException - if s1 or s2 is null.
-
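As a rough illustration of the relationship between distance and similarity, the sketch below computes the distance as 1 - similarity and spot-checks the metric properties (identity, symmetry, triangle inequality) on three sample strings. The class JaccardDistanceSketch and its methods are hypothetical and self-contained; they do not call the library, and the handling of inputs without shingles is an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-alone sketch; not part of info.debatty.java.stringsimilarity.
class JaccardDistanceSketch {

    // Distinct substrings of length k.
    static Set<String> shingles(String s, int k) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + k <= s.length(); i++) {
            result.add(s.substring(i, i + k));
        }
        return result;
    }

    // Jaccard index over the two shingle sets.
    static double similarity(String s1, String s2, int k) {
        if (s1 == null || s2 == null) {
            throw new NullPointerException("s1 and s2 must not be null");
        }
        Set<String> a = shingles(s1, k);
        Set<String> b = shingles(s2, k);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 1.0; // assumption: no shingles on either side counts as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    // Distance is the complement of the similarity, so it also lies in [0, 1].
    static double distance(String s1, String s2, int k) {
        return 1.0 - similarity(s1, s2, k);
    }

    public static void main(String[] args) {
        String x = "ABCDE";
        String y = "ABCDF";
        String z = "ABCXF";
        int k = 2;

        double xy = distance(x, y, k);
        double yz = distance(y, z, k);
        double xz = distance(x, z, k);

        System.out.println("d(x, x) = " + distance(x, x, k));
        System.out.println("symmetric: " + (xy == distance(y, x, k)));
        System.out.println("triangle inequality holds: " + (xz <= xy + yz));
    }
}
```

A check on three strings is only a spot check, not a proof; it shows what the metric property means in practice for this distance.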