Class Cosine
java.lang.Object
info.debatty.java.stringsimilarity.ShingleBased
info.debatty.java.stringsimilarity.Cosine
- All Implemented Interfaces:
NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable
@Immutable
public class Cosine
extends ShingleBased
implements NormalizedStringDistance, NormalizedStringSimilarity
The similarity between two strings is the cosine of the angle between their vector representations V1 and V2 (the vectors of k-shingle occurrence counts). It is computed as V1 . V2 / (|V1| * |V2|).
The cosine distance is computed as 1 - cosine similarity.
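As a worked illustration of the formula, assume the two strings are "ABAB" and "BAB" and k = 2 (both strings and the value of k are chosen here purely for illustration). The 2-shingles of "ABAB" are AB, BA, AB and those of "BAB" are BA, AB, so on the axes (AB, BA) the count vectors would be:

```latex
\begin{aligned}
V_1 &= (2, 1), \qquad V_2 = (1, 1) \\
V_1 \cdot V_2 &= 2 \cdot 1 + 1 \cdot 1 = 3 \\
|V_1| &= \sqrt{2^2 + 1^2} = \sqrt{5}, \qquad |V_2| = \sqrt{1^2 + 1^2} = \sqrt{2} \\
\text{similarity} &= \frac{V_1 \cdot V_2}{|V_1|\,|V_2|} = \frac{3}{\sqrt{10}} \approx 0.9487 \\
\text{distance} &= 1 - \text{similarity} \approx 0.0513
\end{aligned}
```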
Constructor Summary
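Constructor | Description
Cosine() | Implements Cosine Similarity between strings.
Cosine(int k) | Implements Cosine Similarity between strings.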
Method Summary
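Modifier and Type | Method | Description
final double | distance(String s1, String s2) | Return 1.0 - similarity.
private static double | dotProduct(Map<String,Integer> profile1, Map<String,Integer> profile2) |
private static double | norm(Map<String,Integer> profile) | Compute the norm L2 : sqrt(Sum_i( v_i²)).
final double | similarity(String s1, String s2) | Compute the cosine similarity between strings.
final double | similarity(Map<String,Integer> profile1, Map<String,Integer> profile2) | Compute similarity between precomputed profiles.
Methods inherited from class info.debatty.java.stringsimilarity.ShingleBased
getK, getProfile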
Constructor Details
Cosine
public Cosine(int k)
Implements Cosine Similarity between strings. The strings are first transformed into vectors of occurrences of k-shingles (sequences of k characters). In this n-dimensional space, the similarity between the two strings is the cosine of the angle between their respective vectors.
- Parameters:
k - the shingle length, i.e. the number of characters in each shingle.
Cosine
public Cosine()
Implements Cosine Similarity between strings. The strings are first transformed into vectors of occurrences of k-shingles (sequences of k characters). In this n-dimensional space, the similarity between the two strings is the cosine of the angle between their respective vectors. Default k is 3.
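To make the role of k concrete, here is a minimal sketch of how a string could be turned into a vector of k-shingle occurrence counts. The class `ShingleProfileSketch` and its `profile` helper are hypothetical names introduced for this illustration; the library's own `getProfile` may differ in details such as whitespace handling.

```java
import java.util.Map;
import java.util.TreeMap;

public class ShingleProfileSketch {

    // Hypothetical helper (not the library's getProfile): counts every
    // k-character substring of s. A TreeMap keeps the keys sorted so the
    // printed output is deterministic.
    public static Map<String, Integer> profile(String s, int k) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + k <= s.length(); i++) {
            counts.merge(s.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(profile("ABCAB", 2)); // prints {AB=2, BC=1, CA=1}
        System.out.println(profile("ABCAB", 3)); // prints {ABC=1, BCA=1, CAB=1}
    }
}
```

A smaller k yields more repeated shingles, while a larger k yields fewer, more specific ones. A string shorter than k produces an empty profile in this sketch.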
Method Details
similarity
Compute the cosine similarity between strings.
- Specified by:
similarity in interface StringSimilarity
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
The cosine similarity in the range [0, 1]
- Throws:
NullPointerException - if s1 or s2 is null.
norm
Compute the L2 norm: sqrt(Sum_i(v_i²)).
- Parameters:
profile - the shingle profile (shingle occurrence counts) whose norm is computed.
- Returns:
The L2 norm of the profile.
dotProduct
distance
Return 1.0 - similarity.
- Specified by:
distance in interface StringDistance
- Parameters:
s1 - The first string to compare.
s2 - The second string to compare.
- Returns:
1.0 minus the cosine similarity; the result lies in the range [0, 1].
- Throws:
NullPointerException - if s1 or s2 is null.
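similarity
Compute similarity between precomputed profiles.
- Parameters:
profile1 - the profile of the first string, as returned by getProfile.
profile2 - the profile of the second string, as returned by getProfile.
- Returns:
The cosine similarity between the two profiles.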
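The computation on precomputed profiles can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library's source: the class `ProfileCosineSketch` is hypothetical, profiles are assumed to be non-empty maps from shingle to occurrence count, and `Map.of` requires Java 9 or later.

```java
import java.util.List;
import java.util.Map;

public class ProfileCosineSketch {

    // L2 norm of a profile: sqrt(Sum_i(v_i^2)).
    public static double norm(Map<String, Integer> profile) {
        double sumOfSquares = 0.0;
        for (int count : profile.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Dot product: only shingles present in both profiles contribute,
    // so it is enough to iterate over the smaller map.
    public static double dotProduct(Map<String, Integer> p1, Map<String, Integer> p2) {
        Map<String, Integer> small = p1.size() <= p2.size() ? p1 : p2;
        Map<String, Integer> large = (small == p1) ? p2 : p1;
        double sum = 0.0;
        for (Map.Entry<String, Integer> entry : small.entrySet()) {
            Integer other = large.get(entry.getKey());
            if (other != null) {
                sum += (double) entry.getValue() * other;
            }
        }
        return sum;
    }

    // Cosine similarity V1 . V2 / (|V1| * |V2|).
    // Assumes both profiles are non-empty; an empty profile has norm 0
    // and would make this sketch return NaN.
    public static double similarity(Map<String, Integer> p1, Map<String, Integer> p2) {
        return dotProduct(p1, p2) / (norm(p1) * norm(p2));
    }

    public static void main(String[] args) {
        // The query profile is built once and reused for every comparison.
        Map<String, Integer> query = Map.of("AB", 2, "BA", 1);
        List<Map<String, Integer>> candidates = List.of(
                Map.of("AB", 1, "BA", 1),   // overlaps with the query
                Map.of("XY", 1));           // shares no shingle with the query
        for (Map<String, Integer> candidate : candidates) {
            System.out.println(similarity(query, candidate));
        }
    }
}
```

Working on profiles rather than raw strings is useful when one string is compared against many others, because its profile only has to be computed once.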