Class Cosine

java.lang.Object
info.debatty.java.stringsimilarity.ShingleBased
info.debatty.java.stringsimilarity.Cosine
All Implemented Interfaces:
NormalizedStringDistance, NormalizedStringSimilarity, StringDistance, StringSimilarity, Serializable

@Immutable public class Cosine extends ShingleBased implements NormalizedStringDistance, NormalizedStringSimilarity
The similarity between two strings is the cosine of the angle between their vector representations. It is computed as V1 . V2 / (|V1| * |V2|). The cosine distance is computed as 1 - cosine similarity.
  • Constructor Details

    • Cosine

      public Cosine(int k)
      Implements cosine similarity between strings. The strings are first transformed into vectors of occurrence counts of k-shingles (sequences of k characters). In this n-dimensional space, the similarity between the two strings is the cosine of the angle between their respective vectors.
      Parameters:
      k - the shingle size, i.e. the number of characters in each shingle.
    • Cosine

      public Cosine()
      Implements cosine similarity between strings. The strings are first transformed into vectors of occurrence counts of k-shingles (sequences of k characters). In this n-dimensional space, the similarity between the two strings is the cosine of the angle between their respective vectors. The default k is 3.
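The transformation of a string into a vector of k-shingle occurrence counts, described above, can be sketched as follows. This is a minimal illustration, not the library's code: the class name `ShingleProfileSketch` and its `profile` method are hypothetical, and the sketch ignores any preprocessing the library may apply to the input before shingling.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the library.
final class ShingleProfileSketch {

    // Count every run of k consecutive characters (k-shingle) in s.
    // A string shorter than k produces an empty profile.
    static Map<String, Integer> profile(String s, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= s.length(); i++) {
            // merge() inserts 1 for a new shingle, otherwise adds 1 to its count.
            counts.merge(s.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "ABCAB" with k = 2 contains the shingles AB, BC, CA, AB,
        // so AB is counted twice and BC and CA once each.
        System.out.println(profile("ABCAB", 2));
    }
}
```

Each distinct shingle acts as one dimension of the vector space, so a sparse map from shingle to count is enough to represent the vector.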
  • Method Details

    • similarity

      public final double similarity(String s1, String s2)
      Compute the cosine similarity between strings.
      Specified by:
      similarity in interface StringSimilarity
      Parameters:
      s1 - The first string to compare.
      s2 - The second string to compare.
      Returns:
      The cosine similarity in the range [0, 1]
      Throws:
      NullPointerException - if s1 or s2 is null.
    • norm

      private static double norm(Map<String,Integer> profile)
      Compute the L2 norm: sqrt(Sum_i(v_i²)).
      Parameters:
      profile - the shingle profile, mapping each shingle to its occurrence count.
      Returns:
      L2 norm
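As an illustration of the formula sqrt(Sum_i(v_i²)) applied to a profile, here is a minimal sketch. The class name `NormSketch` is hypothetical, and because the library's `norm` is private, this is only an assumed equivalent rather than the actual implementation.

```java
import java.util.Map;

// Hypothetical helper for illustration; not part of the library.
final class NormSketch {

    // L2 norm of the count vector stored in the profile's values.
    static double norm(Map<String, Integer> profile) {
        double sumOfSquares = 0.0;
        for (int count : profile.values()) {
            // Promote to double before multiplying to avoid int overflow.
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Counts 3 and 4 form a 3-4-5 right triangle, so the norm is 5.
        System.out.println(norm(Map.of("AB", 3, "BC", 4)));
    }
}
```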
    • dotProduct

      private static double dotProduct(Map<String,Integer> profile1, Map<String,Integer> profile2)
    • distance

      public final double distance(String s1, String s2)
      Return 1.0 - similarity.
      Specified by:
      distance in interface StringDistance
      Parameters:
      s1 - The first string to compare.
      s2 - The second string to compare.
      Returns:
      1.0 - the cosine similarity; the result is in the range [0, 1]
      Throws:
      NullPointerException - if s1 or s2 is null.
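To make the relationship between similarity and distance concrete, the sketch below evaluates V1 . V2 / (|V1| * |V2|) and 1.0 - similarity on two hand-written profiles. The class `CosineOnProfilesSketch` and its methods are hypothetical names for illustration, and returning 0 when a profile is empty is an assumption of this sketch, not documented library behaviour.

```java
import java.util.Map;

// Hypothetical illustration class; not part of the library.
final class CosineOnProfilesSketch {

    // L2 norm of the count vector: sqrt(Sum_i(v_i^2)).
    static double norm(Map<String, Integer> profile) {
        double sum = 0.0;
        for (int v : profile.values()) {
            sum += (double) v * v;
        }
        return Math.sqrt(sum);
    }

    // V1 . V2 / (|V1| * |V2|); only shingles present in both profiles
    // contribute to the dot product.
    static double similarity(Map<String, Integer> p1, Map<String, Integer> p2) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : p1.entrySet()) {
            dot += (double) e.getValue().intValue() * p2.getOrDefault(e.getKey(), 0);
        }
        double norms = norm(p1) * norm(p2);
        // Assumption of this sketch: an empty profile yields similarity 0.
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    // Cosine distance is defined as 1.0 - similarity.
    static double distance(Map<String, Integer> p1, Map<String, Integer> p2) {
        return 1.0 - similarity(p1, p2);
    }

    public static void main(String[] args) {
        // 2-shingle profiles of "ABCD" and "ABCE", written out by hand.
        Map<String, Integer> p1 = Map.of("AB", 1, "BC", 1, "CD", 1);
        Map<String, Integer> p2 = Map.of("AB", 1, "BC", 1, "CE", 1);
        System.out.println(similarity(p1, p2)); // roughly 2/3
        System.out.println(distance(p1, p2));   // roughly 1/3
    }
}
```

The two profiles share two of their three shingles, so the dot product is 2 and each norm is sqrt(3), which gives a similarity close to 2/3. Because shingle counts are never negative, the similarity cannot drop below 0 and the distance cannot exceed 1.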
    • similarity

      public final double similarity(Map<String,Integer> profile1, Map<String,Integer> profile2)
      Compute similarity between precomputed profiles.
      Parameters:
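      profile1 - the precomputed shingle profile of the first string.
      profile2 - the precomputed shingle profile of the second string.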
      Returns: