Class NGram

  • All Implemented Interfaces:
    NormalizedStringDistance, StringDistance, java.io.Serializable

    @Immutable
    public class NGram
    extends java.lang.Object
    implements NormalizedStringDistance
    N-gram similarity as defined by Kondrak, "N-Gram Similarity and Distance", String Processing and Information Retrieval, Lecture Notes in Computer Science, Volume 3772, 2005, pp. 115-126. The algorithm uses affixing with the special character '\n' to increase the weight of the first characters. The result is normalized by dividing the total similarity score by the original length of the longer of the two words. See http://webdocs.cs.ualberta.ca/~kondrak/papers/spire05.pdf
    See Also:
    Serialized Form
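
    The affixing and normalization described above can be illustrated with a minimal sketch. The class and method names here (AffixSketch, prefixedNGrams, normalize) are hypothetical and are not part of this library. The sketch assumes that affixing means prepending n - 1 copies of '\n', so that a string of length L yields exactly L n-grams and its first character appears in as many n-grams as any other character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the library's implementation.
final class AffixSketch {

    // Prefix s with (n - 1) copies of the special character '\n',
    // then slide a window of width n over the padded string.
    // Assumes n >= 1.
    static List<String> prefixedNGrams(String s, int n) {
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            padded.append('\n');
        }
        padded.append(s);

        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }

    // Divide a raw score by the length of the longer string.
    // Assumes at least one of the strings is non-empty.
    static double normalize(double rawScore, String s0, String s1) {
        return rawScore / Math.max(s0.length(), s1.length());
    }
}
```

    With n = 2, the string "abc" is padded to "\nabc", which gives the three bigrams "\na", "ab" and "bc".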
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int DEFAULT_N  
      private int n  
    • Constructor Summary

      Constructors 
      Constructor Description
      NGram()
      Instantiate with default value for n-gram length (2).
      NGram(int n)
      Instantiate with given value for n-gram length.
    • Method Summary

      Modifier and Type Method Description
      double distance(java.lang.String s0, java.lang.String s1)
      Compute n-gram distance.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • NGram

        public NGram(int n)
        Instantiate with given value for n-gram length.
        Parameters:
        n - The n-gram length.
      • NGram

        public NGram()
        Instantiate with default value for n-gram length (2).
    • Method Detail

      • distance

        public final double distance(java.lang.String s0,
                                     java.lang.String s1)
        Compute n-gram distance.
        Specified by:
        distance in interface StringDistance
        Parameters:
        s0 - The first string to compare.
        s1 - The second string to compare.
        Returns:
        The computed n-gram distance, in the range [0, 1].
        Throws:
        java.lang.NullPointerException - if s0 or s1 is null.
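
        The contract above (null check, result in [0, 1]) can be sketched with a simplified dynamic-programming version of Kondrak's n-gram distance. This is an illustration under stated assumptions, not the library's source: the class name NGramDistanceSketch is hypothetical, n is passed as an argument rather than held in a field, and results for very short inputs may differ from the real implementation. It assumes n >= 1.

```java
// Hypothetical, simplified sketch; not the library's implementation.
final class NGramDistanceSketch {
    private static final char SPECIAL = '\n';

    static double distance(String s0, String s1, int n) {
        if (s0 == null || s1 == null) {
            throw new NullPointerException("strings must not be null");
        }
        if (s0.equals(s1)) {
            return 0.0;
        }
        int sl = s0.length();
        int tl = s1.length();
        if (sl == 0 || tl == 0) {
            return 1.0;
        }

        // Affixing: prepend (n - 1) special characters to both strings.
        StringBuilder prefix = new StringBuilder();
        for (int k = 0; k < n - 1; k++) {
            prefix.append(SPECIAL);
        }
        String a = prefix + s0;
        String b = prefix + s1;

        double[] prev = new double[sl + 1];
        double[] cur = new double[sl + 1];
        for (int i = 0; i <= sl; i++) {
            prev[i] = i;
        }

        for (int j = 1; j <= tl; j++) {
            cur[0] = j;
            for (int i = 1; i <= sl; i++) {
                // Positional mismatch count between the i-th n-gram of a
                // and the j-th n-gram of b; matches on the prefix character
                // are discounted from the denominator.
                int mismatches = 0;
                int compared = n;
                for (int k = 0; k < n; k++) {
                    char ca = a.charAt(i - 1 + k);
                    char cb = b.charAt(j - 1 + k);
                    if (ca != cb) {
                        mismatches++;
                    } else if (ca == SPECIAL) {
                        compared--;
                    }
                }
                double cost = compared == 0 ? 0.0 : (double) mismatches / compared;
                cur[i] = Math.min(Math.min(cur[i - 1] + 1, prev[i] + 1),
                                  prev[i - 1] + cost);
            }
            double[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        // After the final swap, prev holds the last computed row.
        return prev[sl] / Math.max(sl, tl);
    }
}
```

        Because each substitution costs at most 1 and each insertion or deletion costs exactly 1, the raw score never exceeds the length of the longer string, so the normalized value stays within [0, 1]. Under this sketch, "ab" and "ac" with n = 2 give 0.25, and "abc" and "xyz" give 1.0.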