Class NGram

java.lang.Object
info.debatty.java.stringsimilarity.NGram
All Implemented Interfaces:
NormalizedStringDistance, StringDistance, Serializable

@Immutable public class NGram extends Object implements NormalizedStringDistance
N-Gram Similarity as defined by Kondrak, "N-Gram Similarity and Distance", String Processing and Information Retrieval, Lecture Notes in Computer Science, Volume 3772, 2005, pp. 115-126. The algorithm uses affixing with the special character '\n' to increase the weight of the first characters. Normalization is achieved by dividing the total similarity score by the original length of the longest word. http://webdocs.cs.ualberta.ca/~kondrak/papers/spire05.pdf
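The affixing step can be illustrated with a small standalone sketch. It assumes that affixing means prepending n - 1 copies of '\n' to the string, so that a string of length L yields exactly L n-grams and the first character appears in as many n-grams as any other. The class AffixingSketch and the method affixedNGrams are hypothetical names used only for illustration; they are not part of this library.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: AffixingSketch is a hypothetical class, not library code.
public class AffixingSketch {

    // Assumption: affixing prepends (n - 1) copies of '\n' before n-grams are taken.
    public static List<String> affixedNGrams(String s, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        StringBuilder padded = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            padded.append('\n');
        }
        padded.append(s);

        // Slide a window of width n over the padded string.
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "abc" with n = 2 is padded to "\nabc", giving "\na", "ab", "bc".
        System.out.println(affixedNGrams("abc", 2).size()); // prints 3
    }
}
```

With this padding, the number of n-grams equals the original string length, which is consistent with normalizing by the length of the longest word.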
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final int
     
    private final int
     
  • Constructor Summary

    Constructors
    Constructor
    Description
    NGram()
    Instantiate with default value for n-gram length (2).
    NGram(int n)
    Instantiate with given value for n-gram length.
  • Method Summary

    Modifier and Type
    Method
    Description
    final double
    distance(String s0, String s1)
    Compute n-gram distance.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

    • NGram

      public NGram(int n)
      Instantiate with given value for n-gram length.
      Parameters:
      n - The n-gram length.
    • NGram

      public NGram()
      Instantiate with default value for n-gram length (2).
  • Method Details

    • distance

      public final double distance(String s0, String s1)
      Compute n-gram distance.
      Specified by:
      distance in interface StringDistance
      Parameters:
      s0 - The first string to compare.
      s1 - The second string to compare.
      Returns:
      The computed n-gram distance, in the range [0, 1].
      Throws:
      NullPointerException - if s0 or s1 is null.
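
The contract documented above (null arguments rejected, result normalized into [0, 1] by the length of the longer string) can be sketched for the simplest case, n = 1. This assumes that with n = 1 no padding is needed and the measure reduces to plain edit (Levenshtein) distance divided by the longer length. The sketch swaps in that simpler technique; it is not the general n-gram algorithm and not this library's implementation, and the class UnigramDistanceSketch is a hypothetical name.

```java
import java.util.Objects;

// Illustrative sketch only: UnigramDistanceSketch is a hypothetical class, not library code.
public class UnigramDistanceSketch {

    // Assumption: for n = 1 the distance is Levenshtein distance / max(length).
    public static double distance(String s0, String s1) {
        Objects.requireNonNull(s0, "s0 must not be null");
        Objects.requireNonNull(s1, "s1 must not be null");
        if (s0.equals(s1)) {
            return 0.0;
        }
        if (s0.isEmpty() || s1.isEmpty()) {
            return 1.0;
        }

        // Two-row dynamic programming table over prefixes of s0.
        int[] prev = new int[s0.length() + 1];
        int[] curr = new int[s0.length() + 1];
        for (int i = 0; i <= s0.length(); i++) {
            prev[i] = i;
        }
        for (int j = 1; j <= s1.length(); j++) {
            curr[0] = j;
            for (int i = 1; i <= s0.length(); i++) {
                int substitution = s0.charAt(i - 1) == s1.charAt(j - 1) ? 0 : 1;
                curr[i] = Math.min(Math.min(curr[i - 1] + 1, prev[i] + 1),
                                   prev[i - 1] + substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }

        // Normalize by the longer input so the result stays in [0, 1].
        return (double) prev[s0.length()] / Math.max(s0.length(), s1.length());
    }

    public static void main(String[] args) {
        // One substitution over a longest length of 4.
        System.out.println(distance("ABCD", "ABCE")); // prints 0.25
    }
}
```

Dividing by the longer length bounds the result because the edit distance between two strings never exceeds the length of the longer one.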