Class MakeKneserNeyArpaFromText


  • public class MakeKneserNeyArpaFromText
    extends java.lang.Object
    Estimates a Kneser-Ney language model from raw text and writes it out in ARPA format. This is meant to closely resemble the functionality of SRILM's ngram-count -text <text file> -ukndiscount -lm <outputfile>, with two main exceptions:
    (a) Rather than calculating the discount for each n-gram order from counts, a constant discount of 0.75 is used for all orders.
    (b) Count thresholding is currently not implemented (SRILM by default thresholds counts for n-grams with n > 3).
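
    To illustrate exception (a), here is a minimal sketch of interpolated Kneser-Ney for bigrams with a fixed discount of 0.75. It is not this class's implementation: the class name ConstantDiscountKneserNeySketch and its methods are hypothetical, the lowest-order distribution is left undiscounted, and sentence-boundary tokens are omitted for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the documented API.
class ConstantDiscountKneserNeySketch {
    // The constant discount named in the class description.
    static final double DISCOUNT = 0.75;

    private final Map<String, Integer> bigramCounts = new HashMap<>();      // c(w1 w2), keyed "w1 w2"
    private final Map<String, Integer> contextCounts = new HashMap<>();     // c(w1 *)
    private final Map<String, Set<String>> followers = new HashMap<>();     // distinct w2 seen after w1
    private final Map<String, Set<String>> predecessors = new HashMap<>();  // distinct w1 seen before w2

    // Collects bigram statistics from one tokenized line of text.
    void addSentence(List<String> tokens) {
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String w1 = tokens.get(i);
            String w2 = tokens.get(i + 1);
            bigramCounts.merge(w1 + " " + w2, 1, Integer::sum);
            contextCounts.merge(w1, 1, Integer::sum);
            followers.computeIfAbsent(w1, k -> new HashSet<>()).add(w2);
            predecessors.computeIfAbsent(w2, k -> new HashSet<>()).add(w1);
        }
    }

    // Continuation probability: distinct left contexts of w divided by
    // the number of distinct bigram types.
    double continuationProb(String w) {
        Set<String> preds = predecessors.get(w);
        return preds == null ? 0.0 : (double) preds.size() / bigramCounts.size();
    }

    // Interpolated Kneser-Ney estimate of P(w2 | w1) with the fixed discount.
    double bigramProb(String w1, String w2) {
        Integer ctx = contextCounts.get(w1);
        if (ctx == null) {
            return continuationProb(w2); // unseen context: fall back to the lower order
        }
        int count = bigramCounts.getOrDefault(w1 + " " + w2, 0);
        double discounted = Math.max(count - DISCOUNT, 0.0) / ctx;
        // Mass removed by discounting, redistributed via the lower-order model.
        double backoffWeight = DISCOUNT * followers.get(w1).size() / ctx;
        return discounted + backoffWeight * continuationProb(w2);
    }
}
```

    For the token sequence a b a c, this sketch gives P(b | a) = 0.25/2 + (0.75 * 2/2) * (1/3) = 0.375, and the probabilities of a, b and c after a sum to 1.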

    Note that input and output files with a .gz suffix are unzipped or zipped as necessary. If no input files are given (or "-" is specified), lines are read from standard input.
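
    The suffix-based handling described above could be sketched as follows. This is an assumption about one reasonable way to do it, not the class's actual code: GzAwareIoSketch, openInput and openOutput are hypothetical names, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not part of the documented API.
class GzAwareIoSketch {
    // True when input should come from standard input:
    // no input files at all, or a single "-" argument.
    static boolean readsFromStdin(String[] inputFiles) {
        return inputFiles.length == 0
                || (inputFiles.length == 1 && "-".equals(inputFiles[0]));
    }

    // Opens a text reader, decompressing when the path ends in ".gz".
    static BufferedReader openInput(String path) throws IOException {
        InputStream in;
        if ("-".equals(path)) {
            in = System.in;
        } else {
            in = new FileInputStream(path);
            if (path.endsWith(".gz")) {
                in = new GZIPInputStream(in);
            }
        }
        // UTF-8 is an assumption made for this sketch.
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Opens a text writer, compressing when the path ends in ".gz".
    static BufferedWriter openOutput(String path) throws IOException {
        OutputStream out = new FileOutputStream(path);
        if (path.endsWith(".gz")) {
            out = new GZIPOutputStream(out);
        }
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }
}
```

    Closing the returned writer also finishes the gzip stream, so a try-with-resources block is sufficient for both readers and writers.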

    Author:
    adampauls
    • Method Summary

      Modifier and Type Method Description
      static void main(java.lang.String[] argv)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • MakeKneserNeyArpaFromText

        public MakeKneserNeyArpaFromText()
    • Method Detail

      • main

        public static void main(java.lang.String[] argv)