Package edu.berkeley.nlp.lm.io
Class MakeKneserNeyArpaFromText
- java.lang.Object
  - edu.berkeley.nlp.lm.io.MakeKneserNeyArpaFromText
public class MakeKneserNeyArpaFromText extends java.lang.Object
Estimates a Kneser-Ney language model from raw text and writes the language model out in ARPA format. This is meant to closely resemble the functionality of SRILM's ngram-count command (ngram-count -text <text file> -ukndiscount -lm <outputfile>), with two main exceptions:
(a) Rather than calculating the discount for each n-gram order from counts, we use a constant discount of 0.75 for all orders.
(b) Count thresholding is currently not implemented (SRILM by default thresholds counts for n-grams with n > 3).
Note that if the input/output files have a .gz suffix, they will be unzipped/zipped as necessary. If no input files are given (or "-" is specified), lines will be read from standard input.
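To make exception (a) concrete, the following is a minimal sketch of how a constant discount of 0.75 enters an interpolated, absolute-discounting estimate, next to the count-derived discount D = n1 / (n1 + 2 * n2) commonly used for Kneser-Ney. It is not taken from this class's implementation: the class name ConstantDiscountSketch, its method names, and the toy counts are hypothetical, and the lower-order probability is passed in directly rather than computed from continuation counts as a full Kneser-Ney estimator would do.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of edu.berkeley.nlp.lm.io.
public class ConstantDiscountSketch {

    // The constant discount described in exception (a).
    static final double DISCOUNT = 0.75;

    // Count-derived alternative: D = n1 / (n1 + 2 * n2), where n1 and n2 are
    // the numbers of n-grams seen exactly once and exactly twice.
    static double countBasedDiscount(long n1, long n2) {
        return (double) n1 / (n1 + 2.0 * n2);
    }

    // Interpolated estimate for one context:
    //   max(c(h,w) - D, 0) / c(h)  +  (D * distinctFollowers(h) / c(h)) * pLower(w)
    // followerCounts maps each word seen after the context h to its count.
    static double interpolated(Map<String, Integer> followerCounts,
                               String word, double lowerOrderProb) {
        int total = 0;
        int distinct = 0;
        for (int c : followerCounts.values()) {
            total += c;
            if (c > 0) {
                distinct++;
            }
        }
        if (total == 0) {
            // Unseen context: fall back entirely to the lower-order estimate.
            return lowerOrderProb;
        }
        int count = followerCounts.getOrDefault(word, 0);
        double discounted = Math.max(count - DISCOUNT, 0.0) / total;
        double backoffWeight = DISCOUNT * distinct / total;
        return discounted + backoffWeight * lowerOrderProb;
    }

    public static void main(String[] args) {
        // Toy counts for words following the context "the".
        Map<String, Integer> afterThe = new HashMap<>();
        afterThe.put("cat", 3);
        afterThe.put("dog", 1);

        System.out.println("p(cat | the)  = " + interpolated(afterThe, "cat", 0.25));
        System.out.println("p(dog | the)  = " + interpolated(afterThe, "dog", 0.25));
        System.out.println("p(bird | the) = " + interpolated(afterThe, "bird", 0.5));
        System.out.println("count-based D = " + countBasedDiscount(4, 2));
    }
}
```

With the toy lower-order probabilities above (0.25, 0.25 and 0.5, which sum to one), the three interpolated estimates also sum to one: the mass removed by the discount is exactly what the back-off weight redistributes.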
- Author: adampauls
Constructor Summary
Constructor                    Description
MakeKneserNeyArpaFromText()
Method Summary
Modifier and Type    Method                           Description
static void          main(java.lang.String[] argv)