Class LuceneSingleIndexLanguageModel

  • All Implemented Interfaces:
    java.lang.AutoCloseable, LanguageModel

    public class LuceneSingleIndexLanguageModel
    extends BaseLanguageModel
    Information about ngram occurrences, taken from Lucene indexes (one index per ngram level). This is not a full language model: it only returns occurrence counts and performs no probability calculation, in particular none for ngrams with 0 occurrences.
    Since:
    3.2
    • Constructor Detail

      • LuceneSingleIndexLanguageModel

        public LuceneSingleIndexLanguageModel​(java.io.File topIndexDir)
        Parameters:
        topIndexDir - a directory that contains at least one subdirectory called 3grams, which is a Lucene index with ngram occurrences as created by org.languagetool.dev.FrequencyIndexCreator.
      • LuceneSingleIndexLanguageModel

        @Experimental
        public LuceneSingleIndexLanguageModel​(int maxNgram)
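
    The directory layout described for topIndexDir can be illustrated with plain java.io and java.nio, without LanguageTool on the classpath. This is a minimal sketch, not the library's own check: the class NgramDirLayoutSketch and the helper looksLikeNgramTopDir are hypothetical names, and the helper only tests for the 3grams subdirectory named in the parameter description. It does not verify that the subdirectory is a readable Lucene index.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class NgramDirLayoutSketch {

    // Hypothetical helper, not part of LanguageTool: returns true if
    // topIndexDir is a directory containing a subdirectory named "3grams".
    // Assumption: the presence of "3grams" is the minimum requirement;
    // the contents of that subdirectory are not inspected here.
    public static boolean looksLikeNgramTopDir(File topIndexDir) {
        return topIndexDir.isDirectory()
                && new File(topIndexDir, "3grams").isDirectory();
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway top directory to demonstrate the layout.
        Path top = Files.createTempDirectory("ngram-top");
        System.out.println(looksLikeNgramTopDir(top.toFile())); // false, no 3grams yet

        Files.createDirectory(top.resolve("3grams"));
        System.out.println(looksLikeNgramTopDir(top.toFile())); // true
    }
}
```

    A caller could run a check of this kind before passing the directory to the constructor, so that a misconfigured path is reported early.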
    • Method Detail

      • validateDirectory

        public static void validateDirectory​(java.io.File topIndexDir)
        Throws a RuntimeException if the given directory does not seem to be a valid ngram top directory with subdirectories 1grams etc.
        Since:
        3.0
      • clearCaches

        @Experimental
        public static void clearCaches()
        Only used internally.
        Since:
        3.2
      • doValidateDirectory

        protected void doValidateDirectory​(java.io.File topIndexDir)
      • addIndex

        private void addIndex​(java.io.File topIndexDir,
                              int ngramSize)
      • getCount

        public long getCount​(java.util.List<java.lang.String> tokens)
        Description copied from class: BaseLanguageModel
        Get the occurrence count for the given token sequence.
        Specified by:
        getCount in class BaseLanguageModel
      • close

        public void close()
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
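
    The getCount and close methods listed above suggest a simple usage pattern: open the model, look up occurrence counts for token sequences, and close it again. The sketch below shows that pattern with a hypothetical in-memory stand-in, InMemoryCountModel, so that it runs without Lucene or LanguageTool. It is not the Lucene-backed class, and returning 0 for an unseen token sequence is an assumption of the sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountModelSketch {

    // Hypothetical stand-in for a count-only model: it stores occurrence
    // counts per token sequence and computes no probabilities.
    public static class InMemoryCountModel implements AutoCloseable {
        private final Map<List<String>, Long> counts = new HashMap<>();

        // Registers a count for a token sequence (defensive copy of the key).
        public void add(List<String> tokens, long count) {
            counts.put(new ArrayList<>(tokens), count);
        }

        // Returns the stored count, or 0 for a sequence that was never added
        // (assumption of this sketch).
        public long getCount(List<String> tokens) {
            return counts.getOrDefault(tokens, 0L);
        }

        // Releases the stored data; a Lucene-backed model would close its
        // index readers here instead.
        @Override
        public void close() {
            counts.clear();
        }
    }

    public static void main(String[] args) {
        // try-with-resources works because the model is AutoCloseable.
        try (InMemoryCountModel model = new InMemoryCountModel()) {
            model.add(Arrays.asList("the", "quick", "fox"), 42L);
            System.out.println(model.getCount(Arrays.asList("the", "quick", "fox"))); // 42
            System.out.println(model.getCount(Arrays.asList("quick", "the", "fox"))); // 0
        }
    }
}
```

    Because the class implements java.lang.AutoCloseable, the same try-with-resources form should apply to it as well. Any probability estimate, including the handling of zero counts, is left to the caller.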