Class HomophoneOccurrenceDumper

  • All Implemented Interfaces:
    java.lang.AutoCloseable, org.languagetool.languagemodel.LanguageModel

    class HomophoneOccurrenceDumper
    extends org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
    Dump the occurrences of homophone 3-grams to STDOUT. Useful for producing a more compact file of homophone occurrences, because searching for the homophones and their contexts in the Lucene index requires iterating over all terms and is therefore slow.
    Since:
    2.8
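    The idea behind the dump can be illustrated with a minimal sketch. It is not the class's actual implementation: the in-memory map stands in for the Lucene index, and the class name CompactDumpSketch, the method dump, the minCount parameter, and the tab-separated output format are all assumptions made for illustration. It shows one full pass over all 3-grams that keeps only those whose middle token is a homophone, so later lookups can read the small dump instead of re-scanning the index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CompactDumpSketch {

    // Hypothetical stand-in for a single pass over all index terms:
    // keep 3-grams whose middle token is a homophone and whose count
    // reaches minCount (an assumed threshold, not the real MIN_COUNT value).
    static List<String> dump(Map<String, Long> ngramCounts, Set<String> homophones, long minCount) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : ngramCounts.entrySet()) {
            String[] parts = entry.getKey().split(" ");
            boolean isTrigram = parts.length == 3;
            if (isTrigram && homophones.contains(parts[1]) && entry.getValue() >= minCount) {
                // Assumed output format: "<3-gram><TAB><count>"
                lines.add(entry.getKey() + "\t" + entry.getValue());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Toy data in place of a real n-gram index.
        Map<String, Long> index = new LinkedHashMap<>();
        index.put("over there is", 120L);
        index.put("in their house", 95L);
        index.put("the red house", 400L);
        index.put("is their a", 2L);
        for (String line : dump(index, Set.of("there", "their"), 10L)) {
            System.out.println(line);
        }
    }
}
```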
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel

        org.languagetool.languagemodel.LuceneSingleIndexLanguageModel.LuceneSearcher
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int MIN_COUNT  
      • Fields inherited from interface org.languagetool.languagemodel.LanguageModel

        GOOGLE_SENTENCE_END, GOOGLE_SENTENCE_START
    • Method Summary

      Modifier and Type Method Description
      private void dumpOccurrences(java.util.Set<java.lang.String> tokens)
      (package private) java.util.Map<java.lang.String,java.lang.Long> getContext(java.lang.String... tokens)
      Get the context (left and right words) for the given word(s).
      private org.apache.lucene.index.TermsEnum getIterator()
      long getTotalTokenCount()
      static void main(java.lang.String[] args)
      private void run(java.lang.String confusionSetPath)
      • Methods inherited from class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel

        clearCaches, close, doValidateDirectory, getCount, getCount, getLuceneSearcher, toString, validateDirectory
      • Methods inherited from class org.languagetool.languagemodel.BaseLanguageModel

        getPseudoProbability, getPseudoProbabilityStupidBackoff
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • HomophoneOccurrenceDumper

        HomophoneOccurrenceDumper(java.io.File topIndexDir)
                           throws java.io.IOException
        Throws:
        java.io.IOException
    • Method Detail

      • getContext

        java.util.Map<java.lang.String,java.lang.Long> getContext(java.lang.String... tokens)
                                                                 throws java.io.IOException
        Get the context (left and right words) for the given word(s). This is slow, as it needs to scan the whole index.
        Throws:
        java.io.IOException
      • run

        private void run(java.lang.String confusionSetPath)
                  throws java.io.IOException
        Throws:
        java.io.IOException
      • dumpOccurrences

        private void dumpOccurrences(java.util.Set<java.lang.String> tokens)
                              throws java.io.IOException
        Throws:
        java.io.IOException
      • getIterator

        private org.apache.lucene.index.TermsEnum getIterator()
                                                       throws java.io.IOException
        Throws:
        java.io.IOException
      • main

        public static void main(java.lang.String[] args)
                         throws java.io.IOException
        Throws:
        java.io.IOException
      • getTotalTokenCount

        public long getTotalTokenCount()
        Overrides:
        getTotalTokenCount in class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
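
    The full-index scan described for getContext can be sketched as follows. This is an illustration under stated assumptions, not the method's real code: a plain map replaces the Lucene TermsEnum, the class name ContextScanSketch is hypothetical, and the result is assumed to map each matching 3-gram to its occurrence count (the documentation above only gives the return type). The sketch shows why the lookup is slow: every term is visited, regardless of how few match.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextScanSketch {

    // Visit every n-gram (hypothetical stand-in for iterating all index terms)
    // and collect the 3-grams whose middle word is one of the given tokens,
    // i.e. the token together with its left and right neighbor.
    static Map<String, Long> getContext(Map<String, Long> allNgrams, String... tokens) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : allNgrams.entrySet()) {
            String[] parts = entry.getKey().split(" ");
            if (parts.length != 3) {
                continue; // only 3-grams carry both a left and a right word
            }
            for (String token : tokens) {
                if (parts[1].equals(token)) {
                    result.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data in place of a real n-gram index.
        Map<String, Long> index = new HashMap<>();
        index.put("over there is", 120L);
        index.put("there is a", 300L);
        index.put("in their house", 95L);
        index.put("the red house", 400L);
        Map<String, Long> context = getContext(index, "there", "their");
        context.forEach((ngram, count) -> System.out.println(ngram + " -> " + count));
    }
}
```

    The cost of this scan is linear in the number of terms for every call, which is the motivation for dumping the homophone occurrences once instead of querying them repeatedly.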