Package org.languagetool.dev
Class HomophoneOccurrenceDumper
- java.lang.Object
  - org.languagetool.languagemodel.BaseLanguageModel
    - org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
      - org.languagetool.dev.HomophoneOccurrenceDumper
- All Implemented Interfaces:
java.lang.AutoCloseable, org.languagetool.languagemodel.LanguageModel
class HomophoneOccurrenceDumper extends org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
Dumps the occurrences of homophone 3-grams to STDOUT. This yields a more compact file of homophone occurrences, which is useful because searching for the homophones and their contexts in the Lucene index requires iterating over all terms and is therefore slow.
- Since:
- 2.8
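The idea described above (one pass over all terms, writing only the homophone 3-grams to a compact listing) can be sketched roughly as follows. This is not the class's actual implementation: the in-memory map stands in for the Lucene index, and `HomophoneDumpSketch`, `dumpLines`, `MIN_COUNT_SKETCH` and the tab-separated line format are all hypothetical names and choices made for illustration. The real value of MIN_COUNT is not shown on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HomophoneDumpSketch {

    // Hypothetical threshold; the real MIN_COUNT value is not shown on this page.
    static final long MIN_COUNT_SKETCH = 2;

    /**
     * Makes one pass over all n-gram entries and returns one line per 3-gram
     * whose middle token is in the given set and whose count reaches the threshold.
     * Line format (an assumption): middle token, count, full 3-gram, tab-separated.
     */
    static List<String> dumpLines(Map<String, Long> ngramCounts, Set<String> homophones) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> entry : ngramCounts.entrySet()) {
            String[] parts = entry.getKey().split(" ");
            if (parts.length != 3) {
                continue; // skip unigrams and bigrams
            }
            if (!homophones.contains(parts[1])) {
                continue; // middle token is not one of the homophones
            }
            if (entry.getValue() < MIN_COUNT_SKETCH) {
                continue; // too rare to be worth dumping
            }
            lines.add(parts[1] + "\t" + entry.getValue() + "\t" + entry.getKey());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for the index: n-gram text mapped to its occurrence count.
        Map<String, Long> index = new LinkedHashMap<>();
        index.put("over there now", 12L);
        index.put("in their house", 7L);
        index.put("of their own", 1L);   // below the sketch threshold
        index.put("there", 500L);        // not a 3-gram
        for (String line : dumpLines(index, Set.of("there", "their"))) {
            System.out.println(line);
        }
    }
}
```

Once such a listing exists, later lookups can read the small dump instead of repeating the full scan over the index.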
-
Field Summary
Fields
- private static int MIN_COUNT
-
Constructor Summary
Constructors
- HomophoneOccurrenceDumper(java.io.File topIndexDir)
-
Method Summary
- private void dumpOccurrences(java.util.Set<java.lang.String> tokens)
- (package private) java.util.Map<java.lang.String,java.lang.Long> getContext(java.lang.String... tokens): Get the context (left and right words) for the given word(s).
- private org.apache.lucene.index.TermsEnum getIterator()
- long getTotalTokenCount()
- static void main(java.lang.String[] args)
- private void run(java.lang.String confusionSetPath)
-
Methods inherited from class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
clearCaches, close, doValidateDirectory, getCount, getCount, getLuceneSearcher, toString, validateDirectory
-
Field Detail
-
MIN_COUNT
private static final int MIN_COUNT
- See Also:
- Constant Field Values
-
-
Method Detail
-
getContext
java.util.Map<java.lang.String,java.lang.Long> getContext(java.lang.String... tokens) throws java.io.IOException
Get the context (left and right words) for the given word(s). This is slow, as it needs to scan the whole index.
- Throws:
java.io.IOException
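To illustrate why a context lookup of this kind costs a full scan, here is a minimal sketch under stated assumptions. It is not the method's real code: a plain map replaces the Lucene index, `ContextScanSketch` and `contextFor` are hypothetical names, and the sketch assumes "context" means 3-grams whose middle word is one of the given tokens.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextScanSketch {

    /**
     * Returns every 3-gram (with its count) whose middle word equals one of the
     * given tokens. Every entry must be visited because the token sits in the
     * middle of the key, so no prefix lookup can narrow the search.
     */
    static Map<String, Long> contextFor(Map<String, Long> ngramCounts, String... tokens) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> entry : ngramCounts.entrySet()) {
            String[] parts = entry.getKey().split(" ");
            if (parts.length != 3) {
                continue; // only 3-grams carry a left and a right neighbor
            }
            for (String token : tokens) {
                if (parts[1].equals(token)) {
                    result.put(entry.getKey(), entry.getValue());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the index: n-gram text mapped to its occurrence count.
        Map<String, Long> index = new HashMap<>();
        index.put("over there now", 12L);
        index.put("in their house", 7L);
        index.put("there is a", 30L);    // token is first, not in the middle
        Map<String, Long> context = contextFor(index, "there", "their");
        System.out.println(context.size() + " matching 3-grams");
    }
}
```

The running time grows with the total number of entries rather than with the number of matches, which is the cost the description above warns about.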
-
run
private void run(java.lang.String confusionSetPath) throws java.io.IOException
- Throws:
java.io.IOException
-
dumpOccurrences
private void dumpOccurrences(java.util.Set<java.lang.String> tokens) throws java.io.IOException
- Throws:
java.io.IOException
-
getIterator
private org.apache.lucene.index.TermsEnum getIterator() throws java.io.IOException
- Throws:
java.io.IOException
-
main
public static void main(java.lang.String[] args) throws java.io.IOException
- Throws:
java.io.IOException
-
getTotalTokenCount
public long getTotalTokenCount()
- Overrides:
getTotalTokenCount in class org.languagetool.languagemodel.LuceneSingleIndexLanguageModel
-
-