Package org.languagetool.rules.ngrams
Class ConfusionProbabilityRule
- java.lang.Object
  - org.languagetool.rules.Rule
    - org.languagetool.rules.ngrams.ConfusionProbabilityRule
public abstract class ConfusionProbabilityRule extends Rule
LanguageTool's homophone confusion check. It uses ngram lookups to decide which word in a confusion set (from confusion_sets.txt) fits best in the given context. Also see http://wiki.languagetool.org/finding-errors-using-n-gram-data.
- Since:
- 2.7
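The idea behind the check can be sketched in a few lines. This is a toy illustration only, not LanguageTool's implementation: the `TOY_COUNTS` map, the `score` helper, the add-one smoothing, and the way `factor` is applied as a threshold are all assumptions made for this sketch. The real rule queries a `LanguageModel` and reads its confusion pairs from confusion_sets.txt.

```java
import java.util.HashMap;
import java.util.Map;

class ConfusionSketch {

  // Hypothetical trigram counts; a real setup would query an ngram index.
  static final Map<String, Long> TOY_COUNTS = new HashMap<>();
  static {
    TOY_COUNTS.put("over there is", 900L);
    TOY_COUNTS.put("over their is", 2L);
    TOY_COUNTS.put("lost their keys", 700L);
    TOY_COUNTS.put("lost there keys", 1L);
  }

  // Add-one smoothed count, so unseen trigrams do not score zero.
  static long score(String left, String word, String right) {
    return TOY_COUNTS.getOrDefault(left + " " + word + " " + right, 0L) + 1;
  }

  // Returns the alternative if it scores at least `factor` times higher
  // in this context than the word actually used, otherwise null.
  static String betterAlternativeOrNull(String left, String used, String right,
                                        String alternative, long factor) {
    long usedScore = score(left, used, right);
    long altScore = score(left, alternative, right);
    return altScore >= usedScore * factor ? alternative : null;
  }

  public static void main(String[] args) {
    // "over their is": the alternative "there" is far more frequent here.
    System.out.println(betterAlternativeOrNull("over", "their", "is", "there", 10));
    // "lost their keys": the word used is already the frequent one.
    System.out.println(betterAlternativeOrNull("lost", "their", "keys", "there", 10));
  }
}
```

A larger `factor` makes the sketch more conservative: the alternative has to win by a wider margin before it is suggested.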
-
-
Field Summary
Fields
Modifier and Type / Field
private static com.google.common.cache.LoadingCache<java.lang.String,java.util.Map<java.lang.String,java.util.List<ConfusionPair>>>
confSetCache
private static boolean
DEBUG
private java.util.List<java.lang.String>
exceptions
private int
grams
private Language
language
private LanguageModel
lm
static float
MIN_COVERAGE
private static double
MIN_PROB
static java.lang.String
RULE_ID
private java.util.Map<java.lang.String,java.util.List<ConfusionPair>>
wordToPairs
-
Constructor Summary
Constructors
ConfusionProbabilityRule(java.util.ResourceBundle messages, LanguageModel languageModel, Language language)
ConfusionProbabilityRule(java.util.ResourceBundle messages, LanguageModel languageModel, Language language, int grams)
ConfusionProbabilityRule(java.util.ResourceBundle messages, LanguageModel languageModel, Language language, int grams, java.util.List<java.lang.String> exceptions)
-
Method Summary
Modifier and Type / Method / Description
private boolean
covers(int exceptionStartPos, int exceptionEndPos, int startPos, int endPos)
private void
debug(java.lang.String message, java.lang.Object... vars)
int
estimateContextForSureMatch()
A number that estimates how many words there must be after a match before we can be (relatively) sure the match is valid.
private ConfusionString
getAlternativeTerm(java.util.List<ConfusionString> confusionSet, GoogleToken token)
private @Nullable ConfusionString
getBetterAlternativeOrNull(GoogleToken token, java.util.List<GoogleToken> tokens, java.util.List<ConfusionString> confusionSet, long factor)
private ConfusionString
getBetterAlternativeOrNull(GoogleToken token, java.util.List<GoogleToken> tokens, ConfusionString otherWord, long factor)
private ConfusionString
getConfusionString(java.util.List<ConfusionString> confusionSet, GoogleToken token)
(package private) java.util.List<java.lang.String>
getContext(GoogleToken token, java.util.List<GoogleToken> tokens, java.lang.String newToken, int toLeft, int toRight)
java.lang.String
getDescription()
A short description of the error this rule can detect, usually in the language of the text that is checked.
protected @NotNull java.util.List<java.lang.String>
getFilenames()
java.lang.String
getId()
A string used to identify the rule in e.g. configuration files.
protected java.lang.String
getMessage(ConfusionString textString, ConfusionString suggestion)
int
getNGrams()
Returns the ngram level used, typically 3.
private java.util.List<java.lang.String>
getSuggestions(java.lang.String message)
protected boolean
isException(java.lang.String sentenceText)
Return true to prevent a match.
private boolean
isLocalException(AnalyzedSentence sentence, GoogleToken googleToken)
RuleMatch[]
match(AnalyzedSentence sentence)
Check whether the given sentence matches this error rule, i.e. whether it contains the error detected by this rule.
void
setConfusionPair(ConfusionPair pair)
Deprecated. Used only for tests.
-
Methods inherited from class org.languagetool.rules.Rule
addExamplePair, getAntiPatterns, getCategory, getConfigureText, getCorrectExamples, getDefaultValue, getErrorTriggeringExamples, getIncorrectExamples, getLocQualityIssueType, getMaxConfigurableValue, getMinConfigurableValue, getSentenceWithImmunization, getUrl, hasConfigurableValue, isDefaultOff, isDefaultTempOff, isDictionaryBasedSpellingRule, isOfficeDefaultOff, isOfficeDefaultOn, makeAntiPatterns, setCategory, setCorrectExamples, setDefaultOff, setDefaultOn, setDefaultTempOff, setErrorTriggeringExamples, setIncorrectExamples, setLocQualityIssueType, setOfficeDefaultOff, setOfficeDefaultOn, setUrl, supportsLanguage, toRuleMatchArray, useInOffice
-
Field Detail
-
RULE_ID
public static final java.lang.String RULE_ID
- Since:
- 3.1
- See Also:
- Constant Field Values
-
MIN_COVERAGE
public static final float MIN_COVERAGE
- See Also:
- Constant Field Values
-
MIN_PROB
private static final double MIN_PROB
- See Also:
- Constant Field Values
-
DEBUG
private static final boolean DEBUG
- See Also:
- Constant Field Values
-
confSetCache
private static final com.google.common.cache.LoadingCache<java.lang.String,java.util.Map<java.lang.String,java.util.List<ConfusionPair>>> confSetCache
-
wordToPairs
private final java.util.Map<java.lang.String,java.util.List<ConfusionPair>> wordToPairs
-
lm
private final LanguageModel lm
-
grams
private final int grams
-
language
private final Language language
-
exceptions
private final java.util.List<java.lang.String> exceptions
-
-
Constructor Detail
-
ConfusionProbabilityRule
public ConfusionProbabilityRule(java.util.ResourceBundle messages, LanguageModel languageModel, Language language)
-
ConfusionProbabilityRule
public ConfusionProbabilityRule(java.util.ResourceBundle messages, LanguageModel languageModel, Language language, int grams)
-
ConfusionProbabilityRule
public ConfusionProbabilityRule(java.util.ResourceBundle messages, LanguageModel languageModel, Language language, int grams, java.util.List<java.lang.String> exceptions)
- Since:
- 4.7
-
-
Method Detail
-
getFilenames
@NotNull protected java.util.List<java.lang.String> getFilenames()
-
getId
public java.lang.String getId()
Description copied from class: Rule
A string used to identify the rule in e.g. configuration files. This string is supposed to be unique and to stay the same in all upcoming versions of LanguageTool. It's supposed to contain only the characters A-Z and the underscore.
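The character constraint on rule IDs can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper `isWellFormedRuleId` that is not part of LanguageTool; the sample IDs are made up for illustration, and the check covers only the character set, not uniqueness.

```java
import java.util.regex.Pattern;

class RuleIdCheck {

  // Only the characters A-Z and the underscore, at least one character long.
  private static final Pattern VALID_ID = Pattern.compile("[A-Z_]+");

  // Hypothetical helper: true if the ID uses only the permitted characters.
  static boolean isWellFormedRuleId(String id) {
    return id != null && VALID_ID.matcher(id).matches();
  }

  public static void main(String[] args) {
    System.out.println(isWellFormedRuleId("SAMPLE_RULE_ID")); // permitted characters only
    System.out.println(isWellFormedRuleId("sample-rule"));    // lowercase letters and a hyphen
  }
}
```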
-
estimateContextForSureMatch
public int estimateContextForSureMatch()
Description copied from class: Rule
A number that estimates how many words there must be after a match before we can be (relatively) sure the match is valid. This is useful for check-as-you-type, where a match might occur and the word that gets typed next makes the match disappear (something one would obviously like to avoid). Note: this may over-estimate the real context size. Returns -1 when the sentence needs to end to be sure there's a match.
- Overrides:
- estimateContextForSureMatch in class Rule
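A check-as-you-type client could use the returned estimate to decide when a match is stable enough to display. The sketch below shows one possible client-side policy; the class `SureMatchSketch`, the method `canShowMatch`, and its parameters are hypothetical and not part of LanguageTool. It assumes the client already knows how many words follow the match and whether the sentence is complete.

```java
class SureMatchSketch {

  // estimate:        value as returned by estimateContextForSureMatch()
  // wordsAfterMatch: number of words typed so far after the match
  // sentenceEnded:   whether the sentence containing the match is complete
  static boolean canShowMatch(int estimate, int wordsAfterMatch, boolean sentenceEnded) {
    if (sentenceEnded) {
      return true;   // no further words can change the context
    }
    if (estimate == -1) {
      return false;  // the rule needs the sentence to end first
    }
    return wordsAfterMatch >= estimate;
  }

  public static void main(String[] args) {
    System.out.println(canShowMatch(3, 1, false));   // too little context so far
    System.out.println(canShowMatch(3, 3, false));   // enough words after the match
    System.out.println(canShowMatch(-1, 10, false)); // still waiting for the sentence end
  }
}
```

Because the estimate may over-state the real context size, a policy like this can delay a match slightly but should not show one that later disappears.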
-
match
public RuleMatch[] match(AnalyzedSentence sentence)
Description copied from class: Rule
Check whether the given sentence matches this error rule, i.e. whether it contains the error detected by this rule. Note that the order in which this method is called is not always guaranteed, i.e. the sentence order in the text may be different than the order in which you get the sentences (this may be the case when LanguageTool is used as a LibreOffice/OpenOffice add-on, for example).
-
isLocalException
private boolean isLocalException(AnalyzedSentence sentence, GoogleToken googleToken)
-
covers
private boolean covers(int exceptionStartPos, int exceptionEndPos, int startPos, int endPos)
-
getSuggestions
private java.util.List<java.lang.String> getSuggestions(java.lang.String message)
-
isException
protected boolean isException(java.lang.String sentenceText)
Return true to prevent a match.
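The hook pattern can be illustrated with a self-contained stand-in. Everything in this sketch is hypothetical: `SketchRule` is a simplified substitute for the real rule, its `match` returns character offsets rather than `RuleMatch` objects, and the quote-based exception in `QuoteAwareRule` is an invented policy. It only shows how a subclass override that returns true suppresses matches for a sentence.

```java
import java.util.ArrayList;
import java.util.List;

class ExceptionHookSketch {

  // Stand-in rule: reports every occurrence of a watched word,
  // unless the isException hook vetoes the whole sentence.
  static class SketchRule {
    private final String watchedWord;

    SketchRule(String watchedWord) {
      this.watchedWord = watchedWord;
    }

    // Hook: return true to prevent a match for this sentence.
    protected boolean isException(String sentenceText) {
      return false;
    }

    List<Integer> match(String sentenceText) {
      List<Integer> positions = new ArrayList<>();
      if (isException(sentenceText)) {
        return positions; // vetoed: report nothing
      }
      int idx = sentenceText.indexOf(watchedWord);
      while (idx >= 0) {
        positions.add(idx);
        idx = sentenceText.indexOf(watchedWord, idx + 1);
      }
      return positions;
    }
  }

  // Invented policy: skip sentences that contain a double quote.
  static class QuoteAwareRule extends SketchRule {
    QuoteAwareRule(String watchedWord) {
      super(watchedWord);
    }

    @Override
    protected boolean isException(String sentenceText) {
      return sentenceText.contains("\"");
    }
  }

  public static void main(String[] args) {
    System.out.println(new SketchRule("their").match("over their is"));
    System.out.println(new QuoteAwareRule("their").match("He wrote \"over their is\"."));
  }
}
```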
-
getDescription
public java.lang.String getDescription()
Description copied from class: Rule
A short description of the error this rule can detect, usually in the language of the text that is checked.
- Specified by:
- getDescription in class Rule
-
getMessage
protected java.lang.String getMessage(ConfusionString textString, ConfusionString suggestion)
-
setConfusionPair
public void setConfusionPair(ConfusionPair pair)
Deprecated. Used only for tests.
-
getNGrams
public int getNGrams()
Returns the ngram level used, typically 3.
- Since:
- 3.1
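To make the ngram level concrete: at level 3, each lookup covers three consecutive tokens. The sketch below lists every window of `n` tokens that contains a given position. The class `NgramWindowSketch` and the method `windowsAround` are hypothetical; whether the rule consults all of these windows, and how it handles sentence boundaries, is not stated here, so this sketch simply drops windows that would run past either end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NgramWindowSketch {

  // Returns all n-token windows that contain the token at index pos.
  // Windows extending past the start or end of the list are skipped.
  static List<List<String>> windowsAround(List<String> tokens, int pos, int n) {
    List<List<String>> result = new ArrayList<>();
    for (int start = pos - n + 1; start <= pos; start++) {
      int end = start + n; // exclusive
      if (start >= 0 && end <= tokens.size()) {
        result.add(new ArrayList<>(tokens.subList(start, end)));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("I", "left", "their", "keys", "here");
    // Trigram windows around "their" (index 2).
    for (List<String> window : windowsAround(tokens, 2, 3)) {
      System.out.println(window);
    }
  }
}
```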
-
getBetterAlternativeOrNull
@Nullable private ConfusionString getBetterAlternativeOrNull(GoogleToken token, java.util.List<GoogleToken> tokens, java.util.List<ConfusionString> confusionSet, long factor)
-
getAlternativeTerm
private ConfusionString getAlternativeTerm(java.util.List<ConfusionString> confusionSet, GoogleToken token)
-
getConfusionString
private ConfusionString getConfusionString(java.util.List<ConfusionString> confusionSet, GoogleToken token)
-
getBetterAlternativeOrNull
private ConfusionString getBetterAlternativeOrNull(GoogleToken token, java.util.List<GoogleToken> tokens, ConfusionString otherWord, long factor)
-
getContext
java.util.List<java.lang.String> getContext(GoogleToken token, java.util.List<GoogleToken> tokens, java.lang.String newToken, int toLeft, int toRight)
-
debug
private void debug(java.lang.String message, java.lang.Object... vars)
-
-