Package org.languagetool.rules.ngrams
Class NgramProbabilityRule
- java.lang.Object
  - org.languagetool.rules.Rule
    - org.languagetool.rules.ngrams.NgramProbabilityRule
@Experimental public class NgramProbabilityRule extends Rule
LanguageTool's probability check that uses ngram lookups to decide whether an ngram of the input text is so rare in our ngram index that it should be considered an error. Also see http://wiki.languagetool.org/finding-errors-using-n-gram-data.
- Since: 3.2
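The rarity decision described above can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions, not LanguageTool's implementation. It stores raw ngram counts in an in-memory map (the class RareNgramCheck and its methods are hypothetical), estimates P(w3 | w1 w2) as count(w1 w2 w3) / count(w1 w2), and reports a trigram when that estimate falls below a threshold. It assumes that setMinProbability(double) configures a threshold of this kind. The actual computation behind LanguageModel and Probability is not documented on this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical, self-contained sketch: NOT LanguageTool's LanguageModel API.
// Estimates how likely a trigram is from raw ngram counts and flags it as a
// possible error when the estimate falls below a configurable threshold.
class RareNgramCheck {
    private final Map<String, Long> counts = new HashMap<>();
    private double minProbability;

    RareNgramCheck(double minProbability) {
        this.minProbability = minProbability;
    }

    // Assumed analogue of setMinProbability(double): a lower threshold flags fewer ngrams.
    void setMinProbability(double minProbability) {
        this.minProbability = minProbability;
    }

    // Registers how often an ngram (tokens joined by single spaces) occurs in the index.
    void addCount(String ngram, long count) {
        counts.merge(ngram, count, Long::sum);
    }

    // Relative-frequency estimate of P(w3 | w1 w2); empty if the context w1 w2 was never seen.
    OptionalDouble probability(String w1, String w2, String w3) {
        long context = counts.getOrDefault(w1 + " " + w2, 0L);
        if (context == 0) {
            return OptionalDouble.empty();
        }
        long full = counts.getOrDefault(w1 + " " + w2 + " " + w3, 0L);
        return OptionalDouble.of((double) full / context);
    }

    // Reports only trigrams whose estimate is known and below the threshold.
    boolean isSuspicious(String w1, String w2, String w3) {
        OptionalDouble p = probability(w1, w2, w3);
        return p.isPresent() && p.getAsDouble() < minProbability;
    }
}
```

Treating an unseen context as "cannot judge" rather than "probability zero" is a design choice of this sketch; it avoids flagging text merely because the index has no data for it.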
Nested Class Summary
Nested Classes (Modifier and Type, Class):
- (package private) static class NgramProbabilityRule.AdvancedReplacement
- (package private) class NgramProbabilityRule.Alternative
- (package private) class NgramProbabilityRule.Alternatives
- (package private) static class NgramProbabilityRule.Replacement
Field Summary
Fields (Modifier and Type, Field):
- private static java.util.List<NgramProbabilityRule.AdvancedReplacement> ADV_REPLACEMENTS
- private static boolean DEBUG
- private Language language
- private LanguageModel lm
- private double minProbability
- private static java.util.List<NgramProbabilityRule.Replacement> REPLACEMENTS
- static java.lang.String RULE_ID
Constructor Summary
Constructors:
- NgramProbabilityRule(java.util.ResourceBundle messages, LanguageModel languageModel, Language language)
Method Summary
Methods (Modifier and Type, Method, Description):
- protected boolean acceptMatch(RuleMatch match, Probability p, AnalyzedSentence sentence)
  Override this method to discard matches by returning false.
- private void debug(java.lang.String message, java.lang.Object... vars)
- private NgramProbabilityRule.Alternatives getBetterAlternatives(GoogleToken prevToken, java.lang.String token, GoogleToken next, GoogleToken googleToken, Probability p, AnalyzedSentence sentence)
- private java.util.Optional<java.util.List<NgramProbabilityRule.Alternative>> getBetterAlternatives(NgramProbabilityRule.Replacement replacement, GoogleToken prevToken, GoogleToken token, GoogleToken next, Probability p)
- private java.util.Optional<AnalyzedToken> getByPosTag(java.util.Set<AnalyzedToken> tokens, java.lang.String wantedPosTagRegex)
- java.lang.String getDescription()
  A short description of the error this rule can detect, usually in the language of the text that is checked.
- protected Tokenizer getGoogleStyleWordTokenizer()
- java.lang.String getId()
  A string used to identify the rule in, e.g., configuration files.
- RuleMatch[] match(AnalyzedSentence sentence)
  Check whether the given sentence matches this error rule, i.e. whether it contains the error detected by this rule.
- void setMinProbability(double minProbability)
Methods inherited from class org.languagetool.rules.Rule
addExamplePair, estimateContextForSureMatch, getAntiPatterns, getCategory, getConfigureText, getCorrectExamples, getDefaultValue, getErrorTriggeringExamples, getIncorrectExamples, getLocQualityIssueType, getMaxConfigurableValue, getMinConfigurableValue, getSentenceWithImmunization, getUrl, hasConfigurableValue, isDefaultOff, isDefaultTempOff, isDictionaryBasedSpellingRule, isOfficeDefaultOff, isOfficeDefaultOn, makeAntiPatterns, setCategory, setCorrectExamples, setDefaultOff, setDefaultOn, setDefaultTempOff, setErrorTriggeringExamples, setIncorrectExamples, setLocQualityIssueType, setOfficeDefaultOff, setOfficeDefaultOn, setUrl, supportsLanguage, toRuleMatchArray, useInOffice
Field Detail
RULE_ID
public static final java.lang.String RULE_ID
- Since: 3.2
- See Also: Constant Field Values
-
DEBUG
private static final boolean DEBUG
- See Also: Constant Field Values
-
REPLACEMENTS
private static final java.util.List<NgramProbabilityRule.Replacement> REPLACEMENTS
-
ADV_REPLACEMENTS
private static final java.util.List<NgramProbabilityRule.AdvancedReplacement> ADV_REPLACEMENTS
-
lm
private final LanguageModel lm
-
language
private final Language language
-
minProbability
private double minProbability
Constructor Detail
NgramProbabilityRule
public NgramProbabilityRule(java.util.ResourceBundle messages, LanguageModel languageModel, Language language)
Method Detail
getId
public java.lang.String getId()
Description copied from class: Rule
A string used to identify the rule in, e.g., configuration files. This string is supposed to be unique and to stay the same in all upcoming versions of LanguageTool. It's supposed to contain only the characters A-Z and the underscore.
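The ID constraint above is easy to check mechanically. The following is a minimal sketch of such a check; RuleIdCheck and isValidRuleId are hypothetical names, not part of LanguageTool, and the pattern encodes exactly the documented rule (one or more of A-Z or underscore) and nothing more.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of LanguageTool): checks a candidate rule ID
// against the documented constraint of upper-case ASCII letters A-Z and underscores.
final class RuleIdCheck {
    // One or more characters, each either A-Z or '_'.
    private static final Pattern VALID_ID = Pattern.compile("[A-Z_]+");

    private RuleIdCheck() {
    }

    static boolean isValidRuleId(String id) {
        return id != null && VALID_ID.matcher(id).matches();
    }
}
```

For example, isValidRuleId("SOME_RULE_ID") returns true, while "MyRule" and "RULE-1" are rejected.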
-
setMinProbability
@Experimental public void setMinProbability(double minProbability)
-
match
public RuleMatch[] match(AnalyzedSentence sentence) throws java.io.IOException
Description copied from class: Rule
Check whether the given sentence matches this error rule, i.e. whether it contains the error detected by this rule. Note that the order in which this method is called is not always guaranteed, i.e. the sentence order in the text may be different from the order in which you get the sentences (this may be the case when LanguageTool is used as a LibreOffice/OpenOffice add-on, for example).
-
acceptMatch
protected boolean acceptMatch(RuleMatch match, Probability p, AnalyzedSentence sentence)
Override this method to discard matches by returning false.
- Since: 3.3
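acceptMatch is a hook: the rule produces candidate matches and a subclass can veto individual ones. The sketch below shows that template-method pattern with simplified stand-in types, because the real RuleMatch, Probability and AnalyzedSentence classes are not reproduced here. SimpleMatch, ProbabilityRuleSketch and AcronymTolerantRule are hypothetical, a plain double stands in for Probability, and the assumption that the default implementation accepts every match is inferred from the description above rather than stated by it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for RuleMatch: only the text the match covers.
final class SimpleMatch {
    private final String coveredText;

    SimpleMatch(String coveredText) {
        this.coveredText = coveredText;
    }

    String coveredText() {
        return coveredText;
    }
}

// Hypothetical base rule: collects candidate matches and lets subclasses veto them.
class ProbabilityRuleSketch {
    final List<SimpleMatch> filter(List<SimpleMatch> candidates, double probability, String sentence) {
        List<SimpleMatch> accepted = new ArrayList<>();
        for (SimpleMatch m : candidates) {
            if (acceptMatch(m, probability, sentence)) {
                accepted.add(m);
            }
        }
        return accepted;
    }

    // Assumed default: keep every match. Subclasses return false to discard one.
    protected boolean acceptMatch(SimpleMatch match, double probability, String sentence) {
        return true;
    }
}

// Example subclass: drops matches whose covered text contains no lower-case
// letters (e.g., acronyms such as "NASA").
class AcronymTolerantRule extends ProbabilityRuleSketch {
    @Override
    protected boolean acceptMatch(SimpleMatch match, double probability, String sentence) {
        String text = match.coveredText();
        return !text.equals(text.toUpperCase(Locale.ROOT));
    }
}
```

Keeping the veto in a separate overridable method lets a language-specific subclass suppress known false alarms without touching the lookup logic.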
-
getBetterAlternatives
private NgramProbabilityRule.Alternatives getBetterAlternatives(GoogleToken prevToken, java.lang.String token, GoogleToken next, GoogleToken googleToken, Probability p, AnalyzedSentence sentence) throws java.io.IOException
- Throws:
java.io.IOException
-
getBetterAlternatives
private java.util.Optional<java.util.List<NgramProbabilityRule.Alternative>> getBetterAlternatives(NgramProbabilityRule.Replacement replacement, GoogleToken prevToken, GoogleToken token, GoogleToken next, Probability p) throws java.io.IOException
- Throws:
java.io.IOException
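The two getBetterAlternatives methods are undocumented, but their names, their parameters (a token with its neighbours and a Replacement) and the Optional return type suggest a comparison of confusable words in the same context. The following is a hypothetical sketch of that idea only, not the rule's actual logic: AlternativeFinder is an invented name, and the scores come from a plain map of trigram strings to probabilities instead of LanguageModel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: try each confusable candidate between the same neighbours
// and keep those that score higher than the original token.
final class AlternativeFinder {
    private final Map<String, Double> trigramProbability;

    AlternativeFinder(Map<String, Double> trigramProbability) {
        this.trigramProbability = trigramProbability;
    }

    // Probability of "prev token next"; unseen trigrams score 0.0.
    private double score(String prev, String token, String next) {
        return trigramProbability.getOrDefault(prev + " " + token + " " + next, 0.0);
    }

    // Returns the candidates that beat the original, best first, or empty if none do.
    Optional<List<String>> betterAlternatives(String prev, String token, String next, List<String> candidates) {
        double original = score(prev, token, next);
        List<String> better = new ArrayList<>();
        for (String candidate : candidates) {
            if (!candidate.equals(token) && score(prev, candidate, next) > original) {
                better.add(candidate);
            }
        }
        better.sort((a, b) -> Double.compare(score(prev, b, next), score(prev, a, next)));
        return better.isEmpty() ? Optional.empty() : Optional.of(better);
    }
}
```

With scores for "in their house", "in there house" and "in they're house", calling betterAlternatives("in", "there", "house", List.of("their", "they're")) would return only "their" if that trigram scores highest.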
-
getByPosTag
private java.util.Optional<AnalyzedToken> getByPosTag(java.util.Set<AnalyzedToken> tokens, java.lang.String wantedPosTagRegex)
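getByPosTag takes the set of readings of a word and a regular expression for the wanted part-of-speech tag. A plausible reading of that signature, sketched under assumptions: return the first reading (in the set's iteration order) whose whole tag matches the regex. Reading and PosTagLookup are hypothetical stand-ins for AnalyzedToken and the rule class, and the Penn-Treebank-style tags in the example are illustrative only.

```java
import java.util.Optional;
import java.util.Set;

// Stand-in for AnalyzedToken: one reading of a word with a POS tag (may be null).
final class Reading {
    final String token;
    final String posTag;

    Reading(String token, String posTag) {
        this.token = token;
        this.posTag = posTag;
    }
}

final class PosTagLookup {
    private PosTagLookup() {
    }

    // Returns the first reading, in the set's iteration order, whose entire POS tag
    // matches the regex, or empty if no reading matches.
    static Optional<Reading> getByPosTag(Set<Reading> readings, String wantedPosTagRegex) {
        return readings.stream()
                .filter(r -> r.posTag != null && r.posTag.matches(wantedPosTagRegex))
                .findFirst();
    }
}
```

Because String.matches tests the whole tag, a pattern such as "NN.*" selects noun tags like "NNS" but not "VBZ".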
-
getDescription
public java.lang.String getDescription()
Description copied from class: Rule
A short description of the error this rule can detect, usually in the language of the text that is checked.
- Specified by: getDescription in class Rule
-
getGoogleStyleWordTokenizer
protected Tokenizer getGoogleStyleWordTokenizer()
-
debug
private void debug(java.lang.String message, java.lang.Object... vars)