Package org.languagetool.rules.patterns
Class PatternToken
- java.lang.Object
-
- org.languagetool.rules.patterns.PatternToken
-
- All Implemented Interfaces:
java.lang.Cloneable
public class PatternToken extends java.lang.Object implements java.lang.Cloneable
A part of a pattern; it represents the 'token' element of grammar.xml.
-
-
Nested Class Summary
Nested Classes
static class PatternToken.PosToken
-
Field Summary
Fields
private java.util.List<PatternToken> andGroupList
private static java.lang.String CASE_INSENSITIVE
Parameter passed to regular expression matcher to enable case insensitive Unicode matching.
private boolean caseSensitive
private ChunkTag chunkTag
private java.util.List<PatternToken> exceptionList
List of exceptions that are valid for the current token and / or some next tokens.
private boolean exceptionSet
True if any exception with a scope=="current" or scope=="next" is set for the element.
private boolean exceptionValidNext
True if scope=="next".
private boolean exceptionValidPrevious
True if attribute scope=="previous".
private boolean inflected
private boolean isInsideMarker
private boolean isLastUnified
Set to true on tokens that close the unification block.
private int maxOccurrence
private int minOccurrence
private boolean negation
private java.util.List<PatternToken> orGroupList
private java.util.regex.Pattern pattern
private java.lang.String phraseName
String ID of the phrase the element is in.
private PatternToken.PosToken posToken
private java.util.List<PatternToken> previousExceptionList
List of exceptions that are valid for a previous token.
private java.lang.String referenceString
True when the element stores a formatted reference to another element of the pattern.
private int skip
private boolean stringRegExp
private java.lang.String stringToken
private boolean testString
This var is used to determine if calling setStringElement(java.lang.String) makes sense.
private boolean testWhitespace
private Match tokenReference
The reference to another element in the pattern.
private java.util.Map<java.lang.String,java.util.List<java.lang.String>> unificationFeatures
private boolean unificationNeutral
Determines whether the element should be ignored when doing unification.
private boolean uniNegation
static java.lang.String UNKNOWN_TAG
Matches only tokens without any POS tag.
private boolean whitespaceBefore
-
Constructor Summary
Constructors
PatternToken(java.lang.String token, boolean caseSensitive, boolean regExp, boolean inflected)
Creates an element that is used to match tokens in the text.
-
Method Summary
Methods
java.lang.Object clone()
PatternToken compile(AnalyzedTokenReadings token, Synthesizer synth)
Prepare PatternToken for matching by formatting its string token and POS (if the Element is supposed to refer to some other token).
private void doCompile(AnalyzedTokenReadings token, Synthesizer synth)
java.util.List<PatternToken> getAndGroup()
Returns the group of elements linked with AND operator.
@Nullable ChunkTag getChunkTag()
java.util.List<PatternToken> getExceptionList()
Match getMatch()
int getMaxOccurrence()
The maximum number of times the element may occur.
int getMinOccurrence()
The minimum number of times the element needs to occur.
boolean getNegation()
java.util.List<PatternToken> getOrGroup()
Returns the group of elements linked with OR operator.
@Nullable java.lang.String getPhraseName()
Gets the phrase the element is in.
boolean getPOSNegation()
@Nullable java.lang.String getPOStag()
java.util.List<PatternToken> getPreviousExceptionList()
int getSkipNext()
Gets the exception scope length.
@Nullable java.lang.String getString()
private java.lang.String getTestToken(AnalyzedToken token)
@Nullable java.util.Map<java.lang.String,java.util.List<java.lang.String>> getUniFeatures()
Get unification features and types.
boolean hasAndGroup()
Checks if this element has an AND group associated with it.
boolean hasExceptionList()
boolean hasNextException()
Checks if the element has an exception for a next scope.
boolean hasOrGroup()
Checks if this element has an OR group associated with it.
boolean hasPreviousException()
Checks if the element has an exception for a previous token.
boolean isAndExceptionGroupMatched(AnalyzedToken token)
Enables testing multiple conditions specified by multiple element exceptions.
boolean isCaseSensitive()
Whether the element matches case sensitively.
boolean isExceptionMatched(AnalyzedToken token)
Checks whether an exception matches.
boolean isExceptionMatchedCompletely(AnalyzedToken token)
This method checks exceptions both in AND-group and the token.
boolean isInflected()
boolean isInsideMarker()
boolean isLastInUnification()
boolean isMatched(AnalyzedToken token)
Checks whether the rule element matches the token given as a parameter.
boolean isMatchedByPreviousException(AnalyzedToken token)
Checks whether an exception for a previous token matches (in case the exception had scope == "previous").
boolean isMatchedByPreviousException(AnalyzedTokenReadings prevToken)
Checks whether an exception for a previous token matches all readings of a given token (in case the exception had scope == "previous").
boolean isMatchedByScopeNextException(AnalyzedToken token)
Checks whether a previously set exception matches (in case the exception had scope == "next").
boolean isPartOfPhrase()
Checks if the Element is in any phrase.
boolean isPOStagRegularExpression()
Tests whether the POS matches a regular expression.
private boolean isPosTokenMatched(AnalyzedToken token)
Tests if part of speech matches a given string.
boolean isReferenceElement()
boolean isRegularExpression()
Tests whether the element matches a regular expression.
boolean isSentenceStart()
Checks if the token is a sentence start.
private boolean isStringTokenMatched(AnalyzedToken token)
Tests whether the string token element matches a given token.
boolean isUnificationNeutral()
Determines whether the element should be silently ignored during unification, and simply added.
boolean isUnified()
boolean isUniNegated()
boolean isWhitespaceBefore(AnalyzedToken token)
void setAndGroupElement(PatternToken andToken)
void setChunkTag(ChunkTag chunkTag)
private void setException(PatternToken pToken, boolean scopePrevious)
void setExceptionSpaceBefore(boolean isWhite)
Sets the attribute on the exception that determines matching of patterns that depends on whether there was a space before the token matching the exception or not.
void setInsideMarker(boolean isInsideMarker)
void setLastInUnification()
void setMatch(Match match)
Sets the reference to another token.
void setMaxOccurrence(int i)
The maximum number of times this element may occur.
void setMinOccurrence(int i)
The minimum number of times this element may occur.
void setNegation(boolean negation)
Negates the matching so that non-matching elements match and vice-versa.
void setOrGroupElement(PatternToken orToken)
void setPhraseName(java.lang.String id)
Sets the phrase the element is in.
void setPosToken(PatternToken.PosToken posToken)
void setSkipNext(int i)
void setStringElement(java.lang.String token)
void setStringPosException(java.lang.String token, boolean regExp, boolean inflected, boolean negation, boolean scopeNext, boolean scopePrevious, java.lang.String posToken, boolean posRegExp, boolean posNegation, java.lang.Boolean caseSensitivity)
Sets a string and/or pos exception for matching tokens.
void setUnification(java.util.Map<java.lang.String,java.util.List<java.lang.String>> uniFeatures)
void setUnificationNeutral()
Sets the element as ignored during unification.
void setUniNegation()
void setWhitespaceBefore(boolean isWhite)
java.lang.String toString()
-
-
-
Field Detail
-
UNKNOWN_TAG
public static final java.lang.String UNKNOWN_TAG
Matches only tokens without any POS tag.
- See Also:
- Constant Field Values
-
CASE_INSENSITIVE
private static final java.lang.String CASE_INSENSITIVE
Parameter passed to the regular expression matcher to enable case-insensitive Unicode matching.
- See Also:
- Constant Field Values
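The constant's value is not shown on this page. As a minimal sketch, assuming it is an embedded flag expression such as (?iu) that is prepended to the token's regular expression, case-insensitive Unicode matching with java.util.regex could look like this. The class CaseInsensitiveSketch and its helper are hypothetical and not part of LanguageTool.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of LanguageTool.
class CaseInsensitiveSketch {

  // Assumption: "(?iu)" = case-insensitive (i) with Unicode-aware case folding (u).
  static final String ASSUMED_FLAGS = "(?iu)";

  // Tests a full match, prepending the assumed flags unless matching is case-sensitive.
  static boolean matches(String regex, String text, boolean caseSensitive) {
    Pattern p = Pattern.compile(caseSensitive ? regex : ASSUMED_FLAGS + regex);
    return p.matcher(text).matches();
  }

  public static void main(String[] args) {
    // U+00FC is u-umlaut and U+00DC its upper-case form; escapes keep the source ASCII-only.
    System.out.println(matches("\u00fcber", "\u00dcBER", false)); // true
    System.out.println(matches("\u00fcber", "\u00dcBER", true));  // false
  }
}
```

Without the u flag, the i flag alone only folds ASCII letters, so the non-ASCII first letter would not match across cases.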
-
caseSensitive
private final boolean caseSensitive
-
stringRegExp
private final boolean stringRegExp
-
andGroupList
private final java.util.List<PatternToken> andGroupList
-
orGroupList
private final java.util.List<PatternToken> orGroupList
-
inflected
private final boolean inflected
-
stringToken
private java.lang.String stringToken
-
posToken
private PatternToken.PosToken posToken
-
chunkTag
private ChunkTag chunkTag
-
negation
private boolean negation
-
testWhitespace
private boolean testWhitespace
-
whitespaceBefore
private boolean whitespaceBefore
-
isInsideMarker
private boolean isInsideMarker
-
exceptionList
private java.util.List<PatternToken> exceptionList
List of exceptions that are valid for the current token and / or some next tokens.
-
exceptionValidNext
private boolean exceptionValidNext
True if scope=="next".
-
exceptionSet
private boolean exceptionSet
True if any exception with a scope=="current" or scope=="next" is set for the element.
-
exceptionValidPrevious
private boolean exceptionValidPrevious
True if attribute scope=="previous".
-
previousExceptionList
private java.util.List<PatternToken> previousExceptionList
List of exceptions that are valid for a previous token.
-
skip
private int skip
-
minOccurrence
private int minOccurrence
-
maxOccurrence
private int maxOccurrence
-
pattern
private java.util.regex.Pattern pattern
-
tokenReference
private Match tokenReference
The reference to another element in the pattern.
-
referenceString
private java.lang.String referenceString
True when the element stores a formatted reference to another element of the pattern.
-
phraseName
private java.lang.String phraseName
String ID of the phrase the element is in.
-
testString
private boolean testString
This variable is used to determine whether calling setStringElement(java.lang.String) makes sense. That method takes the most time, so it is best to reduce the number of calls to it.
-
unificationNeutral
private boolean unificationNeutral
Determines whether the element should be ignored when doing unification
-
uniNegation
private boolean uniNegation
-
unificationFeatures
private java.util.Map<java.lang.String,java.util.List<java.lang.String>> unificationFeatures
-
isLastUnified
private boolean isLastUnified
Set to true on tokens that close the unification block.
-
-
Constructor Detail
-
PatternToken
public PatternToken(java.lang.String token, boolean caseSensitive, boolean regExp, boolean inflected)
Creates an element that is used to match tokens in the text.
- Parameters:
token - String to be matched
caseSensitive - true if the check is case-sensitive
regExp - true if the check uses regular expressions
inflected - true if the check refers to base forms (lemmas); note that token must be a base form for this to work
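How the three boolean parameters might combine can be sketched with a simplified stand-in. The class TokenFlagSketch below is hypothetical and only models the parameter descriptions above; it is not LanguageTool's implementation and ignores POS tags, exceptions and negation.

```java
import java.util.regex.Pattern;

// Hypothetical simplified model of the constructor flags; not LanguageTool code.
class TokenFlagSketch {
  final String token;
  final boolean caseSensitive;
  final boolean regExp;
  final boolean inflected;

  TokenFlagSketch(String token, boolean caseSensitive, boolean regExp, boolean inflected) {
    this.token = token;
    this.caseSensitive = caseSensitive;
    this.regExp = regExp;
    this.inflected = inflected;
  }

  // surface = the word as written; lemma = its base form (null if unknown).
  boolean matches(String surface, String lemma) {
    // With inflected = true the comparison is made against the base form.
    String candidate = inflected ? lemma : surface;
    if (candidate == null) {
      return false;
    }
    if (regExp) {
      int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
      return Pattern.compile(token, flags).matcher(candidate).matches();
    }
    return caseSensitive ? token.equals(candidate) : token.equalsIgnoreCase(candidate);
  }

  public static void main(String[] args) {
    System.out.println(new TokenFlagSketch("walk", false, false, true).matches("Walked", "walk")); // true
    System.out.println(new TokenFlagSketch("colou?r", true, true, false).matches("color", null));  // true
    System.out.println(new TokenFlagSketch("walk", true, false, false).matches("Walk", "walk"));   // false
  }
}
```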
-
-
Method Detail
-
clone
public java.lang.Object clone() throws java.lang.CloneNotSupportedException
- Overrides:
clone in class java.lang.Object
- Throws:
java.lang.CloneNotSupportedException
-
isMatched
public boolean isMatched(AnalyzedToken token)
Checks whether the rule element matches the token given as a parameter.
- Parameters:
token - AnalyzedToken to check matching against
- Returns:
- True if token matches, false otherwise.
-
isExceptionMatched
public boolean isExceptionMatched(AnalyzedToken token)
Checks whether an exception matches.
- Parameters:
token - AnalyzedToken to check matching against
- Returns:
- True if any of the exceptions matches (logical disjunction).
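The "logical disjunction" in the return value can be illustrated with a small stand-alone sketch. Exceptions are modelled here as plain predicates over the token text; the class ExceptionDisjunctionSketch is hypothetical and does not use LanguageTool types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: each exception is a predicate over the token text.
class ExceptionDisjunctionSketch {

  // Returns true as soon as one exception accepts the token (logical OR).
  static boolean isExceptionMatched(List<Predicate<String>> exceptions, String token) {
    return exceptions.stream().anyMatch(e -> e.test(token));
  }

  public static void main(String[] args) {
    List<Predicate<String>> exceptions = Arrays.asList(
        t -> t.equalsIgnoreCase("the"),
        t -> t.matches("\\d+"));
    System.out.println(isExceptionMatched(exceptions, "42"));    // true
    System.out.println(isExceptionMatched(exceptions, "house")); // false
  }
}
```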
-
isAndExceptionGroupMatched
public boolean isAndExceptionGroupMatched(AnalyzedToken token)
Enables testing multiple conditions specified by multiple element exceptions. Works as logical AND operator.
- Parameters:
token - the token checked for exceptions.
- Returns:
- true if all conditions are met, false otherwise.
-
isExceptionMatchedCompletely
public boolean isExceptionMatchedCompletely(AnalyzedToken token)
This method checks exceptions both in the AND-group and the token. Introduced for clarity.
- Parameters:
token - Token to match
- Returns:
- True if matched.
-
setAndGroupElement
public void setAndGroupElement(PatternToken andToken)
-
hasAndGroup
public boolean hasAndGroup()
Checks if this element has an AND group associated with it.
- Returns:
- true if the element has a group of elements that all should match.
-
getAndGroup
public java.util.List<PatternToken> getAndGroup()
Returns the group of elements linked with AND operator.
-
setOrGroupElement
public void setOrGroupElement(PatternToken orToken)
- Since:
- 2.3
-
hasOrGroup
public boolean hasOrGroup()
Checks if this element has an OR group associated with it.
- Returns:
- true if the element has a group of elements of which at least one should match.
- Since:
- 2.3
-
getOrGroup
public java.util.List<PatternToken> getOrGroup()
Returns the group of elements linked with OR operator.
- Since:
- 2.3
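The difference between AND and OR groups can be sketched as follows. This is a simplified, hypothetical model (class GroupSketch, conditions as predicates over the token text) that assumes an AND group requires every condition to hold and an OR group requires at least one; the library's actual matching works on analyzed tokens and is more involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of AND/OR group semantics; not LanguageTool code.
class GroupSketch {

  // AND group: the base condition and every grouped condition must accept the token.
  static boolean matchesAndGroup(Predicate<String> base, List<Predicate<String>> andGroup, String token) {
    return base.test(token) && andGroup.stream().allMatch(p -> p.test(token));
  }

  // OR group (assumed semantics): the base or at least one grouped condition accepts.
  static boolean matchesOrGroup(Predicate<String> base, List<Predicate<String>> orGroup, String token) {
    return base.test(token) || orGroup.stream().anyMatch(p -> p.test(token));
  }

  public static void main(String[] args) {
    Predicate<String> startsWithUn = t -> t.startsWith("un");
    List<Predicate<String>> group = Arrays.asList(t -> t.endsWith("ed"));
    System.out.println(matchesAndGroup(startsWithUn, group, "unused")); // true
    System.out.println(matchesAndGroup(startsWithUn, group, "walked")); // false
    System.out.println(matchesOrGroup(startsWithUn, group, "walked"));  // true
  }
}
```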
-
isMatchedByScopeNextException
public boolean isMatchedByScopeNextException(AnalyzedToken token)
Checks whether a previously set exception matches (in case the exception had scope == "next").
- Parameters:
token - AnalyzedToken to check matching against.
- Returns:
- True if any of the exceptions matches.
-
isMatchedByPreviousException
public boolean isMatchedByPreviousException(AnalyzedToken token)
Checks whether an exception for a previous token matches (in case the exception had scope == "previous").
- Parameters:
token - AnalyzedToken to check matching against.
- Returns:
- True if any of the exceptions matches.
-
isMatchedByPreviousException
public boolean isMatchedByPreviousException(AnalyzedTokenReadings prevToken)
Checks whether an exception for a previous token matches all readings of a given token (in case the exception had scope == "previous").
- Parameters:
prevToken - AnalyzedTokenReadings to check matching against.
- Returns:
- true if any of the exceptions matches.
-
isSentenceStart
public boolean isSentenceStart()
Checks if the token is a sentence start.
- Returns:
- True if the element starts the sentence and the element hasn't been set to have negated POS token.
-
setPosToken
public void setPosToken(PatternToken.PosToken posToken)
- Since:
- 2.9
-
setChunkTag
public void setChunkTag(ChunkTag chunkTag)
- Since:
- 2.9
-
getString
@Nullable public @Nullable java.lang.String getString()
-
setStringElement
public void setStringElement(java.lang.String token)
-
setStringPosException
public void setStringPosException(java.lang.String token, boolean regExp, boolean inflected, boolean negation, boolean scopeNext, boolean scopePrevious, java.lang.String posToken, boolean posRegExp, boolean posNegation, java.lang.Boolean caseSensitivity)
Sets a string and/or POS exception for matching tokens.
- Parameters:
token - The string in the exception.
regExp - True if the string is specified as a regular expression.
inflected - True if the string is a base form (lemma).
negation - True if the exception is negated.
scopeNext - True if the exception scope is next tokens.
scopePrevious - True if the exception should match only a single previous token.
posToken - The part-of-speech tag in the exception.
posRegExp - True if the POS is specified as a regular expression.
posNegation - True if the POS exception is negated.
caseSensitivity - if null, use this element's setting for case sensitivity, otherwise the specified value
- Since:
- 2.9
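The null fallback described for caseSensitivity can be sketched in isolation. The helper below is hypothetical (class CaseSensitivityFallbackSketch); it only shows how a nullable java.lang.Boolean might defer to the element's own setting.

```java
// Hypothetical sketch of the null fallback for the caseSensitivity parameter.
class CaseSensitivityFallbackSketch {

  // exceptionSetting: value passed for the exception, or null to inherit.
  // elementSetting: the case sensitivity of the element that owns the exception.
  static boolean effectiveCaseSensitivity(Boolean exceptionSetting, boolean elementSetting) {
    return exceptionSetting == null ? elementSetting : exceptionSetting;
  }

  public static void main(String[] args) {
    System.out.println(effectiveCaseSensitivity(null, true));           // true (inherited)
    System.out.println(effectiveCaseSensitivity(Boolean.FALSE, true));  // false (overridden)
  }
}
```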
-
setException
private void setException(PatternToken pToken, boolean scopePrevious)
-
isPosTokenMatched
private boolean isPosTokenMatched(AnalyzedToken token)
Tests if part of speech matches a given string. Special value UNKNOWN_TAG matches null POS tags.
- Parameters:
token - Token to test.
- Returns:
- true if matches
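The special handling of UNKNOWN_TAG can be sketched as below. This is a hypothetical stand-alone model (class PosMatchSketch): the value "UNKNOWN" is an assumption, since the constant's value is not shown on this page, and POS regular expressions and POS negation are left out.

```java
// Hypothetical sketch of POS matching with a special "unknown" tag; not LanguageTool code.
class PosMatchSketch {

  // Assumption: the constant's value is "UNKNOWN".
  static final String UNKNOWN_TAG = "UNKNOWN";

  // posPattern: the POS the pattern asks for; tokenPos: the token's tag, null if untagged.
  static boolean isPosMatched(String posPattern, String tokenPos) {
    if (UNKNOWN_TAG.equals(posPattern)) {
      // The special value matches only tokens that carry no POS tag at all.
      return tokenPos == null;
    }
    return tokenPos != null && tokenPos.equals(posPattern);
  }

  public static void main(String[] args) {
    System.out.println(isPosMatched(UNKNOWN_TAG, null)); // true
    System.out.println(isPosMatched(UNKNOWN_TAG, "NN")); // false
    System.out.println(isPosMatched("NN", "NN"));        // true
  }
}
```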
-
isStringTokenMatched
private boolean isStringTokenMatched(AnalyzedToken token)
Tests whether the string token element matches a given token.
- Parameters:
token - AnalyzedToken to match against.
- Returns:
- True if matches.
-
getTestToken
private java.lang.String getTestToken(AnalyzedToken token)
-
getSkipNext
public int getSkipNext()
Gets the exception scope length.
- Returns:
- scope length in tokens
-
getMinOccurrence
public int getMinOccurrence()
The minimum number of times the element needs to occur.
-
getMaxOccurrence
public int getMaxOccurrence()
The maximum number of times the element may occur.
-
setSkipNext
public void setSkipNext(int i)
- Parameters:
i - exception scope length.
-
setMinOccurrence
public void setMinOccurrence(int i)
The minimum number of times this element may occur.
- Parameters:
i - currently only 0 and 1 are supported
-
setMaxOccurrence
public void setMaxOccurrence(int i)
The maximum number of times this element may occur.
- Parameters:
i - a number >= 1 or -1 for unlimited occurrences
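How the two occurrence bounds could be read together, with -1 standing for "unlimited", is sketched below. The helper isAllowedCount and the class OccurrenceSketch are hypothetical and only model the parameter descriptions; they are not taken from the library.

```java
// Hypothetical sketch of the min/max occurrence bounds; not LanguageTool code.
class OccurrenceSketch {

  // count: how often the element occurred; max == -1 means no upper limit.
  static boolean isAllowedCount(int count, int min, int max) {
    boolean enough = count >= min;
    boolean notTooMany = max == -1 || count <= max;
    return enough && notTooMany;
  }

  public static void main(String[] args) {
    System.out.println(isAllowedCount(0, 0, 1));  // true: optional element, absent
    System.out.println(isAllowedCount(2, 1, 1));  // false: more than the maximum
    System.out.println(isAllowedCount(5, 1, -1)); // true: unlimited maximum
  }
}
```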
-
hasPreviousException
public boolean hasPreviousException()
Checks if the element has an exception for a previous token.
- Returns:
- True if the element has a previous token matching exception.
-
hasNextException
public boolean hasNextException()
Checks if the element has an exception for a next scope (only used for testing).
- Returns:
- True if the element has exception for the next scope.
-
setNegation
public void setNegation(boolean negation)
Negates the matching so that non-matching elements match and vice-versa.
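The effect of negation on a match result amounts to an exclusive OR, as this small hypothetical sketch shows (class NegationSketch is illustrative only).

```java
// Hypothetical sketch: negation flips the raw match result.
class NegationSketch {

  // rawMatch: result of the comparison before negation is applied.
  static boolean applyNegation(boolean rawMatch, boolean negation) {
    return rawMatch ^ negation;
  }

  public static void main(String[] args) {
    System.out.println(applyNegation(true, true));   // false: a matching token is rejected
    System.out.println(applyNegation(false, true));  // true: a non-matching token is accepted
    System.out.println(applyNegation(true, false));  // true: unchanged without negation
  }
}
```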
-
getNegation
public boolean getNegation()
- Since:
- 0.9.3
-
isReferenceElement
public boolean isReferenceElement()
- Returns:
- true when this element refers to another token.
-
setMatch
public void setMatch(Match match)
Sets the reference to another token.
- Parameters:
match - Formatting object for the token reference.
-
getMatch
public Match getMatch()
-
compile
public PatternToken compile(AnalyzedTokenReadings token, Synthesizer synth) throws java.io.IOException
Prepares the PatternToken for matching by formatting its string token and POS (if the Element is supposed to refer to some other token).
- Parameters:
token - the token specified as AnalyzedTokenReadings
synth - the language synthesizer (Synthesizer)
- Throws:
java.io.IOException
-
doCompile
private void doCompile(AnalyzedTokenReadings token, Synthesizer synth) throws java.io.IOException
- Throws:
java.io.IOException
-
setPhraseName
public void setPhraseName(java.lang.String id)
Sets the phrase the element is in.
- Parameters:
id - ID of the phrase.
-
isPartOfPhrase
public boolean isPartOfPhrase()
Checks if the Element is in any phrase.
- Returns:
- True if the Element is contained in the phrase.
-
isCaseSensitive
public boolean isCaseSensitive()
Whether the element matches case sensitively.
- Since:
- 2.3
-
isRegularExpression
public boolean isRegularExpression()
Tests whether the element matches a regular expression.
- Since:
- 0.9.6
-
isPOStagRegularExpression
public boolean isPOStagRegularExpression()
Tests whether the POS matches a regular expression.
- Since:
- 1.3.0
-
getPOStag
@Nullable public @Nullable java.lang.String getPOStag()
- Returns:
- the POS of the Element or null
- Since:
- 0.9.6
-
getChunkTag
@Nullable public @Nullable ChunkTag getChunkTag()
- Returns:
- the chunk tag of the Element or null
- Since:
- 2.3
-
getPOSNegation
public boolean getPOSNegation()
- Returns:
- true if the POS is negated.
-
isInflected
public boolean isInflected()
- Returns:
- true if the token matches all inflected forms
-
getPhraseName
@Nullable public @Nullable java.lang.String getPhraseName()
Gets the phrase the element is in.
- Returns:
- String The name of the phrase.
-
isUnified
public boolean isUnified()
-
setUnification
public void setUnification(java.util.Map<java.lang.String,java.util.List<java.lang.String>> uniFeatures)
-
getUniFeatures
@Nullable public @Nullable java.util.Map<java.lang.String,java.util.List<java.lang.String>> getUniFeatures()
Get unification features and types.
- Returns:
- A map from features to a list of types or null
- Since:
- 1.0.1
-
setUniNegation
public void setUniNegation()
-
isUniNegated
public boolean isUniNegated()
-
isLastInUnification
public boolean isLastInUnification()
-
setLastInUnification
public void setLastInUnification()
-
isUnificationNeutral
public boolean isUnificationNeutral()
Determines whether the element should be silently ignored during unification, and simply added.
- Returns:
- True when the element is not included in unifying.
- Since:
- 2.5
-
setUnificationNeutral
public void setUnificationNeutral()
Sets the element as ignored during unification.
- Since:
- 2.5
-
setWhitespaceBefore
public void setWhitespaceBefore(boolean isWhite)
-
isInsideMarker
public boolean isInsideMarker()
-
setInsideMarker
public void setInsideMarker(boolean isInsideMarker)
-
setExceptionSpaceBefore
public void setExceptionSpaceBefore(boolean isWhite)
Sets the attribute on the exception that determines matching of patterns that depends on whether there was a space before the token matching the exception or not. The same procedure is used for tokens that are valid for previous or current tokens.
- Parameters:
isWhite - If true, the space before exception is required.
-
isWhitespaceBefore
public boolean isWhitespaceBefore(AnalyzedToken token)
-
getExceptionList
public java.util.List<PatternToken> getExceptionList()
- Returns:
- A List of Exceptions. Used for testing.
- Since:
- 1.0.0
-
getPreviousExceptionList
public java.util.List<PatternToken> getPreviousExceptionList()
- Returns:
- List of previous exceptions. Used for testing.
-
hasExceptionList
public boolean hasExceptionList()
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
-