Package org.languagetool
Class AnalyzedTokenReadings
- java.lang.Object
-
- org.languagetool.AnalyzedTokenReadings
-
- All Implemented Interfaces:
java.lang.Iterable<AnalyzedToken>
public final class AnalyzedTokenReadings extends java.lang.Object implements java.lang.Iterable<AnalyzedToken>
An array of AnalyzedTokens used to store multiple POS tags and lemmas for a single token.
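To make the idea concrete, the following self-contained sketch models one surface token that carries several (POS tag, lemma) readings. The classes Reading and TokenReadingsSketch are hypothetical stand-ins written for this illustration, not the LanguageTool classes, and the Penn Treebank style tags are example data only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one reading: a POS tag plus a lemma.
final class Reading {
    final String posTag;
    final String lemma;

    Reading(String posTag, String lemma) {
        this.posTag = posTag;
        this.lemma = lemma;
    }
}

// Hypothetical stand-in for a token that carries several readings.
final class TokenReadingsSketch {
    final String token;
    final int startPos;
    final List<Reading> readings = new ArrayList<>();

    TokenReadingsSketch(String token, int startPos) {
        this.token = token;
        this.startPos = startPos;
    }

    // True if any reading carries exactly this POS tag.
    boolean hasPosTag(String posTag) {
        for (Reading r : readings) {
            if (posTag.equals(r.posTag)) {
                return true;
            }
        }
        return false;
    }

    // Example data: the ambiguous token "walks" read as a verb form and as a plural noun.
    static TokenReadingsSketch example() {
        TokenReadingsSketch t = new TokenReadingsSketch("walks", 3);
        t.readings.add(new Reading("VBZ", "walk"));
        t.readings.add(new Reading("NNS", "walk"));
        return t;
    }
}
```

Both readings share the same surface token and start position; only the tag differs here, which is the kind of ambiguity a disambiguator would later try to resolve.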
-
-
Field Summary
private AnalyzedToken[] anTokReadings
private java.util.List<ChunkTag> chunkTags
private boolean hasSameLemmas
private java.lang.String historicalAnnotations
private boolean isIgnoredBySpeller
private boolean isImmunized
private boolean isLinebreak
private boolean isParaEnd
private boolean isPosTagUnknown
private boolean isSentEnd
private boolean isSentStart
private boolean isWhitespace
private boolean isWhitespaceBefore
private static java.util.regex.Pattern NON_WORD_REGEX
private int startPos
private java.lang.String token
private java.lang.String whitespaceBeforeChar
-
Constructor Summary
AnalyzedTokenReadings(java.util.List<AnalyzedToken> tokens, int startPos)
AnalyzedTokenReadings(AnalyzedToken token)
AnalyzedTokenReadings(AnalyzedToken[] tokens, int startPos)
AnalyzedTokenReadings(AnalyzedToken token, int startPos)
AnalyzedTokenReadings(AnalyzedTokenReadings oldAtr, java.util.List<AnalyzedToken> newReadings, java.lang.String ruleApplied)
-
Method Summary
private void addHistoricalAnnotations(java.lang.String oldValue, java.lang.String ruleApplied)
void addReading(AnalyzedToken token, java.lang.String ruleApplied)
Add a new reading.
private boolean areLemmasSame()
Used to configure the internal variable for lemma equality.
boolean equals(java.lang.Object obj)
AnalyzedToken getAnalyzedToken(int idx)
Get a token reading.
java.util.List<ChunkTag> getChunkTags()
int getEndPos()
java.lang.String getHistoricalAnnotations()
Used to track disambiguator actions.
java.util.List<AnalyzedToken> getReadings()
int getReadingsLength()
Number of readings.
int getStartPos()
java.lang.String getToken()
java.lang.String getWhitespaceBefore()
boolean hasAnyLemma(java.lang.String... lemmas)
Checks if one of the token's readings has one of the given lemmas.
boolean hasAnyPartialPosTag(java.lang.String... posTags)
Checks if the token has any of the given particular POS tags (only a part of the given POS tag needs to match).
int hashCode()
boolean hasLemma(java.lang.String lemma)
Checks if one of the token's readings has a particular lemma.
boolean hasPartialPosTag(java.lang.String posTag)
Checks if the token has a particular POS tag, where only a part of the given POS tag needs to match.
boolean hasPosTag(java.lang.String posTag)
Checks if the token has a particular POS tag.
boolean hasPosTagAndLemma(java.lang.String posTag, java.lang.String lemma)
Checks if the token has a particular POS tag and lemma.
boolean hasPosTagStartingWith(java.lang.String posTag)
Checks if the token has a POS tag starting with the given string.
boolean hasReading()
Checks if there is at least one POS tag.
boolean hasSameLemmas()
Used to optimize pattern matching.
void ignoreSpelling()
Make the token ignored by all spelling rules.
void immunize()
boolean isFieldCode()
boolean isIgnoredBySpeller()
Test if the token can be ignored by spelling rules.
boolean isImmunized()
boolean isLinebreak()
Returns true if the token equals \n, \r, \n\r, or \r\n.
boolean isNonWord()
boolean isParagraphEnd()
boolean isPosTagUnknown()
Test if the token's POS tag equals null.
boolean isSentenceEnd()
boolean isSentenceStart()
boolean isTagged()
boolean isWhitespace()
boolean isWhitespaceBefore()
java.util.Iterator<AnalyzedToken> iterator()
void leaveReading(AnalyzedToken token)
Removes all readings but the one that matches the token given.
boolean matchesPosTagRegex(java.lang.String posTagRegex)
Checks if at least one of the readings matches a given POS tag regex.
void removeReading(AnalyzedToken token, java.lang.String ruleApplied)
Removes a reading from the list of readings.
void setChunkTags(java.util.List<ChunkTag> chunkTags)
private void setHistoricalAnnotations(java.lang.String historicalAnnotations)
Used to track disambiguator actions.
private void setNoRealPOStag()
Sets the flag on AnalyzedTokens to make matching on UNKNOWN POS tag correct in the Element class.
void setParagraphEnd()
Add a reading with a paragraph end token unless this is already a paragraph end.
void setSentEnd()
Add a SENT_END tag.
void setStartPos(int position)
void setWhitespaceBefore(java.lang.String prevToken)
java.lang.String toString()
-
-
-
Field Detail
-
NON_WORD_REGEX
private static final java.util.regex.Pattern NON_WORD_REGEX
-
isWhitespace
private final boolean isWhitespace
-
isLinebreak
private final boolean isLinebreak
-
isSentStart
private final boolean isSentStart
-
anTokReadings
private AnalyzedToken[] anTokReadings
-
startPos
private int startPos
-
token
private java.lang.String token
-
chunkTags
private java.util.List<ChunkTag> chunkTags
-
isSentEnd
private boolean isSentEnd
-
isParaEnd
private boolean isParaEnd
-
isWhitespaceBefore
private boolean isWhitespaceBefore
-
isPosTagUnknown
private boolean isPosTagUnknown
-
whitespaceBeforeChar
private java.lang.String whitespaceBeforeChar
-
isImmunized
private boolean isImmunized
-
isIgnoredBySpeller
private boolean isIgnoredBySpeller
-
historicalAnnotations
private java.lang.String historicalAnnotations
-
hasSameLemmas
private boolean hasSameLemmas
-
-
Constructor Detail
-
AnalyzedTokenReadings
public AnalyzedTokenReadings(AnalyzedToken[] tokens, int startPos)
-
AnalyzedTokenReadings
public AnalyzedTokenReadings(AnalyzedToken token, int startPos)
-
AnalyzedTokenReadings
public AnalyzedTokenReadings(java.util.List<AnalyzedToken> tokens, int startPos)
-
AnalyzedTokenReadings
public AnalyzedTokenReadings(AnalyzedTokenReadings oldAtr, java.util.List<AnalyzedToken> newReadings, java.lang.String ruleApplied)
-
AnalyzedTokenReadings
AnalyzedTokenReadings(AnalyzedToken token)
-
-
Method Detail
-
getReadings
public java.util.List<AnalyzedToken> getReadings()
-
getAnalyzedToken
public AnalyzedToken getAnalyzedToken(int idx)
Get a token reading.
-
hasPosTag
public boolean hasPosTag(java.lang.String posTag)
Checks if the token has a particular POS tag.
- Parameters:
posTag
- POS tag to look for
-
hasPosTagAndLemma
public boolean hasPosTagAndLemma(java.lang.String posTag, java.lang.String lemma)
Checks if the token has a particular POS tag and lemma.
- Parameters:
posTag
- POS tag to look for
lemma
- lemma to look for
-
hasReading
public boolean hasReading()
Checks if there is at least one POS tag.
- Since:
- 4.7
-
hasLemma
public boolean hasLemma(java.lang.String lemma)
Checks if one of the token's readings has a particular lemma.
- Parameters:
lemma
- lemma to look for
-
hasAnyLemma
public boolean hasAnyLemma(java.lang.String... lemmas)
Checks if one of the token's readings has one of the given lemmas.
- Parameters:
lemmas
- lemmas to look for
-
hasPartialPosTag
public boolean hasPartialPosTag(java.lang.String posTag)
Checks if the token has a particular POS tag, where only a part of the given POS tag needs to match.
- Parameters:
posTag
- POS tag substring to look for
- Since:
- 1.8
-
hasAnyPartialPosTag
public boolean hasAnyPartialPosTag(java.lang.String... posTags)
Checks if the token has any of the given POS tags, where only a part of the given POS tag needs to match.
- Parameters:
posTags
- POS tag substrings to look for
- Since:
- 4.0
-
hasPosTagStartingWith
public boolean hasPosTagStartingWith(java.lang.String posTag)
Checks if the token has a POS tag starting with the given string.
- Parameters:
posTag
- POS tag prefix to look for
- Since:
- 4.0
-
matchesPosTagRegex
public boolean matchesPosTagRegex(java.lang.String posTagRegex)
Checks if at least one of the readings matches a given POS tag regex.
- Parameters:
posTagRegex
- POS tag regular expression to look for
- Since:
- 2.9
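The POS tag checks above differ only in how the tag string is compared. The sketch below contrasts the four styles on plain tag strings; PosTagMatchSketch and its method names are hypothetical, and the choice to require the regex to match the whole tag is an assumption, since the description above does not say whether a partial regex match counts.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper comparing four ways to match a POS tag string.
final class PosTagMatchSketch {

    // Exact comparison, in the spirit of hasPosTag.
    static boolean exact(List<String> tags, String posTag) {
        return tags.contains(posTag);
    }

    // Substring comparison, in the spirit of hasPartialPosTag.
    static boolean partial(List<String> tags, String part) {
        for (String tag : tags) {
            if (tag != null && tag.contains(part)) {
                return true;
            }
        }
        return false;
    }

    // Prefix comparison, in the spirit of hasPosTagStartingWith.
    static boolean startingWith(List<String> tags, String prefix) {
        for (String tag : tags) {
            if (tag != null && tag.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Regex comparison, in the spirit of matchesPosTagRegex.
    // Assumption: the expression must match the entire tag.
    static boolean regex(List<String> tags, String posTagRegex) {
        Pattern pattern = Pattern.compile(posTagRegex);
        for (String tag : tags) {
            if (tag != null && pattern.matcher(tag).matches()) {
                return true;
            }
        }
        return false;
    }
}
```

With the example tags VBZ and NNS, the string "BZ" satisfies the substring check but not the prefix check, and "NN.*" satisfies the regex check while "NN" alone does not under the whole-tag assumption.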
-
addReading
public void addReading(AnalyzedToken token, java.lang.String ruleApplied)
Add a new reading.
- Parameters:
token
- new reading, given as AnalyzedToken
-
removeReading
public void removeReading(AnalyzedToken token, java.lang.String ruleApplied)
Removes a reading from the list of readings. Note: if the token has only one reading, then a new reading with an empty POS tag and an empty lemma is created.
- Parameters:
token
- reading to be removed
-
leaveReading
public void leaveReading(AnalyzedToken token)
Removes all readings but the one that matches the token given.
- Parameters:
token
- Token to be matched
- Since:
- 1.5
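The behaviour described for removeReading and leaveReading can be sketched on a plain list. Everything here is a hypothetical model: SketchReading and ReadingEditSketch are not LanguageTool classes, the methods return new lists rather than mutating state, and null is used to stand for the "empty" POS tag and lemma mentioned in the note above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a reading; null models an empty tag or lemma.
final class SketchReading {
    final String posTag;
    final String lemma;

    SketchReading(String posTag, String lemma) {
        this.posTag = posTag;
        this.lemma = lemma;
    }

    boolean sameAs(SketchReading other) {
        return Objects.equals(posTag, other.posTag)
            && Objects.equals(lemma, other.lemma);
    }
}

final class ReadingEditSketch {

    // Drops every reading equal to target. If nothing is left, a single
    // placeholder reading with no POS tag and no lemma is inserted.
    static List<SketchReading> removeReading(List<SketchReading> readings, SketchReading target) {
        List<SketchReading> kept = new ArrayList<>();
        for (SketchReading r : readings) {
            if (!r.sameAs(target)) {
                kept.add(r);
            }
        }
        if (kept.isEmpty()) {
            kept.add(new SketchReading(null, null));
        }
        return kept;
    }

    // Keeps only the readings equal to target. What the real class does when
    // nothing matches is not stated above; this sketch returns an empty list.
    static List<SketchReading> leaveReading(List<SketchReading> readings, SketchReading target) {
        List<SketchReading> kept = new ArrayList<>();
        for (SketchReading r : readings) {
            if (r.sameAs(target)) {
                kept.add(r);
            }
        }
        return kept;
    }

    // Two readings for one ambiguous token (example data only).
    static List<SketchReading> sample() {
        return Arrays.asList(new SketchReading("VBZ", "walk"), new SketchReading("NNS", "walk"));
    }
}
```

Removing the only reading of a token therefore never leaves the list empty in this model, which mirrors the note in the removeReading description.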
-
getReadingsLength
public int getReadingsLength()
Number of readings.
-
isWhitespace
public boolean isWhitespace()
-
isLinebreak
public boolean isLinebreak()
Returns true if the token equals \n, \r, \n\r, or \r\n.
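The four accepted strings can be checked directly. This is a minimal sketch of the rule as documented; LinebreakSketch is a hypothetical helper, not the class's own code.

```java
// Hypothetical helper mirroring the documented linebreak rule.
final class LinebreakSketch {

    // True only for the four token strings listed in the description.
    static boolean isLinebreak(String token) {
        return "\n".equals(token)
            || "\r".equals(token)
            || "\n\r".equals(token)
            || "\r\n".equals(token);
    }
}
```

Under this rule a token made of two newlines, or any other whitespace, is not a linebreak.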
-
isSentenceStart
public boolean isSentenceStart()
- Since:
- 2.3
-
isParagraphEnd
public boolean isParagraphEnd()
- Returns:
- true when the token is the last token in a paragraph.
- Since:
- 2.3
-
setParagraphEnd
public void setParagraphEnd()
Add a reading with a paragraph end token unless this is already a paragraph end.
- Since:
- 2.3
-
isSentenceEnd
public boolean isSentenceEnd()
- Returns:
- true when the token is the last token in a sentence.
- Since:
- 2.3
-
isFieldCode
public boolean isFieldCode()
- Returns:
- true if the token is a LibreOffice/OpenOffice field code.
- Since:
- 0.9.9
-
setSentEnd
public void setSentEnd()
Add a SENT_END tag.
-
getStartPos
public int getStartPos()
-
getEndPos
public int getEndPos()
- Since:
- 2.9
-
setStartPos
public void setStartPos(int position)
-
getToken
public java.lang.String getToken()
-
setWhitespaceBefore
public void setWhitespaceBefore(java.lang.String prevToken)
-
getWhitespaceBefore
public java.lang.String getWhitespaceBefore()
-
isWhitespaceBefore
public boolean isWhitespaceBefore()
-
immunize
public void immunize()
-
isImmunized
public boolean isImmunized()
-
ignoreSpelling
public void ignoreSpelling()
Make the token ignored by all spelling rules.
- Since:
- 2.5
-
isIgnoredBySpeller
public boolean isIgnoredBySpeller()
Test if the token can be ignored by spelling rules.
- Returns:
- true if the token should be ignored.
- Since:
- 2.5
-
isPosTagUnknown
public boolean isPosTagUnknown()
Test if the token's POS tag equals null.
- Returns:
- true if the token does not have a POS tag
- Since:
- 3.9
-
setNoRealPOStag
private void setNoRealPOStag()
Sets the flag on AnalyzedTokens to make matching on UNKNOWN POS tag correct in the Element class.
-
getHistoricalAnnotations
public java.lang.String getHistoricalAnnotations()
Used to track disambiguator actions.
- Returns:
- the historicalAnnotations
-
setHistoricalAnnotations
private void setHistoricalAnnotations(java.lang.String historicalAnnotations)
Used to track disambiguator actions.
- Parameters:
historicalAnnotations
- the historicalAnnotations to set
-
addHistoricalAnnotations
private void addHistoricalAnnotations(java.lang.String oldValue, java.lang.String ruleApplied)
-
setChunkTags
public void setChunkTags(java.util.List<ChunkTag> chunkTags)
- Since:
- 2.3
-
getChunkTags
public java.util.List<ChunkTag> getChunkTags()
- Since:
- 2.3
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
isTagged
public boolean isTagged()
- Returns:
- true if AnalyzedTokenReadings has some real POS tag (= not null or a special tag)
- Since:
- 2.3
-
areLemmasSame
private boolean areLemmasSame()
Used to configure the internal variable for lemma equality.
- Returns:
- true if all AnalyzedToken lemmas are the same.
- Since:
- 2.5
-
hasSameLemmas
public boolean hasSameLemmas()
Used to optimize pattern matching.
- Returns:
- true if all AnalyzedToken lemmas are the same.
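The "all lemmas are the same" condition can be sketched as a comparison of every lemma against the first one. SameLemmaSketch is a hypothetical helper; treating two null lemmas as equal, and an empty list as "same", are assumptions of this sketch rather than documented behaviour.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helper for the "all lemmas equal" condition.
final class SameLemmaSketch {

    // True if every lemma equals the first one.
    // Assumption: null lemmas compare equal to each other.
    static boolean allLemmasSame(List<String> lemmas) {
        for (int i = 1; i < lemmas.size(); i++) {
            if (!Objects.equals(lemmas.get(0), lemmas.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

A flag like this lets a matcher test the lemma once per token instead of once per reading, which is one plausible reading of "used to optimize pattern matching".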
-
isNonWord
public boolean isNonWord()
- Returns:
- true if AnalyzedTokenReadings is a punctuation mark, bracket, etc.
- Since:
- 4.4
-
hashCode
public int hashCode()
- Overrides:
hashCode in class java.lang.Object
-
equals
public boolean equals(java.lang.Object obj)
- Overrides:
equals in class java.lang.Object
-
iterator
public java.util.Iterator<AnalyzedToken> iterator()
- Specified by:
iterator in interface java.lang.Iterable<AnalyzedToken>
- Since:
- 2.3
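Implementing java.lang.Iterable is what allows an object to be used in an enhanced for loop. The sketch below shows the pattern with a hypothetical class that iterates over plain tag strings; it does not reproduce how AnalyzedTokenReadings builds its iterator.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical class that exposes its readings through Iterable.
final class IterableReadingsSketch implements Iterable<String> {
    private final String[] posTags;

    IterableReadingsSketch(String... posTags) {
        this.posTags = posTags.clone();
    }

    @Override
    public Iterator<String> iterator() {
        // Iterates over the stored tags in insertion order.
        return Arrays.asList(posTags).iterator();
    }

    // Uses the enhanced for loop, which calls iterator() behind the scenes.
    static String joinTags(IterableReadingsSketch readings) {
        StringBuilder sb = new StringBuilder();
        for (String tag : readings) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(tag);
        }
        return sb.toString();
    }
}
```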
-
-