Class AnalyzedTokenReadings

  • All Implemented Interfaces:
    java.lang.Iterable<AnalyzedToken>

    public final class AnalyzedTokenReadings
    extends java.lang.Object
    implements java.lang.Iterable<AnalyzedToken>
    An array of AnalyzedTokens used to store multiple POS tags and lemmas for a single token.
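The following is a minimal, self-contained model of one token that carries several readings. It does not use LanguageTool. `TokenReadingsSketch` and its nested `Reading` record are hypothetical stand-ins for `AnalyzedTokenReadings` and `AnalyzedToken`, reduced to a token string, a POS tag, and a lemma. The Penn Treebank-style tags are for illustration only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of AnalyzedTokenReadings; not LanguageTool code.
class TokenReadingsSketch implements Iterable<TokenReadingsSketch.Reading> {

  // Hypothetical stand-in for AnalyzedToken: one POS tag/lemma interpretation.
  record Reading(String token, String posTag, String lemma) {}

  private final List<Reading> readings;
  private final int startPos;

  TokenReadingsSketch(List<Reading> readings, int startPos) {
    this.readings = new ArrayList<>(readings); // defensive copy
    this.startPos = startPos;
  }

  List<Reading> getReadings() {
    return List.copyOf(readings); // unmodifiable copy for callers
  }

  int getReadingsLength() {
    return readings.size();
  }

  int getStartPos() {
    return startPos;
  }

  @Override
  public Iterator<Reading> iterator() {
    return getReadings().iterator(); // iterates over a snapshot
  }

  public static void main(String[] args) {
    // "walks" is ambiguous: plural noun or third-person singular verb, same lemma.
    TokenReadingsSketch walks = new TokenReadingsSketch(List.of(
        new Reading("walks", "NNS", "walk"),
        new Reading("walks", "VBZ", "walk")), 10);
    for (Reading r : walks) {
      System.out.println(r.posTag() + " -> " + r.lemma());
    }
  }
}
```

Iterating over the object visits each reading in turn. This mirrors the `Iterable<AnalyzedToken>` contract of the real class.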
    • Field Detail

      • NON_WORD_REGEX

        private static final java.util.regex.Pattern NON_WORD_REGEX
      • isWhitespace

        private final boolean isWhitespace
      • isLinebreak

        private final boolean isLinebreak
      • isSentStart

        private final boolean isSentStart
      • startPos

        private int startPos
      • token

        private java.lang.String token
      • chunkTags

        private java.util.List<ChunkTag> chunkTags
      • isSentEnd

        private boolean isSentEnd
      • isParaEnd

        private boolean isParaEnd
      • isWhitespaceBefore

        private boolean isWhitespaceBefore
      • isPosTagUnknown

        private boolean isPosTagUnknown
      • whitespaceBeforeChar

        private java.lang.String whitespaceBeforeChar
      • isImmunized

        private boolean isImmunized
      • isIgnoredBySpeller

        private boolean isIgnoredBySpeller
      • historicalAnnotations

        private java.lang.String historicalAnnotations
      • hasSameLemmas

        private boolean hasSameLemmas
    • Constructor Detail

      • AnalyzedTokenReadings

        public AnalyzedTokenReadings(AnalyzedToken[] tokens,
                                     int startPos)
      • AnalyzedTokenReadings

        public AnalyzedTokenReadings(AnalyzedToken token,
                                     int startPos)
      • AnalyzedTokenReadings

        public AnalyzedTokenReadings(java.util.List<AnalyzedToken> tokens,
                                     int startPos)
      • AnalyzedTokenReadings

        AnalyzedTokenReadings(AnalyzedToken token)
    • Method Detail

      • getReadings

        public java.util.List<AnalyzedToken> getReadings()
      • hasPosTag

        public boolean hasPosTag(java.lang.String posTag)
        Checks if the token has a particular POS tag.
        Parameters:
        posTag - POS tag to look for
      • hasPosTagAndLemma

        public boolean hasPosTagAndLemma(java.lang.String posTag,
                                         java.lang.String lemma)
        Checks if the token has a particular POS tag and lemma.
        Parameters:
        posTag - POS tag to look for
        lemma - lemma to look for
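The sketch below contrasts `hasPosTag` with `hasPosTagAndLemma` on a simplified model. It is not LanguageTool's code. It assumes that `hasPosTagAndLemma` requires both values to match on the same reading, and that readings with a null POS tag never match. `ExactMatchSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of exact POS tag matching; not LanguageTool code.
class ExactMatchSketch {
  record Reading(String posTag, String lemma) {}

  // True if some reading's POS tag equals posTag exactly (null tags never match).
  static boolean hasPosTag(List<Reading> readings, String posTag) {
    return readings.stream().anyMatch(r -> posTag.equals(r.posTag()));
  }

  // Assumption: POS tag and lemma must both match on the same reading.
  static boolean hasPosTagAndLemma(List<Reading> readings, String posTag, String lemma) {
    return readings.stream()
        .anyMatch(r -> posTag.equals(r.posTag()) && Objects.equals(lemma, r.lemma()));
  }

  public static void main(String[] args) {
    // "saw": past tense of "see", or the noun "saw".
    List<Reading> saw = List.of(new Reading("VBD", "see"), new Reading("NN", "saw"));
    System.out.println(hasPosTag(saw, "VBD"));                // true
    System.out.println(hasPosTagAndLemma(saw, "VBD", "saw")); // false
    System.out.println(hasPosTagAndLemma(saw, "NN", "saw"));  // true
  }
}
```

Under this assumption, a token that has a VBD reading with lemma *see* and an NN reading with lemma *saw* does not count as a VBD with lemma *saw*. A version that checked the two conditions independently across readings would return true instead.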
      • hasReading

        public boolean hasReading()
        Checks if the token has at least one POS tag.
        Since:
        4.7
      • hasLemma

        public boolean hasLemma(java.lang.String lemma)
        Checks if one of the token's readings has a particular lemma.
        Parameters:
        lemma - lemma to look for
      • hasAnyLemma

        public boolean hasAnyLemma(java.lang.String... lemmas)
        Checks if one of the token's readings has one of the given lemmas.
        Parameters:
        lemmas - lemmas to look for
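The lemma checks can be sketched on the same kind of simplified model. The class `LemmaMatchSketch` is hypothetical, and the sketch assumes exact, case-sensitive comparison. The word *leaves* shows why one token may carry several lemmas.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of lemma matching; not LanguageTool code.
class LemmaMatchSketch {
  record Reading(String posTag, String lemma) {}

  // True if any reading's lemma equals the given lemma (case-sensitive).
  static boolean hasLemma(List<Reading> readings, String lemma) {
    return readings.stream().anyMatch(r -> lemma.equals(r.lemma()));
  }

  // Varargs variant: true if any reading has any of the given lemmas.
  static boolean hasAnyLemma(List<Reading> readings, String... lemmas) {
    return Arrays.stream(lemmas).anyMatch(l -> hasLemma(readings, l));
  }

  public static void main(String[] args) {
    // "leaves": plural of "leaf" or third-person form of "leave".
    List<Reading> leaves = List.of(
        new Reading("NNS", "leaf"), new Reading("VBZ", "leave"));
    System.out.println(hasLemma(leaves, "leaf"));            // true
    System.out.println(hasAnyLemma(leaves, "go", "leave"));  // true
    System.out.println(hasAnyLemma(leaves, "go", "stay"));   // false
  }
}
```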
      • hasPartialPosTag

        public boolean hasPartialPosTag(java.lang.String posTag)
        Checks if the token has a POS tag that contains the given string, i.e. the given string only needs to match part of the tag.
        Parameters:
        posTag - POS tag substring to look for
        Since:
        1.8
      • hasAnyPartialPosTag

        public boolean hasAnyPartialPosTag(java.lang.String... posTags)
        Checks if the token has a POS tag that contains any of the given strings (each string only needs to match part of a tag).
        Parameters:
        posTags - POS tag substrings to look for
        Since:
        4.0
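This sketch assumes that "partial" means a plain substring test, as the parameter description "POS tag substring" suggests. Under that assumption, the two methods can be sketched as follows. `PartialPosTagSketch` is a hypothetical name, and the tags are for illustration only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of substring ("partial") POS tag matching; not LanguageTool code.
class PartialPosTagSketch {
  record Reading(String posTag, String lemma) {}

  // True if some reading's POS tag contains posTag anywhere (null tags never match).
  static boolean hasPartialPosTag(List<Reading> readings, String posTag) {
    return readings.stream()
        .anyMatch(r -> r.posTag() != null && r.posTag().contains(posTag));
  }

  // True if at least one of the given substrings occurs in some reading's POS tag.
  static boolean hasAnyPartialPosTag(List<Reading> readings, String... posTags) {
    return Arrays.stream(posTags).anyMatch(t -> hasPartialPosTag(readings, t));
  }

  public static void main(String[] args) {
    List<Reading> dogs = List.of(new Reading("NNS", "dog"));
    System.out.println(hasPartialPosTag(dogs, "NN"));           // true
    System.out.println(hasPartialPosTag(dogs, "NS"));           // true: substring, not prefix
    System.out.println(hasAnyPartialPosTag(dogs, "VB", "JJ"));  // false
  }
}
```

Because any substring counts, "NS" matches "NNS" here. With structured tags, this can match more than intended, so a prefix or regex check may be the safer choice.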
      • hasPosTagStartingWith

        public boolean hasPosTagStartingWith(java.lang.String posTag)
        Checks if the token has a POS tag starting with the given string.
        Parameters:
        posTag - POS tag prefix to look for
        Since:
        4.0
      • matchesPosTagRegex

        public boolean matchesPosTagRegex(java.lang.String posTagRegex)
        Checks if at least one of the readings matches a given POS tag regex.
        Parameters:
        posTagRegex - POS tag regular expression to look for
        Since:
        2.9
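For contrast, here is a sketch of prefix and regular-expression matching. It assumes that `matchesPosTagRegex` requires the pattern to match the whole tag (`Matcher.matches()`), not just part of it. `PosTagPatternSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of prefix and regex POS tag matching; not LanguageTool code.
class PosTagPatternSketch {
  record Reading(String posTag, String lemma) {}

  // Prefix match: "NN" matches "NN", "NNS", "NNP", but not "JJ".
  static boolean hasPosTagStartingWith(List<Reading> readings, String prefix) {
    return readings.stream()
        .anyMatch(r -> r.posTag() != null && r.posTag().startsWith(prefix));
  }

  // Assumption: the regex must match the WHOLE tag (Matcher.matches), not a part of it.
  static boolean matchesPosTagRegex(List<Reading> readings, String posTagRegex) {
    Pattern p = Pattern.compile(posTagRegex);
    return readings.stream()
        .anyMatch(r -> r.posTag() != null && p.matcher(r.posTag()).matches());
  }

  public static void main(String[] args) {
    List<Reading> runs = List.of(new Reading("NNS", "run"), new Reading("VBZ", "run"));
    System.out.println(hasPosTagStartingWith(runs, "VB"));  // true
    System.out.println(hasPosTagStartingWith(runs, "NS"));  // false: not a prefix
    System.out.println(matchesPosTagRegex(runs, "VB[DZ]")); // true
    System.out.println(matchesPosTagRegex(runs, "VB"));     // false under whole-tag matching
  }
}
```

For simplicity, the sketch compiles the pattern on every call. A caller that tests the same regex many times would usually compile it once and reuse the `Pattern`.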
      • addReading

        public void addReading(AnalyzedToken token,
                               java.lang.String ruleApplied)
        Add a new reading.
        Parameters:
        token - new reading, given as AnalyzedToken
      • removeReading

        public void removeReading(AnalyzedToken token,
                                  java.lang.String ruleApplied)
        Removes a reading from the list of readings. Note: if the token has only one reading, then a new reading with an empty POS tag and an empty lemma is created.
        Parameters:
        token - reading to be removed
      • leaveReading

        public void leaveReading(AnalyzedToken token)
        Removes all readings except the one that matches the given token.
        Parameters:
        token - token to be matched
        Since:
        1.5
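The three methods that edit the reading list (`addReading`, `removeReading`, `leaveReading`) can be modeled on a mutable list. This is a sketch under stated assumptions, not LanguageTool's implementation:
- "Matches" is plain equality.
- The "empty" POS tag and lemma are modeled as null.
- `leaveReading` changes nothing when no reading matches.
- The `appliedRules` log is a hypothetical stand-in for whatever the `ruleApplied` parameter is used for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of reading maintenance; not LanguageTool code.
class ReadingListSketch {
  record Reading(String posTag, String lemma) {}

  private final List<Reading> readings = new ArrayList<>();
  // Hypothetical log standing in for whatever ruleApplied is used for.
  private final List<String> appliedRules = new ArrayList<>();

  ReadingListSketch(Reading first) {
    readings.add(first);
  }

  void addReading(Reading reading, String ruleApplied) {
    readings.add(reading);
    appliedRules.add(ruleApplied);
  }

  void removeReading(Reading reading, String ruleApplied) {
    readings.remove(reading); // removes the first equal reading, if any
    if (readings.isEmpty()) {
      // Documented: removing the last reading leaves an "empty" reading behind.
      // The empty POS tag and lemma are modeled as null (an assumption).
      readings.add(new Reading(null, null));
    }
    appliedRules.add(ruleApplied);
  }

  void leaveReading(Reading keep) {
    // Assumptions: plain equality as "matches", and no change if nothing matches.
    if (readings.contains(keep)) {
      readings.removeIf(r -> !r.equals(keep));
    }
  }

  List<Reading> getReadings() { return List.copyOf(readings); }

  List<String> getAppliedRules() { return List.copyOf(appliedRules); }
}
```

For example, suppose *saw* is tagged NN, a VBD reading is added, and the NN reading is then removed. Only the VBD reading remains. Removing that one as well leaves a single placeholder reading with a null tag and lemma.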
      • getReadingsLength

        public int getReadingsLength()
        Number of readings.
      • isWhitespace

        public boolean isWhitespace()
      • isLinebreak

        public boolean isLinebreak()
        Returns true if the token equals \n, \r, \n\r, or \r\n.
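The documented contract of `isLinebreak()` is an exact comparison against four strings. A hypothetical helper, `LinebreakSketch`, can reproduce it. This is not the library's code.

```java
import java.util.Set;

// Hypothetical sketch of the documented isLinebreak() contract; not LanguageTool code.
class LinebreakSketch {
  // The four line-break forms listed in the documentation.
  private static final Set<String> LINEBREAKS = Set.of("\n", "\r", "\n\r", "\r\n");

  static boolean isLinebreak(String token) {
    // Set.of(...).contains(null) would throw, so guard against null first.
    return token != null && LINEBREAKS.contains(token);
  }

  public static void main(String[] args) {
    System.out.println(isLinebreak("\r\n")); // true
    System.out.println(isLinebreak("\n\n")); // false: not one of the four forms
  }
}
```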
      • isSentenceStart

        public boolean isSentenceStart()
        Since:
        2.3
      • isParagraphEnd

        public boolean isParagraphEnd()
        Returns:
        true when the token is the last token in a paragraph.
        Since:
        2.3
      • setParagraphEnd

        public void setParagraphEnd()
        Add a reading with a paragraph-end tag unless the token is already a paragraph end.
        Since:
        2.3
      • isSentenceEnd

        public boolean isSentenceEnd()
        Returns:
        true when the token is the last token in a sentence.
        Since:
        2.3
      • isFieldCode

        public boolean isFieldCode()
        Returns:
        true if the token is a LibreOffice/OpenOffice field code.
        Since:
        0.9.9
      • setSentEnd

        public void setSentEnd()
        Add a SENT_END tag.
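Sentence and paragraph ends are represented as extra readings rather than as separate objects. `setSentEnd()` adds a SENT_END tag, and `setParagraphEnd()` adds a paragraph-end reading. The sketch below models that idea. The following are assumptions:
- the tag name `PARA_END`,
- the duplicate guard in `setSentEnd()`,
- the class `EndMarkerSketch`.

The real class also keeps `isSentEnd`/`isParaEnd` fields, whereas the sketch recomputes the flags on each call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of end-of-sentence/paragraph markers; not LanguageTool code.
class EndMarkerSketch {
  record Reading(String posTag, String lemma) {}

  static final String SENT_END = "SENT_END"; // named in the documentation
  static final String PARA_END = "PARA_END"; // assumed tag name

  private final List<Reading> readings = new ArrayList<>();

  EndMarkerSketch(Reading first) {
    readings.add(first);
  }

  private boolean hasPosTag(String tag) {
    return readings.stream().anyMatch(r -> tag.equals(r.posTag()));
  }

  boolean isSentenceEnd() { return hasPosTag(SENT_END); }

  boolean isParagraphEnd() { return hasPosTag(PARA_END); }

  // Adds a SENT_END reading; skipping duplicates is an assumption.
  void setSentEnd() {
    if (!isSentenceEnd()) {
      readings.add(new Reading(SENT_END, null));
    }
  }

  // Documented behavior: add a paragraph-end reading unless already present.
  void setParagraphEnd() {
    if (!isParagraphEnd()) {
      readings.add(new Reading(PARA_END, null));
    }
  }

  int getReadingsLength() { return readings.size(); }
}
```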
      • getStartPos

        public int getStartPos()
      • getEndPos

        public int getEndPos()
        Since:
        2.9
      • setStartPos

        public void setStartPos(int position)
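`getEndPos()` is documented only by its signature. This sketch assumes the end position is the start offset plus the token length. That gives an exclusive end offset, which works directly with `String.substring`. `TokenSpanSketch` is hypothetical.

```java
// Hypothetical sketch of token offsets; not LanguageTool code.
class TokenSpanSketch {
  private final String token;
  private int startPos;

  TokenSpanSketch(String token, int startPos) {
    this.token = token;
    this.startPos = startPos;
  }

  int getStartPos() { return startPos; }

  // Assumption: end = start + token length (exclusive end offset).
  int getEndPos() { return startPos + token.length(); }

  void setStartPos(int position) { startPos = position; }

  public static void main(String[] args) {
    String sentence = "The cat sat.";
    TokenSpanSketch cat = new TokenSpanSketch("cat", sentence.indexOf("cat"));
    // With an exclusive end, substring(start, end) recovers the token text.
    System.out.println(sentence.substring(cat.getStartPos(), cat.getEndPos())); // cat
  }
}
```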
      • getToken

        public java.lang.String getToken()
      • setWhitespaceBefore

        public void setWhitespaceBefore(java.lang.String prevToken)
      • getWhitespaceBefore

        public java.lang.String getWhitespaceBefore()
      • isWhitespaceBefore

        public boolean isWhitespaceBefore()
      • immunize

        public void immunize()
      • isImmunized

        public boolean isImmunized()
      • ignoreSpelling

        public void ignoreSpelling()
        Mark the token so that all spelling rules ignore it.
        Since:
        2.5
      • isIgnoredBySpeller

        public boolean isIgnoredBySpeller()
        Test if the token can be ignored by spelling rules.
        Returns:
        true if the token should be ignored.
        Since:
        2.5
      • isPosTagUnknown

        public boolean isPosTagUnknown()
        Test if the token's POS tag is null.
        Returns:
        true if the token does not have a POS tag
        Since:
        3.9
      • setNoRealPOStag

        private void setNoRealPOStag()
        Sets the flag on the AnalyzedTokens so that matching on the UNKNOWN POS tag works correctly in the Element class.
      • getHistoricalAnnotations

        public java.lang.String getHistoricalAnnotations()
        Used to track disambiguator actions.
        Returns:
        the historicalAnnotations
      • setHistoricalAnnotations

        private void setHistoricalAnnotations(java.lang.String historicalAnnotations)
        Used to track disambiguator actions.
        Parameters:
        historicalAnnotations - the historicalAnnotations to set
      • addHistoricalAnnotations

        private void addHistoricalAnnotations(java.lang.String oldValue,
                                              java.lang.String ruleApplied)
      • setChunkTags

        public void setChunkTags(java.util.List<ChunkTag> chunkTags)
        Since:
        2.3
      • getChunkTags

        public java.util.List<ChunkTag> getChunkTags()
        Since:
        2.3
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • isTagged

        public boolean isTagged()
        Returns:
        true if AnalyzedTokenReadings has at least one real POS tag (i.e., one that is neither null nor a special tag)
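The `isTagged()` rule can be sketched as a single stream check: at least one POS tag that is neither null nor special. The documentation does not list which tags count as special. The set `SENT_START`, `SENT_END`, `PARA_END` below is an assumption, as is the class name `IsTaggedSketch`.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of isTagged(); not LanguageTool code.
class IsTaggedSketch {
  record Reading(String posTag, String lemma) {}

  // Assumed set of "special" tags that do not count as real POS tags.
  static final Set<String> SPECIAL_TAGS = Set.of("SENT_START", "SENT_END", "PARA_END");

  static boolean isTagged(List<Reading> readings) {
    return readings.stream()
        .map(Reading::posTag)
        // The null check must come first: Set.of(...).contains(null) throws.
        .anyMatch(tag -> tag != null && !SPECIAL_TAGS.contains(tag));
  }
}
```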
        Since:
        2.3
      • areLemmasSame

        private boolean areLemmasSame()
        Used to configure the internal variable for lemma equality.
        Returns:
        true if all AnalyzedToken lemmas are the same.
        Since:
        2.5
      • hasSameLemmas

        public boolean hasSameLemmas()
        Used to optimize pattern matching.
        Returns:
        true if all AnalyzedToken lemmas are the same.
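The private `hasSameLemmas` field suggests that lemma equality across readings is computed once and stored. A matcher that sees `true` could then presumably compare the lemma of a single reading instead of all of them. The minimal sketch below uses the hypothetical class `SameLemmasSketch` and treats null lemmas as equal to each other.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the lemma-equality check; not LanguageTool code.
class SameLemmasSketch {
  record Reading(String posTag, String lemma) {}

  // True if every reading has the same lemma as the first one (null-safe).
  // An empty list is treated as "all the same" (vacuously true).
  static boolean areLemmasSame(List<Reading> readings) {
    if (readings.isEmpty()) {
      return true;
    }
    String first = readings.get(0).lemma();
    return readings.stream().allMatch(r -> Objects.equals(first, r.lemma()));
  }
}
```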
      • isNonWord

        public boolean isNonWord()
        Returns:
        true if AnalyzedTokenReadings is a punctuation mark, a bracket, or a similar non-word token.
        Since:
        4.4
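The documentation names a private `NON_WORD_REGEX` field but does not give its pattern. The sketch below uses a hypothetical pattern: one or more characters from the Unicode punctuation category `\p{P}`, which includes brackets and quotation marks. It shows how such a check works with `Matcher.matches()`. `NonWordSketch` is also hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a non-word check; not LanguageTool code.
class NonWordSketch {
  // Hypothetical pattern: the real NON_WORD_REGEX is not shown in this documentation.
  // \p{P} is the Unicode punctuation category (includes brackets and quotes).
  private static final Pattern NON_WORD_REGEX = Pattern.compile("\\p{P}+");

  // True only if the whole token consists of punctuation characters.
  static boolean isNonWord(String token) {
    return NON_WORD_REGEX.matcher(token).matches();
  }

  public static void main(String[] args) {
    System.out.println(isNonWord("("));     // true
    System.out.println(isNonWord("don't")); // false: contains letters
  }
}
```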
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • equals

        public boolean equals(java.lang.Object obj)
        Overrides:
        equals in class java.lang.Object
      • iterator

        public java.util.Iterator<AnalyzedToken> iterator()
        Specified by:
        iterator in interface java.lang.Iterable<AnalyzedToken>
        Since:
        2.3