Class PatternToken

  • All Implemented Interfaces:
    java.lang.Cloneable

    public class PatternToken
    extends java.lang.Object
    implements java.lang.Cloneable
    A part of a pattern; represents the 'token' element of grammar.xml.
    • Field Detail

      • UNKNOWN_TAG

        public static final java.lang.String UNKNOWN_TAG
        Matches only tokens without any POS tag.
        See Also:
        Constant Field Values
      • CASE_INSENSITIVE

        private static final java.lang.String CASE_INSENSITIVE
        Parameter passed to regular expression matcher to enable case insensitive Unicode matching.
        See Also:
        Constant Field Values
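        As an illustration of what "case insensitive Unicode matching" means for java.util.regex, the following minimal sketch prepends an inline flag to a pattern. The value "(?iu)" and the helper name matches are assumptions made for this example; the page itself does not show the constant's value.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveFlagDemo {
    // Assumed value of the private constant; this page does not confirm it.
    static final String CASE_INSENSITIVE = "(?iu)";

    // Hypothetical helper: prefixes the inline flags unless the match is case-sensitive.
    static boolean matches(String regex, String input, boolean caseSensitive) {
        String effective = caseSensitive ? regex : CASE_INSENSITIVE + regex;
        return Pattern.compile(effective).matcher(input).matches();
    }

    public static void main(String[] args) {
        // "\u00e9cole" is "ecole" with an acute accent on the first letter;
        // "\u00c9COLE" is the same word in upper case.
        System.out.println(matches("\u00e9cole", "\u00c9COLE", false));
        System.out.println(matches("\u00e9cole", "\u00c9COLE", true));
        // With (?i) alone, Java folds case for US-ASCII characters only.
        System.out.println(Pattern.compile("(?i)\u00e9cole").matcher("\u00c9COLE").matches());
    }
}
```

        The "u" flag matters for non-ASCII letters: without it, java.util.regex applies case folding to US-ASCII characters only.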
      • caseSensitive

        private final boolean caseSensitive
      • stringRegExp

        private final boolean stringRegExp
      • andGroupList

        private final java.util.List<PatternToken> andGroupList
      • orGroupList

        private final java.util.List<PatternToken> orGroupList
      • inflected

        private final boolean inflected
      • stringToken

        private java.lang.String stringToken
      • negation

        private boolean negation
      • testWhitespace

        private boolean testWhitespace
      • whitespaceBefore

        private boolean whitespaceBefore
      • isInsideMarker

        private boolean isInsideMarker
      • exceptionList

        private java.util.List<PatternToken> exceptionList
        List of exceptions that are valid for the current token and / or some next tokens.
      • exceptionValidNext

        private boolean exceptionValidNext
        True if scope=="next".
      • exceptionSet

        private boolean exceptionSet
        True if any exception with a scope=="current" or scope=="next" is set for the element.
      • exceptionValidPrevious

        private boolean exceptionValidPrevious
        True if attribute scope=="previous".
      • previousExceptionList

        private java.util.List<PatternToken> previousExceptionList
        List of exceptions that are valid for a previous token.
      • skip

        private int skip
      • minOccurrence

        private int minOccurrence
      • maxOccurrence

        private int maxOccurrence
      • pattern

        private java.util.regex.Pattern pattern
      • tokenReference

        private Match tokenReference
        The reference to another element in the pattern.
      • referenceString

        private java.lang.String referenceString
        True when the element stores a formatted reference to another element of the pattern.
      • phraseName

        private java.lang.String phraseName
        String ID of the phrase the element is in.
      • testString

        private boolean testString
        Indicates whether calling setStringElement(java.lang.String) makes sense. That method takes the most time, so it is best to keep the number of calls to it low.
      • unificationNeutral

        private boolean unificationNeutral
        Determines whether the element should be ignored during unification.
      • uniNegation

        private boolean uniNegation
      • unificationFeatures

        private java.util.Map<java.lang.String,​java.util.List<java.lang.String>> unificationFeatures
      • isLastUnified

        private boolean isLastUnified
        Set to true on tokens that close the unification block.
    • Constructor Detail

      • PatternToken

        public PatternToken​(java.lang.String token,
                            boolean caseSensitive,
                            boolean regExp,
                            boolean inflected)
        Creates an element that is used to match tokens in the text.
        Parameters:
        token - String to be matched
        caseSensitive - true if the check is case-sensitive
        regExp - true if the check uses regular expressions
        inflected - true if the check refers to base forms (lemmas); note that token must be a base form for this to work
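        How the four constructor parameters could interact is sketched below with self-contained stand-in types. Word and SimplePatternToken are hypothetical classes written for this example and are not part of the library; the real matching logic may differ in its details.

```java
import java.util.regex.Pattern;

public class TokenMatchSketch {

    /** Hypothetical stand-in for an analyzed token: surface form plus lemma. */
    static final class Word {
        final String surface;
        final String lemma;

        Word(String surface, String lemma) {
            this.surface = surface;
            this.lemma = lemma;
        }
    }

    /** Hypothetical matcher that mirrors the four constructor parameters. */
    static final class SimplePatternToken {
        final String token;
        final boolean caseSensitive;
        final boolean regExp;
        final boolean inflected;

        SimplePatternToken(String token, boolean caseSensitive, boolean regExp, boolean inflected) {
            this.token = token;
            this.caseSensitive = caseSensitive;
            this.regExp = regExp;
            this.inflected = inflected;
        }

        boolean isMatched(Word word) {
            // inflected: compare against the lemma instead of the surface form
            String candidate = inflected ? word.lemma : word.surface;
            if (candidate == null) {
                return false;
            }
            if (regExp) {
                int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
                return Pattern.compile(token, flags).matcher(candidate).matches();
            }
            return caseSensitive ? token.equals(candidate) : token.equalsIgnoreCase(candidate);
        }
    }

    public static void main(String[] args) {
        Word walked = new Word("Walked", "walk");
        // Lemma comparison, plain surface comparison, and a case-insensitive regex.
        System.out.println(new SimplePatternToken("walk", false, false, true).isMatched(walked));
        System.out.println(new SimplePatternToken("walk", false, false, false).isMatched(walked));
        System.out.println(new SimplePatternToken("walk(ed|s)?", false, true, false).isMatched(walked));
    }
}
```

        The sketch shows why token must be a base form when inflected is true: the comparison runs against the lemma, so a surface form such as "walked" would never match.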
    • Method Detail

      • clone

        public java.lang.Object clone()
                               throws java.lang.CloneNotSupportedException
        Overrides:
        clone in class java.lang.Object
        Throws:
        java.lang.CloneNotSupportedException
      • isMatched

        public boolean isMatched​(AnalyzedToken token)
        Checks whether the rule element matches the token given as a parameter.
        Parameters:
        token - AnalyzedToken to check matching against
        Returns:
        True if token matches, false otherwise.
      • isExceptionMatched

        public boolean isExceptionMatched​(AnalyzedToken token)
        Checks whether an exception matches.
        Parameters:
        token - AnalyzedToken to check matching against
        Returns:
        True if any of the exceptions matches (logical disjunction).
      • isAndExceptionGroupMatched

        public boolean isAndExceptionGroupMatched​(AnalyzedToken token)
        Enables testing multiple conditions specified by multiple element exceptions. Works as logical AND operator.
        Parameters:
        token - the token checked for exceptions.
        Returns:
        true if all conditions are met, false otherwise.
      • isExceptionMatchedCompletely

        public boolean isExceptionMatchedCompletely​(AnalyzedToken token)
        Checks exceptions both in the AND group and on the token itself. Introduced for clarity.
        Parameters:
        token - Token to match
        Returns:
        True if matched.
      • setAndGroupElement

        public void setAndGroupElement​(PatternToken andToken)
      • hasAndGroup

        public boolean hasAndGroup()
        Checks if this element has an AND group associated with it.
        Returns:
        true if the element has a group of elements that all should match.
      • getAndGroup

        public java.util.List<PatternToken> getAndGroup()
        Returns the group of elements linked with AND operator.
      • setOrGroupElement

        public void setOrGroupElement​(PatternToken orToken)
        Since:
        2.3
      • hasOrGroup

        public boolean hasOrGroup()
        Checks if this element has an OR group associated with it.
        Returns:
        true if the element has a group of elements of which at least one should match.
        Since:
        2.3
      • getOrGroup

        public java.util.List<PatternToken> getOrGroup()
        Returns the group of elements linked with OR operator.
        Since:
        2.3
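        The difference between the AND group and the OR group described above can be sketched with plain predicates. The class and method names below are hypothetical; the sketch only illustrates "all should match" versus "at least one should match" and is not the library's implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class GroupSemanticsSketch {

    // AND group: every condition in the group must hold for the token.
    static <T> boolean andGroupMatches(List<Predicate<T>> group, T token) {
        return group.stream().allMatch(condition -> condition.test(token));
    }

    // OR group: one matching condition is enough.
    static <T> boolean orGroupMatches(List<Predicate<T>> group, T token) {
        return group.stream().anyMatch(condition -> condition.test(token));
    }

    public static void main(String[] args) {
        Predicate<String> endsWithS = word -> word.endsWith("s");
        Predicate<String> capitalized = word -> !word.isEmpty() && Character.isUpperCase(word.charAt(0));
        List<Predicate<String>> group = Arrays.asList(endsWithS, capitalized);

        System.out.println(andGroupMatches(group, "Cats")); // both conditions hold
        System.out.println(andGroupMatches(group, "cats")); // only one condition holds
        System.out.println(orGroupMatches(group, "cats"));  // one condition is enough
    }
}
```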
      • isMatchedByScopeNextException

        public boolean isMatchedByScopeNextException​(AnalyzedToken token)
        Checks whether a previously set exception matches (in case the exception had scope == "next").
        Parameters:
        token - AnalyzedToken to check matching against.
        Returns:
        True if any of the exceptions matches.
      • isMatchedByPreviousException

        public boolean isMatchedByPreviousException​(AnalyzedToken token)
        Checks whether an exception for a previous token matches (in case the exception had scope == "previous").
        Parameters:
        token - AnalyzedToken to check matching against.
        Returns:
        True if any of the exceptions matches.
      • isMatchedByPreviousException

        public boolean isMatchedByPreviousException​(AnalyzedTokenReadings prevToken)
        Checks whether an exception for a previous token matches all readings of a given token (in case the exception had scope == "previous").
        Parameters:
        prevToken - AnalyzedTokenReadings to check matching against.
        Returns:
        true if any of the exceptions matches.
      • isSentenceStart

        public boolean isSentenceStart()
        Checks if the token is a sentence start.
        Returns:
        True if the element starts the sentence and has not been set to have a negated POS token.
      • setChunkTag

        public void setChunkTag​(ChunkTag chunkTag)
        Since:
        2.9
      • getString

        @Nullable
        public @Nullable java.lang.String getString()
      • setStringElement

        public void setStringElement​(java.lang.String token)
      • setStringPosException

        public void setStringPosException​(java.lang.String token,
                                          boolean regExp,
                                          boolean inflected,
                                          boolean negation,
                                          boolean scopeNext,
                                          boolean scopePrevious,
                                          java.lang.String posToken,
                                          boolean posRegExp,
                                          boolean posNegation,
                                          java.lang.Boolean caseSensitivity)
        Sets a string and/or POS exception for matching tokens.
        Parameters:
        token - The string in the exception.
        regExp - True if the string is specified as a regular expression.
        inflected - True if the string is a base form (lemma).
        negation - True if the exception is negated.
        scopeNext - True if the exception scope is next tokens.
        scopePrevious - True if the exception should match only a single previous token.
        posToken - The part-of-speech tag in the exception.
        posRegExp - True if the POS is specified as a regular expression.
        posNegation - True if the POS exception is negated.
        caseSensitivity - if null, use this element's setting for case sensitivity, otherwise the specified value
        Since:
        2.9
      • setException

        private void setException​(PatternToken pToken,
                                  boolean scopePrevious)
      • isPosTokenMatched

        private boolean isPosTokenMatched​(AnalyzedToken token)
        Tests if part of speech matches a given string. Special value UNKNOWN_TAG matches null POS tags.
        Parameters:
        token - Token to test.
        Returns:
        true if matches
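        The special handling of UNKNOWN_TAG can be sketched as follows. The constant value "UNKNOWN" and the method posMatches are assumptions made for this example; the sketch covers only the null-tag rule and the plain versus regular-expression comparison, not negation or any other detail of the private method.

```java
public class PosMatchSketch {
    // Assumed value; this page only links to "Constant Field Values".
    static final String UNKNOWN_TAG = "UNKNOWN";

    // Hypothetical helper: expectedPos is the tag from the pattern,
    // actualPos is the tag of the analyzed token (null if it has none).
    static boolean posMatches(String expectedPos, String actualPos, boolean posRegExp) {
        if (actualPos == null) {
            // A token without a POS tag is matched only by the special value.
            return UNKNOWN_TAG.equals(expectedPos);
        }
        if (posRegExp) {
            return actualPos.matches(expectedPos);
        }
        return expectedPos.equals(actualPos);
    }

    public static void main(String[] args) {
        System.out.println(posMatches("UNKNOWN", null, false));
        System.out.println(posMatches("NN", null, false));
        System.out.println(posMatches("NN.*", "NNS", true));
    }
}
```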
      • isStringTokenMatched

        private boolean isStringTokenMatched​(AnalyzedToken token)
        Tests whether the string token element matches a given token.
        Parameters:
        token - AnalyzedToken to match against.
        Returns:
        True if matches.
      • getTestToken

        private java.lang.String getTestToken​(AnalyzedToken token)
      • getSkipNext

        public int getSkipNext()
        Gets the exception scope length.
        Returns:
        scope length in tokens
      • getMinOccurrence

        public int getMinOccurrence()
        The minimum number of times the element needs to occur.
      • getMaxOccurrence

        public int getMaxOccurrence()
        The maximum number of times the element may occur.
      • setSkipNext

        public void setSkipNext​(int i)
        Parameters:
        i - exception scope length.
      • setMinOccurrence

        public void setMinOccurrence​(int i)
        The minimum number of times this element needs to occur.
        Parameters:
        i - currently only 0 and 1 are supported
      • setMaxOccurrence

        public void setMaxOccurrence​(int i)
        The maximum number of times this element may occur.
        Parameters:
        i - a number >= 1 or -1 for unlimited occurrences
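        The occurrence bounds described above, including -1 for an unlimited maximum, can be read as a simple range check. The helper withinBounds is hypothetical and only illustrates how the two values combine.

```java
public class OccurrenceSketch {

    // Hypothetical helper: true if 'count' consecutive matches satisfy the bounds.
    // maxOccurrence == -1 stands for "unlimited", as in setMaxOccurrence.
    static boolean withinBounds(int count, int minOccurrence, int maxOccurrence) {
        if (count < minOccurrence) {
            return false;
        }
        return maxOccurrence == -1 || count <= maxOccurrence;
    }

    public static void main(String[] args) {
        System.out.println(withinBounds(0, 0, 1));  // optional element, absent
        System.out.println(withinBounds(0, 1, 1));  // required element, absent
        System.out.println(withinBounds(5, 1, -1)); // unlimited repetitions
        System.out.println(withinBounds(3, 1, 2));  // too many repetitions
    }
}
```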
      • hasPreviousException

        public boolean hasPreviousException()
        Checks if the element has an exception for a previous token.
        Returns:
        True if the element has a previous token matching exception.
      • hasNextException

        public boolean hasNextException()
        Checks if the element has an exception for the next scope (only used for testing).
        Returns:
        True if the element has exception for the next scope.
      • setNegation

        public void setNegation​(boolean negation)
        Negates the matching so that non-matching elements match and vice-versa.
      • isReferenceElement

        public boolean isReferenceElement()
        Returns:
        true when this element refers to another token.
      • setMatch

        public void setMatch​(Match match)
        Sets the reference to another token.
        Parameters:
        match - Formatting object for the token reference.
      • getMatch

        public Match getMatch()
      • compile

        public PatternToken compile​(AnalyzedTokenReadings token,
                                    Synthesizer synth)
                             throws java.io.IOException
        Prepares the PatternToken for matching by formatting its string token and POS (if the element is supposed to refer to some other token).
        Parameters:
        token - the token specified as AnalyzedTokenReadings
        synth - the language synthesizer (Synthesizer)
        Throws:
        java.io.IOException
      • setPhraseName

        public void setPhraseName​(java.lang.String id)
        Sets the phrase the element is in.
        Parameters:
        id - ID of the phrase.
      • isPartOfPhrase

        public boolean isPartOfPhrase()
        Checks if the Element is in any phrase.
        Returns:
        True if the Element is contained in the phrase.
      • isCaseSensitive

        public boolean isCaseSensitive()
        Whether the element matches case sensitively.
        Since:
        2.3
      • isRegularExpression

        public boolean isRegularExpression()
        Tests whether the element matches a regular expression.
        Since:
        0.9.6
      • isPOStagRegularExpression

        public boolean isPOStagRegularExpression()
        Tests whether the POS matches a regular expression.
        Since:
        1.3.0
      • getPOStag

        @Nullable
        public @Nullable java.lang.String getPOStag()
        Returns:
        the POS of the Element or null
        Since:
        0.9.6
      • getChunkTag

        @Nullable
        public @Nullable ChunkTag getChunkTag()
        Returns:
        the chunk tag of the Element or null
        Since:
        2.3
      • getPOSNegation

        public boolean getPOSNegation()
        Returns:
        true if the POS is negated.
      • isInflected

        public boolean isInflected()
        Returns:
        true if the token matches all inflected forms
      • getPhraseName

        @Nullable
        public @Nullable java.lang.String getPhraseName()
        Gets the phrase the element is in.
        Returns:
        The name of the phrase.
      • isUnified

        public boolean isUnified()
      • setUnification

        public void setUnification​(java.util.Map<java.lang.String,​java.util.List<java.lang.String>> uniFeatures)
      • getUniFeatures

        @Nullable
        public @Nullable java.util.Map<java.lang.String,​java.util.List<java.lang.String>> getUniFeatures()
        Get unification features and types.
        Returns:
        A map from features to a list of types or null
        Since:
        1.0.1
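        Because the returned map may be null, callers usually need a guard before looking up a feature. The sketch below shows the map shape (feature name to list of type names) with a hypothetical null-safe helper typesFor; the feature and type names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UniFeaturesSketch {

    // Hypothetical helper: returns the types for a feature, or an empty list
    // when the map is null or the feature is absent.
    static List<String> typesFor(Map<String, List<String>> uniFeatures, String feature) {
        if (uniFeatures == null) {
            return Collections.emptyList();
        }
        return uniFeatures.getOrDefault(feature, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Illustrative content only: feature names map to lists of type names.
        Map<String, List<String>> features = new LinkedHashMap<>();
        features.put("number", Arrays.asList("singular", "plural"));
        features.put("gender", Arrays.asList("feminine"));

        System.out.println(typesFor(features, "number"));
        System.out.println(typesFor(features, "case"));
        System.out.println(typesFor(null, "number"));
    }
}
```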
      • setUniNegation

        public void setUniNegation()
      • isUniNegated

        public boolean isUniNegated()
      • isLastInUnification

        public boolean isLastInUnification()
      • setLastInUnification

        public void setLastInUnification()
      • isUnificationNeutral

        public boolean isUnificationNeutral()
        Determines whether the element should be silently ignored during unification, and simply added.
        Returns:
        True when the element is not included in unifying.
        Since:
        2.5
      • setUnificationNeutral

        public void setUnificationNeutral()
        Sets the element as ignored during unification.
        Since:
        2.5
      • setWhitespaceBefore

        public void setWhitespaceBefore​(boolean isWhite)
      • isInsideMarker

        public boolean isInsideMarker()
      • setInsideMarker

        public void setInsideMarker​(boolean isInsideMarker)
      • setExceptionSpaceBefore

        public void setExceptionSpaceBefore​(boolean isWhite)
        Sets the attribute on the exception that makes pattern matching depend on whether there was a space before the token matching the exception. The same procedure is used for tokens that are valid for previous or current tokens.
        Parameters:
        isWhite - If true, the space before exception is required.
      • isWhitespaceBefore

        public boolean isWhitespaceBefore​(AnalyzedToken token)
      • getExceptionList

        public java.util.List<PatternToken> getExceptionList()
        Returns:
        A List of Exceptions. Used for testing.
        Since:
        1.0.0
      • getPreviousExceptionList

        public java.util.List<PatternToken> getPreviousExceptionList()
        Returns:
        List of previous exceptions. Used for testing.
      • hasExceptionList

        public boolean hasExceptionList()
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object