Class AnalyzedSentence


  • public final class AnalyzedSentence
    extends java.lang.Object
    A sentence that has been tokenized and analyzed.
    • Field Detail

      • whPositions

        private final int[] whPositions
      • tokenSet

        private final java.util.Set<java.lang.String> tokenSet
      • lemmaSet

        private final java.util.Set<java.lang.String> lemmaSet
    • Method Detail

      • getTokensWithoutWhitespace

        public AnalyzedTokenReadings[] getTokensWithoutWhitespace()
        Returns the AnalyzedTokenReadings of the analyzed text, with whitespace tokens removed but with the artificial SENT_START token included.
      • getOriginalPosition

        public int getOriginalPosition(int nonWhPosition)
        Get the position of a non-whitespace token in the original sentence, i.e. in the token sequence that still includes whitespace tokens.
        Parameters:
        nonWhPosition - position of a non-whitespace token
        Returns:
        position in the original sentence.
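        The relationship between getTokensWithoutWhitespace() and getOriginalPosition() can be pictured with a minimal sketch. It is not LanguageTool's implementation. Tokens are plain strings instead of AnalyzedTokenReadings, an empty string at index 0 stands in for the artificial SENT_START token, and whPositions is assumed to map the index of each non-whitespace token to its index in the original, whitespace-inclusive token array. The class WhitespaceIndexSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of whitespace filtering and position mapping; not LanguageTool code. */
final class WhitespaceIndexSketch {
  private final String[] nonWhTokens; // tokens with whitespace removed
  private final int[] whPositions;    // assumed: non-whitespace index -> original index

  WhitespaceIndexSketch(String[] tokens) {
    List<String> kept = new ArrayList<>();
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      // Index 0 stands in for the artificial SENT_START token and is always kept,
      // even though it is modeled here as an empty (blank) string.
      if (i == 0 || !tokens[i].isBlank()) {
        kept.add(tokens[i]);
        positions.add(i);
      }
    }
    nonWhTokens = kept.toArray(new String[0]);
    whPositions = positions.stream().mapToInt(Integer::intValue).toArray();
  }

  String[] getTokensWithoutWhitespace() {
    return nonWhTokens.clone(); // defensive copy so callers cannot modify internal state
  }

  int getOriginalPosition(int nonWhPosition) {
    return whPositions[nonWhPosition];
  }
}
```

        For the tokens {"", "Hello", " ", "world", "."}, the filtered array is {"", "Hello", "world", "."}, and getOriginalPosition(2) (the token "world") returns 3.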
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • toShortString

        public java.lang.String toShortString(java.lang.String readingDelimiter)
        Return a string representation without chunk information.
        Since:
        2.3
      • getText

        public java.lang.String getText()
        Return the original text.
        Since:
        2.7
      • toTextString

        java.lang.String toTextString()
        Return a string representation without any analysis information, i.e. just the original text.
        Since:
        2.6
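        One way to picture what toTextString() (and presumably getText()) returns is to concatenate the surface form of every token, whitespace tokens included, while ignoring all analysis data. The sketch below assumes that the tokens cover the sentence text without gaps and models SENT_START as an empty token; TextStringSketch and its nested Token class are hypothetical, not LanguageTool API.

```java
import java.util.List;

/** Hypothetical sketch: plain text is rebuilt from surface forms only. */
final class TextStringSketch {
  static final class Token {
    final String surface; // text exactly as it appeared, whitespace tokens included
    final String posTag;  // analysis information, ignored by toTextString

    Token(String surface, String posTag) {
      this.surface = surface;
      this.posTag = posTag;
    }
  }

  static String toTextString(List<Token> tokens) {
    StringBuilder sb = new StringBuilder();
    for (Token t : tokens) {
      sb.append(t.surface); // POS tags and other readings are dropped
    }
    return sb.toString();
  }
}
```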
      • toString

        public java.lang.String toString(java.lang.String readingDelimiter)
        Return a string representation with chunk information.
      • toString

        private java.lang.String toString(java.lang.String readingDelimiter,
                                          boolean includeChunks)
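        The signatures suggest that the public toString(readingDelimiter) and toShortString(readingDelimiter) both delegate to the private toString(readingDelimiter, includeChunks), differing only in the boolean flag. The sketch below illustrates that delegation pattern under stated assumptions: the Token model, the "lemma/POS" reading strings, and the bracketed output format are all invented for illustration and do not reproduce LanguageTool's actual output. SentenceFormatSketch is hypothetical.

```java
import java.util.List;

/** Hypothetical sketch of two overloads sharing one private formatter; output format is invented. */
final class SentenceFormatSketch {
  static final class Token {
    final String surface;
    final List<String> readings; // e.g. "lemma/POS" strings; empty for whitespace
    final String chunk;          // chunk tag, may be null

    Token(String surface, List<String> readings, String chunk) {
      this.surface = surface;
      this.readings = List.copyOf(readings);
      this.chunk = chunk;
    }
  }

  private final List<Token> tokens;

  SentenceFormatSketch(List<Token> tokens) {
    this.tokens = List.copyOf(tokens);
  }

  public String toString(String readingDelimiter) {
    return toString(readingDelimiter, true); // with chunk information
  }

  public String toShortString(String readingDelimiter) {
    return toString(readingDelimiter, false); // without chunk information
  }

  private String toString(String readingDelimiter, boolean includeChunks) {
    StringBuilder sb = new StringBuilder();
    for (Token t : tokens) {
      sb.append(t.surface);
      if (!t.readings.isEmpty()) {
        sb.append('[').append(String.join(readingDelimiter, t.readings));
        if (includeChunks && t.chunk != null) {
          sb.append(',').append(t.chunk);
        }
        sb.append(']');
      }
    }
    return sb.toString();
  }
}
```

        Keeping a single private formatter means the two public variants cannot drift apart; only the chunk flag distinguishes them.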
      • getAnnotations

        public java.lang.String getAnnotations()
        Get the log of disambiguator actions.
      • getTokenSet

        public java.util.Set<java.lang.String> getTokenSet()
        Get the lowercase tokens of this sentence in a set. Used internally for performance optimization.
        Since:
        2.4
      • getLemmaSet

        public java.util.Set<java.lang.String> getLemmaSet()
        Get the lowercase lemmas of this sentence in a set. Used internally for performance optimization.
        Since:
        2.5
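        getTokenSet() and getLemmaSet() are described only as internal performance aids. A plausible use is a cheap pre-check: if a rule needs words that do not occur in the sentence at all, its more expensive matching can be skipped. The sketch below assumes such a rule can list its required lowercase words and uses Locale.ROOT for lowercasing; both are assumptions, and TokenSetPrecheckSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of a set-based pre-check; names are illustrative only. */
final class TokenSetPrecheckSketch {
  /** Builds the lowercase token set once per sentence (Locale.ROOT is an assumption). */
  static Set<String> lowercaseSet(String... tokens) {
    return Arrays.stream(tokens)
        .map(t -> t.toLowerCase(Locale.ROOT))
        .collect(Collectors.toSet());
  }

  /** Cheap filter: if any required word is missing, the expensive match can be skipped. */
  static boolean mightMatch(Set<String> tokenSet, Set<String> requiredLowercase) {
    return tokenSet.containsAll(requiredLowercase);
  }
}
```

        The same pattern would apply to the lemma set, with lemmas in place of surface forms.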
      • equals

        public boolean equals(java.lang.Object o)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
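        This page does not state which fields take part in equality. The sketch below only illustrates the general java.lang.Object contract that an equals/hashCode pair must satisfy, assuming for illustration that two sentences are equal when their token arrays are element-wise equal. SentenceEqualitySketch is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical value class; assumes equality is defined by the token array alone. */
final class SentenceEqualitySketch {
  private final String[] tokens;

  SentenceEqualitySketch(String... tokens) {
    this.tokens = tokens.clone();
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SentenceEqualitySketch)) return false;
    SentenceEqualitySketch other = (SentenceEqualitySketch) o;
    return Arrays.equals(tokens, other.tokens); // element-wise, not reference, comparison
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(tokens); // consistent with equals: equal arrays give equal hashes
  }
}
```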
      • hasParagraphEndMark

        public boolean hasParagraphEndMark(Language lang)
        Returns true if the sentence ends with a paragraph break.
        Since:
        4.3
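        The Language parameter suggests that what counts as a paragraph break is language-dependent. The sketch below simplifies this by assuming the language only decides whether a single line break or two consecutive line breaks end a paragraph; the boolean flag singleLineBreakMarksParagraph replaces the Language argument and, like ParagraphEndSketch, is hypothetical.

```java
/** Hypothetical sketch; a boolean flag stands in for the Language argument. */
final class ParagraphEndSketch {
  static boolean hasParagraphEndMark(String sentenceText, boolean singleLineBreakMarksParagraph) {
    int newlines = 0;
    // Walk backwards over trailing whitespace, counting line breaks.
    for (int i = sentenceText.length() - 1; i >= 0; i--) {
      char c = sentenceText.charAt(i);
      if (c == '\n') {
        newlines++;
      } else if (c != ' ' && c != '\t' && c != '\r') {
        break; // reached the last non-whitespace character
      }
    }
    return newlines >= (singleLineBreakMarksParagraph ? 1 : 2);
  }
}
```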