Package org.languagetool
Class AnalyzedSentence
java.lang.Object
  org.languagetool.AnalyzedSentence

public final class AnalyzedSentence extends java.lang.Object
A sentence that has been tokenized and analyzed.

Field Summary
Modifier and Type                        Field
private java.util.Set<java.lang.String>  lemmaSet
private AnalyzedTokenReadings[]          nonBlankPreDisambigTokens
private AnalyzedTokenReadings[]          nonBlankTokens
private AnalyzedTokenReadings[]          preDisambigTokens
private AnalyzedTokenReadings[]          tokens
private java.util.Set<java.lang.String>  tokenSet
private int[]                            whPositions

Constructor Summary
Modifier  Constructor and Description

public    AnalyzedSentence(AnalyzedTokenReadings[] tokens)
  Creates an AnalyzedSentence from the given AnalyzedTokenReadings.
private   AnalyzedSentence(AnalyzedTokenReadings[] tokens, int[] mapping, AnalyzedTokenReadings[] nonBlankTokens, AnalyzedTokenReadings[] nonBlankPreDisambigTokens)
public    AnalyzedSentence(AnalyzedTokenReadings[] tokens, AnalyzedTokenReadings[] preDisambigTokens)

Method Summary
Modifier and Type  Method and Description

AnalyzedSentence  copy(AnalyzedSentence sentence)
  Copies the given AnalyzedSentence and returns the copy.
boolean  equals(java.lang.Object o)
java.lang.String  getAnnotations()
  Returns the log of disambiguator actions.
java.util.Set<java.lang.String>  getLemmaSet()
  Returns the lowercased lemmas of this sentence as a set.
private java.util.Set<java.lang.String>  getLemmaSet(AnalyzedTokenReadings[] tokens)
private @NotNull java.util.List<AnalyzedTokenReadings>  getNonBlankReadings(AnalyzedTokenReadings[] tokens, int whCounter, int nonWhCounter, int[] mapping)
int  getOriginalPosition(int nonWhPosition)
  Returns the position of a non-whitespace token in the original sentence, where whitespace tokens are counted.
AnalyzedTokenReadings[]  getPreDisambigTokens()
AnalyzedTokenReadings[]  getPreDisambigTokensWithoutWhitespace()
java.lang.String  getText()
  Returns the original text.
AnalyzedTokenReadings[]  getTokens()
  Returns the AnalyzedTokenReadings of the analyzed text.
java.util.Set<java.lang.String>  getTokenSet()
  Returns the lowercased tokens of this sentence as a set.
private java.util.Set<java.lang.String>  getTokenSet(AnalyzedTokenReadings[] tokens)
AnalyzedTokenReadings[]  getTokensWithoutWhitespace()
  Returns the AnalyzedTokenReadings of the analyzed text, with whitespace tokens removed but with the artificial SENT_START token included.
int  hashCode()
boolean  hasParagraphEndMark(Language lang)
  Returns true if the sentence ends with a paragraph break.
java.lang.String  toShortString(java.lang.String readingDelimiter)
  Returns a string representation without chunk information.
java.lang.String  toString()
java.lang.String  toString(java.lang.String readingDelimiter)
  Returns a string representation with chunk information.
private java.lang.String  toString(java.lang.String readingDelimiter, boolean includeChunks)
(package private) java.lang.String  toTextString()
  Returns a string representation without any analysis information, just the original text.

Field Detail

tokens
private final AnalyzedTokenReadings[] tokens

preDisambigTokens
private final AnalyzedTokenReadings[] preDisambigTokens

nonBlankTokens
private final AnalyzedTokenReadings[] nonBlankTokens

nonBlankPreDisambigTokens
private final AnalyzedTokenReadings[] nonBlankPreDisambigTokens

whPositions
private final int[] whPositions

tokenSet
private final java.util.Set<java.lang.String> tokenSet

lemmaSet
private final java.util.Set<java.lang.String> lemmaSet

Constructor Detail

AnalyzedSentence
public AnalyzedSentence(AnalyzedTokenReadings[] tokens)
Creates an AnalyzedSentence from the given AnalyzedTokenReadings. Whitespace is also a token.

AnalyzedSentence
public AnalyzedSentence(AnalyzedTokenReadings[] tokens, AnalyzedTokenReadings[] preDisambigTokens)

AnalyzedSentence
private AnalyzedSentence(AnalyzedTokenReadings[] tokens, int[] mapping, AnalyzedTokenReadings[] nonBlankTokens, AnalyzedTokenReadings[] nonBlankPreDisambigTokens)

Method Detail

getNonBlankReadings
private @NotNull java.util.List<AnalyzedTokenReadings> getNonBlankReadings(AnalyzedTokenReadings[] tokens, int whCounter, int nonWhCounter, int[] mapping)

getTokenSet
private java.util.Set<java.lang.String> getTokenSet(AnalyzedTokenReadings[] tokens)

getLemmaSet
private java.util.Set<java.lang.String> getLemmaSet(AnalyzedTokenReadings[] tokens)

copy
public AnalyzedSentence copy(AnalyzedSentence sentence)
Copies the given AnalyzedSentence and returns the copy. This is useful, for example, for local immunization, where changes made to the copy must not affect the original sentence.
Parameters:
  sentence - the AnalyzedSentence to be copied
Returns:
  a new object that is a copy
Since:
  2.5

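A copy is only safe for local immunization if changes to it cannot leak back into the original, which means the token objects must be copied and not just the array that holds them. The following is a minimal sketch of that deep-copy idea, not LanguageTool's implementation: `MutableToken`, its `immunized` flag, and `SentenceCopy.deepCopy` are hypothetical stand-ins for `AnalyzedTokenReadings` and `copy`.

```java
// Hypothetical stand-in for AnalyzedTokenReadings: token text plus a mutable flag.
final class MutableToken {
  final String text;
  boolean immunized;

  MutableToken(String text) {
    this.text = text;
  }

  // Copy constructor: duplicates the token's state into a new object.
  MutableToken(MutableToken other) {
    this.text = other.text;
    this.immunized = other.immunized;
  }
}

final class SentenceCopy {
  // Deep copy: a new array holding new token objects, so mutating the copy
  // leaves the original tokens untouched.
  static MutableToken[] deepCopy(MutableToken[] tokens) {
    MutableToken[] copy = new MutableToken[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      copy[i] = new MutableToken(tokens[i]);
    }
    return copy;
  }

  public static void main(String[] args) {
    MutableToken[] original = { new MutableToken("This"), new MutableToken(" "), new MutableToken("works") };
    MutableToken[] copy = deepCopy(original);
    copy[2].immunized = true;                  // hypothetical "immunize" on the copy only
    System.out.println(original[2].immunized); // false
    System.out.println(copy[2].immunized);     // true
  }
}
```

A shallow copy such as `tokens.clone()` would share the token objects, so setting a flag on the copy would also change the original.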
getTokens
public AnalyzedTokenReadings[] getTokens()
Returns the AnalyzedTokenReadings of the analyzed text. Whitespace is also a token.

getPreDisambigTokens
@Experimental public AnalyzedTokenReadings[] getPreDisambigTokens()
Since:
  4.5

getTokensWithoutWhitespace
public AnalyzedTokenReadings[] getTokensWithoutWhitespace()
Returns the AnalyzedTokenReadings of the analyzed text, with whitespace tokens removed but with the artificial SENT_START token included.

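Because whitespace is also a token, `getTokens()` and `getTokensWithoutWhitespace()` differ only in which entries are kept. A minimal sketch of that filtering over plain strings, assuming for illustration that index 0 holds the artificial SENT_START token as an empty string; `WhitespaceFilter` is a hypothetical name, and the real method works on `AnalyzedTokenReadings`, not strings.

```java
import java.util.ArrayList;
import java.util.List;

final class WhitespaceFilter {
  // True if the token consists only of whitespace characters.
  // An empty string also counts as blank here.
  static boolean isWhitespace(String token) {
    return token.chars().allMatch(Character::isWhitespace);
  }

  static List<String> withoutWhitespace(String[] tokens) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      // Keep the (assumed empty) sentence-start token at index 0 even though
      // it looks blank; drop every other whitespace-only token.
      if (i == 0 || !isWhitespace(tokens[i])) {
        result.add(tokens[i]);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String[] tokens = { "", "This", " ", "is", " ", "fine", "." };
    System.out.println(withoutWhitespace(tokens)); // [, This, is, fine, .]
  }
}
```

The explicit `i == 0` check matters in this sketch: without it, a text-less start token would be filtered out together with the real whitespace.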
getPreDisambigTokensWithoutWhitespace
@Experimental public AnalyzedTokenReadings[] getPreDisambigTokensWithoutWhitespace()
Since:
  4.5

getOriginalPosition
public int getOriginalPosition(int nonWhPosition)
Returns the position of a non-whitespace token in the original sentence, that is, its index in the token sequence that still includes whitespace tokens.
Parameters:
  nonWhPosition - position of a non-whitespace token
Returns:
  position in the original sentence

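The field list includes an `int[] whPositions`, and the private `getNonBlankReadings(..., int[] mapping)` helper takes a mapping array, which suggests (but does not confirm) that the position mapping is recorded in the same pass that removes whitespace. A minimal sketch of that idea over plain strings; `PositionMapping`, `nonBlank`, and `originalPos` are hypothetical names, and index 0 is again assumed to be an empty sentence-start token.

```java
import java.util.ArrayList;
import java.util.List;

final class PositionMapping {
  final String[] nonBlank;  // tokens without whitespace (index 0 = sentence start)
  final int[] originalPos;  // originalPos[k] = index of nonBlank[k] in the full array

  PositionMapping(String[] tokens) {
    List<String> kept = new ArrayList<>();
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      boolean whitespace = tokens[i].chars().allMatch(Character::isWhitespace);
      if (i == 0 || !whitespace) {
        kept.add(tokens[i]);
        positions.add(i);  // remember where this token sat before filtering
      }
    }
    nonBlank = kept.toArray(new String[0]);
    originalPos = positions.stream().mapToInt(Integer::intValue).toArray();
  }

  // Counterpart of getOriginalPosition(int nonWhPosition): a constant-time lookup.
  int getOriginalPosition(int nonWhPosition) {
    return originalPos[nonWhPosition];
  }

  public static void main(String[] args) {
    String[] tokens = { "", "This", " ", "is", " ", "fine", "." };
    PositionMapping m = new PositionMapping(tokens);
    // "fine" is at index 3 without whitespace and at index 5 in the full array.
    System.out.println(m.nonBlank[3] + " -> " + m.getOriginalPosition(3)); // fine -> 5
  }
}
```

Precomputing the array makes each lookup O(1), at the cost of one int per non-whitespace token.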
toString
public java.lang.String toString()
Overrides:
  toString in class java.lang.Object

toShortString
public java.lang.String toShortString(java.lang.String readingDelimiter)
Returns a string representation without chunk information.
Since:
  2.3

getText
public java.lang.String getText()
Returns the original text.
Since:
  2.7

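Since whitespace is kept as ordinary tokens, the original text can in principle be recovered by concatenating the token strings in order, including runs of several spaces. A minimal sketch under that assumption, with the artificial sentence-start token modeled as an empty string so it contributes nothing; `OriginalText` is a hypothetical class and the real method reads the text from `AnalyzedTokenReadings`.

```java
final class OriginalText {
  // Whitespace is stored as ordinary tokens, so appending every token's text
  // in order reproduces the sentence exactly.
  static String getText(String[] tokens) {
    StringBuilder sb = new StringBuilder();
    for (String token : tokens) {
      sb.append(token);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Index 0 models the sentence-start token as an empty string (assumption).
    String[] tokens = { "", "This", " ", "is", "  ", "fine", "." };
    System.out.println(getText(tokens)); // prints: This is  fine.
  }
}
```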
toTextString
java.lang.String toTextString()
Returns a string representation without any analysis information, just the original text.
Since:
  2.6

toString
public java.lang.String toString(java.lang.String readingDelimiter)
Returns a string representation with chunk information.

toString
private java.lang.String toString(java.lang.String readingDelimiter, boolean includeChunks)

getAnnotations
public java.lang.String getAnnotations()
Returns the log of disambiguator actions.

getTokenSet
public java.util.Set<java.lang.String> getTokenSet()
Returns the lowercased tokens of this sentence as a set. Used internally for performance optimization.
Since:
  2.4

getLemmaSet
public java.util.Set<java.lang.String> getLemmaSet()
Returns the lowercased lemmas of this sentence as a set. Used internally for performance optimization.
Since:
  2.5

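One plausible reading of "used internally for performance optimization", assumed here and not stated on this page, is a cheap prefilter: a set lookup tells a rule whether a word it requires occurs in the sentence at all before any expensive pattern matching runs. A minimal sketch over plain strings; `TokenSetPrefilter` and `mightMatch` are hypothetical, and a lemma set would be built the same way from each reading's lemma.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class TokenSetPrefilter {
  // Lowercase every token once per sentence (similar in spirit to getTokenSet()).
  static Set<String> tokenSet(String[] tokens) {
    Set<String> set = new HashSet<>();
    for (String token : tokens) {
      set.add(token.toLowerCase(Locale.ROOT));
    }
    return set;
  }

  // Hypothetical rule prefilter: if any required word is absent,
  // the full pattern match can be skipped for this sentence.
  static boolean mightMatch(Set<String> sentenceTokens, String... requiredWords) {
    for (String word : requiredWords) {
      if (!sentenceTokens.contains(word.toLowerCase(Locale.ROOT))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Set<String> tokens = tokenSet(new String[] { "", "Their", " ", "is", " ", "a", " ", "problem", "." });
    System.out.println(mightMatch(tokens, "their", "is")); // true
    System.out.println(mightMatch(tokens, "there", "is")); // false: rule can be skipped
  }
}
```

Lowercasing once when the set is built means each rule pays only for hash lookups, however many rules consult the same sentence.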
equals
public boolean equals(java.lang.Object o)
Overrides:
  equals in class java.lang.Object

hashCode
public int hashCode()
Overrides:
  hashCode in class java.lang.Object

hasParagraphEndMark
public boolean hasParagraphEndMark(Language lang)
Returns true if the sentence ends with a paragraph break.
Since:
  4.3
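The `Language` parameter suggests that what counts as a paragraph break is language- or configuration-dependent. One plausible model, assumed here and not confirmed by this page, is that some setups treat a single line break as the end of a paragraph while others require an empty line. A minimal sketch of that check on raw sentence text; `ParagraphEnd` and the `singleLineBreakEndsParagraph` flag are hypothetical.

```java
final class ParagraphEnd {
  // Hypothetical check on the raw sentence text. The boolean stands in for an
  // assumed language-dependent setting: whether a single line break already
  // ends a paragraph, or an empty line (two breaks) is required.
  static boolean hasParagraphEndMark(String text, boolean singleLineBreakEndsParagraph) {
    String normalized = text.replace("\r\n", "\n"); // treat Windows line endings like "\n"
    return singleLineBreakEndsParagraph ? normalized.endsWith("\n") : normalized.endsWith("\n\n");
  }

  public static void main(String[] args) {
    System.out.println(hasParagraphEndMark("First paragraph.\n\n", false)); // true
    System.out.println(hasParagraphEndMark("Same paragraph.\n", false));    // false
    System.out.println(hasParagraphEndMark("Same paragraph.\n", true));     // true
  }
}
```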