Class MultiWordChunker
java.lang.Object
- org.languagetool.tagging.disambiguation.AbstractDisambiguator
  - org.languagetool.tagging.disambiguation.MultiWordChunker

All Implemented Interfaces:
Disambiguator
public class MultiWordChunker extends AbstractDisambiguator
Multiword tagger-chunker.
Field Summary
Modifier and Type | Field
private boolean | allowFirstCapitalized
private java.lang.String | filename
private java.util.Map<java.lang.String,java.lang.String> | mFull
private java.util.Map<java.lang.String,java.lang.Integer> | mStartNoSpace
private java.util.Map<java.lang.String,java.lang.Integer> | mStartSpace

Constructor Summary
Constructor
MultiWordChunker(java.lang.String filename)
MultiWordChunker(java.lang.String filename, boolean allowFirstCapitalized)

Method Summary
Modifier and Type | Method | Description
AnalyzedSentence | disambiguate(AnalyzedSentence input) | Implements multiword POS tags, e.g., <ELLIPSIS> for the start of an ellipsis (...) and </ELLIPSIS> for its end.
private void | lazyInit()
private java.util.List<java.lang.String> | loadWords(java.io.InputStream stream)
private AnalyzedTokenReadings | prepareNewReading(java.lang.String tokens, java.lang.String tok, AnalyzedTokenReadings token, boolean isLast)
private AnalyzedTokenReadings | setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading)

Methods inherited from class org.languagetool.tagging.disambiguation.AbstractDisambiguator
preDisambiguate
Field Detail

filename
private final java.lang.String filename

allowFirstCapitalized
private final boolean allowFirstCapitalized

mStartSpace
private java.util.Map<java.lang.String,java.lang.Integer> mStartSpace

mStartNoSpace
private java.util.Map<java.lang.String,java.lang.Integer> mStartNoSpace

mFull
private java.util.Map<java.lang.String,java.lang.String> mFull
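The three private maps are undocumented, but their names suggest a lookup index keyed on the start of each multiword. The sketch below is a hypothetical reconstruction, not LanguageTool's code. It assumes that mStartSpace maps the first word of a space-separated multiword to the largest word count starting with that word, that mStartNoSpace maps the first character of a multiword written without spaces (such as "...") to its length, and that mFull maps the full multiword to its tag. The class `MultiWordIndex`, its method `add`, and the tags used in the usage example are illustrative names only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index mirroring the mStartSpace / mStartNoSpace / mFull fields.
// The semantics of each map are assumptions, not taken from LanguageTool's source.
class MultiWordIndex {
  // first word of a space-separated multiword -> largest number of words
  final Map<String, Integer> startSpace = new HashMap<>();
  // first character of a multiword without spaces -> largest length in characters
  final Map<String, Integer> startNoSpace = new HashMap<>();
  // full multiword -> its POS tag
  final Map<String, String> full = new HashMap<>();

  void add(String multiword, String tag) {
    if (multiword.isEmpty()) {
      return; // nothing to index
    }
    if (multiword.contains(" ")) {
      String[] words = multiword.split(" ");
      // keep the maximum so a matcher knows how far ahead to look
      startSpace.merge(words[0], words.length, Integer::max);
    } else {
      startNoSpace.merge(multiword.substring(0, 1), multiword.length(), Integer::max);
    }
    full.put(multiword, tag);
  }
}
```

Storing only the maximum length per start key keeps the index small: a matcher that sees a known first word can try spans up to that length and consult the full map for the exact tag.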
Constructor Detail

MultiWordChunker
public MultiWordChunker(java.lang.String filename)
Parameters:
filename - text file with the multiwords and their tags

MultiWordChunker
public MultiWordChunker(java.lang.String filename, boolean allowFirstCapitalized)
Parameters:
filename - text file with the multiwords and their tags
allowFirstCapitalized - if set to true, the first word of the multiword can be capitalized
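A minimal sketch of what the allowFirstCapitalized flag could mean during matching, assuming the multiword file lists entries in lowercase and the flag relaxes only the case of the very first character (e.g. a sentence-initial "A priori"). The class `CapitalizationDemo` and its method `matches` are hypothetical; LanguageTool's actual check may differ.

```java
import java.util.Set;

// Hypothetical matcher illustrating the allowFirstCapitalized flag; not LanguageTool's code.
class CapitalizationDemo {
  static boolean matches(String candidate, Set<String> multiwords, boolean allowFirstCapitalized) {
    if (multiwords.contains(candidate)) {
      return true; // an exact match always counts
    }
    if (allowFirstCapitalized && !candidate.isEmpty()
        && Character.isUpperCase(candidate.charAt(0))) {
      // retry with only the first character lowercased, e.g. "A priori" -> "a priori"
      String lowered = Character.toLowerCase(candidate.charAt(0)) + candidate.substring(1);
      return multiwords.contains(lowered);
    }
    return false;
  }
}
```

Because only the first character is relaxed, a candidate such as "A Priori" still fails to match "a priori" even when the flag is set.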
Method Detail

lazyInit
private void lazyInit()

disambiguate
public final AnalyzedSentence disambiguate(AnalyzedSentence input)
Implements multiword POS tags, e.g., <ELLIPSIS> for the start of an ellipsis (...) and </ELLIPSIS> for its end.
Parameters:
input - the tokens to be chunked
Returns:
AnalyzedSentence with additional markers
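The start/end marker scheme described for disambiguate can be illustrated on plain strings. This is a simplified model under stated assumptions, not the library's implementation: tokens are strings rather than AnalyzedTokenReadings, the dictionary is a plain map from multiword to tag, only multiwords written without spaces are handled, and the ellipsis is assumed to arrive as three separate "." tokens. The class `MarkerDemo` and method `mark` are hypothetical. The first token of a match receives "<TAG>" and the last receives "</TAG>"; the documented method instead returns an AnalyzedSentence with these markers added.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Simplified model of multiword start/end markers; not LanguageTool's implementation.
class MarkerDemo {
  // Returns one marker per token: "<TAG>" on the first token of a match,
  // "</TAG>" on the last one, and "" everywhere else.
  static List<String> mark(List<String> tokens, Map<String, String> multiwords, int maxLen) {
    List<String> markers = new ArrayList<>(Collections.nCopies(tokens.size(), ""));
    int i = 0;
    while (i < tokens.size()) {
      int matched = 0;
      // try the longest span first so longer multiwords win; single tokens are skipped
      for (int len = Math.min(maxLen, tokens.size() - i); len >= 2; len--) {
        String joined = String.join("", tokens.subList(i, i + len));
        String tag = multiwords.get(joined);
        if (tag != null) {
          markers.set(i, "<" + tag + ">");
          markers.set(i + len - 1, "</" + tag + ">");
          matched = len;
          break;
        }
      }
      i += matched > 0 ? matched : 1; // jump past a match, otherwise advance one token
    }
    return markers;
  }
}
```

For the tokens `Wait . . . now` and the entry `"..." -> ELLIPSIS`, the markers land on the first and third dot, leaving the middle dot unmarked.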

prepareNewReading
private AnalyzedTokenReadings prepareNewReading(java.lang.String tokens, java.lang.String tok, AnalyzedTokenReadings token, boolean isLast)

setAndAnnotate
private AnalyzedTokenReadings setAndAnnotate(AnalyzedTokenReadings oldReading, AnalyzedToken newReading)

loadWords
private java.util.List<java.lang.String> loadWords(java.io.InputStream stream)
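loadWords is private and undocumented, so the following is only a plausible sketch of a reader with the same signature shape. It assumes a UTF-8 file with one entry per line (the multiword and its tag separated by a tab) and that blank lines and lines starting with "#" are skipped. The line format and the class `MultiWordFileReader` are assumptions; check the actual multiword resource file before relying on them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for a multiword file; the line format is assumed, not documented.
class MultiWordFileReader {
  static List<String> loadWords(InputStream stream) {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader =
        new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        String trimmed = line.trim();
        // skip blank lines and comment lines
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
          continue;
        }
        lines.add(trimmed);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }
}
```

Returning the raw lines matches the documented `List<String>` return type and leaves splitting each entry into multiword and tag to the caller.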