Package org.languagetool.tokenizers
Class SRXSentenceTokenizer
- java.lang.Object
-
- org.languagetool.tokenizers.SRXSentenceTokenizer
-
- All Implemented Interfaces:
SentenceTokenizer, Tokenizer
- Direct Known Subclasses:
SimpleSentenceTokenizer
public class SRXSentenceTokenizer extends java.lang.Object implements SentenceTokenizer
Class to tokenize sentences using rules from an SRX file.
-
-
Field Summary
Fields
Modifier and Type                                Field
private Language                                 language
private java.lang.String                         parCode
private net.loomchild.segment.srx.SrxDocument    srxDocument
-
Constructor Summary
Constructors
Constructor    Description
SRXSentenceTokenizer(Language language)
    Build a sentence tokenizer based on the rules in the segment.srx file that comes with LanguageTool.
SRXSentenceTokenizer(Language language, java.lang.String srxInClassPath)
-
Method Summary
Modifier and Type    Method    Description
void    setSingleLineBreaksMarksParagraph(boolean lineBreakParagraphs)
boolean    singleLineBreaksMarksPara()
java.util.List<java.lang.String>    tokenize(java.lang.String text)
    Tokenize the given string into sentences.
-
-
-
Field Detail
-
srxDocument
private final net.loomchild.segment.srx.SrxDocument srxDocument
-
language
private final Language language
-
parCode
private java.lang.String parCode
-
-
Constructor Detail
-
SRXSentenceTokenizer
public SRXSentenceTokenizer(Language language)
Build a sentence tokenizer based on the rules in the segment.srx
file that comes with LanguageTool.
-
SRXSentenceTokenizer
public SRXSentenceTokenizer(Language language, java.lang.String srxInClassPath)
- Parameters:
srxInClassPath - the path to an SRX file in the classpath
- Since:
3.2
-
-
Method Detail
-
tokenize
public final java.util.List<java.lang.String> tokenize(java.lang.String text)
Description copied from interface: SentenceTokenizer
Tokenize the given string into sentences.
- Specified by:
tokenize in interface SentenceTokenizer
- Specified by:
tokenize in interface Tokenizer
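To illustrate the shape of the result (one list entry per sentence), here is a minimal, self-contained sketch. It does not use LanguageTool or SRX rules at all: it swaps in the JDK's java.text.BreakIterator as a stand-in segmenter, and the class name SentenceSplitSketch and the helper splitSentences are hypothetical names chosen for this example only. Real SRX rules may place boundaries differently.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitSketch {

    // Hypothetical helper, NOT part of LanguageTool: splits text into
    // sentences with the JDK's BreakIterator instead of SRX rules.
    static List<String> splitSentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        // Walk the boundaries; each [start, end) span is one sentence,
        // including any whitespace that trails it.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            sentences.add(text.substring(start, end));
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : splitSentences("This is a test. Here is another sentence.")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

In this sketch every character of the input ends up in exactly one list entry, so concatenating the entries reproduces the original text.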
-
singleLineBreaksMarksPara
public final boolean singleLineBreaksMarksPara()
- Specified by:
singleLineBreaksMarksPara in interface SentenceTokenizer
-
setSingleLineBreaksMarksParagraph
public final void setSingleLineBreaksMarksParagraph(boolean lineBreakParagraphs)
- Specified by:
setSingleLineBreaksMarksParagraph in interface SentenceTokenizer
- Parameters:
lineBreakParagraphs - if true, single line breaks are assumed to end a paragraph; if false, only two or more consecutive line breaks end a paragraph
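The effect of the lineBreakParagraphs flag can be sketched with plain regular expressions. This is an illustration of the rule as described above, not LanguageTool's implementation: the class ParagraphBreakSketch and the helper splitParagraphs are hypothetical, and only "\n" is treated as a line break here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ParagraphBreakSketch {

    // Hypothetical helper, NOT part of LanguageTool.
    // lineBreakParagraphs == true:  every run of one or more line breaks ends a paragraph.
    // lineBreakParagraphs == false: only runs of two or more line breaks end a paragraph.
    static List<String> splitParagraphs(String text, boolean lineBreakParagraphs) {
        Pattern boundary = lineBreakParagraphs
                ? Pattern.compile("\\n+")
                : Pattern.compile("\\n{2,}");
        List<String> paragraphs = new ArrayList<>();
        for (String part : boundary.split(text)) {
            if (!part.isEmpty()) {
                paragraphs.add(part);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "First line.\nSecond line.\n\nThird line.";
        System.out.println(splitParagraphs(text, true).size());  // prints 3
        System.out.println(splitParagraphs(text, false).size()); // prints 2
    }
}
```

With the flag set to false, the first two lines stay together in one paragraph because only a single line break separates them.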
-
-