Package org.languagetool.tokenizers.nl
Class DutchWordTokenizer
java.lang.Object
  org.languagetool.tokenizers.WordTokenizer
    org.languagetool.tokenizers.nl.DutchWordTokenizer
All Implemented Interfaces: org.languagetool.tokenizers.Tokenizer
public class DutchWordTokenizer extends org.languagetool.tokenizers.WordTokenizer
-
-
Field Summary
private java.lang.String nlTokenizingChars
private static java.util.List<java.lang.String> QUOTES
-
Constructor Summary
DutchWordTokenizer()
-
Method Summary
private boolean endsWithQuote(java.lang.String token)
java.lang.String getTokenizingCharacters()
private boolean startsWithQuote(java.lang.String token)
java.util.List<java.lang.String> tokenize(java.lang.String text)
  Tokenizes like WordTokenizer, except that words such as "oma's", which contain an apostrophe in the middle, are not split at the apostrophe.
-
-
-
Method Detail
-
tokenize
public java.util.List<java.lang.String> tokenize(java.lang.String text)
Tokenizes like WordTokenizer, except that words such as "oma's", which contain an apostrophe in the middle, are not split at the apostrophe.
- Specified by: tokenize in interface org.languagetool.tokenizers.Tokenizer
- Overrides: tokenize in class org.languagetool.tokenizers.WordTokenizer
- Parameters: text - Text to tokenize
- Returns: List of tokens
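The behaviour described above can be illustrated with a small standalone sketch. This is not the LanguageTool source: the class name ApostropheTokenizerSketch, the DELIMITERS set, and the quote-peeling logic are all assumptions made for illustration. It only shows one plausible way to keep a word-internal apostrophe while still separating an apostrophe used as a quote mark at the start or end of a token, which is what the private helpers startsWithQuote and endsWithQuote suggest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Illustrative sketch only; NOT the actual DutchWordTokenizer implementation.
class ApostropheTokenizerSketch {

    // Hypothetical delimiter set. The apostrophe is deliberately absent,
    // so a word such as "oma's" is not split at the apostrophe.
    private static final String DELIMITERS = " \t\n\r.,;:!?()\"";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: whitespace and punctuation come back as tokens too.
        StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            String trailingQuote = null;
            // Peel off an apostrophe used as an opening quote.
            if (token.length() > 1 && token.startsWith("'")) {
                tokens.add("'");
                token = token.substring(1);
            }
            // Peel off an apostrophe used as a closing quote.
            if (token.length() > 1 && token.endsWith("'")) {
                trailingQuote = "'";
                token = token.substring(0, token.length() - 1);
            }
            tokens.add(token);
            if (trailingQuote != null) {
                tokens.add(trailingQuote);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "oma's" stays a single token; spaces and the full stop are separate tokens.
        System.out.println(tokenize("Dat is oma's fiets."));
    }
}
```

Under these assumptions, a quoted word such as 'auto's' yields the three tokens ', auto's and ', with the inner apostrophe left intact. A known limitation of the sketch is that Dutch forms beginning with an apostrophe (for example 's morgens) would also lose their leading apostrophe; how the real class treats such cases is not stated on this page.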
-
startsWithQuote
private boolean startsWithQuote(java.lang.String token)
-
endsWithQuote
private boolean endsWithQuote(java.lang.String token)
-
getTokenizingCharacters
public java.lang.String getTokenizingCharacters()
- Overrides: getTokenizingCharacters in class org.languagetool.tokenizers.WordTokenizer
-
-