Package org.languagetool.tokenizers.ca
Class CatalanWordTokenizer
java.lang.Object
  org.languagetool.tokenizers.WordTokenizer
    org.languagetool.tokenizers.ca.CatalanWordTokenizer
- All Implemented Interfaces:
org.languagetool.tokenizers.Tokenizer
public class CatalanWordTokenizer extends org.languagetool.tokenizers.WordTokenizer
Tokenizes a sentence into words. Punctuation and whitespace characters each get their own token. Hyphens and apostrophes receive special treatment for Catalan.
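The general idea of giving each punctuation and whitespace character its own token can be sketched with the standard java.util.StringTokenizer. This is only an illustration, not the code of CatalanWordTokenizer: the class name DelimiterTokenSketch and the short delimiter set are assumptions made for the example, and none of the Catalan-specific handling is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterTokenSketch {
    // Assumed delimiter set, for illustration only.
    private static final String DELIMS = " \t\n.,;:!?()\"";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: every delimiter character is returned as its own token.
        StringTokenizer st = new StringTokenizer(text, DELIMS, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [Bon,  , dia, ,,  , Maria, !]
        System.out.println(tokenize("Bon dia, Maria!"));
    }
}
```

Because the delimiters are kept, concatenating the tokens reproduces the input text exactly.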
Field Summary
Fields
Modifier and Type | Field | Description
private static java.util.regex.Pattern
APOSTROF_RECTE
private static java.util.regex.Pattern
APOSTROF_RECTE_1
private static java.util.regex.Pattern
APOSTROF_RODO
private static java.util.regex.Pattern
APOSTROF_RODO_1
private static java.util.regex.Pattern
DECIMAL_COMMA
private static java.util.regex.Pattern
DECIMAL_POINT
private static java.lang.String
DICT_FILENAME
private static java.util.regex.Pattern
ELA_GEMINADA
private static java.util.regex.Pattern
ELA_GEMINADA_UPPERCASE
private static java.util.regex.Pattern
HYPHENS
private static int
maxPatterns
private static java.util.regex.Pattern
NEARBY_HYPHENS
private java.util.regex.Pattern[]
patterns
private static java.lang.String
PF
private static java.util.regex.Pattern
SPACE_DIGITS
private static java.util.regex.Pattern
SPACE_DIGITS0
private static java.util.regex.Pattern
SPACE_DIGITS2
protected org.languagetool.rules.spelling.morfologik.MorfologikSpeller
speller
-
Constructor Summary
Constructors
Constructor | Description
CatalanWordTokenizer()
-
Method Summary
Modifier and Type | Method | Description
java.util.List<java.lang.String>
tokenize(java.lang.String text)
private java.util.List<java.lang.String>
wordsToAdd(java.lang.String s)
Field Detail
-
PF
private static final java.lang.String PF
- See Also:
- Constant Field Values
-
maxPatterns
private static final int maxPatterns
- See Also:
- Constant Field Values
-
patterns
private final java.util.regex.Pattern[] patterns
-
DICT_FILENAME
private static final java.lang.String DICT_FILENAME
- See Also:
- Constant Field Values
-
speller
protected org.languagetool.rules.spelling.morfologik.MorfologikSpeller speller
-
ELA_GEMINADA
private static final java.util.regex.Pattern ELA_GEMINADA
-
ELA_GEMINADA_UPPERCASE
private static final java.util.regex.Pattern ELA_GEMINADA_UPPERCASE
-
APOSTROF_RECTE
private static final java.util.regex.Pattern APOSTROF_RECTE
-
APOSTROF_RODO
private static final java.util.regex.Pattern APOSTROF_RODO
-
APOSTROF_RECTE_1
private static final java.util.regex.Pattern APOSTROF_RECTE_1
-
APOSTROF_RODO_1
private static final java.util.regex.Pattern APOSTROF_RODO_1
-
NEARBY_HYPHENS
private static final java.util.regex.Pattern NEARBY_HYPHENS
-
HYPHENS
private static final java.util.regex.Pattern HYPHENS
-
DECIMAL_POINT
private static final java.util.regex.Pattern DECIMAL_POINT
-
DECIMAL_COMMA
private static final java.util.regex.Pattern DECIMAL_COMMA
-
SPACE_DIGITS0
private static final java.util.regex.Pattern SPACE_DIGITS0
-
SPACE_DIGITS
private static final java.util.regex.Pattern SPACE_DIGITS
-
SPACE_DIGITS2
private static final java.util.regex.Pattern SPACE_DIGITS2
Method Detail
-
tokenize
public java.util.List<java.lang.String> tokenize(java.lang.String text)
- Specified by:
tokenize in interface org.languagetool.tokenizers.Tokenizer
- Overrides:
tokenize in class org.languagetool.tokenizers.WordTokenizer
- Parameters:
text - Text to tokenize
- Returns:
List of tokens. Note: a special string CA_APOS is used to replace apostrophes, and CA_HYPHEN to replace hyphens.
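The placeholder technique mentioned in the note can be illustrated with a minimal, self-contained sketch. It is not the actual implementation: the class name ApostropheSketch, the marker characters, the elision pattern and the delimiter set are all assumptions chosen for the example, and only apostrophes (not hyphens) are handled. The apostrophe of an elided form is swapped for a marker before splitting, so the generic splitter cannot break it off, and it is restored in the resulting tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

class ApostropheSketch {
    // Hypothetical markers (private-use characters); the real class uses its own strings.
    private static final String APOS = "\uE000";   // stands in for a protected apostrophe
    private static final String SPLIT = "\uE001";  // forces a token boundary, then is dropped
    // Simplified assumption: a single elided letter followed by an apostrophe and a letter.
    private static final Pattern ELISION = Pattern.compile("\\b([dlmnstDLMNST])'(\\p{L})");
    private static final String DELIMS = " \t\n.,;:!?()\"" + SPLIT;

    static List<String> tokenize(String text) {
        // 1. Protect the apostrophe and mark the boundary right after it.
        String prepared = ELISION.matcher(text).replaceAll("$1" + APOS + SPLIT + "$2");
        List<String> tokens = new ArrayList<>();
        // 2. Split, keeping delimiters as tokens.
        StringTokenizer st = new StringTokenizer(prepared, DELIMS, true);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (token.equals(SPLIT)) {
                continue; // artificial boundary, not part of the text
            }
            // 3. Restore the apostrophe in the finished token.
            tokens.add(token.replace(APOS, "'"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [L', home,  , menja, .]
        System.out.println(tokenize("L'home menja."));
    }
}
```

In this sketch the elided article stays attached to its apostrophe ("L'") while the following word becomes a separate token.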
-
wordsToAdd
private java.util.List<java.lang.String> wordsToAdd(java.lang.String s)