Class PortugueseWordTokenizer

  • All Implemented Interfaces:
    org.languagetool.tokenizers.Tokenizer

    public class PortugueseWordTokenizer
    extends org.languagetool.tokenizers.WordTokenizer
    Tokenizes a sentence into words. Punctuation and whitespace characters each get their own token.
    Since:
    3.6
    • Field Detail

      • NON_BREAKING_SPACE_SUBST

        private static final char NON_BREAKING_SPACE_SUBST
        See Also:
        Constant Field Values
      • NON_BREAKING_DOT_SUBST

        private static final char NON_BREAKING_DOT_SUBST
        See Also:
        Constant Field Values
      • NON_BREAKING_COLON_SUBST

        private static final char NON_BREAKING_COLON_SUBST
        See Also:
        Constant Field Values
      • DECIMAL_COMMA_PATTERN

        private static final java.util.regex.Pattern DECIMAL_COMMA_PATTERN
      • DECIMAL_COMMA_REPL

        private static final java.lang.String DECIMAL_COMMA_REPL
        See Also:
        Constant Field Values
      • DECIMAL_SPACE_PATTERN

        private static final java.util.regex.Pattern DECIMAL_SPACE_PATTERN
      • DOTTED_NUMBERS_PATTERN

        private static final java.util.regex.Pattern DOTTED_NUMBERS_PATTERN
      • DOTTED_NUMBERS_REPL

        private static final java.lang.String DOTTED_NUMBERS_REPL
        See Also:
        Constant Field Values
      • COLON_NUMBERS_PATTERN

        private static final java.util.regex.Pattern COLON_NUMBERS_PATTERN
      • COLON_NUMBERS_REPL

        private static final java.lang.String COLON_NUMBERS_REPL
        See Also:
        Constant Field Values
      • DATE_PATTERN

        private static final java.util.regex.Pattern DATE_PATTERN
      • DATE_PATTERN_REPL

        private static final java.lang.String DATE_PATTERN_REPL
        See Also:
        Constant Field Values
      • DOTTED_ORDINALS_PATTERN

        private static final java.util.regex.Pattern DOTTED_ORDINALS_PATTERN
      • DOTTED_ORDINALS_REPL

        private static final java.lang.String DOTTED_ORDINALS_REPL
        See Also:
        Constant Field Values
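The paired `*_PATTERN` / `*_REPL` fields and the `NON_BREAKING_*_SUBST` characters suggest a protect-then-restore approach: separators inside numbers are swapped for placeholder characters before splitting, then swapped back. This page does not show the actual regular expressions or placeholder values, so the sketch below is an assumption-laden illustration of that general technique, not the class's real implementation. The class name `PlaceholderSketch`, the placeholder `'\uE001'`, and the regular expression are all hypothetical.

```java
import java.util.regex.Pattern;

public class PlaceholderSketch {
    // Hypothetical placeholder (a private-use code point); the real
    // NON_BREAKING_DOT_SUBST value is not shown on this page.
    static final char DOT_SUBST = '\uE001';

    // Hypothetical pattern: a dot that sits between two digits, as in "1.000".
    static final Pattern DOTTED_NUMBERS = Pattern.compile("(?<=\\d)\\.(?=\\d)");

    // Replace dots inside numbers so a later split on '.' leaves the number whole.
    static String protect(String text) {
        return DOTTED_NUMBERS.matcher(text).replaceAll(String.valueOf(DOT_SUBST));
    }

    // Put the original dot back into a token after splitting.
    static String restore(String token) {
        return token.replace(DOT_SUBST, '.');
    }

    public static void main(String[] args) {
        String protectedText = protect("Custa 1.000.000 euros.");
        // The sentence-final dot is untouched; only the dots inside the number change.
        System.out.println(restore(protectedText));
    }
}
```

A private-use character is chosen here because it is unlikely to occur in ordinary text; whether the real class makes the same choice is not stated on this page.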
    • Constructor Detail

      • PortugueseWordTokenizer

        public PortugueseWordTokenizer()
    • Method Detail

      • tokenize

        public java.util.List<java.lang.String> tokenize(java.lang.String text)
        Specified by:
        tokenize in interface org.languagetool.tokenizers.Tokenizer
        Overrides:
        tokenize in class org.languagetool.tokenizers.WordTokenizer
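With LanguageTool on the classpath, the signature above implies a call of the form `new PortugueseWordTokenizer().tokenize(text)`. To illustrate the documented behavior (punctuation and whitespace returned as separate tokens) without that dependency, here is a minimal standalone sketch built on `java.util.StringTokenizer`. The class name `DelimiterTokenSketch` and the delimiter set are assumptions made for illustration; the real tokenizer's delimiter list and number handling are not given on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class DelimiterTokenSketch {
    // Assumed delimiter set for illustration only.
    static final String DELIMITERS = " \t\n.,;:!?()\"";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true makes every delimiter character come back
        // as its own one-character token instead of being discarded.
        StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("Bom dia, mundo!")) {
            // Brackets make the whitespace tokens visible.
            System.out.println("[" + token + "]");
        }
    }
}
```

Because every delimiter is kept, concatenating the returned tokens reproduces the input string in this sketch.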