Class LanguageModelUtils


  • public final class LanguageModelUtils
    extends java.lang.Object
    • Field Detail

      • logger

        private static final org.slf4j.Logger logger
    • Constructor Detail

      • LanguageModelUtils

        private LanguageModelUtils()
    • Method Detail

      • getGoogleStyleWordTokenizer

        static Tokenizer getGoogleStyleWordTokenizer(Language language)
        Returns a tokenizer that approximates the tokenization Google uses for its n-gram index (which does not appear to be properly documented).
      • getContext

        static java.util.List<java.lang.String> getContext(GoogleToken token,
                                                           java.util.List<GoogleToken> tokens,
                                                           java.lang.String newToken,
                                                           int toLeft,
                                                           int toRight)
      • getContext

        static java.util.List<java.lang.String> getContext(GoogleToken token,
                                                           java.util.List<GoogleToken> tokens,
                                                           java.util.List<GoogleToken> newTokens,
                                                           int toLeft,
                                                           int toRight)
      • getContext

        public static <T> java.util.List<T> getContext(T token,
                                                       java.util.List<T> tokens,
                                                       java.util.List<T> newTokens,
                                                       int toLeft,
                                                       int toRight,
                                                       java.util.function.Predicate<T> isWhitespace,
                                                       T endToken)
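
The getContext overloads carry no description here, so the following is only a sketch of what a context-window helper with the same parameter list could look like. It is not the LanguageTool implementation. The class name ContextSketch, the method name contextWindow, and every behavioral detail are assumptions made for illustration: whitespace tokens are skipped and not counted, newTokens stand in for the matched token, and endToken is appended once when the right side runs out of tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; names and behavior are assumptions,
// not taken from LanguageModelUtils.
final class ContextSketch {

  private ContextSketch() {
  }

  static <T> List<T> contextWindow(T token, List<T> tokens, List<T> newTokens,
                                   int toLeft, int toRight,
                                   Predicate<T> isWhitespace, T endToken) {
    int pos = tokens.indexOf(token);
    if (pos == -1) {
      throw new IllegalArgumentException("Token not found: " + token);
    }
    List<T> result = new ArrayList<>();
    // Walk left from the token, collecting up to toLeft non-whitespace tokens
    // and prepending each one so the original order is preserved.
    for (int i = pos - 1, added = 0; i >= 0 && added < toLeft; i--) {
      T candidate = tokens.get(i);
      if (!isWhitespace.test(candidate)) {
        result.add(0, candidate);
        added++;
      }
    }
    // The replacement tokens take the place of the original token.
    result.addAll(newTokens);
    // Walk right, collecting up to toRight non-whitespace tokens.
    int added = 0;
    for (int i = pos + 1; i < tokens.size() && added < toRight; i++) {
      T candidate = tokens.get(i);
      if (!isWhitespace.test(candidate)) {
        result.add(candidate);
        added++;
      }
    }
    // Assumed convention: mark the end of the input once if the right
    // context could not be filled completely.
    if (added < toRight) {
      result.add(endToken);
    }
    return result;
  }
}
```

Under these assumptions, calling contextWindow("teh", tokens, List.of("the"), 1, 1, s -> s.trim().isEmpty(), "_END_") on the token list ["I", " ", "saw", " ", "teh", " ", "cat"] yields [saw, the, cat]. With toRight set to 2 the result becomes [saw, the, cat, _END_].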