Class Language

  • Direct Known Subclasses:
    DynamicLanguage, LanguageBuilder.ExtendedLanguage, NoopLanguage, SimpleSentenceTokenizer.AnyLanguage

    public abstract class Language
    extends java.lang.Object
    Base class for any supported language (English, German, etc). Language classes are detected at runtime by searching the classpath for files named META-INF/org/languagetool/language-module.properties. Those file(s) need to contain a key languageClasses which specifies the fully qualified class name(s), e.g. org.languagetool.language.English. Use commas to specify more than one class.
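    The properties-file convention described above can be illustrated with a minimal sketch. It only parses the text of such a file with java.util.Properties and splits the languageClasses value on commas; it does not reproduce LanguageTool's actual classpath scanning. The class LanguageModuleSketch, the method parseLanguageClasses and the class name com.example.HypotheticalLanguage are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of LanguageTool.
class LanguageModuleSketch {

  // Parses the text of a language-module.properties file and returns the
  // class names listed under the "languageClasses" key (comma-separated).
  static List<String> parseLanguageClasses(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    List<String> classNames = new ArrayList<>();
    String value = props.getProperty("languageClasses", "");
    for (String name : value.split(",")) {
      String trimmed = name.trim();
      if (!trimmed.isEmpty()) {
        classNames.add(trimmed);
      }
    }
    return classNames;
  }

  public static void main(String[] args) {
    // The second class name is made up for this example.
    String text = "languageClasses=org.languagetool.language.English, com.example.HypotheticalLanguage\n";
    for (String className : parseLanguageClasses(text)) {
      System.out.println(className);
    }
  }
}
```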

    Subclasses should typically use lazy initialization for anything that's costly to set up. This improves startup time for the LanguageTool stand-alone version.
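    One way to follow this advice is to create a costly object on first access rather than in the constructor. The sketch below is a generic illustration, not code from LanguageTool; LazyInitSketch, getCostlyResource and the string standing in for the resource are all hypothetical.

```java
// Hypothetical example class, not part of LanguageTool.
class LazyInitSketch {

  // Counts how often the costly setup actually ran (for demonstration only).
  int loadCount = 0;

  // Stand-in for something expensive, e.g. a large dictionary.
  private String costlyResource;

  // Creates the resource on first access only; later calls reuse it.
  synchronized String getCostlyResource() {
    if (costlyResource == null) {
      loadCount++;
      costlyResource = "loaded";
    }
    return costlyResource;
  }

  public static void main(String[] args) {
    LazyInitSketch sketch = new LazyInitSketch();
    // Nothing has been loaded yet at this point.
    System.out.println("loads before first access: " + sketch.loadCount);
    sketch.getCostlyResource();
    sketch.getCostlyResource();
    System.out.println("loads after two accesses: " + sketch.loadCount);
  }
}
```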

    • Field Detail

      • DEMO_DISAMBIGUATOR

        private static final Disambiguator DEMO_DISAMBIGUATOR
      • DEMO_TAGGER

        private static final Tagger DEMO_TAGGER
      • WORD_TOKENIZER

        private static final WordTokenizer WORD_TOKENIZER
      • ignoredCharactersRegex

        private final java.util.regex.Pattern ignoredCharactersRegex
      • noLmWarningPrinted

        private boolean noLmWarningPrinted
    • Constructor Detail

      • Language

        public Language()
    • Method Detail

      • getShortCode

        public abstract java.lang.String getShortCode()
        Get this language's character code, e.g. en for English. For most languages this is a two-letter code according to ISO 639-1, but for those languages that don't have a two-letter code, a three-letter code according to ISO 639-2 is returned. The country parameter (e.g. "US"), if any, is not returned.
        Since:
        3.6
      • getName

        public abstract java.lang.String getName()
        Get this language's name in English, e.g. English or German (Germany).
        Returns:
        language name
      • getCountries

        public abstract java.lang.String[] getCountries()
        Get this language's country options, e.g. US (as in en-US) or PL (as in pl-PL).
        Returns:
        String[] - array of country options for the language.
      • getMaintainers

        @Nullable
        public abstract @Nullable Contributor[] getMaintainers()
        Get the name(s) of the maintainer(s) for this language or null.
      • getRelevantRules

        public abstract java.util.List<Rule> getRelevantRules​(java.util.ResourceBundle messages,
                                                              UserConfig userConfig,
                                                              Language motherTongue,
                                                              java.util.List<Language> altLanguages)
                                                       throws java.io.IOException
        Get the rules classes that should run for texts in this language.
        Throws:
        java.io.IOException
        Since:
        4.3
      • getCommonWordsPath

        public java.lang.String getCommonWordsPath()
        A file with common words, either in the classpath or as a filename in the file system.
        Since:
        4.5
      • getVariant

        @Nullable
        public @Nullable java.lang.String getVariant()
        Get this language's variant, e.g. valencia (as in ca-ES-valencia) or null. Attention: not to be confused with the "country" option.
        Returns:
        variant for the language or null
        Since:
        2.3
      • getDefaultEnabledRulesForVariant

        public java.util.List<java.lang.String> getDefaultEnabledRulesForVariant()
        Get enabled rules different from the default ones for this language variant.
        Returns:
        enabled rules for the language variant.
        Since:
        2.4
      • getDefaultDisabledRulesForVariant

        public java.util.List<java.lang.String> getDefaultDisabledRulesForVariant()
        Get disabled rules different from the default ones for this language variant.
        Returns:
        disabled rules for the language variant.
        Since:
        2.4
      • getLanguageModel

        @Nullable
        public @Nullable LanguageModel getLanguageModel​(java.io.File indexDir)
                                                 throws java.io.IOException
        Parameters:
        indexDir - directory with a '3grams' subdirectory which contains a Lucene index with 3gram occurrence counts
        Returns:
        a LanguageModel or null if this language doesn't support one
        Throws:
        java.io.IOException
        Since:
        2.7
      • getRelevantLanguageModelRules

        public java.util.List<Rule> getRelevantLanguageModelRules​(java.util.ResourceBundle messages,
                                                                  LanguageModel languageModel)
                                                           throws java.io.IOException
        Get a list of rules that require a LanguageModel. Returns an empty list for languages that don't have such rules.
        Throws:
        java.io.IOException
        Since:
        2.7
      • getRelevantLanguageModelCapableRules

        public java.util.List<Rule> getRelevantLanguageModelCapableRules​(java.util.ResourceBundle messages,
                                                                         @Nullable
                                                                         @Nullable LanguageModel languageModel,
                                                                         UserConfig userConfig,
                                                                         Language motherTongue,
                                                                         java.util.List<Language> altLanguages)
                                                                  throws java.io.IOException
        Get a list of rules that can optionally use a LanguageModel. Returns an empty list for languages that don't have such rules.
        Parameters:
        languageModel - null if no language model is available
        Throws:
        java.io.IOException
        Since:
        4.5
      • getWord2VecModel

        @Nullable
        public @Nullable Word2VecModel getWord2VecModel​(java.io.File indexDir)
                                                 throws java.io.IOException
        Parameters:
        indexDir - directory with subdirectories like 'en', each containing dictionary.txt and final_embeddings.txt
        Returns:
        a Word2VecModel or null if this language doesn't support one
        Throws:
        java.io.IOException
        Since:
        4.0
      • getRelevantWord2VecModelRules

        public java.util.List<Rule> getRelevantWord2VecModelRules​(java.util.ResourceBundle messages,
                                                                  Word2VecModel word2vecModel)
                                                           throws java.io.IOException
        Get a list of rules that require a Word2VecModel. Returns an empty list for languages that don't have such rules.
        Throws:
        java.io.IOException
        Since:
        4.0
      • getRelevantNeuralNetworkModels

        public java.util.List<Rule> getRelevantNeuralNetworkModels​(java.util.ResourceBundle messages,
                                                                   java.io.File modelDir)
        Get a list of rules that load trained neural networks. Returns an empty list for languages that don't have such rules.
        Since:
        4.4
      • getRelevantRulesGlobalConfig

        public java.util.List<Rule> getRelevantRulesGlobalConfig​(java.util.ResourceBundle messages,
                                                                 GlobalConfig globalConfig,
                                                                 UserConfig userConfig,
                                                                 Language motherTongue,
                                                                 java.util.List<Language> altLanguages)
                                                          throws java.io.IOException
        Get the rules classes that should run for texts in this language.
        Throws:
        java.io.IOException
        Since:
        4.6
      • getLocale

        public java.util.Locale getLocale()
        Get this language's Java locale, not considering the country code.
      • getLocaleWithCountryAndVariant

        public java.util.Locale getLocaleWithCountryAndVariant()
        Get this language's Java locale, considering language code and country code (if any).
        Since:
        2.1
      • getRuleFileNames

        public java.util.List<java.lang.String> getRuleFileNames()
        Get the location of the rule file(s) in a form like /org/languagetool/rules/de/grammar.xml, i.e. a path in the classpath. The files must exist or an exception will be thrown, unless the filename contains the string -test-.
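        The "must exist unless the name contains -test-" rule can be sketched as follows. This is an illustration under assumptions, not LanguageTool's loading code: RuleFileCheckSketch and ruleFileExists are hypothetical, the paths under /org/example are made up, and the exception type thrown for a missing file is a guess, since the description above does not name it.

```java
// Hypothetical helper, not part of LanguageTool.
class RuleFileCheckSketch {

  // Returns true if the classpath resource exists, and false if it is missing
  // but tolerated because its path contains "-test-".
  // Throws IllegalStateException otherwise (the exception type is an assumption).
  static boolean ruleFileExists(String classpathPath) {
    if (RuleFileCheckSketch.class.getResource(classpathPath) != null) {
      return true;
    }
    if (classpathPath.contains("-test-")) {
      return false;
    }
    throw new IllegalStateException("Rule file not found in classpath: " + classpathPath);
  }

  public static void main(String[] args) {
    // Made-up paths that are not expected to exist on the classpath.
    System.out.println(ruleFileExists("/org/example/rules/xx/grammar-test-1.xml"));
    try {
      ruleFileExists("/org/example/rules/xx/grammar.xml");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```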
      • getDefaultLanguageVariant

        @Nullable
        public @Nullable Language getDefaultLanguageVariant()
        Languages that have country variants need to override this to select their most common variant.
        Returns:
        default country variant or null
        Since:
        1.8
      • getDisambiguator

        public Disambiguator getDisambiguator()
        Get this language's part-of-speech disambiguator implementation.
      • getTagger

        public Tagger getTagger()
        Get this language's part-of-speech tagger implementation. The tagger must not be null, but it can be a trivial pseudo-tagger that only assigns null tags.
      • getSentenceTokenizer

        public SentenceTokenizer getSentenceTokenizer()
        Get this language's sentence tokenizer implementation.
      • getWordTokenizer

        public Tokenizer getWordTokenizer()
        Get this language's word tokenizer implementation.
      • getChunker

        @Nullable
        public @Nullable Chunker getChunker()
        Get this language's chunker implementation or null.
        Since:
        2.3
      • getPostDisambiguationChunker

        @Nullable
        public @Nullable Chunker getPostDisambiguationChunker()
        Get this language's chunker implementation that runs after disambiguation, or null.
        Since:
        2.9
      • getSynthesizer

        @Nullable
        public @Nullable Synthesizer getSynthesizer()
        Get this language's part-of-speech synthesizer implementation or null.
      • getUnifier

        public Unifier getUnifier()
        Get this language's feature unifier.
        Returns:
        Feature unifier for analyzed tokens.
      • getDisambiguationUnifier

        public Unifier getDisambiguationUnifier()
        Get this language's feature unifier used for disambiguation. Note: it might be different from the normal rule unifier.
        Returns:
        Feature unifier for analyzed tokens.
      • getDisambiguationUnifierConfiguration

        public UnifierConfiguration getDisambiguationUnifierConfiguration()
        Since:
        2.3
      • getTranslatedName

        public final java.lang.String getTranslatedName​(java.util.ResourceBundle messages)
        Get the name of the language translated to the current locale, if available. Otherwise, get the untranslated name.
      • getShortCodeWithCountryAndVariant

        public final java.lang.String getShortCodeWithCountryAndVariant()
        Get the short name of the language with country and variant (if any), if it is a single-country language. For generic language classes, get only a two- or three-character code.
        Since:
        3.6
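        The composition described for getShortCodeWithCountryAndVariant can be sketched with plain strings. This follows only the description above (country and variant are appended for single-country languages, otherwise the bare code is returned) and is not the library's implementation; ShortCodeSketch and its parameters are hypothetical.

```java
// Hypothetical helper, not part of LanguageTool.
class ShortCodeSketch {

  // Builds a code such as "en-US" or "ca-ES-valencia" for single-country
  // languages; returns only the short code for generic languages.
  static String shortCodeWithCountryAndVariant(String shortCode, String[] countries, String variant) {
    if (countries.length != 1) {
      return shortCode;
    }
    StringBuilder code = new StringBuilder(shortCode).append('-').append(countries[0]);
    if (variant != null) {
      code.append('-').append(variant);
    }
    return code.toString();
  }

  public static void main(String[] args) {
    System.out.println(shortCodeWithCountryAndVariant("en", new String[] {"US"}, null));
    System.out.println(shortCodeWithCountryAndVariant("ca", new String[] {"ES"}, "valencia"));
    System.out.println(shortCodeWithCountryAndVariant("en", new String[] {"US", "GB"}, null));
  }
}
```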
      • getPatternRules

        protected java.util.List<AbstractPatternRule> getPatternRules()
                                                               throws java.io.IOException
        Get the pattern rules as defined in the files returned by getRuleFileNames().
        Throws:
        java.io.IOException
        Since:
        2.7
      • toString

        public final java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • isVariant

        public final boolean isVariant()
        Whether this is a country variant of another language, i.e. whether it extends a subclass of Language rather than extending Language directly.
        Since:
        1.8
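        The criterion stated for isVariant can be expressed with reflection. The sketch below checks only that stated criterion on a hypothetical class hierarchy; it is not claimed to match how LanguageTool implements the method. BaseLanguage, EnglishSketch and AmericanEnglishSketch are made-up stand-ins.

```java
// Hypothetical class hierarchy, not part of LanguageTool.
class VariantCheckSketch {

  // Stand-in for the abstract base class.
  static abstract class BaseLanguage {
    // True if this class extends a subclass of BaseLanguage
    // rather than BaseLanguage itself.
    final boolean isVariant() {
      return getClass().getSuperclass() != BaseLanguage.class;
    }
  }

  // Directly extends the base class: not a variant.
  static class EnglishSketch extends BaseLanguage {}

  // Extends a subclass of the base class: a variant.
  static class AmericanEnglishSketch extends EnglishSketch {}

  public static void main(String[] args) {
    System.out.println(new EnglishSketch().isVariant());
    System.out.println(new AmericanEnglishSketch().isVariant());
  }
}
```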
      • hasVariant

        public final boolean hasVariant()
        Whether this class has at least one subclass that implements variants of this language.
        Since:
        1.8
      • isExternal

        public boolean isExternal()
        For internal use only. Overridden to return true for languages that have been loaded from an external file after startup.
      • equalsConsiderVariantsIfSpecified

        public boolean equalsConsiderVariantsIfSpecified​(Language otherLanguage)
        Return true if this is the same language as the given one, considering country variants only if set for both languages. For example: en = en, en = en-GB, en-GB = en-GB, but en-US != en-GB.
        Since:
        1.8
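        The comparison rule of equalsConsiderVariantsIfSpecified can be reproduced on plain language tags. This is a sketch of the documented behaviour only: VariantEqualitySketch and sameLanguage are hypothetical, and tags are assumed to have the form "en" or "en-GB".

```java
// Hypothetical helper, not part of LanguageTool.
class VariantEqualitySketch {

  // Compares two tags such as "en" or "en-GB". The country part is only
  // taken into account when both tags specify one.
  static boolean sameLanguage(String tag1, String tag2) {
    String[] parts1 = tag1.split("-", 2);
    String[] parts2 = tag2.split("-", 2);
    if (!parts1[0].equals(parts2[0])) {
      return false;
    }
    boolean bothHaveCountry = parts1.length > 1 && parts2.length > 1;
    return !bothHaveCountry || parts1[1].equals(parts2[1]);
  }

  public static void main(String[] args) {
    System.out.println(sameLanguage("en", "en"));
    System.out.println(sameLanguage("en", "en-GB"));
    System.out.println(sameLanguage("en-GB", "en-GB"));
    System.out.println(sameLanguage("en-US", "en-GB"));
  }
}
```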
      • hasCountry

        private boolean hasCountry()
      • getIgnoredCharactersRegex

        public java.util.regex.Pattern getIgnoredCharactersRegex()
        Returns:
        compiled regular expression matching the characters to ignore inside tokens
        Since:
        2.9
      • getMaintainedState

        public LanguageMaintainedState getMaintainedState()
        Information about whether the support for this language in LanguageTool is actively maintained. If not, the user interface might show a warning.
        Since:
        3.3
      • isHiddenFromGui

        public boolean isHiddenFromGui()
      • isTheDefaultVariant

        private boolean isTheDefaultVariant()
      • getPriorityForId

        public int getPriorityForId​(java.lang.String id)
        Returns a priority for Rule or Category Id (default: 0). Positive integers have higher priority. Negative integers have lower priority.
        Since:
        3.6
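        The priority scheme (default 0, positive values rank higher, negative values rank lower) can be sketched with a map lookup and a sort. How LanguageTool itself consumes these priorities is not described here; PrioritySketch, the map of priorities and the ids "A", "B", "C" are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of LanguageTool.
class PrioritySketch {

  // Returns the configured priority for an id, or 0 if none is configured.
  static int priorityForId(Map<String, Integer> priorities, String id) {
    return priorities.getOrDefault(id, 0);
  }

  // Returns a copy of the ids ordered from highest to lowest priority.
  // The sort is stable, so ids with equal priority keep their input order.
  static List<String> sortByPriority(List<String> ids, Map<String, Integer> priorities) {
    List<String> sorted = new ArrayList<>(ids);
    sorted.sort(Comparator.comparingInt((String id) -> priorityForId(priorities, id)).reversed());
    return sorted;
  }

  public static void main(String[] args) {
    // "A" has no entry and therefore gets the default priority 0.
    Map<String, Integer> priorities = Map.of("B", 10, "C", -5);
    System.out.println(sortByPriority(List.of("A", "B", "C"), priorities));
  }
}
```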
      • isSpellcheckOnlyLanguage

        public boolean isSpellcheckOnlyLanguage()
        Whether this language supports spell checking only and no advanced grammar and style checking.
        Since:
        4.5
      • equals

        public boolean equals​(java.lang.Object o)
        Considers languages as equal if their language codes, including the country and variant codes, are equal.
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object