Class JLanguageTool

  • Direct Known Subclasses:
    MultiThreadedJLanguageTool

    public class JLanguageTool
    extends java.lang.Object
    The main class used for checking text against different rules:
    • built-in Java rules (for English: a vs. an, whitespace after commas, ...)
    • built-in pattern rules loaded from external XML files (usually called grammar.xml)
    • your own implementation of the abstract Rule classes added with addRule(Rule)

    You will probably want to use the subclass MultiThreadedJLanguageTool for best performance.

    Thread-safety: this class is not thread safe. Create one instance per thread, but create the language only once (e.g. new AmericanEnglish()) and use it for all instances of JLanguageTool.
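The thread-safety rule above (one checker per thread, one shared language object) can be sketched with a ThreadLocal. This is a JDK-only model: SharedLanguage and PerThreadChecker are hypothetical placeholders for a Language such as AmericanEnglish and for JLanguageTool. They are not LanguageTool classes.

```java
// Hypothetical stand-in for a Language object such as new AmericanEnglish():
// created once and shared by every thread.
final class SharedLanguage {
  final String code;
  SharedLanguage(String code) { this.code = code; }
}

// Hypothetical stand-in for a non-thread-safe checker such as JLanguageTool.
final class PerThreadChecker {
  final SharedLanguage language;
  PerThreadChecker(SharedLanguage language) { this.language = language; }
}

final class CheckerPool {
  // Create the language only once...
  static final SharedLanguage LANGUAGE = new SharedLanguage("en-US");

  // ...but give every thread its own checker, created lazily on first use.
  static final ThreadLocal<PerThreadChecker> CHECKER =
      ThreadLocal.withInitial(() -> new PerThreadChecker(LANGUAGE));

  // Starts `threads` worker threads and returns the checker each one obtained.
  static PerThreadChecker[] collectCheckers(int threads) {
    PerThreadChecker[] seen = new PerThreadChecker[threads];
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      final int slot = i;
      workers[i] = new Thread(() -> seen[slot] = CHECKER.get());
      workers[i].start();
    }
    try {
      for (Thread worker : workers) {
        worker.join(); // join() also makes the workers' writes to `seen` visible here
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return seen;
  }
}
```

In real code, the ThreadLocal initializer would construct a JLanguageTool from the shared language. As the class description notes, MultiThreadedJLanguageTool is the suggested choice for best performance.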

    See Also:
    MultiThreadedJLanguageTool
    • Field Detail

      • VERSION

        public static final java.lang.String VERSION
        LanguageTool version as a string like 2.3 or 2.4-SNAPSHOT.
        See Also:
        Constant Field Values
      • BUILD_DATE

        @Nullable
        public static final @Nullable java.lang.String BUILD_DATE
        LanguageTool build date and time like 2013-10-17 16:10, or null if not run from a JAR.
      • GIT_SHORT_ID

        @Nullable
        public static final @Nullable java.lang.String GIT_SHORT_ID
        Abbreviated git id or null if not available.
        Since:
        4.5
      • PATTERN_FILE

        public static final java.lang.String PATTERN_FILE
        The name of the file with error patterns.
        See Also:
        Constant Field Values
      • FALSE_FRIEND_FILE

        public static final java.lang.String FALSE_FRIEND_FILE
        The name of the file with false friend information.
        See Also:
        Constant Field Values
      • SENTENCE_START_TAGNAME

        public static final java.lang.String SENTENCE_START_TAGNAME
        The internal tag used to mark the beginning of a sentence.
        See Also:
        Constant Field Values
      • SENTENCE_END_TAGNAME

        public static final java.lang.String SENTENCE_END_TAGNAME
        The internal tag used to mark the end of a sentence.
        See Also:
        Constant Field Values
      • PARAGRAPH_END_TAGNAME

        public static final java.lang.String PARAGRAPH_END_TAGNAME
        The internal tag used to mark the end of a paragraph.
        See Also:
        Constant Field Values
      • MESSAGE_BUNDLE

        public static final java.lang.String MESSAGE_BUNDLE
        Name of the message bundle for translations.
        See Also:
        Constant Field Values
      • DICTIONARY_FILENAME_EXTENSION

        public static final java.lang.String DICTIONARY_FILENAME_EXTENSION
        Extension of dictionary files read by Spellers.
        See Also:
        Constant Field Values
      • maxErrorsPerWordRate

        private float maxErrorsPerWordRate
      • builtinRules

        private final java.util.List<Rule> builtinRules
      • userRules

        private final java.util.List<Rule> userRules
      • optionalLanguageModelRules

        private final java.util.Set<java.lang.String> optionalLanguageModelRules
      • disabledRules

        private final java.util.Set<java.lang.String> disabledRules
      • disabledRuleCategories

        private final java.util.Set<CategoryId> disabledRuleCategories
      • enabledRules

        private final java.util.Set<java.lang.String> enabledRules
      • enabledRuleCategories

        private final java.util.Set<CategoryId> enabledRuleCategories
      • language

        private final Language language
      • altLanguages

        private final java.util.List<Language> altLanguages
      • motherTongue

        private final Language motherTongue
      • printStream

        private java.io.PrintStream printStream
      • listUnknownWords

        private boolean listUnknownWords
      • unknownWords

        private java.util.Set<java.lang.String> unknownWords
      • cleanOverlappingMatches

        private boolean cleanOverlappingMatches
      • temporaryFiles

        private static final java.util.List<java.io.File> temporaryFiles
    • Constructor Detail

      • JLanguageTool

        public JLanguageTool(Language lang,
                             Language motherTongue)
        Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
        Parameters:
        lang - the language of the text to be checked
        motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
      • JLanguageTool

        public JLanguageTool(Language language)
        Create a JLanguageTool and set up the built-in Java rules for the given language.
        Parameters:
        language - the language of the text to be checked
      • JLanguageTool

        public JLanguageTool(Language language,
                             Language motherTongue,
                             ResultCache cache)
        Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
        Parameters:
        language - the language of the text to be checked
        motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
        cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes
        Since:
        3.7
      • JLanguageTool

        @Experimental
        public JLanguageTool(Language language,
                             ResultCache cache,
                             UserConfig userConfig)
        Create a JLanguageTool and set up the built-in rules for the given language.
        Parameters:
        language - the language of the text to be checked
        cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes. Use null to deactivate the cache.
        Since:
        4.2
      • JLanguageTool

        @Experimental
        public JLanguageTool(Language language,
                             java.util.List<Language> altLanguages,
                             Language motherTongue,
                             ResultCache cache,
                             GlobalConfig globalConfig,
                             UserConfig userConfig)
        Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
        Parameters:
        language - the language of the text to be checked
        altLanguages - the languages that are accepted as alternative languages. Currently this means that a word is accepted if it belongs to an alternative language and is not similar to a word from language. If there is a similar word in language, an error of type RuleMatch.Type.Hint is reported (EXPERIMENTAL).
        motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
        cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes
        Since:
        4.3
      • JLanguageTool

        @Experimental
        public JLanguageTool(Language language,
                             Language motherTongue,
                             ResultCache cache,
                             UserConfig userConfig)
        Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
        Parameters:
        language - the language of the text to be checked
        motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
        cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes
        Since:
        4.2
    • Method Detail

      • getBuildDate

        @Nullable
        private static @Nullable java.lang.String getBuildDate()
        Returns the build date or null if not run from JAR.
      • getShortGitId

        @Nullable
        private static @Nullable java.lang.String getShortGitId()
        Returns the abbreviated git id or null.
      • isPremiumVersion

        public static boolean isPremiumVersion()
        Since:
        4.2
      • getDataBroker

        public static ResourceDataBroker getDataBroker()
        The grammar checker needs resources from the following directories:
        • /resource
        • /rules
        Returns:
        The currently set data broker, which allows obtaining resources from the directories mentioned above. If no data broker has been set, a new DefaultResourceDataBroker will be instantiated and returned.
        Since:
        1.0.1
      • setDataBroker

        public static void setDataBroker(ResourceDataBroker broker)
        The grammar checker needs resources from the following directories:
        • /resource
        • /rules
        Parameters:
        broker - The new resource broker to be used.
        Since:
        1.0.1
      • setListUnknownWords

        public void setListUnknownWords(boolean listUnknownWords)
        Whether the check(String) methods store unknown words. If set to true (default: false), you can get the list of unknown words using getUnknownWords().
      • setCleanOverlappingMatches

        public void setCleanOverlappingMatches(boolean cleanOverlappingMatches)
        Whether the check(String) methods remove overlapping errors. If set to true (the default), overlapping errors are removed according to the priorities established for the language.
        Since:
        3.6
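To illustrate what removing overlapping errors by priority means, here is a simplified model, not LanguageTool's actual filtering code. SpanMatch is a hypothetical (ruleId, from, to, priority) holder, and when two spans overlap, only the higher-priority one survives. How LanguageTool derives priorities for a language is not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simplified match: a character span [from, to) with a priority.
final class SpanMatch {
  final String ruleId;
  final int from;
  final int to;
  final int priority;

  SpanMatch(String ruleId, int from, int to, int priority) {
    this.ruleId = ruleId;
    this.from = from;
    this.to = to;
    this.priority = priority;
  }

  boolean overlaps(SpanMatch other) {
    return from < other.to && other.from < to;
  }
}

final class OverlapCleaner {
  // Keeps a match only if it does not overlap an already kept, higher-priority match.
  static List<SpanMatch> clean(List<SpanMatch> matches) {
    List<SpanMatch> byPriority = new ArrayList<>(matches);
    // Highest priority first; ties are broken by the earlier start position.
    byPriority.sort(Comparator.comparingInt((SpanMatch m) -> m.priority).reversed()
        .thenComparingInt(m -> m.from));
    List<SpanMatch> kept = new ArrayList<>();
    for (SpanMatch candidate : byPriority) {
      boolean clashes = false;
      for (SpanMatch k : kept) {
        if (k.overlaps(candidate)) {
          clashes = true;
          break;
        }
      }
      if (!clashes) {
        kept.add(candidate);
      }
    }
    // Report the surviving matches in text order.
    kept.sort(Comparator.comparingInt((SpanMatch m) -> m.from));
    return kept;
  }
}
```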
      • setMaxErrorsPerWordRate

        @Experimental
        public void setMaxErrorsPerWordRate(float maxErrorsPerWordRate)
        Maximum rate of errors per word; checking stops with an exception if the rate is higher. For example, with a rate of 0.33, checking would stop if the user's text has so many errors that more than one in every three words causes a rule match. Note that this may not apply to very short texts.
        Since:
        4.0
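As a worked example of the rate: with 0.33, a 10-word text with 4 rule matches has a rate of 4 / 10 = 0.4, which is above the limit, while 3 matches give 0.3, which is not. The helper below only restates that arithmetic. Its name and its treatment of 0 as "no limit" are assumptions, and the exception for very short texts is not modeled.

```java
final class ErrorRateCheck {
  // Hypothetical helper: true if the share of words with a rule match exceeds maxRate.
  // Assumption of this sketch: a maxRate of 0 or less means "no limit".
  static boolean exceeds(int ruleMatches, int words, float maxRate) {
    if (maxRate <= 0 || words == 0) {
      return false;
    }
    return (float) ruleMatches / words > maxRate;
  }
}
```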
      • getMessageBundle

        public static java.util.ResourceBundle getMessageBundle()
        Gets the ResourceBundle (i18n strings) for the default language of the user's system.
      • getMessageBundle

        public static java.util.ResourceBundle getMessageBundle(Language lang)
        Gets the ResourceBundle (i18n strings) for the given user interface language.
        Since:
        2.4 (public since 2.4)
      • getAllBuiltinRules

        private java.util.List<Rule> getAllBuiltinRules(Language language,
                                                        java.util.ResourceBundle messages,
                                                        UserConfig userConfig,
                                                        GlobalConfig globalConfig)
      • setOutput

        public void setOutput(java.io.PrintStream printStream)
        Set a PrintStream that will receive verbose output. Set to null (which is the default) to disable verbose output.
      • loadPatternRules

        public java.util.List<AbstractPatternRule> loadPatternRules(java.lang.String filename)
                                                             throws java.io.IOException
        Load pattern rules from an XML file. Use addRule(Rule) to add these rules to the checking process.
        Parameters:
        filename - path to an XML file in the classpath or in the filesystem - the classpath is checked first
        Returns:
        a List of PatternRule objects
        Throws:
        java.io.IOException
      • loadFalseFriendRules

        public java.util.List<AbstractPatternRule> loadFalseFriendRules(java.lang.String filename)
                                                                 throws javax.xml.parsers.ParserConfigurationException,
                                                                        org.xml.sax.SAXException,
                                                                        java.io.IOException
        Load false friend rules from an XML file. Only those pairs will be loaded that match the current text language and the mother tongue specified in the JLanguageTool constructor. Use addRule(Rule) to add these rules to the checking process.
        Parameters:
        filename - path to an XML file in the classpath or in the filesystem - the classpath is checked first
        Returns:
        a List of PatternRule objects, or an empty list if mother tongue is not set
        Throws:
        javax.xml.parsers.ParserConfigurationException
        org.xml.sax.SAXException
        java.io.IOException
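The selection logic described for loadFalseFriendRules (keep only pairs matching the text language and the mother tongue; return nothing without a mother tongue) can be modeled as a simple filter. FalseFriendEntry and FalseFriendSelector are hypothetical; the real method parses an XML file and returns AbstractPatternRule objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified false-friend entry: a word in the text language that
// speakers of the given mother tongue may confuse with a similar-looking word.
final class FalseFriendEntry {
  final String textLanguage;
  final String motherTongue;
  final String word;

  FalseFriendEntry(String textLanguage, String motherTongue, String word) {
    this.textLanguage = textLanguage;
    this.motherTongue = motherTongue;
    this.word = word;
  }
}

final class FalseFriendSelector {
  // Keeps only entries for the given text language / mother tongue pair.
  // Without a mother tongue (null), nothing is selected.
  static List<FalseFriendEntry> select(List<FalseFriendEntry> all,
                                       String textLanguage, String motherTongue) {
    List<FalseFriendEntry> result = new ArrayList<>();
    if (motherTongue == null) {
      return result;
    }
    for (FalseFriendEntry e : all) {
      if (e.textLanguage.equals(textLanguage) && e.motherTongue.equals(motherTongue)) {
        result.add(e);
      }
    }
    return result;
  }
}
```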
      • updateOptionalLanguageModelRules

        private void updateOptionalLanguageModelRules(@Nullable
                                                      @Nullable LanguageModel lm)
        Removes rules that can profit from a language model, recreates them with the given model, and adds them again.
        Parameters:
        lm - the language model or null if none is available
      • activateNeuralNetworkRules

        public void activateNeuralNetworkRules(java.io.File modelDir)
                                        throws java.io.IOException
        Activate rules that depend on pretrained neural network models.
        Parameters:
        modelDir - root dir of exported models
        Throws:
        java.io.IOException
        Since:
        4.4
      • activateLanguageModelRules

        public void activateLanguageModelRules(java.io.File indexDir)
                                        throws java.io.IOException
        Activate rules that depend on a language model. The language model currently consists of Lucene indexes with ngram occurrence counts.
        Parameters:
        indexDir - directory with a '3grams' subdirectory, which contains a Lucene index with 3gram occurrence counts
        Throws:
        java.io.IOException
        Since:
        2.7
      • activateWord2VecModelRules

        public void activateWord2VecModelRules(java.io.File indexDir)
                                        throws java.io.IOException
        Activate rules that depend on a word2vec language model.
        Parameters:
        indexDir - directory with subdirectories like 'en', each containing dictionary.txt and final_embeddings.txt
        Throws:
        java.io.IOException
        Since:
        4.0
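Before calling activateWord2VecModelRules, a quick check that indexDir has the layout described above may help catch a misconfigured path early. Word2VecLayout is a hypothetical JDK-only helper that tests for dictionary.txt and final_embeddings.txt in a language subdirectory such as 'en'; it does not validate the file contents.

```java
import java.nio.file.Files;
import java.nio.file.Path;

final class Word2VecLayout {
  // Hypothetical pre-flight check: true if indexDir/<langCode>/ contains the
  // two files named in the documentation.
  static boolean hasModelFor(Path indexDir, String langCode) {
    Path langDir = indexDir.resolve(langCode);
    return Files.isRegularFile(langDir.resolve("dictionary.txt"))
        && Files.isRegularFile(langDir.resolve("final_embeddings.txt"));
  }
}
```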
      • activateDefaultPatternRules

        private void activateDefaultPatternRules()
                                          throws java.io.IOException
        Loads and activates the pattern rules from org/languagetool/rules/<languageCode>/grammar.xml.
        Throws:
        java.io.IOException
      • activateDefaultFalseFriendRules

        private void activateDefaultFalseFriendRules()
                                              throws javax.xml.parsers.ParserConfigurationException,
                                                     org.xml.sax.SAXException,
                                                     java.io.IOException
        Loads and activates the false friend rules from rules/false-friends.xml.
        Throws:
        javax.xml.parsers.ParserConfigurationException
        org.xml.sax.SAXException
        java.io.IOException
      • addMatchFilter

        public void addMatchFilter(@NotNull
                                   @NotNull RuleMatchFilter filter)
        Add a RuleMatchFilter for post-processing of rule matches. Filters are called sequentially, in the same order in which they were added.
        Parameters:
        filter - filter to add
        Since:
        4.7
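The ordering guarantee for addMatchFilter (filters run one after another, in the order they were added) can be modeled with a list of functions. MatchFilterChain is a hypothetical stand-in that works on rule-id strings instead of RuleMatch objects; it is not the RuleMatchFilter interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class MatchFilterChain {
  private final List<UnaryOperator<List<String>>> filters = new ArrayList<>();

  // Filters are appended, so insertion order is preserved.
  void add(UnaryOperator<List<String>> filter) {
    filters.add(filter);
  }

  // Each filter receives the output of the previous one.
  List<String> apply(List<String> matches) {
    List<String> current = matches;
    for (UnaryOperator<List<String>> filter : filters) {
      current = filter.apply(current);
    }
    return current;
  }
}
```

Order matters: dropping "STYLE" and then keeping only the first match yields a different result than doing the same two steps the other way round.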
      • addRule

        public void addRule(Rule rule)
        Add a rule to be used by the next call to the check methods like check(String).
      • disableRule

        public void disableRule(java.lang.String ruleId)
        Disable a given rule so the check methods like check(String) won't use it.
        Parameters:
        ruleId - the id of the rule to disable - no error will be thrown if the id does not exist
        See Also:
        enableRule(String)
      • disableRules

        public void disableRules(java.util.List<java.lang.String> ruleIds)
        Disable the given rules so the check methods like check(String) won't use them.
        Parameters:
        ruleIds - the ids of the rules to disable - no error will be thrown if an id does not exist
        Since:
        2.4
      • disableCategory

        public void disableCategory(CategoryId id)
        Disable the given rule category so the check methods like check(String) won't use it.
        Parameters:
        id - the id of the category to disable - no error will be thrown if the id does not exist
        Since:
        3.3
        See Also:
        enableRuleCategory(CategoryId)
      • isCategoryDisabled

        public boolean isCategoryDisabled(CategoryId id)
        Returns true if a category is explicitly disabled.
        Parameters:
        id - the id of the category to check - no error will be thrown if the id does not exist
        Returns:
        true if this category is explicitly disabled.
        Since:
        3.5
        See Also:
        disableCategory(org.languagetool.rules.CategoryId)
      • getLanguage

        public Language getLanguage()
        Get the language that was used to configure this instance.
      • getDisabledRules

        public java.util.Set<java.lang.String> getDisabledRules()
        Get rule ids of the rules that have been explicitly disabled.
      • enableRule

        public void enableRule(java.lang.String ruleId)
        Enable a given rule so the check methods like check(String) will use it. This will not throw an exception if the given rule id doesn't exist.
        Parameters:
        ruleId - the id of the rule to enable
        See Also:
        disableRule(String)
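The behavior of disableRule and enableRule for individual ids, including the silent handling of unknown ids, can be modeled with a set similar to the private disabledRules field listed above. RuleSwitches is hypothetical and deliberately ignores rule categories and any other factors the real class may consider when deciding whether a rule runs.

```java
import java.util.HashSet;
import java.util.Set;

final class RuleSwitches {
  // Ids of explicitly disabled rules (compare the private disabledRules field).
  private final Set<String> disabled = new HashSet<>();

  // Like disableRule: unknown ids are simply recorded; no error is thrown.
  void disable(String ruleId) {
    disabled.add(ruleId);
  }

  // Like enableRule: enabling an id that was never disabled is a silent no-op.
  void enable(String ruleId) {
    disabled.remove(ruleId);
  }

  boolean isActive(String ruleId) {
    return !disabled.contains(ruleId);
  }

  // This sketch returns a copy so callers cannot modify the internal set.
  Set<String> getDisabled() {
    return new HashSet<>(disabled);
  }
}
```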
      • sentenceTokenize

        public java.util.List<java.lang.String> sentenceTokenize(java.lang.String text)
        Tokenizes the given text into sentences.
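For a feel of sentenceTokenize's input and output (one string in, a list of sentence strings out), here is a deliberately naive splitter. It is not LanguageTool's tokenizer, which is language-specific; NaiveSentenceSplitter is hypothetical, and keeping the trailing whitespace with each sentence is a choice of this sketch.

```java
import java.util.Arrays;
import java.util.List;

final class NaiveSentenceSplitter {
  // Splits after '.', '!' or '?' followed by one whitespace character.
  // The whitespace stays with the preceding sentence, so joining the pieces
  // reproduces the input. Abbreviations such as "Mr. Smith" are split wrongly.
  static List<String> split(String text) {
    return Arrays.asList(text.split("(?<=[.!?]\\s)"));
  }
}
```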
      • check

        public java.util.List<RuleMatch> check(java.lang.String text)
                                        throws java.io.IOException
        The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
        Parameters:
        text - the text to be checked
        Returns:
        a List of RuleMatch objects
        Throws:
        java.io.IOException
      • check

        public java.util.List<RuleMatch> check(java.lang.String text,
                                               RuleMatchListener listener)
                                        throws java.io.IOException
        The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
        Parameters:
        text - the text to be checked
        Returns:
        a List of RuleMatch objects
        Throws:
        java.io.IOException
        Since:
        3.7
      • check

        public java.util.List<RuleMatch> check(AnnotatedText text)
                                        throws java.io.IOException
        The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules, adjusting error positions so they refer to the original text including markup.
        Throws:
        java.io.IOException
        Since:
        2.3
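The position adjustment for markup can be pictured with a simplified model: the input is a sequence of text and markup parts, rules see only the text parts, and an offset in the plain text is mapped back to an offset in the original string. AnnotatedSketch and its Part class are hypothetical; the AnnotatedText objects created with AnnotatedTextBuilder carry more information than this.

```java
import java.util.List;

final class AnnotatedSketch {
  // Hypothetical part of an annotated text: either plain text or markup.
  static final class Part {
    final String content;
    final boolean isMarkup;

    Part(String content, boolean isMarkup) {
      this.content = content;
      this.isMarkup = isMarkup;
    }
  }

  // The text the rules would see: markup parts are dropped.
  static String plainText(List<Part> parts) {
    StringBuilder sb = new StringBuilder();
    for (Part p : parts) {
      if (!p.isMarkup) {
        sb.append(p.content);
      }
    }
    return sb.toString();
  }

  // Maps an offset in the plain text to the corresponding offset in the original.
  static int toOriginalOffset(List<Part> parts, int plainOffset) {
    int plainPos = 0;
    int originalPos = 0;
    for (Part p : parts) {
      int len = p.content.length();
      if (!p.isMarkup) {
        if (plainOffset < plainPos + len) {
          return originalPos + (plainOffset - plainPos);
        }
        plainPos += len;
      }
      originalPos += len;
    }
    return originalPos; // offset at (or past) the end of the text
  }
}
```

For "This is <b>a</b> test.", the rules see "This is a test."; plain offset 8 ("a") maps to original offset 11, and plain offset 10 ("test") maps to 17.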
      • check

        public java.util.List<RuleMatch> check(AnnotatedText annotatedText,
                                               boolean tokenizeText,
                                               JLanguageTool.ParagraphHandling paraMode)
                                        throws java.io.IOException
        The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
        Parameters:
        annotatedText - The text to be checked, created with AnnotatedTextBuilder. Call this method with the complete text to be checked. If you call it repeatedly with smaller chunks like paragraphs or sentences, rules that work across paragraphs/sentences won't work (their status gets reset whenever this method is called).
        tokenizeText - If true, the text is tokenized into sentences. Otherwise, it is assumed to be already tokenized, i.e. to be a single sentence.
        paraMode - Controls whether paragraph-level rules, sentence-level rules, or both are applied; see JLanguageTool.ParagraphHandling.
        Returns:
        a List of RuleMatch objects, describing potential errors in the text
        Throws:
        java.io.IOException
        Since:
        2.3
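The warning about calling check with small chunks can be illustrated with a hypothetical stateful rule that flags a sentence starting with the same word as the previous one. Its memory is reset on every call, as described above, so checking sentence by sentence never sees the repetition. RepeatedStartRule is invented for this sketch and is not a LanguageTool rule.

```java
import java.util.ArrayList;
import java.util.List;

final class RepeatedStartRule {
  // Returns the indexes of sentences whose first word repeats the previous
  // sentence's first word. The state lives only for the duration of one call.
  static List<Integer> check(List<String> sentences) {
    List<Integer> hits = new ArrayList<>();
    String previousFirst = null; // reset whenever check() is called
    for (int i = 0; i < sentences.size(); i++) {
      String first = sentences.get(i).trim().split("\\s+")[0].toLowerCase();
      if (first.equals(previousFirst)) {
        hits.add(i);
      }
      previousFirst = first;
    }
    return hits;
  }
}
```

Checking ["Then we left.", "Then it rained.", "We got wet."] in one call reports sentence 1; checking each sentence in its own call reports nothing.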
      • check

        public java.util.List<RuleMatch> check(AnnotatedText annotatedText,
                                               boolean tokenizeText,
                                               JLanguageTool.ParagraphHandling paraMode,
                                               RuleMatchListener listener)
                                        throws java.io.IOException
        The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
        Throws:
        java.io.IOException
        Since:
        3.7
      • analyzeText

        public java.util.List<AnalyzedSentence> analyzeText(java.lang.String text)
                                                     throws java.io.IOException
        Use this method if you want to access LanguageTool's otherwise internal analysis of the text. For actual text checking, use the check... methods instead.
        Parameters:
        text - The text to be analyzed
        Throws:
        java.io.IOException
        Since:
        2.5
      • analyzeSentences

        protected java.util.List<AnalyzedSentence> analyzeSentences(java.util.List<java.lang.String> sentences)
                                                             throws java.io.IOException
        Throws:
        java.io.IOException
      • printSentenceInfo

        protected void printSentenceInfo(AnalyzedSentence analyzedSentence)
      • checkAnalyzedSentence

        public java.util.List<RuleMatch> checkAnalyzedSentence(JLanguageTool.ParagraphHandling paraMode,
                                                               java.util.List<Rule> rules,
                                                               AnalyzedSentence analyzedSentence)
                                                        throws java.io.IOException
        This is an internal method that's public only for technical reasons, please use one of the check(String) methods instead.
        Throws:
        java.io.IOException
        Since:
        2.3
      • ignoreRule

        private boolean ignoreRule(Rule rule)
      • adjustRuleMatchPos

        public RuleMatch adjustRuleMatchPos(RuleMatch match,
                                            int charCount,
                                            int columnCount,
                                            int lineCount,
                                            java.lang.String sentence,
                                            AnnotatedText annotatedText)
        Change RuleMatch positions so they are relative to the complete text, not just to the sentence.
        Parameters:
        charCount - Count of characters in the sentences before
        columnCount - Current column number
        lineCount - Current line number
        sentence - The sentence being checked
        Returns:
        The RuleMatch object with adjustments
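The core of adjustRuleMatchPos is offset arithmetic: a sentence-relative position becomes a text-relative one by adding charCount, the number of characters in the preceding sentences. The hypothetical helpers below show that step plus a 0-based line and column computation from '\n' characters. They ignore the markup mapping that the annotatedText parameter provides, and LanguageTool's own line and column conventions may differ.

```java
final class PositionSketch {
  // Sentence-relative offset -> text-relative offset.
  static int toTextOffset(int charCount, int offsetInSentence) {
    return charCount + offsetInSentence;
  }

  // 0-based line of a text offset: number of '\n' characters before it.
  static int lineOf(String text, int offset) {
    int line = 0;
    for (int i = 0; i < offset; i++) {
      if (text.charAt(i) == '\n') {
        line++;
      }
    }
    return line;
  }

  // 0-based column: distance from the character after the last '\n' before the offset.
  static int columnOf(String text, int offset) {
    int lastBreak = text.lastIndexOf('\n', offset - 1);
    return offset - (lastBreak + 1);
  }
}
```

In "First line.\nSecond one has an eror.", the second sentence starts after 12 characters, so a match at sentence offset 18 ("eror") lands at text offset 30, on line 1, column 18.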
      • rememberUnknownWords

        protected void rememberUnknownWords(AnalyzedSentence analyzedText)
      • getUnknownWords

        public java.util.List<java.lang.String> getUnknownWords()
        Get the alphabetically sorted list of unknown words in the latest run of one of the check(String) methods.
        Throws:
        java.lang.IllegalStateException - if setListUnknownWords(boolean) has been set to false
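The contract between setListUnknownWords(boolean) and getUnknownWords() can be summarized in a small model: words are only collected while listing is on, results cover the latest run, they come back sorted, and asking for them while listing is off throws IllegalStateException. UnknownWordTracker is hypothetical; in particular, its TreeSet sorts by String.compareTo, which is not locale-aware alphabetical order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class UnknownWordTracker {
  private boolean listUnknownWords = false; // default: false, as documented
  // TreeSet keeps the words unique and sorted by String.compareTo.
  private final Set<String> unknownWords = new TreeSet<>();

  void setListUnknownWords(boolean value) {
    listUnknownWords = value;
  }

  // Called at the start of each check run, so only the latest run is reported.
  void startRun() {
    unknownWords.clear();
  }

  void remember(String word) {
    if (listUnknownWords) {
      unknownWords.add(word);
    }
  }

  List<String> getUnknownWords() {
    if (!listUnknownWords) {
      throw new IllegalStateException("listing unknown words is disabled");
    }
    return new ArrayList<>(unknownWords);
  }
}
```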
      • countLineBreaks

        static int countLineBreaks(java.lang.String s)
      • getAnalyzedSentence

        public AnalyzedSentence getAnalyzedSentence(java.lang.String sentence)
                                             throws java.io.IOException
        Tokenizes the given sentence into words and analyzes it, and then disambiguates POS tags.
        Parameters:
        sentence - sentence to be analyzed
        Throws:
        java.io.IOException
      • getRawAnalyzedSentence

        public AnalyzedSentence getRawAnalyzedSentence(java.lang.String sentence)
                                                throws java.io.IOException
        Tokenizes the given sentence into words and analyzes it. This is the same as getAnalyzedSentence(String) but it does not run the disambiguator.
        Parameters:
        sentence - sentence to be analyzed
        Throws:
        java.io.IOException
        Since:
        0.9.8
      • replaceSoftHyphens

        private java.util.Map<java.lang.Integer,java.lang.String> replaceSoftHyphens(java.util.List<java.lang.String> tokens)
      • getCategories

        public java.util.Map<CategoryId,Category> getCategories()
        Get all rule categories for the current language.
        Returns:
        a map of Categories, keyed by their id.
        Since:
        3.5
      • getAllRules

        public java.util.List<Rule> getAllRules()
        Get all rules for the current language that are built-in or that have been added using addRule(Rule). Please note that XML rules that are grouped will appear as multiple rules with the same id. To tell them apart, check if they are of type AbstractPatternRule, cast them to that type and call their AbstractPatternRule.getSubId() method.
        Returns:
        a List of Rule objects
      • getAllActiveRules

        public java.util.List<Rule> getAllActiveRules()
        Get all active (not disabled) rules for the current language that are built-in or that have been added using e.g. addRule(Rule). See getAllRules() for hints about rule ids.
        Returns:
        a List of Rule objects
      • getAllActiveOfficeRules

        public java.util.List<Rule> getAllActiveOfficeRules()
        Works like getAllActiveRules(), but the default rule settings are overridden by the office defaults.
        Returns:
        a List of Rule objects
        Since:
        4.0
      • getPatternRulesByIdAndSubId

        public java.util.List<AbstractPatternRule> getPatternRulesByIdAndSubId(java.lang.String id,
                                                                               java.lang.String subId)
        Get pattern rules by Id and SubId. This returns a list because rules that use <or>...</or> are internally expanded into several rules.
        Returns:
        a List of AbstractPatternRule objects
        Since:
        2.3
      • printIfVerbose

        protected void printIfVerbose(java.lang.String s)
      • addTemporaryFile

        public static void addTemporaryFile(java.io.File file)
        Adds a temporary file to the internal list. This is an internal method; as a user of LanguageTool, you should never need to call it.
        Parameters:
        file - the file to be added.
      • removeTemporaryFiles

        public static void removeTemporaryFiles()
        Clean up all temporary files, if there are any.
      • applyCustomFilters

        protected java.util.List<RuleMatch> applyCustomFilters(java.util.List<RuleMatch> matches,
                                                               AnnotatedText text)
        Should be called just once, with the complete list of matches, before returning them to the caller.
        Parameters:
        matches - matches after applying rules and default filters
        text - text that matches refer to
        Returns:
        transformed matches (after applying filters in matchFilters)
        Since:
        4.7
      • setConfigValues

        public void setConfigValues(java.util.Map<java.lang.String,java.lang.Integer> v)