Class JLanguageTool

java.lang.Object
org.languagetool.JLanguageTool
Direct Known Subclasses:
MultiThreadedJLanguageTool

public class JLanguageTool extends Object
The main class used for checking text against different rules:
  • built-in Java rules (for English: a vs. an, whitespace after commas, ...)
  • built-in pattern rules loaded from external XML files (usually called grammar.xml)
  • your own implementation of the abstract Rule classes added with addRule(Rule)

You will probably want to use the subclass MultiThreadedJLanguageTool for best performance.

Thread-safety: this class is not thread-safe. Create one instance per thread, but create the language only once (e.g. new AmericanEnglish()) and use it for all instances of JLanguageTool.
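
The threading advice above can be sketched without the library on the classpath. SharedLanguage and Checker below are hypothetical stand-ins (not LanguageTool classes) for a language object that is created once and a checker that keeps mutable state; the sketch only illustrates the "one checker per thread, one shared language" pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PerThreadInstanceSketch {

    // Hypothetical stand-in for a Language object that is created once and shared.
    static final class SharedLanguage {
        final String code;
        SharedLanguage(String code) { this.code = code; }
    }

    // Hypothetical stand-in for a checker with mutable, unsynchronized state.
    static final class Checker {
        final SharedLanguage language;
        private int checkedTexts; // mutable state, hence one instance per thread
        Checker(SharedLanguage language) { this.language = language; }
        int check(String text) { return ++checkedTexts; }
    }

    /** Starts threadCount threads and returns how many distinct checkers were created. */
    static int run(int threadCount) {
        SharedLanguage language = new SharedLanguage("en-US"); // created only once
        Set<Checker> checkers = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                Checker checker = new Checker(language); // one instance per thread
                checker.check("Some text.");
                checkers.add(checker);
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return checkers.size();
    }

    public static void main(String[] args) {
        System.out.println(run(3)); // prints 3
    }
}
```

With the real classes, the same shape would presumably apply: one Language created up front and a separate JLanguageTool constructed inside each thread.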

  • Field Details

    • VERSION

      public static final String VERSION
      LanguageTool version as a string like 2.3 or 2.4-SNAPSHOT.
    • BUILD_DATE

      @Nullable public static final String BUILD_DATE
      LanguageTool build date and time like 2013-10-17 16:10 or null if not run from JAR.
    • GIT_SHORT_ID

      @Nullable public static final String GIT_SHORT_ID
      Abbreviated git id or null if not available.
      Since:
      4.5
    • PATTERN_FILE

      public static final String PATTERN_FILE
      The name of the file with error patterns.
    • FALSE_FRIEND_FILE

      public static final String FALSE_FRIEND_FILE
      The name of the file with false friend information.
    • SENTENCE_START_TAGNAME

      public static final String SENTENCE_START_TAGNAME
      The internal tag used to mark the beginning of a sentence.
    • SENTENCE_END_TAGNAME

      public static final String SENTENCE_END_TAGNAME
      The internal tag used to mark the end of a sentence.
    • PARAGRAPH_END_TAGNAME

      public static final String PARAGRAPH_END_TAGNAME
      The internal tag used to mark the end of a paragraph.
    • MESSAGE_BUNDLE

      public static final String MESSAGE_BUNDLE
      Name of the message bundle for translations.
    • DICTIONARY_FILENAME_EXTENSION

      public static final String DICTIONARY_FILENAME_EXTENSION
      Extension of dictionary files read by Spellers.
    • cache

      private final ResultCache cache
    • userConfig

      private final UserConfig userConfig
    • descProvider

      private final ShortDescriptionProvider descProvider
    • maxErrorsPerWordRate

      private float maxErrorsPerWordRate
    • dataBroker

      private static ResourceDataBroker dataBroker
    • builtinRules

      private final List<Rule> builtinRules
    • userRules

      private final List<Rule> userRules
    • optionalLanguageModelRules

      private final Set<String> optionalLanguageModelRules
    • disabledRules

      private final Set<String> disabledRules
    • disabledRuleCategories

      private final Set<CategoryId> disabledRuleCategories
    • enabledRules

      private final Set<String> enabledRules
    • enabledRuleCategories

      private final Set<CategoryId> enabledRuleCategories
    • language

      private final Language language
    • altLanguages

      private final List<Language> altLanguages
    • motherTongue

      private final Language motherTongue
    • matchFilters

      private final List<RuleMatchFilter> matchFilters
    • printStream

      private PrintStream printStream
    • listUnknownWords

      private boolean listUnknownWords
    • unknownWords

      private Set<String> unknownWords
    • cleanOverlappingMatches

      private boolean cleanOverlappingMatches
    • temporaryFiles

      private static final List<File> temporaryFiles
  • Constructor Details

    • JLanguageTool

      public JLanguageTool(Language lang, Language motherTongue)
      Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
      Parameters:
      lang - the language of the text to be checked
      motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
    • JLanguageTool

      public JLanguageTool(Language language)
      Create a JLanguageTool and set up the built-in Java rules for the given language.
      Parameters:
      language - the language of the text to be checked
    • JLanguageTool

      public JLanguageTool(Language language, Language motherTongue, ResultCache cache)
      Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
      Parameters:
      language - the language of the text to be checked
      motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
      cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes
      Since:
      3.7
    • JLanguageTool

      @Experimental public JLanguageTool(Language language, ResultCache cache, UserConfig userConfig)
      Create a JLanguageTool and set up the built-in rules for the given language. This constructor takes no mother tongue, so false friend rules are not set up.
      Parameters:
      language - the language of the text to be checked
      cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes. Use null to deactivate the cache.
      Since:
      4.2
    • JLanguageTool

      @Experimental public JLanguageTool(Language language, List<Language> altLanguages, Language motherTongue, ResultCache cache, GlobalConfig globalConfig, UserConfig userConfig)
      Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
      Parameters:
      language - the language of the text to be checked
      altLanguages - The languages that are accepted as alternative languages - currently this means words are accepted if they are in an alternative language and not similar to a word from language. If there's a similar word in language, there will be an error of type RuleMatch.Type.Hint (EXPERIMENTAL)
      motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
      cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes
      Since:
      4.3
    • JLanguageTool

      @Experimental public JLanguageTool(Language language, Language motherTongue, ResultCache cache, UserConfig userConfig)
      Create a JLanguageTool and set up the built-in rules for the given language and false friend rules for the text language / mother tongue pair.
      Parameters:
      language - the language of the text to be checked
      motherTongue - the user's mother tongue, used for false friend rules, or null. The mother tongue may also be used as a source language for checking bilingual texts.
      cache - a cache to speed up checking if the same sentences get checked more than once, e.g. when LT is running as a server and texts are re-checked due to changes
      Since:
      4.2
  • Method Details

    • getBuildDate

      @Nullable private static String getBuildDate()
      Returns the build date or null if not run from JAR.
    • getShortGitId

      @Nullable private static String getShortGitId()
      Returns the abbreviated git id or null.
    • isPremiumVersion

      public static boolean isPremiumVersion()
      Since:
      4.2
    • getDataBroker

      public static ResourceDataBroker getDataBroker()
      The grammar checker needs resources from the following directories:
      • /resource
      • /rules
      Returns:
      The currently set data broker, which provides access to resources in the directories listed above. If no data broker was set, a new DefaultResourceDataBroker will be instantiated and returned.
      Since:
      1.0.1
    • setDataBroker

      public static void setDataBroker(ResourceDataBroker broker)
      The grammar checker needs resources from the following directories:
      • /resource
      • /rules
      Parameters:
      broker - The new resource broker to be used.
      Since:
      1.0.1
    • setListUnknownWords

      public void setListUnknownWords(boolean listUnknownWords)
      Whether the check(String) methods store unknown words. If set to true (default: false), you can get the list of unknown words using getUnknownWords().
    • setCleanOverlappingMatches

      public void setCleanOverlappingMatches(boolean cleanOverlappingMatches)
      Whether overlapping errors are removed from the results of the check(String) methods. If set to true (the default), overlapping errors are removed according to the priorities established for the language.
      Since:
      3.6
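To make the idea of priority-based overlap removal concrete, here is a minimal sketch. It is not the library's implementation: Span is a hypothetical stand-in for a rule match, the integer priority is an assumption, and the tie-breaking rule (the earlier match wins) is chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class OverlapCleanupSketch {

    // Hypothetical stand-in for a rule match: a [from, to) span plus a priority.
    static final class Span {
        final int from;
        final int to;
        final int priority;
        final String ruleId;
        Span(int from, int to, int priority, String ruleId) {
            this.from = from;
            this.to = to;
            this.priority = priority;
            this.ruleId = ruleId;
        }
    }

    /** Keeps, among overlapping spans, the one with the higher priority. */
    static List<Span> removeOverlaps(List<Span> matches) {
        List<Span> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((Span s) -> s.from));
        List<Span> kept = new ArrayList<>();
        for (Span current : sorted) {
            if (kept.isEmpty()) {
                kept.add(current);
                continue;
            }
            Span last = kept.get(kept.size() - 1);
            boolean overlaps = current.from < last.to;
            if (!overlaps) {
                kept.add(current);
            } else if (current.priority > last.priority) {
                kept.set(kept.size() - 1, current); // higher priority wins
            }
            // otherwise the earlier match stays and the current one is dropped
        }
        return kept;
    }

    static List<Span> demo() {
        return removeOverlaps(Arrays.asList(
                new Span(0, 5, 1, "A"),
                new Span(3, 8, 2, "B"),    // overlaps A and has higher priority
                new Span(10, 12, 0, "C")));
    }

    public static void main(String[] args) {
        for (Span s : demo()) {
            System.out.println(s.ruleId); // prints B, then C
        }
    }
}
```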
    • setMaxErrorsPerWordRate

      @Experimental public void setMaxErrorsPerWordRate(float maxErrorsPerWordRate)
      Sets the maximum rate of errors per word. Checking stops with an exception if the rate is exceeded. For example, with a rate of 0.33, checking would stop if the user's text has so many errors that more than every 3rd word causes a rule match. Note that this may not apply to very short texts.
      Since:
      4.0
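The arithmetic behind the 0.33 example can be sketched as follows. The helper name exceedsRate and the handling of empty texts are assumptions made for illustration; the library may compute the rate differently, for instance for very short texts.

```java
public class ErrorRateSketch {

    /** Returns true if the number of matches per word is higher than maxRate. */
    static boolean exceedsRate(int matchCount, int wordCount, float maxRate) {
        if (wordCount == 0) {
            return false; // assumption: an empty text never exceeds the limit
        }
        return (float) matchCount / wordCount > maxRate;
    }

    public static void main(String[] args) {
        System.out.println(exceedsRate(3, 10, 0.33f)); // prints false (0.3 <= 0.33)
        System.out.println(exceedsRate(4, 10, 0.33f)); // prints true  (0.4 > 0.33)
    }
}
```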
    • getMessageBundle

      public static ResourceBundle getMessageBundle()
      Gets the ResourceBundle (i18n strings) for the default language of the user's system.
    • getMessageBundle

      public static ResourceBundle getMessageBundle(Language lang)
      Gets the ResourceBundle (i18n strings) for the given user interface language.
      Since:
      2.4 (public since 2.4)
    • getAllBuiltinRules

      private List<Rule> getAllBuiltinRules(Language language, ResourceBundle messages, UserConfig userConfig, GlobalConfig globalConfig)
    • setOutput

      public void setOutput(PrintStream printStream)
      Set a PrintStream that will receive verbose output. Set to null (which is the default) to disable verbose output.
    • loadPatternRules

      public List<AbstractPatternRule> loadPatternRules(String filename) throws IOException
      Load pattern rules from an XML file. Use addRule(Rule) to add these rules to the checking process.
      Parameters:
      filename - path to an XML file in the classpath or in the filesystem - the classpath is checked first
      Returns:
      a List of PatternRule objects
      Throws:
      IOException
    • loadFalseFriendRules

      public List<AbstractPatternRule> loadFalseFriendRules(String filename) throws ParserConfigurationException, SAXException, IOException
      Load false friend rules from an XML file. Only those pairs will be loaded that match the current text language and the mother tongue specified in the JLanguageTool constructor. Use addRule(Rule) to add these rules to the checking process.
      Parameters:
      filename - path to an XML file in the classpath or in the filesystem - the classpath is checked first
      Returns:
      a List of PatternRule objects, or an empty list if mother tongue is not set
      Throws:
      ParserConfigurationException
      SAXException
      IOException
    • updateOptionalLanguageModelRules

      private void updateOptionalLanguageModelRules(@Nullable LanguageModel lm)
      Remove rules that can profit from a language model, recreate them with the given model, and add them again.
      Parameters:
      lm - the language model or null if none is available
    • activateNeuralNetworkRules

      public void activateNeuralNetworkRules(File modelDir) throws IOException
      Activate rules that depend on pretrained neural network models.
      Parameters:
      modelDir - root dir of exported models
      Throws:
      IOException
      Since:
      4.4
    • activateLanguageModelRules

      public void activateLanguageModelRules(File indexDir) throws IOException
      Activate rules that depend on a language model. The language model currently consists of Lucene indexes with ngram occurrence counts.
      Parameters:
      indexDir - directory with a '3grams' subdirectory which contains a Lucene index with 3gram occurrence counts
      Throws:
      IOException
      Since:
      2.7
    • activateWord2VecModelRules

      public void activateWord2VecModelRules(File indexDir) throws IOException
      Activate rules that depend on a word2vec language model.
      Parameters:
      indexDir - directory with subdirectories like 'en', each containing dictionary.txt and final_embeddings.txt
      Throws:
      IOException
      Since:
      4.0
    • activateDefaultPatternRules

      private void activateDefaultPatternRules() throws IOException
      Loads and activates the pattern rules from org/languagetool/rules/<languageCode>/grammar.xml.
      Throws:
      IOException
    • activateDefaultFalseFriendRules

      private void activateDefaultFalseFriendRules() throws ParserConfigurationException, SAXException, IOException
      Loads and activates the false friend rules from rules/false-friends.xml.
      Throws:
      ParserConfigurationException
      SAXException
      IOException
    • addMatchFilter

      public void addMatchFilter(@NotNull RuleMatchFilter filter)
      Add a RuleMatchFilter for post-processing of rule matches. Filters are called sequentially, in the order in which they were added.
      Parameters:
      filter - filter to add
      Since:
      4.7
    • addRule

      public void addRule(Rule rule)
      Add a rule to be used by the next call to the check methods like check(String).
    • disableRule

      public void disableRule(String ruleId)
      Disable a given rule so the check methods like check(String) won't use it.
      Parameters:
      ruleId - the id of the rule to disable - no error will be thrown if the id does not exist
    • disableRules

      public void disableRules(List<String> ruleIds)
      Disable the given rules so the check methods like check(String) won't use them.
      Parameters:
      ruleIds - the ids of the rules to disable - no error will be thrown if an id does not exist
      Since:
      2.4
    • disableCategory

      public void disableCategory(CategoryId id)
      Disable the given rule category so the check methods like check(String) won't use it.
      Parameters:
      id - the id of the category to disable - no error will be thrown if the id does not exist
      Since:
      3.3
    • isCategoryDisabled

      public boolean isCategoryDisabled(CategoryId id)
      Returns true if a category is explicitly disabled.
      Parameters:
      id - the id of the category to check - no error will be thrown if the id does not exist
      Returns:
      true if this category is explicitly disabled.
      Since:
      3.5
    • getLanguage

      public Language getLanguage()
      Get the language that was used to configure this instance.
    • getDisabledRules

      public Set<String> getDisabledRules()
      Get rule ids of the rules that have been explicitly disabled.
    • enableRule

      public void enableRule(String ruleId)
      Enable a given rule so the check methods like check(String) will use it. This will not throw an exception if the given rule id doesn't exist.
      Parameters:
      ruleId - the id of the rule to enable
    • enableRuleCategory

      public void enableRuleCategory(CategoryId id)
      Enable all rules of the given category so the check methods like check(String) will use them. This will not throw an exception if the given category id doesn't exist.
      Since:
      3.3
    • sentenceTokenize

      public List<String> sentenceTokenize(String text)
      Tokenizes the given text into sentences.
    • check

      public List<RuleMatch> check(String text) throws IOException
      The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
      Parameters:
      text - the text to be checked
      Returns:
      a List of RuleMatch objects
      Throws:
      IOException
    • check

      public List<RuleMatch> check(String text, RuleMatchListener listener) throws IOException
      The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
      Parameters:
      text - the text to be checked
      Returns:
      a List of RuleMatch objects
      Throws:
      IOException
      Since:
      3.7
    • check

      public List<RuleMatch> check(String text, boolean tokenizeText, JLanguageTool.ParagraphHandling paraMode) throws IOException
      Throws:
      IOException
    • check

      public List<RuleMatch> check(String text, boolean tokenizeText, JLanguageTool.ParagraphHandling paraMode, RuleMatchListener listener) throws IOException
      Throws:
      IOException
      Since:
      3.7
    • check

      public List<RuleMatch> check(AnnotatedText text) throws IOException
      The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules, adjusting error positions so they refer to the original text including markup.
      Throws:
      IOException
      Since:
      2.3
    • check

      public List<RuleMatch> check(AnnotatedText text, RuleMatchListener listener) throws IOException
      Throws:
      IOException
      Since:
      3.9
    • check

      public List<RuleMatch> check(AnnotatedText annotatedText, boolean tokenizeText, JLanguageTool.ParagraphHandling paraMode) throws IOException
      The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
      Parameters:
      annotatedText - The text to be checked, created with AnnotatedTextBuilder. Call this method with the complete text to be checked. If you call it repeatedly with smaller chunks like paragraphs or sentences, those rules that work across paragraphs/sentences won't work (their status gets reset whenever this method is called).
      tokenizeText - If true, the text is tokenized into sentences. Otherwise, it is assumed to be already tokenized, i.e. to consist of only one sentence.
      paraMode - Determines how paragraph-level rules are handled (a JLanguageTool.ParagraphHandling value, not a boolean).
      Returns:
      a List of RuleMatch objects, describing potential errors in the text
      Throws:
      IOException
      Since:
      2.3
    • check

      public List<RuleMatch> check(AnnotatedText annotatedText, boolean tokenizeText, JLanguageTool.ParagraphHandling paraMode, RuleMatchListener listener) throws IOException
      The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules.
      Throws:
      IOException
      Since:
      3.7
    • check

      public List<RuleMatch> check(AnnotatedText annotatedText, boolean tokenizeText, JLanguageTool.ParagraphHandling paraMode, RuleMatchListener listener, JLanguageTool.Mode mode) throws IOException
      The main check method. Tokenizes the text into sentences and matches these sentences against all currently active rules depending on mode.
      Throws:
      IOException
      Since:
      4.3
    • analyzeText

      public List<AnalyzedSentence> analyzeText(String text) throws IOException
      Use this method if you want to access LanguageTool's otherwise internal analysis of the text. For actual text checking, use the check... methods instead.
      Parameters:
      text - The text to be analyzed
      Throws:
      IOException
      Since:
      2.5
    • analyzeSentences

      protected List<AnalyzedSentence> analyzeSentences(List<String> sentences) throws IOException
      Throws:
      IOException
    • printSentenceInfo

      protected void printSentenceInfo(AnalyzedSentence analyzedSentence)
    • performCheck

      protected List<RuleMatch> performCheck(List<AnalyzedSentence> analyzedSentences, List<String> sentences, List<Rule> allRules, JLanguageTool.ParagraphHandling paraMode, AnnotatedText annotatedText, JLanguageTool.Mode mode) throws IOException
      Throws:
      IOException
    • performCheck

      protected List<RuleMatch> performCheck(List<AnalyzedSentence> analyzedSentences, List<String> sentences, List<Rule> allRules, JLanguageTool.ParagraphHandling paraMode, AnnotatedText annotatedText, RuleMatchListener listener, JLanguageTool.Mode mode) throws IOException
      Throws:
      IOException
      Since:
      3.7
    • checkAnalyzedSentence

      public List<RuleMatch> checkAnalyzedSentence(JLanguageTool.ParagraphHandling paraMode, List<Rule> rules, AnalyzedSentence analyzedSentence) throws IOException
      This is an internal method that's public only for technical reasons. Please use one of the check(String) methods instead.
      Throws:
      IOException
      Since:
      2.3
    • ignoreRule

      private boolean ignoreRule(Rule rule)
    • adjustRuleMatchPos

      public RuleMatch adjustRuleMatchPos(RuleMatch match, int charCount, int columnCount, int lineCount, String sentence, AnnotatedText annotatedText)
      Change RuleMatch positions so they are relative to the complete text, not just to the sentence.
      Parameters:
      charCount - Count of characters in the sentences before
      columnCount - Current column number
      lineCount - Current line number
      sentence - The text being checked
      Returns:
      The RuleMatch object with adjustments
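The core of the position adjustment described above is an offset shift by charCount. The sketch below shows only that step, using a hypothetical helper toTextOffsets; it ignores line and column tracking and any markup mapping through AnnotatedText, which the real method also has to handle.

```java
public class OffsetAdjustSketch {

    /**
     * Shifts sentence-relative [from, to) offsets by the number of characters
     * that precede the sentence in the complete text.
     */
    static int[] toTextOffsets(int fromInSentence, int toInSentence, int charCount) {
        return new int[] { charCount + fromInSentence, charCount + toInSentence };
    }

    public static void main(String[] args) {
        String first = "This is fine. ";
        String second = "Here is an mistake.";
        String text = first + second;

        int from = second.indexOf("an");   // sentence-relative start
        int to = from + "an".length();     // sentence-relative end (exclusive)
        int[] adjusted = toTextOffsets(from, to, first.length());

        // The adjusted offsets select the same word in the complete text.
        System.out.println(text.substring(adjusted[0], adjusted[1])); // prints "an"
    }
}
```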
    • extendSuggestions

      private List<SuggestedReplacement> extendSuggestions(List<SuggestedReplacement> replacements)
    • rememberUnknownWords

      protected void rememberUnknownWords(AnalyzedSentence analyzedText)
    • getUnknownWords

      public List<String> getUnknownWords()
      Get the alphabetically sorted list of unknown words in the latest run of one of the check(String) methods.
      Throws:
      IllegalStateException - if setListUnknownWords(boolean) has been set to false
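The contract between setListUnknownWords(boolean) and getUnknownWords() can be sketched with a small stand-in class. UnknownWordsSketch and its remember method are hypothetical and only mirror the documented behaviour: collection is off by default, the result is sorted, and reading the list while collection is off raises an IllegalStateException.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnknownWordsSketch {

    private boolean listUnknownWords; // default: false, as documented
    private final Set<String> unknownWords = new HashSet<>();

    void setListUnknownWords(boolean listUnknownWords) {
        this.listUnknownWords = listUnknownWords;
    }

    // Hypothetical hook: a check run would call this for each word it cannot analyze.
    void remember(String word) {
        if (listUnknownWords) {
            unknownWords.add(word);
        }
    }

    List<String> getUnknownWords() {
        if (!listUnknownWords) {
            throw new IllegalStateException("listUnknownWords is set to false");
        }
        List<String> sorted = new ArrayList<>(unknownWords);
        Collections.sort(sorted); // natural String order
        return sorted;
    }

    static List<String> demo() {
        UnknownWordsSketch sketch = new UnknownWordsSketch();
        sketch.setListUnknownWords(true);
        sketch.remember("zzyzx");
        sketch.remember("abcde");
        return sketch.getUnknownWords();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [abcde, zzyzx]
    }
}
```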
    • countLineBreaks

      static int countLineBreaks(String s)
    • getAnalyzedSentence

      public AnalyzedSentence getAnalyzedSentence(String sentence) throws IOException
      Tokenizes the given sentence into words and analyzes it, and then disambiguates POS tags.
      Parameters:
      sentence - sentence to be analyzed
      Throws:
      IOException
    • getRawAnalyzedSentence

      public AnalyzedSentence getRawAnalyzedSentence(String sentence) throws IOException
      Tokenizes the given sentence into words and analyzes it. This is the same as getAnalyzedSentence(String) but it does not run the disambiguator.
      Parameters:
      sentence - sentence to be analyzed
      Throws:
      IOException
      Since:
      0.9.8
    • replaceSoftHyphens

      private Map<Integer,String> replaceSoftHyphens(List<String> tokens)
    • getCategories

      public Map<CategoryId,Category> getCategories()
      Get all rule categories for the current language.
      Returns:
      a map of Categories, keyed by their id.
      Since:
      3.5
    • getAllRules

      public List<Rule> getAllRules()
      Get all rules for the current language that are built-in or that have been added using addRule(Rule). Please note that XML rules that are grouped will appear as multiple rules with the same id. To tell them apart, check if they are of type AbstractPatternRule, cast them to that type and call their AbstractPatternRule.getSubId() method.
      Returns:
      a List of Rule objects
    • getAllActiveRules

      public List<Rule> getAllActiveRules()
      Get all active (not disabled) rules for the current language that are built-in or that have been added using e.g. addRule(Rule). See getAllRules() for hints about rule ids.
      Returns:
      a List of Rule objects
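How the set of explicitly disabled rule ids relates to the list of active rules can be sketched as a simple filter. This is a deliberate simplification under stated assumptions: rules are reduced to their ids, the ids RULE_A, RULE_B and NO_SUCH_RULE are made up, and categories and rules that are off by default are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ActiveRulesSketch {

    private final List<String> allRuleIds;
    private final Set<String> disabledRuleIds = new HashSet<>();

    ActiveRulesSketch(List<String> allRuleIds) {
        this.allRuleIds = new ArrayList<>(allRuleIds);
    }

    // Unknown ids are accepted silently, matching the "no error will be thrown" note.
    void disableRule(String ruleId) {
        disabledRuleIds.add(ruleId);
    }

    void enableRule(String ruleId) {
        disabledRuleIds.remove(ruleId);
    }

    /** Returns all rule ids that have not been explicitly disabled. */
    List<String> getAllActiveRuleIds() {
        List<String> active = new ArrayList<>();
        for (String id : allRuleIds) {
            if (!disabledRuleIds.contains(id)) {
                active.add(id);
            }
        }
        return active;
    }

    static List<String> demo() {
        ActiveRulesSketch sketch = new ActiveRulesSketch(Arrays.asList("RULE_A", "RULE_B"));
        sketch.disableRule("RULE_A");
        sketch.disableRule("NO_SUCH_RULE"); // no error for an unknown id
        return sketch.getAllActiveRuleIds();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [RULE_B]
    }
}
```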
    • getAllActiveOfficeRules

      public List<Rule> getAllActiveOfficeRules()
      Works like getAllActiveRules() but overrides defaults with office defaults.
      Returns:
      a List of Rule objects
      Since:
      4.0
    • getPatternRulesByIdAndSubId

      public List<AbstractPatternRule> getPatternRulesByIdAndSubId(String id, String subId)
      Get pattern rules by Id and SubId. This returns a list because rules that use <or>...</or> are internally expanded into several rules.
      Returns:
      a List of Rule objects
      Since:
      2.3
    • printIfVerbose

      protected void printIfVerbose(String s)
    • addTemporaryFile

      public static void addTemporaryFile(File file)
      Adds a temporary file to the internal list. This is an internal method; you should never need to call it as a user of LanguageTool.
      Parameters:
      file - the file to be added.
    • removeTemporaryFiles

      public static void removeTemporaryFiles()
      Clean up all temporary files, if there are any.
    • applyCustomFilters

      protected List<RuleMatch> applyCustomFilters(List<RuleMatch> matches, AnnotatedText text)
      Should be called just once with the complete list of matches, before returning them to the caller.
      Parameters:
      matches - matches after applying rules and default filters
      text - text that matches refer to
      Returns:
      transformed matches (after applying filters in matchFilters)
      Since:
      4.7
    • setConfigValues

      public void setConfigValues(Map<String,Integer> v)