Package org.languagetool.language
Class LanguageIdentifier
- java.lang.Object
-
- org.languagetool.language.LanguageIdentifier
-
public class LanguageIdentifier extends java.lang.Object
Identify the language of a text. Note that some languages might never be detected because they are close to another language. Language variants like en-US or en-GB are not detected; the result will be en for those. By default, only the first 1000 characters of a text are considered. Email signatures that use \n-- \n as a delimiter are ignored.
- Since:
- 2.9
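The preprocessing described in the class comment (ignoring an email signature and considering only the first 1000 characters) can be sketched in plain Java. This is a minimal illustration, not LanguageTool's implementation: the class `DetectionInputSketch`, the method `prepare`, the regular expression, and the order of the two steps are all assumptions made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of LanguageTool.
class DetectionInputSketch {

  // Default taken from the class description ("first 1000 characters").
  static final int DEFAULT_MAX_LENGTH = 1000;

  // Assumed pattern: the delimiter "\n-- \n" and everything that follows it.
  private static final Pattern SIGNATURE_SKETCH = Pattern.compile("\n-- \n.*", Pattern.DOTALL);

  static String prepare(String text, int maxLength) {
    // Assumed order: drop the signature first, then cut the text to maxLength characters.
    String withoutSignature = SIGNATURE_SKETCH.matcher(text).replaceFirst("");
    return withoutSignature.length() > maxLength
        ? withoutSignature.substring(0, maxLength)
        : withoutSignature;
  }

  public static void main(String[] args) {
    String mail = "Hello, how are you?\n-- \nJane Doe\nExample Corp";
    System.out.println(prepare(mail, DEFAULT_MAX_LENGTH)); // prints "Hello, how are you?"
  }
}
```

Removing the signature before truncating keeps signature text from using up part of the character budget; whether the real class works in this order is not stated on this page.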
-
-
Nested Class Summary
Modifier and Type | Class
(package private) class | LanguageIdentifier.RemoveEMailSignatureFilter
-
Field Summary
Modifier and Type | Field
private static int | CONSIDER_ONLY_PREFERRED_THRESHOLD
private static java.util.List<java.lang.String> | externalLangCodes
private boolean | fasttextEnabled
private java.io.BufferedReader | fasttextIn
private java.io.BufferedWriter | fasttextOut
private java.lang.Process | fasttextProcess
private static java.util.List<java.lang.String> | ignoreLangCodes
private static int | K_HIGHEST_SCORES
private com.optimaize.langdetect.LanguageDetector | languageDetector
private static org.slf4j.Logger | logger
private int | maxLength
private static double | MINIMAL_CONFIDENCE
private static int | SHORT_ALGO_THRESHOLD
private static java.util.regex.Pattern | SIGNATURE
private com.optimaize.langdetect.text.TextObjectFactory | textObjectFactory
private static float | THRESHOLD
-
Constructor Summary
Constructor
LanguageIdentifier()
LanguageIdentifier(int maxLength)
-
Method Summary
Modifier and Type | Method
private boolean | canLanguageBeDetected(java.lang.String langCode, java.util.List<java.lang.String> additionalLanguageCodes)
@Nullable Language | detectLanguage(java.lang.String text)
@Nullable DetectedLanguage | detectLanguage(java.lang.String text, java.util.List<java.lang.String> noopLangsTmp, java.util.List<java.lang.String> preferredLangsTmp)
private java.util.Map.Entry<java.lang.String,java.lang.Double> | detectLanguageCode(java.lang.String text)
(package private) @Nullable DetectedLanguage | detectLanguageWithDetails(java.lang.String text)
void | enableFasttext(java.io.File fasttextBinary, java.io.File fasttextModel)
private java.util.Map.Entry<java.lang.String,java.lang.Double> | getHighestScoringResult(java.util.Map<java.lang.String,java.lang.Double> probs)
private static java.util.List<java.lang.String> | getLanguageCodes()
private java.util.List<com.optimaize.langdetect.profiles.LanguageProfile> | loadProfiles(java.util.List<java.lang.String> langCodes)
private java.util.Map<java.lang.String,java.lang.Double> | runFasttext(java.lang.String text, java.util.List<java.lang.String> additionalLanguageCodes)
private void | startFasttext(java.io.File modelPath, java.io.File binaryPath)
-
-
-
Field Detail
-
logger
private static final org.slf4j.Logger logger
-
MINIMAL_CONFIDENCE
private static final double MINIMAL_CONFIDENCE
- See Also:
- Constant Field Values
-
K_HIGHEST_SCORES
private static final int K_HIGHEST_SCORES
- See Also:
- Constant Field Values
-
SHORT_ALGO_THRESHOLD
private static final int SHORT_ALGO_THRESHOLD
- See Also:
- Constant Field Values
-
CONSIDER_ONLY_PREFERRED_THRESHOLD
private static final int CONSIDER_ONLY_PREFERRED_THRESHOLD
- See Also:
- Constant Field Values
-
SIGNATURE
private static final java.util.regex.Pattern SIGNATURE
-
ignoreLangCodes
private static final java.util.List<java.lang.String> ignoreLangCodes
-
externalLangCodes
private static final java.util.List<java.lang.String> externalLangCodes
-
THRESHOLD
private static final float THRESHOLD
- See Also:
- Constant Field Values
-
languageDetector
private final com.optimaize.langdetect.LanguageDetector languageDetector
-
textObjectFactory
private final com.optimaize.langdetect.text.TextObjectFactory textObjectFactory
-
maxLength
private final int maxLength
-
fasttextEnabled
private boolean fasttextEnabled
-
fasttextProcess
private java.lang.Process fasttextProcess
-
fasttextIn
private java.io.BufferedReader fasttextIn
-
fasttextOut
private java.io.BufferedWriter fasttextOut
-
-
Constructor Detail
-
LanguageIdentifier
public LanguageIdentifier()
-
LanguageIdentifier
public LanguageIdentifier(int maxLength)
- Parameters:
maxLength - the maximum number of characters that will be considered (a lower value can help with performance). Don't use values below 100, as this would decrease accuracy.
- Throws:
java.lang.IllegalArgumentException - if maxLength is less than 10
- Since:
- 4.2
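The documented contract for maxLength (values below 10 are rejected, values of 100 or more are recommended) can be illustrated with a small stand-alone check. This is only a sketch: `MaxLengthCheckSketch`, `checkMaxLength`, and the exception message are hypothetical and do not reproduce the constructor's actual code.

```java
// Hypothetical stand-alone check for illustration; not part of LanguageTool.
class MaxLengthCheckSketch {

  // Mirrors the documented rule: reject values below 10.
  static int checkMaxLength(int maxLength) {
    if (maxLength < 10) {
      // The message text is an assumption made for this example.
      throw new IllegalArgumentException("maxLength must be >= 10: " + maxLength);
    }
    return maxLength;
  }

  public static void main(String[] args) {
    System.out.println(checkMaxLength(1000)); // prints 1000
    try {
      checkMaxLength(5);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Values between 10 and 99 pass this check even though the parameter description advises against them, which matches the page: only values below 10 are documented as throwing.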
-
-
Method Detail
-
enableFasttext
public void enableFasttext(java.io.File fasttextBinary, java.io.File fasttextModel)
-
getLanguageCodes
private static java.util.List<java.lang.String> getLanguageCodes()
-
loadProfiles
private java.util.List<com.optimaize.langdetect.profiles.LanguageProfile> loadProfiles(java.util.List<java.lang.String> langCodes) throws java.io.IOException
- Throws:
java.io.IOException
-
detectLanguage
@Nullable public @Nullable Language detectLanguage(java.lang.String text)
- Returns:
- language or null if language could not be identified
-
detectLanguageWithDetails
@Nullable @Experimental @Nullable DetectedLanguage detectLanguageWithDetails(java.lang.String text)
- Returns:
- language or null if language could not be identified
-
detectLanguage
@Nullable public @Nullable DetectedLanguage detectLanguage(java.lang.String text, java.util.List<java.lang.String> noopLangsTmp, java.util.List<java.lang.String> preferredLangsTmp)
- Parameters:
noopLangsTmp - list of codes that are detected but will lead to the NoopLanguage that has no rules
- Returns:
- language or null if language could not be identified
- Since:
- 4.4 (new parameter noopLangs, changed return type to DetectedLanguage)
-
canLanguageBeDetected
private boolean canLanguageBeDetected(java.lang.String langCode, java.util.List<java.lang.String> additionalLanguageCodes)
-
startFasttext
private void startFasttext(java.io.File modelPath, java.io.File binaryPath) throws java.io.IOException
- Throws:
java.io.IOException
-
getHighestScoringResult
private java.util.Map.Entry<java.lang.String,java.lang.Double> getHighestScoringResult(java.util.Map<java.lang.String,java.lang.Double> probs)
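This page gives only the signature of getHighestScoringResult, not its behavior. Going by the name and the types alone, one plausible reading is "return the map entry with the largest score". The sketch below shows that reading in plain Java; `HighestScoreSketch`, `highest`, and the choice to return null for an empty map are assumptions made for the example, not the method's confirmed behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not LanguageTool's implementation.
class HighestScoreSketch {

  // Returns the entry with the largest value, or null for an empty map (assumed behavior).
  static Map.Entry<String, Double> highest(Map<String, Double> probs) {
    return probs.entrySet().stream()
        .max(Map.Entry.comparingByValue())
        .orElse(null);
  }

  public static void main(String[] args) {
    Map<String, Double> probs = new LinkedHashMap<>();
    probs.put("de", 0.2);
    probs.put("en", 0.7);
    probs.put("nl", 0.1);
    System.out.println(highest(probs).getKey()); // prints "en"
  }
}
```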
-
runFasttext
private java.util.Map<java.lang.String,java.lang.Double> runFasttext(java.lang.String text, java.util.List<java.lang.String> additionalLanguageCodes) throws java.io.IOException
- Throws:
java.io.IOException
-
detectLanguageCode
@Nullable private java.util.Map.Entry<java.lang.String,java.lang.Double> detectLanguageCode(java.lang.String text)
- Returns:
- language or null if language could not be identified
-
-