Package com.optimaize.langdetect
Class LanguageDetectorBuilder
java.lang.Object
    com.optimaize.langdetect.LanguageDetectorBuilder
public class LanguageDetectorBuilder extends java.lang.Object
Builder for LanguageDetector. This class does no internal synchronization.
-
-
Field Summary
Fields: Modifier and Type, Field, Description
private double
alpha
private static double
ALPHA_DEFAULT
private @NotNull java.util.Set<LdLocale>
langsAdded
private @NotNull java.util.Set<LanguageProfile>
languageProfiles
private @Nullable java.util.Map<LdLocale,java.lang.Double>
langWeightingMap
private double
minimalConfidence
private @NotNull NgramExtractor
ngramExtractor
private double
prefixFactor
private double
probabilityThreshold
private com.google.common.base.Optional<java.lang.Long>
seed
private int
shortTextAlgorithm
private double
suffixFactor
-
Constructor Summary
Constructors: Modifier, Constructor, Description
private
LanguageDetectorBuilder(@NotNull NgramExtractor ngramExtractor)
-
Method Summary
Methods: Modifier and Type, Method, Description
LanguageDetectorBuilder
affixFactor(double affixFactor)
Sets prefixFactor() and suffixFactor() both to the given value.
LanguageDetectorBuilder
alpha(double alpha)
LanguageDetector
build()
static LanguageDetectorBuilder
create(@NotNull NgramExtractor ngramExtractor)
LanguageDetectorBuilder
languagePriorities(@Nullable java.util.Map<LdLocale,java.lang.Double> langWeightingMap)
TODO document exactly.
LanguageDetectorBuilder
minimalConfidence(double minimalConfidence)
LanguageDetector.detect(java.lang.CharSequence) returns a language if the best detected language has at least this probability.
LanguageDetectorBuilder
prefixFactor(double prefixFactor)
To weight n-grams that are on the left border of a word differently from n-grams in the middle of words, assign a value here.
LanguageDetectorBuilder
probabilityThreshold(double probabilityThreshold)
LanguageDetector.getProbabilities(java.lang.CharSequence) does not return languages with less probability than this.
LanguageDetectorBuilder
seed(long seed)
LanguageDetectorBuilder
seed(@NotNull com.google.common.base.Optional<java.lang.Long> seed)
LanguageDetectorBuilder
shortTextAlgorithm(int shortTextAlgorithm)
Defaults to 0, which means don't use this feature.
LanguageDetectorBuilder
suffixFactor(double suffixFactor)
Defaults to 1.0, which means don't use this feature.
LanguageDetectorBuilder
withProfile(LanguageProfile languageProfile)
LanguageDetectorBuilder
withProfiles(java.lang.Iterable<LanguageProfile> languageProfiles)
-
-
-
Field Detail
-
ALPHA_DEFAULT
private static final double ALPHA_DEFAULT
- See Also:
- Constant Field Values
-
ngramExtractor
@NotNull private final NgramExtractor ngramExtractor
-
alpha
private double alpha
-
seed
private com.google.common.base.Optional<java.lang.Long> seed
-
shortTextAlgorithm
private int shortTextAlgorithm
-
prefixFactor
private double prefixFactor
-
suffixFactor
private double suffixFactor
-
probabilityThreshold
private double probabilityThreshold
-
minimalConfidence
private double minimalConfidence
-
langWeightingMap
@Nullable private java.util.Map<LdLocale,java.lang.Double> langWeightingMap
-
languageProfiles
@NotNull private final java.util.Set<LanguageProfile> languageProfiles
-
langsAdded
@NotNull private final java.util.Set<LdLocale> langsAdded
-
-
Constructor Detail
-
LanguageDetectorBuilder
private LanguageDetectorBuilder(@NotNull NgramExtractor ngramExtractor)
-
-
Method Detail
-
create
public static LanguageDetectorBuilder create(@NotNull NgramExtractor ngramExtractor)
-
alpha
public LanguageDetectorBuilder alpha(double alpha)
-
seed
public LanguageDetectorBuilder seed(long seed)
-
seed
public LanguageDetectorBuilder seed(@NotNull com.google.common.base.Optional<java.lang.Long> seed)
-
shortTextAlgorithm
public LanguageDetectorBuilder shortTextAlgorithm(int shortTextAlgorithm)
Defaults to 0, which means don't use this feature. That's the old behavior.
-
affixFactor
public LanguageDetectorBuilder affixFactor(double affixFactor)
Sets prefixFactor() and suffixFactor() both to the given value.
- See Also:
prefixFactor(double)
-
prefixFactor
public LanguageDetectorBuilder prefixFactor(double prefixFactor)
To weight n-grams that are on the left border of a word differently from n-grams in the middle of words, assign a value here. Affixes (prefixes and suffixes) often carry the features that distinguish languages. A value greater than 1.0 weights these n-grams higher; 2.0 doubles their weight. Defaults to 1.0, which means don't use this feature.
- Parameters:
prefixFactor - 0.0 to 10.0, a suggested value is 1.5
-
suffixFactor
public LanguageDetectorBuilder suffixFactor(double suffixFactor)
Defaults to 1.0, which means don't use this feature.
- Parameters:
suffixFactor - 0.0 to 10.0, a suggested value is 2.0
- See Also:
prefixFactor(double)
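The border weighting described for prefixFactor and suffixFactor can be illustrated with a minimal sketch. It assumes a word is cut into fixed-length character n-grams, that the leftmost n-gram is scaled by the prefix factor, the rightmost by the suffix factor, and that everything in between keeps weight 1.0. The class AffixWeightSketch and its weigh method are hypothetical names made up for this illustration; this is not the library's internal code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the library's implementation.
final class AffixWeightSketch {

    /**
     * Splits one word into n-grams of length n and assigns each a weight.
     * The left-border n-gram is multiplied by prefixFactor, the right-border
     * n-gram by suffixFactor, and n-grams in the middle keep weight 1.0.
     * If the word is exactly n characters long, its single n-gram gets both factors.
     * Words shorter than n produce no n-grams.
     */
    static Map<String, Double> weigh(String word, int n, double prefixFactor, double suffixFactor) {
        Map<String, Double> weights = new LinkedHashMap<>();
        int last = word.length() - n; // start index of the right-border n-gram
        for (int i = 0; i <= last; i++) {
            double weight = 1.0;
            if (i == 0) {
                weight *= prefixFactor;
            }
            if (i == last) {
                weight *= suffixFactor;
            }
            // A repeated n-gram accumulates the weights of all its occurrences.
            weights.merge(word.substring(i, i + n), weight, Double::sum);
        }
        return weights;
    }
}
```

With both factors at 1.0 every n-gram counts equally, which matches the documented "don't use this feature" default. Passing the same value for both factors corresponds to what affixFactor(double) is documented to set.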
-
probabilityThreshold
public LanguageDetectorBuilder probabilityThreshold(double probabilityThreshold)
LanguageDetector.getProbabilities(java.lang.CharSequence) does not return languages with a probability lower than this. The default is currently 0.1 (the old hardcoded value), but don't rely on it; if you need a specific value, set it explicitly.
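As a rough illustration of the threshold described above, the following sketch drops every candidate whose probability is below the threshold. It is an assumption-laden stand-in: plain language-code strings replace LdLocale, and ProbabilityThresholdSketch with its filter method is a hypothetical name, not part of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; language codes stand in for LdLocale.
final class ProbabilityThresholdSketch {

    /** Keeps only the languages whose probability is at least probabilityThreshold. */
    static Map<String, Double> filter(Map<String, Double> probabilities, double probabilityThreshold) {
        Map<String, Double> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            if (entry.getValue() >= probabilityThreshold) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```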
-
minimalConfidence
public LanguageDetectorBuilder minimalConfidence(double minimalConfidence)
LanguageDetector.detect(java.lang.CharSequence) returns a language if the best detected language has at least this probability. The default is currently 0.9999d, but don't rely on it; if you need a specific value, set it explicitly.
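The confidence rule above can be sketched as follows: pick the most probable candidate and return it only when its probability reaches the configured minimum. The sketch makes several assumptions: it uses java.util.Optional and plain language-code strings for self-containment (the signatures on this page use Guava's Optional and LdLocale), and MinimalConfidenceSketch is a hypothetical name rather than library code.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not the library's detect() implementation.
final class MinimalConfidenceSketch {

    /**
     * Returns the most probable language if its probability is at least
     * minimalConfidence, otherwise an empty Optional.
     */
    static Optional<String> detect(Map<String, Double> probabilities, double minimalConfidence) {
        String best = null;
        double bestProbability = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            if (entry.getValue() > bestProbability) {
                best = entry.getKey();
                bestProbability = entry.getValue();
            }
        }
        if (best != null && bestProbability >= minimalConfidence) {
            return Optional.of(best);
        }
        return Optional.empty();
    }
}
```

A high minimum such as the documented 0.9999 default means ambiguous inputs yield no result under this sketch; lowering it trades precision for more answers.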
-
languagePriorities
public LanguageDetectorBuilder languagePriorities(@Nullable java.util.Map<LdLocale,java.lang.Double> langWeightingMap)
TODO: document exactly, and explain how it influences the results. Unsupported languages may or may not be checked at some point; document whether the method throws or ignores them. Map key = language (LdLocale, per the signature), Double value = priority (probably 0-1).
-
withProfile
public LanguageDetectorBuilder withProfile(LanguageProfile languageProfile) throws java.lang.IllegalStateException
- Throws:
java.lang.IllegalStateException - if a profile for the same language was added already (must be a userland bug).
-
withProfiles
public LanguageDetectorBuilder withProfiles(java.lang.Iterable<LanguageProfile> languageProfiles) throws java.lang.IllegalStateException
- Throws:
java.lang.IllegalStateException - if a profile for the same language was added already (must be a userland bug).
-
build
public LanguageDetector build() throws java.lang.IllegalStateException
- Throws:
java.lang.IllegalStateException - if no LanguageProfile was added.
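The Throws clauses of withProfile, withProfiles and build describe two guards: no duplicate language, and at least one profile before building. A minimal sketch of such guards is shown here. It assumes language-code strings in place of LanguageProfile objects and returns a plain list instead of a LanguageDetector; ProfileGuardSketch is a hypothetical class, and only the field names langsAdded and languageProfiles are borrowed from the Field Summary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical miniature builder; language codes stand in for LanguageProfile objects.
final class ProfileGuardSketch {
    private final Set<String> langsAdded = new HashSet<>();
    private final List<String> languageProfiles = new ArrayList<>();

    /** Adds one profile; rejects a second profile for the same language. */
    ProfileGuardSketch withProfile(String language) {
        // Set.add returns false when the element is already present.
        if (!langsAdded.add(language)) {
            throw new IllegalStateException("A profile for '" + language + "' was added already");
        }
        languageProfiles.add(language);
        return this;
    }

    /** Adds several profiles by delegating to withProfile, so the same check applies. */
    ProfileGuardSketch withProfiles(Iterable<String> languages) {
        for (String language : languages) {
            withProfile(language);
        }
        return this;
    }

    /** Returns a copy of the collected profiles; refuses to build from nothing. */
    List<String> build() {
        if (languageProfiles.isEmpty()) {
            throw new IllegalStateException("No profile was added");
        }
        return new ArrayList<>(languageProfiles);
    }
}
```

Like the documented class, this sketch does no internal synchronization, so a builder instance should be configured from a single thread.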
-
-