Class LdLocale
- java.lang.Object
  - com.optimaize.langdetect.i18n.LdLocale
public final class LdLocale extends java.lang.Object
A language-detector implementation of a Locale, similar to java.util.Locale. It represents an IETF BCP 47 tag, but does not implement all the features. Features can be added as needed.
It is constructed through the fromString(java.lang.String) factory method. The toString() method produces a parseable and persistable string. The class is immutable.
java.util.Locale cannot be used because it has issues for historical reasons, notably the conversion of the language codes for Hebrew, Yiddish and Indonesian, and more. If one needs a Locale, it is simple to create one based on this object.
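Creating a java.util.Locale from the three components could look roughly like the following. This is a minimal sketch using only the JDK: `LocaleBridge` and `toJavaLocale` are hypothetical names that are not part of this library, and the method takes plain strings (null meaning "absent") instead of an LdLocale so that it runs without the library on the classpath.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of the library. */
final class LocaleBridge {
    private LocaleBridge() {}

    /**
     * Builds a java.util.Locale from a language, an optional script and an
     * optional region. Pass null for an absent script or region.
     */
    static Locale toJavaLocale(String language, String script, String region) {
        return new Locale.Builder()
                .setLanguage(language)
                .setScript(script)  // null or "" leaves the script unset
                .setRegion(region)  // ISO 3166-1 alpha-2 or a 3-digit UN M.49 code
                .build();           // the setters throw IllformedLocaleException on bad syntax
    }

    public static void main(String[] args) {
        System.out.println(toJavaLocale("de", "Latn", "CH").toLanguageTag());
        System.out.println(toJavaLocale("fr", null, null).toLanguageTag());
    }
}
```

With an LdLocale instance, the arguments would presumably come from getLanguage(), getScript().orNull() and getRegion().orNull(). Note that the resulting java.util.Locale may still apply the historical code rewriting described above, depending on the JDK version.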
The ICU ULocale cannot be used because a) it has issues too (for our use case) and b) we're not using ICU in here [yet].
This class does not perform any modifications on the input. The input is used as is, and the getters return it in exactly the same way. No standardization, canonicalization, or cleaning.
The input is validated syntactically, but not for code existence. For example, the script code must be a syntactically valid ISO 15924 code such as "Latn" or "Cyrl", in the correct case, but whether the code actually exists is not checked. These code standards are not fixed: regional entities such as countries can change for political reasons, and languages are living entities. Certain codes may therefore exist only at some point in time (they may be introduced late, deprecated, removed, or even reassigned another meaning). It is not up to us to decide whether Kosovo is a country in 2015 or not. Anyone who needs to work only with a certain range of acceptable codes can validate them through other classes that have knowledge about the codes.
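The difference between syntactic validation and an existence check can be illustrated with a few shape-only tests. This is a sketch under assumptions: `TagSyntax` and its method names are hypothetical, and the regular expressions are a guess at what "syntactically valid" means here (in particular, requiring a lowercase language code is an assumption), not the library's actual private helpers.

```java
import java.util.regex.Pattern;

/** Hypothetical shape checks for illustration; not the library's own code. */
final class TagSyntax {
    private static final Pattern LANGUAGE = Pattern.compile("[a-z]{2,3}");     // assumed: lowercase, 2 or 3 letters
    private static final Pattern SCRIPT = Pattern.compile("[A-Z][a-z]{3}");    // ISO 15924 shape, e.g. "Latn"
    private static final Pattern REGION_ALPHA2 = Pattern.compile("[A-Z]{2}");  // ISO 3166-1 alpha-2 shape
    private static final Pattern REGION_NUMERIC = Pattern.compile("[0-9]{3}"); // UN M.49 shape

    private TagSyntax() {}

    static boolean isLanguageSyntax(String s) {
        return LANGUAGE.matcher(s).matches();
    }

    static boolean isScriptSyntax(String s) {
        return SCRIPT.matcher(s).matches();
    }

    static boolean isRegionSyntax(String s) {
        return REGION_ALPHA2.matcher(s).matches() || REGION_NUMERIC.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Only the shape is checked, never whether the code is assigned.
        System.out.println(isScriptSyntax("Latn"));
        System.out.println(isScriptSyntax("latn"));
        System.out.println(isScriptSyntax("Abcd"));
    }
}
```

A check like `isScriptSyntax("Abcd")` passes because the string has the right shape, regardless of whether the code is assigned in ISO 15924. Checking against a list of known codes would be a separate step outside this class.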
Language: as for BCP 47, the ISO 639-1 code must be used if there is one, for example "fr" for French. If not, the ISO 639-3 code should be used. Using ISO 639-2 is strongly discouraged. Right now this class enforces a 2 or 3 character code, but this may be relaxed in the future.
Script: Only ISO 15924, no discussion.
Region: same as for BCP 47. That means ISO 3166-1 alpha-2 and "UN M.49". I can imagine relaxing it in the future to also allow 3166-2 codes. In most cases the "region" is a "country".
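The three rules above suggest how a tag of the form language[-Script][-REGION] can be split into its parts. The following is a standalone sketch, not the library's parser: `TagParts` is a hypothetical class, java.util.Optional stands in for the Guava Optional used by this class, and the accepted shapes (including the lowercase language code) are assumptions.

```java
import java.util.Optional;

/** Hypothetical stand-in showing how a tag could be split; not the library's parser. */
final class TagParts {
    final String language;
    final Optional<String> script;
    final Optional<String> region;

    private TagParts(String language, Optional<String> script, Optional<String> region) {
        this.language = language;
        this.script = script;
        this.region = region;
    }

    static TagParts parse(String tag) {
        // limit -1 keeps trailing empty parts, so "de-" is rejected below
        String[] p = tag.split("-", -1);
        if (p.length > 3 || !p[0].matches("[a-z]{2,3}")) {
            throw new IllegalArgumentException("Invalid tag: " + tag);
        }
        Optional<String> script = Optional.empty();
        Optional<String> region = Optional.empty();
        int next = 1;
        if (next < p.length && p[next].matches("[A-Z][a-z]{3}")) {
            script = Optional.of(p[next++]);  // ISO 15924 shape
        }
        if (next < p.length && p[next].matches("[A-Z]{2}|[0-9]{3}")) {
            region = Optional.of(p[next++]);  // ISO 3166-1 alpha-2 or UN M.49 shape
        }
        if (next != p.length) {
            // leftover parts: wrong shape or wrong order
            throw new IllegalArgumentException("Invalid tag: " + tag);
        }
        return new TagParts(p[0], script, region);
    }

    public static void main(String[] args) {
        TagParts t = parse("de-Latn-CH");
        System.out.println(t.language + " " + t.script.isPresent() + " " + t.region.isPresent());
    }
}
```

The parts are stored exactly as given, matching the "no canonicalization" behaviour described in the class header. A tag such as "de-CH-Latn" is rejected because the script must come before the region.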
-
-
Constructor Summary
Constructors
Modifier | Constructor | Description
private | LdLocale(@NotNull java.lang.String language, @NotNull com.google.common.base.Optional<java.lang.String> script, @NotNull com.google.common.base.Optional<java.lang.String> region) |
-
Method Summary
Modifier and Type | Method | Description
private static java.lang.String | assignLang(java.lang.String s) |
boolean | equals(java.lang.Object o) |
static @NotNull LdLocale | fromString(@NotNull java.lang.String string) |
@NotNull java.lang.String | getLanguage() |
@NotNull com.google.common.base.Optional<java.lang.String> | getRegion() |
@NotNull com.google.common.base.Optional<java.lang.String> | getScript() |
int | hashCode() |
private static boolean | looksLikeGeoCode3166_1(java.lang.String string) |
private static boolean | looksLikeGeoCodeNumeric(java.lang.String string) |
private static boolean | looksLikeScriptCode(java.lang.String string) |
java.lang.String | toString() | The output of this can be fed to the fromString() method.
-
-
-
Method Detail
-
fromString
@NotNull public static LdLocale fromString(@NotNull java.lang.String string)
- Parameters:
string - The output of the toString() method.
- Returns:
- either a new or possibly a cached (immutable) instance.
-
looksLikeScriptCode
private static boolean looksLikeScriptCode(java.lang.String string)
-
looksLikeGeoCode3166_1
private static boolean looksLikeGeoCode3166_1(java.lang.String string)
-
looksLikeGeoCodeNumeric
private static boolean looksLikeGeoCodeNumeric(java.lang.String string)
-
assignLang
private static java.lang.String assignLang(java.lang.String s)
-
toString
public java.lang.String toString()
The output of this can be fed to the fromString() method.
- Overrides:
toString in class java.lang.Object
- Returns:
- for example "de" or "de-Latn" or "de-CH" or "de-Latn-CH", see class header.
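The four example outputs follow from joining the present components with hyphens in the order language, script, region. A minimal sketch of that formatting, assuming this is how the string is assembled (`TagFormat` is a hypothetical name, and null stands for an absent component):

```java
/** Hypothetical formatter for illustration; not the library's toString(). */
final class TagFormat {
    private TagFormat() {}

    /** Joins the components with '-', skipping an absent (null) script or region. */
    static String format(String language, String script, String region) {
        StringBuilder sb = new StringBuilder(language);
        if (script != null) {
            sb.append('-').append(script);
        }
        if (region != null) {
            sb.append('-').append(region);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("de", null, null));
        System.out.println(format("de", "Latn", null));
        System.out.println(format("de", null, "CH"));
        System.out.println(format("de", "Latn", "CH"));
    }
}
```

Because the components are written back unchanged, a string produced this way can be parsed again into the same parts, which is what makes the output persistable.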
-
getLanguage
@NotNull public java.lang.String getLanguage()
- Returns:
- ISO 639-1 or 639-3 language code, e.g. "fr" or "gsw", see class header.
-
getScript
@NotNull public com.google.common.base.Optional<java.lang.String> getScript()
- Returns:
- ISO 15924 script code, e.g. "Latn", see class header.
-
getRegion
@NotNull public com.google.common.base.Optional<java.lang.String> getRegion()
- Returns:
- ISO 3166-1 or UN M.49 code, e.g. "DE" or "150", see class header.
-
equals
public boolean equals(java.lang.Object o)
- Overrides:
equals in class java.lang.Object
-
hashCode
public int hashCode()
- Overrides:
hashCode in class java.lang.Object
-
-