Class LdLocale


  • public final class LdLocale
    extends java.lang.Object
    A language-detector implementation of a Locale, similar to the java.util.Locale.

    It represents an IETF BCP 47 language tag but does not implement all of its features. Features can be added as needed.

    It is constructed through the fromString(java.lang.String) factory method. The toString() method produces a parseable and persistable string.

    The class is immutable.
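    The following is a minimal sketch of such an immutable value type. It is not the actual LdLocale source. It assumes java.util.Optional in place of Guava's Optional so that it compiles with the JDK alone, and the class name LocaleTagSketch and the factory of(...) are hypothetical. It shows the private constructor, the toString() format described below (for example "de-Latn-CH"), and value-based equals/hashCode.

```java
import java.util.Objects;
import java.util.Optional;

/**
 * Hypothetical, simplified stand-in for an immutable locale value.
 * Uses java.util.Optional instead of Guava's Optional so it needs only the JDK.
 */
final class LocaleTagSketch {
    private final String language;          // e.g. "de"
    private final Optional<String> script;  // e.g. "Latn"
    private final Optional<String> region;  // e.g. "CH" or "150"

    // Private constructor: instances are created only through the static factory.
    private LocaleTagSketch(String language, Optional<String> script, Optional<String> region) {
        this.language = Objects.requireNonNull(language, "language");
        this.script = Objects.requireNonNull(script, "script");
        this.region = Objects.requireNonNull(region, "region");
    }

    // Hypothetical factory; a null script or region means "absent".
    static LocaleTagSketch of(String language, String script, String region) {
        return new LocaleTagSketch(language, Optional.ofNullable(script), Optional.ofNullable(region));
    }

    String getLanguage() { return language; }
    Optional<String> getScript() { return script; }
    Optional<String> getRegion() { return region; }

    // Joins the present parts with "-": "de", "de-Latn", "de-CH" or "de-Latn-CH".
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(language);
        script.ifPresent(s -> sb.append('-').append(s));
        region.ifPresent(r -> sb.append('-').append(r));
        return sb.toString();
    }

    // Value equality over all three components.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LocaleTagSketch)) return false;
        LocaleTagSketch other = (LocaleTagSketch) o;
        return language.equals(other.language)
                && script.equals(other.script)
                && region.equals(other.region);
    }

    @Override
    public int hashCode() {
        return Objects.hash(language, script, region);
    }
}
```

    Because all fields are final and the only constructor is private, instances can be shared freely, which is what makes handing out cached instances from a factory safe.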

    The java.util.Locale cannot be used because it has issues for historical reasons, notably the legacy language code conversion for Hebrew, Yiddish and Indonesian ("he"/"iw", "yi"/"ji", "id"/"in"), among others. If one needs a Locale, it is simple to create one based on this object.
    The ICU ULocale cannot be used because a) it has issues too (for our use case) and b) we're not using ICU in here [yet].
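    Creating a java.util.Locale from the three parts could look like the sketch below, which uses the JDK's Locale.Builder. The class JavaLocaleConversion and its toJavaLocale method are hypothetical. Locale.Builder throws IllformedLocaleException for syntactically invalid parts, and what getLanguage() returns for legacy codes such as "he" depends on the JDK version.

```java
import java.util.Locale;

/** Hypothetical helper: builds a java.util.Locale from already-validated parts. */
final class JavaLocaleConversion {
    private JavaLocaleConversion() {}

    // script and region may be null when absent.
    static Locale toJavaLocale(String language, String script, String region) {
        Locale.Builder builder = new Locale.Builder().setLanguage(language);
        if (script != null) {
            builder.setScript(script);  // 4 ASCII letters, e.g. "Latn"
        }
        if (region != null) {
            builder.setRegion(region);  // 2 letters ("CH") or 3 digits ("150")
        }
        // The Locale may normalize or remap codes, which is why it is only a derived view here.
        return builder.build();
    }
}
```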

    This class does not modify the input. The input is used as is, and the getters return it exactly as it was given. No standardization, canonicalization, or cleaning is performed.
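    For contrast, java.util.Locale does canonicalize case when it parses a language tag. The small sketch below (class name hypothetical) round-trips a tag through Locale.forLanguageTag and Locale.toLanguageTag; the class described here would instead keep, or reject, the input exactly as given.

```java
import java.util.Locale;

/** Hypothetical demo: java.util.Locale normalizes the case of each subtag. */
final class CanonicalizationDemo {
    private CanonicalizationDemo() {}

    static String viaJavaLocale(String tag) {
        // forLanguageTag parses case-insensitively; toLanguageTag emits the canonical case.
        return Locale.forLanguageTag(tag).toLanguageTag();
    }
}
```

    For example, "DE-latn-ch" comes back as "de-Latn-CH".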

    The input is validated syntactically, but not for code existence. For example, the script code must look like a valid ISO 15924 code such as "Latn" or "Cyrl", in the correct case, but whether the code actually exists is not checked. These code standards are not fixed: regional entities like countries can change for political reasons, and languages are living entities. Therefore certain codes may exist only at some point in time (they may be introduced late, deprecated, removed, or even re-assigned a different meaning). It is not up to us to decide whether Kosovo is a country in 2015 or not. If one needs to work only with a certain range of acceptable codes, they can validate the codes through other classes that have knowledge about the codes.
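    One way to do such an existence check outside the locale class is sketched below, assuming that the ISO 3166 alpha-2 codes the running JDK reports through Locale.getISOCountries() are an acceptable source of truth. The class KnownRegions is hypothetical. That list changes between JDK versions and contains no UN M.49 numeric codes such as "150", so a real validator may need another data source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical existence check kept outside the locale class: accepts only
 * region codes that the running JDK lists as ISO 3166 alpha-2 countries.
 */
final class KnownRegions {
    private static final Set<String> ISO_COUNTRIES =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList(Locale.getISOCountries())));

    private KnownRegions() {}

    // Case-sensitive: the JDK list is upper case, so "de" is not accepted.
    static boolean isKnownCountry(String region) {
        return ISO_COUNTRIES.contains(region);
    }
}
```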

    Language: as in BCP 47, the ISO 639-1 code must be used if there is one, for example "fr" for French. If not, the ISO 639-3 code should be used. Using ISO 639-2 codes is strongly discouraged. Right now this class enforces a 2- or 3-character code, but this may be relaxed in the future.

    Script: Only ISO 15924, no discussion.

    Region: same as for BCP 47, which means ISO 3166-1 alpha-2 or UN M.49. I can imagine relaxing it in the future to also allow ISO 3166-2 codes. In most cases the "region" is a "country".
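    The syntactic rules above can be sketched as regular expressions. The exact patterns are assumptions read off this description (lower-case 2- or 3-letter language, title-case 4-letter script, upper-case alpha-2 or 3-digit region), not the code of the private looksLikeScriptCode, looksLikeGeoCode3166_1 and looksLikeGeoCodeNumeric methods; the class TagSyntax and its parse method are hypothetical.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical syntactic parser for "language[-Script][-REGION]".
 * The patterns are assumptions based on the documented rules, not the real implementation.
 */
final class TagSyntax {
    private static final Pattern LANGUAGE = Pattern.compile("[a-z]{2,3}");  // ISO 639-1/639-3, assumed lower case
    private static final Pattern SCRIPT = Pattern.compile("[A-Z][a-z]{3}"); // ISO 15924, e.g. "Latn"
    private static final Pattern ALPHA2 = Pattern.compile("[A-Z]{2}");      // ISO 3166-1 alpha-2, e.g. "CH"
    private static final Pattern NUMERIC = Pattern.compile("[0-9]{3}");     // UN M.49, e.g. "150"

    private TagSyntax() {}

    /**
     * Splits a tag into {language, script, region}; absent parts are null.
     * Only the shape of each subtag is checked, never whether the code exists.
     */
    static String[] parse(String tag) {
        String[] parts = tag.split("-", -1);  // limit -1 keeps empty trailing subtags, so "de-" is rejected
        if (!LANGUAGE.matcher(parts[0]).matches()) {
            throw new IllegalArgumentException("Invalid language subtag in: " + tag);
        }
        int i = 1;
        String script = null;
        if (i < parts.length && SCRIPT.matcher(parts[i]).matches()) {
            script = parts[i++];
        }
        String region = null;
        if (i < parts.length
                && (ALPHA2.matcher(parts[i]).matches() || NUMERIC.matcher(parts[i]).matches())) {
            region = parts[i++];
        }
        if (i != parts.length) {
            throw new IllegalArgumentException("Unexpected or malformed subtag in: " + tag);
        }
        return new String[] {parts[0], script, region};
    }
}
```

    For example, parse("de-Latn-CH") yields {"de", "Latn", "CH"}, while "de-latn" is rejected because the script subtag is not in title case.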

    • Field Summary

      Fields 
      Modifier and Type Field Description
      private @NotNull java.lang.String language  
      private @NotNull com.google.common.base.Optional<java.lang.String> region  
      private @NotNull com.google.common.base.Optional<java.lang.String> script  
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      private LdLocale(@NotNull java.lang.String language, @NotNull com.google.common.base.Optional<java.lang.String> script, @NotNull com.google.common.base.Optional<java.lang.String> region)  
    • Field Detail

      • language

        private final @NotNull java.lang.String language
      • script

        private final @NotNull com.google.common.base.Optional<java.lang.String> script
      • region

        private final @NotNull com.google.common.base.Optional<java.lang.String> region
    • Constructor Detail

      • LdLocale

        private LdLocale(@NotNull java.lang.String language,
                         @NotNull com.google.common.base.Optional<java.lang.String> script,
                         @NotNull com.google.common.base.Optional<java.lang.String> region)
    • Method Detail

      • fromString

        public static @NotNull LdLocale fromString(@NotNull java.lang.String string)
        Parses a locale string such as "de", "de-Latn", "de-CH" or "de-Latn-CH".
        Parameters:
        string - The output of the toString() method.
        Returns:
        either a new or possibly a cached (immutable) instance.
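        Since instances are immutable, a factory may hand out cached instances. A minimal sketch of such a cache, assuming a ConcurrentHashMap keyed by the exact input string, is shown below; the class InstanceCache is hypothetical and not necessarily how fromString caches. Keying on the raw input matches the rule that no canonicalization is performed.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical cache in front of a parsing factory. Sharing instances is only
 * safe because the cached values are immutable.
 */
final class InstanceCache<T> {
    private final ConcurrentMap<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> parser;

    InstanceCache(Function<String, T> parser) {
        this.parser = parser;
    }

    // Returns the cached instance for this exact input string, parsing it on first use.
    T get(String input) {
        return cache.computeIfAbsent(input, parser);
    }
}
```

        This cache is unbounded. If the parser throws, computeIfAbsent records no mapping, so invalid input is never cached.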
      • looksLikeScriptCode

        private static boolean looksLikeScriptCode(java.lang.String string)
      • looksLikeGeoCode3166_1

        private static boolean looksLikeGeoCode3166_1(java.lang.String string)
      • looksLikeGeoCodeNumeric

        private static boolean looksLikeGeoCodeNumeric(java.lang.String string)
      • assignLang

        private static java.lang.String assignLang(java.lang.String s)
      • toString

        public java.lang.String toString()
        The output can be passed back to the fromString(java.lang.String) method.
        Overrides:
        toString in class java.lang.Object
        Returns:
        for example "de", "de-Latn", "de-CH" or "de-Latn-CH"; see the class header.
      • getLanguage

        public @NotNull java.lang.String getLanguage()
        Returns:
        ISO 639-1 or 639-3 language code, e.g. "fr" or "gsw"; see the class header.
      • getScript

        public @NotNull com.google.common.base.Optional<java.lang.String> getScript()
        Returns:
        ISO 15924 script code, e.g. "Latn"; see the class header.
      • getRegion

        public @NotNull com.google.common.base.Optional<java.lang.String> getRegion()
        Returns:
        ISO 3166-1 alpha-2 or UN M.49 code, e.g. "DE" or "150"; see the class header.
      • equals

        public boolean equals(java.lang.Object o)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object