Class UnitFormat

java.lang.Object
java.text.Format
org.apache.sis.measure.UnitFormat
All Implemented Interfaces:
Serializable, Cloneable, javax.measure.format.UnitFormat, Localized

public class UnitFormat extends Format implements javax.measure.format.UnitFormat, Localized
Parses and formats units of measurement as SI symbols, URIs in the OGC namespace or other symbols. This class combines in a single class the API from java.text and the API from javax.measure.format. In addition to the symbols of the Système international (SI), this class can also handle some symbols found in Well Known Text (WKT) definitions or in XML files.

Parsing authority codes

If a character sequence given to the parse(CharSequence) method is of the form "EPSG:####", "urn:ogc:def:uom:EPSG::####" or "http://www.opengis.net/def/uom/EPSG/0/####" (ignoring case and whitespace around path separators), then "####" is parsed as an integer and forwarded to the Units.valueOfEPSG(int) method.
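
The recognition of the three forms above can be illustrated with a minimal sketch. The class and method names (AuthorityCodeSketch, epsgCode) are hypothetical and this is not the Apache SIS implementation; it only shows one way to extract the integer that would then be given to Units.valueOfEPSG(int).

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Apache SIS. */
final class AuthorityCodeSketch {
    /** The three documented forms, matched without regard to case. */
    private static final Pattern EPSG = Pattern.compile(
            "(?:EPSG:|urn:ogc:def:uom:EPSG::|http://www\\.opengis\\.net/def/uom/EPSG/0/)(\\d{1,9})",
            Pattern.CASE_INSENSITIVE);

    /** Returns the EPSG code found in the given text, or an empty value if the text has another form. */
    static OptionalInt epsgCode(CharSequence text) {
        // Remove whitespace around the ':' and '/' path separators before matching.
        String compact = text.toString().trim().replaceAll("\\s*([:/])\\s*", "$1");
        Matcher m = EPSG.matcher(compact);
        return m.matches() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }
}
```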

Note on netCDF unit symbols

In netCDF files, values of "unit" attribute are concatenations of an angular unit with an axis direction, as in "degrees_east" or "degrees_north". This class ignores those suffixes and unconditionally returns Units.DEGREE for all axis directions.
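
As a rough sketch of what ignoring the suffix means, the hypothetical helper below strips the two direction suffixes quoted above and leaves every other name unchanged. The class name, the method name and the limitation to those two suffixes are assumptions made for this illustration.

```java
/** Hypothetical helper for illustration only; not part of Apache SIS. */
final class NetcdfDegreesSketch {
    /** Only the two suffixes quoted in the documentation are handled in this sketch. */
    private static final String[] DIRECTIONS = {"_east", "_north"};

    /** Returns the unit name without its axis direction, or the name unchanged if there is none. */
    static String stripAxisDirection(String unit) {
        for (String suffix : DIRECTIONS) {
            if (unit.startsWith("degree") && unit.endsWith(suffix)) {
                return unit.substring(0, unit.length() - suffix.length());
            }
        }
        return unit;
    }
}
```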

Multi-threading

UnitFormat is generally not thread-safe. If units need to be parsed or formatted in different threads, each thread should have its own UnitFormat instance.
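
One common way to give each thread its own instance is a ThreadLocal. The sketch below is generic over java.text.Format so that it compiles with the standard library alone; the PerThreadFormat class is hypothetical. With Apache SIS on the classpath, the factory could be a lambda such as () -> new UnitFormat(Locale.US).

```java
import java.text.Format;
import java.util.function.Supplier;

/** Hypothetical holder giving each thread its own Format instance; not part of Apache SIS. */
final class PerThreadFormat<F extends Format> {
    private final ThreadLocal<F> instances;

    /** The factory is invoked at most once per thread, on first use in that thread. */
    PerThreadFormat(Supplier<? extends F> factory) {
        instances = ThreadLocal.withInitial(factory);
    }

    /** Returns the instance reserved for the calling thread. */
    F get() {
        return instances.get();
    }
}
```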
Since:
0.8
Version:
1.3
  • Field Details

    • serialVersionUID

      private static final long serialVersionUID
      For cross-version compatibility.
    • PARSE_AUTHORITY_CODES

      private static final boolean PARSE_AUTHORITY_CODES
      Whether the parsing of authority codes such as "EPSG:9001" is allowed.
    • DEGREES

      private static final String DEGREES
      The unit name for degrees (not necessarily angular), to be handled in a special way. Must contain only ASCII lower case letters ([a … z]).
    • UNITY

      private static final String UNITY
      The unit name for dimensionless unit.
    • INSTANCE

      static final UnitFormat INSTANCE
      The default instance used by Units.valueOf(String) for parsing units of measurement. While UnitFormat is generally not thread-safe, this particular instance is safe if we never invoke any setter method and we do not format with UnitFormat.Style.NAME.
    • locale

      private Locale locale
      The locale specified at construction time or modified by setLocale(Locale).
    • style

      private UnitFormat.Style style
      Whether this UnitFormat should format long names like "metre" or use unit symbols.
    • unitToLabel

      private final Map<javax.measure.Unit<?>,String> unitToLabel
      Symbols or names to use for formatting units in place of the default unit symbols or names. The Unit instances are the ones specified by the user in calls to label(Unit, String).
    • labelToUnit

      private final Map<String,javax.measure.Unit<?>> labelToUnit
      Units associated with a given label (in addition to the system-wide UnitRegistry). This map is the converse of unitToLabel. The Unit instances may differ from the ones specified by the user since AbstractUnit.symbol may have been set to the label specified by the user. The labels may contain some characters normally not allowed in unit symbols, like white spaces.
    • symbolToName

      private transient volatile ResourceBundle symbolToName
      The mapping from unit symbols to long localized names. Those resources are locale-dependent and loaded when first needed.
    • nameToUnit

      private transient volatile Map<String,javax.measure.Unit<?>> nameToUnit
      Mapping from long localized and unlocalized names to unit instances. This map is used only for parsing and created when first needed.
    • SHARED

      private static final WeakValueHashMap<Locale,Map<String,javax.measure.Unit<?>>> SHARED
      Cached values of nameToUnit, to avoid loading the same information many times and to save memory if the user creates many UnitFormat instances. Note that we do not cache symbolToName because ResourceBundle already provides its own caching mechanism.
  • Constructor Details

    • UnitFormat

      private UnitFormat()
      Creates the unique INSTANCE.
    • UnitFormat

      public UnitFormat(Locale locale)
      Creates a new format for the given locale.
      Parameters:
      locale - the locale to use for parsing and formatting units.
  • Method Details

    • getLocale

      public Locale getLocale()
      Returns the locale used by this UnitFormat.
      Specified by:
      getLocale in interface Localized
      Returns:
      the locale of this UnitFormat.
    • setLocale

      public void setLocale(Locale locale)
      Sets the locale that this UnitFormat will use for long names. For example, a call to setLocale(Locale.US) instructs this formatter to use the “meter” spelling instead of “metre”.
      Parameters:
      locale - the new locale for this UnitFormat.
    • isLocaleSensitive

      public boolean isLocaleSensitive()
      Returns whether this UnitFormat depends on the Locale given at construction time for performing its tasks. This method returns true if formatting long names (e.g. “metre” or “meter”) and false if formatting only the unit symbol (e.g. “m”).
      Specified by:
      isLocaleSensitive in interface javax.measure.format.UnitFormat
      Returns:
      true if formatting depends on the locale.
    • getStyle

      public UnitFormat.Style getStyle()
      Returns whether unit formatting uses ASCII symbols, Unicode symbols or full localized names.
      Returns:
      the style of units formatted by this UnitFormat instance.
    • setStyle

      public void setStyle(UnitFormat.Style style)
      Sets whether unit formatting should use ASCII symbols, Unicode symbols or full localized names.
      Parameters:
      style - the desired style of units.
    • label

      public void label(javax.measure.Unit<?> unit, String label)
      Attaches a label to the specified unit. A label can be a substitute for either the unit symbol or the unit name, depending on the format style. If the specified label is already associated with another unit, then the previous association is discarded.

      Restriction on character set

      Current implementation accepts only letters, subscripts, spaces (including non-breaking spaces but not CR/LF characters), the degree sign (°) and a few other characters like underscore. The set of legal characters may be expanded in future Apache SIS versions, but the following restrictions are likely to remain:
      • The following characters are reserved since they have special meaning in UCUM format, in URI or in Apache SIS parser:
        " # ( ) * + - . / : = ? [ ] { } ^ ⋅ ∕
      • The symbol cannot begin or end with digits, since such digits would be confused with unit power.
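
      The two restrictions listed above can be checked with a short sketch. The LabelCheckSketch class is hypothetical and covers only those two rules, not the narrower set of characters accepted by the current implementation. Treating "digit" as ASCII 0 to 9 and rejecting the empty string are assumptions of this sketch.

```java
/** Hypothetical check of the two lasting restrictions on labels; not part of Apache SIS. */
final class LabelCheckSketch {
    /** The reserved characters listed above; \u22C5 is the dot operator and \u2215 the division slash. */
    private static final String RESERVED = "\"#()*+-./:=?[]{}^\u22C5\u2215";

    /** Returns true if the label has no reserved character and neither begins nor ends with a digit. */
    static boolean satisfiesLastingRestrictions(String label) {
        if (label.isEmpty()) {
            return false;                               // Assumption: an empty label is never valid.
        }
        if (isAsciiDigit(label.charAt(0)) || isAsciiDigit(label.charAt(label.length() - 1))) {
            return false;                               // Would be confused with a unit power.
        }
        for (int i = 0; i < label.length(); i++) {
            if (RESERVED.indexOf(label.charAt(i)) >= 0) {
                return false;                           // Reserved by UCUM, URI or the parser.
            }
        }
        return true;
    }

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```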
      Specified by:
      label in interface javax.measure.format.UnitFormat
      Parameters:
      unit - the unit being labeled.
      label - the new label for the given unit.
      Throws:
      IllegalArgumentException - if the given label is not a valid unit name.
    • getBundle

      static ResourceBundle getBundle(Locale locale)
      Loads the UnitNames resource bundle for the given locale.
    • symbolToName

      private ResourceBundle symbolToName()
      Returns the mapping from unit symbols to long localized names. This mapping is loaded when first needed and memorized as long as the locale does not change.
    • fromName

      private javax.measure.Unit<?> fromName(String uom)
      Returns the unit instance for the given long (un)localized name. This method is somewhat the converse of symbolToName(), but also recognizes international and American spellings of unit names in addition to localized names. The intent is to recognize "meter" as well as "metre".

      While we said that UnitFormat is not thread-safe, we make an exception for this method in order to allow the singleton INSTANCE to parse symbols in a multi-threaded environment.

      Parameters:
      uom - the unit symbol, without leading or trailing spaces.
      Returns:
      the unit for the given name, or null if unknown.
    • copy

      private static void copy(Locale locale, ResourceBundle symbolToName, Map<String,javax.measure.Unit<?>> nameToUnit)
      Copies all entries from the given "symbols to names" mapping to the given "names to units" mapping. During this copy, keys are converted from symbols to names and values are converted from symbols to Unit instances. We use Unit values instead of their symbols because all Unit instances are created at Units class initialization anyway (so we do not create new instances here), and it avoids retaining references to the String instances loaded by the resource bundle.
    • format

      public Appendable format(javax.measure.Unit<?> unit, Appendable toAppendTo) throws IOException
      Formats the specified unit. This method performs the first of the following actions that can be done.
      1. If a label has been specified for the given unit, then that label is appended unconditionally.
      2. Otherwise if the formatting style is UnitFormat.Style.NAME and the Unit.getName() method returns a non-null value, then that value is appended. Unit instances implemented by Apache SIS are handled in a special way for localizing the name according to the locale specified to this format.
      3. Otherwise if the Unit.getSymbol() method returns a non-null value, then that value is appended.
      4. Otherwise a default symbol is created from the entries returned by Unit.getBaseUnits().
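
      The order of the four steps above can be summarized by the following sketch. FormatOrderSketch and its parameters are hypothetical stand-ins: the strings represent the label, the value of Unit.getName() and the value of Unit.getSymbol(), and the supplier represents the symbol built from Unit.getBaseUnits(). The localization of names is not shown.

```java
import java.util.function.Supplier;

/** Hypothetical summary of the selection order; not the Apache SIS implementation. */
final class FormatOrderSketch {
    static String choose(String label, boolean nameStyle, String name,
                         String symbol, Supplier<String> fromBaseUnits)
    {
        if (label != null) {
            return label;               // 1. A user-specified label always wins.
        }
        if (nameStyle && name != null) {
            return name;                // 2. Long name, only with the NAME style.
        }
        if (symbol != null) {
            return symbol;              // 3. The symbol of the unit.
        }
        return fromBaseUnits.get();     // 4. Symbol created from the base units.
    }
}
```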
      Specified by:
      format in interface javax.measure.format.UnitFormat
      Parameters:
      unit - the unit to format.
      toAppendTo - where to format the unit.
      Returns:
      the given toAppendTo argument, for method call chaining.
      Throws:
      IOException - if an error occurred while writing to the destination.
    • formatComponents

      static void formatComponents(Map<?,? extends Number> components, UnitFormat.Style style, Appendable toAppendTo) throws IOException
      Creates a new symbol (e.g. "m/s") from the given symbols and factors. Keys in the given map can be either Unit or Dimension instances. Values in the given map are either Integer or Fraction instances.
      Parameters:
      components - the components of the symbol to format.
      style - whether to allow Unicode characters.
      toAppendTo - where to write the symbol.
      Throws:
      IOException
    • formatComponent

      private static void formatComponent(Map.Entry<?,? extends Number> entry, boolean inverse, UnitFormat.Style style, Appendable toAppendTo) throws IOException
      Formats a single unit or dimension raised to the given power.
      Parameters:
      entry - the base unit or base dimension to format, together with its power.
      inverse - true for inverting the power sign.
      style - whether to allow Unicode characters.
      Throws:
      IOException
    • formatSymbol

      private static void formatSymbol(Object base, UnitFormat.Style style, Appendable toAppendTo) throws IOException
      Appends the symbol for the given base unit or base dimension, or "?" if no symbol was found. If the given object is a unit, then it should be an instance of SystemUnit.
      Parameters:
      base - the base unit or base dimension to format.
      style - whether to allow Unicode characters.
      toAppendTo - where to append the symbol.
      Throws:
      IOException
    • format

      public StringBuffer format(Object unit, StringBuffer toAppendTo, FieldPosition pos)
      Formats the specified unit in the given buffer. This method delegates to format(Unit, Appendable).
      Specified by:
      format in class Format
      Parameters:
      unit - the unit to format.
      toAppendTo - where to format the unit.
      pos - where to store the position of a formatted field, or null if none.
      Returns:
      the given toAppendTo argument, for method call chaining.
    • format

      public String format(javax.measure.Unit<?> unit)
      Formats the given unit. This method delegates to format(Unit, Appendable).
      Specified by:
      format in interface javax.measure.format.UnitFormat
      Parameters:
      unit - the unit to format.
      Returns:
      the formatted unit.
    • exponentOperator

      private static int exponentOperator(CharSequence symbols, int i, int length)
      Returns 0 or 1 if the '*' character at the given index stands for exponentiation instead of multiplication, or a negative value if the character stands for multiplication. This check is used for heuristic rules at parsing time. Current implementation applies the following rules:
      • The operation is presumed an exponentiation if the '*' symbol is doubled, as in "m**s-1".
      • The operation is presumed an exponentiation if it is surrounded by digits or a sign on its right side. Example: "10*-6", which means 1E-6 in UCUM syntax.
      • All other cases are currently presumed multiplication. Example: "m*s".
      Returns:
      -1 for parsing as a multiplication, or a positive value for exponentiation. If positive, this is the number of characters in the exponent symbol minus 1.
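
      The rules above can be sketched as follows. This is one reading of the documented heuristic, not the Apache SIS source: "surrounded by digits or a sign on its right side" is taken to mean a digit before the '*' and a digit or a sign after it, and the isDigit and isSign helpers accept only ASCII characters. Both points are assumptions.

```java
/** Hypothetical re-implementation of the documented heuristic, for illustration only. */
final class ExponentOperatorSketch {
    /** Returns 1 for "**", 0 for a single '*' used as exponent, or -1 for a multiplication. */
    static int exponentOperator(CharSequence symbols, int i, int length) {
        if (i + 1 < length) {
            char next = symbols.charAt(i + 1);
            if (next == '*') {
                return 1;                       // "**": one extra character to skip.
            }
            boolean digitBefore = (i > 0) && isDigit(symbols.charAt(i - 1));
            if (digitBefore && (isDigit(next) || isSign(next))) {
                return 0;                       // As in "10*-6".
            }
        }
        return -1;                              // As in "m*s".
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isSign(char c) {
        return c == '+' || c == '-';
    }
}
```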
    • isDecimalSeparator

      private static boolean isDecimalSeparator(CharSequence symbols, int i, int length)
      Returns true if the '.' character at the given index is surrounded by digits or is at the beginning or the end of the character sequence. This check is used for heuristic rules.
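
      A possible reading of this rule is sketched below: each side of the '.' must be either a digit or an end of the sequence. The actual implementation may combine the conditions differently, and the ASCII-only isDigit helper is an assumption of this sketch.

```java
/** Hypothetical re-implementation of the documented heuristic, for illustration only. */
final class DecimalSeparatorSketch {
    static boolean isDecimalSeparator(CharSequence symbols, int i, int length) {
        boolean leftOk  = (i == 0)          || isDigit(symbols.charAt(i - 1));
        boolean rightOk = (i + 1 >= length) || isDigit(symbols.charAt(i + 1));
        return leftOk && rightOk;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```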
    • isDigit

      private static boolean isDigit(int c)
      Returns true if the given character is a digit in the sense of the UnitFormat parser. Note that "digit" is taken here in a much more restrictive way than Character.isDigit(int).

      A return value of true guarantees that the given character is in the Basic Multilingual Plane (BMP). Consequently, the c argument value does not need to be the result of String.codePointAt(int); the result of String.charAt(int) is sufficient. We nevertheless use the int type for avoiding the need to cast if caller uses code points for another reason.

    • isSign

      private static boolean isSign(int c)
      Returns true if the given character is the sign of a number according to the UnitFormat parser. A return value of true guarantees that the given character is in the Basic Multilingual Plane (BMP). Consequently, the c argument value does not need to be the result of String.codePointAt(int).
    • isDivisor

      private static boolean isDivisor(int c)
      Returns true if the given character is the sign of a division operator. A return value of true guarantees that the given character is in the Basic Multilingual Plane (BMP). Consequently, the c argument value does not need to be the result of String.codePointAt(int).
    • hasDigit

      private static boolean hasDigit(CharSequence symbol, int lower, int upper)
      Returns true if the given character sequence contains at least one digit. This is a hack for recognizing units like "100 feet" (in principle not legal, but seen in practice). This verification has some value if digits are not allowed as unit label or symbol.
    • finish

      private static void finish(ParsePosition pos)
      Reports that the parsing is finished and no more content should be parsed. This method is invoked when the last parsed term is possibly one or more words instead of unit symbols. The intent is to avoid trying to parse "degree minute" as "degree × minute". By contrast, this method is not invoked if the string to parse is "m kg**-2" because it can be interpreted as "m × kg**-2".
    • parse

      public javax.measure.Unit<?> parse(CharSequence symbols) throws javax.measure.format.ParserException
      Parses the given text as an instance of Unit. If the parse completes without reading the entire length of the text, an exception is thrown.

      The parsing is lenient: symbols can be products or quotients of units like “m∕s”, words like “meters per second”, or authority codes like "urn:ogc:def:uom:EPSG::1026". The product operator can be either '.' (ASCII) or '⋅' (Unicode) character. Exponent after symbol can be decimal digits as in “m2” or a superscript as in “m²”.

      This method differs from parse(CharSequence, ParsePosition) in the treatment of white spaces: that method with a ParsePosition argument stops parsing at the first white space, while this parse(…) method treats white spaces as multiplications. The reason for this difference is that white space is normally not a valid multiplication symbol; it could be followed by a text which is not part of the unit symbol. But in the case of this parse(CharSequence) method, the whole CharSequence shall be a unit symbol. In such case, white spaces are less ambiguous.

      The default implementation delegates to parse(symbols, new ParsePosition(0)) and verifies that all non-white characters have been parsed. Units separated by spaces are multiplied; for example "kg m**-2" is parsed as kg/m².

      Specified by:
      parse in interface javax.measure.format.UnitFormat
      Parameters:
      symbols - the unit symbols or URI to parse.
      Returns:
      the unit parsed from the specified symbols.
      Throws:
      javax.measure.format.ParserException - if a problem occurred while parsing the given symbols.
    • parse

      public javax.measure.Unit<?> parse(CharSequence symbols, ParsePosition position) throws javax.measure.format.ParserException
      Parses a portion of the given text as an instance of Unit. Parsing begins at the index given by ParsePosition.getIndex(). After parsing, the above-cited index is updated to the first unparsed character.

      The parsing is lenient: symbols can be products or quotients of units like “m∕s”, words like “meters per second”, or authority codes like "urn:ogc:def:uom:EPSG::1026". The product operator can be either '.' (ASCII) or '⋅' (Unicode) character. Exponent after symbol can be decimal digits as in “m2” or a superscript as in “m²”.

      Note that unlike parseObject(String, ParsePosition), this method never returns null. If an error occurs at parsing time, an unchecked ParserException is thrown.

      Parameters:
      symbols - the unit symbols to parse.
      position - on input, index of the first character to parse. On output, index after the last parsed character.
      Returns:
      the unit parsed from the specified symbols.
      Throws:
      javax.measure.format.ParserException - if a problem occurred while parsing the given symbols.
    • parseTerm

      private javax.measure.Unit<?> parseTerm(CharSequence symbols, int lower, int upper, UnitFormat.Operation operation) throws javax.measure.format.ParserException
      Parses a single unit symbol with its exponent. The given symbol shall not contain multiplication or division operator except in exponent. Parsing of fractional exponent as in "m2/3" is supported; other operations in the exponent will cause an exception to be thrown.
      Parameters:
      symbols - the complete string specified by the user.
      lower - index where to begin parsing in the symbols string.
      upper - index after the last character to parse in the symbols string.
      operation - the operation to be applied (e.g. the term to be parsed is a multiplier or divisor of another unit).
      Returns:
      the parsed unit symbol (never null).
      Throws:
      javax.measure.format.ParserException - if a problem occurred while parsing the given symbols.
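
      To illustrate the fractional exponent case, the sketch below splits a term such as "m2/3" into a symbol, a numerator and a denominator. The FractionalTermSketch class, its regular expression and the default power of 1 are assumptions made for this example. It handles only ASCII digits and does not reflect how Apache SIS represents the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical holder for a symbol and its power; not part of Apache SIS. */
final class FractionalTermSketch {
    /** A symbol followed by an optionally signed integer and an optional "/denominator". */
    private static final Pattern TERM =
            Pattern.compile("(.*?[^0-9+\\-/])([+-]?[0-9]+)(?:/([0-9]+))?");

    final String symbol;
    final int numerator;
    final int denominator;

    private FractionalTermSketch(String symbol, int numerator, int denominator) {
        this.symbol      = symbol;
        this.numerator   = numerator;
        this.denominator = denominator;
    }

    /** Splits the given term; a term without trailing exponent is given the power 1/1. */
    static FractionalTermSketch split(String term) {
        Matcher m = TERM.matcher(term);
        if (!m.matches()) {
            return new FractionalTermSketch(term, 1, 1);
        }
        int denominator = (m.group(3) != null) ? Integer.parseInt(m.group(3)) : 1;
        if (denominator == 0) {
            throw new IllegalArgumentException("Zero denominator in exponent: " + term);
        }
        return new FractionalTermSketch(m.group(1), Integer.parseInt(m.group(2)), denominator);
    }
}
```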
    • parseMultiplicationFactor

      private static double parseMultiplicationFactor(String term) throws NumberFormatException
      Parses a multiplication factor, which may be a single number or a base raised to an exponent. For example, all the following strings are equivalent: "1000", "1000.0", "1E3", "10*3", "10^3", "10³".
      Throws:
      NumberFormatException
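
      The equivalence of the six strings above can be reproduced with the following sketch. It is a simplified stand-in rather than the Apache SIS code: the FactorSketch class is hypothetical, only integer exponents are handled, and the table of superscript characters is an assumption of this example.

```java
/** Hypothetical parser of multiplication factors, for illustration only. */
final class FactorSketch {
    /** Superscript digits 0 to 9 followed by the superscript minus sign, in Unicode escapes. */
    private static final String SUPERSCRIPTS =
            "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079\u207B";

    /** ASCII characters at the same positions as in SUPERSCRIPTS. */
    private static final String ASCII = "0123456789-";

    static double parseMultiplicationFactor(String term) throws NumberFormatException {
        // Forms with an explicit operator: "10*3" or "10^3".
        int s = Math.max(term.lastIndexOf('*'), term.lastIndexOf('^'));
        if (s >= 0) {
            double base = Double.parseDouble(term.substring(0, s));
            return Math.pow(base, Integer.parseInt(term.substring(s + 1)));
        }
        // Form with a superscript exponent: a base followed by superscript characters.
        int end = term.length();
        int start = end;
        while (start > 0 && SUPERSCRIPTS.indexOf(term.charAt(start - 1)) >= 0) {
            start--;
        }
        if (start > 0 && start < end) {
            StringBuilder exponent = new StringBuilder();
            for (int i = start; i < end; i++) {
                exponent.append(ASCII.charAt(SUPERSCRIPTS.indexOf(term.charAt(i))));
            }
            double base = Double.parseDouble(term.substring(0, start));
            return Math.pow(base, Integer.parseInt(exponent.toString()));
        }
        // Plain numbers: "1000", "1000.0" or "1E3".
        return Double.parseDouble(term);
    }
}
```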
    • parseObject

      public Object parseObject(String source) throws ParseException
      Parses text from a string to produce a unit. The default implementation delegates to parse(CharSequence) and wraps the ParserException into a ParseException for compatibility with java.text API.
      Overrides:
      parseObject in class Format
      Parameters:
      source - the text, part of which should be parsed.
      Returns:
      a unit parsed from the string.
      Throws:
      ParseException - if the given string cannot be fully parsed.
    • parseObject

      public Object parseObject(String source, ParsePosition pos)
      Parses text from a string to produce a unit, or returns null if the parsing failed. The default implementation delegates to parse(CharSequence, ParsePosition) and catches the ParserException.
      Specified by:
      parseObject in class Format
      Parameters:
      source - the text, part of which should be parsed.
      pos - index and error index information as described above.
      Returns:
      a unit parsed from the string, or null in case of error.
    • clone

      public UnitFormat clone()
      Returns a clone of this unit format. The new unit format will be initialized to the same locale and labels as this format.
      Overrides:
      clone in class Format
      Returns:
      a clone of this unit format.
    • clone

      private static Object clone(Map<?,?> value)
      Clones the given map, which can be either a HashMap or the instance returned by Collections.emptyMap().
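
      A minimal sketch of such a helper is given below, assuming that only those two kinds of map are ever given. The MapCloneSketch class and the cloneMap name are hypothetical. The empty map is returned as-is because the instance returned by Collections.emptyMap() is immutable and can be shared safely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Apache SIS. */
final class MapCloneSketch {
    static Object cloneMap(Map<?,?> value) {
        if (value instanceof HashMap<?,?>) {
            return ((HashMap<?,?>) value).clone();      // Shallow copy: keys and values are shared.
        }
        return value;                                   // Immutable empty map: no copy needed.
    }
}
```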