Class Transliterator

java.lang.Object
org.apache.sis.io.wkt.Transliterator
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
Transliterator.Default, Transliterator.Unicode

public abstract class Transliterator extends Object implements Serializable
Controls the replacement of characters, abbreviations and names between the objects in memory and their WKT representations. The mapping is not necessarily one-to-one; for example, the replacement of a Unicode character by an ASCII character may not be reversible. The mapping may also depend on the element to transliterate; for example, some Greek letters like φ, λ and θ are mapped differently when they are used as mathematical symbols in axis abbreviations rather than in text. Some mappings may also apply to words instead of characters, when the words come from a controlled vocabulary.

Permitted characters in Well Known Text

The ISO 19162 standard restricts Well Known Text to the following characters in all quoted texts except in REMARKS["…"] elements:
A-Z a-z 0-9 _ [ ] ( ) { } < = > . , : ; + - (space) % & ' " * ^ / \ ? | °
They are ASCII codes 32 to 125 inclusive except ! (33), # (35), $ (36), @ (64) and ` (96), plus the addition of ° (176) despite being formally outside the ASCII character set. The only exception to this rule is for the text inside REMARKS["…"] elements, where all Unicode characters are allowed.

The filter(String) method is responsible for replacing or removing characters outside the above-cited set of permitted characters.
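The character set described above can be checked with a few lines of code. The following is a minimal standalone sketch, not part of Apache SIS: the class name PermittedWktChars and its methods are hypothetical and only restate the rule "codes 32 to 125 except 33, 35, 36, 64 and 96, plus 176".

```java
/** Hypothetical helper (not an Apache SIS class): tests characters against the ISO 19162 set. */
final class PermittedWktChars {
    private PermittedWktChars() {}

    /** Returns true if the character is allowed in quoted WKT text outside REMARKS elements. */
    static boolean isPermitted(char c) {
        if (c == '\u00B0') return true;        // degree sign (176), the only addition beyond ASCII
        if (c < 32 || c > 125) return false;   // outside the 32..125 range
        switch (c) {
            case '!': case '#': case '$': case '@': case '`':
                return false;                  // codes 33, 35, 36, 64 and 96 are excluded
            default:
                return true;
        }
    }

    /** Returns true if every character of the given text is permitted. */
    static boolean isPermitted(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isPermitted(text.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPermitted("NTF (Paris)"));                  // true
        System.out.println(isPermitted("Triangulation fran\u00E7aise")); // false, because of the c with cedilla
    }
}
```

A check of this kind only detects offending characters; replacing or removing them is the job of filter(String).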

Application to mathematical symbols

For Greek letters used as mathematical symbols in coordinate axis abbreviations, the ISO 19162 standard recommends:
  • (P, L) as the transliteration of the Greek letters (phi, lambda), or (B, L) from German “Breite” and “Länge” used in academic texts worldwide, or (lat, long).
  • (U) for (θ) in polar coordinate systems.
  • (U, V) for (Ω, θ) in spherical coordinate systems.
Note: at least two conventions exist about the meaning of (r, θ, φ) in a spherical coordinate system (see Wikipedia or MathWorld for more information). When using the mathematics convention, θ is the azimuthal angle in the equatorial plane (roughly equivalent to longitude λ) while φ is an angle measured from a pole (also known as colatitude). When using the physics convention, the meanings of θ and φ are interchanged. Furthermore, some other conventions measure the φ angle from the equatorial plane, like latitude, rather than from the pole. This class does not need to care about the meaning of those angles. The only recommendation is that φ is mapped to U and θ is mapped to V, regardless of their meaning.
The toLatinAbbreviation(…) and toUnicodeAbbreviation(…) methods are responsible for doing the transliteration at formatting and parsing time, respectively.

Replacement of names

The longitude and latitude axis names are explicitly fixed by ISO 19111:2007 to "Geodetic longitude" and "Geodetic latitude". But ISO 19162:2015 §7.5.3(ii) states that the "Geodetic" part in those names shall be omitted at WKT formatting time. The toShortAxisName(…) and toLongAxisName(…) methods are responsible for doing the transliteration at formatting and parsing time, respectively.
Since:
0.6
Version:
1.1
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    private static final class 
    Transliterator.Default
    The DEFAULT implementation.
    private static final class 
    Transliterator.Unicode
    The IDENTITY implementation.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static final Map<org.opengis.referencing.cs.AxisDirection,String>
    CARTESIAN
    Default names to associate to axis directions in a Cartesian coordinate system.
    static final Transliterator
    DEFAULT
    A transliterator compliant with ISO 19162 on a "best effort" basis.
    static final Transliterator
    IDENTITY
    A transliterator that does not perform any replacement.
    private static final long
    serialVersionUID
    For cross-version compatibility.
    (package private) static final int
    SPACES
    A bitmask of control characters that are considered as spaces according to Character.isWhitespace(char).
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    protected
    Transliterator()
    For sub-class constructors.
  • Method Summary

    Modifier and Type
    Method
    Description
    String
    filter(String text)
    Returns a character sequence with the non-ASCII characters replaced or removed.
    private static boolean
    isLatLong(String expected, String name)
    Returns true if the given axis name is at least part of the given expected axis name.
    String
    toLatinAbbreviation(org.opengis.referencing.cs.CoordinateSystem cs, org.opengis.referencing.cs.AxisDirection direction, String abbreviation)
    Returns the axis abbreviation to format in WKT, or null if none.
    String
    toLongAxisName(String csType, org.opengis.referencing.cs.AxisDirection direction, String name)
    Returns the axis name to use in memory for an axis parsed from a WKT.
    String
    toShortAxisName(org.opengis.referencing.cs.CoordinateSystem cs, org.opengis.referencing.cs.AxisDirection direction, String name)
    Returns the axis name to format in WKT, or null if none.
    String
    toUnicodeAbbreviation(String csType, org.opengis.referencing.cs.AxisDirection direction, String abbreviation)
    Returns the axis abbreviation to use in memory for an axis parsed from a WKT.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • serialVersionUID

      private static final long serialVersionUID
      For cross-version compatibility.
    • SPACES

      static final int SPACES
      A bitmask of control characters that are considered as spaces according to Character.isWhitespace(char).
    • CARTESIAN

      private static final Map<org.opengis.referencing.cs.AxisDirection,String> CARTESIAN
      Default names to associate to axis directions in a Cartesian coordinate system. Those names do not apply to other kinds of coordinate systems.

      For thread safety reasons, this map shall not be modified after construction.

    • DEFAULT

      public static final Transliterator DEFAULT
      A transliterator compliant with ISO 19162 on a "best effort" basis. All methods perform the default implementation documented in this Transliterator class.
    • IDENTITY

      public static final Transliterator IDENTITY
      A transliterator that does not perform any replacement. All methods let names, abbreviations and Unicode characters pass through unchanged.
  • Constructor Details

    • Transliterator

      protected Transliterator()
      For sub-class constructors.
  • Method Details

    • filter

      public String filter(String text)
      Returns a character sequence with the non-ASCII characters replaced or removed. For example, this method replaces “ç” by “c” in “Triangulation française”. This operation is usually not reversible; there is no converse method.

      Implementations shall not care about opening or closing quotes. The quotes will be doubled by the caller if needed after this method has been invoked.

      The default implementation invokes CharSequences.toASCII(CharSequence), replaces line feeds and tabulations by single spaces, then removes control characters.
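The three steps named above can be approximated outside of SIS as follows. This is only a sketch under stated assumptions: java.text.Normalizer stands in for CharSequences.toASCII(CharSequence), whose exact replacement rules are not reproduced here. Carriage returns are treated like line feeds, and each line feed or tabulation becomes one space. The class name FilterSketch is hypothetical.

```java
import java.text.Normalizer;

/** Hypothetical stand-in illustrating the documented steps; not the SIS implementation. */
final class FilterSketch {
    private FilterSketch() {}

    static String filter(String text) {
        // Step 1 (assumption): decompose accented letters, then drop the combining marks.
        // This approximates CharSequences.toASCII(CharSequence) for common Latin letters only.
        String ascii = Normalizer.normalize(text, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        StringBuilder sb = new StringBuilder(ascii.length());
        for (int i = 0; i < ascii.length(); i++) {
            char c = ascii.charAt(i);
            if (c == '\n' || c == '\r' || c == '\t') {
                sb.append(' ');                 // Step 2: line breaks and tabulations become spaces
            } else if (!Character.isISOControl(c)) {
                sb.append(c);                   // Step 3: remaining control characters are dropped
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("Triangulation fran\u00E7aise")); // Triangulation francaise
    }
}
```

As stated above, quotes are deliberately left alone in this sketch, since doubling them is the caller's responsibility.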

      Parameters:
      text - the text to format without non-ASCII characters.
      Returns:
      the text to write in Well Known Text.
    • toShortAxisName

      public String toShortAxisName(org.opengis.referencing.cs.CoordinateSystem cs, org.opengis.referencing.cs.AxisDirection direction, String name)
      Returns the axis name to format in WKT, or null if none. This method performs the mapping between the names of axes in memory (designated by "long axis names" in this class) and the names to format in the WKT (designated by "short axis names").
      Note: the "long axis names" are defined by ISO 19111 — referencing by coordinates while the "short axis names" are defined by ISO 19162 — Well-known text representation of coordinate reference systems.
      This method can return null if the name should be omitted. ISO 19162 recommends omitting the axis name when it is already given through the mandatory axis direction.

      The default implementation performs at least the following replacements:

      • Replace “Geodetic latitude” (case insensitive) by “Latitude”.
      • Replace “Geodetic longitude” (case insensitive) by “Longitude”.
      • Return null if the axis direction is AxisDirection.GEOCENTRIC_X, GEOCENTRIC_Y or GEOCENTRIC_Z and the name is the same as the axis direction (ignoring case).
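The three replacements listed above can be sketched as a standalone method. Everything here is hypothetical: the Direction enum is a simplified stand-in for org.opengis.referencing.cs.AxisDirection, and the textual form of a direction (for example "GEOCENTRIC X", compared ignoring case) is an assumption rather than the form used by SIS.

```java
/** Hypothetical illustration of the documented replacements; not the SIS implementation. */
final class ShortAxisNameSketch {
    private ShortAxisNameSketch() {}

    /** Simplified stand-in for org.opengis.referencing.cs.AxisDirection. */
    enum Direction { NORTH, EAST, GEOCENTRIC_X, GEOCENTRIC_Y, GEOCENTRIC_Z }

    /** Returns the name to format in WKT, or null if the name should be omitted. */
    static String toShortAxisName(Direction direction, String name) {
        if (name.equalsIgnoreCase("Geodetic latitude"))  return "Latitude";
        if (name.equalsIgnoreCase("Geodetic longitude")) return "Longitude";
        switch (direction) {
            case GEOCENTRIC_X: case GEOCENTRIC_Y: case GEOCENTRIC_Z:
                // Assumption: the direction is spelled like the enum constant with a space.
                if (name.equalsIgnoreCase(direction.name().replace('_', ' '))) return null;
                break;
            default:
                break;
        }
        return name;    // any other name is left unchanged in this sketch
    }

    public static void main(String[] args) {
        System.out.println(toShortAxisName(Direction.NORTH, "Geodetic latitude"));   // Latitude
        System.out.println(toShortAxisName(Direction.GEOCENTRIC_X, "Geocentric X")); // null
    }
}
```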
      Parameters:
      cs - the enclosing coordinate system, or null if unknown.
      direction - the direction of the axis to format.
      name - the axis name, possibly replaced by this method.
      Returns:
      the axis name to format, or null if the name shall be omitted.
    • toLongAxisName

      public String toLongAxisName(String csType, org.opengis.referencing.cs.AxisDirection direction, String name)
      Returns the axis name to use in memory for an axis parsed from a WKT. Since this method is invoked before the CoordinateSystem instance is created, most coordinate system characteristics are known only as String. In particular the csType argument, if non-null, should be one of the following values:
      "affine", "Cartesian" (note the upper-case "C"), "cylindrical", "ellipsoidal", "linear", "parametric", "polar", "spherical", "temporal" or "vertical"
      This method is the converse of toShortAxisName(CoordinateSystem, AxisDirection, String). The default implementation performs at least the following replacements:
      • Replace “Lat” or “Latitude” (case insensitive) by “Geodetic latitude” or “Spherical latitude”, depending on whether the axis is part of an ellipsoidal or spherical CS respectively.
      • Replace “Lon”, “Long” or “Longitude” (case insensitive) by “Geodetic longitude” or “Spherical longitude”, depending on whether the axis is part of an ellipsoidal or spherical CS respectively.
      • Return “Geocentric X”, “Geocentric Y” and “Geocentric Z” for AxisDirection.GEOCENTRIC_X, GEOCENTRIC_Y and GEOCENTRIC_Z respectively in a Cartesian CS, if the given axis name is only an abbreviation.
      • Use unique camel-case names for axis names defined by ISO 19111 and ISO 19162. For example, this method replaces “ellipsoidal height” by “Ellipsoidal height”.
      Rationale: Axis names are not really free text. They are specified by ISO 19111 and ISO 19162. SIS does not put restriction on axis names, but we nevertheless try to use a unique name when we recognize it.
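The latitude and longitude rules above can be illustrated with a short standalone sketch. It covers only the first two bullet points and leaves out the geocentric and capitalization rules. The class and helper names are hypothetical, and returning the name unchanged for an unknown or null csType is an assumption of this sketch, not documented SIS behavior.

```java
/** Hypothetical illustration of the latitude/longitude rules; not the SIS implementation. */
final class LongAxisNameSketch {
    private LongAxisNameSketch() {}

    static String toLongAxisName(String csType, String name) {
        final String prefix;
        if ("ellipsoidal".equals(csType)) {
            prefix = "Geodetic ";
        } else if ("spherical".equals(csType)) {
            prefix = "Spherical ";
        } else {
            return name;    // assumption: other CS types are not touched by this sketch
        }
        if (equalsAny(name, "Lat", "Latitude"))         return prefix + "latitude";
        if (equalsAny(name, "Lon", "Long", "Longitude")) return prefix + "longitude";
        return name;
    }

    /** Case-insensitive comparison against a list of accepted spellings. */
    private static boolean equalsAny(String name, String... candidates) {
        for (String candidate : candidates) {
            if (name.equalsIgnoreCase(candidate)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(toLongAxisName("ellipsoidal", "Lat"));  // Geodetic latitude
        System.out.println(toLongAxisName("spherical", "Long"));   // Spherical longitude
    }
}
```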
      Parameters:
      csType - the type of the coordinate system, or null if unknown.
      direction - the parsed axis direction.
      name - the parsed axis name, possibly replaced by this method.
      Returns:
      the axis name to use. Cannot be null.
    • isLatLong

      private static boolean isLatLong(String expected, String name)
      Returns true if the given axis name is at least part of the given expected axis name.
      Parameters:
      expected - AxisNames.LATITUDE or AxisNames.LONGITUDE.
      name - the parsed axis name.
    • toLatinAbbreviation

      public String toLatinAbbreviation(org.opengis.referencing.cs.CoordinateSystem cs, org.opengis.referencing.cs.AxisDirection direction, String abbreviation)
      Returns the axis abbreviation to format in WKT, or null if none. The given abbreviation may contain Greek letters, in particular φ, λ and θ. This toLatinAbbreviation(…) method is responsible for replacing Greek letters by Latin letters for ISO 19162 compliance, if desired.

      The default implementation performs at least the following mapping:

      Note that while this method may return a string of any length, ISO 19162 requires abbreviations to be a single Latin character.
      Parameters:
      cs - the enclosing coordinate system, or null if unknown.
      direction - the direction of the axis to format.
      abbreviation - the axis abbreviation, possibly replaced by this method.
      Returns:
      the axis abbreviation to format.
    • toUnicodeAbbreviation

      public String toUnicodeAbbreviation(String csType, org.opengis.referencing.cs.AxisDirection direction, String abbreviation)
      Returns the axis abbreviation to use in memory for an axis parsed from a WKT. Since this method is invoked before the CoordinateSystem instance is created, most coordinate system characteristics are known only as String. In particular the csType argument, if non-null, should be one of the following values:
      "affine", "Cartesian" (note the upper-case "C"), "cylindrical", "ellipsoidal", "linear", "parametric", "polar", "spherical", "temporal" or "vertical"
      This method is the converse of toLatinAbbreviation(CoordinateSystem, AxisDirection, String). The default implementation performs at least the following mapping:
      • L → λ if csType is "ellipsoidal".
      • P or B → φ if csType is "ellipsoidal".
      • U → Ω if csType is "spherical", regardless of coordinate system convention.
      • V → θ if csType is "spherical", regardless of coordinate system convention.
      • U → θ if csType is "polar".
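How the result depends on csType can be seen in a small standalone sketch. It handles only the single-letter abbreviations L, B, U and V from the list above and returns anything else unchanged, which is an assumption of this sketch. The class name is hypothetical and the code is not taken from SIS.

```java
/** Hypothetical illustration of the csType-dependent mapping; not the SIS implementation. */
final class UnicodeAbbreviationSketch {
    private UnicodeAbbreviationSketch() {}

    static String toUnicodeAbbreviation(String csType, String abbreviation) {
        if (csType != null && abbreviation.length() == 1) {
            char c = abbreviation.charAt(0);
            switch (csType) {
                case "ellipsoidal":
                    if (c == 'L') return "\u03BB";   // Greek small letter lambda
                    if (c == 'B') return "\u03C6";   // Greek small letter phi
                    break;
                case "spherical":
                    if (c == 'U') return "\u03A9";   // Greek capital letter omega
                    if (c == 'V') return "\u03B8";   // Greek small letter theta
                    break;
                case "polar":
                    if (c == 'U') return "\u03B8";   // Greek small letter theta
                    break;
                default:
                    break;
            }
        }
        return abbreviation;    // never null: unknown cases pass through unchanged
    }

    public static void main(String[] args) {
        // The same Latin letter maps to different Greek letters depending on the CS type.
        System.out.println(toUnicodeAbbreviation("spherical", "U"));
        System.out.println(toUnicodeAbbreviation("polar", "U"));
    }
}
```

The letter U shows why csType is needed as an argument: it has two different targets depending on the coordinate system type.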
      Parameters:
      csType - the type of the coordinate system, or null if unknown.
      direction - the parsed axis direction.
      abbreviation - the parsed axis abbreviation, possibly replaced by this method.
      Returns:
      the axis abbreviation to use. Cannot be null.