Class ParsedIRI

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable

    public class ParsedIRI
    extends java.lang.Object
    implements java.lang.Cloneable, java.io.Serializable
    Represents an Internationalized Resource Identifier (IRI) reference.

    Aside from some minor deviations noted below, an instance of this class represents an IRI reference as defined by RFC 3987: Internationalized Resource Identifiers (IRI): IRI Syntax. This class provides constructors for creating IRI instances from their components or by parsing their string forms, methods for accessing the various components of an instance, and methods for normalizing, resolving, and relativizing IRI instances. Instances of this class are immutable.

    An IRI instance has the following seven components, which in string form have the syntax

    [scheme:][//[user-info@]host[:port]][path][?query][#fragment]

    In a given instance any particular component is either undefined or defined with a distinct value. Undefined string components are represented by null, while undefined integer components are represented by -1. A string component may be defined to have the empty string as its value; this is not equivalent to that component being undefined.

    Whether a particular component is or is not defined in an instance depends upon the type of the IRI being represented. An absolute IRI has a scheme component. An opaque IRI has a scheme, a scheme-specific part, and possibly a fragment, but has no other components. A hierarchical IRI always has a path (though it may be empty) and a scheme-specific part (which at least contains the path), and may have any of the other components.
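
    The null and -1 conventions described above can be illustrated with a minimal sketch. It uses java.net.URI from the JDK as a stand-in, on the assumption that ParsedIRI reports undefined components the same way; the class name ComponentSketch and the method describe are hypothetical.

```java
import java.net.URI;

final class ComponentSketch {
    // Lists the components of a reference; undefined string components
    // appear as null and an undefined port appears as -1.
    static String describe(String ref) {
        URI u = URI.create(ref); // stand-in for a parsed IRI (assumption)
        return "scheme=" + u.getScheme()
                + " host=" + u.getHost()
                + " port=" + u.getPort()
                + " path=" + u.getPath()
                + " query=" + u.getQuery()
                + " fragment=" + u.getFragment();
    }

    public static void main(String[] args) {
        // Absolute, hierarchical reference with most components defined.
        System.out.println(describe("http://example.org:8080/a/b?q=1#top"));
        // Relative reference: only the path is defined.
        System.out.println(describe("/a/b"));
    }
}
```

    In the second call only the path is defined, so every other string component is null and the port is -1.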

    IRIs, URIs, URLs, and URNs

    IRIs are meant to replace URIs in identifying resources for protocols, formats, and software components that use a UCS-based character repertoire.

    An Internationalized Resource Identifier (IRI) is a complement to the Uniform Resource Identifier (URI). An IRI is a sequence of characters from the Universal Character Set (Unicode/ISO 10646). A mapping from IRIs to URIs is defined using toASCIIString(), which means that IRIs can be used instead of URIs, where appropriate, to identify resources. Every URI is also an IRI, and the normalize() method can be used to convert a URI back into a normalized IRI.

    A URI is a uniform resource identifier, while a URL is a uniform resource locator. Hence every URL is a URI, abstractly speaking, but not every URI is a URL. This is because there is another subcategory of URIs, uniform resource names (URNs), which name resources but do not specify how to locate them. URIs with the mailto, news, and isbn schemes are examples of URNs.

    Deviations

    jar: This implementation treats the first colon as part of the scheme if the scheme starts with "jar:". For example, the IRI jar:http://www.foo.com/bar/jar.jar!/baz/entry.txt is parsed with the scheme jar:http and the path /bar/jar.jar!/baz/entry.txt.
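
    The jar: deviation can be pictured with the following sketch. It is an illustration only, not the parser used by this class: the class name JarSchemeSketch and the method schemeOf are hypothetical, the input is assumed to be an absolute IRI string, and no character validation is performed.

```java
final class JarSchemeSketch {
    // Returns the scheme of an absolute IRI string, or null if no scheme
    // delimiter is found. For strings starting with "jar:" the first colon
    // is skipped, so the scheme extends to the second colon.
    static String schemeOf(String iri) {
        int from = iri.startsWith("jar:") ? "jar:".length() : 0;
        int colon = iri.indexOf(':', from);
        return colon < 0 ? null : iri.substring(0, colon);
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("jar:http://www.foo.com/bar/jar.jar!/baz/entry.txt"));
        System.out.println(schemeOf("http://www.foo.com/bar/jar.jar"));
    }
}
```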

    Since:
    2.3
    See Also:
    RFC 3987: Internationalized Resource Identifiers (IRIs), RFC 3986: Uniform Resource Identifiers (URI): Generic Syntax, Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static int[][] ALPHA  
      private static int[] ascii  
      private static java.util.Comparator<int[]> CMP  
      private static int[] common  
      private static java.lang.String[] common_pct  
      private static int[][] DIGIT  
      private static int EOF  
      private static int[][] fchar  
      private java.lang.String fragment  
      private static int[][] gen_delims  
      private static int[][] hchar  
      private static int[] HEXDIG  
      private java.lang.String host  
      private static int[][] iprivate  
      private java.lang.String iri  
      private java.lang.String path  
      private static int[][] pchar  
      private int port  
      private int pos  
      private static int[][] qchar  
      private java.lang.String query  
      private static int[][] reserved  
      private static int[][] schar  
      private java.lang.String scheme  
      private static long serialVersionUID  
      private static int[][] sub_delims  
      private static int[][] uchar  
      private static int[][] ucschar  
      private static int[][] unreserved  
      private static int[][] unreserved_rfc3986  
      private java.lang.String userInfo  
    • Constructor Summary

      Constructors 
      Constructor Description
      ParsedIRI​(java.lang.String iri)
      Constructs a ParsedIRI by parsing the given string.
      ParsedIRI​(java.lang.String scheme, java.lang.String userInfo, java.lang.String host, int port, java.lang.String path, java.lang.String query, java.lang.String fragment)
      Constructs a hierarchical IRI from the given components.
    • Method Summary

      Modifier and Type Method Description
      private void advance​(int ahead)  
      private void appendAscii​(java.lang.StringBuilder sb, java.lang.String input)  
      private java.lang.String buildIRI​(java.lang.String scheme, java.lang.String userInfo, java.lang.String host, int port, java.lang.String path, java.lang.String query, java.lang.String fragment)  
      static ParsedIRI create​(java.lang.String str)
      Creates a ParsedIRI by parsing the given string.
      boolean equals​(java.lang.Object obj)
      Tests this IRI for simple string comparison with another object.
      private java.net.URISyntaxException error​(java.lang.String reason)  
      private static int[] flatten​(int[]... arrays)  
      java.lang.String getFragment()
      Returns the raw fragment component of this IRI after the hash.
      java.lang.String getHost()
      Returns the host component of this IRI.
      java.lang.String getPath()
      Returns the raw path component of this IRI.
      int getPort()
      Returns the port number of this IRI.
      java.lang.String getQuery()
      Returns the raw query component of this IRI after the first question mark.
      java.lang.String getScheme()
      Returns the scheme component of this IRI.
      java.lang.String getUserInfo()
      Returns the raw user-information component of this IRI.
      int hashCode()  
      boolean isAbsolute()
      Tells whether or not this IRI is absolute.
      private boolean isMember​(int[][] set, int chr)  
      boolean isOpaque()
      Tells whether or not this IRI is opaque.
      private boolean isScheme​(java.lang.String scheme)  
      private boolean isTLDValid​(int hostStartPos)  
      private java.lang.String[] listPctEncodings​(java.lang.String path)  
      ParsedIRI normalize()
      Normalizes this IRI's components.
      private java.lang.String normalizePath​(java.lang.String path)  
      private java.lang.String normalizePctEncoding​(java.lang.String encoded)  
      private void parse()  
      private java.lang.String parseHost()  
      private java.lang.String parseMember​(int[][] set, int end)  
      private java.lang.String parsePath()  
      private java.lang.String parsePctEncoded​(int[][] set, int end1, int end2)  
      private java.lang.String parseScheme()  
      private java.lang.String parseUserInfo()  
      private java.lang.String pathSegmentNormalization​(java.lang.String _path)
      Normalizes the path of this URI if it has one.
      private java.lang.String pctDecode​(java.lang.String encoded)  
      private java.lang.String pctEncode​(int chr)  
      private static java.lang.String[] pctEncode​(int[] unencoded)  
      private java.lang.String pctEncodingNormalization​(java.lang.String path)  
      private int peek()  
      private int peek​(int ahead)  
      java.lang.String relativize​(java.lang.String iri)
      Relativizes the given IRI against this ParsedIRI.
      ParsedIRI relativize​(ParsedIRI absolute)
      Relativizes the given IRI against this ParsedIRI.
      private java.lang.String relativizePath​(java.lang.String absolute)  
      java.lang.String resolve​(java.lang.String iri)
      Resolves the given IRI against this ParsedIRI.
      ParsedIRI resolve​(ParsedIRI relative)
      Resolves the given IRI against this ParsedIRI.
      java.lang.String toASCIIString()
      Returns the content of this IRI as a US-ASCII string.
      private java.lang.String toLowerCase​(java.lang.String string)  
      java.lang.String toString()
      Returns the content of this IRI as a string.
      private java.lang.String toUpperCase​(java.lang.String string)  
      private static int[][] union​(java.lang.Object... sets)  
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, wait, wait, wait
    • Field Detail

      • CMP

        private static final java.util.Comparator<int[]> CMP
      • iprivate

        private static final int[][] iprivate
      • ucschar

        private static final int[][] ucschar
      • ALPHA

        private static final int[][] ALPHA
      • DIGIT

        private static final int[][] DIGIT
      • sub_delims

        private static final int[][] sub_delims
      • gen_delims

        private static final int[][] gen_delims
      • reserved

        private static final int[][] reserved
      • unreserved_rfc3986

        private static final int[][] unreserved_rfc3986
      • unreserved

        private static final int[][] unreserved
      • schar

        private static final int[][] schar
      • uchar

        private static final int[][] uchar
      • hchar

        private static final int[][] hchar
      • pchar

        private static final int[][] pchar
      • qchar

        private static final int[][] qchar
      • fchar

        private static final int[][] fchar
      • HEXDIG

        private static final int[] HEXDIG
      • ascii

        private static final int[] ascii
      • common

        private static final int[] common
      • common_pct

        private static final java.lang.String[] common_pct
      • iri

        private final java.lang.String iri
      • pos

        private int pos
      • scheme

        private java.lang.String scheme
      • userInfo

        private java.lang.String userInfo
      • host

        private java.lang.String host
      • port

        private int port
      • path

        private java.lang.String path
      • query

        private java.lang.String query
      • fragment

        private java.lang.String fragment
    • Constructor Detail

      • ParsedIRI

        public ParsedIRI​(java.lang.String iri)
                  throws java.net.URISyntaxException
        Constructs a ParsedIRI by parsing the given string.
        Parameters:
        iri - The string to be parsed into an IRI
        Throws:
        java.lang.NullPointerException - If iri is null
        java.net.URISyntaxException - If the given string violates RFC 3987, as augmented by the above deviations
      • ParsedIRI

        public ParsedIRI​(java.lang.String scheme,
                         java.lang.String userInfo,
                         java.lang.String host,
                         int port,
                         java.lang.String path,
                         java.lang.String query,
                         java.lang.String fragment)
        Constructs a hierarchical IRI from the given components.

        This constructor first builds an IRI string from the given components according to the rules specified in RFC 3987.

        Parameters:
        scheme - Scheme name
        userInfo - User name and authorization information
        host - Host name
        port - Port number
        path - Path
        query - Query
        fragment - Fragment
    • Method Detail

      • union

        private static int[][] union​(java.lang.Object... sets)
      • flatten

        private static int[] flatten​(int[]... arrays)
      • pctEncode

        private static java.lang.String[] pctEncode​(int[] unencoded)
      • create

        public static ParsedIRI create​(java.lang.String str)
        Creates a ParsedIRI by parsing the given string.

        This convenience factory method works as if by invoking the ParsedIRI(String) constructor; any URISyntaxException thrown by the constructor is caught and the error code point is percent-encoded. This process is repeated until a syntactically valid IRI is formed or an IllegalArgumentException is thrown.

        This method is provided for use in situations where it is known that the given string is an IRI, even if it is not completely syntactically valid, for example IRI constants declared within a program. The constructors, which throw URISyntaxException directly, should be used in situations where an IRI is being constructed from user input or from some other source that may be prone to errors.

        Parameters:
        str - The string to be parsed into an IRI
        Returns:
        The new ParsedIRI
        Throws:
        java.lang.NullPointerException - If str is null
        java.lang.IllegalArgumentException - If the given string could not be converted into an IRI
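
        The parse, repair, and retry loop described for create can be sketched as follows. This is a hedged illustration, not the actual implementation: it uses java.net.URI as a stand-in parser, relies on URISyntaxException.getIndex() to locate the offending character, and the names LenientCreateSketch, pctEncode, and the retry bound are assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

final class LenientCreateSketch {
    // Percent-encodes one code point as its UTF-8 octets, e.g. ' ' -> "%20".
    static String pctEncode(int codePoint) {
        StringBuilder sb = new StringBuilder();
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Parses str, percent-encoding the character at each reported error
    // position and retrying. The number of attempts is bounded so that input
    // which cannot be repaired ends in an IllegalArgumentException.
    static URI create(String str) {
        String candidate = str;
        for (int attempt = 0; attempt <= str.length(); attempt++) {
            try {
                return new URI(candidate);
            } catch (URISyntaxException e) {
                int i = e.getIndex();
                if (i < 0 || i >= candidate.length()) {
                    // No usable error position: give up.
                    throw new IllegalArgumentException(e.getMessage(), e);
                }
                int cp = candidate.codePointAt(i);
                candidate = candidate.substring(0, i) + pctEncode(cp)
                        + candidate.substring(i + Character.charCount(cp));
            }
        }
        throw new IllegalArgumentException("Could not repair: " + str);
    }
}
```

        With this sketch, a string containing a raw space in its path is repaired by encoding the space, while a plain constructor call on the same string would throw.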
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • equals

        public boolean equals​(java.lang.Object obj)
        Tests this IRI for simple string comparison with another object.

        If two IRI strings are identical, then it is safe to conclude that they are equivalent. However, even if the IRI strings are not identical the IRIs might still be equivalent. Further comparison can be made using the normalize() forms.

        Overrides:
        equals in class java.lang.Object
        Parameters:
        obj - The object to which this object is to be compared
        Returns:
        true if the given object is a ParsedIRI that represents the same IRI
      • toString

        public java.lang.String toString()
        Returns the content of this IRI as a string.

        If this IRI was created by invoking one of the constructors in this class, then a string equivalent to the original input string, or to the string computed from the originally-given components, as appropriate, is returned. Otherwise this IRI was created by normalization, resolution, or relativization, and so a string is constructed from this IRI's components according to the rules specified in RFC 3987.

        Overrides:
        toString in class java.lang.Object
        Returns:
        The string form of this IRI
      • toASCIIString

        public java.lang.String toASCIIString()
        Returns the content of this IRI as a US-ASCII string.

        If this IRI only contains 8-bit characters, then an invocation of this method will return the same value as an invocation of the toString method. Otherwise this method works as if by encoding the host via RFC 3490 and all other components by percent-encoding their UTF-8 values.

        Returns:
        The string form of this IRI, encoded as needed so that it only contains characters in the US-ASCII charset
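
        The two encodings mentioned above can be sketched with JDK classes alone. This is an illustration under assumptions, not the code behind toASCIIString: the class AsciiFormSketch and its methods are hypothetical, java.net.IDN is used for the RFC 3490 host conversion, and non-ASCII characters are written as Unicode escapes.

```java
import java.net.IDN;
import java.nio.charset.StandardCharsets;

final class AsciiFormSketch {
    // Percent-encodes every non-ASCII character as its UTF-8 octets and
    // leaves ASCII characters untouched.
    static String pctEncodeNonAscii(String component) {
        StringBuilder sb = new StringBuilder();
        component.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    sb.append(String.format("%%%02X", b & 0xFF));
                }
            }
        });
        return sb.toString();
    }

    // Converts an internationalized host name to its ASCII-compatible form.
    static String hostToAscii(String host) {
        return IDN.toASCII(host);
    }

    public static void main(String[] args) {
        System.out.println(pctEncodeNonAscii("/r\u00e9sum\u00e9")); // path with two accented letters
        System.out.println(hostToAscii("b\u00fccher.example"));     // host with a u-umlaut
    }
}
```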
      • isAbsolute

        public boolean isAbsolute()
        Tells whether or not this IRI is absolute.
        Returns:
        true if, and only if, this IRI has a scheme component
      • isOpaque

        public boolean isOpaque()
        Tells whether or not this IRI is opaque.

        An IRI is opaque if, and only if, it is absolute and its path part does not begin with a slash character ('/'). An opaque IRI has a scheme, a path, and possibly a query or fragment; all other components (userInfo, host, and port) are undefined.

        Returns:
        true if, and only if, this IRI is absolute and its path does not start with a slash
      • getScheme

        public java.lang.String getScheme()
        Returns the scheme component of this IRI.

        The scheme component of an IRI, if defined, only contains characters in the alphanum category and in the string "-.+", unless the scheme starts with "jar:", in which case it may also contain one colon. A scheme always starts with an alpha character.

        The scheme component of an IRI cannot contain escaped octets.

        Returns:
        The scheme component of this IRI, or null if the scheme is undefined
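
        The character rules for a scheme can be expressed as a small check. This is a sketch only: the class SchemeSyntaxSketch and the method isValidScheme are hypothetical, and it assumes that the part following "jar:" must itself satisfy the ordinary scheme rules.

```java
import java.util.regex.Pattern;

final class SchemeSyntaxSketch {
    // An alpha character followed by any number of alphanumerics, '+', '.', or '-'.
    private static final Pattern PLAIN = Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*");

    static boolean isValidScheme(String scheme) {
        if (scheme.startsWith("jar:")) {
            // One colon is tolerated after "jar"; the remainder is checked
            // against the ordinary rules (assumption).
            return PLAIN.matcher(scheme.substring("jar:".length())).matches();
        }
        return PLAIN.matcher(scheme).matches();
    }
}
```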
      • getUserInfo

        public java.lang.String getUserInfo()
        Returns the raw user-information component of this IRI.
        Returns:
        The raw user-information component of this IRI, or null if the user information is undefined
      • getHost

        public java.lang.String getHost()
        Returns the host component of this IRI.
        Returns:
        The host component of this IRI, or null if the host is undefined
      • getPort

        public int getPort()
        Returns the port number of this IRI.

        The port component of an IRI, if defined, is a non-negative integer.

        Returns:
        The port component of this IRI, or -1 if the port is undefined
      • getPath

        public java.lang.String getPath()
        Returns the raw path component of this IRI.
        Returns:
        The path component of this IRI (never null)
      • getQuery

        public java.lang.String getQuery()
        Returns the raw query component of this IRI after the first question mark.

        The query component of an IRI, if defined, only contains legal IRI characters.

        Returns:
        The raw query component of this IRI, or null if the IRI does not contain a question mark
      • getFragment

        public java.lang.String getFragment()
        Returns the raw fragment component of this IRI after the hash.

        The fragment component of an IRI, if defined, only contains legal IRI characters and does not contain a hash.

        Returns:
        The raw fragment component of this IRI, or null if the IRI does not contain a hash
      • normalize

        public ParsedIRI normalize()
        Normalizes this IRI's components.

        Because IRIs exist to identify resources, presumably they should be considered equivalent when they identify the same resource. However, this definition of equivalence is not of much practical use, as there is no way for an implementation to compare two resources unless it has full knowledge or control of them. Therefore, IRI normalization is designed to minimize false negatives while strictly avoiding false positives.

        Case Normalization: The hexadecimal digits within a percent-encoding triplet (e.g., "%3a" versus "%3A") are case-insensitive and are normalized to use uppercase letters for the digits A-F. The scheme and host are case-insensitive and are normalized to lowercase.

        Character Normalization: The Unicode Standard defines various equivalences between sequences of characters for various purposes. Unicode Standard Annex defines various Normalization Forms for these equivalences, and normalization is applied to the IRI components.

        Percent-Encoding Normalization: Any percent-encoded octet sequence that corresponds to an unreserved character is decoded, anywhere in the IRI.

        Path Segment Normalization: Unnecessary "." and ".." segments are removed from the path component of a hierarchical IRI. Each "." segment is simply removed. A ".." segment is removed only if it is preceded by a non-".." segment or the start of the path.

        HTTP(S) Scheme Normalization: If the port is the default port number for the scheme, or is not given, it is set to undefined. An empty path is replaced with "/".

        File Scheme Normalization: If the host is "localhost" or empty, it is set to undefined.

        Internationalized Domain Name Normalization: The host component is converted to Unicode.

        Returns:
        normalized IRI
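
        Two of the steps above, case normalization of percent-encoding triplets and percent-encoding normalization, can be sketched together. This is an illustration, not the code behind normalize(): the class PctNormalizationSketch is hypothetical, and it only decodes single octets that map to ASCII unreserved characters (letters, digits, "-", ".", "_", "~"), leaving multi-octet UTF-8 sequences encoded.

```java
import java.util.Locale;

final class PctNormalizationSketch {
    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    private static boolean isAsciiUnreserved(int octet) {
        return (octet >= 'A' && octet <= 'Z') || (octet >= 'a' && octet <= 'z')
                || (octet >= '0' && octet <= '9')
                || octet == '-' || octet == '.' || octet == '_' || octet == '~';
    }

    // Decodes triplets for ASCII unreserved characters and upper-cases the
    // hexadecimal digits of every other well-formed triplet.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()
                    && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
                String hex = s.substring(i + 1, i + 3);
                int octet = Integer.parseInt(hex, 16);
                if (isAsciiUnreserved(octet)) {
                    sb.append((char) octet);
                } else {
                    sb.append('%').append(hex.toUpperCase(Locale.ROOT));
                }
                i += 3;
            } else {
                // Ordinary character or malformed triplet: copy unchanged.
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }
}
```

        For example, "%7e" is decoded to "~" because the tilde is unreserved, while "%3a" (a colon, which is reserved) stays encoded and becomes "%3A".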
      • resolve

        public java.lang.String resolve​(java.lang.String iri)
        Resolves the given IRI against this ParsedIRI.
        Parameters:
        iri - The IRI to be resolved against this ParsedIRI
        Returns:
        The resulting IRI
        Throws:
        java.lang.NullPointerException - If iri is null
        See Also:
        resolve(ParsedIRI)
      • resolve

        public ParsedIRI resolve​(ParsedIRI relative)
        Resolves the given IRI against this ParsedIRI.

        Resolution is the process of resolving one IRI against another, base IRI. The resulting IRI is constructed from components of both IRIs in the manner specified by RFC 3986, taking components from the base IRI for those not specified in the original. For hierarchical IRIs, the path of the original is resolved against the path of the base and then normalized.

        If the given IRI is already absolute, or if this IRI is opaque, then the given IRI is returned.

        If the given IRI's fragment component is defined, its path component is empty, and its scheme, authority, and query components are undefined, then an IRI with the given fragment but with all other components equal to those of this IRI is returned. This allows an IRI representing a standalone fragment reference, such as "#foo", to be usefully resolved against a base IRI.

        Otherwise this method constructs a new hierarchical IRI in a manner consistent with RFC 3987.

        The result of this method is absolute if, and only if, either this IRI is absolute or the given IRI is absolute.

        Parameters:
        relative - The IRI to be resolved against this ParsedIRI
        Returns:
        The resulting IRI
        Throws:
        java.lang.NullPointerException - If relative is null
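
        The resolution cases described above can be tried out with java.net.URI, which is used here as a stand-in on the assumption that ParsedIRI resolves references analogously; details may differ. The class ResolveSketch and its resolve helper are hypothetical.

```java
import java.net.URI;

final class ResolveSketch {
    // Resolves reference against base and returns the result as a string.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(URI.create(reference)).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/a/b/c";
        System.out.println(resolve(base, "../d"));                    // relative path, then normalized
        System.out.println(resolve(base, "#foo"));                    // standalone fragment reference
        System.out.println(resolve(base, "https://other.example/x")); // already absolute
    }
}
```

        The first call merges "../d" with the base path and removes the dot segments, the second keeps every base component and only sets the fragment, and the third returns the given reference unchanged.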
      • relativize

        public java.lang.String relativize​(java.lang.String iri)
        Relativizes the given IRI against this ParsedIRI.
        Parameters:
        iri - The IRI to be relativized against this ParsedIRI
        Returns:
        The resulting IRI
        Throws:
        java.lang.NullPointerException - If iri is null
        See Also:
        relativize(ParsedIRI)
      • relativize

        public ParsedIRI relativize​(ParsedIRI absolute)
        Relativizes the given IRI against this ParsedIRI.

        Relativization is the inverse of resolution. This operation is often useful when constructing a document containing IRIs that must be made relative to the base IRI of the document wherever possible.

        The relativization of the given IRI against this IRI is computed as follows:

        1. If either this IRI or the given IRI is opaque, or if the scheme and authority components of the two IRIs are not identical, or if the path of this IRI is not a prefix of the path of the given IRI, then the given IRI is returned.

        2. Otherwise a new relative hierarchical IRI is constructed with query and fragment components taken from the given IRI and with a path component computed by removing this IRI's path from the beginning of the given IRI's path.

        Parameters:
        absolute - The IRI to be relativized against this ParsedIRI
        Returns:
        The resulting IRI
        Throws:
        java.lang.NullPointerException - If absolute is null
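
        The two relativization rules above can be exercised with java.net.URI, again as a stand-in under the assumption that ParsedIRI behaves analogously. The class RelativizeSketch and its relativize helper are hypothetical.

```java
import java.net.URI;

final class RelativizeSketch {
    // Relativizes target against base and returns the result as a string.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/";
        // Rule 2: the base path is a prefix, so it is stripped; query and
        // fragment are taken from the given reference.
        System.out.println(relativize(base, "http://example.org/docs/api/index.html?x=1#top"));
        // Rule 1: different authority, so the given reference is returned.
        System.out.println(relativize(base, "http://other.example/docs/a"));
    }
}
```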
      • parse

        private void parse()
                    throws java.net.URISyntaxException
        Throws:
        java.net.URISyntaxException
      • buildIRI

        private java.lang.String buildIRI​(java.lang.String scheme,
                                          java.lang.String userInfo,
                                          java.lang.String host,
                                          int port,
                                          java.lang.String path,
                                          java.lang.String query,
                                          java.lang.String fragment)
      • parseScheme

        private java.lang.String parseScheme()
      • parseUserInfo

        private java.lang.String parseUserInfo()
                                        throws java.net.URISyntaxException
        Throws:
        java.net.URISyntaxException
      • parseHost

        private java.lang.String parseHost()
                                    throws java.net.URISyntaxException
        Throws:
        java.net.URISyntaxException
      • isTLDValid

        private boolean isTLDValid​(int hostStartPos)
      • parsePath

        private java.lang.String parsePath()
                                    throws java.net.URISyntaxException
        Throws:
        java.net.URISyntaxException
      • parsePctEncoded

        private java.lang.String parsePctEncoded​(int[][] set,
                                                 int end1,
                                                 int end2)
                                          throws java.net.URISyntaxException
        Throws:
        java.net.URISyntaxException
      • parseMember

        private java.lang.String parseMember​(int[][] set,
                                             int end)
      • isMember

        private boolean isMember​(int[][] set,
                                 int chr)
      • peek

        private int peek()
      • peek

        private int peek​(int ahead)
      • advance

        private void advance​(int ahead)
      • error

        private java.net.URISyntaxException error​(java.lang.String reason)
      • appendAscii

        private void appendAscii​(java.lang.StringBuilder sb,
                                 java.lang.String input)
      • toLowerCase

        private java.lang.String toLowerCase​(java.lang.String string)
      • toUpperCase

        private java.lang.String toUpperCase​(java.lang.String string)
      • isScheme

        private boolean isScheme​(java.lang.String scheme)
      • normalizePath

        private java.lang.String normalizePath​(java.lang.String path)
      • pctEncodingNormalization

        private java.lang.String pctEncodingNormalization​(java.lang.String path)
      • listPctEncodings

        private java.lang.String[] listPctEncodings​(java.lang.String path)
      • normalizePctEncoding

        private java.lang.String normalizePctEncoding​(java.lang.String encoded)
      • pctDecode

        private java.lang.String pctDecode​(java.lang.String encoded)
      • pctEncode

        private java.lang.String pctEncode​(int chr)
      • pathSegmentNormalization

        private java.lang.String pathSegmentNormalization​(java.lang.String _path)
        Normalizes the path of this URI if it has one. Normalizing a path means that any unnecessary '.' and '..' segments are removed. For example, the URI http://server.com/a/b/../c/./d would be normalized to http://server.com/a/c/d. A URI doesn't have a path if it is opaque.
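
        A minimal sketch of dot-segment removal follows. It is not the body of this private method: the class DotSegmentSketch is hypothetical, it handles only paths that start with a slash, and it follows the general approach of RFC 3986 (a ".." at the start of the path is dropped, and a trailing "." or ".." leaves a trailing slash).

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class DotSegmentSketch {
    // Removes "." and ".." segments from an absolute path.
    static String removeDotSegments(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("this sketch handles absolute paths only");
        }
        // Limit -1 keeps empty segments, so "//" and trailing slashes survive.
        String[] segments = path.substring(1).split("/", -1);
        Deque<String> out = new ArrayDeque<>();
        for (int i = 0; i < segments.length; i++) {
            String seg = segments[i];
            boolean last = i == segments.length - 1;
            if (seg.equals(".")) {
                if (last) out.addLast("");           // "/a/." becomes "/a/"
            } else if (seg.equals("..")) {
                if (!out.isEmpty()) out.removeLast(); // drop the preceding segment
                if (last) out.addLast("");           // "/a/b/.." becomes "/a/"
            } else {
                out.addLast(seg);
            }
        }
        return "/" + String.join("/", out);
    }

    public static void main(String[] args) {
        System.out.println(removeDotSegments("/a/b/../c/./d"));
    }
}
```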
      • relativizePath

        private java.lang.String relativizePath​(java.lang.String absolute)