Class URIUtil


  • public class URIUtil
    extends java.lang.Object
    Utility functions for working with URIs.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static char[] LOCAL_ESCAPED_CHARS  
      private static java.util.Set<java.lang.Character> mark
      Punctuation mark characters, which are part of the set of unreserved chars and therefore allowed to occur in unescaped form.
      private static java.util.Set<java.lang.Character> reserved
      Reserved characters: their usage within the URI component is limited to their reserved purpose.
      private static java.util.regex.Pattern unicodeControlCharPattern
      Regular expression pattern for matching Unicode control characters.
    • Constructor Summary

      Constructors 
      Constructor Description
      URIUtil()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      private static java.lang.String escapeExcludedChars​(java.lang.String unescaped)
      Escapes any character that is not either reserved or in the legal range of unreserved characters, according to RFC 2396.
      static int getLocalNameIndex​(java.lang.String uri)
      Finds the index of the first local name character in a (non-relative) URI.
      static boolean isCorrectURISplit​(java.lang.String namespace, java.lang.String localName)
      Checks whether the URI consisting of the specified namespace and local name has been split correctly according to the URI splitting rules specified in URI.
      private static boolean isNameChar​(int codePoint)
      Check if the supplied code point represents a valid name character.
      private static boolean isNameStartChar​(int codePoint)
      Check if the supplied code point represents a valid name start character.
      private static boolean isPERCENT​(java.lang.String name)  
      private static boolean isPLX_START​(java.lang.String name)  
      private static boolean isPN_CHARS​(int codePoint)
      Check if the supplied code point represents a valid prefixed name character.
      private static boolean isPN_CHARS_BASE​(int codePoint)
      Check if the supplied code point represents a valid prefixed name base character.
      private static boolean isPN_CHARS_U​(int codePoint)
      Check if the supplied code point represents either a valid prefixed name base character or an underscore.
      private static boolean isPN_LOCAL_ESC​(java.lang.String name)  
      private static boolean isUnreserved​(char c)
      A character is unreserved according to RFC 2396 if it is either an alphanumeric char or a punctuation mark.
      static boolean isValidLocalName​(java.lang.String name)
      Checks whether the specified name is allowed as the local name part of an IRI according to the SPARQL 1.1/Turtle 1.1 spec.
      static boolean isValidURIReference​(java.lang.String uriRef)
      Verifies that the supplied string is a valid RDF (1.0) URI reference, as defined in section 6.4 of the RDF Concepts and Abstract Syntax specification (RDF 1.0 Recommendation of February 10, 2004).
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • reserved

        private static final java.util.Set<java.lang.Character> reserved
        Reserved characters: their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI. See http://www.isi.edu/in-notes/rfc2396.txt, section 2.2.
      • mark

        private static final java.util.Set<java.lang.Character> mark
        Punctuation mark characters, which are part of the set of unreserved chars and therefore allowed to occur in unescaped form. See http://www.isi.edu/in-notes/rfc2396.txt
      • unicodeControlCharPattern

        private static final java.util.regex.Pattern unicodeControlCharPattern
        Regular expression pattern for matching Unicode control characters.
      • LOCAL_ESCAPED_CHARS

        private static final char[] LOCAL_ESCAPED_CHARS
    • Constructor Detail

      • URIUtil

        public URIUtil()
    • Method Detail

      • getLocalNameIndex

        public static int getLocalNameIndex​(java.lang.String uri)
        Finds the index of the first local name character in a (non-relative) URI. This index is determined by the following steps:
        • Find the first occurrence of the '#' character,
        • If this fails, find the last occurrence of the '/' character,
        • If this fails, find the last occurrence of the ':' character.
        • Add 1 to the found index and return this value.
        Note that the third step should never fail, as every legal (non-relative) URI contains at least one ':' character to separate the scheme from the rest of the URI. If it fails anyway, the method throws an IllegalArgumentException.
        Parameters:
        uri - A URI string.
        Returns:
        The index of the first local name character in the URI string. Note that this index does not reference an actual character if the algorithm determines that there is no local name. In that case, the returned index is equal to the length of the URI string.
        Throws:
        java.lang.IllegalArgumentException - If the supplied URI string doesn't contain any of the separator characters. Every legal (non-relative) URI contains at least one ':' character to separate the scheme from the rest of the URI.
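
        The lookup steps listed for getLocalNameIndex can be sketched roughly as follows. This is a minimal illustration written from the description above, not the library's source; the class name LocalNameIndexSketch is hypothetical.

```java
// Hypothetical sketch class; only the step order comes from the documentation above.
final class LocalNameIndexSketch {

    private LocalNameIndexSketch() {
    }

    static int getLocalNameIndex(String uri) {
        // Step 1: first occurrence of '#'.
        int separatorIdx = uri.indexOf('#');

        // Step 2: if there is no '#', last occurrence of '/'.
        if (separatorIdx < 0) {
            separatorIdx = uri.lastIndexOf('/');
        }

        // Step 3: if there is no '/', last occurrence of ':'.
        if (separatorIdx < 0) {
            separatorIdx = uri.lastIndexOf(':');
        }

        // No separator at all: the string cannot be a legal non-relative URI.
        if (separatorIdx < 0) {
            throw new IllegalArgumentException("No separator character found in URI: " + uri);
        }

        // Step 4: the local name starts right after the separator.
        return separatorIdx + 1;
    }
}
```

        Under this sketch, the substring starting at the returned index is the local name, and a URI that ends in its separator (such as "http://example.org/") yields an index equal to the string length.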
      • isCorrectURISplit

        public static boolean isCorrectURISplit​(java.lang.String namespace,
                                                java.lang.String localName)
        Checks whether the URI consisting of the specified namespace and local name has been split correctly according to the URI splitting rules specified in URI.
        Parameters:
        namespace - The URI's namespace, must not be null.
        localName - The URI's local name, must not be null.
        Returns:
        true if the specified URI has been correctly split into a namespace and local name, false otherwise.
        See Also:
        URI, getLocalNameIndex(String)
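
        One plausible way to read this check, assuming the splitting rules are the ones described for getLocalNameIndex(String), is to rejoin the two parts and verify that the local name index falls exactly at the end of the namespace. The sketch below is an assumption about the behaviour, not the library's implementation; the class SplitCheckSketch and its helper are hypothetical.

```java
import java.util.Objects;

// Hypothetical sketch: assumes a split is "correct" when the local name index
// of the rejoined URI equals the length of the namespace.
final class SplitCheckSketch {

    private SplitCheckSketch() {
    }

    static boolean isCorrectSplit(String namespace, String localName) {
        Objects.requireNonNull(namespace, "namespace must not be null");
        Objects.requireNonNull(localName, "localName must not be null");

        // An empty namespace cannot end in a separator character.
        if (namespace.isEmpty()) {
            return false;
        }

        try {
            return localNameIndex(namespace + localName) == namespace.length();
        } catch (IllegalArgumentException e) {
            // The rejoined string has no separator, so it was never a valid split.
            return false;
        }
    }

    // Same lookup order as documented for getLocalNameIndex: '#', then '/', then ':'.
    private static int localNameIndex(String uri) {
        int idx = uri.indexOf('#');
        if (idx < 0) {
            idx = uri.lastIndexOf('/');
        }
        if (idx < 0) {
            idx = uri.lastIndexOf(':');
        }
        if (idx < 0) {
            throw new IllegalArgumentException("No separator character found in URI: " + uri);
        }
        return idx + 1;
    }
}
```

        The actual method may treat edge cases differently, so treat this only as a reading aid.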
      • isValidURIReference

        public static boolean isValidURIReference​(java.lang.String uriRef)
        Verifies that the supplied string is a valid RDF (1.0) URI reference, as defined in section 6.4 of the RDF Concepts and Abstract Syntax specification (RDF 1.0 Recommendation of February 10, 2004).

        An RDF URI reference is valid if it is a Unicode string that:

        • does not contain any control characters (#x00-#x1F, #x7F-#x9F)
        • and would produce a valid URI character sequence (per RFC 2396, section 2.1) representing an absolute URI with optional fragment identifier when subjected to the encoding described below
        The encoding consists of:
        1. encoding the Unicode string as UTF-8, giving a sequence of octet values.
        2. %-escaping octets that do not correspond to permitted US-ASCII characters.
        Parameters:
        uriRef - a string representing an RDF URI reference.
        Returns:
        true iff the supplied string is a syntactically valid RDF URI reference, false otherwise.
        See Also:
        section 6.4 of the RDF Concepts and Abstract Syntax specification, RFC 3986, RFC 2396
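
        The two conditions and the encoding steps can be illustrated with the sketch below. It is an approximation under stated assumptions, not the library's code: the class RdfUriReferenceSketch is hypothetical, the permitted ASCII set is taken to be the RFC 2396 reserved and unreserved characters plus '%' and '#', and java.net.URI is used as a stand-in for the "valid absolute URI" check.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical sketch of the validation described above.
final class RdfUriReferenceSketch {

    // Control characters #x00-#x1F and #x7F-#x9F.
    private static final Pattern CONTROL_CHARS = Pattern.compile("[\\x00-\\x1F\\x7F-\\x9F]");

    // RFC 2396 reserved characters and marks, plus '%' and '#' (assumed to pass through).
    private static final String PERMITTED_PUNCTUATION = ";/?:@&=+$,-_.!~*'()%#";

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private RdfUriReferenceSketch() {
    }

    // Step 1: encode as UTF-8. Step 2: %-escape octets that are not permitted US-ASCII.
    static String escape(String unescaped) {
        StringBuilder out = new StringBuilder();
        for (byte b : unescaped.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (isPermittedAscii(octet)) {
                out.append((char) octet);
            } else {
                out.append('%').append(HEX[octet >> 4]).append(HEX[octet & 0x0F]);
            }
        }
        return out.toString();
    }

    static boolean isValid(String uriRef) {
        // Condition 1: no control characters anywhere in the string.
        if (CONTROL_CHARS.matcher(uriRef).find()) {
            return false;
        }
        // Condition 2: the escaped form must parse as an absolute URI.
        try {
            return new URI(escape(uriRef)).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    private static boolean isPermittedAscii(int octet) {
        return (octet >= 'a' && octet <= 'z')
                || (octet >= 'A' && octet <= 'Z')
                || (octet >= '0' && octet <= '9')
                || PERMITTED_PUNCTUATION.indexOf(octet) >= 0;
    }
}
```

        In this sketch a non-ASCII character such as 'é' is first turned into its UTF-8 octets and then escaped octet by octet, so a single character can expand to several %-escapes.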
      • escapeExcludedChars

        private static java.lang.String escapeExcludedChars​(java.lang.String unescaped)
        Escapes any character that is not either reserved or in the legal range of unreserved characters, according to RFC 2396.
        Parameters:
        unescaped - a (relative or absolute) URI reference.
        Returns:
        a (relative or absolute) URI reference in which all characters that cannot appear as-is in a URI are %-escaped.
        See Also:
        RFC 2396
      • isUnreserved

        private static boolean isUnreserved​(char c)
        A character is unreserved according to RFC 2396 if it is either an alphanumeric char or a punctuation mark.
      • isValidLocalName

        public static boolean isValidLocalName​(java.lang.String name)
        Checks whether the specified name is allowed as the local name part of an IRI according to the SPARQL 1.1/Turtle 1.1 spec.
        Parameters:
        name - the candidate local name
        Returns:
        true if the supplied name is a valid local name, false otherwise.
      • isPN_CHARS_U

        private static boolean isPN_CHARS_U​(int codePoint)
        Check if the supplied code point represents either a valid prefixed name base character or an underscore.

        From Turtle Spec:

        http://www.w3.org/TR/turtle/#grammar-production-PN_CHARS_U

        [164s] PN_CHARS_U ::= PN_CHARS_BASE | '_'

      • isPLX_START

        private static boolean isPLX_START​(java.lang.String name)
      • isPERCENT

        private static boolean isPERCENT​(java.lang.String name)
      • isPN_LOCAL_ESC

        private static boolean isPN_LOCAL_ESC​(java.lang.String name)
      • isPN_CHARS_BASE

        private static boolean isPN_CHARS_BASE​(int codePoint)
        Check if the supplied code point represents a valid prefixed name base character.

        From Turtle Spec:

        http://www.w3.org/TR/turtle/#grammar-production-PN_CHARS_BASE

        [163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

      • isNameStartChar

        private static boolean isNameStartChar​(int codePoint)
        Check if the supplied code point represents a valid name start character.
        Parameters:
        codePoint - a Unicode code point.
        Returns:
        true if the supplied code point represents a valid name start char, false otherwise.
      • isNameChar

        private static boolean isNameChar​(int codePoint)
        Check if the supplied code point represents a valid name character.
        Parameters:
        codePoint - a Unicode code point.
        Returns:
        true if the supplied code point represents a valid name char, false otherwise.
      • isPN_CHARS

        private static boolean isPN_CHARS​(int codePoint)
        Check if the supplied code point represents a valid prefixed name character.

        From Turtle Spec:

        http://www.w3.org/TR/turtle/#grammar-production-PN_CHARS

        [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
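
        The three Turtle productions quoted in this section (PN_CHARS_BASE, PN_CHARS_U and PN_CHARS) translate almost directly into code point range checks. The sketch below transcribes them from the grammar as quoted; the class TurtleNameCharsSketch and its method names are hypothetical and are not the private methods documented here.

```java
// Hypothetical sketch: a direct transcription of the quoted grammar productions.
final class TurtleNameCharsSketch {

    private TurtleNameCharsSketch() {
    }

    // [163s] PN_CHARS_BASE
    static boolean isPnCharsBase(int cp) {
        return inRange(cp, 'A', 'Z')
                || inRange(cp, 'a', 'z')
                || inRange(cp, 0x00C0, 0x00D6)
                || inRange(cp, 0x00D8, 0x00F6)
                || inRange(cp, 0x00F8, 0x02FF)
                || inRange(cp, 0x0370, 0x037D)
                || inRange(cp, 0x037F, 0x1FFF)
                || inRange(cp, 0x200C, 0x200D)
                || inRange(cp, 0x2070, 0x218F)
                || inRange(cp, 0x2C00, 0x2FEF)
                || inRange(cp, 0x3001, 0xD7FF)
                || inRange(cp, 0xF900, 0xFDCF)
                || inRange(cp, 0xFDF0, 0xFFFD)
                || inRange(cp, 0x10000, 0xEFFFF);
    }

    // [164s] PN_CHARS_U ::= PN_CHARS_BASE | '_'
    static boolean isPnCharsU(int cp) {
        return isPnCharsBase(cp) || cp == '_';
    }

    // [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
    static boolean isPnChars(int cp) {
        return isPnCharsU(cp)
                || cp == '-'
                || inRange(cp, '0', '9')
                || cp == 0x00B7
                || inRange(cp, 0x0300, 0x036F)
                || inRange(cp, 0x203F, 0x2040);
    }

    private static boolean inRange(int cp, int low, int high) {
        return cp >= low && cp <= high;
    }
}
```

        The methods take int code points rather than char values because the last PN_CHARS_BASE range lies outside the Basic Multilingual Plane; callers iterating over a String would use String.codePointAt or String.codePoints to obtain them.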