Class URIUtil

java.lang.Object
org.eclipse.rdf4j.model.util.URIUtil

public class URIUtil extends Object
Utility functions for working with URIs.
  • Field Summary

    Fields
    Modifier and Type | Field | Description
    private static final char[] | LOCAL_ESCAPED_CHARS |
    private static final Set<Character> | mark | Punctuation mark characters, which are part of the set of unreserved chars and therefore allowed to occur in unescaped form.
    private static final Set<Character> | reserved | Reserved characters: their usage within the URI component is limited to their reserved purpose.
    private static final Pattern | unicodeControlCharPattern | Regular expression pattern for matching unicode control characters.
  • Constructor Summary

    Constructors
    Constructor | Description
    URIUtil() |
  • Method Summary

    Modifier and Type | Method | Description
    private static String | escapeExcludedChars(String unescaped) | Escapes any character that is not either reserved or in the legal range of unreserved characters, according to RFC 2396.
    static int | getLocalNameIndex(String uri) | Finds the index of the first local name character in a (non-relative) URI.
    static boolean | isCorrectURISplit(String namespace, String localName) | Checks whether the URI consisting of the specified namespace and local name has been split correctly according to the URI splitting rules specified in URI.
    private static boolean | isNameChar(int codePoint) | Check if the supplied code point represents a valid name character.
    private static boolean | isNameStartChar(int codePoint) | Check if the supplied code point represents a valid name start character.
    private static boolean | isPERCENT(String name) |
    private static boolean | isPLX_START(String name) |
    private static boolean | isPN_CHARS(int codePoint) | Check if the supplied code point represents a valid prefixed name character.
    private static boolean | isPN_CHARS_BASE(int codePoint) | Check if the supplied code point represents a valid prefixed name base character.
    private static boolean | isPN_CHARS_U(int codePoint) | Check if the supplied code point represents either a valid prefixed name base character or an underscore.
    private static boolean | isPN_LOCAL_ESC(String name) |
    private static boolean | isUnreserved(char c) | A character is unreserved according to RFC 2396 if it is either an alphanumeric char or a punctuation mark.
    static boolean | isValidLocalName(String name) | Checks whether the specified name is allowed as the local name part of an IRI according to the SPARQL 1.1/Turtle 1.1 spec.
    static boolean | isValidURIReference(String uriRef) | Verifies that the supplied string is a valid RDF (1.0) URI reference, as defined in section 6.4 of the RDF Concepts and Abstract Syntax specification (RDF 1.0 Recommendation of February 10, 2004).

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • reserved

      private static final Set<Character> reserved
      Reserved characters: their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI. See http://www.isi.edu/in-notes/rfc2396.txt, section 2.2.
    • mark

      private static final Set<Character> mark
      Punctuation mark characters, which are part of the set of unreserved chars and therefore allowed to occur in unescaped form. See http://www.isi.edu/in-notes/rfc2396.txt
    • unicodeControlCharPattern

      private static final Pattern unicodeControlCharPattern
      Regular expression pattern for matching unicode control characters.
    • LOCAL_ESCAPED_CHARS

      private static final char[] LOCAL_ESCAPED_CHARS
  • Constructor Details

    • URIUtil

      public URIUtil()
  • Method Details

    • getLocalNameIndex

      public static int getLocalNameIndex(String uri)
      Finds the index of the first local name character in a (non-relative) URI. This index is determined by the following steps:
      • Find the first occurrence of the '#' character.
      • If this fails, find the last occurrence of the '/' character.
      • If this fails, find the last occurrence of the ':' character.
      • Add 1 to the found index and return this value.
      Note that the third step should never fail, as every legal (non-relative) URI contains at least one ':' character to separate the scheme from the rest of the URI. If it fails anyway, the method throws an IllegalArgumentException.
      Parameters:
      uri - A URI string.
      Returns:
      The index of the first local name character in the URI string. Note that this index does not reference an actual character if the algorithm determines that there is no local name. In that case, the returned index is equal to the length of the URI string.
      Throws:
      IllegalArgumentException - If the supplied URI string doesn't contain any of the separator characters. Every legal (non-relative) URI contains at least one ':' character to separate the scheme from the rest of the URI.
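The four steps above can be sketched in plain Java as follows. This is an illustration of the documented rules, not the rdf4j source; the class name LocalNameIndexSketch is hypothetical. Note that the '#' rule uses the first occurrence, while the '/' and ':' rules use the last one.

```java
public class LocalNameIndexSketch {

    // Applies the documented split rules: first '#', else last '/', else last ':'.
    // Returns the index just past the separator; equals uri.length() when there is no local name.
    static int getLocalNameIndex(String uri) {
        int separatorIdx = uri.indexOf('#');        // step 1: first '#'
        if (separatorIdx < 0) {
            separatorIdx = uri.lastIndexOf('/');    // step 2: last '/'
        }
        if (separatorIdx < 0) {
            separatorIdx = uri.lastIndexOf(':');    // step 3: last ':'
        }
        if (separatorIdx < 0) {
            throw new IllegalArgumentException("No separator character found in URI: " + uri);
        }
        return separatorIdx + 1;                    // step 4: add 1
    }

    public static void main(String[] args) {
        String uri = "http://example.org/ns#Person";
        int idx = getLocalNameIndex(uri);
        // prints "http://example.org/ns# | Person"
        System.out.println(uri.substring(0, idx) + " | " + uri.substring(idx));
    }
}
```

For a URI such as urn:isbn:123, which has neither '#' nor '/', the split falls after the last ':' (index 9).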
    • isCorrectURISplit

      public static boolean isCorrectURISplit(String namespace, String localName)
      Checks whether the URI consisting of the specified namespace and local name has been split correctly according to the URI splitting rules specified in URI.
      Parameters:
      namespace - The URI's namespace, must not be null.
      localName - The URI's local name, must not be null.
      Returns:
      true if the specified URI has been correctly split into a namespace and local name, false otherwise.
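The splitting rules are not restated here. Assuming they are the same rules that getLocalNameIndex applies, one way to check a split is to recombine namespace and local name and confirm that re-splitting the result lands exactly at the end of the namespace. This is a hedged sketch, not the rdf4j implementation; the class name and the helper localNameIndex (which returns -1 instead of throwing) are hypothetical.

```java
public class UriSplitSketch {

    // Same three-step rule as getLocalNameIndex (first '#', else last '/', else last ':'),
    // but returns -1 instead of throwing when no separator is present.
    private static int localNameIndex(String uri) {
        int i = uri.indexOf('#');
        if (i < 0) i = uri.lastIndexOf('/');
        if (i < 0) i = uri.lastIndexOf(':');
        return i < 0 ? -1 : i + 1;
    }

    // A split is treated as correct when re-splitting namespace + localName
    // yields exactly the same namespace.
    static boolean isCorrectURISplit(String namespace, String localName) {
        if (namespace.isEmpty()) {
            return false;
        }
        return localNameIndex(namespace + localName) == namespace.length();
    }

    public static void main(String[] args) {
        System.out.println(isCorrectURISplit("http://example.org/ns#", "Person")); // true
        System.out.println(isCorrectURISplit("http://example.org/", "ns#Person")); // false
    }
}
```

The second call fails because the '#' in the proposed local name takes precedence over the '/' that ends the proposed namespace.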
    • isValidURIReference

      public static boolean isValidURIReference(String uriRef)
      Verifies that the supplied string is a valid RDF (1.0) URI reference, as defined in section 6.4 of the RDF Concepts and Abstract Syntax specification (RDF 1.0 Recommendation of February 10, 2004).

      An RDF URI reference is valid if it is a Unicode string that:

      • does not contain any control characters (#x00-#x1F, #x7F-#x9F)
      • and would produce a valid URI character sequence (per RFC 2396, section 2.1) representing an absolute URI with optional fragment identifier when subjected to the encoding described below
      The encoding consists of:
      1. encoding the Unicode string as UTF-8, giving a sequence of octet values.
      2. %-escaping octets that do not correspond to permitted US-ASCII characters.
      Parameters:
      uriRef - a string representing an RDF URI reference.
      Returns:
      true iff the supplied string is a syntactically valid RDF URI reference, false otherwise.
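A minimal sketch of the check described above, assuming java.net.URI as the RFC 2396 parser and assuming that the "permitted US-ASCII characters" are the RFC 2396 alphanumerics, marks and reserved characters, plus '#' (fragment separator) and '%' (existing escapes). It is not the rdf4j implementation; the class name and the permitted-character set are assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class UriReferenceSketch {

    // Control characters excluded by RDF 1.0: #x00-#x1F and #x7F-#x9F.
    private static final Pattern CONTROL_CHARS = Pattern.compile("[\\x00-\\x1F\\x7F-\\x9F]");

    // Assumed set of permitted US-ASCII punctuation: RFC 2396 marks and reserved
    // characters, plus '#' and '%'. Letters and digits are checked separately.
    private static final String PERMITTED_PUNCTUATION = "-_.!~*'();/?:@&=+$,#%";

    static boolean isValidURIReference(String uriRef) {
        if (CONTROL_CHARS.matcher(uriRef).find()) {
            return false;
        }
        // Step 1: encode as UTF-8. Step 2: %-escape octets that are not permitted US-ASCII.
        StringBuilder escaped = new StringBuilder();
        for (byte b : uriRef.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (isPermittedAscii(octet)) {
                escaped.append((char) octet);
            } else {
                escaped.append(String.format("%%%02X", octet));
            }
        }
        try {
            // The escaped string must parse and must be absolute (i.e. have a scheme).
            return new URI(escaped.toString()).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }

    private static boolean isPermittedAscii(int octet) {
        return (octet >= 'a' && octet <= 'z') || (octet >= 'A' && octet <= 'Z')
                || (octet >= '0' && octet <= '9')
                || PERMITTED_PUNCTUATION.indexOf(octet) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidURIReference("http://example.org/caf\u00e9")); // true
        System.out.println(isValidURIReference("relative/path"));               // false
    }
}
```

Under these assumptions a space or a non-ASCII letter does not make a reference invalid, because the encoding step turns it into %20 or a sequence of UTF-8 escapes; a control character or a missing scheme does.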
    • escapeExcludedChars

      private static String escapeExcludedChars(String unescaped)
      Escapes any character that is not either reserved or in the legal range of unreserved characters, according to RFC 2396.
      Parameters:
      unescaped - a (relative or absolute) URI reference.
      Returns:
      a (relative or absolute) URI reference with all characters that cannot appear as-is in a URI %-escaped.
    • isUnreserved

      private static boolean isUnreserved(char c)
      A character is unreserved according to RFC 2396 if it is either an alphanumeric char or a punctuation mark.
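The character classes behind isUnreserved, and the test that escapeExcludedChars is described as applying (escape anything that is neither reserved nor unreserved), can be sketched as follows. The sets are taken from RFC 2396 sections 2.2 and 2.3 and may not match the exact contents of the private reserved and mark fields; the class name and the helper mustEscape are hypothetical.

```java
import java.util.Set;

public class Rfc2396CharsSketch {

    // RFC 2396, section 2.3: mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
    private static final Set<Character> MARK = Set.of('-', '_', '.', '!', '~', '*', '\'', '(', ')');

    // RFC 2396, section 2.2: reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
    private static final Set<Character> RESERVED = Set.of(';', '/', '?', ':', '@', '&', '=', '+', '$', ',');

    // unreserved = alphanum | mark, where alphanum covers US-ASCII letters and digits only.
    static boolean isUnreserved(char c) {
        boolean alphanum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
        return alphanum || MARK.contains(c);
    }

    // A character that is neither reserved nor unreserved has to be %-escaped.
    static boolean mustEscape(char c) {
        return !isUnreserved(c) && !RESERVED.contains(c);
    }

    public static void main(String[] args) {
        System.out.println(isUnreserved('~')); // true
        System.out.println(mustEscape('/'));   // false: '/' is reserved
        System.out.println(mustEscape(' '));   // true
    }
}
```

Restricting alphanum to ASCII matters: a letter such as 'é' is alphabetic in Unicode but is not unreserved in RFC 2396, so it must be escaped.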
    • isValidLocalName

      public static boolean isValidLocalName(String name)
      Checks whether the specified name is allowed as the local name part of an IRI according to the SPARQL 1.1/Turtle 1.1 spec.
      Parameters:
      name - the candidate local name
      Returns:
      true if the name is a valid local name, false otherwise.
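A sketch of the check against the Turtle 1.1 production PN_LOCAL ::= (PN_CHARS_U | ':' | [0-9] | PLX) ((PN_CHARS | '.' | ':' | PLX)* (PN_CHARS | ':' | PLX))?, where PLX is either a PERCENT escape ('%' followed by two hex digits) or a PN_LOCAL_ESC (a backslash followed by one of _~.-!$&'()*+,;=/?#@%). The character predicates follow the PN_CHARS_BASE, PN_CHARS_U and PN_CHARS productions quoted in this section. Accepting the empty string is an assumption (a bare prefix such as ex: has an empty local part), and the class name is hypothetical; this is not the rdf4j source.

```java
public class LocalNameSketch {

    // Characters that may follow a backslash in PN_LOCAL_ESC.
    private static final String LOCAL_ESC = "_~.-!$&'()*+,;=/?#@%";

    static boolean isValidLocalName(String name) {
        if (name.isEmpty()) {
            return true; // assumption: an empty local part (as in "ex:") is accepted
        }
        int i = 0;
        boolean first = true;
        int lastCodePoint = -1; // -1 means the last unit was a PLX escape
        while (i < name.length()) {
            int plxLen = plxLength(name, i);
            if (plxLen > 0) {
                // PLX is allowed in the first, middle and last position
                i += plxLen;
                lastCodePoint = -1;
            } else {
                int cp = name.codePointAt(i);
                boolean ok = first
                        ? (isPnCharsU(cp) || cp == ':' || (cp >= '0' && cp <= '9'))
                        : (isPnChars(cp) || cp == '.' || cp == ':');
                if (!ok) {
                    return false;
                }
                i += Character.charCount(cp);
                lastCodePoint = cp;
            }
            first = false;
        }
        // '.' is allowed in the middle but not as the final character
        return lastCodePoint != '.';
    }

    // Length of a PLX (PERCENT or PN_LOCAL_ESC) starting at index i, or 0 if there is none.
    private static int plxLength(String s, int i) {
        char c = s.charAt(i);
        if (c == '%' && i + 2 < s.length() && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
            return 3;
        }
        if (c == '\\' && i + 1 < s.length() && LOCAL_ESC.indexOf(s.charAt(i + 1)) >= 0) {
            return 2;
        }
        return 0;
    }

    // HEX ::= [0-9] | [A-F] | [a-f] (ASCII only)
    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // [163s] PN_CHARS_BASE
    private static boolean isPnCharsBase(int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')
                || (cp >= 0x00C0 && cp <= 0x00D6) || (cp >= 0x00D8 && cp <= 0x00F6)
                || (cp >= 0x00F8 && cp <= 0x02FF) || (cp >= 0x0370 && cp <= 0x037D)
                || (cp >= 0x037F && cp <= 0x1FFF) || (cp >= 0x200C && cp <= 0x200D)
                || (cp >= 0x2070 && cp <= 0x218F) || (cp >= 0x2C00 && cp <= 0x2FEF)
                || (cp >= 0x3001 && cp <= 0xD7FF) || (cp >= 0xF900 && cp <= 0xFDCF)
                || (cp >= 0xFDF0 && cp <= 0xFFFD) || (cp >= 0x10000 && cp <= 0xEFFFF);
    }

    // [164s] PN_CHARS_U ::= PN_CHARS_BASE | '_'
    private static boolean isPnCharsU(int cp) {
        return isPnCharsBase(cp) || cp == '_';
    }

    // [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]
    private static boolean isPnChars(int cp) {
        return isPnCharsU(cp) || cp == '-' || (cp >= '0' && cp <= '9') || cp == 0x00B7
                || (cp >= 0x0300 && cp <= 0x036F) || (cp >= 0x203F && cp <= 0x2040);
    }

    public static void main(String[] args) {
        System.out.println(isValidLocalName("a.b"));   // true
        System.out.println(isValidLocalName("a."));    // false
        System.out.println(isValidLocalName("a%20b")); // true
    }
}
```

Iterating by code point rather than by char is what lets the [#x10000-#xEFFFF] range of PN_CHARS_BASE match characters that Java stores as surrogate pairs.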
    • isPN_CHARS_U

      private static boolean isPN_CHARS_U(int codePoint)
      Check if the supplied code point represents either a valid prefixed name base character or an underscore.

      From Turtle Spec:

      http://www.w3.org/TR/turtle/#grammar-production-PN_CHARS_U

      [164s] PN_CHARS_U ::= PN_CHARS_BASE | '_'

    • isPLX_START

      private static boolean isPLX_START(String name)
    • isPERCENT

      private static boolean isPERCENT(String name)
    • isPN_LOCAL_ESC

      private static boolean isPN_LOCAL_ESC(String name)
    • isPN_CHARS_BASE

      private static boolean isPN_CHARS_BASE(int codePoint)
      Check if the supplied code point represents a valid prefixed name base character.

      From Turtle Spec:

      http://www.w3.org/TR/turtle/#grammar-production-PN_CHARS_BASE

      [163s] PN_CHARS_BASE ::= [A-Z] | [a-z] | [#x00C0-#x00D6] | [#x00D8-#x00F6] | [#x00F8-#x02FF] | [#x0370-#x037D] | [#x037F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

    • isNameStartChar

      private static boolean isNameStartChar(int codePoint)
      Check if the supplied code point represents a valid name start character.
      Parameters:
      codePoint - a Unicode code point.
      Returns:
      true if the supplied code point represents a valid name start char, false otherwise.
    • isNameChar

      private static boolean isNameChar(int codePoint)
      Check if the supplied code point represents a valid name character.
      Parameters:
      codePoint - a Unicode code point.
      Returns:
      true if the supplied code point represents a valid name char, false otherwise.
    • isPN_CHARS

      private static boolean isPN_CHARS(int codePoint)
      Check if the supplied code point represents a valid prefixed name character.

      From Turtle Spec:

      http://www.w3.org/TR/turtle/#grammar-production-PN_CHARS

      [166s] PN_CHARS ::= PN_CHARS_U | '-' | [0-9] | #x00B7 | [#x0300-#x036F] | [#x203F-#x2040]