Class URL2

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Comparable<URL2>

    public final class URL2
    extends java.lang.Object
    implements java.io.Serializable, java.lang.Comparable<URL2>
    A reimplementation of URL that is better tailored to our needs. This class performs some normalization on URL names; in particular, it strips references.
    See Also:
    Serialized Form
    • Constructor Summary

      Constructors
      Constructor Description
      URL2(URL2 context, java.lang.String spec)
      Creates a URL by parsing the given spec within a specified context.
      URL2(java.lang.String spec)
      Creates a URL object from the String representation.
    • Method Summary

      Modifier and Type Method Description
      int compareTo(URL2 url)
      boolean equals(java.lang.Object obj)
      Compares two URLs.
      java.lang.String getAuthority()
      Returns the authority part of this URL.
      java.lang.String getDomain()
      Extracts domain name for a given URL.
      java.lang.String getFile()
      Returns the file name of this URL.
      java.lang.String getFileExtension()
      Returns the file name extension of this URL.
      java.lang.String getFragment()
      An alias for getRef().
      java.lang.String getHost()
      Returns the host name of this URL, if applicable.
      java.lang.String getPath()
      Returns the path part of this URL.
      int getPort()
      Returns the port number of this URL.
      java.lang.String getProtocol()
      Returns the protocol name of this URL.
      java.lang.String getQuery()
      Returns the query part of this URL.
      java.lang.String getRef()
      Returns the anchor (also known as the "reference") of this URL.
      java.lang.String getScheme()
      An alias for getProtocol().
      java.lang.String getUserInfo()
      Returns the userInfo part of this URL.
      int hashCode()
      long hashCode64()
      boolean isValid()
      static java.lang.String normalizeURLFragment(java.lang.String fragment)
      Normalizes a URL fragment.
      static java.lang.String normalizeURLFragment(java.nio.charset.CharsetEncoder UTF8Encoder, java.lang.String fragment)
      Normalizes a URL fragment.
      protected void parseURL(URL2 u, java.lang.String spec, int start, int limit)
      Parses the string representation of a URL into a URL object.
      protected void set(java.lang.String protocol, java.lang.String host, int port, java.lang.String authority, java.lang.String userInfo, java.lang.String path, java.lang.String query, java.lang.String ref)
      Sets the specified 8 fields of the URL.
      java.lang.String toString()
      Constructs a string representation of this URL.
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • URL2

        public URL2(java.lang.String spec)
        Creates a URL object from the String representation.

        This constructor is equivalent to a call to the two-argument constructor with a null first argument.

        Parameters:
        spec - the String to parse as a URL.
      • URL2

        public URL2(URL2 context,
                    java.lang.String spec)
        Creates a URL by parsing the given spec within a specified context. The new URL is created from the given context URL and the spec argument as described in RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax":
                  <scheme>://<authority><path>?<query>#<fragment>

        The reference is parsed into the scheme, authority, path, query and fragment parts. If the path component is empty and the scheme, authority, and query components are undefined, then the new URL is a reference to the current document. Otherwise, any fragment and query parts present in the spec are used in the new URL.

        If the scheme component is defined in the given spec and does not match the scheme of the context, then the new URL is created as an absolute URL based on the spec alone. Otherwise, the scheme component is inherited from the context URL.

        If the authority component is present in the spec, then the spec is treated as absolute, and the spec authority and path replace the context authority and path. If the authority component is absent in the spec, then the authority of the new URL is inherited from the context.

        If the spec's path component begins with a slash character "/", then the path is treated as absolute and the spec path replaces the context path. Otherwise, the path is treated as a relative path and is appended to the context path. The path is canonicalized by removing the directory changes made by occurrences of ".." and ".". For a more detailed description of URL parsing, refer to RFC 2396.

        NOTE: some sanitization is now performed on paths and queries. In particular, "//" sequences are collapsed in paths, and "/" is %-encoded in queries.
        Parameters:
        context - the context in which to parse the specification.
        spec - the String to parse as a URL.
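The resolution rules above follow RFC 2396, which java.net.URI largely follows as well, so URI.resolve can serve as a rough stand-in for experimenting with the two-argument constructor. This is a sketch, not URL2's code: the class name ResolveSketch and the helpers collapseSlashes and encodeQuerySlashes are hypothetical, the helpers only mirror the sanitization described in the NOTE, and URL2 may treat corner cases differently.

```java
import java.net.URI;

final class ResolveSketch {
    // Resolves spec against context with java.net.URI (RFC 2396 style).
    // Only an analogue of URL2's two-argument constructor.
    static String resolve(String context, String spec) {
        return URI.create(context).resolve(spec).toString();
    }

    // Hypothetical helper mirroring the documented path sanitization:
    // every run of two or more '/' characters becomes a single '/'.
    static String collapseSlashes(String path) {
        return path.replaceAll("/{2,}", "/");
    }

    // Hypothetical helper mirroring the documented query sanitization:
    // every '/' is %-encoded as "%2F".
    static String encodeQuerySlashes(String query) {
        return query.replace("/", "%2F");
    }

    public static void main(String[] args) {
        String base = "http://a/b/c/d;p?q";
        System.out.println(resolve(base, "g"));           // http://a/b/c/g
        System.out.println(resolve(base, "../g"));        // http://a/b/g
        System.out.println(resolve(base, "/g"));          // http://a/g
        System.out.println(resolve(base, "g?y"));         // http://a/b/c/g?y
        System.out.println(resolve(base, "https://x/y")); // https://x/y (scheme differs: spec used alone)
        System.out.println(collapseSlashes("/a//b///c")); // /a/b/c
        System.out.println(encodeQuerySlashes("p=a/b"));  // p=a%2Fb
    }
}
```

The two helpers are applied to the path and the query separately; collapsing slashes over a whole URL string would also destroy the "//" that introduces the authority.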
    • Method Detail

      • normalizeURLFragment

        public static java.lang.String normalizeURLFragment(java.nio.charset.CharsetEncoder UTF8Encoder,
                                                            java.lang.String fragment)
                                                     throws java.nio.charset.CharacterCodingException
        Normalizes a URL fragment.

        This method returns the normalization of its argument. All illegal characters are first UTF-8 encoded and then represented in %-notation.

        Parameters:
        UTF8Encoder - a (possibly cached) UTF-8 encoder.
        fragment - a URL fragment (possibly null).
        Returns:
        the normalized version.
        Throws:
        java.nio.charset.CharacterCodingException
      • normalizeURLFragment

        public static java.lang.String normalizeURLFragment(java.lang.String fragment)
                                                     throws java.nio.charset.CharacterCodingException
        Normalizes a URL fragment.

        This method returns the normalization of its argument. All illegal characters are first UTF-8 encoded and then represented in %-notation.

        Parameters:
        fragment - a URL fragment (possibly null).
        Returns:
        the normalized version.
        Throws:
        java.nio.charset.CharacterCodingException
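A minimal sketch of the documented behavior of both overloads. It assumes a particular set of "legal" characters (ASCII letters, digits, the RFC 3986 unreserved and reserved punctuation, and '%' so that existing escapes survive); the real class may use a different set. FragmentNormalizerSketch is a hypothetical name, and returning null for a null fragment is an assumption based on the "possibly null" parameter note. Encoding one code point at a time also shows where the CharacterCodingException can come from: an unpaired surrogate cannot be UTF-8 encoded.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class FragmentNormalizerSketch {
    // ASSUMPTION: punctuation left as-is; everything else outside [A-Za-z0-9] is escaped.
    private static final String ALLOWED_PUNCTUATION = "-._~!$&'()*+,;=:@/?%";

    private static boolean isAllowed(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                || (cp >= '0' && cp <= '9') || ALLOWED_PUNCTUATION.indexOf(cp) >= 0;
    }

    static String normalize(CharsetEncoder utf8Encoder, String fragment)
            throws CharacterCodingException {
        if (fragment == null) return null; // ASSUMPTION: null passes through unchanged
        StringBuilder out = new StringBuilder(fragment.length());
        int i = 0;
        while (i < fragment.length()) {
            int cp = fragment.codePointAt(i);
            int n = Character.charCount(cp); // 2 for a surrogate pair, else 1
            if (isAllowed(cp)) {
                out.appendCodePoint(cp);
            } else {
                // UTF-8 encode this code point, then emit one %XX triple per byte.
                // An unpaired surrogate makes encode() throw MalformedInputException.
                ByteBuffer bytes = utf8Encoder.encode(CharBuffer.wrap(fragment, i, i + n));
                while (bytes.hasRemaining()) {
                    out.append(String.format("%%%02X", bytes.get() & 0xFF));
                }
            }
            i += n;
        }
        return out.toString();
    }

    // Convenience overload that creates a fresh encoder on every call.
    static String normalize(String fragment) throws CharacterCodingException {
        return normalize(StandardCharsets.UTF_8.newEncoder(), fragment);
    }
}
```

The two-argument form lets a caller reuse one encoder across many calls; since CharsetEncoder instances are not safe for concurrent use, a cached encoder should be confined to a single thread.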
      • parseURL

        protected void parseURL(URL2 u,
                                java.lang.String spec,
                                int start,
                                int limit)
        Parses the string representation of a URL into a URL object.

        If there is any inherited context, then it has already been copied into the URL argument.

        The parseURL method of URLStreamHandler parses the string representation as if it were an http specification. Most URL protocol families have a similar parsing. A stream protocol handler for a protocol that has a different syntax must override this routine.

        Parameters:
        u - the URL to receive the result of parsing the spec.
        spec - the String representing the URL that must be parsed.
        start - the character index at which to begin parsing. This is just past the ':' (if there is one) that terminates the protocol name.
        limit - the character position to stop parsing at. This is the end of the string or the position of the "#" character, if present. All information after the sharp sign indicates an anchor.
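The start/limit contract can be illustrated with a small hypothetical helper that computes both indices for a spec. The rule that a ':' ends the protocol name only when it precedes every '/' is a simplifying assumption; a production parser would also check that the prefix is a syntactically valid scheme.

```java
final class ParseBoundsSketch {
    // Hypothetical helper: returns {start, limit} as a caller might pass them
    // to parseURL(u, spec, start, limit).
    static int[] bounds(String spec) {
        int colon = spec.indexOf(':');
        int slash = spec.indexOf('/');
        // ASSUMPTION: a ':' terminates the protocol name only if no '/' comes before it.
        int start = (colon >= 0 && (slash < 0 || colon < slash)) ? colon + 1 : 0;
        // The limit is the first '#' at or after start, or the end of the string.
        int hash = spec.indexOf('#', start);
        int limit = hash >= 0 ? hash : spec.length();
        return new int[] {start, limit};
    }

    public static void main(String[] args) {
        String spec = "http://example.com/a?b=1#top";
        int[] b = bounds(spec);
        System.out.println(spec.substring(b[0], b[1])); // //example.com/a?b=1
        // Everything after the sharp sign is the anchor.
        System.out.println(b[1] < spec.length() ? spec.substring(b[1] + 1) : null); // top
    }
}
```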
      • set

        protected void set(java.lang.String protocol,
                           java.lang.String host,
                           int port,
                           java.lang.String authority,
                           java.lang.String userInfo,
                           java.lang.String path,
                           java.lang.String query,
                           java.lang.String ref)
        Sets the specified 8 fields of the URL. This is not a public method so that only URLStreamHandlers can modify URL fields. URLs are otherwise constant.
        Parameters:
        protocol - the name of the protocol to use
        host - the name of the host
        port - the port number on the host
        authority - the authority part for the URL
        userInfo - the username and password
        path - the file on the host
        query - the query part of this URL
        ref - the internal reference in the URL
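To see how the eight fields relate to one another, the sketch below decomposes a sample URL with java.net.URI and returns the components in the parameter order of set(). It uses URI only as an analogue because URL2 itself is not assumed to be on the classpath; UrlFieldsSketch is a hypothetical name.

```java
import java.net.URI;

final class UrlFieldsSketch {
    // Returns the components in set() order:
    // protocol, host, port, authority, userInfo, path, query, ref.
    static String[] fields(String spec) {
        URI u = URI.create(spec);
        return new String[] {
            u.getScheme(), u.getHost(), String.valueOf(u.getPort()), u.getAuthority(),
            u.getUserInfo(), u.getPath(), u.getQuery(), u.getFragment()
        };
    }

    public static void main(String[] args) {
        String[] names = {"protocol", "host", "port", "authority", "userInfo", "path", "query", "ref"};
        String[] values = fields("http://joe:secret@example.com:8080/docs/index.html?lang=en#intro");
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " = " + values[i]);
        }
    }
}
```

The authority is the combination userInfo@host:port, and URI reports a port of -1 when none is given, the same convention getPort() documents. Since the class description says URL2 strips references, its own ref field may be null even when the spec carries a fragment.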
      • isValid

        public boolean isValid()
      • getQuery

        public java.lang.String getQuery()
        Returns the query part of this URL.
        Returns:
        the query part of this URL.
      • getPath

        public java.lang.String getPath()
        Returns the path part of this URL.
        Returns:
        the path part of this URL.
      • getUserInfo

        public java.lang.String getUserInfo()
        Returns the userInfo part of this URL.
        Returns:
        the userInfo part of this URL.
      • getAuthority

        public java.lang.String getAuthority()
        Returns the authority part of this URL.
        Returns:
        the authority part of this URL.
      • getPort

        public int getPort()
        Returns the port number of this URL. Returns -1 if the port is not set.
        Returns:
        the port number
      • getProtocol

        public java.lang.String getProtocol()
        Returns the protocol name of this URL.
        Returns:
        the protocol of this URL.
      • getScheme

        public java.lang.String getScheme()
        An alias for getProtocol().
        Returns:
        the protocol of this URL.
      • getHost

        public java.lang.String getHost()
        Returns the host name of this URL, if applicable.
        Returns:
        the host name of this URL.
      • getFile

        public java.lang.String getFile()
        Returns the file name of this URL.
        Returns:
        the file name of this URL.
      • getFileExtension

        public java.lang.String getFileExtension()
        Returns the file name extension of this URL. For example, if the file name is index.html, html is returned; if no valid extension is found, null is returned.
        Returns:
        the file name extension of this URL.
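One possible reading of "valid extension", sketched as a hypothetical helper over the URL path: the extension is the non-empty text after the last '.' in the last path segment. The exact rule URL2 applies is not documented here, so treat this as an assumption.

```java
final class FileExtensionSketch {
    // ASSUMPTION: extension = non-empty text after the last '.' of the last
    // path segment; anything else counts as "no valid extension" (null).
    static String fileExtension(String path) {
        if (path == null) return null;
        String name = path.substring(path.lastIndexOf('/') + 1); // last segment
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) return null;     // no dot, or trailing dot
        return name.substring(dot + 1);
    }
}
```

Looking only at the last segment means a dot in a directory name, as in /a.b/c, does not produce a spurious extension.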
      • getRef

        public java.lang.String getRef()
        Returns the anchor (also known as the "reference") of this URL.
        Returns:
        the anchor (also known as the "reference") of this URL.
      • getFragment

        public java.lang.String getFragment()
        An alias for getRef().
        Returns:
        the anchor (also known as the "reference") of this URL.
      • equals

        public boolean equals(java.lang.Object obj)
        Compares two URLs. The result is true if and only if the argument is not null and is a URL object that represents the same URL as this object. Two URL objects are equal if they have the same protocol and reference the same host, the same port number on the host, and the same file and anchor on the host.
        Overrides:
        equals in class java.lang.Object
        Parameters:
        obj - the URL to compare against.
        Returns:
        true if the objects are the same; false otherwise.
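The equality contract above names five fields: protocol, host, port, file and anchor. A hypothetical value class that compares exactly those fields, and hashes the same ones, keeps equals and hashCode consistent. This sketches the contract only; URL2's implementation may normalize fields (for example, host case) before comparing. A purely field-based comparison also avoids the host-name resolution that java.net.URL.equals may perform.

```java
import java.util.Objects;

final class UrlKeySketch {
    // Hypothetical value class holding only the fields named by the equals() contract.
    private final String protocol;
    private final String host;
    private final int port;
    private final String file;
    private final String ref;

    UrlKeySketch(String protocol, String host, int port, String file, String ref) {
        this.protocol = protocol;
        this.host = host;
        this.port = port;
        this.file = file;
        this.ref = ref;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof UrlKeySketch)) return false; // also handles obj == null
        UrlKeySketch o = (UrlKeySketch) obj;
        return port == o.port
                && Objects.equals(protocol, o.protocol)
                && Objects.equals(host, o.host)
                && Objects.equals(file, o.file)
                && Objects.equals(ref, o.ref);
    }

    // Hashes exactly the fields equals() compares, so equal objects share a hash code.
    @Override
    public int hashCode() {
        return Objects.hash(protocol, host, port, file, ref);
    }
}
```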
      • compareTo

        public int compareTo(URL2 url)
        Specified by:
        compareTo in interface java.lang.Comparable<URL2>
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • hashCode64

        public long hashCode64()
      • toString

        public java.lang.String toString()
        Constructs a string representation of this URL.
        Overrides:
        toString in class java.lang.Object
        Returns:
        a string representation of this object.
      • getDomain

        public java.lang.String getDomain()
        Extracts the domain name of this URL. This is useful for avoiding correlated links. This method works by considering the rightmost, most significant, non-common suffix of the URL. Examples:

        http://www.ox.ac.uk/ returns: ox.ac.uk
        http://something.somethingelse.web.com/ returns: somethingelse.web.com
        http://www.microsoft.com/ returns: microsoft.com
        http://www.dsi.unimi.it/ returns: unimi.it

        Returns:
        a String indicating the domain name.
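A sketch of the suffix-based approach, assuming a tiny hypothetical table of common suffixes that is just large enough to reproduce the four documented examples; the table URL2 actually uses is not shown here. For production use, Guava's InternetDomainName.topPrivateDomain() performs a similar computation backed by the Public Suffix List, although its results depend on that list and need not match the examples above.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Set;

final class DomainSketch {
    // HYPOTHETICAL table of common suffixes, sized only for the documented examples.
    private static final Set<String> COMMON_SUFFIXES = Set.of("uk", "ac.uk", "com", "web.com", "it");

    static String domain(String url) {
        String host = URI.create(url).getHost();
        if (host == null) return null;
        String[] labels = host.split("\\.");
        // Scan suffixes from longest to shortest; the first hit is the longest common suffix.
        for (int i = 1; i < labels.length; i++) {
            String suffix = String.join(".", Arrays.copyOfRange(labels, i, labels.length));
            if (COMMON_SUFFIXES.contains(suffix)) {
                // Keep one more label: the rightmost non-common part.
                return labels[i - 1] + "." + suffix;
            }
        }
        return host; // no known common suffix: fall back to the whole host
    }
}
```

Preferring the longest matching suffix is what turns www.ox.ac.uk into ox.ac.uk rather than ac.uk.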