Class HDTParser

  • All Implemented Interfaces:
    RDFParser

    public class HDTParser
    extends AbstractRDFParser
    RDF parser for HDT v1.0 files. The parser itself is not thread-safe, so its public methods are synchronized.

    The draft specification is not entirely clear and is probably slightly out of date: the open-source reference implementation, HDT-It, appears to implement a slightly different version. This parser aims to be compatible with HDT-It 1.0.

    The most important parts are the Dictionaries, which contain the actual values (the S, P and O parts of a triple), and the Triples, which contain the numeric references into the Dictionaries from which the triples are constructed.

    Since objects in one triple are often subjects in another triple, these "shared" parts are stored in a shared Dictionary, which may significantly reduce the file size.

    File structure:

     +---------------------+
     | Global              |
     | Header              |
     | Dictionary (Shared) |
     | Dictionary (S)      |
     | Dictionary (P)      |
     | Dictionary (O)      |
     | Triples             |
     +---------------------+
     
    See Also:
    HDT draft (2015), W3C Member Submission (2011)
    • Constructor Detail

      • HDTParser

        public HDTParser()
        Creates a new HDTParser that will use a SimpleValueFactory to create RDF model objects.
      • HDTParser

        public HDTParser​(ValueFactory valueFactory)
        Creates a new HDTParser that will use the supplied ValueFactory to create RDF model objects.
        Parameters:
        valueFactory - A ValueFactory.
    • Method Detail

      • getRDFFormat

        public RDFFormat getRDFFormat()
        Description copied from interface: RDFParser
        Gets the RDF format that this parser can parse.
      • parse

        public void parse​(java.io.InputStream in,
                          java.lang.String baseURI)
                   throws java.io.IOException,
                          RDFParseException,
                          RDFHandlerException
        Description copied from interface: RDFParser
        Parses the data from the supplied InputStream, using the supplied baseURI to resolve any relative URI references.
        Parameters:
        in - The InputStream from which to read the data.
        baseURI - The URI associated with the data in the InputStream. May be null. Parsers for syntax formats that do not support relative URIs will ignore this argument.

        Note that if the data contains an embedded base URI, that embedded base URI will overrule the value supplied here (see RFC 3986 section 5.1 for details).

        Throws:
        java.io.IOException - If an I/O error occurred while data was read from the InputStream.
        RDFParseException - If the parser has found an unrecoverable parse error.
        RDFHandlerException - If the configured statement handler has encountered an unrecoverable error.
      • parse

        public void parse​(java.io.Reader reader,
                          java.lang.String baseURI)
                   throws java.io.IOException,
                          RDFParseException,
                          RDFHandlerException
        Not supported, since HDT is a binary format.
        Parameters:
        reader - The Reader from which to read the data.
        baseURI - The URI associated with the data in the Reader. May be null. Parsers for syntax formats that do not support relative URIs will ignore this argument.

        Note that if the data contains an embedded base URI, that embedded base URI will overrule the value supplied here (see RFC 3986 section 5.1 for details).

        Throws:
        java.io.IOException - If an I/O error occurred while data was read from the Reader.
        RDFParseException - If the parser has found an unrecoverable parse error.
        RDFHandlerException - If the configured statement handler has encountered an unrecoverable error.
      • getSO

        private byte[] getSO​(int pos,
                             int size,
                             HDTDictionarySection shared,
                             HDTDictionarySection other)
                      throws java.io.IOException
        Gets a part of a triple from the shared HDT Dictionary or, if it is not found there, from the specific HDT Dictionary.
        Parameters:
        pos - position
        size - size of shared Dictionary
        shared - shared Dictionary
        other - specific Dictionary
        Returns:
        subject or object
        Throws:
        java.io.IOException
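
        The lookup described above can be sketched as follows. This is a minimal illustration, not the parser's actual code: plain lists stand in for HDTDictionarySection, and the class name, the method name and the 1-based numbering (shared entries first, then the entries of the specific Dictionary) are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch only: lists replace HDTDictionarySection, and the numbering scheme
// (1-based, shared section first) is an assumption, not confirmed by the docs.
final class SharedLookupSketch {

    // Positions 1..sharedSize resolve against the shared section; higher
    // positions resolve against the specific (subject or object) section.
    static byte[] lookup(int pos, int sharedSize, List<byte[]> shared, List<byte[]> other) {
        if (pos < 1 || pos > sharedSize + other.size()) {
            throw new IndexOutOfBoundsException("No dictionary entry at position " + pos);
        }
        return (pos <= sharedSize)
                ? shared.get(pos - 1)
                : other.get(pos - sharedSize - 1);
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical dictionary contents.
        List<byte[]> shared = Arrays.asList(utf8("http://example.org/a"), utf8("http://example.org/b"));
        List<byte[]> subjects = Arrays.asList(utf8("http://example.org/c"), utf8("_:b1"));

        for (int pos = 1; pos <= 4; pos++) {
            byte[] entry = lookup(pos, shared.size(), shared, subjects);
            System.out.println(pos + " -> " + new String(entry, StandardCharsets.UTF_8));
        }
    }
}
```

        Storing the shared entries under the lowest positions means that the same number can be used on the subject side and on the object side of a triple without storing the value twice.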
      • isBNodeID

        private boolean isBNodeID​(byte[] b)
      • createSubject

        private Resource createSubject​(byte[] b)
        Creates a subject IRI or blank node.
        Parameters:
        b - byte buffer
        Returns:
        IRI or blank node
      • createPredicate

        private IRI createPredicate​(byte[] b)
        Creates a predicate IRI.
        Parameters:
        b - byte buffer
        Returns:
        IRI
      • createObject

        private Value createObject​(byte[] b)
        Creates an object: a (typed) literal, an IRI or a blank node.
        Parameters:
        b - byte buffer
        Returns:
        literal, IRI or blank node
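
        The create methods above each turn a raw dictionary entry into an RDF term. The sketch below shows one way such an entry might be classified. It assumes N-Triples-like conventions (a "_:" prefix for blank nodes, a leading double quote for literals, anything else treated as an IRI); these conventions and all names in the example are assumptions for illustration and are not taken from the parser's source.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: the prefix conventions checked here are assumptions.
final class TermKindSketch {

    enum Kind { BNODE, LITERAL, IRI }

    // Classifies a dictionary entry by its first bytes.
    static Kind classify(byte[] b) {
        if (b.length >= 2 && b[0] == '_' && b[1] == ':') {
            return Kind.BNODE;
        }
        if (b.length >= 1 && b[0] == '"') {
            return Kind.LITERAL;
        }
        return Kind.IRI;
    }

    // Returns the text between the opening quote and the last quote of a
    // literal entry, dropping any trailing datatype or language tag.
    // Escape sequences inside the literal are not decoded.
    static String lexicalForm(byte[] b) {
        String term = new String(b, StandardCharsets.UTF_8);
        int end = term.lastIndexOf('"');
        if (classify(b) != Kind.LITERAL || end < 1) {
            throw new IllegalArgumentException("Not a quoted literal: " + term);
        }
        return term.substring(1, end);
    }

    public static void main(String[] args) {
        // Hypothetical entries covering the three kinds of object.
        String[] samples = {
            "_:b1",
            "\"42\"^^<http://www.w3.org/2001/XMLSchema#integer>",
            "http://example.org/a"
        };
        for (String s : samples) {
            System.out.println(classify(s.getBytes(StandardCharsets.UTF_8)) + " <- " + s);
        }
    }
}
```

        Under these assumptions a subject would only ever be classified as BNODE or IRI and a predicate only as IRI, which matches the return types documented above.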