Interface RDFParser

All Known Implementing Classes:
AbstractRDFParser, JsonLdParser, RDF4JParser

public interface RDFParser
Parse an RDF source into a target (e.g. a Graph/Dataset).

Experimental

This interface (and its implementations) should be considered at risk; they might change or be removed in the next minor update of Commons RDF. It may move to the org.apache.commons.rdf.api package when it has stabilized.

Description

This interface follows the Builder pattern, allowing parser settings such as contentType(RDFSyntax) and base(IRI) to be configured. A caller MUST call one of the source methods (e.g. source(IRI), source(Path), source(InputStream)) and one of the target methods (e.g. target(Consumer), target(Dataset), target(Graph)) before calling parse() on the returned RDFParser; apart from that, the methods can be called in any order.

The call to parse() returns a Future, allowing asynchronous parse operations. Callers are advised to call Future.get() to ensure that parsing completed successfully, and to catch any exceptions thrown during parsing.

Calling a setter for a setting that has already been set will override the existing value in the returned builder, regardless of the parameter type (e.g. source(IRI) will override a previous source(Path)). Settings can be unset by passing null; note that this may require a cast, e.g. contentType( (RDFSyntax) null ) to undo a previous call to contentType(RDFSyntax).
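The cast is needed because a bare null matches every overload of the setter, so the compiler rejects the call as ambiguous. The following is a minimal sketch of that situation using only the JDK; OverloadSketch and its Syntax enum are hypothetical stand-ins and not part of Commons RDF.

```java
import java.util.Optional;

// Hypothetical stand-in for a builder with two overloads of one setting.
final class OverloadSketch {
    enum Syntax { TURTLE, NTRIPLES }   // stand-in for RDFSyntax

    private final Optional<String> contentType;

    OverloadSketch(Optional<String> contentType) {
        this.contentType = contentType;
    }

    // Overload 1: typed syntax; null unsets the value.
    OverloadSketch contentType(Syntax syntax) {
        return new OverloadSketch(Optional.ofNullable(syntax).map(Syntax::name));
    }

    // Overload 2: free-form string; null unsets the value.
    OverloadSketch contentType(String type) {
        return new OverloadSketch(Optional.ofNullable(type));
    }

    Optional<String> contentType() {
        return contentType;
    }

    public static void main(String[] args) {
        OverloadSketch b = new OverloadSketch(Optional.empty())
                .contentType("text/turtle")
                .contentType(Syntax.NTRIPLES);   // overrides the String setting
        System.out.println(b.contentType());     // Optional[NTRIPLES]

        // b.contentType(null) would not compile: the call is ambiguous.
        System.out.println(b.contentType((Syntax) null).contentType());   // Optional.empty
    }
}
```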

It is undefined whether an RDFParser is mutable or thread-safe, so callers should always use the modified RDFParser returned from the builder methods. The builder may return itself after modification, or a cloned builder with the modified settings applied. Implementations are, however, encouraged to be immutable and thread-safe, and to document this. As an example starting point, see org.apache.commons.rdf.simple.AbstractRDFParser.
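To illustrate why the returned builder must be used, here is a minimal sketch of an immutable builder. ImmutableBuilderSketch is a hypothetical class written for this example; it only assumes that each setter returns a modified copy, which is one of the two behaviours permitted above.

```java
// Hypothetical immutable builder; not part of Commons RDF.
final class ImmutableBuilderSketch {
    private final String source;   // null means "not set"
    private final String base;

    ImmutableBuilderSketch() {
        this(null, null);
    }

    private ImmutableBuilderSketch(String source, String base) {
        this.source = source;
        this.base = base;
    }

    // Each setter returns a modified copy and leaves the receiver untouched.
    ImmutableBuilderSketch source(String source) {
        return new ImmutableBuilderSketch(source, base);
    }

    ImmutableBuilderSketch base(String base) {
        return new ImmutableBuilderSketch(source, base);
    }

    String describe() {
        return "source=" + source + ", base=" + base;
    }

    public static void main(String[] args) {
        ImmutableBuilderSketch parser = new ImmutableBuilderSketch();

        parser.source("/tmp/graph.ttl");             // result discarded: setting is lost
        System.out.println(parser.describe());       // source=null, base=null

        ImmutableBuilderSketch configured = parser
                .source("/tmp/graph.ttl")
                .base("http://example.com/");
        System.out.println(configured.describe());   // source=/tmp/graph.ttl, base=http://example.com/
    }
}
```

With a mutable implementation that returns itself, the first call would have taken effect; code that always chains on the returned value behaves the same in both cases.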

Example usage:

 Graph g1 = rDFTermFactory.createGraph();
 new ExampleRDFParserBuilder().source(Paths.get("/tmp/graph.ttl")).contentType(RDFSyntax.TURTLE).target(g1).parse()
         .get(30, TimeUnit.SECONDS);
 
  • Method Details

    • rdfTermFactory

      RDFParser rdfTermFactory(RDF rdfTermFactory)
      Specify which RDF to use for generating RDFTerms.

      This option may be used together with target(Graph) to override the implementation's default factory and graph.

      Warning: Using the same RDF for multiple parse() calls may accidentally merge BlankNodes having the same label, because the parser may pass the parsed blank node labels to the RDF.createBlankNode(String) method.
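The merging risk can be sketched as follows. BlankNodeScopeSketch is a hypothetical factory that derives a blank node identifier from a per-factory salt plus the parsed label; whether a given RDF implementation works this way is an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical factory: blank node identity = factory salt + parsed label.
final class BlankNodeScopeSketch {
    private final String salt = UUID.randomUUID().toString();

    // Same factory and same label always yield the same identifier.
    String createBlankNode(String label) {
        byte[] seed = (salt + ":" + label).getBytes(StandardCharsets.UTF_8);
        return "_:" + UUID.nameUUIDFromBytes(seed);
    }

    public static void main(String[] args) {
        BlankNodeScopeSketch shared = new BlankNodeScopeSketch();

        // Two unrelated documents that both happen to use the label "b1".
        String fromDoc1 = shared.createBlankNode("b1");
        String fromDoc2 = shared.createBlankNode("b1");
        System.out.println(fromDoc1.equals(fromDoc2));   // true: the nodes merge

        // A fresh factory per parse keeps them apart.
        String isolated = new BlankNodeScopeSketch().createBlankNode("b1");
        System.out.println(fromDoc1.equals(isolated));   // false
    }
}
```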

      Parameters:
      rdfTermFactory - RDF to use for generating RDFTerms.
      Returns:
      An RDFParser that will use the specified rdfTermFactory
    • contentType

      RDFParser contentType(RDFSyntax rdfSyntax) throws IllegalArgumentException
      Specify the content type of the RDF syntax to parse.

      This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.

      The character set of the RDFSyntax is assumed to be StandardCharsets.UTF_8 unless overridden within the document (e.g. <?xml version="1.0" encoding="iso-8859-1"?> in RDFSyntax.RDFXML).

      This method will override any contentType set with contentType(String).

      Parameters:
      rdfSyntax - An RDFSyntax to parse the source according to, e.g. RDFSyntax.TURTLE.
      Returns:
      An RDFParser that will use the specified content type.
      Throws:
      IllegalArgumentException - If this RDFParser does not support the specified RDFSyntax.
    • contentType

      RDFParser contentType(String contentType) throws IllegalArgumentException
      Specify the content type of the RDF syntax to parse.

      This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.

      The content type MAY include a charset parameter if the RDF media types permit it; the default charset is StandardCharsets.UTF_8 unless overridden within the document.

      This method will override any contentType set with contentType(RDFSyntax).
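As an illustration of the charset rule, the sketch below extracts the charset parameter from a content-type string and falls back to UTF-8. ContentTypeCharsetSketch is a hypothetical helper, and the parsing is deliberately simplified (it does not handle semicolons inside quoted values); a real parser would follow the full RFC 7231 grammar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharsetSketch {
    // Returns the charset parameter of a content-type string, or UTF-8 if absent.
    static Charset charsetOf(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {       // parts[0] is the media type
            String[] kv = parts[i].trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                String value = kv[1].trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);   // strip quotes
                }
                return Charset.forName(value);   // throws on unknown or illegal names
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/ld+json"));                      // UTF-8
        System.out.println(charsetOf("text/turtle;charset=\"UTF-8\""));            // UTF-8
        System.out.println(charsetOf("application/rdf+xml; charset=iso-8859-1"));  // ISO-8859-1
    }
}
```

The exceptions raised by Charset.forName for illegal or unsupported names are subclasses of IllegalArgumentException, which fits the contract stated for this method.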

      Parameters:
      contentType - A content-type string, e.g. application/ld+json or text/turtle;charset="UTF-8" as specified by RFC7231.
      Returns:
      An RDFParser that will use the specified content type.
      Throws:
      IllegalArgumentException - If the contentType has an invalid syntax, or this RDFParser does not support the specified contentType.
    • target

      default RDFParser target(Graph graph)
      Specify a Graph to add parsed triples to.

      If the source supports datasets (e.g. RDFSyntax.supportsDataset() is true for the syntax set with contentType(RDFSyntax)), then only quads in the default graph will be added to the Graph as Triples.

      It is undefined whether any triples are added to the specified Graph if parse() throws an exception. (Implementations are, however, free to prevent this using transaction mechanisms or similar.) If Future.get() does not indicate an exception, the parser implementation SHOULD have inserted all parsed triples into the specified graph.

      Calling this method will override any earlier targets set with target(Graph), target(Consumer) or target(Dataset).

      The default implementation of this method calls target(Consumer) with a Consumer that does Graph.add(Triple) with Quad.asTriple() if the quad is in the default graph.
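The default behaviour described above can be sketched with JDK types only. In this example Quad is a hypothetical stand-in class with string terms, and the "graph" is a plain list of triple strings; the point is only the filter on the default graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

final class GraphTargetSketch {
    // Hypothetical stand-in for Quad; terms are plain strings here.
    static final class Quad {
        final Optional<String> graphName;
        final String subject, predicate, object;

        Quad(Optional<String> graphName, String subject, String predicate, String object) {
            this.graphName = graphName;
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        String asTriple() {
            return subject + " " + predicate + " " + object + " .";
        }
    }

    // Wraps a "graph" (a list of triples) in a quad consumer that keeps
    // only quads from the default graph, i.e. those without a graph name.
    static Consumer<Quad> intoGraph(List<String> graph) {
        return quad -> {
            if (!quad.graphName.isPresent()) {
                graph.add(quad.asTriple());
            }
        };
    }

    public static void main(String[] args) {
        List<String> graph = new ArrayList<>();
        Consumer<Quad> target = intoGraph(graph);
        target.accept(new Quad(Optional.empty(), "<s>", "<p>", "<o1>"));
        target.accept(new Quad(Optional.of("<g>"), "<s>", "<p>", "<o2>"));
        System.out.println(graph);   // [<s> <p> <o1> .]
    }
}
```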

      Parameters:
      graph - The Graph to add triples to.
      Returns:
      An RDFParser that will insert triples into the specified graph.
    • target

      default RDFParser target(Dataset dataset)
      Specify a Dataset to add parsed quads to.

      It is undefined whether any quads are added to the specified Dataset if parse() throws an exception. (Implementations are, however, free to prevent this using transaction mechanisms or similar.) On the other hand, if parse() does not indicate an exception, the implementation SHOULD have inserted all parsed quads into the specified dataset.

      Calling this method will override any earlier targets set with target(Graph), target(Consumer) or target(Dataset).

      The default implementation of this method calls target(Consumer) with a Consumer that does Dataset.add(Quad).

      Parameters:
      dataset - The Dataset to add quads to.
      Returns:
      An RDFParser that will insert quads into the specified dataset.
    • target

      RDFParser target(Consumer<Quad> consumer)
      Specify a consumer for parsed quads.

      The quads will include triples in all named graphs of the parsed source, including any triples in the default graph. When parsing a source format that does not support datasets, all quads delivered to the consumer will be in the default graph (i.e. their Quad.getGraphName() will be Optional.empty()).

      It is undefined whether any quads are consumed if parse() throws an exception. On the other hand, if parse() does not indicate an exception, the implementation SHOULD have delivered all parsed quads to the specified consumer.

      Calling this method will override any earlier targets set with target(Graph), target(Consumer) or target(Dataset).

      The consumer is not assumed to be thread safe - only one Consumer.accept(Object) is delivered at a time for a given parse() call.

      This method is typically called with a functional consumer, for example:

       
       List<Quad> quads = new ArrayList<>();
       parserBuilder.target(quads::add).parse();
       
       
      Parameters:
      consumer - A Consumer of Quads
      Returns:
      An RDFParser that will call the consumer for each parsed quad.
    • base

      RDFParser base(IRI base)
      Specify a base IRI to use for parsing any relative IRI references.

      Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).

      If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
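The effect of a base IRI on relative references can be sketched with java.net.URI. This is an approximation chosen for illustration: java.net.URI handles URIs rather than full IRIs, and BaseResolutionSketch is a hypothetical helper, not how any particular parser resolves references.

```java
import java.net.URI;

final class BaseResolutionSketch {
    // Resolves a (possibly relative) reference against an absolute base.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/data/graph.ttl";
        System.out.println(resolve(base, "other"));    // http://example.com/data/other
        System.out.println(resolve(base, "#alice"));   // http://example.com/data/graph.ttl#alice
        System.out.println(resolve(base, "/top"));     // http://example.com/top

        // An absolute reference is unaffected by the base, which is why the
        // base has no effect for syntaxes that only allow absolute IRIs.
        System.out.println(resolve(base, "http://example.org/x"));   // http://example.org/x
    }
}
```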

      This method will override any base IRI set with base(String).

      Parameters:
      base - An absolute IRI to use as a base.
      Returns:
      An RDFParser that will use the specified base IRI.
    • base

      RDFParser base(String base) throws IllegalArgumentException
      Specify a base IRI to use for parsing any relative IRI references.

      Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).

      If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.

      This method will override any base IRI set with base(IRI).

      Parameters:
      base - An absolute IRI to use as a base.
      Returns:
      An RDFParser that will use the specified base IRI.
      Throws:
      IllegalArgumentException - If the base is not a valid absolute IRI string
    • source

      RDFParser source(InputStream inputStream)
      Specify a source InputStream to parse.

      The source set will not be read before the call to parse().

      The InputStream will not be closed after parsing; closing it remains the caller's responsibility. The InputStream does not need to support mark and reset (i.e. InputStream.markSupported() may return false).
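Since the stream is left open, the caller would typically wrap it in try-with-resources. The sketch below shows that ownership pattern with JDK types only; consume is a hypothetical stand-in for a parse that reads the stream without closing it, and TrackingStream exists only to make the open/closed state visible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamOwnershipSketch {
    // Stream that records whether close() was called.
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(String content) {
            super(content.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical stand-in for a parse: reads to the end, never closes.
    static int consume(InputStream in) {
        try {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream stream = new TrackingStream("<s> <p> <o> .");
        try (InputStream in = stream) {
            System.out.println(consume(in));     // 13
            System.out.println(stream.closed);   // false: still open after "parsing"
        }
        System.out.println(stream.closed);       // true: closed by the caller
    }
}
```

With an asynchronous parser, it seems prudent to wait for Future.get() inside the try block, so that the stream is not closed while it is still being read.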

      The parser might not consume the complete stream (e.g. an RDF/XML parser may not read beyond the closing tag of </rdf:Description>).

      The contentType(RDFSyntax) or contentType(String) SHOULD be set before calling parse().

      The character set is assumed to be StandardCharsets.UTF_8 unless the contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).

      The base(IRI) or base(String) MUST be set before calling parse(), unless the RDF syntax does not permit relative IRIs (e.g. RDFSyntax.NTRIPLES).

      This method will override any source set with source(IRI), source(Path) or source(String).

      Parameters:
      inputStream - An InputStream to consume
      Returns:
      An RDFParser that will use the specified source.
    • source

      RDFParser source(Path file)
      Specify a source file Path to parse.

      The source set will not be read before the call to parse().

      The contentType(RDFSyntax) or contentType(String) SHOULD be set before calling parse().

      The character set is assumed to be StandardCharsets.UTF_8 unless the contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).

      The base(IRI) or base(String) MAY be set before calling parse(), otherwise Path.toUri() will be used as the base IRI.
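For reference, the following sketch shows what Path.toUri() produces and how relative references would resolve against it. PathBaseSketch is a hypothetical helper; the exact URI form depends on the platform's file system, so the printed value is only indicative.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathBaseSketch {
    // The fallback base described above: the absolute file: URI of the path.
    static URI defaultBase(Path file) {
        return file.toUri();
    }

    public static void main(String[] args) {
        URI base = defaultBase(Paths.get("/tmp/graph.ttl"));
        System.out.println(base);   // e.g. file:///tmp/graph.ttl on Unix-like systems

        // A relative reference in the document resolves to a sibling file.
        System.out.println(base.resolve("other.ttl").getPath());
    }
}
```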

      This method will override any source set with source(IRI), source(InputStream) or source(String).

      Parameters:
      file - A Path for a file to parse
      Returns:
      An RDFParser that will use the specified source.
    • source

      RDFParser source(IRI iri)
      Specify an absolute source IRI to retrieve and parse.

      The source set will not be read before the call to parse().

      If this builder does not support the given IRI protocol (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while parse() should throw an IOException.

      The contentType(RDFSyntax) or contentType(String) MAY be set before calling parse(), in which case that type MAY be used for content negotiation (e.g. Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.

      The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).

      The base(IRI) or base(String) MAY be set before calling parse(), otherwise the source IRI will be used as the base IRI.

      This method will override any source set with source(Path), source(InputStream) or source(String).

      Parameters:
      iri - An IRI to retrieve and parse
      Returns:
      An RDFParser that will use the specified source.
    • source

      RDFParser source(String iri) throws IllegalArgumentException
      Specify an absolute source IRI to retrieve and parse.

      The source set will not be read before the call to parse().

      If this builder does not support the given IRI (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while parse() should throw an IOException.

      The contentType(RDFSyntax) or contentType(String) MAY be set before calling parse(), in which case that type MAY be used for content negotiation (e.g. Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.

      The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).

      The base(IRI) or base(String) MAY be set before calling parse(), otherwise the source IRI will be used as the base IRI.

      This method will override any source set with source(Path), source(InputStream) or source(IRI).

      Parameters:
      iri - An IRI to retrieve and parse
      Returns:
      An RDFParser that will use the specified source.
      Throws:
      IllegalArgumentException - If the iri is not a valid absolute IRI string
    • parse

      Future<? extends RDFParser.ParseResult> parse() throws IOException, IllegalStateException
      Parse the specified source.

      A source method (e.g. source(InputStream), source(IRI), source(Path), source(String) or an equivalent subclass method) MUST have been called before calling this method, otherwise an IllegalStateException will be thrown.

      A target method (e.g. target(Consumer), target(Dataset), target(Graph) or an equivalent subclass method) MUST have been called before calling parse(), otherwise an IllegalStateException will be thrown.
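The two preconditions above amount to a state check at the start of parse(). A minimal sketch, assuming a hypothetical builder (ParseStateSketch) that stores its settings as Optional values; the exception messages are invented for the example.

```java
import java.util.Optional;

// Hypothetical builder; only the state check before parsing is sketched.
final class ParseStateSketch {
    private final Optional<String> source;
    private final Optional<String> target;

    ParseStateSketch() {
        this(Optional.empty(), Optional.empty());
    }

    private ParseStateSketch(Optional<String> source, Optional<String> target) {
        this.source = source;
        this.target = target;
    }

    ParseStateSketch source(String source) {
        return new ParseStateSketch(Optional.ofNullable(source), target);
    }

    ParseStateSketch target(String target) {
        return new ParseStateSketch(source, Optional.ofNullable(target));
    }

    // Fails fast if either mandatory setting is missing.
    String parse() {
        String from = source.orElseThrow(() -> new IllegalStateException("No source has been set"));
        String into = target.orElseThrow(() -> new IllegalStateException("No target has been set"));
        return "parsing " + from + " into " + into;
    }

    // True if parse() rejects the builder's current state.
    static boolean rejects(ParseStateSketch parser) {
        try {
            parser.parse();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ParseStateSketch parser = new ParseStateSketch().source("/tmp/graph.ttl");
        System.out.println(rejects(parser));                   // true: no target yet
        System.out.println(rejects(parser.target("graph")));   // false
    }
}
```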

      It is undefined whether this method is thread-safe; however, the RDFParser may be reused (e.g. by setting a different source) as soon as the Future has been returned from this method.

      The RDFParser SHOULD perform the parsing as an asynchronous operation, and return the Future as soon as preliminary checks (such as validity of the source(IRI) and contentType(RDFSyntax) settings) have finished. The future SHOULD NOT report Future.isDone() before parsing is complete. A synchronous implementation MAY block on the parse() call and return a Future for which Future.isDone() is already true.

      The returned Future contains an RDFParser.ParseResult. Implementations may subclass this interface to provide any parser details, e.g. a list of warnings. null is a possible return value if no details are available but parsing succeeded.

      If an exception occurs during parsing (e.g. IOException or org.apache.commons.rdf.simple.experimental.RDFParseException), it should be indicated as the Throwable.getCause() in the ExecutionException thrown on Future.get().
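On the caller's side, this means unwrapping the ExecutionException. The sketch below uses CompletableFuture as a stand-in for the Future returned by parse(); ParseFailureSketch, the 30-second timeout and the result strings are assumptions made for the example.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ParseFailureSketch {
    // Waits for a parse future and reports how it ended.
    static String outcome(Future<?> future) {
        try {
            future.get(30, TimeUnit.SECONDS);
            return "ok";
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();   // the exception raised while parsing
            return (cause instanceof IOException ? "I/O error: " : "parse error: ")
                    + cause.getMessage();
        } catch (TimeoutException e) {
            future.cancel(true);              // give up on a parse that takes too long
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt flag
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // Stand-ins for futures returned by parse().
        CompletableFuture<Void> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IOException("connection reset"));
        System.out.println(outcome(failed));                                   // I/O error: connection reset
        System.out.println(outcome(CompletableFuture.completedFuture(null)));  // ok
    }
}
```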

      Returns:
      A Future that completes when the parsing has finished and returns the RDFParser.ParseResult (possibly null); by then the specified target has been populated.
      Throws:
      IOException - If an error occurred while starting to read the source (e.g. file not found, unsupported IRI protocol). Note that IO errors during parsing would instead be the Throwable.getCause() of the ExecutionException thrown on Future.get().
      IllegalStateException - If the builder is in an invalid state, e.g. a source has not been set.