Interface RDFParser
All Known Implementing Classes:
AbstractRDFParser, JsonLdParser, RDF4JParser
Experimental
This interface (and its implementations) should be considered at risk; they might change or be removed in the next minor update of Commons RDF. It may move to the org.apache.commons.rdf.api package when it has stabilized.
Description
This interface follows the Builder pattern, allowing parser settings such as contentType(RDFSyntax) and base(IRI) to be set. A caller MUST call one of the source methods (e.g. source(IRI), source(Path), source(InputStream)), and MUST call one of the target methods (e.g. target(Consumer), target(Dataset), target(Graph)) before calling parse() on the returned RDFParser; however, these methods can be called in any order.
The call to parse() returns a Future, allowing asynchronous parse operations. Callers are recommended to check Future.get() to ensure parsing completed successfully, or to catch exceptions thrown during parsing.
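A minimal sketch of that check, using only java.util.concurrent. fakeParse is a hypothetical stand-in for parse() (a real RDFParser returns Future<? extends RDFParser.ParseResult>, not Future<String>), and the 30-second timeout is an arbitrary choice:

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ParseFutureCheck {

    // Hypothetical stand-in for parse(): returns a Future that either
    // succeeds or fails with an IOException, as a parser might.
    static Future<String> fakeParse(boolean fail) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (fail) {
            future.completeExceptionally(new IOException("connection reset"));
        } else {
            future.complete("ok");
        }
        return future;
    }

    // Waits with a timeout and reports the parser's own exception,
    // which arrives as the cause of the ExecutionException.
    static String awaitParse(Future<String> future) {
        try {
            return future.get(30, TimeUnit.SECONDS);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            return "failed: " + cause.getClass().getSimpleName() + ": " + cause.getMessage();
        } catch (TimeoutException e) {
            future.cancel(true);
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }
}
```

Unwrapping getCause() matters because the ExecutionException itself only says that the asynchronous task failed; the IOException or parse error inside it is what a caller usually wants to log or rethrow.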
Setting a method that has already been set will override any existing value in the returned builder, regardless of the parameter type (e.g. source(IRI) will override a previous source(Path)). Settings can be unset by passing null; note that this may require casting, e.g. contentType( (RDFSyntax) null ) to undo a previous call to contentType(RDFSyntax).
It is undefined whether an RDFParser is mutable or thread-safe, so callers should always use the modified RDFParser returned from the builder methods. The builder may return itself after modification, or a cloned builder with the modified settings applied. Implementations are however encouraged to be immutable and thread-safe, and to document this. As an example starting point, see org.apache.commons.rdf.simple.AbstractRDFParser.
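The override, unset, and immutability rules above can be illustrated with a small copy-on-write builder. This is a hypothetical sketch, not the AbstractRDFParser implementation: SketchParserBuilder and its Syntax enum are invented stand-ins, and only two settings are modelled. Note that contentType(null) would not compile here, because both overloads accept null; the cast picks one.

```java
import java.nio.file.Path;
import java.util.Optional;

class SketchParserBuilder {
    // Hypothetical stand-in for RDFSyntax.
    enum Syntax { TURTLE, NTRIPLES }

    // One slot per setting: a later call replaces an earlier one,
    // whatever the parameter type of either call.
    private final Object source;      // a Path or an IRI string here
    private final Object contentType; // a Syntax or a media-type String

    SketchParserBuilder() {
        this(null, null);
    }

    private SketchParserBuilder(Object source, Object contentType) {
        this.source = source;
        this.contentType = contentType;
    }

    // Every setter returns a modified copy; the receiver never changes,
    // so instances can be shared between threads.
    SketchParserBuilder source(Path file) {
        return new SketchParserBuilder(file, contentType);
    }

    SketchParserBuilder source(String iri) {
        return new SketchParserBuilder(iri, contentType);
    }

    SketchParserBuilder contentType(Syntax syntax) {
        return new SketchParserBuilder(source, syntax);
    }

    SketchParserBuilder contentType(String mediaType) {
        return new SketchParserBuilder(source, mediaType);
    }

    Optional<Object> getSource() {
        return Optional.ofNullable(source);
    }

    Optional<Object> getContentType() {
        return Optional.ofNullable(contentType);
    }
}
```

For example, b.contentType((SketchParserBuilder.Syntax) null) returns a copy with no content type, while b itself keeps its previous setting.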
Example usage:
Graph g1 = rDFTermFactory.createGraph();
new ExampleRDFParserBuilder()
    .source(Paths.get("/tmp/graph.ttl"))
    .contentType(RDFSyntax.TURTLE)
    .target(g1)
    .parse()
    .get(30, TimeUnit.SECONDS);
Nested Class Summary
static interface RDFParser.ParseResult: The result of parse() indicating parsing completed.
Method Summary
RDFParser base(IRI base): Specify a base IRI to use for parsing any relative IRI references.
RDFParser base(String base): Specify a base IRI to use for parsing any relative IRI references.
RDFParser contentType(String contentType): Specify the content type of the RDF syntax to parse.
RDFParser contentType(RDFSyntax rdfSyntax): Specify the content type of the RDF syntax to parse.
Future<? extends RDFParser.ParseResult> parse(): Parse the specified source.
RDFParser rdfTermFactory(RDF rdfTermFactory): Specify which RDF to use for generating RDFTerms.
RDFParser source(InputStream inputStream): Specify a source InputStream to parse.
RDFParser source(String iri): Specify an absolute source IRI to retrieve and parse.
RDFParser source(Path file): Specify a source file Path to parse.
RDFParser source(IRI iri): Specify an absolute source IRI to retrieve and parse.
RDFParser target(Consumer): Specify a consumer for parsed quads.
default RDFParser target(Dataset): Specify a Dataset to add parsed quads to.
default RDFParser target(Graph): Specify a Graph to add parsed triples to.

Method Details
rdfTermFactory(RDF)
Specify which RDF to use for generating RDFTerms.
This option may be used together with target(Graph) to override the implementation's default factory and graph.
Warning: Using the same RDF for multiple parse() calls may accidentally merge BlankNodes having the same label, as the parser may call the RDF.createBlankNode(String) method with the parsed blank node labels.
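The warning above can be reproduced with a toy factory. This sketch assumes a factory that maps each label to exactly one node per factory instance, which is the behaviour the warning describes; BlankNode and LabelFactory are hypothetical types, not the Commons RDF classes.

```java
import java.util.HashMap;
import java.util.Map;

class BlankNodeMergeDemo {
    // Hypothetical blank node type: two nodes are the same only if they
    // are the same object.
    static final class BlankNode {
        final String label;

        BlankNode(String label) {
            this.label = label;
        }
    }

    // Hypothetical factory that maps each label to one node per factory
    // instance, in the spirit of a label-based createBlankNode(String).
    static final class LabelFactory {
        private final Map<String, BlankNode> byLabel = new HashMap<>();

        BlankNode createBlankNode(String label) {
            return byLabel.computeIfAbsent(label, BlankNode::new);
        }
    }

    // Simulates two parses that each meet the label "b0" and reports
    // whether they ended up with the same node.
    static boolean merged(boolean shareFactory) {
        LabelFactory first = new LabelFactory();
        LabelFactory second = shareFactory ? first : new LabelFactory();
        return first.createBlankNode("b0") == second.createBlankNode("b0");
    }
}
```

With a shared factory, merged(true) is true: "_:b0" from two unrelated documents becomes one node. Using a fresh factory per parse keeps them distinct.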
contentType(RDFSyntax)
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The character set of the RDFSyntax is assumed to be StandardCharsets.UTF_8 unless overridden within the document (e.g. <?xml version="1.0" encoding="iso-8859-1"?> in RDFSyntax.RDFXML).
This method will override any contentType set with contentType(String).
Parameters:
rdfSyntax - An RDFSyntax to parse the source according to, e.g. RDFSyntax.TURTLE.
Returns:
An RDFParser that will use the specified content type.
Throws:
IllegalArgumentException - If this RDFParser does not support the specified RDFSyntax.
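A sketch of how the two contentType overloads might share one setting and reject an unsupported syntax eagerly. ContentTypeSetting, its Syntax enum, and the SUPPORTED set are hypothetical; the builder is mutable here purely to keep it short, which the interface permits.

```java
import java.util.EnumSet;
import java.util.Set;

class ContentTypeSetting {
    // Hypothetical subset of syntaxes; the real RDFSyntax has more members.
    enum Syntax { TURTLE, NTRIPLES, RDFXML, JSONLD }

    // Syntaxes this hypothetical parser can handle.
    private static final Set<Syntax> SUPPORTED = EnumSet.of(Syntax.TURTLE, Syntax.NTRIPLES);

    // Both overloads write the same field, so the last call wins.
    private Object contentType;

    ContentTypeSetting contentType(Syntax syntax) {
        if (syntax != null && !SUPPORTED.contains(syntax)) {
            // Rejected immediately; the previous setting is left in place.
            throw new IllegalArgumentException("Unsupported syntax: " + syntax);
        }
        contentType = syntax;
        return this;
    }

    ContentTypeSetting contentType(String mediaType) {
        contentType = mediaType;
        return this;
    }

    Object current() {
        return contentType;
    }
}
```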
contentType(String)
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The content type MAY include a charset parameter if the RDF media types permit it; the default charset is StandardCharsets.UTF_8 unless overridden within the document.
This method will override any contentType set with contentType(RDFSyntax).
Parameters:
contentType - A content-type string, e.g. application/ld+json or text/turtle;charset="UTF-8" as specified by RFC 7231.
Returns:
An RDFParser that will use the specified content type.
Throws:
IllegalArgumentException - If the contentType has an invalid syntax, or this RDFParser does not support the specified contentType.
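A sketch of splitting a content-type string into its media type and charset, defaulting to UTF-8 as described above. ContentTypeCharset is hypothetical, and the simple split on ';' does not handle quoted parameter values that themselves contain semicolons. Charset.forName throws IllegalCharsetNameException or UnsupportedCharsetException, both subclasses of IllegalArgumentException, which fits the documented exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {
    // Returns the media type without parameters, lower-cased.
    static String mediaType(String contentType) {
        int semi = contentType.indexOf(';');
        String type = semi < 0 ? contentType : contentType.substring(0, semi);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the charset parameter if present, otherwise UTF-8.
    static Charset charset(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional quotes, as in charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return Charset.forName(value);
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```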
target(Graph)
Specify a Graph to add parsed triples to.
If the source supports datasets (e.g. the contentType(RDFSyntax) set has RDFSyntax.supportsDataset() true), then only quads in the default graph will be added to the Graph as Triples.
It is undefined whether any triples are added to the specified Graph if parse() throws any exceptions. (However, implementations are free to prevent this using transaction mechanisms or similar.) If Future.get() does not indicate an exception, the parser implementation SHOULD have inserted all parsed triples into the specified graph.
Calling this method will override any earlier targets set with target(Graph), target(Consumer) or target(Dataset).
The default implementation of this method calls target(Consumer) with a Consumer that does Graph.add(Triple) with Quad.asTriple() if the quad is in the default graph.
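The default-graph filtering described above can be sketched as a consumer adapter. The Triple and Quad classes below are hypothetical minimal stand-ins (an empty graphName means the default graph), and a List<Triple> plays the role of the Graph.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

class GraphTargetAdapter {
    // Hypothetical minimal triple type.
    static final class Triple {
        final String s, p, o;

        Triple(String s, String p, String o) {
            this.s = s;
            this.p = p;
            this.o = o;
        }

        @Override
        public String toString() {
            return s + " " + p + " " + o;
        }
    }

    // Hypothetical minimal quad type; an empty graphName is the default graph.
    static final class Quad {
        final Optional<String> graphName;
        final Triple triple;

        Quad(Optional<String> graphName, Triple triple) {
            this.graphName = graphName;
            this.triple = triple;
        }

        Triple asTriple() {
            return triple;
        }
    }

    // Adapts a "graph" (here a List<Triple>) to a quad consumer that keeps
    // only default-graph quads and drops quads from named graphs.
    static Consumer<Quad> graphConsumer(List<Triple> graph) {
        return q -> {
            if (!q.graphName.isPresent()) {
                graph.add(q.asTriple());
            }
        };
    }
}
```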
target(Dataset)
Specify a Dataset to add parsed quads to.
It is undefined whether any quads are added to the specified Dataset if parse() throws any exceptions. (However, implementations are free to prevent this using transaction mechanisms or similar.) On the other hand, if parse() does not indicate an exception, the implementation SHOULD have inserted all parsed quads into the specified dataset.
Calling this method will override any earlier targets set with target(Graph), target(Consumer) or target(Dataset).
The default implementation of this method calls target(Consumer) with a Consumer that does Dataset.add(Quad).
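One "similar mechanism" an implementation could use to avoid leaving a partially filled Dataset is to buffer quads and only flush them after a successful parse. This is a hypothetical sketch, not how any listed implementation works; the trade-off is that all quads are held in memory until commit().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BufferedDatasetTarget<Q> {
    private final List<Q> pending = new ArrayList<>();
    private final Consumer<Q> dataset; // e.g. dataset::add in a real implementation

    BufferedDatasetTarget(Consumer<Q> dataset) {
        this.dataset = dataset;
    }

    // Consumer handed to the parser: quads are only buffered.
    Consumer<Q> buffer() {
        return pending::add;
    }

    // Called once parsing has succeeded: flush everything to the dataset.
    void commit() {
        pending.forEach(dataset);
        pending.clear();
    }

    // Called if parsing failed: discard the partial results.
    void rollback() {
        pending.clear();
    }
}
```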
target(Consumer)
Specify a consumer for parsed quads.
The quads will include triples in all named graphs of the parsed source, including any triples in the default graph. When parsing a source format which does not support datasets, all quads delivered to the consumer will be in the default graph (e.g. their Quad.getGraphName() will be Optional.empty()), while for a source
It is undefined whether any quads are consumed if parse() throws any exceptions. On the other hand, if parse() does not indicate an exception, the implementation SHOULD have delivered all parsed quads to the specified consumer.
Calling this method will override any earlier targets set with target(Graph), target(Consumer) or target(Dataset).
The consumer is not assumed to be thread-safe: only one Consumer.accept(Object) call is delivered at a time for a given parse() call.
This method is typically called with a functional consumer, for example:
List<Quad> quads = new ArrayList<Quad>();
parserBuilder.target(quads::add).parse();
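Because delivery is serial, a plain ArrayList can serve as the consumer even when parsing runs on another thread. In this hypothetical sketch, parseInto stands in for an asynchronous parser that calls accept from a single background thread; Future.get() then guarantees the caller sees everything that thread added.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class SerialDelivery {
    // Hypothetical "parser": emits each item from one background thread,
    // so consumer.accept(...) is never called concurrently.
    static Future<?> parseInto(List<String> items, Consumer<String> consumer, ExecutorService executor) {
        return executor.submit(() -> {
            for (String item : items) {
                consumer.accept(item);
            }
        });
    }

    static List<String> collect(List<String> items) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // No synchronization needed: only one accept() runs at a time.
            List<String> quads = new ArrayList<>();
            parseInto(items, quads::add, executor).get();
            return quads;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```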
base(IRI)
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with base(String).
Parameters:
base - An absolute IRI to use as a base.
Returns:
An RDFParser that will use the specified base IRI.
base(String)
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with base(IRI).
Parameters:
base - An absolute IRI to use as a base.
Returns:
An RDFParser that will use the specified base IRI.
Throws:
IllegalArgumentException - If the base is not a valid absolute IRI string.
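A sketch of validating a base(String) argument and resolving relative references against it. It uses java.net.URI as an approximation: URI implements RFC 2396 rather than the IRI rules of RFC 3987/3986, so edge cases (non-ASCII characters, some dot-segment cases) may differ from what an RDF parser does. BaseIriResolution is a hypothetical helper.

```java
import java.net.URI;
import java.net.URISyntaxException;

class BaseIriResolution {
    // Validates a base(String) argument: it must parse and be absolute.
    static URI checkBase(String base) {
        try {
            URI uri = new URI(base);
            if (!uri.isAbsolute()) {
                throw new IllegalArgumentException("Base IRI is not absolute: " + base);
            }
            return uri;
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid base IRI: " + base, e);
        }
    }

    // Resolves a relative reference against the base, as a parser would
    // for something like <other.ttl#x> in a Turtle document.
    static String resolve(String base, String reference) {
        return checkBase(base).resolve(reference).toString();
    }
}
```

For example, resolving "../vocab#p" against "http://example.com/data/graph.ttl" yields "http://example.com/vocab#p".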
source(InputStream)
Specify a source InputStream to parse.
The source set will not be read before the call to parse().
The InputStream will not be closed after parsing. The InputStream does not need to support InputStream.markSupported().
The parser might not consume the complete stream (e.g. an RDF/XML parser may not read beyond the closing tag of </rdf:Description>).
The contentType(RDFSyntax) or contentType(String) SHOULD be set before calling parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The base(IRI) or base(String) MUST be set before calling parse(), unless the RDF syntax does not permit relative IRIs (e.g. RDFSyntax.NTRIPLES).
This method will override any source set with source(IRI), source(Path) or source(String).
Parameters:
inputStream - An InputStream to consume.
Returns:
An RDFParser that will use the specified source.
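Since the parser neither closes the stream nor necessarily reads it to the end, the caller owns the stream's lifecycle. In this hypothetical sketch, readFirstLine stands in for a parser that stops early; try-with-resources on the caller side does the closing, and the stream remains usable afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class CallerOwnedStream {
    // Hypothetical parser step: reads only the first line, decodes it as
    // UTF-8, and does NOT close the stream.
    static String readFirstLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            line.write(b);
        }
        return new String(line.toByteArray(), StandardCharsets.UTF_8);
    }

    static String demo() {
        byte[] doc = "<a> <b> <c> .\n<d> <e> <f> .\n".getBytes(StandardCharsets.UTF_8);
        // The caller opens the stream and try-with-resources closes it.
        try (InputStream in = new ByteArrayInputStream(doc)) {
            String first = readFirstLine(in);
            // Still open, positioned just after the first line.
            return first + " | remaining=" + in.available();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```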
source(Path)
Specify a source file Path to parse.
The source set will not be read before the call to parse().
The contentType(RDFSyntax) or contentType(String) SHOULD be set before calling parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The base(IRI) or base(String) MAY be set before calling parse(); otherwise Path.toUri() will be used as the base IRI.
This method will override any source set with source(IRI), source(InputStream) or source(String).
Parameters:
file - A Path for a file to parse.
Returns:
An RDFParser that will use the specified source.
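The base-IRI fallback for file sources can be expressed in one line. PathBase.effectiveBase is a hypothetical helper; Path.toUri() returns an absolute file: URI, resolving a relative path against the current working directory first.

```java
import java.net.URI;
import java.nio.file.Path;
import java.util.Optional;

class PathBase {
    // An explicitly set base wins; otherwise fall back to the file's URI.
    static URI effectiveBase(Path file, Optional<URI> explicitBase) {
        return explicitBase.orElseGet(file::toUri);
    }
}
```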
source(IRI)
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to parse().
If this builder does not support the given IRI protocol (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while parse() should throw an IOException.
The contentType(RDFSyntax) or contentType(String) MAY be set before calling parse(), in which case that type MAY be used for content negotiation (e.g. the Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The base(IRI) or base(String) MAY be set before calling parse(); otherwise the source IRI will be used as the base IRI.
This method will override any source set with source(Path), source(InputStream) or source(String).
Parameters:
iri - An IRI to retrieve and parse.
Returns:
An RDFParser that will use the specified source.
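A sketch of the content negotiation mentioned above, using java.net.http from Java 11 to build (not send) the retrieval request. requestFor is a hypothetical helper and assumes the content type has already been turned into a media-type string; a fuller client might also offer fallback types with q-values.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;

class ContentNegotiation {
    // Builds a GET request for the source IRI. If a content type was set
    // on the parser, it is offered in the Accept header.
    static HttpRequest requestFor(String sourceIri, Optional<String> contentType) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(sourceIri)).GET();
        contentType.ifPresent(type -> builder.header("Accept", type));
        return builder.build();
    }
}
```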
source(String)
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to parse().
If this builder does not support the given IRI (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while parse() should throw an IOException.
The contentType(RDFSyntax) or contentType(String) MAY be set before calling parse(), in which case that type MAY be used for content negotiation (e.g. the Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The base(IRI) or base(String) MAY be set before calling parse(); otherwise the source IRI will be used as the base IRI.
This method will override any source set with source(Path), source(InputStream) or source(IRI).
Parameters:
iri - An IRI to retrieve and parse.
Returns:
An RDFParser that will use the specified source.
Throws:
IllegalArgumentException - If the iri is not a valid absolute IRI string.
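The split between eager and deferred failures can be sketched as follows: source(String) only checks that the IRI is syntactically valid and absolute, while an unsupported protocol is reported by parse() as an IOException. DeferredSourceCheck and its SUPPORTED_SCHEMES set are hypothetical, and actual retrieval is omitted.

```java
import java.io.IOException;
import java.net.URI;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

class DeferredSourceCheck {
    // Hypothetical set of protocols this parser can retrieve.
    private static final Set<String> SUPPORTED_SCHEMES = Set.of("http", "https", "file");

    private final URI source;

    // Plays the role of source(String): checks syntax and absoluteness only.
    DeferredSourceCheck(String iri) {
        URI uri = URI.create(iri); // IllegalArgumentException on bad syntax
        if (!uri.isAbsolute()) {
            throw new IllegalArgumentException("Source IRI is not absolute: " + iri);
        }
        this.source = uri;
    }

    // Plays the role of parse(): an unsupported protocol surfaces here.
    Future<Void> parse() throws IOException {
        if (!SUPPORTED_SCHEMES.contains(source.getScheme())) {
            throw new IOException("Unsupported protocol: " + source.getScheme());
        }
        return CompletableFuture.completedFuture(null); // retrieval omitted
    }
}
```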
parse()
Parse the specified source.
A source method (e.g. source(InputStream), source(IRI), source(Path), source(String) or an equivalent subclass method) MUST have been called before calling this method, otherwise an IllegalStateException will be thrown.
A target method (e.g. target(Consumer), target(Dataset), target(Graph) or an equivalent subclass method) MUST have been called before calling parse(), otherwise an IllegalStateException will be thrown.
It is undefined whether this method is thread-safe; however, the RDFParser may be reused (e.g. setting a different source) as soon as the Future has been returned from this method.
The RDFParser SHOULD perform the parsing as an asynchronous operation, and return the Future as soon as preliminary checks (such as validity of the source(IRI) and contentType(RDFSyntax) settings) have finished. The Future SHOULD NOT report Future.isDone() before parsing is complete. A synchronous implementation MAY block in the parse() call and return a Future that is already Future.isDone().
The returned Future contains an RDFParser.ParseResult. Implementations may subclass this interface to provide any parser details, e.g. a list of warnings. null is a possible return value if no details are available, but parsing succeeded.
If an exception occurs during parsing (e.g. IOException or org.apache.commons.rdf.simple.experimental.RDFParseException), it should be indicated as the Throwable.getCause() in the ExecutionException thrown on Future.get().
Returns:
A Future that will return an RDFParser.ParseResult (possibly null) when the parsing has finished.
Throws:
IOException - If an error occurred while starting to read the source (e.g. file not found, unsupported IRI protocol). Note that IO errors during parsing would instead be the Throwable.getCause() of the ExecutionException thrown on Future.get().
IllegalStateException - If the builder is in an invalid state, e.g. a source has not been set.
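A sketch of the synchronous variant permitted above: preconditions are checked before any Future exists, the work is done inside parse(), and an already-completed Future with a null ParseResult is returned. SynchronousParser, its nested ParseResult, and the line-splitting "parse" are hypothetical simplifications.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class SynchronousParser {
    // Hypothetical stand-in for RDFParser.ParseResult.
    interface ParseResult { }

    private final String source;           // stand-in for any source setting
    private final Consumer<String> target; // stand-in for any target setting

    SynchronousParser(String source, Consumer<String> target) {
        this.source = source;
        this.target = target;
    }

    Future<? extends ParseResult> parse() {
        // Preliminary checks happen before any Future is returned.
        if (source == null) {
            throw new IllegalStateException("source not set");
        }
        if (target == null) {
            throw new IllegalStateException("target not set");
        }
        // A blocking implementation: all the work happens now...
        for (String line : source.split("\n")) {
            target.accept(line);
        }
        // ...then an already-done Future is returned; null means no details.
        return CompletableFuture.<ParseResult>completedFuture(null);
    }
}
```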