Class AbstractRDFParser<T extends AbstractRDFParser<T>>
- Direct Known Subclasses: JsonLdParser, RDF4JParser
This abstract class keeps the properties in protected fields like sourceFile using Optional. Some basic checking like checkIsAbsolute(IRI) is performed.
This class and its subclasses are Cloneable, immutable and (therefore) thread-safe: each call to option methods like contentType(String) or source(IRI) will return a cloned, mutated copy.
By default, parsing is done by the abstract method parseSynchronusly(), which is executed in a cloned snapshot; hence multiple parse() calls are thread-safe. The default parse() uses a thread pool in threadGroup, but implementations can override parse() (e.g. because the implementation has its own threading model or uses asynchronous remote execution).
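The clone-and-mutate behaviour described above can be illustrated with a minimal sketch. It does not use the library: CopyOnWriteOptions and DemoOptions are hypothetical names, and only two options are modelled. The self-typed parameter T lets option methods declared in the abstract class return the concrete subclass type.

```java
import java.util.Optional;

/**
 * Hypothetical stand-in for a self-typed, copy-on-write option holder.
 * It is not the library class; it only mirrors the pattern described above.
 */
abstract class CopyOnWriteOptions<T extends CopyOnWriteOptions<T>> implements Cloneable {
    // Unset options are represented as Optional.empty(), never null.
    private Optional<String> contentType = Optional.empty();
    private Optional<String> base = Optional.empty();

    @SuppressWarnings("unchecked")
    protected T asT() {
        return (T) this;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T clone() {
        try {
            // A shallow copy is enough here because all fields hold immutable values.
            return (T) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns a modified copy; the receiver is left untouched. */
    public T contentType(String contentType) {
        CopyOnWriteOptions<T> copy = clone();
        copy.contentType = Optional.ofNullable(contentType);
        return copy.asT();
    }

    /** Returns a modified copy; the receiver is left untouched. */
    public T base(String base) {
        CopyOnWriteOptions<T> copy = clone();
        copy.base = Optional.ofNullable(base);
        return copy.asT();
    }

    public Optional<String> getContentType() {
        return contentType;
    }

    public Optional<String> getBase() {
        return base;
    }
}

/** Hypothetical concrete subclass that binds T to itself. */
final class DemoOptions extends CopyOnWriteOptions<DemoOptions> {
}
```

Because every option call works on a fresh copy, an instance that has been handed to another thread is never changed afterwards, which is what makes sharing it safe.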
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.commons.rdf.experimental.RDFParser
RDFParser.ParseResult
-
Field Summary
Fields (Modifier and Type, Field):
private static RDF internalRdfTermFactory
private Optional<InputStream> sourceInputStream
static final ThreadGroup threadGroup
private static final ExecutorService threadpool
-
Constructor Summary
Constructors
AbstractRDFParser()
-
Method Summary
Modifier and Type, Method, Description
protected T asT()
base(String base) - Specify a base IRI to use for parsing any relative IRI references.
base(IRI) - Specify a base IRI to use for parsing any relative IRI references.
protected void checkBaseRequired() - Check if base is required.
protected void checkContentType() - Subclasses can override this method to check compatibility with the contentType setting.
protected void checkIsAbsolute(IRI iri) - Check if an iri is absolute.
protected void checkSource() - Check that one and only one source is present and valid.
protected void checkTarget() - Subclasses can override this method to check the target is valid.
clone()
contentType(String contentType) - Specify the content type of the RDF syntax to parse.
contentType(RDFSyntax rdfSyntax) - Specify the content type of the RDF syntax to parse.
protected RDF createRDFTermFactory() - Create a new RDF for a parse session.
fileExtension(Path path) - Return the file extension of a Path, if any.
getBase() - Get the set base IRI, if present.
getContentType() - Get the set content-type String, if any.
getContentTypeSyntax() - Get the set content-type RDFSyntax, if any.
getRdfTermFactory() - Get the set RDF, if any.
getSourceFile() - Get the set source Path.
getSourceInputStream() - Get the set source InputStream.
getSourceIri() - Get the set source IRI.
getTarget() - Get the target to consume parsed Quads.
getTargetDataset() - Get the target dataset as set by target(Dataset).
getTargetGraph() - Get the target graph as set by target(Graph).
guessRDFSyntax(Path path) - Guess RDFSyntax from a local file's extension.
parse() - Parse the specified source.
protected abstract void parseSynchronusly()
protected T prepareForParsing() - Prepare a clone of this RDFParser which has been checked and completed.
rdfTermFactory(RDF rdfTermFactory)
protected void resetSource() - Reset all source* fields to Optional.empty().
protected void resetTarget() - Reset all optional target* fields to Optional.empty().
source(InputStream inputStream) - Specify a source InputStream to parse.
source(String iri) - Specify an absolute source IRI to retrieve and parse.
source(Path) - Specify a source file Path to parse.
source(IRI) - Specify an absolute source IRI to retrieve and parse.
target(Consumer) - Specify a consumer for parsed quads.
target(Dataset) - Specify a Dataset to add parsed quads to.
target(Graph) - Specify a Graph to add parsed triples to.
-
Field Details
-
threadGroup
-
threadpool
-
internalRdfTermFactory
-
rdfTermFactory
-
contentTypeSyntax
-
contentType
-
base
-
sourceInputStream
-
sourceFile
-
sourceIri
-
target
-
targetDataset
-
targetGraph
-
-
Constructor Details
-
AbstractRDFParser
public AbstractRDFParser()
-
-
Method Details
-
getRdfTermFactory
Get the set RDF, if any.
- Returns:
- The RDF to use, or Optional.empty() if it has not been set
-
getContentTypeSyntax
Get the set content-type RDFSyntax, if any.
If this is Optional.isPresent(), then getContentType() contains the value of RDFSyntax.mediaType().
- Returns:
- The RDFSyntax of the content type, or Optional.empty() if it has not been set
-
getContentType
Get the set content-type String, if any.
If this is Optional.isPresent() and is recognized by RDFSyntax.byMediaType(String), then the corresponding RDFSyntax is set on getContentTypeSyntax(); otherwise that is Optional.empty().
- Returns:
- The Content-Type IANA media type, e.g. text/turtle, or Optional.empty() if it has not been set
-
getTarget
Get the target to consume parsed Quads.
From the call to parseSynchronusly(), this will be a non-null value (as a target is a required setting).
- Returns:
- The target consumer of Quads, or null if it has not yet been set.
-
getTargetDataset
Get the target dataset as set by target(Dataset).
The return value is Optional.isPresent() if and only if target(Dataset) has been set, meaning that the implementation may choose to append parsed quads to the Dataset directly instead of relying on the generated getTarget() consumer.
If this value is present, then getTargetGraph() MUST be Optional.empty().
- Returns:
- The target Dataset, or Optional.empty() if another kind of target has been set.
-
getTargetGraph
Get the target graph as set by target(Graph).
The return value is Optional.isPresent() if and only if target(Graph) has been set, meaning that the implementation may choose to append parsed triples to the Graph directly instead of relying on the generated getTarget() consumer.
If this value is present, then getTargetDataset() MUST be Optional.empty().
- Returns:
- The target Graph, or Optional.empty() if another kind of target has been set.
-
getBase
Get the set base IRI, if present.
- Returns:
- The base IRI, or Optional.empty() if it has not been set
-
getSourceInputStream
Get the set source InputStream.
If this is Optional.isPresent(), then getSourceFile() and getSourceIri() are Optional.empty().
- Returns:
- The source InputStream, or Optional.empty() if it has not been set
-
getSourceFile
Get the set source Path.
If this is Optional.isPresent(), then getSourceInputStream() and getSourceIri() are Optional.empty().
- Returns:
- The source Path, or Optional.empty() if it has not been set
-
getSourceIri
Get the set source IRI.
If this is Optional.isPresent(), then getSourceInputStream() and getSourceFile() are Optional.empty().
- Returns:
- The source IRI, or Optional.empty() if it has not been set
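The rule that at most one of the source getters is present at a time can be sketched as follows. This is an illustration only: ExclusiveSourceSketch is a hypothetical class, it models just two of the three sources, and the IRI is held as a plain String. Each source setter clears all source fields on the copy before setting its own.

```java
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical holder showing how setting one source clears the others. */
final class ExclusiveSourceSketch implements Cloneable {
    private Optional<Path> sourceFile = Optional.empty();
    private Optional<String> sourceIri = Optional.empty();

    /** Clears every source field so that only the next assignment remains. */
    private void resetSource() {
        sourceFile = Optional.empty();
        sourceIri = Optional.empty();
    }

    private ExclusiveSourceSketch copy() {
        try {
            return (ExclusiveSourceSketch) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns a copy whose only source is the given file. */
    ExclusiveSourceSketch source(Path file) {
        ExclusiveSourceSketch c = copy();
        c.resetSource();
        c.sourceFile = Optional.of(file);
        return c;
    }

    /** Returns a copy whose only source is the given IRI string. */
    ExclusiveSourceSketch source(String iri) {
        ExclusiveSourceSketch c = copy();
        c.resetSource();
        c.sourceIri = Optional.of(iri);
        return c;
    }

    Optional<Path> getSourceFile() {
        return sourceFile;
    }

    Optional<String> getSourceIri() {
        return sourceIri;
    }
}
```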
-
clone
-
asT
-
rdfTermFactory
Description copied from interface: RDFParser
Specify which RDF to use for generating RDFTerms.
This option may be used together with RDFParser.target(Graph) to override the implementation's default factory and graph.
Warning: Using the same RDF for multiple RDFParser.parse() calls may accidentally merge BlankNodes having the same label, as the parser may use the RDF.createBlankNode(String) method from the parsed blank node labels.
- Specified by:
rdfTermFactory in interface RDFParser
- Parameters:
rdfTermFactory - RDF to use for generating RDFTerms.
- Returns:
- An RDFParser that will use the specified rdfTermFactory
-
contentType
Description copied from interface: RDFParser
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The character set of the RDFSyntax is assumed to be StandardCharsets.UTF_8 unless overridden within the document (e.g. <?xml version="1.0" encoding="iso-8859-1"?> in RDFSyntax.RDFXML).
This method will override any contentType set with RDFParser.contentType(String).
- Specified by:
contentType in interface RDFParser
- Parameters:
rdfSyntax - An RDFSyntax to parse the source according to, e.g. RDFSyntax.TURTLE.
- Returns:
- An RDFParser that will use the specified content type.
- Throws:
IllegalArgumentException - If this RDFParser does not support the specified RDFSyntax.
-
contentType
Description copied from interface: RDFParser
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The content type MAY include a charset parameter if the RDF media types permit it; the default charset is StandardCharsets.UTF_8 unless overridden within the document.
This method will override any contentType set with RDFParser.contentType(RDFSyntax).
- Specified by:
contentType in interface RDFParser
- Parameters:
contentType - A content-type string, e.g. application/ld+json or text/turtle;charset="UTF-8" as specified by RFC7231.
- Returns:
- An RDFParser that will use the specified content type.
- Throws:
IllegalArgumentException - If the contentType has an invalid syntax, or this RDFParser does not support the specified contentType.
-
base
Description copied from interface: RDFParser
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the RDFParser.source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with RDFParser.base(String).
-
base
Description copied from interface: RDFParser
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the RDFParser.source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with RDFParser.base(IRI).
- Specified by:
base in interface RDFParser
- Parameters:
base - An absolute IRI to use as a base.
- Returns:
- An RDFParser that will use the specified base IRI.
- Throws:
IllegalArgumentException - If the base is not a valid absolute IRI string
-
source
Description copied from interface: RDFParser
Specify a source InputStream to parse.
The source set will not be read before the call to RDFParser.parse().
The InputStream will not be closed after parsing. The InputStream does not need to support InputStream.markSupported().
The parser might not consume the complete stream (e.g. an RDF/XML parser may not read beyond the closing tag of </rdf:Description>).
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) SHOULD be set before calling RDFParser.parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the RDFParser.contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml encoding="iso-8859-1"> header).
The RDFParser.base(IRI) or RDFParser.base(String) MUST be set before calling RDFParser.parse(), unless the RDF syntax does not permit relative IRIs (e.g. RDFSyntax.NTRIPLES).
This method will override any source set with RDFParser.source(IRI), RDFParser.source(Path) or RDFParser.source(String).
-
source
Description copied from interface: RDFParser
Specify a source file Path to parse.
The source set will not be read before the call to RDFParser.parse().
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) SHOULD be set before calling RDFParser.parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the RDFParser.contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml encoding="iso-8859-1"> header).
The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(), otherwise Path.toUri() will be used as the base IRI.
This method will override any source set with RDFParser.source(IRI), RDFParser.source(InputStream) or RDFParser.source(String).
-
source
Description copied from interface: RDFParser
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to RDFParser.parse().
If this builder does not support the given IRI protocol (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while the RDFParser.parse() should throw an IOException.
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) MAY be set before calling RDFParser.parse(), in which case that type MAY be used for content negotiation (e.g. Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml encoding="iso-8859-1"> header).
The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(), otherwise the source IRI will be used as the base IRI.
This method will override any source set with RDFParser.source(Path), RDFParser.source(InputStream) or RDFParser.source(String).
-
source
Description copied from interface: RDFParser
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to RDFParser.parse().
If this builder does not support the given IRI (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while the RDFParser.parse() should throw an IOException.
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) MAY be set before calling RDFParser.parse(), in which case that type MAY be used for content negotiation (e.g. Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml encoding="iso-8859-1"> header).
The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(), otherwise the source IRI will be used as the base IRI.
This method will override any source set with RDFParser.source(Path), RDFParser.source(InputStream) or RDFParser.source(IRI).
- Specified by:
source in interface RDFParser
- Parameters:
iri - An IRI to retrieve and parse
- Returns:
- An RDFParser that will use the specified source.
- Throws:
IllegalArgumentException - If the source IRI is not a valid absolute IRI string
-
checkIsAbsolute
Check if an iri is absolute.
Used by source(String) and base(String).
- Parameters:
iri - IRI to check
- Throws:
IllegalArgumentException - If the IRI is not absolute
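A check of this kind could be approximated with the JDK alone, as in the sketch below. It is an assumption for illustration, not necessarily how the class implements the check: AbsoluteIriCheckSketch is a hypothetical name, the IRI is passed as a plain String, and java.net.URI is used as the parser, so its notion of "absolute" (a scheme is present) applies.

```java
import java.net.URI;

/** Hypothetical helper approximating an "is this IRI absolute?" check with java.net.URI. */
final class AbsoluteIriCheckSketch {
    private AbsoluteIriCheckSketch() {
    }

    /**
     * Throws IllegalArgumentException if the string has no scheme.
     * URI.create itself also throws IllegalArgumentException on malformed input.
     */
    static void checkIsAbsolute(String iri) {
        if (!URI.create(iri).isAbsolute()) {
            throw new IllegalArgumentException("IRI is not absolute: " + iri);
        }
    }
}
```

With this sketch, http://example.com/data.ttl and urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890 pass, while a relative reference such as data/file.ttl is rejected.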
-
checkSource
Check that one and only one source is present and valid.
Used by parse().
Subclasses might override this method, e.g. to support other source combinations, or to check if the sourceIri is resolvable.
- Throws:
IOException - If a source file can't be read
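The "one and only one source" rule could look roughly like the following sketch. It is written against plain JDK types under stated assumptions: SourceCheckSketch is a hypothetical class, the three sources are passed in as parameters rather than read from fields, and the IRI is a String. The use of IllegalStateException for a missing or ambiguous source is also an assumption of this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper that validates a set of mutually exclusive sources. */
final class SourceCheckSketch {
    private SourceCheckSketch() {
    }

    static void checkSource(Optional<InputStream> inputStream,
                            Optional<Path> file,
                            Optional<String> iri) throws IOException {
        // Count how many of the three alternatives are set.
        int present = 0;
        if (inputStream.isPresent()) {
            present++;
        }
        if (file.isPresent()) {
            present++;
        }
        if (iri.isPresent()) {
            present++;
        }
        if (present == 0) {
            throw new IllegalStateException("No source has been set");
        }
        if (present > 1) {
            throw new IllegalStateException("More than one source has been set");
        }
        // A file source must also be readable before parsing starts.
        if (file.isPresent() && !Files.isReadable(file.get())) {
            throw new IOException("Can't read file: " + file.get());
        }
    }
}
```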
-
checkBaseRequired
Check if base is required.
- Throws:
IllegalStateException - if base is required, but not set.
-
resetSource
protected void resetSource()
Reset all source* fields to Optional.empty().
Subclasses should override this and call super.resetSource() if they need to reset any additional source* fields.
-
resetTarget
protected void resetTarget()
Reset all optional target* fields to Optional.empty().
Note that the consumer set for getTarget() is not reset.
Subclasses should override this and call super.resetTarget() if they need to reset any additional target* fields.
-
parseSynchronusly
Parse sourceInputStream, sourceFile or sourceIri.
One of the source fields MUST be present, as checked by checkSource().
checkBaseRequired() is called to verify if getBase() is required.
- Throws:
IOException - If the source could not be read
RDFParseException - If the source could not be parsed (e.g. a .ttl file was not valid Turtle)
-
prepareForParsing
Prepare a clone of this RDFParser which has been checked and completed.
The returned clone will always have getTarget() and getRdfTermFactory() present.
If the getSourceFile() is present, but the getBase() is not present, the base will be set to the file:/// IRI for the Path's real path (e.g. resolving any symbolic links).
- Returns:
- A completed and checked clone of this RDFParser
- Throws:
IOException - If the source was not accessible (e.g. a file was not found)
IllegalStateException - If the parser was not in a compatible setting (e.g. contentType was an invalid string)
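The fallback from a missing base to the file's real path can be sketched with java.nio.file alone. DefaultBaseSketch and effectiveBase are hypothetical names, and the base is modelled as a String rather than an IRI; the sketch only shows the Path.toRealPath().toUri() step, which resolves symbolic links and fails with an IOException when the file does not exist.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper deriving a base IRI string from a source file when none is set. */
final class DefaultBaseSketch {
    private DefaultBaseSketch() {
    }

    static String effectiveBase(Optional<String> base, Path sourceFile) throws IOException {
        if (base.isPresent()) {
            // An explicitly configured base always wins.
            return base.get();
        }
        // toRealPath() resolves symbolic links and requires the file to exist.
        return sourceFile.toRealPath().toUri().toString();
    }
}
```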
-
checkTarget
protected void checkTarget()
Subclasses can override this method to check the target is valid.
The default implementation throws an IllegalStateException if the target has not been set.
-
checkContentType
Subclasses can override this method to check compatibility with the contentType setting.
- Throws:
IllegalStateException - if the getContentType() or getContentTypeSyntax() is not compatible or invalid
-
guessRDFSyntax
Guess RDFSyntax from a local file's extension.
This method can be used by subclasses if getContentType() is not present and getSourceFile() is set.
- Parameters:
path - Path whose extension should be checked
- Returns:
- The RDFSyntax which has a matching RDFSyntax.fileExtension(), otherwise Optional.empty().
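Guessing a syntax from the extension amounts to a lookup keyed on the last extension of the file name. The sketch below is an assumption-laden illustration: SyntaxGuessSketch is a hypothetical class, it returns media type strings instead of RDFSyntax values, and its table lists only a few well-known extension and media type pairs. The case-insensitive match is a choice of this sketch.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup from a file extension to an RDF media type. */
final class SyntaxGuessSketch {
    private static final Map<String, String> MEDIA_TYPE_BY_EXTENSION = new HashMap<>();

    static {
        MEDIA_TYPE_BY_EXTENSION.put(".ttl", "text/turtle");
        MEDIA_TYPE_BY_EXTENSION.put(".nt", "application/n-triples");
        MEDIA_TYPE_BY_EXTENSION.put(".nq", "application/n-quads");
        MEDIA_TYPE_BY_EXTENSION.put(".jsonld", "application/ld+json");
        MEDIA_TYPE_BY_EXTENSION.put(".rdf", "application/rdf+xml");
        MEDIA_TYPE_BY_EXTENSION.put(".trig", "application/trig");
    }

    private SyntaxGuessSketch() {
    }

    /** Returns the media type matching the path's last extension, if the table knows it. */
    static Optional<String> guessMediaType(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return Optional.empty();
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String extension = name.substring(dot).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(MEDIA_TYPE_BY_EXTENSION.get(extension));
    }
}
```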
-
fileExtension
Return the file extension of a Path, if any.
The returned file extension includes the leading "." character.
Note that this only returns the last extension, e.g. the file extension for archive.tar.gz would be .gz
- Parameters:
path - Path whose filename might contain an extension
- Returns:
- File extension (including the leading "."), or Optional.empty() if the path has no extension
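The "last extension, including the leading dot" behaviour can be reproduced in a few lines. This is a sketch rather than the class's own code; FileExtensionSketch is a hypothetical name.

```java
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper extracting the last extension of a path's file name. */
final class FileExtensionSketch {
    private FileExtensionSketch() {
    }

    static Optional<String> fileExtension(Path path) {
        // getFileName() is null for paths without name elements, such as a root.
        Path fileName = path.getFileName();
        if (fileName == null) {
            return Optional.empty();
        }
        String name = fileName.toString();
        int lastDot = name.lastIndexOf('.');
        if (lastDot < 0) {
            return Optional.empty();
        }
        // Keep the leading dot, matching the description above.
        return Optional.of(name.substring(lastDot));
    }
}
```

Only the file name is inspected, so a dot in a parent directory name (for example in conf.d/README) does not count as an extension.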
-
createRDFTermFactory
Create a new RDF for a parse session.
This is called by parse() to set rdfTermFactory(RDF) if it is Optional.empty().
As parsed blank nodes might be made with RDF.createBlankNode(String), each call to this method SHOULD return a new RDF instance.
- Returns:
- A new RDF
-
parse
Description copied from interface: RDFParser
Parse the specified source.
A source method (e.g. RDFParser.source(InputStream), RDFParser.source(IRI), RDFParser.source(Path), RDFParser.source(String) or an equivalent subclass method) MUST have been called before calling this method, otherwise an IllegalStateException will be thrown.
A target method (e.g. RDFParser.target(Consumer), RDFParser.target(Dataset), RDFParser.target(Graph) or an equivalent subclass method) MUST have been called before calling parse(), otherwise an IllegalStateException will be thrown.
It is undefined if this method is thread-safe, however the RDFParser may be reused (e.g. setting a different source) as soon as the Future has been returned from this method.
The RDFParser SHOULD perform the parsing as an asynchronous operation, and return the Future as soon as preliminary checks (such as validity of the RDFParser.source(IRI) and RDFParser.contentType(RDFSyntax) settings) have finished. The future SHOULD NOT mark Future.isDone() before parsing is complete. A synchronous implementation MAY be blocking on the parse() call and return a Future that is already Future.isDone().
The returned Future contains a RDFParser.ParseResult. Implementations may subclass this interface to provide any parser details, e.g. list of warnings. null is a possible return value if no details are available, but parsing succeeded.
If an exception occurs during parsing (e.g. IOException or org.apache.commons.rdf.simple.experimental.RDFParseException), it should be indicated as the Throwable.getCause() in the ExecutionException thrown on Future.get().
- Specified by:
parse in interface RDFParser
- Returns:
- A Future that completes with the RDFParser.ParseResult when the parsing has finished.
- Throws:
IOException - If an error occurred while starting to read the source (e.g. file not found, unsupported IRI protocol). Note that IO errors during parsing would instead be the Throwable.getCause() of the ExecutionException thrown on Future.get().
IllegalStateException - If the builder is in an invalid state, e.g. a source has not been set.
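The way failures during asynchronous parsing surface through Future.get() can be demonstrated with java.util.concurrent alone. The sketch below does not call the library: AsyncParseSketch, parseAsync and failureOf are hypothetical names, and the "parse" tasks are stand-in lambdas. It shows that an exception thrown on the worker thread arrives wrapped in an ExecutionException, with the original available from getCause().

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demonstration of error reporting through a Future. */
final class AsyncParseSketch {
    private AsyncParseSketch() {
    }

    /** Runs the task on a worker thread; failures surface later via Future.get(). */
    static <R> Future<R> parseAsync(ExecutorService pool, Callable<R> parseTask) {
        return pool.submit(parseTask);
    }

    /** Waits for completion and returns the underlying failure, or null on success. */
    static Throwable failureOf(Future<?> future) throws InterruptedException {
        try {
            future.get();
            return null;
        } catch (ExecutionException e) {
            // The exception thrown inside the task is the cause.
            return e.getCause();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> ok = parseAsync(pool, () -> "parsed");
            Future<String> failed = parseAsync(pool, () -> {
                throw new IOException("connection reset");
            });
            System.out.println("result: " + ok.get());
            System.out.println("failure: " + failureOf(failed));
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller of parse() would typically follow the same shape: catch ExecutionException around Future.get() and inspect getCause() to tell an I/O problem from a syntax error.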
-
target
Description copied from interface: RDFParser
Specify a consumer for parsed quads.
The quads will include triples in all named graphs of the parsed source, including any triples in the default graph. When parsing a source format which does not support datasets, all quads delivered to the consumer will be in the default graph (e.g. their Quad.getGraphName() will be Optional.empty()).
It is undefined if any quads are consumed if RDFParser.parse() throws any exceptions. On the other hand, if RDFParser.parse() does not indicate an exception, the implementation SHOULD have produced all parsed quads to the specified consumer.
Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).
The consumer is not assumed to be thread safe - only one Consumer.accept(Object) is delivered at a time for a given RDFParser.parse() call.
This method is typically called with a functional consumer, for example:
List<Quad> quads = new ArrayList<>();
parserBuilder.target(quads::add).parse();
-
target
Description copied from interface: RDFParser
Specify a Dataset to add parsed quads to.
It is undefined if any quads are added to the specified Dataset if RDFParser.parse() throws any exceptions. (However implementations are free to prevent this using transaction mechanisms or similar). On the other hand, if RDFParser.parse() does not indicate an exception, the implementation SHOULD have inserted all parsed quads to the specified dataset.
Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).
The default implementation of this method calls RDFParser.target(Consumer) with a Consumer that does Dataset.add(Quad).
-
target
Description copied from interface: RDFParser
Specify a Graph to add parsed triples to.
If the source supports datasets (e.g. RDFSyntax.supportsDataset() is true for the RDFParser.contentType(RDFSyntax) that was set), then only quads in the default graph will be added to the Graph as Triples.
It is undefined if any triples are added to the specified Graph if RDFParser.parse() throws any exceptions. (However implementations are free to prevent this using transaction mechanisms or similar). If Future.get() does not indicate an exception, the parser implementation SHOULD have inserted all parsed triples to the specified graph.
Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).
The default implementation of this method calls RDFParser.target(Consumer) with a Consumer that does Graph.add(Triple) with Quad.asTriple() if the quad is in the default graph.
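The default-graph filter described for the Graph target can be sketched with plain JDK types. Everything here is a stand-in: GraphTargetSketch and SimpleQuad are hypothetical classes, terms are plain strings, and the "graph" is a List of triple strings. The point is only the shape of the consumer: quads with a graph name are skipped, and the rest are converted to triples.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical illustration of a quad consumer that feeds a triple-only target. */
final class GraphTargetSketch {
    private GraphTargetSketch() {
    }

    /** Hypothetical minimal quad; an empty graph name means the default graph. */
    static final class SimpleQuad {
        final Optional<String> graphName;
        final String subject;
        final String predicate;
        final String object;

        SimpleQuad(String graphName, String subject, String predicate, String object) {
            this.graphName = Optional.ofNullable(graphName);
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        /** Drops the graph name, keeping subject, predicate and object. */
        String asTriple() {
            return subject + " " + predicate + " " + object + " .";
        }
    }

    /** Builds a consumer that copies only default-graph quads into the triple list. */
    static Consumer<SimpleQuad> graphTarget(List<String> graph) {
        return quad -> {
            if (!quad.graphName.isPresent()) {
                graph.add(quad.asTriple());
            }
        };
    }
}
```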
-