Class AbstractRDFParser<T extends AbstractRDFParser<T>>

  • All Implemented Interfaces:
    java.lang.Cloneable, RDFParser
    Direct Known Subclasses:
    JsonLdParser, RDF4JParser

    public abstract class AbstractRDFParser<T extends AbstractRDFParser<T>>
    extends java.lang.Object
    implements RDFParser, java.lang.Cloneable
    Abstract RDFParser

    This abstract class keeps the properties in protected fields like sourceFile using Optional. Some basic checking like checkIsAbsolute(IRI) is performed.

    This class and its subclasses are Cloneable, immutable and (therefore) thread-safe - each call to option methods like contentType(String) or source(IRI) will return a cloned, mutated copy.

    By default, parsing is done by the abstract method parseSynchronusly() - which is executed in a cloned snapshot - hence multiple parse() calls are thread-safe. The default parse() uses a thread pool in threadGroup - but implementations can override parse() (e.g. because they have their own threading model or use asynchronous remote execution).
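
    The clone-on-change behaviour described above can be illustrated with a minimal sketch. SketchBuilder and TurtleSketchBuilder are hypothetical names used only for illustration and are not part of Commons RDF; the sketch assumes a single Optional field and a self-typed generic parameter like the T of this class.

```java
import java.util.Optional;

// Hypothetical stand-in for the pattern; not the Commons RDF source.
abstract class SketchBuilder<T extends SketchBuilder<T>> implements Cloneable {
    private Optional<String> contentType = Optional.empty();

    @Override
    @SuppressWarnings("unchecked")
    public T clone() {
        try {
            return (T) super.clone();
        } catch (CloneNotSupportedException e) {
            // Cannot happen because this class implements Cloneable
            throw new IllegalStateException(e);
        }
    }

    @SuppressWarnings("unchecked")
    protected T asT() {
        return (T) this;
    }

    public Optional<String> getContentType() {
        return contentType;
    }

    // Option method: the change is applied to a clone, "this" stays untouched.
    public T contentType(String type) {
        SketchBuilder<T> copy = clone();
        copy.contentType = Optional.ofNullable(type);
        return copy.asT();
    }
}

// Concrete subclass binding the self type (assumption for illustration).
final class TurtleSketchBuilder extends SketchBuilder<TurtleSketchBuilder> {
}
```

    In this sketch, calling contentType("text/turtle") on one TurtleSketchBuilder yields a second instance carrying the setting, while the first instance still reports Optional.empty().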

    • Field Detail

      • threadGroup

        public static final java.lang.ThreadGroup threadGroup
      • threadpool

        private static final java.util.concurrent.ExecutorService threadpool
      • internalRdfTermFactory

        private static RDF internalRdfTermFactory
      • rdfTermFactory

        private java.util.Optional<RDF> rdfTermFactory
      • contentTypeSyntax

        private java.util.Optional<RDFSyntax> contentTypeSyntax
      • contentType

        private java.util.Optional<java.lang.String> contentType
      • base

        private java.util.Optional<IRI> base
      • sourceInputStream

        private java.util.Optional<java.io.InputStream> sourceInputStream
      • sourceFile

        private java.util.Optional<java.nio.file.Path> sourceFile
      • sourceIri

        private java.util.Optional<IRI> sourceIri
      • target

        private java.util.function.Consumer<Quad> target
      • targetDataset

        private java.util.Optional<Dataset> targetDataset
      • targetGraph

        private java.util.Optional<Graph> targetGraph
    • Constructor Detail

      • AbstractRDFParser

        public AbstractRDFParser()
    • Method Detail

      • getRdfTermFactory

        public java.util.Optional<RDF> getRdfTermFactory()
        Get the set RDF, if any.
        Returns:
        The RDF to use, or Optional.empty() if it has not been set
      • getContentType

        public final java.util.Optional<java.lang.String> getContentType()
        Get the set content-type String, if any.

        If this is Optional.isPresent() and is recognized by RDFSyntax.byMediaType(String), then the corresponding RDFSyntax is set on getContentTypeSyntax(), otherwise that is Optional.empty().

        Returns:
        The Content-Type IANA media type, e.g. text/turtle, or Optional.empty() if it has not been set
      • getTarget

        public java.util.function.Consumer<Quad> getTarget()
        Get the target to consume parsed Quads.

        From the call to parseSynchronusly(), this will be a non-null value (as a target is a required setting).

        Returns:
        The target consumer of Quads, or null if it has not yet been set.
      • getTargetDataset

        public java.util.Optional<Dataset> getTargetDataset()
        Get the target dataset as set by target(Dataset).

        The return value is Optional.isPresent() if and only if target(Dataset) has been set, meaning that the implementation may choose to append parsed quads to the Dataset directly instead of relying on the generated getTarget() consumer.

        If this value is present, then getTargetGraph() MUST be Optional.empty().

        Returns:
        The target Dataset, or Optional.empty() if another kind of target has been set.
      • getTargetGraph

        public java.util.Optional<Graph> getTargetGraph()
        Get the target graph as set by target(Graph).

        The return value is Optional.isPresent() if and only if target(Graph) has been set, meaning that the implementation may choose to append parsed triples to the Graph directly instead of relying on the generated getTarget() consumer.

        If this value is present, then getTargetDataset() MUST be Optional.empty().

        Returns:
        The target Graph, or Optional.empty() if another kind of target has been set.
      • getBase

        public java.util.Optional<IRI> getBase()
        Get the set base IRI, if present.
        Returns:
        The base IRI, or Optional.empty() if it has not been set
      • getSourceInputStream

        public java.util.Optional<java.io.InputStream> getSourceInputStream()
        Get the set source InputStream.

        If this is Optional.isPresent(), then getSourceFile() and getSourceIri() are Optional.empty().

        Returns:
        The source InputStream, or Optional.empty() if it has not been set
      • getSourceFile

        public java.util.Optional<java.nio.file.Path> getSourceFile()
        Get the set source Path.

        If this is Optional.isPresent(), then getSourceInputStream() and getSourceIri() are Optional.empty().

        Returns:
        The source Path, or Optional.empty() if it has not been set
      • getSourceIri

        public java.util.Optional<IRI> getSourceIri()
        Get the set source IRI.

        If this is Optional.isPresent(), then getSourceInputStream() and getSourceFile() are Optional.empty().

        Returns:
        The source IRI, or Optional.empty() if it has not been set
      • clone

        public T clone()
        Overrides:
        clone in class java.lang.Object
      • asT

        protected T asT()
      • contentType

        public T contentType​(RDFSyntax rdfSyntax)
                      throws java.lang.IllegalArgumentException
        Description copied from interface: RDFParser
        Specify the content type of the RDF syntax to parse.

        This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.

        The character set of the RDFSyntax is assumed to be StandardCharsets.UTF_8 unless overridden within the document (e.g. <?xml version="1.0" encoding="iso-8859-1"?> in RDFSyntax.RDFXML).

        This method will override any contentType set with RDFParser.contentType(String).

        Specified by:
        contentType in interface RDFParser
        Parameters:
        rdfSyntax - An RDFSyntax to parse the source according to, e.g. RDFSyntax.TURTLE.
        Returns:
        An RDFParser that will use the specified content type.
        Throws:
        java.lang.IllegalArgumentException - If this RDFParser does not support the specified RDFSyntax.
        See Also:
        RDFParser.contentType(String)
      • contentType

        public T contentType​(java.lang.String contentType)
                      throws java.lang.IllegalArgumentException
        Description copied from interface: RDFParser
        Specify the content type of the RDF syntax to parse.

        This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.

        The content type MAY include a charset parameter if the RDF media types permit it; the default charset is StandardCharsets.UTF_8 unless overridden within the document.

        This method will override any contentType set with RDFParser.contentType(RDFSyntax).

        Specified by:
        contentType in interface RDFParser
        Parameters:
        contentType - A content-type string, e.g. application/ld+json or text/turtle;charset="UTF-8" as specified by RFC7231.
        Returns:
        An RDFParser that will use the specified content type.
        Throws:
        java.lang.IllegalArgumentException - If the contentType has an invalid syntax, or this RDFParser does not support the specified contentType.
        See Also:
        RDFParser.contentType(RDFSyntax)
      • base

        public T base​(IRI base)
        Description copied from interface: RDFParser
        Specify a base IRI to use for parsing any relative IRI references.

        Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the RDFParser.source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).

        If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.

        This method will override any base IRI set with RDFParser.base(String).

        Specified by:
        base in interface RDFParser
        Parameters:
        base - An absolute IRI to use as a base.
        Returns:
        An RDFParser that will use the specified base IRI.
        See Also:
        RDFParser.base(String)
      • base

        public T base​(java.lang.String base)
               throws java.lang.IllegalArgumentException
        Description copied from interface: RDFParser
        Specify a base IRI to use for parsing any relative IRI references.

        Setting this option will override any protocol-specific base IRI (e.g. Content-Location header) or the RDFParser.source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).

        If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.

        This method will override any base IRI set with RDFParser.base(IRI).

        Specified by:
        base in interface RDFParser
        Parameters:
        base - An absolute IRI to use as a base.
        Returns:
        An RDFParser that will use the specified base IRI.
        Throws:
        java.lang.IllegalArgumentException - If the base is not a valid absolute IRI string
        See Also:
        RDFParser.base(IRI)
      • source

        public T source​(java.lang.String iri)
                 throws java.lang.IllegalArgumentException
        Description copied from interface: RDFParser
        Specify an absolute source IRI to retrieve and parse.

        The source set will not be read before the call to RDFParser.parse().

        If this builder does not support the given IRI (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while the RDFParser.parse() should throw an IOException.

        The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) MAY be set before calling RDFParser.parse(), in which case that type MAY be used for content negotiation (e.g. Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.

        The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml encoding="iso-8859-1"> header).

        The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(), otherwise the source IRI will be used as the base IRI.

        This method will override any source set with RDFParser.source(Path), RDFParser.source(InputStream) or RDFParser.source(IRI).

        Specified by:
        source in interface RDFParser
        Parameters:
        iri - An IRI to retrieve and parse
        Returns:
        An RDFParser that will use the specified source.
        Throws:
        java.lang.IllegalArgumentException - If the IRI is not a valid absolute IRI string
      • checkIsAbsolute

        protected void checkIsAbsolute​(IRI iri)
                                throws java.lang.IllegalArgumentException
        Check if an iri is absolute.

        Used by source(String) and base(String).

        Parameters:
        iri - IRI to check
        Throws:
        java.lang.IllegalArgumentException - If the IRI is not absolute
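
        A rough sketch of such an absoluteness check follows. It is an approximation only: it works on plain strings and uses java.net.URI rather than the Commons RDF IRI type, and the class name AbsoluteCheckSketch is hypothetical.

```java
import java.net.URI;

// Hypothetical helper; approximates the check with java.net.URI instead of IRI.
class AbsoluteCheckSketch {
    static void checkIsAbsolute(String iri) {
        // URI.create itself throws IllegalArgumentException on malformed input;
        // isAbsolute() is true only when the URI has a scheme component.
        if (!URI.create(iri).isAbsolute()) {
            throw new IllegalArgumentException("IRI is not absolute: " + iri);
        }
    }
}
```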
      • checkSource

        protected void checkSource()
                            throws java.io.IOException
        Check that one and only one source is present and valid.

        Used by parse().

        Subclasses might override this method, e.g. to support other source combinations, or to check if the sourceIri is resolvable.

        Throws:
        java.io.IOException - If a source file can't be read
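
        The "one and only one source" rule could be checked along the following lines. This is a sketch under assumptions: SourceCheckSketch is a hypothetical name, and the use of IllegalStateException for a missing or duplicated source is an illustrative choice rather than a statement about the actual implementation.

```java
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper counting how many of the given Optionals are present.
class SourceCheckSketch {
    static void checkExactlyOne(Optional<?>... sources) {
        long present = Stream.of(sources).filter(Optional::isPresent).count();
        if (present == 0) {
            throw new IllegalStateException("No source has been set");
        }
        if (present > 1) {
            throw new IllegalStateException("Multiple sources have been set");
        }
    }
}
```

        A subclass supporting additional source types could pass its extra Optional fields to the same kind of check.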
      • checkBaseRequired

        protected void checkBaseRequired()
                                  throws java.lang.IllegalStateException
        Check if base is required.
        Throws:
        java.lang.IllegalStateException - if base is required, but not set.
      • resetSource

        protected void resetSource()
        Reset all source* fields to Optional.empty()

        Subclasses should override this and call super.resetSource() if they need to reset any additional source* fields.

      • resetTarget

        protected void resetTarget()
        Reset all optional target* fields to Optional.empty().

        Note that the consumer set for getTarget() is not reset.

        Subclasses should override this and call super.resetTarget() if they need to reset any additional target* fields.

      • prepareForParsing

        protected T prepareForParsing()
                               throws java.io.IOException,
                                      java.lang.IllegalStateException
        Prepare a clone of this RDFParser which has been checked and completed.

        The returned clone will always have getTarget() and getRdfTermFactory() present.

        If the getSourceFile() is present, but the getBase() is not present, the base will be set to the file:/// IRI for the Path's real path (e.g. resolving any symbolic links).

        Returns:
        A completed and checked clone of this RDFParser
        Throws:
        java.io.IOException - If the source was not accessible (e.g. a file was not found)
        java.lang.IllegalStateException - If the parser was not in a compatible setting (e.g. contentType was an invalid string)
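
        Deriving a file: base from a source Path, as described above, might look like the following sketch. FileBaseSketch and baseFor are hypothetical names, and the result is returned as a String rather than as a Commons RDF IRI.

```java
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical helper; returns the base as a String for simplicity.
class FileBaseSketch {
    static String baseFor(Path sourceFile) throws IOException {
        // toRealPath() resolves symbolic links and throws an IOException
        // (e.g. NoSuchFileException) if the file does not exist.
        return sourceFile.toRealPath().toUri().toString();
    }
}
```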
      • checkTarget

        protected void checkTarget()
        Subclasses can override this method to check the target is valid.

        The default implementation throws an IllegalStateException if the target has not been set.

      • checkContentType

        protected void checkContentType()
                                 throws java.lang.IllegalStateException
        Subclasses can override this method to check compatibility with the contentType setting.
        Throws:
        java.lang.IllegalStateException - if the getContentType() or getContentTypeSyntax() is not compatible or invalid
      • guessRDFSyntax

        protected static java.util.Optional<RDFSyntax> guessRDFSyntax​(java.nio.file.Path path)
        Guess RDFSyntax from a local file's extension.

        This method can be used by subclasses if getContentType() is not present and getSourceFile() is set.

        Parameters:
        path - Path whose extension should be checked
        Returns:
        The RDFSyntax which has a matching RDFSyntax.fileExtension(), otherwise Optional.empty().
      • fileExtension

        private static java.util.Optional<java.lang.String> fileExtension​(java.nio.file.Path path)
        Return the file extension of a Path - if any.

        The returned file extension includes the leading .

        Note that this only returns the last extension, e.g. the file extension for archive.tar.gz would be .gz

        Parameters:
        path - Path whose filename might contain an extension
        Returns:
        File extension (including the leading .), or Optional.empty() if the path has no extension
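
        The behaviour described above (last extension only, leading dot included) can be sketched as follows. FileExtensionSketch is a hypothetical class written for illustration and is not the private method itself.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical re-creation of the documented behaviour.
class FileExtensionSketch {
    static Optional<String> fileExtension(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            // e.g. a root path has no file name component
            return Optional.empty();
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        // Keep the leading dot, and only the last extension
        return Optional.of(name.substring(dot));
    }
}
```

        With this sketch, a path ending in archive.tar.gz yields .gz, and a path ending in README yields Optional.empty().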
      • createRDFTermFactory

        protected RDF createRDFTermFactory()
        Create a new RDF for a parse session.

        This is called by parse() to set rdfTermFactory(RDF) if it is Optional.empty().

        As parsed blank nodes might be made with RDF.createBlankNode(String), each call to this method SHOULD return a new RDF instance.

        Returns:
        A new RDF
      • parse

        public java.util.concurrent.Future<RDFParser.ParseResult> parse()
                                                                 throws java.io.IOException,
                                                                        java.lang.IllegalStateException
        Description copied from interface: RDFParser
        Parse the specified source.

        A source method (e.g. RDFParser.source(InputStream), RDFParser.source(IRI), RDFParser.source(Path), RDFParser.source(String) or an equivalent subclass method) MUST have been called before calling this method, otherwise an IllegalStateException will be thrown.

        A target method (e.g. RDFParser.target(Consumer), RDFParser.target(Dataset), RDFParser.target(Graph) or an equivalent subclass method) MUST have been called before calling parse(), otherwise an IllegalStateException will be thrown.

        It is undefined if this method is thread-safe, however the RDFParser may be reused (e.g. setting a different source) as soon as the Future has been returned from this method.

        The RDFParser SHOULD perform the parsing as an asynchronous operation, and return the Future as soon as preliminary checks (such as validity of the RDFParser.source(IRI) and RDFParser.contentType(RDFSyntax) settings) have finished. The future SHOULD NOT mark Future.isDone() before parsing is complete. A synchronous implementation MAY block on the parse() call and return a Future that is already Future.isDone().

        The returned Future contains a RDFParser.ParseResult. Implementations may subclass this interface to provide any parser details, e.g. list of warnings. null is a possible return value if no details are available, but parsing succeeded.

        If an exception occurs during parsing, (e.g. IOException or org.apache.commons.rdf.simple.experimental.RDFParseException), it should be indicated as the Throwable.getCause() in the ExecutionException thrown on Future.get().

        Specified by:
        parse in interface RDFParser
        Returns:
        A Future that will return the populated Graph when the parsing has finished.
        Throws:
        java.io.IOException - If an error occurred while starting to read the source (e.g. file not found, unsupported IRI protocol). Note that IO errors during parsing would instead be the Throwable.getCause() of the ExecutionException thrown on Future.get().
        java.lang.IllegalStateException - If the builder is in an invalid state, e.g. a source has not been set.
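
        How a caller might unwrap a parse-time failure from the returned Future is sketched below. The sketch does not use a real RDFParser: failingParse is a hypothetical stand-in that simulates an asynchronous parse failing with an IOException, and Future<String> stands in for Future<RDFParser.ParseResult>.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class ParseFutureSketch {
    // Hypothetical stand-in for an asynchronous parse that fails while reading.
    static Future<String> failingParse(ExecutorService pool) {
        Callable<String> task = () -> {
            throw new IOException("simulated read failure");
        };
        return pool.submit(task);
    }

    // Waits for the result; a failure during the task surfaces as the
    // cause of the ExecutionException thrown by Future.get().
    static String describeOutcome(Future<String> future) throws InterruptedException {
        try {
            return "ok: " + future.get();
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getClass().getSimpleName();
        }
    }
}
```

        Errors detected before the Future is returned (such as a missing source) would instead be thrown directly from parse(), so a caller typically needs both a try/catch around parse() and handling of ExecutionException around Future.get().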
      • target

        public T target​(java.util.function.Consumer<Quad> consumer)
        Description copied from interface: RDFParser
        Specify a consumer for parsed quads.

        The quads will include triples in all named graphs of the parsed source, including any triples in the default graph. When parsing a source format which does not support datasets, all quads delivered to the consumer will be in the default graph (i.e. their Quad.getGraphName() will be Optional.empty()).

        It is undefined if any quads are consumed if RDFParser.parse() throws any exceptions. On the other hand, if RDFParser.parse() does not indicate an exception, the implementation SHOULD have produced all parsed quads to the specified consumer.

        Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).

        The consumer is not assumed to be thread safe - only one Consumer.accept(Object) is delivered at a time for a given RDFParser.parse() call.

        This method is typically called with a functional consumer, for example:

         
         List<Quad> quads = new ArrayList<>();
         parserBuilder.target(quads::add).parse();
         
         
        Specified by:
        target in interface RDFParser
        Parameters:
        consumer - A Consumer of Quads
        Returns:
        An RDFParser that will call the consumer for each parsed quad.