Class JsoupHtmlParser

  • All Implemented Interfaces:
    IXmlParser

    public class JsoupHtmlParser
    extends java.lang.Object
    implements IXmlParser
    Class that uses JSoup to parse HTML.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static org.slf4j.Logger logger
      The logger.
    • Constructor Summary

      Constructors 
      Constructor Description
      JsoupHtmlParser()  
    • Method Summary

      Modifier and Type Method Description
      IDocumentNode parse(java.io.InputStream htmlStream, java.lang.String charset)
      Parses XML provided as an InputStream and an encoding.
      IDocumentNode parse(java.lang.String html)
      Parses XML provided as a String.
      private INode wrapJsoupHierarchy(Node jsoupNode)
      Wraps JSoup nodes into pdfHTML INode classes.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • logger

        private static org.slf4j.Logger logger
        The logger.
    • Constructor Detail

      • JsoupHtmlParser

        public JsoupHtmlParser()
    • Method Detail

      • parse

        public IDocumentNode parse(java.io.InputStream htmlStream,
                                   java.lang.String charset)
                            throws java.io.IOException
        Description copied from interface: IXmlParser
        Parses XML provided as an InputStream and an encoding.
        Specified by:
        parse in interface IXmlParser
        Parameters:
        htmlStream - the XML stream
        charset - the character set. If null, the parser should detect the encoding from the stream.
        Returns:
        a document node
        Throws:
        java.io.IOException - Signals that an I/O exception has occurred.
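When charset is null, the contract above leaves encoding detection to the parser. The sketch below illustrates one simplified way to do that using only the JDK: an explicit charset always wins; otherwise a byte-order mark (BOM) is sniffed, and UTF-8 is assumed when no BOM is found. The class CharsetResolutionSketch and its methods resolveCharset and decode are hypothetical and are not part of pdfHTML or jsoup. Real detectors, jsoup's included, also inspect <meta charset> declarations, which this sketch deliberately omits.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of pdfHTML: a simplified charset resolver for HTML byte streams.
final class CharsetResolutionSketch {

    /**
     * Picks the charset for decoding the stream. A non-null charset name is used as-is;
     * otherwise a UTF-8 or UTF-16 BOM is detected (and consumed), falling back to UTF-8.
     */
    static Charset resolveCharset(BufferedInputStream in, String charset) throws IOException {
        if (charset != null) {
            return Charset.forName(charset); // caller-supplied charset takes priority
        }
        in.mark(3); // remember the start so the peeked bytes can be re-read
        byte[] head = new byte[3];
        int n = in.readNBytes(head, 0, head.length);
        in.reset();

        Charset detected = StandardCharsets.UTF_8; // assumed default when no BOM is present
        int bomLength = 0;
        if (n >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            bomLength = 3;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            detected = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            detected = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }
        // Consume the BOM so it is not decoded as part of the document text.
        in.readNBytes(new byte[bomLength], 0, bomLength);
        return detected;
    }

    /** Decodes the whole stream to a String using the resolved charset. */
    static String decode(InputStream raw, String charset) {
        try {
            BufferedInputStream in = new BufferedInputStream(raw);
            Charset cs = resolveCharset(in, charset);
            return new String(in.readAllBytes(), cs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing a non-null charset skips detection entirely, which is the safer choice whenever the encoding is known from elsewhere (for example an HTTP Content-Type header).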
      • parse

        public IDocumentNode parse(java.lang.String html)
        Description copied from interface: IXmlParser
        Parses XML provided as a String.
        Specified by:
        parse in interface IXmlParser
        Parameters:
        html - the XML string
        Returns:
        a document node
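The String overload of parse takes no charset because a java.lang.String already holds decoded characters; the encoding question only arises for byte streams. The self-contained sketch below (the class CharsetMismatchDemo and its roundTrip method are hypothetical, not part of pdfHTML) shows what happens when bytes are decoded with a different charset than the one used to encode them, which is the situation the stream overload's charset parameter is there to avoid.

```java
import java.nio.charset.Charset;

// Hypothetical demo, not part of pdfHTML: encodes markup with one charset and decodes with another.
final class CharsetMismatchDemo {

    /** Encodes html with encodeWith, then decodes the resulting bytes with decodeWith. */
    static String roundTrip(String html, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = html.getBytes(encodeWith);
        // new String(byte[], Charset) replaces malformed input with U+FFFD instead of throwing.
        return new String(bytes, decodeWith);
    }
}
```

With matching charsets the markup survives unchanged; encoding "caf\u00e9" as ISO-8859-1 and decoding it as UTF-8 yields a replacement character (U+FFFD) in place of the accented letter.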
      • wrapJsoupHierarchy

        private INode wrapJsoupHierarchy(Node jsoupNode)
        Wraps JSoup nodes into pdfHTML INode classes.
        Parameters:
        jsoupNode - the JSoup node instance
        Returns:
        the INode instance
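wrapJsoupHierarchy is described only as wrapping JSoup nodes into pdfHTML INode classes. The sketch below illustrates the general technique, a recursive adapter that mirrors a third-party tree into a library-neutral interface, without depending on jsoup or iText. All names here (ForeignKind, ForeignNode, TreeNode, WrappedNode, HierarchyWrapper) are hypothetical stand-ins: ForeignNode plays the role of a jsoup Node, and TreeNode is only loosely modelled on INode. Dropping comment nodes is a choice made by this sketch to show how a wrapper can filter node kinds; it is not a statement about what pdfHTML does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical node kinds of the third-party tree being wrapped. */
enum ForeignKind { ELEMENT, TEXT, COMMENT }

/** Hypothetical stand-in for a jsoup Node: a kind, a tag name or text, and children. */
final class ForeignNode {
    final ForeignKind kind;
    final String value;
    final List<ForeignNode> children = new ArrayList<>();

    ForeignNode(ForeignKind kind, String value, ForeignNode... kids) {
        this.kind = kind;
        this.value = value;
        Collections.addAll(children, kids);
    }
}

/** Hypothetical parser-neutral node interface, loosely modelled on INode. */
interface TreeNode {
    List<TreeNode> childNodes();
    TreeNode parentNode();
    String value();
}

/** Adapter exposing a ForeignNode through the TreeNode interface. */
final class WrappedNode implements TreeNode {
    private final ForeignNode source;
    private final TreeNode parent;
    private final List<TreeNode> children = new ArrayList<>();

    WrappedNode(ForeignNode source, TreeNode parent) {
        this.source = source;
        this.parent = parent;
    }

    void addChild(TreeNode child) { children.add(child); }

    @Override public List<TreeNode> childNodes() { return Collections.unmodifiableList(children); }
    @Override public TreeNode parentNode() { return parent; }
    @Override public String value() { return source.value; }
}

final class HierarchyWrapper {
    /**
     * Recursively wraps a node and its descendants, depth first.
     * Returns null for COMMENT nodes, so those subtrees are left out of the result.
     */
    static WrappedNode wrap(ForeignNode node, TreeNode parent) {
        if (node.kind == ForeignKind.COMMENT) {
            return null;
        }
        WrappedNode wrapped = new WrappedNode(node, parent);
        for (ForeignNode child : node.children) {
            WrappedNode wrappedChild = wrap(child, wrapped);
            if (wrappedChild != null) {
                wrapped.addChild(wrappedChild);
            }
        }
        return wrapped;
    }
}
```

Passing the parent into each recursive call lets every wrapper record its parent at construction time, so the wrapped tree can be navigated both downward and upward without a second pass.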