Class PDFParser


public class PDFParser extends COSParser
  • Constructor Details

    • PDFParser

      public PDFParser(RandomAccessRead source) throws IOException
      Constructor. Unrestricted main memory will be used for buffering PDF streams.
      Parameters:
      source - input representing the PDF.
      Throws:
      IOException - If something went wrong.
    • PDFParser

      public PDFParser(RandomAccessRead source, ScratchFile scratchFile) throws IOException
      Constructor. The supplied ScratchFile is used for temporary storage.
      Parameters:
      source - input representing the PDF.
      scratchFile - the ScratchFile to use for temporary storage.
      Throws:
      IOException - If something went wrong.
    • PDFParser

      public PDFParser(RandomAccessRead source, String decryptionPassword) throws IOException
      Constructor. Unrestricted main memory will be used for buffering PDF streams.
      Parameters:
      source - input representing the PDF.
      decryptionPassword - password to be used for decryption.
      Throws:
      IOException - If something went wrong.
    • PDFParser

      public PDFParser(RandomAccessRead source, String decryptionPassword, ScratchFile scratchFile) throws IOException
      Constructor. The supplied ScratchFile is used for temporary storage.
      Parameters:
      source - input representing the PDF.
      decryptionPassword - password to be used for decryption.
      scratchFile - the ScratchFile to use for temporary storage.
      Throws:
      IOException - If something went wrong.
    • PDFParser

      public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias) throws IOException
      Constructor. Unrestricted main memory will be used for buffering PDF streams.
      Parameters:
      source - input representing the PDF.
      decryptionPassword - password to be used for decryption.
      keyStore - key store to be used for decryption when using public key security.
      alias - alias to be used for decryption when using public key security.
      Throws:
      IOException - If something went wrong.
    • PDFParser

      public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias, ScratchFile scratchFile) throws IOException
      Constructor. The supplied ScratchFile is used for temporary storage.
      Parameters:
      source - input representing the PDF.
      decryptionPassword - password to be used for decryption.
      keyStore - key store to be used for decryption when using public key security.
      alias - alias to be used for decryption when using public key security.
      scratchFile - buffer handler for temporary storage; it will be closed on COSDocument.close().
      Throws:
      IOException - If something went wrong.
  • Method Details

    • getPDDocument

      public PDDocument getPDDocument() throws IOException
      Returns the PDDocument that was parsed. When you are done with the document, you must call close() on it to release resources.
      Returns:
      The document at the PD layer.
      Throws:
      IOException - If there is an error getting the document.
    • initialParse

      protected void initialParse() throws IOException
      The initial parse first reads only the trailer, the startxref entry and all xref tables, so that a pointer (offset) to each of the PDF's objects is available. It can handle linearized PDFs, which have an xref at the end pointing to an xref at the beginning of the file. Finally, the root object is parsed.
      Throws:
      InvalidPasswordException - If the password is incorrect.
      IOException - If something went wrong.
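
      To illustrate the kind of pointer the initial parse starts from, here is a minimal, JDK-only sketch that finds the last startxref keyword in a byte array and reads the offset written after it. It is not PDFBox's implementation; the class StartXrefLocator and its methods are hypothetical names introduced for this illustration, and it ignores everything a real parser must handle (comments, damaged files, xref streams, following the xref chain).

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only (hypothetical class): reads the offset after the last "startxref" keyword. */
public class StartXrefLocator {

    private static final byte[] KEYWORD = "startxref".getBytes(StandardCharsets.ISO_8859_1);

    /** Returns the offset following the last "startxref" keyword, or -1 if none is found. */
    public static long findStartXrefOffset(byte[] pdf) {
        // Scan backwards so that the last occurrence in the file wins.
        for (int i = pdf.length - KEYWORD.length; i >= 0; i--) {
            if (!matchesAt(pdf, i)) {
                continue;
            }
            int pos = i + KEYWORD.length;
            // Skip whitespace between the keyword and the number.
            while (pos < pdf.length && isWhitespace(pdf[pos])) {
                pos++;
            }
            long offset = 0;
            boolean sawDigit = false;
            // Accumulate the decimal digits of the offset.
            while (pos < pdf.length && pdf[pos] >= '0' && pdf[pos] <= '9') {
                offset = offset * 10 + (pdf[pos] - '0');
                sawDigit = true;
                pos++;
            }
            return sawDigit ? offset : -1;
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int start) {
        for (int j = 0; j < KEYWORD.length; j++) {
            if (data[start + j] != KEYWORD[j]) {
                return false;
            }
        }
        return true;
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t' || b == '\f' || b == 0;
    }
}
```

      The offset returned this way is where a parser would begin reading the cross-reference data; this sketch stops there and does not follow any further xref sections.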
    • parse

      public void parse() throws IOException
      Parses the stream and populates the COSDocument object. The keystore stream is closed once parsing is done.
      Throws:
      InvalidPasswordException - If the password is incorrect.
      IOException - If there is an error reading from the stream or corrupt data is found.
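
      The constructors and methods documented on this page are typically used together: construct the parser, call parse(), then obtain the document with getPDDocument() and close it when done. The following is a minimal sketch of that sequence, assuming a PDFBox 2.0.x jar on the classpath. The class PdfParserUsageSketch and the helper countPages are hypothetical names, and RandomAccessBufferedFileInputStream (from org.apache.pdfbox.io) is just one possible RandomAccessRead implementation chosen for this example.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
import org.apache.pdfbox.io.RandomAccessRead;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;

/** Illustrative only (hypothetical class): parse a file and read its page count. */
public class PdfParserUsageSketch {

    /** Hypothetical helper: parses the given PDF and returns its number of pages. */
    public static int countPages(File pdfFile) throws IOException {
        RandomAccessRead source = new RandomAccessBufferedFileInputStream(pdfFile);
        PDDocument document;
        try {
            // Single-argument constructor: unrestricted main memory is used for buffering.
            PDFParser parser = new PDFParser(source);
            parser.parse();
            document = parser.getPDDocument();
        } catch (IOException e) {
            // Parsing failed before a document was handed out, so close the source here.
            source.close();
            throw e;
        }
        try {
            return document.getNumberOfPages();
        } finally {
            // The caller of getPDDocument() is responsible for closing the document.
            document.close();
        }
    }
}
```

      For an encrypted file, one of the constructors taking a decryptionPassword would be used instead; in that case parse() may throw InvalidPasswordException, as listed above.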