Class PDFFile

java.lang.Object
com.sun.pdfview.PDFFile

public class PDFFile extends Object
An encapsulation of a .pdf file. The methods of this class can parse the contents of a PDF file, but those methods are hidden. Instead, the public methods of this class allow access to the pages in the PDF file. Typically, you create a new PDFFile, ask it for the number of pages, and then request one or more PDFPages.
  • Field Details

    • NUL_CHAR

      public static final int NUL_CHAR
    • FF_CHAR

      public static final int FF_CHAR
    • versionString

      private String versionString
    • majorVersion

      private int majorVersion
    • minorVersion

      private int minorVersion
    • VERSION_COMMENT

      private static final String VERSION_COMMENT
      the comment text that begins the file and is used to determine its version
    • buf

      A ByteBuffer containing the file data
    • objIdx

      PDFXref[] objIdx
      the cross reference table mapping object numbers to locations in the PDF file
    • root

      PDFObject root
      the root PDFObject, as specified in the PDF file
    • encrypt

      PDFObject encrypt
      the Encrypt PDFObject, from the trailer
    • info

      PDFObject info
      The Info PDFObject, from the trailer, for simple metadata
    • cache

      Cache cache
      a mapping of page numbers to parsed PDF commands
    • printable

      private boolean printable
      whether the file is printable or not (trailer -> Encrypt -> P & 0x4)
    • saveable

      private boolean saveable
      whether the file is saveable or not (trailer -> Encrypt -> P & 0x10)
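The two permission fields above are described as bit tests on the P entry of the Encrypt dictionary. A minimal sketch of such a test follows. The class and method names are hypothetical, and treating a set bit as "permitted" is an assumption based on the masks shown above, not a copy of this class's code.

```java
// Hypothetical helper; only the masks 0x4 and 0x10 come from the field notes above.
final class PermissionBitsSketch {
    static final int PRINT_MASK = 0x4;   // mask quoted for "printable"
    static final int SAVE_MASK = 0x10;   // mask quoted for "saveable"

    // Assumption: a set bit means the owner granted the permission.
    static boolean isPrintable(int p) {
        return (p & PRINT_MASK) != 0;
    }

    static boolean isSaveable(int p) {
        return (p & SAVE_MASK) != 0;
    }
}
```

With every bit set (P = -1) both checks return true; clearing a single bit turns off only the matching permission.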
    • defaultDecrypter

      private PDFDecrypter defaultDecrypter
      The default decrypter for streams and strings. By default, no encryption is expected, and thus the IdentityDecrypter is used.
  • Constructor Details

    • PDFFile

      public PDFFile(ByteBuffer buf) throws IOException
      get a PDFFile from a .pdf file. The file must be a random access file at the moment. It should really be a file mapping from the nio package.

      Use the getPage(...) methods to get a page from the PDF file.

      Parameters:
      buf - the ByteBuffer containing the PDF data (for example, a mapping of a random access file).
      Throws:
      IOException - if there's a problem reading from the buffer
      PDFParseException - if the document appears to be malformed, or its features are unsupported. If the file is encrypted in a manner that the product or platform does not support then the exception's cause will be an instance of UnsupportedEncryptionException.
      PDFAuthenticationFailureException - if the file is password protected and requires a password
    • PDFFile

      public PDFFile(ByteBuffer buf, PDFPassword password) throws IOException
      get a PDFFile from a .pdf file. The file must be a random access file at the moment. It should really be a file mapping from the nio package.

      Use the getPage(...) methods to get a page from the PDF file.

      Parameters:
      buf - the ByteBuffer containing the PDF data (for example, a mapping of a random access file).
      password - the user or owner password
      Throws:
      IOException - if there's a problem reading from the buffer
      PDFParseException - if the document appears to be malformed, or its features are unsupported. If the file is encrypted in a manner that the product or platform does not support then the exception's cause will be an instance of UnsupportedEncryptionException.
      PDFAuthenticationFailureException - if the file is password protected and the supplied password does not decrypt the document
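Both constructors take a ByteBuffer, and the notes above suggest a file mapping from the nio package. The sketch below shows one way to obtain such a buffer using only the JDK. PdfBufferLoader is a hypothetical helper name; the PDFFile calls in the comment are the ones documented on this page and are left as comments so the block compiles without the library.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: maps a file read-only into memory.
final class PdfBufferLoader {
    static ByteBuffer map(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
    // Typical use with this class, as described in the class summary:
    //   PDFFile pdf = new PDFFile(PdfBufferLoader.map(path));
    //   int pages = pdf.getNumPages();
    //   PDFPage first = pdf.getPage(1);   // pages are numbered from 1
}
```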
  • Method Details

    • isPrintable

      public boolean isPrintable()
      Gets whether the owner of the file has given permission to print the file.
      Returns:
      true if it is okay to print the file
    • isSaveable

      public boolean isSaveable()
      Gets whether the owner of the file has given permission to save a copy of the file.
      Returns:
      true if it is okay to save the file
    • getRoot

      public PDFObject getRoot()
      get the root PDFObject of this PDFFile. You generally shouldn't need this, but we've left it open in case you want to go spelunking.
    • getNumPages

      public int getNumPages()
      return the number of pages in this PDFFile. The pages will be numbered from 1 to getNumPages(), inclusive.
    • getStringMetadata

      public String getStringMetadata(String name) throws IOException
      Get metadata (e.g., Author, Title, Creator) from the Info dictionary as a string.
      Parameters:
      name - the name of the metadata key (e.g., Author)
      Returns:
      the info
      Throws:
      IOException - if the metadata cannot be read
    • getMetadataKeys

      public Iterator<String> getMetadataKeys() throws IOException
      Get the keys into the Info metadata, for use with getStringMetadata(String)
      Returns:
      the keys present in the Info dictionary
      Throws:
      IOException - if the keys cannot be read
    • dereference

      public PDFObject dereference(PDFXref ref, PDFDecrypter decrypter) throws IOException
      Used internally to track down PDFObject references. You should never need to call this.

      Since this is the only public method for tracking down PDF objects, it is synchronized. This means that the PDFFile can only hunt down one object at a time, preventing the file's location from getting messed around.

      This call stores the current buffer position before any changes are made and restores it afterwards, so callers need not know that the position has changed.

      Throws:
      IOException
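The dereference notes describe two ideas: the method is synchronized so only one lookup runs at a time, and the buffer position is saved before the lookup and restored afterwards. A minimal sketch of that pattern on a plain ByteBuffer follows; PositionGuardSketch and byteAt are hypothetical names, and the single-byte read stands in for the real object parsing.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of "save position, seek, read, restore position".
final class PositionGuardSketch {
    private final ByteBuffer buf;

    PositionGuardSketch(ByteBuffer buf) {
        this.buf = buf;
    }

    // synchronized: only one caller may move the shared position at a time.
    synchronized byte byteAt(int offset) {
        int saved = buf.position();
        try {
            buf.position(offset);
            return buf.get();
        } finally {
            buf.position(saved);   // callers never observe the seek
        }
    }
}
```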
    • isWhiteSpace

      public static boolean isWhiteSpace(int c)
      Is the argument a white space character according to the PDF spec? See ISO 32000-1:2008, Table 1.
    • isDelimiter

      public static boolean isDelimiter(int c)
      Is the argument a delimiter according to the PDF spec?

      ISO 32000-1:2008 - Table 2

      Parameters:
      c - the character to test
    • isRegularCharacter

      public static boolean isRegularCharacter(int c)
      return true if the character is neither whitespace nor a delimiter.
      Parameters:
      c - the character to test
      Returns:
      boolean
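The three character tests above refer to Tables 1 and 2 of ISO 32000-1:2008. The sketch below restates those tables as code: whitespace is NUL, HT, LF, FF, CR and SP, and the delimiters are ( ) < > [ ] { } / and %. PdfCharClassesSketch is a hypothetical name; the method names mirror the ones documented here but this is not the class's own source.

```java
// Hypothetical restatement of ISO 32000-1:2008 Tables 1 and 2.
final class PdfCharClassesSketch {
    static boolean isWhiteSpace(int c) {
        switch (c) {
            case 0x00: // NUL
            case 0x09: // horizontal tab
            case 0x0A: // line feed
            case 0x0C: // form feed
            case 0x0D: // carriage return
            case 0x20: // space
                return true;
            default:
                return false;
        }
    }

    static boolean isDelimiter(int c) {
        switch (c) {
            case '(': case ')': case '<': case '>':
            case '[': case ']': case '{': case '}':
            case '/': case '%':
                return true;
            default:
                return false;
        }
    }

    // Regular characters are everything that is neither whitespace nor a delimiter.
    static boolean isRegularCharacter(int c) {
        return !isWhiteSpace(c) && !isDelimiter(c);
    }
}
```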
    • readObject

      private PDFObject readObject(int objNum, int objGen, PDFDecrypter decrypter) throws IOException
      read the next object from the file
      Parameters:
      objNum - the object number of the object containing the object being read; negative only if the object number is unavailable (e.g., if reading from the trailer, or reading at the top level, in which case we can expect to be reading an object description)
      objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
      decrypter - the decrypter to use
      Throws:
      IOException
    • readObject

      private PDFObject readObject(int objNum, int objGen, boolean numscan, PDFDecrypter decrypter) throws IOException
      read the next object with a special catch for numbers
      Parameters:
      objNum - the object number of the object containing the object being read; negative only if the object number is unavailable (e.g., if reading from the trailer, or reading at the top level, in which case we can expect to be reading an object description)
      objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
      numscan - if true, don't bother trying to see if a number is an object reference (used when already in the middle of testing for an object reference, and not otherwise)
      decrypter - the decrypter to use
      Throws:
      IOException
    • nextItemIs

      private boolean nextItemIs(String match) throws IOException
      requires the next few characters (after whitespace) to match the argument.
      Parameters:
      match - the next few characters after any whitespace that must be in the file
      Returns:
      true if the next characters match; false otherwise.
      Throws:
      IOException
    • processVersion

      private void processVersion(String versionString)
      process a version string, to determine the major and minor versions of the file.
      Parameters:
      versionString -
    • getMajorVersion

      public int getMajorVersion()
      return the major version of the PDF header.
      Returns:
      int
    • getMinorVersion

      public int getMinorVersion()
      return the minor version of the PDF header.
      Returns:
      int
    • getVersionString

      public String getVersionString()
      return the version string from the PDF header.
      Returns:
      String
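The version accessors above expose the result of parsing the header comment. As a rough illustration, the sketch below splits a header line such as %PDF-1.4 into major and minor numbers. The class name, the "%PDF-" prefix value and the null-on-failure behaviour are assumptions for the example; this page does not show the actual value of VERSION_COMMENT or how malformed headers are handled.

```java
// Hypothetical header parser; "%PDF-" is an assumed prefix value.
final class PdfVersionSketch {
    static final String PREFIX = "%PDF-";

    // Returns {major, minor}, or null when the line is not a recognisable header.
    static int[] parse(String headerLine) {
        if (!headerLine.startsWith(PREFIX)) {
            return null;
        }
        String version = headerLine.substring(PREFIX.length()).trim();
        String[] parts = version.split("\\.");
        if (parts.length != 2) {
            return null;
        }
        try {
            return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```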
    • readDictionary

      private PDFObject readDictionary(int objNum, int objGen, PDFDecrypter decrypter) throws IOException
      read an entire << dictionary >>. The initial << has already been read.
      Parameters:
      objNum - the object number of the object containing the dictionary being read; negative only if the object number is unavailable, which should only happen if we're reading a dictionary placed directly in the trailer
      objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
      decrypter - the decrypter to use
      Returns:
      the Dictionary as a PDFObject.
      Throws:
      IOException
    • readHexDigit

      private int readHexDigit() throws IOException
      read a character, and return its value as if it were a hexadecimal digit.
      Returns:
      a number between 0 and 15 whose value matches the next hexadecimal character. Returns -1 if the next character isn't in [0-9a-fA-F]
      Throws:
      IOException
    • readHexPair

      private int readHexPair() throws IOException
      return the 8-bit value represented by the next two hex characters. If the next two characters don't represent a hex value, return -1 and reset the read head. If there is only one hex character, return its value as if there were an implicit 0 after it.
      Throws:
      IOException
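The rules stated for readHexDigit and readHexPair can be sketched on a plain ByteBuffer as follows. HexReaderSketch is a hypothetical class; the sketch follows the documented return values (0 to 15 or -1 for a digit; a missing second digit is treated as 0; the position is reset when a character is not a hex digit) and deliberately leaves out any whitespace handling a real parser would need.

```java
import java.nio.ByteBuffer;

// Hypothetical reader illustrating the documented hex rules.
final class HexReaderSketch {
    private final ByteBuffer buf;

    HexReaderSketch(ByteBuffer buf) {
        this.buf = buf;
    }

    // Consumes one byte and returns 0..15, or -1 if it is not a hex digit.
    int readHexDigit() {
        if (!buf.hasRemaining()) {
            return -1;
        }
        int c = buf.get() & 0xFF;
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Returns the 8-bit value of the next two hex digits, or -1 if there are none.
    int readHexPair() {
        int start = buf.position();
        int high = readHexDigit();
        if (high < 0) {
            buf.position(start);          // nothing consumed on failure
            return -1;
        }
        int afterHigh = buf.position();
        int low = readHexDigit();
        if (low < 0) {
            buf.position(afterHigh);      // lone digit: implicit trailing 0
            return high << 4;
        }
        return (high << 4) | low;
    }
}
```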
    • readHexString

      private PDFObject readHexString(int objNum, int objGen, PDFDecrypter decrypter) throws IOException
      read a < hex string >. The initial < has already been read.
      Parameters:
      objNum - the object number of the object containing the string being read; negative only if the object number is unavailable, which should only happen if we're reading a string placed directly in the trailer
      objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
      decrypter - the decrypter to use
      Throws:
      IOException
    • unicode

      private String unicode(String input)
      take a string and determine whether it is Unicode by looking at its lead characters and checking that its length is a multiple of 2 chars. If it is, convert the encoded characters into true Unicode.
      Parameters:
      input -
      Returns:
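As an illustration of the check described for unicode(String), the sketch below assumes the "lead characters" are the UTF-16BE byte-order mark (0xFE 0xFF) and that each char of the input holds one raw byte. Both points are assumptions drawn from the PDF text string definition quoted elsewhere on this page, not from this method's source; UnicodeSketch is a hypothetical name.

```java
// Hypothetical decoder: input chars are treated as raw bytes (0..255).
final class UnicodeSketch {
    static String decode(String input) {
        // Needs an even length and room for the two marker characters.
        if (input.length() < 2 || input.length() % 2 != 0) {
            return input;
        }
        // Assumed marker: the UTF-16BE byte-order mark FE FF.
        if (input.charAt(0) != 0xFE || input.charAt(1) != 0xFF) {
            return input;
        }
        StringBuilder out = new StringBuilder(input.length() / 2);
        for (int i = 2; i < input.length(); i += 2) {
            int high = input.charAt(i) & 0xFF;
            int low = input.charAt(i + 1) & 0xFF;
            out.append((char) ((high << 8) | low));
        }
        return out.toString();
    }
}
```

Strings without the marker, or with an odd length, are returned unchanged.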
    • readLiteralString

      private PDFObject readLiteralString(int objNum, int objGen, PDFDecrypter decrypter) throws IOException

      read a ( character string ). The initial ( has already been read. Read until a *balanced* ) appears.

      PDF Reference Section 3.8.1, Table 3.31 "PDF Data Types" defines String data as:

       "text string     Bytes that represent characters encoded
                        using either PDFDocEncoding or UTF-16BE with a
                        leading byte-order marker (as defined in
                        "Text String Type" on page 158.)"
       

      Section 5.3.2 defines character sequences and escapes.
      "The strings must conform to the syntax for string objects. When a string is written by enclosing the data in parentheses, bytes whose values are the same as those of the ASCII characters left parenthesis (40), right parenthesis (41), and backslash (92) must be preceded by a backslash character. All other byte values between 0 and 255 may be used in a string object.
      These rules apply to each individual byte in a string object, whether the string is interpreted by the text-showing operators as single-byte or multiple-byte character codes."

      This only reads 8 bit basic 'strings' so as to avoid a text string interpretation when one is not desired (e.g., for byte strings). For a text string interpretation of a string, use PDFStringUtil.asTextString(java.lang.String) or PDFObject.getTextStringValue()

      Parameters:
      objNum - the object number of the object containing the string being read; negative only if the object number is unavailable, which should only happen if we're reading a string placed directly in the trailer
      objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
      decrypter - the decrypter to use
      Throws:
      IOException
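The "read until a balanced )" rule, together with the backslash escapes quoted above, can be sketched as a small scanner. LiteralStringSketch is a hypothetical class that works on a Java String for brevity; it handles nested parentheses and the common single-character escapes, and leaves out octal escapes and line continuations, which a complete reader would also need.

```java
// Hypothetical scanner for the body of a ( literal string ).
final class LiteralStringSketch {
    // pos points just after the opening parenthesis.
    static String read(String src, int pos) {
        StringBuilder out = new StringBuilder();
        int depth = 1;                        // the opening ( was already consumed
        while (pos < src.length()) {
            char c = src.charAt(pos++);
            if (c == '\\' && pos < src.length()) {
                char e = src.charAt(pos++);
                switch (e) {
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    case 't': out.append('\t'); break;
                    case 'b': out.append('\b'); break;
                    case 'f': out.append('\f'); break;
                    default:  out.append(e);  // escaped parenthesis or backslash
                }
            } else if (c == '(') {
                depth++;
                out.append(c);
            } else if (c == ')') {
                if (--depth == 0) {
                    return out.toString();    // the balancing ) ends the string
                }
                out.append(c);
            } else {
                out.append(c);
            }
        }
        throw new IllegalArgumentException("unbalanced literal string");
    }
}
```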
    • readLine

      private String readLine()
      Read a line of text. This follows the semantics of readLine() in DataInput -- it reads character by character until a '\n' is encountered. If a '\r' is encountered, it is discarded.
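A minimal sketch of the line-reading rule just described, on a plain ByteBuffer: stop at a line feed, drop carriage returns. LineReaderSketch is a hypothetical class, and returning whatever was collected when the buffer runs out is an assumption, since the end-of-data behaviour is not documented here.

```java
import java.nio.ByteBuffer;

// Hypothetical line reader following the rule stated above.
final class LineReaderSketch {
    static String readLine(ByteBuffer buf) {
        StringBuilder line = new StringBuilder();
        while (buf.hasRemaining()) {
            char c = (char) (buf.get() & 0xFF);
            if (c == '\n') {
                break;                 // line feed ends the line and is consumed
            }
            if (c != '\r') {
                line.append(c);        // carriage returns are discarded
            }
        }
        return line.toString();
    }
}
```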
    • readArray

      private PDFObject readArray(int objNum, int objGen, PDFDecrypter decrypter) throws IOException
      read an [ array ]. The initial [ has already been read. PDFObjects are read until ].
      Parameters:
      objNum - the object number of the object containing the array being read; negative only if the object number is unavailable, which should only happen if we're reading an array placed directly in the trailer
      objGen - the object generation of the object containing the object being read; negative only if the objNum is unavailable
      decrypter - the decrypter to use
      Throws:
      IOException
    • readName

      private PDFObject readName() throws IOException
      read a /name. The / has already been read.
      Throws:
      IOException
    • readNumber

      private PDFObject readNumber(char start) throws IOException
      read a number. The initial digit or . or - is passed in as the argument.
      Throws:
      IOException
    • readKeyword

      private PDFObject readKeyword(char start) throws IOException
      read a bare keyword. The initial character is passed in as the argument.
      Throws:
      IOException
    • readObjectDescription

      private PDFObject readObjectDescription(int objNum, int objGen, PDFDecrypter decrypter) throws IOException
      read an entire PDFObject. The intro line, which looks something like "4 0 obj" has already been read.
      Parameters:
      objNum - the object number of the object being read, being the first number in the intro line (4 in "4 0 obj")
      objGen - the object generation of the object being read, being the second number in the intro line (0 in "4 0 obj").
      decrypter - the decrypter to use
      Throws:
      IOException
    • readStream

      private ByteBuffer readStream(PDFObject dict) throws IOException
      read the stream portion of a PDFObject. Calls decodeStream to un-filter the stream as necessary.
      Parameters:
      dict - the dictionary associated with this stream.
      Returns:
      a ByteBuffer with the encoded stream data
      Throws:
      IOException
    • readTrailer

      read the cross reference table from a PDF file. When this method is called, the file pointer must point to the start of the word "xref" in the file. Reads the xref table and the trailer dictionary. If the dictionary has a /Prev entry, move the file pointer and read the new trailer.
      Parameters:
      password -
      Throws:
      IOException
      PDFAuthenticationFailureException
      EncryptionUnsupportedByProductException
      EncryptionUnsupportedByPlatformException
    • parseFile

      private void parseFile(PDFPassword password) throws IOException
      build the PDFFile reference table. Nothing in the PDFFile actually gets parsed, despite the name of this function. Things only get read and parsed when they're needed.
      Parameters:
      password -
      Throws:
      IOException
    • getOutline

      public OutlineNode getOutline() throws IOException
      Gets the outline tree as a tree of OutlineNode, which is a subclass of DefaultMutableTreeNode. If there is no outline tree, this method returns null.
      Throws:
      IOException
    • getPageNumber

      public int getPageNumber(PDFObject page) throws IOException
      Gets the page number (starting from 1) of the page represented by a particular PDFObject. The PDFObject must be a Page dictionary or a destination description (or an action).
      Returns:
      a number between 1 and the number of pages indicating the page number, or 0 if the PDFObject is not in the page tree.
      Throws:
      IOException
    • getPage

      public PDFPage getPage(int pagenum)
      Get the page commands for a given page in a separate thread.
      Parameters:
      pagenum - the number of the page to get commands for
    • getPage

      public PDFPage getPage(int pagenum, boolean wait)
      Get the page commands for a given page.
      Parameters:
      pagenum - the number of the page to get commands for
      wait - if true, do not exit until the page is complete.
    • stop

      public void stop(int pageNum)
      Stop the rendering of a particular image on this page
    • getContents

      private byte[] getContents(PDFObject pageObj) throws IOException
      get the stream representing the content of a particular page.
      Parameters:
      pageObj - the page object to get the contents of
      Returns:
      a concatenation of any content streams for the requested page.
      Throws:
      IOException
    • createPage

      private PDFPage createPage(int pagenum, PDFObject pageObj) throws IOException
      Create a PDF Page object by finding the relevant inherited properties
      Parameters:
      pageObj - the PDF object for the page to be created
      Throws:
      IOException
    • findPage

      private PDFObject findPage(PDFObject pagedict, int start, int getPage, Map<String,PDFObject> resources) throws IOException
      Get the PDFObject representing the content of a particular page. Note that the number of the page need not have anything to do with the label on that page. If there are two blank pages, and then roman numerals for the page number, then passing in 6 will get page (iv).
      Parameters:
      pagedict - the top of the pages tree
      start - the page number of the first page in this dictionary
      getPage - the number of the page to find; NOT the page's label.
      resources - a HashMap that will be filled with any resource definitions encountered on the search for the page
      Throws:
      IOException
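The search described for findPage walks a tree in which each branch covers a contiguous range of page numbers. The sketch below models that walk with a hypothetical Node type instead of PDFObject and omits the resources map; it only illustrates how a physical page number, not a page label, selects a leaf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical page tree; labels exist only to make the result visible.
final class PageTreeSketch {
    static final class Node {
        final String label;
        final List<Node> kids = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }

        boolean isLeaf() {
            return kids.isEmpty();
        }

        // Number of leaf pages below this node.
        int count() {
            if (isLeaf()) return 1;
            int n = 0;
            for (Node kid : kids) n += kid.count();
            return n;
        }
    }

    static Node leaf(String label) {
        return new Node(label);
    }

    static Node branch(Node... kids) {
        Node n = new Node(null);
        n.kids.addAll(Arrays.asList(kids));
        return n;
    }

    // start is the page number of the first page under node.
    static Node findPage(Node node, int start, int wanted) {
        if (node.isLeaf()) {
            return start == wanted ? node : null;
        }
        int first = start;
        for (Node kid : node.kids) {
            int n = kid.count();
            if (wanted < first + n) {
                return findPage(kid, first, wanted);
            }
            first += n;               // skip this subtree's page range
        }
        return null;                  // wanted lies beyond the last page
    }
}
```

With two blank pages followed by pages labelled i to iv, asking for page 6 yields the leaf labelled iv, matching the example in the description above.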
    • getInheritedValue

      private PDFObject getInheritedValue(PDFObject pageObj, String propName) throws IOException
      Find a property value in a page that may be inherited. If the value is not defined in the page itself, follow the page's "parent" links until the value is found or the top of the tree is reached.
      Parameters:
      pageObj - the object representing the page
      propName - the name of the property we are looking for
      Throws:
      IOException
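The inherited-property lookup described above is a walk up the parent chain. A minimal sketch follows, using plain maps in place of PDFObject dictionaries; the class name and the "Parent" key spelling are assumptions for the example.

```java
import java.util.Map;

// Hypothetical lookup over nested maps standing in for page dictionaries.
final class InheritedLookupSketch {
    static Object getInheritedValue(Map<String, Object> page, String propName) {
        Map<String, Object> node = page;
        while (node != null) {
            Object value = node.get(propName);
            if (value != null) {
                return value;              // nearest definition wins
            }
            @SuppressWarnings("unchecked")
            Map<String, Object> parent = (Map<String, Object>) node.get("Parent");
            node = parent;                 // null once the top of the tree is reached
        }
        return null;
    }
}
```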
    • parseRect

      public Rectangle2D.Float parseRect(PDFObject obj) throws IOException
      get a Rectangle2D.Float representation for a PDFObject that is an array of four Numbers.
      Parameters:
      obj - a PDFObject that represents an Array of exactly four Numbers.
      Throws:
      IOException
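A sketch of turning four numbers into a Rectangle2D.Float, under the assumption that the array holds two opposite corners in the order [x0 y0 x1 y1], as PDF rectangles usually do. Whether parseRect computes width and height exactly this way is not stated on this page; RectSketch is a hypothetical name.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical conversion from corner coordinates to x, y, width, height.
final class RectSketch {
    static Rectangle2D.Float toRect(float[] a) {
        if (a.length != 4) {
            throw new IllegalArgumentException("expected exactly four numbers");
        }
        // Assumption: a = {x0, y0, x1, y1}, with (x0, y0) the lower-left corner.
        return new Rectangle2D.Float(a[0], a[1], a[2] - a[0], a[3] - a[1]);
    }
}
```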
    • getDefaultDecrypter

      public PDFDecrypter getDefaultDecrypter()
      Get the default decrypter for the document
      Returns:
      the default decrypter; never null, even for documents that aren't encrypted