Class PdfTokenizer

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable

    public class PdfTokenizer
    extends java.lang.Object
    implements java.io.Closeable
    • Field Detail

      • delims

        public static final boolean[] delims
      • Obj

        public static final byte[] Obj
      • R

        public static final byte[] R
      • Xref

        public static final byte[] Xref
      • Startxref

        public static final byte[] Startxref
      • Stream

        public static final byte[] Stream
      • Trailer

        public static final byte[] Trailer
      • N

        public static final byte[] N
      • F

        public static final byte[] F
      • Null

        public static final byte[] Null
      • True

        public static final byte[] True
      • False

        public static final byte[] False
      • reference

        protected int reference
      • generation

        protected int generation
      • hexString

        protected boolean hexString
      • closeStream

        private boolean closeStream
        Streams are closed automatically.
    • Constructor Detail

      • PdfTokenizer

        public PdfTokenizer​(RandomAccessFileOrArray file)
        Creates a PdfTokenizer for the specified RandomAccessFileOrArray. The beginning of the file is read to determine the location of the header, and the data source is adjusted as necessary to account for any junk that occurs in the byte source before the header.
        Parameters:
        file - the source
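
        The header search described above can be pictured with a small, self-contained sketch. It does not use the library and is not its implementation: the class name HeaderOffsetSketch, the method findHeaderOffset, and the choice of "%PDF-" as the only accepted marker are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PdfTokenizer.
final class HeaderOffsetSketch {

    // Returns the index of the first "%PDF-" marker, or -1 if none is found.
    static int findHeaderOffset(byte[] data) {
        byte[] marker = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + marker.length <= data.length; i++) {
            for (int j = 0; j < marker.length; j++) {
                if (data[i + j] != marker[j]) {
                    continue outer; // mismatch, try the next start index
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Five junk bytes ("junk\n") precede the header.
        byte[] data = "junk\n%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findHeaderOffset(data)); // prints 5
    }
}
```

        A reader that knows this offset can treat every later position as relative to the header, which is presumably what "adjusted as necessary" refers to.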
    • Method Detail

      • seek

        public void seek​(long pos)
      • readFully

        public void readFully​(byte[] bytes)
                       throws java.io.IOException
        Throws:
        java.io.IOException
      • getPosition

        public long getPosition()
      • close

        public void close()
                   throws java.io.IOException
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Throws:
        java.io.IOException
      • length

        public long length()
      • read

        public int read()
                 throws java.io.IOException
        Throws:
        java.io.IOException
      • readString

        public java.lang.String readString​(int size)
                                    throws java.io.IOException
        Throws:
        java.io.IOException
      • getByteContent

        public byte[] getByteContent()
      • getStringValue

        public java.lang.String getStringValue()
      • getDecodedStringContent

        public byte[] getDecodedStringContent()
      • tokenValueEqualsTo

        public boolean tokenValueEqualsTo​(byte[] cmp)
      • getObjNr

        public int getObjNr()
      • getGenNr

        public int getGenNr()
      • backOnePosition

        public void backOnePosition​(int ch)
      • getHeaderOffset

        public int getHeaderOffset()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • checkPdfHeader

        public java.lang.String checkPdfHeader()
                                        throws java.io.IOException
        Throws:
        java.io.IOException
      • checkFdfHeader

        public void checkFdfHeader()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • getStartxref

        public long getStartxref()
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • getNextEof

        public long getNextEof()
                        throws java.io.IOException
        Gets the next %%EOF marker in the current PDF file.
        Returns:
        position of the next %%EOF marker
        Throws:
        java.io.IOException - in case of input-output related exceptions during PDF document reading
      • nextValidToken

        public void nextValidToken()
                            throws java.io.IOException
        Throws:
        java.io.IOException
      • nextToken

        public boolean nextToken()
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • getLongValue

        public long getLongValue()
      • getIntValue

        public int getIntValue()
      • isHexString

        public boolean isHexString()
      • isCloseStream

        public boolean isCloseStream()
      • setCloseStream

        public void setCloseStream​(boolean closeStream)
      • decodeStringContent

        protected static byte[] decodeStringContent​(byte[] content,
                                                    int from,
                                                    int to,
                                                    boolean hexWriting)
        Resolve escape symbols or hexadecimal symbols.

        NOTE: According to PdfReference 1.7, part 3.2.3, a String value contains ASCII characters, so it can be converted directly to a byte array.

        Parameters:
        content - string bytes to be decoded
        from - given start index
        to - given end index
        hexWriting - true if given string is hex-encoded, e.g. '<69546578…>'. False otherwise, e.g. '((iText( some version)…)'
        Returns:
        byte[] for decrypting or for creating String.
      • decodeStringContent

        public static byte[] decodeStringContent​(byte[] content,
                                                 boolean hexWriting)
        Resolve escape symbols or hexadecimal symbols.
        NOTE: According to PdfReference 1.7, part 3.2.3, a String value contains ASCII characters, so it can be converted directly to a byte array.
        Parameters:
        content - string bytes to be decoded
        hexWriting - true if given string is hex-encoded, e.g. '<69546578…>'. False otherwise, e.g. '((iText( some version)…)'
        Returns:
        byte[] for decrypting or for creating String.
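
        To make the hex-encoded case concrete, here is a minimal sketch of hex-string decoding that follows the general PDF rules (non-hex bytes such as whitespace are skipped, and an odd final digit is padded with 0). It is not the library's code. The class HexStringSketch and the method decodeHex are hypothetical, and the input uses only the digits visible in the example above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PdfTokenizer.
final class HexStringSketch {

    // Decodes the bytes found between '<' and '>' of a PDF hex string.
    static byte[] decodeHex(byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int high = -1; // pending high nibble, or -1 if none
        for (byte b : content) {
            int digit = Character.digit((char) (b & 0xFF), 16);
            if (digit < 0) {
                continue; // skip whitespace and any other non-hex byte
            }
            if (high < 0) {
                high = digit;
            } else {
                out.write((high << 4) | digit);
                high = -1;
            }
        }
        if (high >= 0) {
            out.write(high << 4); // odd digit count: the missing digit is 0
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] decoded = decodeHex("69546578".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints iTex
    }
}
```

        The literal (non-hex) case additionally has to resolve backslash escapes and balanced parentheses, which this sketch leaves out.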
      • isWhitespace

        public static boolean isWhitespace​(int ch)
        Checks whether a character is whitespace. Currently checks for the following: '0', '9', '10', '12', '13', '32'.
        The same as calling isWhitespace(ch, true).
        Parameters:
        ch - int
        Returns:
        boolean
      • isWhitespace

        protected static boolean isWhitespace​(int ch,
                                              boolean isWhitespace)
        Checks whether a character is whitespace. Currently checks for the following: '0', '9', '10', '12', '13', '32'.
        Parameters:
        ch - int
        isWhitespace - boolean indicating whether '0' is treated as whitespace
        Returns:
        boolean
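
        The character codes listed for the two isWhitespace overloads can be sketched as follows. This is an illustration only: the class WhitespaceSketch is hypothetical, and the reading of the boolean flag as "does '0' count as whitespace" is an assumption borrowed from the isNullWhitespace parameter of readLineSegment.

```java
// Hypothetical helper, not part of PdfTokenizer.
final class WhitespaceSketch {

    // Assumption: the flag decides whether the NUL byte (0) counts as whitespace.
    static boolean isWhitespace(int ch, boolean nullIsWhitespace) {
        return (nullIsWhitespace && ch == 0)
                || ch == 9    // horizontal tab
                || ch == 10   // line feed
                || ch == 12   // form feed
                || ch == 13   // carriage return
                || ch == 32;  // space
    }

    // Single-argument form: NUL is treated as whitespace.
    static boolean isWhitespace(int ch) {
        return isWhitespace(ch, true);
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace(0));        // prints true
        System.out.println(isWhitespace(0, false)); // prints false
        System.out.println(isWhitespace('A'));      // prints false
    }
}
```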
      • isDelimiter

        protected static boolean isDelimiter​(int ch)
      • isDelimiterWhitespace

        protected static boolean isDelimiterWhitespace​(int ch)
      • throwError

        public void throwError​(java.lang.String error,
                               java.lang.Object... messageParams)
        Helper method to handle content errors. Adds the file position to the PdfRuntimeException.
        Parameters:
        error - the error message.
        messageParams - the error message parameters.
        Throws:
        IOException - wraps the error message into a PdfRuntimeException and adds the position in the file.
      • checkTrailer

        public static boolean checkTrailer​(ByteBuffer line)
        Checks whether the line equals 'trailer'.
        Parameters:
        line - the line to check
        Returns:
        true if the line equals 'trailer', otherwise false
      • readLineSegment

        public boolean readLineSegment​(ByteBuffer buffer)
                                throws java.io.IOException
        Reads data into the provided ByteBuffer. Checks for leading whitespace. See isWhitespace(int) or isWhitespace(int, boolean) for a list of whitespace characters.
        The same as calling readLineSegment(buffer, true).
        Parameters:
        buffer - a ByteBuffer to which the result of reading will be saved
        Returns:
        true, if something was read or if the end of the input stream is not reached
        Throws:
        java.io.IOException - in case of any reading error
      • readLineSegment

        public boolean readLineSegment​(ByteBuffer buffer,
                                       boolean isNullWhitespace)
                                throws java.io.IOException
        Reads data into the provided ByteBuffer. Checks for leading whitespace. See isWhitespace(int) or isWhitespace(int, boolean) for a list of whitespace characters.
        Parameters:
        buffer - a ByteBuffer to which the result of reading will be saved
        isNullWhitespace - boolean to indicate whether '0' is whitespace or not. If in doubt, use true or the overloaded method readLineSegment(buffer)
        Returns:
        true, if something was read or if the end of the input stream is not reached
        Throws:
        java.io.IOException - in case of any reading error
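
        The following sketch shows one way a line segment could be read with leading whitespace skipped. It is an approximation under stated assumptions, not the library's implementation: LineSegmentSketch is a hypothetical class, a byte array with a cursor stands in for the data source, java.io.ByteArrayOutputStream stands in for the ByteBuffer argument, and the method returns true only when at least one byte was copied.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the tokenizer's line reading.
final class LineSegmentSketch {
    private final byte[] data;
    private int pos = 0;

    LineSegmentSketch(byte[] data) {
        this.data = data;
    }

    // Returns the next byte as an unsigned value, or -1 at end of input.
    private int read() {
        return pos < data.length ? data[pos++] & 0xFF : -1;
    }

    private static boolean isWhitespace(int ch, boolean nullIsWhitespace) {
        return (nullIsWhitespace && ch == 0)
                || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    // Skips leading whitespace, then copies bytes up to the next CR or LF.
    boolean readLineSegment(ByteArrayOutputStream out, boolean nullIsWhitespace) {
        int c = read();
        while (c != -1 && isWhitespace(c, nullIsWhitespace)) {
            c = read();
        }
        boolean readAny = false;
        while (c != -1 && c != '\n' && c != '\r') {
            out.write(c);
            readAny = true;
            c = read();
        }
        return readAny;
    }

    public static void main(String[] args) {
        LineSegmentSketch reader =
                new LineSegmentSketch("  trailer\r\n<<".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        reader.readLineSegment(line, true);
        System.out.println(new String(line.toByteArray(), StandardCharsets.US_ASCII)); // prints trailer
    }
}
```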
      • checkObjectStart

        public static int[] checkObjectStart​(PdfTokenizer lineTokenizer)
        Checks whether the line starts with an object declaration.
        Parameters:
        lineTokenizer - tokenizer built from a single line.
        Returns:
        the object number and generation if the check is successful, otherwise null.
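
        The return contract (a two-element array on success, null otherwise) can be illustrated without the library. The sketch below works on a plain String instead of a PdfTokenizer and uses a regular expression, which is a swapped-in technique rather than the library's approach. ObjectStartSketch and its method are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PdfTokenizer.
final class ObjectStartSketch {

    // Matches "<object number> <generation> obj" at the start of a line.
    private static final Pattern OBJ_START =
            Pattern.compile("^\\s*(\\d+)\\s+(\\d+)\\s+obj\\b");

    // Returns {objectNumber, generation}, or null if the line does not match.
    static int[] checkObjectStart(String line) {
        Matcher m = OBJ_START.matcher(line);
        if (!m.find()) {
            return null;
        }
        try {
            return new int[] {
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2))
            };
        } catch (NumberFormatException e) {
            return null; // digits that do not fit in an int
        }
    }

    public static void main(String[] args) {
        int[] result = checkObjectStart("12 0 obj");
        System.out.println(result[0] + " " + result[1]);   // prints 12 0
        System.out.println(checkObjectStart("trailer"));   // prints null
    }
}
```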