Class InlineImageParsingUtils

java.lang.Object
com.itextpdf.kernel.pdf.canvas.parser.util.InlineImageParsingUtils

public final class InlineImageParsingUtils extends Object
Utility methods to help with processing of inline images
  • Field Details

    • LOGGER

      private static final org.slf4j.Logger LOGGER
    • inlineImageEntryAbbreviationMap

      private static final Map<PdfName,PdfName> inlineImageEntryAbbreviationMap
      Map between key abbreviations allowed in dictionary of inline images and their equivalent image dictionary keys
    • inlineImageColorSpaceAbbreviationMap

      private static final Map<PdfName,PdfName> inlineImageColorSpaceAbbreviationMap
      Map between value abbreviations allowed in dictionary of inline images for COLORSPACE
    • inlineImageFilterAbbreviationMap

      private static final Map<PdfName,PdfName> inlineImageFilterAbbreviationMap
      Map between value abbreviations allowed in dictionary of inline images for FILTER
  • Constructor Details

    • InlineImageParsingUtils

      private InlineImageParsingUtils()
  • Method Details

    • parse

      public static PdfStream parse(PdfCanvasParser ps, PdfDictionary colorSpaceDic) throws IOException
      Parses an inline image from the provided content parser. The parser must be positioned immediately following the BI operator in the content stream. On return, the parser is positioned immediately after the EI operator that terminates the inline image.
      Parameters:
      ps - the content parser to use for reading the image.
      colorSpaceDic - a color space dictionary
      Returns:
      the parsed image
      Throws:
      IOException - if anything goes wrong with the parsing
      InlineImageParsingUtils.InlineImageParseException - if parsing of the inline image failed due to issues specific to inline image processing
    • getComponentsPerPixel

      static int getComponentsPerPixel(PdfName colorSpaceName, PdfDictionary colorSpaceDic)
      Parameters:
      colorSpaceName - the name of the color space. If null, a bi-tonal (black and white) color space is assumed.
      Returns:
      the components per pixel for the specified color space
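      The mapping from color space to component count can be sketched as follows. This is a minimal illustration that covers only the three device color spaces and uses plain strings in place of PdfName; the class and method names are hypothetical, and the colorSpaceDic lookup that the real method presumably performs for named color spaces is left out.

```java
final class ComponentsPerPixelSketch {

    /**
     * Returns the number of color components per pixel.
     * A null name is treated as a bi-tonal image, which has one component.
     * Only device color spaces are handled in this sketch (an assumption).
     */
    static int componentsPerPixel(String colorSpaceName) {
        if (colorSpaceName == null) {
            return 1; // bi-tonal (black and white)
        }
        switch (colorSpaceName) {
            case "DeviceGray":
                return 1;
            case "DeviceRGB":
                return 3;
            case "DeviceCMYK":
                return 4;
            default:
                throw new IllegalArgumentException("Unsupported color space: " + colorSpaceName);
        }
    }

    public static void main(String[] args) {
        System.out.println(componentsPerPixel(null));         // prints 1
        System.out.println(componentsPerPixel("DeviceRGB"));  // prints 3
    }
}
```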
    • parseDictionary

      private static PdfDictionary parseDictionary(PdfCanvasParser ps) throws IOException
      Parses the next inline image dictionary from the parser. The parser must be positioned immediately following the BI operator. On return, the parser is positioned immediately after the whitespace character that follows the ID operator ending the inline image dictionary.
      Parameters:
      ps - the parser to extract the embedded image information from
      Returns:
      the dictionary for the inline image, with any abbreviations converted to regular image dictionary keys and values
      Throws:
      IOException - if the parse fails
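      To make the key expansion concrete, here is a simplified sketch that reads key/value pairs up to the ID operator and replaces abbreviated keys with their full names. It works on a whitespace-separated string instead of a PdfCanvasParser, assumes every value is a single token (no arrays or nested dictionaries), and uses hypothetical names throughout. The abbreviations are the ones listed for inline images in the PDF specification; the actual contents of inlineImageEntryAbbreviationMap are not shown in this documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class InlineImageDictionarySketch {

    // Key abbreviations permitted in inline image dictionaries (PDF specification).
    private static final Map<String, String> KEY_ABBREVIATIONS = Map.of(
            "BPC", "BitsPerComponent",
            "CS", "ColorSpace",
            "D", "Decode",
            "DP", "DecodeParms",
            "F", "Filter",
            "H", "Height",
            "IM", "ImageMask",
            "I", "Interpolate",
            "W", "Width");

    /** Parses the text that follows BI, stopping at the ID operator. */
    static Map<String, String> parseDictionary(String afterBi) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        String[] tokens = afterBi.trim().split("\\s+");
        int i = 0;
        while (i < tokens.length && !"ID".equals(tokens[i])) {
            String key = tokens[i];
            if (!key.startsWith("/") || i + 1 >= tokens.length) {
                throw new IllegalArgumentException("Expected a /Name key followed by a value at token " + i);
            }
            String name = key.substring(1);
            // Expand the key if it is an abbreviation; keep it unchanged otherwise.
            dictionary.put(KEY_ABBREVIATIONS.getOrDefault(name, name), tokens[i + 1]);
            i += 2;
        }
        if (i >= tokens.length) {
            throw new IllegalArgumentException("Missing ID operator");
        }
        return dictionary;
    }

    public static void main(String[] args) {
        System.out.println(parseDictionary("/W 4 /H 2 /BPC 8 /CS /RGB ID"));
    }
}
```

      Values are left untouched here; expanding value abbreviations such as /RGB is a separate step.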
    • getAlternateValue

      private static PdfObject getAlternateValue(PdfName key, PdfObject value)
      Transforms value abbreviations into their corresponding real value
      Parameters:
      key - the key that the value is for
      value - the value that might be an abbreviation
      Returns:
      if value is an allowed abbreviation for the key, the expanded value for that abbreviation. Otherwise, value is returned without modification
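      The reason the key matters is that the same abbreviation can mean different things in different positions: I is Indexed as a color space value but Interpolate as a key. The sketch below illustrates a key-dependent lookup with plain strings. The class and method names are hypothetical, only single names are handled (not filter arrays), and the tables are the abbreviations given in the PDF specification, not necessarily the exact contents of the private maps.

```java
import java.util.Map;

final class AlternateValueSketch {

    // Value abbreviations allowed for the ColorSpace entry.
    private static final Map<String, String> COLOR_SPACE_ABBREVIATIONS = Map.of(
            "G", "DeviceGray",
            "RGB", "DeviceRGB",
            "CMYK", "DeviceCMYK",
            "I", "Indexed");

    // Value abbreviations allowed for the Filter entry.
    private static final Map<String, String> FILTER_ABBREVIATIONS = Map.of(
            "AHx", "ASCIIHexDecode",
            "A85", "ASCII85Decode",
            "LZW", "LZWDecode",
            "Fl", "FlateDecode",
            "RL", "RunLengthDecode",
            "CCF", "CCITTFaxDecode",
            "DCT", "DCTDecode");

    /** Expands value if it is an abbreviation for the given (already expanded) key. */
    static String alternateValue(String key, String value) {
        if ("ColorSpace".equals(key)) {
            return COLOR_SPACE_ABBREVIATIONS.getOrDefault(value, value);
        }
        if ("Filter".equals(key)) {
            return FILTER_ABBREVIATIONS.getOrDefault(value, value);
        }
        return value; // other keys have no value abbreviations
    }

    public static void main(String[] args) {
        System.out.println(alternateValue("ColorSpace", "RGB")); // prints DeviceRGB
        System.out.println(alternateValue("Filter", "Fl"));      // prints FlateDecode
        System.out.println(alternateValue("Width", "RGB"));      // prints RGB
    }
}
```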
    • computeBytesPerRow

      private static int computeBytesPerRow(PdfDictionary imageDictionary, PdfDictionary colorSpaceDic)
      Computes the number of unfiltered bytes that each row of the image will contain. If the number of bytes results in a partial terminating byte, this number is rounded up per the PDF specification
      Parameters:
      imageDictionary - the dictionary of the inline image
      Returns:
      the number of bytes per row of the image
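      The rounding described above amounts to a ceiling division of the row's bit count by 8. A minimal sketch, assuming the width, bits per component and components per pixel have already been read from the dictionaries (the class and method names are hypothetical):

```java
final class BytesPerRowSketch {

    /** Number of unfiltered bytes in one image row, rounded up to a whole byte. */
    static int bytesPerRow(int width, int bitsPerComponent, int componentsPerPixel) {
        int bitsPerRow = width * bitsPerComponent * componentsPerPixel;
        return (bitsPerRow + 7) / 8; // ceiling division: a partial final byte counts as a full byte
    }

    public static void main(String[] args) {
        System.out.println(bytesPerRow(10, 1, 1)); // 10 bits, prints 2
        System.out.println(bytesPerRow(3, 8, 3));  // 72 bits, prints 9
        System.out.println(bytesPerRow(5, 4, 3));  // 60 bits, prints 8
    }
}
```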
    • parseUnfilteredSamples

      private static byte[] parseUnfilteredSamples(PdfDictionary imageDictionary, PdfDictionary colorSpaceDic, PdfCanvasParser ps) throws IOException
      Parses the samples of the image from the underlying content parser, ignoring all filters. The parser must be positioned immediately after the ID operator that ends the inline image's dictionary. The parser will be left positioned immediately following the EI operator. This is primarily useful if no filters have been applied.
      Parameters:
      imageDictionary - the dictionary of the inline image
      ps - the content parser
      Returns:
      the samples of the image
      Throws:
      IOException - if anything bad happens during parsing
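      When no filter is applied, the sample data has a known length (bytes per row times height), so it can be read without searching for EI. The following sketch shows that idea on a plain InputStream; it is an assumption about the general approach, not a copy of the library code, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class UnfilteredSamplesSketch {

    private static boolean isPdfWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /** Reads exactly bytesPerRow * height sample bytes, then consumes the EI operator. */
    static byte[] readSamples(InputStream in, int bytesPerRow, int height) throws IOException {
        int expected = Math.multiplyExact(bytesPerRow, height);
        byte[] samples = in.readNBytes(expected);
        if (samples.length != expected) {
            throw new IOException("End of content stream reached before the image data was complete");
        }
        // Skip whitespace between the sample data and the EI operator.
        int b = in.read();
        while (isPdfWhitespace(b)) {
            b = in.read();
        }
        if (b != 'E' || in.read() != 'I') {
            throw new IOException("EI not found after the end of the image data");
        }
        return samples;
    }

    public static void main(String[] args) throws IOException {
        byte[] content = {1, 2, 3, 4, '\n', 'E', 'I', ' '};
        byte[] samples = readSamples(new ByteArrayInputStream(content), 2, 2);
        System.out.println(samples.length); // prints 4
    }
}
```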
    • parseSamples

      private static byte[] parseSamples(PdfDictionary imageDictionary, PdfDictionary colorSpaceDic, PdfCanvasParser ps) throws IOException
      Parses the samples of the image from the underlying content parser, accounting for filters. The parser must be positioned immediately after the ID operator that ends the inline image's dictionary. The parser will be left positioned immediately following the EI operator. Note: This implementation does not actually apply the filters at this time.
      Parameters:
      imageDictionary - the dictionary of the inline image
      ps - the content parser
      Returns:
      the samples of the image
      Throws:
      IOException - if anything bad happens during parsing
    • followedByBinaryData

      private static boolean followedByBinaryData(PdfTokenizer tokenizer) throws IOException
      Checks whether the next several bytes of the tokenizer contain binary data. This method probes 10 bytes and tries to find a PDF operator in them.
      Parameters:
      tokenizer - the PDF tokenizer
      Returns:
      true if the next 10 bytes are binary data, false if they are most likely PDF operators
      Throws:
      IOException - if any I/O error occurs
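      One way such a probe could work is sketched below. The two-stage heuristic (first look for bytes that cannot occur in content stream text, then look for a token shaped like an operator) is an assumption made for illustration; the documentation above states only that 10 bytes are probed and searched for a PDF operator. The class and method names are hypothetical, and a byte array stands in for the PdfTokenizer.

```java
import java.nio.charset.StandardCharsets;

final class BinaryProbeSketch {

    static final int PROBE_LENGTH = 10;

    /** Returns true if the first PROBE_LENGTH bytes look like binary data rather than operators. */
    static boolean looksLikeBinaryData(byte[] upcoming) {
        int length = Math.min(PROBE_LENGTH, upcoming.length);

        // Stage 1: any byte that is neither PDF whitespace nor printable ASCII signals binary data.
        for (int i = 0; i < length; i++) {
            int b = upcoming[i] & 0xFF;
            boolean whitespace = b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
            boolean printable = b >= 33 && b <= 126;
            if (!whitespace && !printable) {
                return true;
            }
        }

        // Stage 2: text-like bytes count as operators only if some token is shaped like one,
        // for example "q", "BT", "T*", "'" or "\"".
        String text = new String(upcoming, 0, length, StandardCharsets.US_ASCII);
        for (String token : text.trim().split("[\\x00\\s]+")) {
            if (token.matches("[A-Za-z]+\\*?|'|\"")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeBinaryData("Q q 1 0 0 ".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(looksLikeBinaryData(new byte[] {(byte) 0xFF, (byte) 0xD8, 1, 2, 3}));
    }
}
```

      A 10-byte window is short, so a heuristic of this kind can misclassify operands that happen to fill the whole window without any operator in it.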
    • imageColorSpaceIsKnown

      private static boolean imageColorSpaceIsKnown(PdfDictionary imageDictionary, PdfDictionary colorSpaceDic)
    • inlineImageStreamBytesAreComplete

      private static boolean inlineImageStreamBytesAreComplete(byte[] samples, PdfDictionary imageDictionary)
      Checks that the bytes that were parsed really are all of the image bytes. If they are, decoding will succeed; if not all image bytes were read and the "<ws>EI<ws>" bytes were merely part of the image data, decoding should fail. This is not an ideal solution, but there is probably no better or more reliable way to check this.

      Drawbacks: the check is slow; images with DCTDecode, JBIG2Decode and JPXDecode filters cannot be checked because iText does not support these filters; decoding might succeed even though not all bytes were read; and not all filters are certain to throw an exception when data is corrupted (for example, FlateDecodeFilter seems not to throw one).
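      The underlying problem is that "<ws>EI<ws>" can occur inside the image data itself, so the first match is only a candidate terminator. The sketch below illustrates the validate-each-candidate idea with a caller-supplied predicate standing in for the decoding attempt; the class, method and parameter names are hypothetical, and the real check decodes the bytes with the image's filters rather than calling a predicate.

```java
import java.util.Arrays;
import java.util.function.Predicate;

final class EiTerminatorSketch {

    private static boolean isPdfWhitespace(byte b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /**
     * Scans for <ws>EI<ws> candidates and returns the bytes preceding the first
     * candidate that the validator accepts, or null if none is accepted.
     */
    static byte[] findImageBytes(byte[] content, Predicate<byte[]> decodesCleanly) {
        for (int i = 0; i + 3 < content.length; i++) {
            boolean candidate = isPdfWhitespace(content[i])
                    && content[i + 1] == 'E'
                    && content[i + 2] == 'I'
                    && isPdfWhitespace(content[i + 3]);
            if (candidate) {
                byte[] samples = Arrays.copyOfRange(content, 0, i);
                if (decodesCleanly.test(samples)) {
                    return samples; // the bytes read so far form a complete image
                }
                // Otherwise the match was part of the image data; keep scanning.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The first " EI " belongs to the data; the real terminator follows 7 bytes of data.
        byte[] content = {1, ' ', 'E', 'I', ' ', 2, 3, '\n', 'E', 'I', '\n'};
        byte[] samples = findImageBytes(content, bytes -> bytes.length == 7);
        System.out.println(samples.length); // prints 7
    }
}
```

      As the drawbacks above note, a check of this kind is only as reliable as the validator: a decoder that accepts truncated input lets a false terminator through.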