Class CMapAwareDocumentFont


public class CMapAwareDocumentFont extends DocumentFont
Implementation of DocumentFont used while parsing PDF streams.
Since:
2.1.4
  • Field Details

    • fontDic

      private PdfDictionary fontDic
      The font dictionary.
    • spaceWidth

      private int spaceWidth
      The width of a space for this font, in normalized 1000 point units.
    • toUnicodeCmap

      private CMap toUnicodeCmap
      The CMap constructed from the ToUnicode map in the font's dictionary, if present. This CMap transforms CID values into their Unicode equivalents.
    • cidbyte2uni

      private char[] cidbyte2uni
      Mapping between CID code (single byte only for now) and unicode equivalent as derived by the font's encoding. Only needed if the ToUnicode CMap is not provided.
  • Constructor Details

    • CMapAwareDocumentFont

      public CMapAwareDocumentFont(PRIndirectReference refFont)
      Creates an instance of a CMapAwareDocumentFont based on an indirect reference to a font.
      Parameters:
      refFont - the indirect reference to a font
  • Method Details

    • processToUnicode

      private void processToUnicode()
      Parses the ToUnicode entry, if present, and constructs a CMap for it.
      Since:
      2.1.7
    • processUni2Byte

      private void processUni2Byte()
      Inverts DocumentFont's uni2byte mapping to obtain a CID-to-Unicode mapping based on the font's encoding.
      Since:
      2.1.7
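
      The inversion described for processUni2Byte can be sketched as follows. This is a minimal illustration, not the library's code: the class Uni2ByteInversionSketch, the method invert, and the use of a java.util.Map in place of DocumentFont's internal uni2byte structure are all assumptions made for the example. It builds a 256-entry table indexed by single-byte code, which matches the "single byte only" note on cidbyte2uni.

```java
import java.util.Map;

/** Hypothetical stand-in for illustration; not part of the library. */
final class Uni2ByteInversionSketch {

    /**
     * Inverts a Unicode-to-byte mapping into a table indexed by byte code.
     * Entries without a mapping keep the default value 0.
     */
    static char[] invert(Map<Integer, Integer> uni2byte) {
        char[] byteToUni = new char[256];
        for (Map.Entry<Integer, Integer> entry : uni2byte.entrySet()) {
            int unicode = entry.getKey();
            int code = entry.getValue();
            // Assumption: codes outside the single-byte range are skipped.
            if (code < 0 || code > 255) {
                continue;
            }
            // Assumption: the first Unicode value seen for a code is kept.
            if (byteToUni[code] == 0) {
                byteToUni[code] = (char) unicode;
            }
        }
        return byteToUni;
    }
}
```

      With a mapping of U+20AC to code 0x80, the returned table holds the euro sign at index 0x80, so a decoder can look up a byte read from a content stream directly.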
    • computeAverageWidth

      private int computeAverageWidth()
      Computes the average width, in normalized 1000 point units, over all glyphs in the font that have a non-zero width. The result provides a meaningful fallback where an average font width is needed, for example when the font does not specify the width of a space.
      Returns:
      the average width of all non-zero width glyphs in the font
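
      A minimal sketch of the averaging and the space-width fallback it supports, assuming the glyph widths are available as a plain int array indexed by character code. The class AverageWidthSketch and both method names are hypothetical, and the integer division is a choice made for this example rather than documented behaviour.

```java
/** Hypothetical stand-in for illustration; not part of the library. */
final class AverageWidthSketch {

    /** Averages only the non-zero entries; returns 0 if there are none. */
    static int averageNonZeroWidth(int[] widths) {
        int sum = 0;
        int count = 0;
        for (int w : widths) {
            if (w != 0) {
                sum += w;
                count++;
            }
        }
        // Integer division truncates any fractional part.
        return count == 0 ? 0 : sum / count;
    }

    /**
     * Returns the width stored for the space character (code 32) if it is
     * non-zero, otherwise falls back to the average non-zero width.
     */
    static int spaceWidthOrAverage(int[] widths) {
        int space = widths.length > 32 ? widths[32] : 0;
        return space != 0 ? space : averageNonZeroWidth(widths);
    }
}
```

      Skipping zero-width entries keeps unused codes from dragging the average down, which is why the two inputs {0, 500, 700, 0} and {500, 700} give the same result.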
    • getWidth

      public int getWidth(int char1)
      Description copied from class: DocumentFont
      Gets the width of a char in normalized 1000 units.
      Overrides:
      getWidth in class DocumentFont
      Parameters:
      char1 - the unicode char to get the width of
      Returns:
      the width in normalized 1000 units
      Since:
      2.1.5. Overridden to allow special handling for fonts that don't specify the width of the space character.
    • decodeSingleCID

      private String decodeSingleCID(byte[] bytes, int offset, int len)
      Decodes a single CID (represented by one or two bytes) to a unicode String.
      Parameters:
      bytes - the bytes making up the character code to convert
      offset - the index in bytes at which the character code starts
      len - the number of bytes in the character code (one or two)
      Returns:
      a String containing the Unicode form of the input bytes, decoded using the font's encoding.
    • hasUnicodeCMAP

      public boolean hasUnicodeCMAP()
      Returns:
      true if this font has unicode information available.
    • hasTwoByteUnicodeCMAP

      public boolean hasTwoByteUnicodeCMAP()
      Returns:
      true if this font has unicode information available and if it is two bytes.
    • decode

      public String decode(byte[] cidbytes, int offset, int len)
      Decodes a string of bytes (encoded in the font's encoding) into a Unicode string. This uses the ToUnicode map of the font if available; otherwise it uses the font's encoding.
      Parameters:
      cidbytes - the bytes that need to be decoded
      offset - the index in cidbytes at which decoding starts
      len - the number of bytes to decode
      Returns:
      the unicode String that results from decoding
      Since:
      2.1.7
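
      The preference order described above (ToUnicode map first, font encoding otherwise) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: DecodeFallbackSketch is a hypothetical class, a java.util.Map stands in for the ToUnicode CMap, only single-byte codes are handled, and the per-code fallback for codes missing from the map is a choice made for this example.

```java
import java.util.Map;

/** Hypothetical stand-in for illustration; not part of the library. */
final class DecodeFallbackSketch {

    private final Map<Integer, String> toUnicode; // null when the font has no ToUnicode map
    private final char[] byteToUni;               // 256 entries derived from the font's encoding

    DecodeFallbackSketch(Map<Integer, String> toUnicode, char[] byteToUni) {
        this.toUnicode = toUnicode;
        this.byteToUni = byteToUni;
    }

    /** Decodes len single-byte codes starting at offset. */
    String decode(byte[] codes, int offset, int len) {
        StringBuilder sb = new StringBuilder(len);
        for (int i = offset; i < offset + len; i++) {
            int code = codes[i] & 0xff; // treat the byte as unsigned
            String mapped = (toUnicode == null) ? null : toUnicode.get(code);
            if (mapped != null) {
                sb.append(mapped);          // ToUnicode entry takes priority
            } else {
                sb.append(byteToUni[code]); // fall back to the encoding table
            }
        }
        return sb.toString();
    }
}
```

      A ToUnicode entry may map one code to several Unicode characters (a ligature code to "fi", for instance), which is why the map's values are strings rather than single chars.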
    • decode

      public String decode(String chars)
      Decodes a string. This is a normal Java string, but if the range of character values exceeds the range of the encoding for the font, this will fail. Required since we need to process the characters of strings, and we can't determine the character boundaries in advance, especially because of Identity-H encoded fonts which have two-byte character indexes.

      PdfString is used to hold character code points, even though the bytes may not map 1-1. It's not possible to change the encoding once a string is in place.

      Parameters:
      chars - the characters that need to be decoded
      Returns:
      the unicode String that results from decoding
      Since:
      2.1.
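
      To illustrate why character boundaries matter here, the sketch below treats each char of a Java string as one byte of character code and then groups those bytes into one-byte or two-byte codes, as an Identity-H font would require. It is an assumption-laden example only: StringCodeSketch and its methods are hypothetical, and the choice to throw IllegalArgumentException for out-of-range chars is made for this example and is not taken from the library.

```java
/** Hypothetical stand-in for illustration; not part of the library. */
final class StringCodeSketch {

    /** Converts each char to one byte; fails if a char exceeds 0xFF. */
    static byte[] toCodeBytes(String chars) {
        byte[] out = new byte[chars.length()];
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            if (c > 0xff) {
                throw new IllegalArgumentException(
                        "char at index " + i + " is outside the single-byte range");
            }
            out[i] = (byte) c;
        }
        return out;
    }

    /** Groups bytes into big-endian codes of 1 or 2 bytes each. */
    static int[] splitCodes(byte[] bytes, int bytesPerCode) {
        if (bytesPerCode != 1 && bytesPerCode != 2) {
            throw new IllegalArgumentException("bytesPerCode must be 1 or 2");
        }
        if (bytes.length % bytesPerCode != 0) {
            throw new IllegalArgumentException("trailing partial code");
        }
        int[] codes = new int[bytes.length / bytesPerCode];
        for (int i = 0; i < codes.length; i++) {
            int code = 0;
            for (int j = 0; j < bytesPerCode; j++) {
                code = (code << 8) | (bytes[i * bytesPerCode + j] & 0xff);
            }
            codes[i] = code;
        }
        return codes;
    }
}
```

      The same four bytes 00 41 00 42 yield four codes when read one byte at a time and two codes (0x0041, 0x0042) when read as two-byte indexes, so the code width has to come from the font rather than from the string.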
    • decode

      public String decode(char c) throws Error
      Decodes a single character whose value represents a code point in this font. Fails if the character's value does not correspond to a valid code point for the font.
      Parameters:
      c - character to decode
      Returns:
      Unicode character corresponding to the remapped code according to the font's current encoding.
      Throws:
      Error - if the character is out of range