All Implemented Interfaces:
Closeable, AutoCloseable, Channel, ReadableByteChannel

final class LZW extends CompressionChannel
Inflater for values encoded with the LZW compression. This compression is described in section 13 of the TIFF 6.0 specification, "LZW Compression". Each code is written using at least 9 bits and at most 12 bits. The Unisys patent on the LZW algorithm expired in 2003 in the United States and in 2004 in the other countries where it applied.
Since:
1.1
Version:
1.3
  • Field Details

    • CLEAR_CODE

      private static final int CLEAR_CODE
      A 12-bit code meaning that we have exhausted the 4093 available codes and must reset the table to the initial set of 9-bit codes.
    • EOI_CODE

      private static final int EOI_CODE
      End of information. This code appears at the end of a strip.
    • FIRST_ADAPTATIVE_CODE

      private static final int FIRST_ADAPTATIVE_CODE
      First code which is not one of the predefined codes.
    • OFFSET_TO_MAXIMUM

      private static final int OFFSET_TO_MAXIMUM
      Used for computing the value of indexOfFreeEntry at which codeSize needs to be incremented. The TIFF specification says that the size needs to be incremented after codes 510, 1022 and 2046 are added to the entriesForCodes table. Those values are slightly lower than what we would expect if the full integer ranges were used.
    • MIN_CODE_SIZE

      private static final int MIN_CODE_SIZE
      Initial number of bits in a code. The TIFF specification says that the size needs to be incremented after codes 510, 1022 and 2046 are added to the entriesForCodes table.
    • MAX_CODE_SIZE

      private static final int MAX_CODE_SIZE
      Maximum number of bits in a code, inclusive.
    • codeSize

      private int codeSize
      Number of bits to read for the next code. This number starts at 9 and increases up to 12. Once 12 bits are reached, a CLEAR_CODE should occur in the stream of LZW data.
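
The growth rule described for OFFSET_TO_MAXIMUM, MIN_CODE_SIZE, MAX_CODE_SIZE and codeSize can be sketched as below. This is only an illustration: the constant values are assumptions (this page does not list them), chosen so that the size grows right after codes 510, 1022 and 2046 are added, and the class and method names are hypothetical.

```java
/** Hypothetical sketch of the code size growth rule; not the actual LZW class. */
final class CodeSizeSketch {
    static final int MIN_CODE_SIZE = 9;       // from the text: codes start at 9 bits
    static final int MAX_CODE_SIZE = 12;      // from the text: codes use at most 12 bits
    static final int OFFSET_TO_MAXIMUM = 1;   // assumed value, consistent with 510, 1022 and 2046

    /**
     * Returns the code size to use after the table has grown.
     * indexOfFreeEntry is the index of the next free entry, that is,
     * one more than the last code added to the table.
     */
    static int nextCodeSize(int indexOfFreeEntry, int codeSize) {
        int threshold = (1 << codeSize) - OFFSET_TO_MAXIMUM;   // 511, 1023, 2047
        if (codeSize < MAX_CODE_SIZE && indexOfFreeEntry >= threshold) {
            return codeSize + 1;
        }
        return codeSize;
    }
}
```

With these assumed values, adding code 510 moves the next free entry to 511, which reaches the 9-bit threshold and switches the reader to 10-bit codes.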
    • LOWEST_OFFSET_BIT

      private static final int LOWEST_OFFSET_BIT
      Position of the lowest bit in an entriesForCodes element where the offset is stored. The position is chosen for leaving 12 bits for storing the length before the offset value.
      Rationale: even in the worst case scenario where the same byte is always appended to the sequence, the maximal length cannot exceed the dictionary size because a CLEAR_CODE will be emitted when the dictionary is full.
    • LENGTH_MASK

      private static final int LENGTH_MASK
      The mask to apply on an entriesForCodes element for getting the length.
    • STRING_ALIGNMENT

      private static final int STRING_ALIGNMENT
      Number of bits in an offset that are always 0 and consequently do not need to be stored. An intentional consequence of this restriction is that the size of blocks allocated in the stringsFromCode array must be a multiple of (1 << STRING_ALIGNMENT). This makes it possible to use the extra space for growing a string by up to that number of bytes without copying it.
      Note: allocating only in blocks of 2² = 4 bytes may seem a waste of memory, but it actually reduces memory usage a lot (by almost a factor of 4) because of the copies avoided. We tried alignment values 1, 2 and 3 and found that 2 seems optimal.
    • PREALLOCATED_SPACE_IS_USED_MASK

      private static final int PREALLOCATED_SPACE_IS_USED_MASK
      Mask for the bit in an entriesForCodes element telling whether the extra space allocated in the stringsFromCode array has already been used by another entry. If yes (1), then that space cannot be used by a new entry. Instead, the new entry will need to allocate new space.

      Note: newEntryNeedsAllocation(int) implementation assumes that this bit is the sign bit.

    • OFFSET_MASK

      private static final int OFFSET_MASK
      The mask to apply on an entriesForCodes element for getting the compressed offset (before shifting).
    • OFFSET_SHIFT

      private static final int OFFSET_SHIFT
      The shift to apply on a compressed offset (after application of OFFSET_MASK) for getting the uncompressed offset.
    • OFFSET_LIMIT

      private static final int OFFSET_LIMIT
      Maximal value + 1 that the offset can take. The compressed offset takes all the bits after the length, minus one bit that we keep for the PREALLOCATED_SPACE_IS_USED_MASK flag. Note that compressed offsets are multiplied by 1 << STRING_ALIGNMENT for getting the actual offset.
    • LENGTH_MASK_FOR_ALLOCATE

      private static final int LENGTH_MASK_FOR_ALLOCATE
      A mask used for detecting when a new allocation is required. If (length & LENGTH_MASK_FOR_ALLOCATE) == 0 and assuming that length is always incremented by 1, then a new allocation is necessary.
    • entriesForCodes

      private final int[] entriesForCodes
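      Pointers to the byte sequences associated with each code. The index in the entriesForCodes array is the code, and each element locates the byte sequence in the stringsFromCode array. Each element is a value encoded by the offsetAndLength(int, int) method. Elements are decoded by the offset(int) and length(int) methods.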
    • previousCode

      private int previousCode
      Last code found in the previous iteration. This is a valid index in the entriesForCodes array. An EOI_CODE value means that the decompression is finished.
    • pendingOffset

      private int pendingOffset
      If some bytes could not be written in previous read(…) execution because the target buffer was full, offset and length of those bytes. Otherwise 0.
    • pendingLength

      private int pendingLength
      If some bytes could not be written in previous read(…) execution because the target buffer was full, offset and length of those bytes. Otherwise 0.
    • indexOfFreeEntry

      private int indexOfFreeEntry
      Index of the next entry available in entriesForCodes. Shall not be lower than 258.
    • indexOfFreeString

      private int indexOfFreeString
      Index of the next byte available in stringsFromCode. Shall not be lower than 1 << Byte.SIZE.
    • stringsFromCode

      private byte[] stringsFromCode
      Sequences of bytes associated with codes. For a given code c read from the stream, the first uncompressed byte is stringsFromCode[offset(entriesForCodes[c])] and the number of bytes is length(entriesForCodes[c]).
  • Constructor Details

    • LZW

      public LZW(ChannelDataInput input, StoreListeners listeners)
      Creates a new channel which will decompress data from the given input. The setInputRegion(long, long) method must be invoked after construction before a reading process can start.
      Parameters:
      input - the source of data to decompress.
      listeners - object where to report warnings.
  • Method Details

    • length

      private static int length(int element)
      Extracts the number of bytes of an entry stored in the stringsFromCode array.
      Parameters:
      element - an element of the entriesForCodes array.
      Returns:
      number of consecutive bytes to read in stringsFromCode array.
    • offset

      private static int offset(int element)
      Extracts the index of the first byte of an entry stored in the stringsFromCode array.
      Parameters:
      element - an element of the entriesForCodes array.
      Returns:
      index of the first byte to read in stringsFromCode array.
    • offsetAndLength

      private static int offsetAndLength(int offset, int length)
      Encodes an offset together with its length.
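
The bit layout implied by the LOWEST_OFFSET_BIT, LENGTH_MASK, STRING_ALIGNMENT, OFFSET_MASK, OFFSET_SHIFT and OFFSET_LIMIT descriptions can be sketched as follows. All constant values are assumptions derived from the field descriptions (12 bits for the length, offsets aligned on 4 bytes, sign bit reserved for the flag), not values stated on this page. The class name is hypothetical and the real private methods may differ.

```java
/** Hypothetical sketch of how an offset and a length could share one int. */
final class EntryPackingSketch {
    static final int LOWEST_OFFSET_BIT = 12;                           // assumed: 12 bits for the length
    static final int LENGTH_MASK = (1 << LOWEST_OFFSET_BIT) - 1;       // low 12 bits
    static final int STRING_ALIGNMENT = 2;                             // assumed: offsets are multiples of 4
    static final int PREALLOCATED_SPACE_IS_USED_MASK = 0x80000000;     // the sign bit
    static final int OFFSET_MASK = ~(LENGTH_MASK | PREALLOCATED_SPACE_IS_USED_MASK);
    static final int OFFSET_SHIFT = LOWEST_OFFSET_BIT - STRING_ALIGNMENT;
    static final int OFFSET_LIMIT = 1 << (Integer.SIZE - 1 - OFFSET_SHIFT);

    /** Packs an aligned offset and a length in a single int, with the flag bit cleared. */
    static int offsetAndLength(int offset, int length) {
        if (offset < 0 || offset >= OFFSET_LIMIT || (offset & ((1 << STRING_ALIGNMENT) - 1)) != 0) {
            throw new IllegalArgumentException("Offset out of range or not aligned: " + offset);
        }
        if (length < 0 || length > LENGTH_MASK) {
            throw new IllegalArgumentException("Length out of range: " + length);
        }
        // The low STRING_ALIGNMENT bits of the offset are zero, so the shifted
        // offset never overlaps the bits reserved for the length.
        return (offset << OFFSET_SHIFT) | length;
    }

    /** Index of the first byte, ignoring the flag bit. */
    static int offset(int element) {
        return (element & OFFSET_MASK) >>> OFFSET_SHIFT;
    }

    /** Number of bytes in the sequence. */
    static int length(int element) {
        return element & LENGTH_MASK;
    }
}
```

Because the shift is smaller than LOWEST_OFFSET_BIT by STRING_ALIGNMENT bits, decoding yields the real offset directly, without a separate multiplication by 1 << STRING_ALIGNMENT.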
    • newEntryNeedsAllocation

      private static boolean newEntryNeedsAllocation(int element)
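      Returns true if all the space allocated for the given entry is already used. This is the case if at least one of the following conditions holds: the entry has already filled its allocated block (see LENGTH_MASK_FOR_ALLOCATE), or another entry is already using the remaining space (see PREALLOCATED_SPACE_IS_USED_MASK).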
      Parameters:
      element - an element of the entriesForCodes array.
      Returns:
      whether all the space for that entry is already used.
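
One way such a test could be written with a single comparison is sketched below. It relies on the note that PREALLOCATED_SPACE_IS_USED_MASK is the sign bit. The constant values and the class name are assumptions, and the actual implementation is not shown on this page.

```java
/** Hypothetical sketch of the "needs allocation" test; constant values are assumed. */
final class AllocationCheckSketch {
    static final int STRING_ALIGNMENT = 2;                                   // assumed: blocks of 4 bytes
    static final int LENGTH_MASK_FOR_ALLOCATE = (1 << STRING_ALIGNMENT) - 1; // low 2 bits of the length
    static final int PREALLOCATED_SPACE_IS_USED_MASK = Integer.MIN_VALUE;    // the sign bit

    static boolean newEntryNeedsAllocation(int element) {
        // Negative: the flag (sign bit) is set, another entry took the spare space.
        // Zero: the length is a multiple of the block size, the block is full.
        // Positive: the flag is clear and some spare bytes remain.
        return (element & (PREALLOCATED_SPACE_IS_USED_MASK | LENGTH_MASK_FOR_ALLOCATE)) <= 0;
    }
}
```

The single `<= 0` comparison covers both conditions only because the flag is the sign bit. With any other flag position, two separate tests would be needed.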
    • setInputRegion

      public void setInputRegion(long start, long byteCount) throws IOException
      Prepares this inflater for reading a new tile or a new band of a tile.
      Overrides:
      setInputRegion in class CompressionChannel
      Parameters:
      start - stream position where to start reading.
      byteCount - number of bytes to read from the input.
      Throws:
      IOException - if the stream cannot seek to the given start position.
    • clearTable

      private void clearTable()
      Clears the entriesForCodes table.
    • readNextCode

      public final int readNextCode() throws IOException
      Reads codeSize bits from the stream.
      Returns:
      the value of the next bits from the stream.
      Throws:
      IOException - if an error occurred while reading.
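
As an illustration of reading variable-width codes, the sketch below extracts codeSize bits at a time from a byte array, most significant bit first, which is the packing used by standard TIFF LZW streams. It is a simplified stand-in: the class name is hypothetical, and the real method reads from a ChannelDataInput rather than from an array.

```java
/** Hypothetical bit reader over a byte array; not the actual readNextCode() implementation. */
final class BitReaderSketch {
    private final byte[] data;
    private int bitPosition;          // index of the next bit to read, counted from the start of data

    BitReaderSketch(byte[] data) {
        this.data = data;
    }

    /** Reads the next codeSize bits, most significant bit first. */
    int readBits(int codeSize) {
        int value = 0;
        for (int i = 0; i < codeSize; i++) {
            int byteIndex = bitPosition >>> 3;
            if (byteIndex >= data.length) {
                throw new IllegalStateException("Unexpected end of data");
            }
            int shift = 7 - (bitPosition & 7);            // bit 7 of each byte comes first
            value = (value << 1) | ((data[byteIndex] >>> shift) & 1);
            bitPosition++;
        }
        return value;
    }
}
```

Reading one bit per iteration keeps the sketch short. A production reader would normally fetch whole bytes and shift them into an accumulator.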
    • read

      public int read(ByteBuffer target) throws IOException
      Decompresses some bytes from the input into the given destination buffer.
      Parameters:
      target - the buffer into which bytes are to be transferred.
      Returns:
      the number of bytes read, or -1 if end-of-stream.
      Throws:
      IOException - if some other I/O error occurs.
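
The core of LZW decompression can be sketched with a plain list of byte arrays instead of the packed entriesForCodes and stringsFromCode arrays. This is a textbook version for illustration only: it takes codes that are already extracted from the bit stream, ignores code size changes and pending bytes, and uses a hypothetical class name.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical textbook LZW decoder working on already extracted codes. */
final class LzwDecodeSketch {
    static final int CLEAR_CODE = 256;
    static final int EOI_CODE = 257;
    static final int FIRST_ADAPTATIVE_CODE = 258;

    static byte[] decode(int[] codes) {
        List<byte[]> table = new ArrayList<>();
        resetTable(table);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] previous = null;
        for (int code : codes) {
            if (code == EOI_CODE) {
                break;                                    // end of information
            }
            if (code == CLEAR_CODE) {
                resetTable(table);
                previous = null;
                continue;
            }
            byte[] current;
            if (code < table.size()) {
                current = table.get(code);                // code already in the table
            } else if (code == table.size() && previous != null) {
                current = append(previous, previous[0]);  // code that the encoder has just created
            } else {
                throw new IllegalArgumentException("Unexpected code: " + code);
            }
            out.write(current, 0, current.length);
            if (previous != null) {
                table.add(append(previous, current[0]));  // new entry: previous string + first byte
            }
            previous = current;
        }
        return out.toByteArray();
    }

    /** Fills the table with the 256 single-byte strings plus two placeholders for codes 256 and 257. */
    private static void resetTable(List<byte[]> table) {
        table.clear();
        for (int i = 0; i < FIRST_ADAPTATIVE_CODE; i++) {
            table.add(new byte[] {(byte) i});
        }
    }

    private static byte[] append(byte[] sequence, byte b) {
        byte[] result = Arrays.copyOf(sequence, sequence.length + 1);
        result[sequence.length] = b;
        return result;
    }
}
```

Each new entry copies the previous string here, which is exactly the cost that the block alignment scheme described for STRING_ALIGNMENT is meant to avoid.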
    • unexpectedData

      private IOException unexpectedData()
      The exception to throw if the decompression process encounters data that it cannot process.