Class DeltaIndex


  • public class DeltaIndex
    extends java.lang.Object
    Index of blocks in a source file.

    The index can be passed a result buffer, and output an instruction sequence that transforms the source buffer used by the index into the result buffer. The instruction sequence can be executed by BinaryDelta to recreate the result buffer.

    An index stores the entire contents of the source buffer, but also a table of block identities mapped to locations where the block appears in the source buffer. The mapping table uses 12 bytes for every 16 bytes of source buffer, and is therefore ~75% of the source buffer size. The overall index is ~1.75x the size of the source buffer. This relationship holds for any JVM, as only a constant number of objects are allocated per index. Callers can use the method getIndexSize() to obtain a reasonably accurate estimate of the complete heap space used by this index.

    A DeltaIndex is thread-safe. Concurrent threads can use the same index to encode delta instructions for different result buffers.

    • Field Summary

      Fields 
      Modifier and Type Field Description
      (package private) static int BLKSZ
      Number of bytes in a block.
      private long[] entries
      Pairs of block hash value and src offsets.
      private static int MAX_CHAIN_LENGTH
      Maximum number of positions to consider for a given content hash.
      private byte[] src
      Original source file that we indexed.
      private static int[] T  
      private int[] table
      Pointers into the entries table, indexed by block hash.
      private int tableMask
      Mask to make block hashes into an array index for table.
      private static int[] U  
    • Constructor Summary

      Constructors 
      Constructor Description
      DeltaIndex​(byte[] sourceBuffer)
      Construct an index from the source file.
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      private void copyEntries​(DeltaIndexScanner scan)  
      private int countEntries​(DeltaIndexScanner scan)  
      void encode​(java.io.OutputStream out, byte[] res)
      Generate a delta sequence to recreate the result buffer.
      boolean encode​(java.io.OutputStream out, byte[] res, int deltaSizeLimit)
      Generate a delta sequence to recreate the result buffer.
      static long estimateIndexSize​(int sourceLength)
      Estimate the size of an index for a given source.
      private static int fwdmatch​(byte[] res, int resPtr, byte[] src, int srcPtr)  
      long getIndexSize()
      Get an estimate of the memory required by this index.
      long getSourceSize()
      Get size of the source buffer this index has scanned.
      (package private) static int hashBlock​(byte[] raw, int ptr)  
      private static int keyOf​(long ent)  
      private static int negmatch​(byte[] res, int resPtr, byte[] src, int srcPtr, int limit)  
      private DeltaEncoder newEncoder​(java.io.OutputStream out, long resSize, int limit)  
      private static long sizeOf​(byte[] b)  
      private static long sizeOf​(int[] b)  
      private static long sizeOf​(long[] b)  
      private static int sizeOfArray​(int entSize, int len)  
      private static int step​(int hash, byte toRemove, byte toAdd)  
      java.lang.String toString()
      private static int valOf​(long ent)  
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Field Detail

      • MAX_CHAIN_LENGTH

        private static final int MAX_CHAIN_LENGTH
        Maximum number of positions to consider for a given content hash.

        All positions with the same content hash are stored into a single chain. The chain size is capped to ensure delta encoding stays linear time at O(len_src + len_dst) rather than quadratic at O(len_src * len_dst).

        See Also:
        Constant Field Values
      • src

        private final byte[] src
        Original source file that we indexed.
      • table

        private final int[] table
        Pointers into the entries table, indexed by block hash.

        A block hash is masked with tableMask to become the array index of this table. The value stored here is the first index within entries that starts the consecutive list of blocks with that same masked hash. If there are no matching blocks, 0 is stored instead.

        Note that this table is always a power of 2 in size, to support fast normalization of a block hash into an array index.

      • entries

        private final long[] entries
        Pairs of block hash value and src offsets.

        The very first entry in this table at index 0 is always empty, this is to allow fast evaluation when table has no values under any given slot. Remaining entries are pairs of integers, with the upper 32 bits holding the block hash and the lower 32 bits holding the source offset.

      • tableMask

        private final int tableMask
        Mask to make block hashes into an array index for table.
      • T

        private static final int[] T
      • U

        private static final int[] U
    • Constructor Detail

      • DeltaIndex

        public DeltaIndex​(byte[] sourceBuffer)
        Construct an index from the source file.
        Parameters:
        sourceBuffer - the source file's raw contents. The buffer will be held by the index instance to facilitate matching, and therefore must not be modified by the caller.
    • Method Detail

      • estimateIndexSize

        public static long estimateIndexSize​(int sourceLength)
        Estimate the size of an index for a given source.

        This is roughly a worst-case estimate. The actual index may be smaller.

        Parameters:
        sourceLength - length of the source, in bytes.
        Returns:
        estimated size. Approximately 1.75 * sourceLength.
      • getSourceSize

        public long getSourceSize()
        Get size of the source buffer this index has scanned.
        Returns:
        size of the source buffer this index has scanned.
      • getIndexSize

        public long getIndexSize()
        Get an estimate of the memory required by this index.
        Returns:
        an approximation of the number of bytes used by this index in memory. The size includes the cached source buffer size from getSourceSize(), as well as a rough approximation of JVM object overheads.
      • sizeOf

        private static long sizeOf​(byte[] b)
      • sizeOf

        private static long sizeOf​(int[] b)
      • sizeOf

        private static long sizeOf​(long[] b)
      • sizeOfArray

        private static int sizeOfArray​(int entSize,
                                       int len)
      • encode

        public void encode​(java.io.OutputStream out,
                           byte[] res)
                    throws java.io.IOException
        Generate a delta sequence to recreate the result buffer.

        There is no limit on the size of the delta sequence created. This is the same as encode(out, res, 0).

        Parameters:
        out - stream to receive the delta instructions that can transform this index's source buffer into res. This stream should be buffered, as instructions are written directly to it in small bursts.
        res - the desired result buffer. The generated instructions will recreate this buffer when applied to the source buffer stored within this index.
        Throws:
        java.io.IOException - the output stream refused to write the instructions.
      • encode

        public boolean encode​(java.io.OutputStream out,
                              byte[] res,
                              int deltaSizeLimit)
                       throws java.io.IOException
        Generate a delta sequence to recreate the result buffer.
        Parameters:
        out - stream to receive the delta instructions that can transform this index's source buffer into res. This stream should be buffered, as instructions are written directly to it in small bursts. If the caller might need to discard the instructions (such as when deltaSizeLimit would be exceeded) the caller is responsible for discarding or rewinding the stream when this method returns false.
        res - the desired result buffer. The generated instructions will recreate this buffer when applied to the source buffer stored within this index.
        deltaSizeLimit - maximum number of bytes that the delta instructions can occupy. If the generated instructions would be longer than this amount, this method returns false. If 0, there is no limit on the length of delta created.
        Returns:
        true if the delta is smaller than deltaSizeLimit; false if the encoder aborted because the encoded delta instructions would be longer than deltaSizeLimit bytes.
        Throws:
        java.io.IOException - the output stream refused to write the instructions.
      • newEncoder

        private DeltaEncoder newEncoder​(java.io.OutputStream out,
                                        long resSize,
                                        int limit)
                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • fwdmatch

        private static int fwdmatch​(byte[] res,
                                    int resPtr,
                                    byte[] src,
                                    int srcPtr)
      • negmatch

        private static int negmatch​(byte[] res,
                                    int resPtr,
                                    byte[] src,
                                    int srcPtr,
                                    int limit)
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object
      • hashBlock

        static int hashBlock​(byte[] raw,
                             int ptr)
      • step

        private static int step​(int hash,
                                byte toRemove,
                                byte toAdd)
      • keyOf

        private static int keyOf​(long ent)
      • valOf

        private static int valOf​(long ent)