Class S2PointVectorCoder
- java.lang.Object
-
- com.google.common.geometry.S2PointVectorCoder
-
@GwtCompatible class S2PointVectorCoder extends java.lang.Object implements S2Coder<java.util.List<S2Point>>
An encoder/decoder ofList
s.Values from the
List
returned bydecode(Bytes, Cursor)
are decoded only when they are accessed. This allows for very fast initialization and no additional memory use beyond the encoded data. The encoded data is not owned by theList
; typically it points into a large contiguous buffer that contains other encoded data as well.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static class
S2PointVectorCoder.Base
private static class
S2PointVectorCoder.ByteArrayOutput
A thin wrapper overByteArrayOutputStream
which allows the last written byte to be removed.private static class
S2PointVectorCoder.CellPoint
Represents a point that can be encoded as anS2CellId
center.private static class
S2PointVectorCoder.Format
Controls whether to optimize for speed or size when encoding points.private static class
S2PointVectorCoder.MutableBlockCode
Represents the encoding parameters to be used for a given block (consisting ofBLOCK_SIZE
encodable 64-bit values).
-
Field Summary
Fields Modifier and Type Field Description private static int
BLOCK_SHIFT
The left shift factor forBLOCK_SIZE
.private static int
BLOCK_SIZE
S2CellId
s are represented in a special 64-bit format and are encoded in fixed-size blocks.(package private) static S2PointVectorCoder
COMPACT
An instance of aS2PointVectorCoder
which encodes/decodesS2Point
s in the COMPACT encoding format.private static int
ENCODING_FORMAT_BITS
private static byte
ENCODING_FORMAT_MASK
private static long
EXCEPTION
The exception value in the COMPACT encoding format.(package private) static S2PointVectorCoder
FAST
An instance of aS2PointVectorCoder
which encodes/decodesS2Point
s in the FAST encoding format.private static int
FORMAT_COMPACT
The value of the COMPACT encoding format.private static int
FORMAT_FAST
The value of the FAST encoding format.private static int
SIZEOF_S2POINT
The size of an encodedS2Point
in bytes (3 doubles * 8 bytes per double).private S2PointVectorCoder.Format
type
-
Constructor Summary
Constructors Modifier Constructor Description private
S2PointVectorCoder(S2PointVectorCoder.Format type)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description private static int
baseShift(int level, int baseBits)
Returns the number of bits thatbase
should be right-shifted in order to encode only its leadingbaseBits
bits, assuming that all points are encoded at the givenS2CellId
level.private static long
bitMask(int n)
Returns a bit mask withn
low-order 1 bits, for0 <= n <= 64
.private static boolean
canEncode(long dMin, long dMax, int deltaBits, int overlapBits, boolean haveExceptions)
Returns true if the range of values[dMin, dMax]
can be encoded using the specified parameters (deltaBits
,overlapBits
, andhaveExceptions
).private static S2PointVectorCoder.Base
chooseBase(com.google.common.primitives.ImmutableLongArray values, int level, boolean haveExceptions)
Returns the global minimum valueS2PointVectorCoder.Base.base
and the number of bits that should be used to encode it (S2PointVectorCoder.Base.baseBits
).private static int
chooseBestLevel(java.util.List<S2Point> points, java.util.List<S2PointVectorCoder.CellPoint> cellPoints)
Returns theS2CellId
level for which the greatest number of the given points can be represented as the center of anS2CellId
, or -1 if there is no S2CellId that would result in significant space savings.private static com.google.common.primitives.ImmutableLongArray
convertCellsToValues(java.util.List<S2PointVectorCoder.CellPoint> cellPoints, int level)
Given a vector of points inS2PointVectorCoder.CellPoint
format and anS2CellId
level that has been chosen for encoding, returns a vector of 64-bit values that should be encoded in order to represent these points.java.util.List<S2Point>
decode(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)
private static java.util.List<S2Point>
decodeCompact(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)
private static java.util.List<S2Point>
decodeFast(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)
void
encode(java.util.List<S2Point> values, java.io.OutputStream output)
Encodesvalue
tooutput
.private static void
encodeCompact(java.util.List<S2Point> values, java.io.OutputStream output)
Encodes a vector ofS2Point
s, optimizing for space.private static void
encodeFast(java.util.List<S2Point> values, java.io.OutputStream output)
private static void
getBlockCode(S2PointVectorCoder.MutableBlockCode code, com.google.common.primitives.ImmutableLongArray values, long base, boolean haveExceptions)
Given a vector of 64-bit values to be encoded and anS2CellId
level, returns the optimal encoding parameters that should be used to encode each block.private static int
maxBitsForLevel(int level)
Returns the maximum number of bits per value at the givenS2CellId
level.
-
-
-
Field Detail
-
FAST
static final S2PointVectorCoder FAST
An instance of aS2PointVectorCoder
which encodes/decodesS2Point
s in the FAST encoding format. The FAST format is optimized for fast encoding/decoding.
-
COMPACT
static final S2PointVectorCoder COMPACT
An instance of aS2PointVectorCoder
which encodes/decodesS2Point
s in the COMPACT encoding format. The COMPACT format is optimized for disk usage and memory footprint.
-
type
private final S2PointVectorCoder.Format type
-
FORMAT_FAST
private static final int FORMAT_FAST
The value of the FAST encoding format.- See Also:
- Constant Field Values
-
FORMAT_COMPACT
private static final int FORMAT_COMPACT
The value of the COMPACT encoding format.- See Also:
- Constant Field Values
-
ENCODING_FORMAT_BITS
private static final int ENCODING_FORMAT_BITS
- See Also:
- Constant Field Values
-
ENCODING_FORMAT_MASK
private static final byte ENCODING_FORMAT_MASK
- See Also:
- Constant Field Values
-
SIZEOF_S2POINT
private static final int SIZEOF_S2POINT
The size of an encodedS2Point
in bytes (3 doubles * 8 bytes per double).- See Also:
- Constant Field Values
-
BLOCK_SHIFT
private static final int BLOCK_SHIFT
The left shift factor forBLOCK_SIZE
.- See Also:
- Constant Field Values
-
BLOCK_SIZE
private static final int BLOCK_SIZE
S2CellId
s are represented in a special 64-bit format and are encoded in fixed-size blocks.BLOCK_SIZE
represents the number of values per block. Block sizes of 4, 8, 16, and 32 were tested, andBLOCK_SIZE
== 16 seems to offer the best compression. (Note thatBLOCK_SIZE == 32
requires some code modifications which have since been removed).- See Also:
- Constant Field Values
-
EXCEPTION
private static final long EXCEPTION
The exception value in the COMPACT encoding format.
-
-
Constructor Detail
-
S2PointVectorCoder
private S2PointVectorCoder(S2PointVectorCoder.Format type)
-
-
Method Detail
-
encode
public void encode(java.util.List<S2Point> values, java.io.OutputStream output) throws java.io.IOException
Description copied from interface:S2Coder
Encodesvalue
tooutput
.
-
decode
public java.util.List<S2Point> decode(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)
Description copied from interface:S2Coder
Decodes a value of typeS2Coder
fromdata
starting atcursor.position
.cursor.position
is updated to the position of the first byte indata
following the encoded value.
-
encodeFast
private static void encodeFast(java.util.List<S2Point> values, java.io.OutputStream output) throws java.io.IOException
- Throws:
java.io.IOException
-
decodeFast
private static java.util.List<S2Point> decodeFast(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)
-
encodeCompact
private static void encodeCompact(java.util.List<S2Point> values, java.io.OutputStream output) throws java.io.IOException
Encodes a vector ofS2Point
s, optimizing for space.OVERVIEW
We attempt to represent each S2Point as the center of an S2CellId. All S2CellIds must be at the same level. Any points that cannot be encoded exactly as S2CellId centers are stored as exceptions using 24 bytes each. If there are so many exceptions that the COMPACT encoding does not save significant space, we give up and use the uncompressed encoding.
The first step is to choose the best S2CellId level. This requires converting each point to (face, si, ti) coordinates and checking whether the point can be represented exactly as an S2CellId center at some level. We then build a histogram of S2CellId levels (just like the similar code in S2Polygon.encode) and choose the best level (or give up, if there are not enough S2CellId-encodable points).
The simplest approach would then be to take all the S2CellIds and right-shift them to remove all the constant bits at the chosen level. This would give the best spatial locality and hence the smallest deltas. However instead we give up some spatial locality and use the similar but faster transformation described below.
Each encodable point is first converted to the (sj, tj) representation defined below:
These two values encode the (face, si, ti) tuple using (2 * level + 3) bits. To see this, recall that "si" and "ti" are 31-bit values that all share a common suffix consisting of a "1" bit followed by (30 - level) "0" bits. The code above right-shifts these values to remove the constant bits and then prepends the bits for "face", yielding a total of (level + 2) bits for "sj" and (level + 1) bits for "tj".sj = (((face & 3) << 30) | (si >> 1)) >> (30 - level); tj = (((face & 4) << 29) | ti) >> (31 - level);
We then combine (sj, tj) into one 64-bit value by interleaving bit pairs:
(We could also interleave individual bits, but it is faster this way). The result is similar to right-shifting an S2CellId by (61 - 2 * level), except that it is faster to decode and the spatial locality is not quite as good.v = interleaveBitPairs(sj, tj);
The 64-bit values are divided into blocks of size 8, and then each value is encoded as the sum of a base value, a per-block offset, and a per-value delta within that block:
v[i,j] = base + offset[i] + delta[i, j]
where "i" represents a block and "j" represents an entry in that block.
The deltas for each block are encoded using a fixed number of 4-bit nibbles (1-16 nibbles per delta). This allows any delta to be accessed in constant time.
The "offset" for each block is a 64-bit value encoded in 0-8 bytes. The offset is left-shifted such that it overlaps the deltas by a configurable number of bits (either 0 or 4), called the "overlap". The overlap and offset length (0-8 bytes) are specified per block. The reason for the overlap is that it allows fewer delta bits to be used in some cases. For example if base == 0 and the range within a block is 0xf0 to 0x110, then rather than using 12-bits deltas with an offset of 0, the overlap lets us use 8-bits deltas with an offset of 0xf0 (saving 7 bytes per block).
The global minimum value "base" is encoded using 0-7 bytes starting with the most-significant non-zero bit possible for the chosen level. For example, if (level == 7) then the encoded values have at most 17 bits, so if "base" is encoded in 1 byte then it is shifted to occupy bits 9-16.
Example: at level == 15, there are at most 33 non-zero value bits. The following shows the bit positions covered by "base", "offset", and "delta" assuming that "base" and "offset" are encoded in 2 bytes each, deltas are encoded in 2 nibbles (1 byte) each, and "overlap" is 4 bits:
Base: 1111111100000000----------------- Offset: -------------1111111100000000---- Delta: -------------------------00000000 Overlap: ^^^^
The numbers (0 or 1) in this diagram denote the byte number of the encoded value. Notice that "base" is shifted so that it starts at the leftmost possible bit, "delta" always starts at the rightmost possible bit (bit 0), and "offset" is shifted so that it overlaps "delta" by the chosen "overlap" (either 0 or 4 bits). Also note that all of these values are summed, and therefore each value can affect higher-order bits due to carries.
NOTE(user): Encoding deltas in 4-bit rather than 8-bit length increments reduces encoded sizes by about 7%. Allowing a 4-bit overlap between the offset and deltas reduces encoded sizes by about 1%. Both optimizations make the code more complex but don't affect running times significantly.
ENCODING DETAILS
Now we can move on to the actual encodings. First, there is a 2 byte header encoded as follows:
Byte 0, bits 0-2: encodingFormat (COMPACT) Byte 0, bit 3: haveExceptions Byte 0, bits 4-7: (lastBlockSize - 1) Byte 1, bits 0-2: baseBytes Byte 1, bits 3-7: level (0-30)
This is followed by an EncodedStringVector containing the encoded blocks. Each block contains BLOCK_SIZE (8) values. The total size of the EncodedS2PointVector is not stored explicitly, but instead is calculated as
num_values == BLOCK_SIZE * (numBlocks - 1) + lastBlockSize
(An empty vector has numBlocks == 0 and lastBlockSize == BLOCK_SIZE.)
Each block starts with a 1 byte header containing the following:
Byte 0, bits 0-2: (offsetBytes - overlapNibbles) Byte 0, bit 3: overlapNibbles Byte 0, bits 4-7: (deltaNibbles - 1)
"overlapNibbles" is either 0 or 1 (indicating an overlap of 0 or 4 bits), while "offsetBytes" is in the range 0-8 (indicating the number of bytes used to encode the offset for this block). Note that some combinations cannot be encoded: in particular, offsetBytes == 0 can only be encoded with an overlap of 0 bits, and offsetBytes == 8 can only be encoded with an overlap of 4 bits. This allows us to encode offset lengths of 0-8 rather than just 0-7 without using an extra bit. (Note that the combinations that can't be encoded are not useful anyway).
The header is followed by "offsetBytes" bytes for the offset, and then (4 * deltaNibbles) bytes for the deltas.
If there are any points that could not be represented as S2CellIds, then "haveExceptions" in the header is true. In that case the delta values within each block are encoded as (delta + 8), and values 0-7 are used to represent exceptions. If a block has exceptions, they are encoded immediately following the array of deltas, and are referenced by encoding the corresponding exception index (0-7) as the delta.
TODO(user): A vector containing a single leaf cell is currently encoded as 13 bytes (2 byte header, 7 byte base, 1 byte block count, 1 byte block length, 1 byte block header, 1 byte delta). However if this case occurs often, a better solution would be implement a separate format that encodes the leading k bytes of an S2CellId. It would have a one-byte header consisting of the encoding format (3 bits) and the number of bytes encoded (3 bits), followed by the S2CellId bytes. The extra 2 header bits could be used to store single points using other encodings, e.g. E7.
If we wind up using 8-value blocks, we could also use the extra bit in the first byte of the header to indicate that there is only one value, and then skip the 2nd byte of header and the EncodedStringVector. But this would be messy because it also requires special cases while decoding. Essentially this would be a sub-format within the COMPACT format.
- Throws:
java.io.IOException
-
decodeCompact
private static java.util.List<S2Point> decodeCompact(PrimitiveArrays.Bytes data, PrimitiveArrays.Cursor cursor)
-
bitMask
private static long bitMask(int n)
Returns a bit mask withn
low-order 1 bits, for0 <= n <= 64
.
-
maxBitsForLevel
private static int maxBitsForLevel(int level)
Returns the maximum number of bits per value at the givenS2CellId
level.
-
baseShift
private static int baseShift(int level, int baseBits)
Returns the number of bits thatbase
should be right-shifted in order to encode only its leadingbaseBits
bits, assuming that all points are encoded at the givenS2CellId
level.
-
chooseBestLevel
private static int chooseBestLevel(java.util.List<S2Point> points, java.util.List<S2PointVectorCoder.CellPoint> cellPoints)
Returns theS2CellId
level for which the greatest number of the given points can be represented as the center of anS2CellId
, or -1 if there is no S2CellId that would result in significant space savings.Adds the
S2CellId
representation of each point (if any) tocellPoints
.
-
convertCellsToValues
private static com.google.common.primitives.ImmutableLongArray convertCellsToValues(java.util.List<S2PointVectorCoder.CellPoint> cellPoints, int level)
Given a vector of points inS2PointVectorCoder.CellPoint
format and anS2CellId
level that has been chosen for encoding, returns a vector of 64-bit values that should be encoded in order to represent these points. Points that cannot be represented losslessly as the center of anS2CellId
at the chosen level are indicated by the valueEXCEPTION
.
-
chooseBase
private static S2PointVectorCoder.Base chooseBase(com.google.common.primitives.ImmutableLongArray values, int level, boolean haveExceptions)
Returns the global minimum valueS2PointVectorCoder.Base.base
and the number of bits that should be used to encode it (S2PointVectorCoder.Base.baseBits
).
-
canEncode
private static boolean canEncode(long dMin, long dMax, int deltaBits, int overlapBits, boolean haveExceptions)
Returns true if the range of values[dMin, dMax]
can be encoded using the specified parameters (deltaBits
,overlapBits
, andhaveExceptions
).
-
getBlockCode
private static void getBlockCode(S2PointVectorCoder.MutableBlockCode code, com.google.common.primitives.ImmutableLongArray values, long base, boolean haveExceptions)
Given a vector of 64-bit values to be encoded and anS2CellId
level, returns the optimal encoding parameters that should be used to encode each block.
-
-