Class UnicodeCompressor
SCSU (the Standard Compression Scheme for Unicode) works by using dynamically positioned windows, each consisting of 128 consecutive Unicode characters. During compression, characters within a window are encoded in the compressed stream as the bytes 0x80 - 0xFF, one byte value per window position. SCSU provides transparency for the characters (bytes) between U+0000 - U+00FF. SCSU approximates the storage size of traditional character sets, for example 1 byte per character for ASCII or Latin-1 text, and 2 bytes per character for CJK ideographs.
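The window mapping can be illustrated with a small sketch. It assumes, following UTS #6, that position 0 of a window maps to byte 0x80; the class and method names are hypothetical and are not part of UnicodeCompressor, and the window position U+0400 is chosen only for illustration.

```java
/** Illustrative helper only; not part of UnicodeCompressor. */
public class ScsuWindowSketch {
    /** Number of consecutive characters covered by one window. */
    public static final int WINDOW_SIZE = 128;

    /** Returns true if ch lies inside the window that starts at windowOffset. */
    public static boolean inWindow(int ch, int windowOffset) {
        return ch >= windowOffset && ch < windowOffset + WINDOW_SIZE;
    }

    /** Returns the byte value (0x80..0xFF) for a character inside the window. */
    public static int encodeInWindow(int ch, int windowOffset) {
        if (!inWindow(ch, windowOffset)) {
            throw new IllegalArgumentException("character is outside the window");
        }
        return 0x80 + (ch - windowOffset);
    }

    public static void main(String[] args) {
        int windowOffset = 0x0400; // hypothetical window position (Cyrillic block)
        // U+0414 sits at position 0x14 of the window, so it becomes byte 0x94.
        System.out.printf("0x%02X%n", encodeInWindow('\u0414', windowOffset)); // prints 0x94
    }
}
```

With a window positioned at U+0080, the same arithmetic maps U+00E9 to the byte 0xE9, which is one way to see why Latin-1 text passes through unchanged.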
USAGE
The static methods on UnicodeCompressor may be used in a straightforward manner to compress simple strings:
    String s = ... ; // get string from somewhere
    byte[] compressed = UnicodeCompressor.compress(s);
The static methods have a fairly large memory footprint. For finer-grained control over memory usage, UnicodeCompressor offers more powerful APIs allowing iterative compression:
    // Compress an array "chars" of length "len" using a buffer of 512 bytes
    // to the OutputStream "out"
    UnicodeCompressor myCompressor = new UnicodeCompressor();
    final int BUFSIZE = 512; // local constant; "static" is not allowed on a local variable
    byte[] byteBuffer = new byte[BUFSIZE];
    int bytesWritten = 0;
    int[] unicharsRead = new int[1];
    int totalCharsCompressed = 0;
    int totalBytesWritten = 0;
    do {
        // do the compression
        bytesWritten = myCompressor.compress(chars, totalCharsCompressed, len,
                                             unicharsRead, byteBuffer, 0, BUFSIZE);
        // do something with the current set of bytes
        out.write(byteBuffer, 0, bytesWritten);
        // update the no. of characters compressed
        totalCharsCompressed += unicharsRead[0];
        // update the no. of bytes written
        totalBytesWritten += bytesWritten;
    } while (totalCharsCompressed < len);
    myCompressor.reset(); // reuse compressor
Author: Stephen F. Booth
Field Summary
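Fields (modifier and type, field, description):
static final int ARMENIANINDEX
static final int COMPRESSIONOFFSET
static final int GREEKINDEX
static final int HALFWIDTHKATAKANAINDEX
static final int HIRAGANAINDEX
static final int INVALIDCHAR
static final int INVALIDWINDOW
static final int IPAEXTENSIONINDEX
static final int KATAKANAINDEX
static final int LATININDEX
static final int MAXINDEX
static final int NUMSTATICWINDOWS
static final int NUMWINDOWS
static final int RESERVEDINDEX
static final int SCHANGE0
static final int SCHANGE1
static final int SCHANGE2
static final int SCHANGE3
static final int SCHANGE4
static final int SCHANGE5
static final int SCHANGE6
static final int SCHANGE7
static final int SCHANGEU
static final int SDEFINE0
static final int SDEFINE1
static final int SDEFINE2
static final int SDEFINE3
static final int SDEFINE4
static final int SDEFINE5
static final int SDEFINE6
static final int SDEFINE7
static final int SDEFINEX
static final int SINGLEBYTEMODE
static final int[] sOffsets: Static compression window offsets
static final int[] sOffsetTable: For window offset mapping
static final int SQUOTE0
static final int SQUOTE1
static final int SQUOTE2
static final int SQUOTE3
static final int SQUOTE4
static final int SQUOTE5
static final int SQUOTE6
static final int SQUOTE7
static final int SQUOTEU
static final int SRESERVED
static final int UCHANGE0
static final int UCHANGE1
static final int UCHANGE2
static final int UCHANGE3
static final int UCHANGE4
static final int UCHANGE5
static final int UCHANGE6
static final int UCHANGE7
static final int UDEFINE0
static final int UDEFINE1
static final int UDEFINE2
static final int UDEFINE3
static final int UDEFINE4
static final int UDEFINE5
static final int UDEFINE6
static final int UDEFINE7
static final int UDEFINEX
static final int UNICODEMODE
static final int UQUOTEU
static final int URESERVED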
Constructor Summary
Constructors:
UnicodeCompressor(): Create a UnicodeCompressor.
Method Summary
Methods (modifier and type, method, description):
static byte[] compress(char[] buffer, int start, int limit): Compress a Unicode character array into a byte array.
int compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit): Compress a Unicode character array into a byte array.
static byte[] compress(String buffer): Compress a string into a byte array.
void reset(): Reset the compressor to its initial state.
Field Details
COMPRESSIONOFFSET
static final int COMPRESSIONOFFSET
NUMWINDOWS
static final int NUMWINDOWS
NUMSTATICWINDOWS
static final int NUMSTATICWINDOWS
INVALIDWINDOW
static final int INVALIDWINDOW
INVALIDCHAR
static final int INVALIDCHAR
SINGLEBYTEMODE
static final int SINGLEBYTEMODE
UNICODEMODE
static final int UNICODEMODE
MAXINDEX
static final int MAXINDEX
RESERVEDINDEX
static final int RESERVEDINDEX
LATININDEX
static final int LATININDEX
IPAEXTENSIONINDEX
static final int IPAEXTENSIONINDEX
GREEKINDEX
static final int GREEKINDEX
ARMENIANINDEX
static final int ARMENIANINDEX
HIRAGANAINDEX
static final int HIRAGANAINDEX
KATAKANAINDEX
static final int KATAKANAINDEX
HALFWIDTHKATAKANAINDEX
static final int HALFWIDTHKATAKANAINDEX
SDEFINEX
static final int SDEFINEX
SRESERVED
static final int SRESERVED
SQUOTEU
static final int SQUOTEU
SCHANGEU
static final int SCHANGEU
SQUOTE0
static final int SQUOTE0
SQUOTE1
static final int SQUOTE1
SQUOTE2
static final int SQUOTE2
SQUOTE3
static final int SQUOTE3
SQUOTE4
static final int SQUOTE4
SQUOTE5
static final int SQUOTE5
SQUOTE6
static final int SQUOTE6
SQUOTE7
static final int SQUOTE7
SCHANGE0
static final int SCHANGE0
SCHANGE1
static final int SCHANGE1
SCHANGE2
static final int SCHANGE2
SCHANGE3
static final int SCHANGE3
SCHANGE4
static final int SCHANGE4
SCHANGE5
static final int SCHANGE5
SCHANGE6
static final int SCHANGE6
SCHANGE7
static final int SCHANGE7
SDEFINE0
static final int SDEFINE0
SDEFINE1
static final int SDEFINE1
SDEFINE2
static final int SDEFINE2
SDEFINE3
static final int SDEFINE3
SDEFINE4
static final int SDEFINE4
SDEFINE5
static final int SDEFINE5
SDEFINE6
static final int SDEFINE6
SDEFINE7
static final int SDEFINE7
UCHANGE0
static final int UCHANGE0
UCHANGE1
static final int UCHANGE1
UCHANGE2
static final int UCHANGE2
UCHANGE3
static final int UCHANGE3
UCHANGE4
static final int UCHANGE4
UCHANGE5
static final int UCHANGE5
UCHANGE6
static final int UCHANGE6
UCHANGE7
static final int UCHANGE7
UDEFINE0
static final int UDEFINE0
UDEFINE1
static final int UDEFINE1
UDEFINE2
static final int UDEFINE2
UDEFINE3
static final int UDEFINE3
UDEFINE4
static final int UDEFINE4
UDEFINE5
static final int UDEFINE5
UDEFINE6
static final int UDEFINE6
UDEFINE7
static final int UDEFINE7
UQUOTEU
static final int UQUOTEU
UDEFINEX
static final int UDEFINEX
URESERVED
static final int URESERVED
sOffsetTable
static final int[] sOffsetTable
For window offset mapping
sOffsets
static final int[] sOffsets
Static compression window offsets
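The numbered tag fields (SQUOTE0 to SQUOTE7, SCHANGE0 to SCHANGE7, and so on) come in families of eight, one tag per window. As a rough sketch of how such a family can be indexed, the block below assumes the base values listed in Unicode Technical Standard #6; those values are not stated on this page and are only assumed to match the fields of the same name. The class ScsuTagSketch and the method tagFor are hypothetical.

```java
/** Illustrative sketch only; values are taken from UTS #6, not from this page. */
public class ScsuTagSketch {
    // Assumed base values of each tag family (first tag of the family).
    public static final int SQUOTE0 = 0x01;  // quote from window 0 (single-byte mode)
    public static final int SCHANGE0 = 0x10; // change to window 0 (single-byte mode)
    public static final int SDEFINE0 = 0x18; // define window 0 (single-byte mode)
    public static final int UCHANGE0 = 0xE0; // change to window 0 (Unicode mode)
    public static final int UDEFINE0 = 0xE8; // define window 0 (Unicode mode)

    /** Number of windows addressed by each family. */
    public static final int NUMWINDOWS = 8;

    /** Returns the tag byte for the given family base and window number. */
    public static int tagFor(int familyBase, int window) {
        if (window < 0 || window >= NUMWINDOWS) {
            throw new IllegalArgumentException("window must be in 0.." + (NUMWINDOWS - 1));
        }
        // Tags within a family are consecutive, so the window number is an offset.
        return familyBase + window;
    }
}
```

Under these assumptions, `tagFor(SCHANGE0, 3)` yields 0x13, the tag that switches to window 3 in single-byte mode.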
Constructor Details
UnicodeCompressor
public UnicodeCompressor()
Create a UnicodeCompressor. Sets all windows to their default values.
Method Details
compress
public static byte[] compress(String buffer)
Compress a string into a byte array.
Parameters:
buffer - The string to compress.
Returns:
A byte array containing the compressed characters.
compress
public static byte[] compress(char[] buffer, int start, int limit)
Compress a Unicode character array into a byte array.
Parameters:
buffer - The character buffer to compress.
start - The start of the character run to compress.
limit - The limit of the character run to compress.
Returns:
A byte array containing the compressed characters.
compress
public int compress(char[] charBuffer, int charBufferStart, int charBufferLimit, int[] charsRead, byte[] byteBuffer, int byteBufferStart, int byteBufferLimit)
Compress a Unicode character array into a byte array. This function will only consume input that can be completely output.
Parameters:
charBuffer - The character buffer to compress.
charBufferStart - The start of the character run to compress.
charBufferLimit - The limit of the character run to compress.
charsRead - A one-element array. If not null, on return the number of characters read from charBuffer.
byteBuffer - A buffer to receive the compressed data. This buffer must be at minimum four bytes in size.
byteBufferStart - The starting offset to which to write compressed data.
byteBufferLimit - The limiting offset for writing compressed data.
Returns:
The number of bytes written to byteBuffer.
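The rule that input is only consumed when it can be completely output is easiest to see in a toy encoder with the same parameter shape. The sketch below is a hypothetical stand-in, not the SCSU algorithm and not ICU code: it writes ASCII characters as one byte and every other character as three bytes (a 0xFF marker plus the UTF-16 code unit), and stops before any character whose full encoding would not fit.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in encoder; it does NOT implement SCSU. */
public class ChunkedEncodeSketch {
    /**
     * Encodes chars[charStart..charLimit) into bytes[byteStart..byteLimit).
     * A character is consumed only if its whole encoding fits in the output.
     * Returns the number of bytes written; charsRead[0] (if charsRead is
     * not null) receives the number of characters consumed.
     */
    public static int encode(char[] chars, int charStart, int charLimit,
                             int[] charsRead,
                             byte[] bytes, int byteStart, int byteLimit) {
        int in = charStart;
        int out = byteStart;
        while (in < charLimit) {
            char c = chars[in];
            int needed = (c < 0x80) ? 1 : 3;
            if (out + needed > byteLimit) {
                break; // no room for the full encoding: leave c for the next call
            }
            if (needed == 1) {
                bytes[out++] = (byte) c;
            } else {
                bytes[out++] = (byte) 0xFF;      // toy marker byte
                bytes[out++] = (byte) (c >>> 8); // high byte of the code unit
                bytes[out++] = (byte) c;         // low byte of the code unit
            }
            in++;
        }
        if (charsRead != null) {
            charsRead[0] = in - charStart;
        }
        return out - byteStart;
    }

    /** Driver loop that drains the input through a small reusable buffer. */
    public static byte[] encodeAll(char[] chars, int bufSize) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufSize];
        int[] charsRead = new int[1];
        int total = 0;
        while (total < chars.length) {
            int written = encode(chars, total, chars.length, charsRead, buffer, 0, bufSize);
            if (charsRead[0] == 0) {
                throw new IllegalArgumentException("buffer too small to make progress");
            }
            sink.write(buffer, 0, written);
            total += charsRead[0];
        }
        return sink.toByteArray();
    }
}
```

Because a call may stop short of charBufferLimit, the caller has to advance by the reported character count rather than assume the whole run was consumed. The progress check in encodeAll also suggests why a minimum output buffer size is worth documenting.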
reset
public void reset()
Reset the compressor to its initial state.