Package org.apache.uima.cas.impl
Class BinaryCasSerDes4.Serializer
java.lang.Object
org.apache.uima.cas.impl.BinaryCasSerDes4.Serializer
- Enclosing class:
BinaryCasSerDes4
This class is instantiated once per serialization. Multiple serializations in parallel are supported, each with its own instance of this class.
-
Field Summary
Fields
Modifier and Type  Field  Description
private final ByteArrayOutputStream[]  baosZipSources
private final CASImpl  baseCas
private final BinaryCasSerDes  bcsd
private final DataOutputStream  byte_dos
private final BinaryCasSerDes4.CompressLevel  compressLevel
private final BinaryCasSerDes4.CompressStrat  compressStrategy
private final DataOutputStream  control_dos
private final CommonSerDesSequential  csds
private final boolean  doMeasurement
private final DataOutputStream[]  dosZipSources
private final DataOutputStream  double_Exponent_dos
private final DataOutputStream  double_Mantissa_Sign_dos
private final DataOutputStream  float_Exponent_dos
private final DataOutputStream  float_Mantissa_Sign_dos
private final Obj2IntIdentityHashMap<TOP>  fs2seq
    Converts between FSs and "sequential" numbers. This is for compression efficiency, and is also needed for backwards compatibility with v2 serialization forms, where index information was written using "sequential" numbers. Note: this may be an identity map, but may not be in V3, where some FSs are GC'd. Contrast with fs2addr and addr2fs in csds, which use the pseudo v2 addresses as the int.
private final DataOutputStream  fsIndexes_dos
private int  heapEnd
    End of heap, in v2 pseudo-addr coordinates = addr of last + length of last.
private int  heapStart
    Start of heap, in v2 pseudo-addr coordinates.
private final boolean  isDelta
private final boolean  isTsi
private final MarkerImpl  mark
private boolean  only1CommonString
private final OptimizeStrings  os
private TOP  prevFs
private final TOP[]  prevFsByType
    For differencing when reading and writing.
private final DataOutputStream  serializedOut
private final SerializationMeasures  sm
private final DataOutputStream  strLength_dos
private final DataOutputStream  strOffset_dos
private final DataOutputStream  strSeg_dos
private final DataOutputStream  typeCode_dos
private PositiveIntSet  uimaSerializableSavedToCas
    Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
-
Constructor Summary
Constructors
Modifier  Constructor  Description
private  Serializer(CASImpl cas, DataOutputStream serializedOut, MarkerImpl mark, SerializationMeasures sm, BinaryCasSerDes4.CompressLevel compressLevel, BinaryCasSerDes4.CompressStrat compressStrategy, boolean isTsi)
-
Method Summary
Modifier and Type  Method  Description
private void  collectAndZip
    Method: write with deflation into a single byte array stream; skip if not worth deflating; skip the Slot_Control stream; record in the Slot_Control stream, for each deflated stream: the Slot index, the number of compressed bytes, the number of uncompressed bytes; add to header: nbr of compressed entries, the Slot_Control stream size, the Slot_Control stream, all the zipped streams.
private int  compressFsxPart(int[] fsIndexes, int fsNdxStart, CommonSerDesSequential csds)
private int  encodeIntSign(int v)
private void  extractStrings(TOP fs)
    Adds strings to the OptimizeStrings object. If delta, only processes FSs that are new; modified string values are picked up when scanning FsChange items.
private void  extractStringsFromModifications
    For delta, for each fsChange element, extracts any strings.
private int  fs2seq
private int  getPrevArray0HeapRef()
private int  getPrevArray0Int()
private boolean  isNoPrevArrayValue(CommonArrayFS prevCommonArray)
private void  serialize
    Form 4 serialization is tied to the layout of V2 Feature Structures in heaps.
private void  serializeArray(TOP fs)
private int  serializeArrayLength
private void  serializeByKind(TOP fs, FeatureImpl feat)
private void  serializeIndexedFeatureStructures
private void  writeDiff(int kind, int v, int prev)
    Encoding: bit 6 = sign: 1 = negative; bit 7 = delta: 1 = delta.
private void  writeDouble(long raw)
private void  writeFloat(int raw)
    Need to support NaN sets, 0x7fc....
private void  writeFs
private void  writeLong(long v, long prev)
private void  writeString
    String encoding: Length = 0 - used for null, no offset written; Length = 1 - used for "", no offset written; Length > 0 (subtract 1): used for actual string length; Length < 0 - use (-length) as slot index (minimum is 1, slot 0 is NULL). For length > 0, write also the offset.
private void  writeStringInfo
    Write the compressed string table(s).
private void  writeUnsignedByte(DataOutputStream s, int v)
private void  writeVnumber(int kind, int v)
private void  writeVnumber(int kind, long v)
private void  writeVnumber(DataOutputStream s, int v)
private void  writeVnumber(DataOutputStream s, long v)
-
Field Details
-
serializedOut
-
baseCas
-
bcsd
-
mark
-
sm
-
baosZipSources
-
dosZipSources
-
heapStart
private int heapStart
Start of heap, in v2 pseudo-addr coordinates.
-
heapEnd
private int heapEnd
End of heap, in v2 pseudo-addr coordinates = addr of last + length of last.
-
isDelta
private final boolean isDelta
-
isTsi
private final boolean isTsi
-
doMeasurement
private final boolean doMeasurement
-
os
-
compressLevel
-
compressStrategy
-
prevFsByType
For differencing when reading and writing. Also used for arrays to difference the 0th element.
-
prevFs
-
only1CommonString
private boolean only1CommonString
-
byte_dos
-
typeCode_dos
-
strOffset_dos
-
strLength_dos
-
float_Mantissa_Sign_dos
-
float_Exponent_dos
-
double_Mantissa_Sign_dos
-
double_Exponent_dos
-
fsIndexes_dos
-
control_dos
-
strSeg_dos
-
csds
-
fs2seq
Converts between FSs and "sequential" numbers. This is for compression efficiency, and is also needed for backwards compatibility with v2 serialization forms, where index information was written using "sequential" numbers.
Note: this may be an identity map, but may not be in V3, where some FSs are GC'd.
Contrast with fs2addr and addr2fs in csds, which use the pseudo v2 addresses as the int.
-
uimaSerializableSavedToCas
Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
-
-
Constructor Details
-
Serializer
private Serializer(CASImpl cas, DataOutputStream serializedOut, MarkerImpl mark, SerializationMeasures sm, BinaryCasSerDes4.CompressLevel compressLevel, BinaryCasSerDes4.CompressStrat compressStrategy, boolean isTsi)
- Parameters:
cas -
serializedOut -
mark -
sm -
compressLevel -
compressStrategy -
-
-
Method Details
-
serialize
Form 4 serialization is tied to the layout of V2 Feature Structures in heaps. It does not walk the indexes to serialize only those FSs that are reachable. For V3, it scans the CASImpl.id2fs information and serializes those FSs (except those which have been GC'd). The sequential numbers assigned in the target will differ from the source ids if some FSs were GC'd. To determine for delta what new strings and new
- Throws:
IOException
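The renumbering described for serialize can be illustrated with a small, self-contained sketch. It is not this class's code: the names Fs2SeqSketch and assignSequentialNumbers, the use of plain Object in place of TOP, and the choice to start numbering at 1 are all assumptions made for illustration. It only shows why sequential numbers diverge from source ids once some entries of an id-indexed table have been collected.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class Fs2SeqSketch {

  /**
   * Hypothetical helper: walks an id-indexed table in which collected
   * (GC'd) entries are null and gives each surviving entry the next
   * sequential number. Object identity, not equals(), is the map key.
   */
  static Map<Object, Integer> assignSequentialNumbers(Object[] id2fs) {
    Map<Object, Integer> fs2seq = new IdentityHashMap<>();
    int nextSeq = 1; // assumption: numbering starts at 1
    for (Object fs : id2fs) {
      if (fs == null) {
        continue; // this id was collected, so it consumes no sequential number
      }
      fs2seq.put(fs, nextSeq++);
    }
    return fs2seq;
  }
}
```

With a table such as {null, a, null, b}, the object stored at id 3 receives sequential number 2, so the mapping is no longer the identity.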
-
writeStringInfo
Write the compressed string table(s).
- Throws:
IOException
-
writeFs
- Throws:
IOException
-
serializeIndexedFeatureStructures
- Throws:
IOException
-
compressFsxPart
private int compressFsxPart(int[] fsIndexes, int fsNdxStart, CommonSerDesSequential csds) throws IOException
- Throws:
IOException
-
serializeArray
- Throws:
IOException
-
getPrevArray0HeapRef
private int getPrevArray0HeapRef()
-
getPrevArray0Int
private int getPrevArray0Int()
-
isNoPrevArrayValue
-
serializeByKind
- Throws:
IOException
-
serializeArrayLength
- Throws:
IOException
-
collectAndZip
Method:
- write with deflation into a single byte array stream
  - skip if not worth deflating
  - skip the Slot_Control stream
  - record in the Slot_Control stream, for each deflated stream:
    - the Slot index
    - the number of compressed bytes
    - the number of uncompressed bytes
- add to header:
  - nbr of compressed entries
  - the Slot_Control stream size
  - the Slot_Control stream
  - all the zipped streams
- Throws:
IOException - passthru
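The layout described for collectAndZip can be sketched with the standard java.util.zip.Deflater. This is an illustration under stated assumptions, not this class's implementation: the class name CollectAndZipSketch, the use of 4-byte ints for every control value, the "skip only empty streams" rule, and the raw-deflate setting are all hypothetical choices.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;

final class CollectAndZipSketch {

  /** Deflates one buffer and returns the compressed bytes. */
  static byte[] deflate(byte[] data) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED, true); // raw deflate, no zlib header
    deflater.setInput(data);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      out.write(buf, 0, n);
    }
    deflater.end();
    return out.toByteArray();
  }

  /**
   * Hypothetical layout: entry count, control size, control stream
   * (slot index, compressed size, uncompressed size per entry),
   * then all compressed streams back to back.
   */
  static byte[] collectAndZip(byte[][] slots) {
    try {
      ByteArrayOutputStream controlBytes = new ByteArrayOutputStream();
      DataOutputStream control = new DataOutputStream(controlBytes);
      ByteArrayOutputStream zipped = new ByteArrayOutputStream();
      int entries = 0;
      for (int i = 0; i < slots.length; i++) {
        if (slots[i].length == 0) {
          continue; // assumption: an empty stream is not worth deflating
        }
        byte[] compressed = deflate(slots[i]);
        control.writeInt(i);                 // the slot index
        control.writeInt(compressed.length); // number of compressed bytes
        control.writeInt(slots[i].length);   // number of uncompressed bytes
        zipped.write(compressed);
        entries++;
      }
      control.flush();

      ByteArrayOutputStream resultBytes = new ByteArrayOutputStream();
      DataOutputStream result = new DataOutputStream(resultBytes);
      result.writeInt(entries);             // header: number of compressed entries
      result.writeInt(controlBytes.size()); // header: control stream size
      controlBytes.writeTo(result);         // the control stream itself
      zipped.writeTo(result);               // all the zipped streams
      result.flush();
      return resultBytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // in-memory streams should not fail
    }
  }
}
```

Keeping the sizes in a separate control stream lets a reader locate and inflate each slot without scanning the compressed data.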
-
writeLong
- Throws:
IOException
-
writeString
String encoding
Length = 0 - used for null, no offset written
Length = 1 - used for "", no offset written
Length > 0 (subtract 1): used for actual string length
Length < 0 - use (-length) as slot index (minimum is 1, slot 0 is NULL)
For length > 0, write also the offset.
- Throws:
IOException - passthru
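The length rules listed for writeString can be read as the following small sketch. It is one interpretation of the rules, not this class's code: the class StringLengthEncodingSketch, the encode helper, and its slotIndex and offset parameters are hypothetical, and the sketch returns the values that would be written instead of writing to the length and offset streams. It also assumes that an offset accompanies only non-empty strings written in full.

```java
import java.util.ArrayList;
import java.util.List;

final class StringLengthEncodingSketch {

  /**
   * Hypothetical helper returning the values emitted for one string.
   *
   * @param s         the string, or null
   * @param slotIndex index of an already known string (>= 1), or 0 if none
   * @param offset    where the characters of s start in the string segment
   */
  static List<Integer> encode(String s, int slotIndex, int offset) {
    List<Integer> out = new ArrayList<>();
    if (s == null) {
      out.add(0);                // length 0 means null, no offset
    } else if (slotIndex >= 1) {
      out.add(-slotIndex);       // negative length refers to a slot, no offset
    } else if (s.isEmpty()) {
      out.add(1);                // length 1 means "", no offset
    } else {
      out.add(s.length() + 1);   // reader subtracts 1 to get the real length
      out.add(offset);           // offset follows for non-empty strings
    }
    return out;
  }
}
```

Under these assumptions, encode("abc", 0, 7) yields the pair 4 and 7, while encode("abc", 3, 7) yields the single value -3.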
-
writeFloat
Need to support NaN sets:
0x7fc.... for NaN
0xff8.... for NaN, negative infinity
0x7f8 for NaN, positive infinity
Because 0 occurs frequently, we reserve exp of 0 for the value 0.
- Parameters:
raw - the number to write
- Throws:
IOException
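The idea of reserving exponent code 0 for the value 0 while still round-tripping NaN and infinity bit patterns can be sketched as follows. The exact bit layout used by writeFloat is not given here, so the layout below (exponent plus 1, and the sign stored in the low bit of the mantissa value) is an assumption, as are the names FloatSplitSketch, split, and join.

```java
final class FloatSplitSketch {

  /**
   * Splits raw IEEE 754 single-precision bits into
   * {exponentCode, mantissaAndSign}. Code 0 is reserved for raw == 0;
   * every other value uses (biased exponent + 1), giving 1..256.
   */
  static int[] split(int raw) {
    if (raw == 0) {
      return new int[] {0, 0};   // frequent case: only the exponent code is needed
    }
    int exponentCode = ((raw >>> 23) & 0xff) + 1;
    int mantissa = raw & 0x007fffff;
    int sign = raw >>> 31;       // 1 for negative values, including -0.0f
    return new int[] {exponentCode, (mantissa << 1) | sign};
  }

  /** Inverse of split: rebuilds the raw bits. */
  static int join(int exponentCode, int mantissaAndSign) {
    if (exponentCode == 0) {
      return 0;
    }
    int sign = mantissaAndSign & 1;
    int mantissa = mantissaAndSign >>> 1;
    return (sign << 31) | ((exponentCode - 1) << 23) | mantissa;
  }
}
```

Because the split works on raw bits (as produced by Float.floatToRawIntBits), NaN payloads and both infinities pass through unchanged.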
-
writeVnumber
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeVnumber
- Throws:
IOException
-
writeUnsignedByte
- Throws:
IOException
-
writeDouble
- Throws:
IOException
-
encodeIntSign
private int encodeIntSign(int v)
-
writeDiff
Encoding:
bit 6 = sign: 1 = negative
bit 7 = delta: 1 = delta
- Parameters:
kind - the kind of slot
i - runs from iHeap + 3 to end of array
- Throws:
IOException - passthru
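One way to read the sign and delta flags is sketched below. Several things here are assumptions and not taken from this page: that "bit 6" and "bit 7" are the two low-order bits of the encoded number (counting from the most significant bit of a byte), that a delta against prev is used only when v and prev have the same sign and the difference is smaller in magnitude, and the names DiffEncodingSketch, encode, and decode.

```java
final class DiffEncodingSketch {

  private static final long DELTA_FLAG = 1L; // assumed position of "bit 7"
  private static final long SIGN_FLAG = 2L;  // assumed position of "bit 6"

  /** Encodes v either directly or as a difference from prev, whichever is smaller. */
  static long encode(int v, int prev) {
    long absV = Math.abs((long) v); // long arithmetic keeps Integer.MIN_VALUE safe
    boolean sameSign = (v > 0 && prev > 0) || (v < 0 && prev < 0);
    if (sameSign) {
      long diff = (long) v - prev;
      long absDiff = Math.abs(diff);
      if (absDiff < absV) {
        return (absDiff << 2) | (diff < 0 ? SIGN_FLAG : 0L) | DELTA_FLAG;
      }
    }
    return (absV << 2) | (v < 0 ? SIGN_FLAG : 0L);
  }

  /** Inverse of encode, given the same previous value. */
  static int decode(long code, int prev) {
    long magnitude = code >>> 2;
    long signed = (code & SIGN_FLAG) != 0 ? -magnitude : magnitude;
    return (int) ((code & DELTA_FLAG) != 0 ? prev + signed : signed);
  }
}
```

Values that stay close to the previous one of the same kind then encode as small numbers, which suits a variable-length number format.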
-
extractStrings
Adds strings to the OptimizeStrings object. If delta, only processes FSs that are new; modified string values are picked up when scanning FsChange items.
- Parameters:
fs - feature structure
-
extractStringsFromModifications
For delta, for each fsChange element, extracts any strings.
- Parameters:
fsChange -
-
fs2seq
-