Package org.apache.uima.cas.impl
Class BinaryCasSerDes6
- java.lang.Object
-
- org.apache.uima.cas.impl.BinaryCasSerDes6
-
- All Implemented Interfaces:
SlotKindsConstants
public class BinaryCasSerDes6 extends java.lang.Object implements SlotKindsConstants
User-callable serialization and deserialization of the CAS in a compressed binary format.

This serializes/deserializes the state of the CAS. It has the capability to map type systems, so the sending and receiving type systems do not have to be the same.
- types and features are matched by name, and features must have the same range (slot kind)
- types and/or features in one type system not in the other are skipped over

The header specifies to the reader the format and the compression level.

How to Serialize:
1) create an instance of this class
   a) if doing a delta serialization, pass in the mark and a ReuseInfo object that was created after deserializing this CAS initially.
   b) if serializing to a target with a different type system, pass the target's type system impl object so the serialization can filter the types for the target.
2) call serialize() to serialize the CAS
3) if doing serialization to a target from which you expect to receive back a delta CAS, create a ReuseInfo object from this object and reuse it for deserializing the delta CAS.

TypeSystemImpl objects are lazily augmented by customized TypeInfo instances for each type encountered in serializing or deserializing. These are preserved for future calls, so their setup / initialization is only needed the first time.

TypeSystemImpl objects are also lazily augmented by typeMappers for individual different target type systems; these too are preserved and reused on future calls.

Compressed Binary CASes are designed to be "self-describing": the format of the compressed binary CAS, including version info, is inserted at the beginning so that a proper deserialization method can be automatically chosen.

The compressed binary format implemented by this class supports type system mapping. Types in the source which are not in the target (or vice versa) are omitted. Types with "extra" features have their extra features omitted (or on deserialization, they are set to their default value - null, or 0, etc.). Feature slots which hold references to types not in the target type system are replaced with 0 (null).

How to Deserialize:
1) get an appropriate CAS to deserialize into. For delta CAS, it does not have to be empty, but it must be the originating CAS from which the delta was produced.
2) if the target type system == the CAS's, and the serialized form is not delta, then call aCAS.reinit(source). Otherwise, create an instance of this class -> xxx
   a) assuming the object being deserialized has a different type system, set the "target" type system to the TypeSystemImpl instance of the object being deserialized.
   b) if delta deserializing, pass in the ReuseInfo object created when the CAS was serialized
3) call xxx.deserialize(inputStream)

Compression/Decompression works in two stages:
- application of Zip/Unzip to particular sub-collections of CAS data, grouped according to similar data distribution
- collection of like kinds of data (to make the zipping more effective)
There can be up to ~20 of these collections, such as control info, float-exponents, string chars.

Deserialization:
- read all bytes
- create separate ByteArrayInputStreams for each segment
- create appropriate unzip data input streams for these

Data that is slow and expensive to create (created lazily and cached):
- extra type system info - lazily created and added to shared TypeSystemImpl object, set up per type actually referenced
- mapper for type system - lazily created and added to shared TypeSystemImpl object in identity-map cache (size limit = 10 per source type system?) - key is target typesystemimpl.

Defaulting:
- flags: doMeasurements, compressLevel, CompressStrategy
- per serialize call: cas, output, [target ts], [mark for delta]
- per deserialize call: cas, input, [target ts], whether-to-save-info-for-delta-serialization

CASImpl has an instance method with defaulting args for serialization. CASImpl has reinit, which works with compressed binary serialization objects if there is no type mapping. If there is type mapping, use
(new BinaryCasSerDes6(cas, marker-or-null, targetTypeSystem (for stream being deserialized), reuseInfo-or-null)).deserialize(in-stream)

Use Cases, filtering and delta
**************************************************************************
* (de)serialize * filter? * delta? * Use case
**************************************************************************
* serialize     *   N     *   N    * Saving a Cas,
*               *         *        * sending Cas to service with identical ts
**************************************************************************
* serialize     *   Y     *   N    * sending Cas to service with
*               *         *        * different ts (a guaranteed subset)
**************************************************************************
* serialize     *   N     *   Y    * returning Cas to client
*               *         *        * uses info saved when deserializing
*               *         *        * (?? saving just a delta to disk??)
**************************************************************************
* serialize     *   Y     *   Y    * NOT SUPPORTED (not needed)
**************************************************************************
* deserialize   *   N     *   N    * reading/(receiving) CAS, identical TS
**************************************************************************
* deserialize   *   Y     *   N    * reading/receiving CAS, different TS
*               *         *        * ts not guaranteed to be superset
*               *         *        * for "reading" case.
**************************************************************************
* deserialize   *   N     *   Y    * receiving CAS, identical TS
*               *         *        * uses info saved when serializing
**************************************************************************
* deserialize   *   Y     *   Y    * receiving CAS, different TS (tgt a feature subset)
*               *         *        * uses info saved when serializing
**************************************************************************
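The two compression stages described above (group like data into separate streams, then zip each group on its own) can be illustrated with a small, self-contained sketch. This is not the UIMA wire format: the class name SegmentedZipSketch, the two example segments, and the framing (segment count, then original length, zipped length, and zipped bytes per segment) are assumptions made only for illustration. It uses just java.util.zip and java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch of per-segment zipping; NOT the actual UIMA format. */
final class SegmentedZipSketch {

  /** Deflates one segment on its own, so bytes with a similar distribution compress together. */
  static byte[] zip(byte[] raw) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    while (!deflater.finished()) {
      out.write(buf, 0, deflater.deflate(buf));
    }
    deflater.end();
    return out.toByteArray();
  }

  /** Inflates one segment back to its known original length. */
  static byte[] unzip(byte[] zipped, int originalLength) {
    Inflater inflater = new Inflater();
    inflater.setInput(zipped);
    byte[] raw = new byte[originalLength];
    try {
      int n = 0;
      while (n < originalLength && !inflater.finished()) {
        int got = inflater.inflate(raw, n, originalLength - n);
        if (got == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
          break; // truncated or unexpected input: stop instead of spinning
        }
        n += got;
      }
    } catch (DataFormatException e) {
      throw new IllegalStateException(e);
    } finally {
      inflater.end();
    }
    return raw;
  }

  /** Writes each kind of data to its own stream, zips each stream, and frames the result. */
  static byte[] pack(int[] typeCodes, String[] strings) {
    try {
      ByteArrayOutputStream typeBytes = new ByteArrayOutputStream();
      ByteArrayOutputStream charBytes = new ByteArrayOutputStream();
      DataOutputStream typeDos = new DataOutputStream(typeBytes);
      DataOutputStream charDos = new DataOutputStream(charBytes);
      for (int t : typeCodes) typeDos.writeInt(t);
      for (String s : strings) charDos.writeUTF(s);

      byte[][] segments = { typeBytes.toByteArray(), charBytes.toByteArray() };
      ByteArrayOutputStream all = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(all);
      out.writeInt(segments.length);
      for (byte[] seg : segments) {
        byte[] z = zip(seg);
        out.writeInt(seg.length); // uncompressed size
        out.writeInt(z.length);   // compressed size
        out.write(z);
      }
      return all.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads all bytes, then builds one unzipped DataInputStream per segment. */
  static DataInputStream[] unpack(byte[] packed) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed));
      DataInputStream[] result = new DataInputStream[in.readInt()];
      for (int i = 0; i < result.length; i++) {
        int originalLength = in.readInt();
        byte[] z = new byte[in.readInt()];
        in.readFully(z);
        result[i] = new DataInputStream(new ByteArrayInputStream(unzip(z, originalLength)));
      }
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Convenience for the example: decodes the first (type code) segment back to ints. */
  static int[] readTypeCodes(byte[] packed, int count) {
    try {
      DataInputStream in = unpack(packed)[0];
      int[] codes = new int[count];
      for (int i = 0; i < count; i++) codes[i] = in.readInt();
      return codes;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Keeping each kind of value in its own stream is what lets the deflater see long runs of similar bytes; the per-segment lengths in the frame let the reader slice the input without scanning it.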
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | BinaryCasSerDes6.CompressLevel | Compression alternatives
static class | BinaryCasSerDes6.CompressStrat |
private class | BinaryCasSerDes6.ReadModifiedFSs | Modified Values. Modified heap values need fsStartIndexes conversion
static class | BinaryCasSerDes6.ReuseInfo | Info reused for 1) multiple serializations of same cas to multiple targets (a speedup), or 2) for delta cas serialization, where it represents the fsStartIndex info before any mods were done which could change that info, or 3) for deserializing with a delta cas, where it represents the fsStartIndex info at the time the CAS was serialized out.
private class | BinaryCasSerDes6.SerializeModifiedFSs | Modified Values. Output: for each FS that has 1 or more modified values, write the heap addr converted to a seq # of the FS. For all modified values within the FS: if it is an aux array element, write the index in the aux array and the new value; otherwise, write the slot offset and the new value
-
Field Summary
Fields
Modifier and Type | Field | Description
private AllowPreexistingFS | allowPreexistingFS | Things for just deserialization
private java.io.DataInputStream | arrayLength_dis |
private java.io.ByteArrayOutputStream[] | baosZipSources |
private BinaryCasSerDes | bcsd |
private java.io.DataInputStream | byte_dis |
private java.io.DataOutputStream | byte_dos |
private CASImpl | cas | Things for both serialization and Deserialization
private BinaryCasSerDes6.CompressLevel | compressLevel |
private BinaryCasSerDes6.CompressStrat | compressStrategy |
private java.io.DataInputStream | control_dis |
private java.io.DataOutputStream | control_dos |
private TOP | currentFs | the FS being deserialized
private java.io.DataInputStream[] | dataInputs |
private boolean | debugEOF |
private java.io.DataInputStream | deserIn |
private boolean | doMeasurements |
private java.io.DataOutputStream[] | dosZipSources |
private java.io.DataInputStream | double_Exponent_dis |
private java.io.DataOutputStream | double_Exponent_dos |
private java.io.DataInputStream | double_Mantissa_Sign_dis |
private java.io.DataOutputStream | double_Mantissa_Sign_dos |
private static java.lang.String | EMPTY_STRING |
private java.util.List<java.lang.Runnable> | fixupsNeeded | the "fixups" for relative heap refs; the actions set slot values
private java.io.DataInputStream | float_Exponent_dis |
private java.io.DataOutputStream | float_Exponent_dos |
private java.io.DataInputStream | float_Mantissa_Sign_dis |
private java.io.DataOutputStream | float_Mantissa_Sign_dos |
private PositiveIntSet | foundFSs | ordered set of FSs found in indexes or linked from other found FSs.
private PositiveIntSet | foundFSsBelowMark | ordered set of FSs found in indexes or linked from other found FSs, which are below the mark.
private java.io.DataInputStream | fsIndexes_dis |
private java.io.DataOutputStream | fsIndexes_dos |
private CasSeqAddrMaps | fsStartIndexes | maps from src id <-> tgt id. For deserialization: if the src type does not exist, tgt to src is 0
private java.util.List<TOP> | fssToSerialize | FSs being serialized.
private java.io.DataInputStream | heapRef_dis |
private java.util.zip.Inflater[] | inflaters |
private java.io.DataInputStream | int_dis |
private boolean | isDelta |
private boolean | isReadingDelta |
private boolean | isSerializingDelta |
private boolean | isTsiIncluded |
private boolean | isTsIncluded |
private boolean | isTypeMapping | This is the used version of isTypeMapping, normally == to isTypeMappingCmn, but compareCASes sets this false temporarily while setting up the compare
private boolean | isUpdatePrevOK |
private int | lastArrayLength |
private java.io.DataInputStream | long_High_dis |
private java.io.DataInputStream | long_Low_dis |
private MarkerImpl | mark |
private int | nextFsId |
private boolean | only1CommonString |
private OptimizeStrings | os |
private Int2ObjHashMap<long[],long[]> | prevFsWithLongValues | Hold prev values of "long" slots, by type, for instances of FS which are non-arrays containing slots which have long values, used for differencing - not using the actual FS instance, because during deserialization, these may not be deserialized due to type filtering; set only for non-filtered domain types; set only for non-0 values; if fsRef is to filtered type, value serialized will be 0, but this slot not set. On deserialization: if value is 0, skip setting. First index: key is type code; 2nd index: key is slot-offset number (0-based)
private int[][] | prevHeapInstanceWithIntValues | Hold prev instance of FS which have non-array FSRef slots, to allow computing these to match case where a 0 value is used because of type filtering and also to allow for forward references.
private java.lang.String[] | readCommonString |
private boolean | reuseInfoProvided |
private java.io.DataOutputStream | serializedOut | Things for just serialization
private java.io.DataInputStream | short_dis |
private java.util.List<java.lang.Runnable> | singleFsDefer | Deferred actions to set Feature Slots of feature structures.
private SerializationMeasures | sm |
private java.lang.String | sofaMimeType |
private java.lang.String | sofaName |
private int | sofaNum | used for deferred creation
private Sofa | sofaRef |
private TypeSystemImpl | srcTs | Things set up for one instance of this class
private java.io.DataInputStream | strChars_dis |
private StringHeap | stringHeapObj |
private java.io.DataInputStream | strLength_dis |
private java.io.DataOutputStream | strLength_dos |
private java.io.DataInputStream | strOffset_dis |
private java.io.DataOutputStream | strOffset_dos |
private java.io.DataInputStream | strSeg_dis |
private java.io.DataOutputStream | strSeg_dos |
private TypeSystemImpl | tgtTs |
private java.util.List<TOP> | toBeScanned | FSs being processed, including below-the-line deltas.
private static boolean | TRACE_DES |
private static boolean | TRACE_MOD_DES |
private static boolean | TRACE_MOD_SER |
private static boolean | TRACE_SER |
private static boolean | TRACE_STR_ARRAY |
private java.io.DataInputStream | typeCode_dis |
private java.io.DataOutputStream | typeCode_dos |
private CasTypeSystemMapper | typeMapper |
private java.util.List<java.lang.Runnable> | uimaSerializableFixups |
private PositiveIntSet | uimaSerializableSavedToCas | Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
private int | version |
-
Fields inherited from interface org.apache.uima.cas.impl.SlotKindsConstants
arrayLength_i, byte_i, CAN_BE_NEGATIVE, control_i, double_Exponent_i, double_Mantissa_Sign_i, float_Exponent_i, float_Mantissa_Sign_i, fsIndexes_i, heapRef_i, IGNORED, IN_MAIN_HEAP, int_i, long_High_i, long_Low_i, NBR_SLOT_KIND_ZIP_STREAMS, short_i, strChars_i, strLength_i, strOffset_i, strSeg_i, typeCode_i
-
-
Constructor Summary
Constructors
Modifier | Constructor | Description
public | BinaryCasSerDes6(AbstractCas cas) | Setup to serialize (not delta) or deserialize (not delta) using binary compression, no type mapping but only processing reachable Feature Structures
public | BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs) | Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping and only processing reachable Feature Structures
public | BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs, boolean storeTS, boolean storeTSI) | Setup to serialize (not delta) or deserialize (maybe delta) using binary compression, no type mapping, optionally storing TSI, and only processing reachable Feature Structures
private | BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, boolean storeTS, boolean storeTSI, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy) |
public | BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs) | Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures
public | BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements) | Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures, output measurements
public | BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy) | Setup to serialize or deserialize using binary compression, with (optional) type mapping and only processing reachable Feature Structures
public | BinaryCasSerDes6(AbstractCas cas, TypeSystemImpl tgtTs) | Setup to serialize (not delta) or deserialize (not delta) using binary compression, with type mapping and only processing reachable Feature Structures
(package private) | BinaryCasSerDes6(BinaryCasSerDes6 f6, TypeSystemImpl tgtTs) | only called to set up for deserialization.
-
Method Summary
Methods
Modifier and Type | Method | Description
private void | addStringsFromFS(TOP fs) | Add all the strings ref'd by this FS.
private void | closeDataInputs() |
private void | collectAndZip() | Method: write with deflation into a single byte array stream; skip if not worth deflating; skip the Slot_Control stream; record in the Slot_Control stream, for each deflated stream: the Slot index, the number of compressed bytes, the number of uncompressed bytes; add to header: nbr of compressed entries, the Slot_Control stream size, the Slot_Control stream, all the zipped streams
boolean | compareCASes(CASImpl c1, CASImpl c2) | Compare 2 CASes, with perhaps different type systems.
private void | createCurrentFs(TypeImpl type, CASImpl view) |
private long | decodeDouble(long mants, int exponent) |
private int | decodeIntSign(int v) |
void | deserialize(java.io.InputStream istream) |
void | deserialize(java.io.InputStream istream, AllowPreexistingFS allowPreexistingFS) | Version used by uima-as to read delta cas from remote parallel steps
void | deserializeAfterVersion(java.io.DataInputStream istream, boolean isDelta, AllowPreexistingFS allowPreexistingFS) |
private int | encodeIntSign(int v) |
private void | enqueueFS(TOP fs) | Add Fs to toBeProcessed and set foundxxx bit - skip this if doesn't exist in target type system
private java.io.DataInput | getInputStream(SlotKinds.SlotKind kind) |
private int | getPrevIntValue(int typeCode, int featOffset) | For heaprefs this gets the previously serialized int value
private long | getPrevLongValue(int typeCode, int featOffset) |
private TOP | getRefVal(int tgtSeq) |
BinaryCasSerDes6.ReuseInfo | getReuseInfo() |
private int | getTgtSeqFromSrcFS(TOP fs) | For Serialization only.
(package private) TypeSystemImpl | getTgtTs() |
private int[] | initPrevIntValue(TypeImpl ti) | Get and lazily initialize if needed the feature cache values for a type. For Serializing, the type belongs to the srcTs. For Deserializing, the type belongs to the tgtTs
private long[] | initPrevLongValue(TypeImpl ti) | Get and lazily initialize if needed the long values for a type. For Serializing and Deserializing, the type belongs to the tgtTs
private void | initSrcTgtIdMapsAndStrings() | Serializing: called at beginning of serialize, scans whole CAS or just delta CAS. If doing delta serialization, fsStartIndexes is passed in, pre-initialized with a copy of the map info below the line.
private boolean | isTypeInTgt(TOP fs) |
private static java.io.DataOutputStream | makeDataOutputStream(java.lang.Object f) |
private void | maybeStoreOrDefer(boolean storeIt, TOP fs, java.util.function.Consumer<TOP> doStore) |
private void | maybeStoreOrDefer_slotFixups(int tgtSeq, java.util.function.Consumer<TOP> r) | FS Ref slots fixups
private void | processFSsForView(boolean isEnqueue, java.util.stream.Stream<TOP> fss) | processes one view's worth of feature structures
private void | processIndexedFeatureStructures(CASImpl cas1, boolean isWrite) |
private void | readArray(boolean storeIt, TypeImpl srcType, TypeImpl tgtType) |
private int | readArrayLength() |
private void | readByKind(TOP fs, FeatureImpl tgtFeat, FeatureImpl srcFeat, boolean storeIt, TypeImpl tgtType) |
private int | readDiff(java.io.DataInput in, int prev) |
private int | readDiff(SlotKinds.SlotKind kind, int prev) |
private int | readDiffIntSlot(boolean storeIt, int featOffset, SlotKinds.SlotKind kind, TypeImpl tgtType) |
private long | readDouble() |
private int | readFloat() |
private void | readFsxPart(IntVector fsIndexes) | Each FS index is sorted, and output is by delta
private CommonSerDes.Header | readHeader(java.io.InputStream istream) | HEADERS
private void | readIndexedFeatureStructures() | process index information to re-index things
private void | readIntoByteArray(byte[] array, int length, boolean storeIt) |
private void | readIntoDoubleArray(double[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) |
private void | readIntoLongArray(long[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) |
private void | readIntoShortArray(short[] array, int length, boolean storeIt) |
private long | readLongOrDouble(SlotKinds.SlotKind kind, long prev) |
private java.lang.String | readString(boolean storeIt) |
private long | readVlong(java.io.DataInput dis) |
private long | readVlong(java.io.DataInputStream dis) |
private int | readVnumber(java.io.DataInputStream dis) |
SerializationMeasures | serialize(java.lang.Object out) | S E R I A L I Z E
private void | serializeArray(TOP fs) |
private int | serializeArrayLength(CommonArrayFS array) |
private void | serializeByKind(TOP fs, FeatureImpl feat) | serialize one feature structure, which is guaranteed not to be null and guaranteed to exist in target if there is type mapping. Caller iterates over target slots, but the feat arg is for the corresponding src feature
private void | serializeDiffWithPrevTypeSlot(SlotKinds.SlotKind kind, TOP fs, FeatureImpl feat, int newValue) |
private static void | setupOutputStream(int i, int size, java.io.ByteArrayOutputStream[] baosZipSources, java.io.DataOutputStream[] dosZipSources) |
private void | setupOutputStreams(java.lang.Object out) | Set up Streams
(package private) static void | setupOutputStreams(CASImpl cas, java.io.ByteArrayOutputStream[] baosZipSources, java.io.DataOutputStream[] dosZipSources) |
private void | setupReadStream(int slotIndex, int bytesCompr, int bytesOrig) |
private void | setupReadStreams() |
(package private) static void | skipBytes(java.io.DataInputStream stream, int skipNumber) |
private void | skipDouble(int length) |
private void | skipLong(int length) |
private void | updatePrevArray0IntValue(TypeImpl ti, int newValue) | version called for arrays, captures the 0th value
private void | updatePrevIntValue(TypeImpl ti, int featOffset, int newValue) | Called for non-arrays
private void | updatePrevLongValue(TypeImpl ti, int featOffset, long newValue) |
private void | write0(int kind) |
private int | writeDiff(int kind, int v, int prev) | Encoding: bit 6 = sign: 1 = negative; bit 7 = delta: 1 = delta
private void | writeDouble(long raw) |
private void | writeFloat(int raw) |
private void | writeLong(long v, long prev) |
private void | writeString(java.lang.String s) |
private void | writeStringInfo() |
private void | writeUnsignedByte(java.io.DataOutputStream s, int v) |
private void | writeVnumber(int kind, int v) |
private void | writeVnumber(int kind, long v) |
private void | writeVnumber(java.io.DataOutputStream s, int v) |
private void | writeVnumber(java.io.DataOutputStream s, long v) |
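Several of the private helpers listed above (writeVnumber, readVnumber, encodeIntSign, decodeIntSign, writeDiff, readDiff) suggest variable-length integers, sign folding, and difference coding. The sketch below shows one common way to combine those three ideas. It is an assumption-laden illustration, not this class's actual byte layout: the class VarNumberSketch, the zigzag sign folding, and the 7-bits-per-byte layout are all choices made here, and the real writeDiff bit layout ("bit 6 = sign, bit 7 = delta") is not reproduced.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch of variable-length, sign-folded, difference-coded ints. */
final class VarNumberSketch {

  /** Folds the sign into the low-order bit (zigzag) so small negatives stay small. */
  static int encodeSign(int v) {
    return (v << 1) ^ (v >> 31);
  }

  /** Inverse of encodeSign. */
  static int decodeSign(int e) {
    return (e >>> 1) ^ -(e & 1);
  }

  /** Writes v as 7-bit groups, low group first; a set high bit means "more bytes follow". */
  static byte[] writeVnumber(int v) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
    return out.toByteArray();
  }

  /** Reads back a value written by writeVnumber. */
  static int readVnumber(byte[] bytes) {
    int result = 0;
    int shift = 0;
    for (byte b : bytes) {
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) break;
      shift += 7;
    }
    return result;
  }

  /** Encodes v as its sign-folded difference from the previous value. */
  static byte[] writeDiff(int v, int prev) {
    return writeVnumber(encodeSign(v - prev));
  }

  /** Rebuilds v from the encoded difference and the same previous value. */
  static int readDiff(byte[] bytes, int prev) {
    return prev + decodeSign(readVnumber(bytes));
  }
}
```

With this scheme a value close to its predecessor (for example 1005 after 1000) takes a single byte, which is the motivation for keeping previous values per type and slot.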
-
-
-
Field Detail
-
EMPTY_STRING
private static final java.lang.String EMPTY_STRING
- See Also:
- Constant Field Values
-
TRACE_SER
private static final boolean TRACE_SER
- See Also:
- Constant Field Values
-
TRACE_DES
private static final boolean TRACE_DES
- See Also:
- Constant Field Values
-
TRACE_MOD_SER
private static final boolean TRACE_MOD_SER
- See Also:
- Constant Field Values
-
TRACE_MOD_DES
private static final boolean TRACE_MOD_DES
- See Also:
- Constant Field Values
-
TRACE_STR_ARRAY
private static final boolean TRACE_STR_ARRAY
- See Also:
- Constant Field Values
-
srcTs
private TypeSystemImpl srcTs
Things set up for one instance of this class
-
tgtTs
private final TypeSystemImpl tgtTs
-
compressLevel
private final BinaryCasSerDes6.CompressLevel compressLevel
-
compressStrategy
private final BinaryCasSerDes6.CompressStrat compressStrategy
-
cas
private final CASImpl cas
Things for both serialization and Deserialization
-
bcsd
private final BinaryCasSerDes bcsd
-
stringHeapObj
private final StringHeap stringHeapObj
-
nextFsId
private int nextFsId
-
isSerializingDelta
private final boolean isSerializingDelta
-
isDelta
private boolean isDelta
-
isReadingDelta
private boolean isReadingDelta
-
mark
private final MarkerImpl mark
-
fsStartIndexes
private final CasSeqAddrMaps fsStartIndexes
Maps from src id <-> tgt id. For deserialization: if the src type does not exist, tgt to src is 0.
-
reuseInfoProvided
private final boolean reuseInfoProvided
-
doMeasurements
private final boolean doMeasurements
-
os
private OptimizeStrings os
-
only1CommonString
private boolean only1CommonString
-
isTsIncluded
private boolean isTsIncluded
-
isTsiIncluded
private boolean isTsiIncluded
-
typeMapper
private final CasTypeSystemMapper typeMapper
-
isTypeMapping
private boolean isTypeMapping
This is the used version of isTypeMapping, normally == to isTypeMappingCmn. However, compareCASes sets this to false temporarily while setting up the compare.
-
prevHeapInstanceWithIntValues
private final int[][] prevHeapInstanceWithIntValues
Hold prev instance of FS which have non-array FSRef slots, to allow computing these to match the case where a 0 value is used because of type filtering, and also to allow for forward references.
Note: we can't use the actual prev FS, because with type filtering it may not exist, and even if it exists, it may not be fixed up (forward ref not yet deserialized).
- for each target typecode, only set if the type has 1 or more non-array fsref
- set only for non-filtered domain types
- set only for non-0 values
- if fsRef is to filtered type, value serialized will be 0, but this slot not set
- on deserialization: if value is 0, skip setting
- first index: key is type code
- 2nd index: key is slot-offset number (0-based)
Also used for array refs sometimes, for the 1st entry in the array: feature slot 0 is used for this when reading (not when writing - could be made more uniform).
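A minimal sketch of the "previous value by type code and slot offset" idea follows. It assumes a plain int[][] that is lazily filled per type and updated only for non-0 values, as the description states. The class PrevValueSketch and its method names are hypothetical, and the real code's handling of filtered types and forward references is not modeled.

```java
/** Hypothetical sketch of per-type, per-slot previous-value tracking for differencing. */
final class PrevValueSketch {

  // first index: type code; second index: slot offset (0-based)
  private final int[][] prevByType;

  PrevValueSketch(int numberOfTypes) {
    prevByType = new int[numberOfTypes][];
  }

  /** Lazily allocates the slot array the first time a type is seen. */
  private int[] slots(int typeCode, int slotCount) {
    int[] s = prevByType[typeCode];
    if (s == null) {
      s = new int[slotCount];
      prevByType[typeCode] = s;
    }
    return s;
  }

  /** Writer side: returns the difference to emit, then remembers non-0 values. */
  int diffAndUpdate(int typeCode, int slotCount, int slotOffset, int newValue) {
    int[] s = slots(typeCode, slotCount);
    int diff = newValue - s[slotOffset];
    if (newValue != 0) {
      s[slotOffset] = newValue; // set only for non-0 values
    }
    return diff;
  }

  /** Reader side: rebuilds the value and applies the same update rule. */
  int restoreAndUpdate(int typeCode, int slotCount, int slotOffset, int diff) {
    int[] s = slots(typeCode, slotCount);
    int value = s[slotOffset] + diff;
    if (value != 0) {
      s[slotOffset] = value; // if value is 0, skip setting
    }
    return value;
  }
}
```

Because the writer and the reader apply the same "skip 0" rule, their tables stay in step even when a 0 appears in the middle of a run of nearby values.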
-
prevFsWithLongValues
private final Int2ObjHashMap<long[],long[]> prevFsWithLongValues
Hold prev values of "long" slots, by type, for instances of FS which are non-arrays containing slots which have long values; used for differencing. The actual FS instance is not used, because during deserialization these may not be deserialized due to type filtering.
- set only for non-filtered domain types
- set only for non-0 values
- if fsRef is to filtered type, value serialized will be 0, but this slot not set
- on deserialization: if value is 0, skip setting
- first index: key is type code
- 2nd index: key is slot-offset number (0-based)
-
foundFSs
private PositiveIntSet foundFSs
Ordered set of FSs found in indexes or linked from other found FSs. Used to control loops/recursion when locating things.
-
foundFSsBelowMark
private PositiveIntSet foundFSsBelowMark
Ordered set of FSs found in indexes or linked from other found FSs, which are below the mark. Used to control loops/recursion when locating things.
-
fssToSerialize
private java.util.List<TOP> fssToSerialize
FSs being serialized. For delta, just the deltas above the delta line. Constructed from indexed plus reachable, above the delta line.
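The "indexed plus reachable" construction, with a found-set guarding against loops, can be sketched as a plain graph traversal. The model below is an assumption for illustration only: objects are small int ids, refs[id] lists outgoing references, a java.util.BitSet stands in for PositiveIntSet, and the mark / delta line is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: collect everything reachable from the indexed roots. */
final class ReachableSketch {

  /**
   * @param refs         refs[id] holds the ids that object id points to
   * @param indexedRoots ids found directly in the indexes
   * @return reachable ids in ascending order
   */
  static List<Integer> collect(int[][] refs, int[] indexedRoots) {
    BitSet found = new BitSet(refs.length);        // controls loops/recursion
    Deque<Integer> toBeScanned = new ArrayDeque<>();
    for (int root : indexedRoots) {
      if (!found.get(root)) {
        found.set(root);
        toBeScanned.add(root);
      }
    }
    while (!toBeScanned.isEmpty()) {
      int id = toBeScanned.remove();
      for (int target : refs[id]) {
        if (!found.get(target)) {                  // already-found ids are not rescanned
          found.set(target);
          toBeScanned.add(target);
        }
      }
    }
    List<Integer> ordered = new ArrayList<>();
    for (int id = found.nextSetBit(0); id >= 0; id = found.nextSetBit(id + 1)) {
      ordered.add(id);
    }
    return ordered;
  }
}
```

For example, with refs {{1}, {0, 2}, {}, {2}} and root 0, objects 0, 1 and 2 are collected, the 0 <-> 1 cycle terminates, and object 3 is left out because nothing reachable points to it.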
-
uimaSerializableSavedToCas
private PositiveIntSet uimaSerializableSavedToCas
Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
-
toBeScanned
private final java.util.List<TOP> toBeScanned
FSs being processed, including below-the-line deltas.
-
debugEOF
private final boolean debugEOF
- See Also:
- Constant Field Values
-
serializedOut
private java.io.DataOutputStream serializedOut
Things for just serialization
-
sm
private final SerializationMeasures sm
-
baosZipSources
private final java.io.ByteArrayOutputStream[] baosZipSources
-
dosZipSources
private final java.io.DataOutputStream[] dosZipSources
-
byte_dos
private java.io.DataOutputStream byte_dos
-
typeCode_dos
private java.io.DataOutputStream typeCode_dos
-
strOffset_dos
private java.io.DataOutputStream strOffset_dos
-
strLength_dos
private java.io.DataOutputStream strLength_dos
-
float_Mantissa_Sign_dos
private java.io.DataOutputStream float_Mantissa_Sign_dos
-
float_Exponent_dos
private java.io.DataOutputStream float_Exponent_dos
-
double_Mantissa_Sign_dos
private java.io.DataOutputStream double_Mantissa_Sign_dos
-
double_Exponent_dos
private java.io.DataOutputStream double_Exponent_dos
-
fsIndexes_dos
private java.io.DataOutputStream fsIndexes_dos
-
control_dos
private java.io.DataOutputStream control_dos
-
strSeg_dos
private java.io.DataOutputStream strSeg_dos
-
allowPreexistingFS
private AllowPreexistingFS allowPreexistingFS
Things for just deserialization
-
deserIn
private java.io.DataInputStream deserIn
-
version
private int version
-
dataInputs
private final java.io.DataInputStream[] dataInputs
-
inflaters
private final java.util.zip.Inflater[] inflaters
-
fixupsNeeded
private final java.util.List<java.lang.Runnable> fixupsNeeded
The "fixups" for relative heap refs; the actions set slot values.
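A list of Runnable fixups is a common way to handle references to objects that have not been read yet. The sketch below is a simplified model under stated assumptions: FixupSketch and its Node class are hypothetical, each object has at most one reference, and refs[i] gives the id that object i points to (or -1 for none). References to already-created objects are set immediately; forward references are deferred and run once everything exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of deferred "fixups" for forward references. */
final class FixupSketch {

  static final class Node {
    final int id;
    Node next;

    Node(int id) {
      this.id = id;
    }
  }

  /** Creates one node per entry; refs[i] is the id node i points to, or -1 for null. */
  static Map<Integer, Node> read(int[] refs) {
    Map<Integer, Node> byId = new HashMap<>();
    List<Runnable> fixupsNeeded = new ArrayList<>();
    for (int id = 0; id < refs.length; id++) {
      Node node = new Node(id);
      byId.put(id, node);
      int target = refs[id];
      if (target < 0) {
        continue;                                   // null reference, nothing to set
      }
      Node known = byId.get(target);
      if (known != null) {
        node.next = known;                          // backward (or self) ref: set now
      } else {
        fixupsNeeded.add(() -> node.next = byId.get(target)); // forward ref: defer
      }
    }
    for (Runnable fixup : fixupsNeeded) {
      fixup.run();                                  // all targets exist by now
    }
    return byId;
  }
}
```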
-
uimaSerializableFixups
private final java.util.List<java.lang.Runnable> uimaSerializableFixups
-
singleFsDefer
private final java.util.List<java.lang.Runnable> singleFsDefer
Deferred actions to set Feature Slots of feature structures. The deferrals are needed when deserializing a subtype of AnnotationBase before the sofa is known. Also used for Sofa creation, where some fields are final.
-
sofaNum
private int sofaNum
used for deferred creation
-
sofaName
private java.lang.String sofaName
-
sofaMimeType
private java.lang.String sofaMimeType
-
sofaRef
private Sofa sofaRef
-
currentFs
private TOP currentFs
the FS being deserialized
-
isUpdatePrevOK
private boolean isUpdatePrevOK
-
readCommonString
private java.lang.String[] readCommonString
-
arrayLength_dis
private java.io.DataInputStream arrayLength_dis
-
heapRef_dis
private java.io.DataInputStream heapRef_dis
-
int_dis
private java.io.DataInputStream int_dis
-
byte_dis
private java.io.DataInputStream byte_dis
-
short_dis
private java.io.DataInputStream short_dis
-
typeCode_dis
private java.io.DataInputStream typeCode_dis
-
strOffset_dis
private java.io.DataInputStream strOffset_dis
-
strLength_dis
private java.io.DataInputStream strLength_dis
-
long_High_dis
private java.io.DataInputStream long_High_dis
-
long_Low_dis
private java.io.DataInputStream long_Low_dis
-
float_Mantissa_Sign_dis
private java.io.DataInputStream float_Mantissa_Sign_dis
-
float_Exponent_dis
private java.io.DataInputStream float_Exponent_dis
-
double_Mantissa_Sign_dis
private java.io.DataInputStream double_Mantissa_Sign_dis
-
double_Exponent_dis
private java.io.DataInputStream double_Exponent_dis
-
fsIndexes_dis
private java.io.DataInputStream fsIndexes_dis
-
strChars_dis
private java.io.DataInputStream strChars_dis
-
control_dis
private java.io.DataInputStream control_dis
-
strSeg_dis
private java.io.DataInputStream strSeg_dis
-
lastArrayLength
private int lastArrayLength
-
-
Constructor Detail
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy) throws ResourceInitializationException
Setup to serialize or deserialize using binary compression, with (optional) type mapping and only processing reachable Feature Structures
- Parameters:
aCas - required - refs the CAS being serialized or deserialized into
mark - if not null is the serialization mark for delta serialization. Unused for deserialization.
tgtTs - if not null is the target type system. For serialization, this is a subset of the CAS's TS; for deserialization, it is the type system of the serialized data being read.
rfs - For delta serialization: must be not null, and the saved value after deserializing the original before any modifications / additions made. For normal serialization: can be null, but if not, is used in place of re-calculating, for speed up. For delta deserialization: must not be null, and is the saved value after serializing to the service. For normal deserialization: must be null.
doMeasurements - if true, measurements are done (on serialization)
compressLevel - if not null, specifies enum instance for compress level
compressStrategy - if not null, specifies enum instance for compress strategy
- Throws:
ResourceInitializationException - if the target type system is incompatible with the source type system
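The role of tgtTs can be pictured with a name-based mapping sketch. This is a deliberately simplified model, not the CasTypeSystemMapper implementation: a type system is reduced to a map from type name to (feature name -> range name), the class TypeMapSketch is hypothetical, and it assumes that "incompatible" means a same-named feature has a different range, which is signalled here with IllegalArgumentException rather than ResourceInitializationException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of matching types and features by name between two type systems. */
final class TypeMapSketch {

  /**
   * Returns the types and features of src that also exist in tgt.
   * Each type system is modeled as: type name -> (feature name -> range name).
   */
  static Map<String, Map<String, String>> commonSubset(
      Map<String, Map<String, String>> src, Map<String, Map<String, String>> tgt) {
    Map<String, Map<String, String>> kept = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, String>> type : src.entrySet()) {
      Map<String, String> tgtFeats = tgt.get(type.getKey());
      if (tgtFeats == null) {
        continue;                       // type not in target: skipped over
      }
      Map<String, String> keptFeats = new LinkedHashMap<>();
      for (Map.Entry<String, String> feat : type.getValue().entrySet()) {
        String tgtRange = tgtFeats.get(feat.getKey());
        if (tgtRange == null) {
          continue;                     // feature not in target: skipped over
        }
        if (!tgtRange.equals(feat.getValue())) {
          // assumed meaning of "incompatible": same name, different range
          throw new IllegalArgumentException(
              "feature " + type.getKey() + ":" + feat.getKey()
                  + " has range " + feat.getValue() + " in source but " + tgtRange + " in target");
        }
        keptFeats.put(feat.getKey(), tgtRange);
      }
      kept.put(type.getKey(), keptFeats);
    }
    return kept;
  }
}
```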
-
BinaryCasSerDes6
private BinaryCasSerDes6(AbstractCas aCas, MarkerImpl mark, TypeSystemImpl tgtTs, boolean storeTS, boolean storeTSI, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements, BinaryCasSerDes6.CompressLevel compressLevel, BinaryCasSerDes6.CompressStrat compressStrategy) throws ResourceInitializationException
- Throws:
ResourceInitializationException
-
BinaryCasSerDes6
BinaryCasSerDes6(BinaryCasSerDes6 f6, TypeSystemImpl tgtTs) throws ResourceInitializationException
Only called to set up for deserialization. Clones the existing f6, but changes the tgtTs (used to decode).
- Parameters:
f6 - -
tgtTs - used for decoding
- Throws:
ResourceInitializationException - -
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (not delta) using binary compression, no type mapping but only processing reachable Feature Structures
- Parameters:
cas - -
- Throws:
ResourceInitializationException - never thrown
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, TypeSystemImpl tgtTs) throws ResourceInitializationException
Setup to serialize (not delta) or deserialize (not delta) using binary compression, with type mapping and only processing reachable Feature Structures
- Parameters:
cas - -
tgtTs - -
- Throws:
ResourceInitializationException - if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
Setup to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping and only processing reachable Feature Structures
- Parameters:
cas - -
mark - -
tgtTs - for deserialization, is the type system of the serialized data being read.
rfs - Reused Feature Structure information - required for both delta serialization and delta deserialization
- Throws:
ResourceInitializationException - if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, MarkerImpl mark, TypeSystemImpl tgtTs, BinaryCasSerDes6.ReuseInfo rfs, boolean doMeasurements) throws ResourceInitializationException
Sets up to serialize (maybe delta) or deserialize (maybe delta) using binary compression, with type mapping, processing only reachable Feature Structures, with optional measurements output.
- Parameters:
cas
- -
mark
- -
tgtTs
- for deserialization, the type system of the serialized data being read
rfs
- Reused Feature Structure information; speeds up serialization, and is required for delta deserialization
doMeasurements
- -
- Throws:
ResourceInitializationException
- if the target type system is incompatible with the source type system
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs) throws ResourceInitializationException
Sets up to serialize (not delta) or deserialize (maybe delta) using binary compression, with no type mapping, processing only reachable Feature Structures.
- Parameters:
cas
- -
rfs
- -
- Throws:
ResourceInitializationException
- never thrown
-
BinaryCasSerDes6
public BinaryCasSerDes6(AbstractCas cas, BinaryCasSerDes6.ReuseInfo rfs, boolean storeTS, boolean storeTSI) throws ResourceInitializationException
Sets up to serialize (not delta) or deserialize (maybe delta) using binary compression, with no type mapping, optionally storing the TSI, processing only reachable Feature Structures.
- Parameters:
cas
- -
rfs
- -
storeTS
- -
storeTSI
- -
- Throws:
ResourceInitializationException
- never thrown
-
-
Method Detail
-
getReuseInfo
public BinaryCasSerDes6.ReuseInfo getReuseInfo()
-
serialize
public SerializationMeasures serialize(java.lang.Object out) throws java.io.IOException
Serializes the CAS.
- Parameters:
out
- -
- Returns:
- null or serialization measurements (depending on setting of doMeasurements)
- Throws:
java.io.IOException
- passthru
-
serializeArray
private void serializeArray(TOP fs) throws java.io.IOException
- Throws:
java.io.IOException
-
serializeByKind
private void serializeByKind(TOP fs, FeatureImpl feat) throws java.io.IOException
Serializes one feature structure, which is guaranteed not to be null and guaranteed to exist in the target if there is type mapping. The caller iterates over target slots, but the feat argument is the corresponding src feature.
- Parameters:
fs
- the FS whose slot "feat" is to be serialized
feat
- the corresponding source feature slot to serialize
- Throws:
java.io.IOException
-
serializeArrayLength
private int serializeArrayLength(CommonArrayFS array) throws java.io.IOException
- Throws:
java.io.IOException
-
serializeDiffWithPrevTypeSlot
private void serializeDiffWithPrevTypeSlot(SlotKinds.SlotKind kind, TOP fs, FeatureImpl feat, int newValue) throws java.io.IOException
- Throws:
java.io.IOException
-
updatePrevIntValue
private void updatePrevIntValue(TypeImpl ti, int featOffset, int newValue)
Called for non-arrays.
- Parameters:
ti
- the type
featOffset
- offset to the slot
newValue
- for heap refs, is the converted-from-addr-to-seq-number value
-
updatePrevLongValue
private void updatePrevLongValue(TypeImpl ti, int featOffset, long newValue)
-
updatePrevArray0IntValue
private void updatePrevArray0IntValue(TypeImpl ti, int newValue)
Version called for arrays; captures the 0th value.
- Parameters:
ti
- -
newValue
- -
-
initPrevIntValue
private int[] initPrevIntValue(TypeImpl ti)
Gets, and lazily initializes if needed, the feature cache values for a type. For serializing, the type belongs to the srcTs; for deserializing, the type belongs to the tgtTs.
- Parameters:
ti
- the type
- Returns:
- the int feature cache
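The lazy initialization described for initPrevIntValue can be pictured with a small stand-alone sketch. The class name PrevIntCache, the use of an int type code as the map key, and the numSlots parameter are assumptions made for illustration only; they are not taken from BinaryCasSerDes6.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the BinaryCasSerDes6 implementation.
final class PrevIntCache {
  // One int[] per type code, created only when that type is first seen.
  private final Map<Integer, int[]> prevByType = new HashMap<>();

  /** Returns the cache for a type, allocating it (zero-filled) on first use. */
  int[] init(int typeCode, int numSlots) {
    return prevByType.computeIfAbsent(typeCode, k -> new int[numSlots]);
  }

  /** Records the most recent value written for one slot of a type. */
  void update(int typeCode, int numSlots, int featOffset, int newValue) {
    init(typeCode, numSlots)[featOffset] = newValue;
  }

  /** Returns the previous value for a slot, or 0 if the type has no cache yet. */
  int get(int typeCode, int featOffset) {
    int[] prev = prevByType.get(typeCode);
    return (prev == null) ? 0 : prev[featOffset];
  }
}
```

Because the array is allocated on first use, types that never occur in a given CAS cost nothing, and later calls for the same type reuse the same array.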
-
initPrevLongValue
private long[] initPrevLongValue(TypeImpl ti)
Gets, and lazily initializes if needed, the long values for a type. For serializing and deserializing, the type belongs to the tgtTs.
- Parameters:
ti
- the type
- Returns:
- the long feature cache
-
getPrevIntValue
private int getPrevIntValue(int typeCode, int featOffset)
For heap refs, gets the previously serialized int value.
- Parameters:
typeCode
- the type code
featOffset
- true offset, 1 = first feature...
- Returns:
- the previous int value for use in difference calculations
-
getPrevLongValue
private long getPrevLongValue(int typeCode, int featOffset)
-
collectAndZip
private void collectAndZip() throws java.io.IOException
Method:
  write with deflation into a single byte array stream
    skip if not worth deflating
    skip the Slot_Control stream
    record in the Slot_Control stream, for each deflated stream:
      the Slot index
      the number of compressed bytes
      the number of uncompressed bytes
  add to header:
    nbr of compressed entries
    the Slot_Control stream size
    the Slot_Control stream
    all the zipped streams
- Throws:
java.io.IOException
- passthru
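The collect-and-deflate procedure described for collectAndZip can be approximated with the following minimal sketch. The class name ZipCollector, the 16-byte cutoff, the use of fixed 4-byte ints for the control records, and the rule that a payload is left uncompressed when deflation does not shrink it are all assumptions for illustration; the actual header layout used by this class is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

// Hypothetical illustration only; names and the size threshold are assumptions.
final class ZipCollector {
  static final int MIN_BYTES_TO_DEFLATE = 16; // assumed cutoff

  /** Deflates one byte array and returns the compressed bytes. */
  static byte[] deflate(byte[] raw) {
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    deflater.setInput(raw);
    deflater.finish();
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buf = new byte[1024];
    while (!deflater.finished()) {
      int n = deflater.deflate(buf);
      baos.write(buf, 0, n);
    }
    deflater.end();
    return baos.toByteArray();
  }

  /**
   * Output layout: entry count, control size in bytes, control records, payloads.
   * Each control record is: slot index, stored byte count, original byte count.
   */
  static byte[] collect(byte[][] slots) {
    ByteBuffer control = ByteBuffer.allocate(slots.length * 12);
    ByteArrayOutputStream payload = new ByteArrayOutputStream();
    int entries = 0;
    for (int i = 0; i < slots.length; i++) {
      byte[] raw = slots[i];
      if (raw.length == 0) {
        continue; // empty stream: nothing to record
      }
      byte[] stored = raw.length < MIN_BYTES_TO_DEFLATE ? raw : deflate(raw);
      if (stored.length >= raw.length) {
        stored = raw; // not worth deflating
      }
      control.putInt(i).putInt(stored.length).putInt(raw.length);
      payload.write(stored, 0, stored.length);
      entries++;
    }
    control.flip();
    ByteBuffer out = ByteBuffer.allocate(8 + control.remaining() + payload.size());
    out.putInt(entries).putInt(control.remaining()).put(control).put(payload.toByteArray());
    return out.array();
  }
}
```

In this sketch a reader can tell whether a payload was deflated by comparing the stored and original byte counts in its control record.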
-
writeLong
private void writeLong(long v, long prev) throws java.io.IOException
- Throws:
java.io.IOException
-
writeString
private void writeString(java.lang.String s) throws java.io.IOException
- Throws:
java.io.IOException
-
writeFloat
private void writeFloat(int raw) throws java.io.IOException
- Throws:
java.io.IOException
-
writeVnumber
private void writeVnumber(int kind, int v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeVnumber
private void writeVnumber(int kind, long v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeVnumber
private void writeVnumber(java.io.DataOutputStream s, int v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeVnumber
private void writeVnumber(java.io.DataOutputStream s, long v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeUnsignedByte
private void writeUnsignedByte(java.io.DataOutputStream s, int v) throws java.io.IOException
- Throws:
java.io.IOException
-
writeDouble
private void writeDouble(long raw) throws java.io.IOException
- Throws:
java.io.IOException
-
encodeIntSign
private int encodeIntSign(int v)
-
writeDiff
private int writeDiff(int kind, int v, int prev) throws java.io.IOException
Encoding: bit 6 = sign (1 = negative); bit 7 = delta (1 = delta).
- Parameters:
kind
- selects the stream to write to
v
- runs from iHeap + 3 to end of array
prev
- for difference encoding; sets isUpdatePrevOK true if ok to update prev, false if writing 0 for any reason, or max neg nbr
- Throws:
java.io.IOException
- passthru
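The sign and delta flags described for writeDiff can be illustrated with a small sketch. It assumes the two flags occupy the two lowest-order bits of the encoded number (reading "bit 7" as the least significant bit of the first byte), and that a delta is written only when the value and the previous value have the same sign and the delta is smaller in magnitude than the value. The class DiffCodec and its methods are hypothetical names, not part of BinaryCasSerDes6.

```java
// Hypothetical illustration only; the bit positions are an assumption.
final class DiffCodec {
  private static final long DELTA_FLAG = 1L; // 1 = value is a difference from prev
  private static final long SIGN_FLAG = 2L;  // 1 = encoded magnitude is negative

  /** Encodes v either directly or as a difference from prev, whichever is smaller. */
  static long encode(int v, int prev) {
    long absV = Math.abs((long) v);
    boolean sameSign = (v > 0 && prev > 0) || (v < 0 && prev < 0);
    if (sameSign) {
      long diff = (long) v - prev; // long arithmetic avoids overflow
      long absDiff = Math.abs(diff);
      if (absDiff < absV) {
        return (absDiff << 2) | (diff < 0 ? SIGN_FLAG : 0L) | DELTA_FLAG;
      }
    }
    return (absV << 2) | (v < 0 ? SIGN_FLAG : 0L);
  }

  /** Reverses encode, given the same prev value the encoder used. */
  static int decode(long encoded, int prev) {
    long magnitude = encoded >>> 2;
    long value = (encoded & SIGN_FLAG) != 0 ? -magnitude : magnitude;
    return (int) ((encoded & DELTA_FLAG) != 0 ? prev + value : value);
  }
}
```

The encoded number would then be written with a variable-length scheme, so values close to the previous one take fewer bytes.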
-
write0
private void write0(int kind) throws java.io.IOException
- Throws:
java.io.IOException
-
deserialize
public void deserialize(java.io.InputStream istream) throws java.io.IOException
- Parameters:
istream
- -
- Throws:
java.io.IOException
- -
-
deserialize
public void deserialize(java.io.InputStream istream, AllowPreexistingFS allowPreexistingFS) throws java.io.IOException
Version used by UIMA-AS to read a delta CAS from remote parallel steps.
- Parameters:
istream
- input stream
allowPreexistingFS
- what to do if an item already exists below the mark
- Throws:
java.io.IOException
- passthru
-
deserializeAfterVersion
public void deserializeAfterVersion(java.io.DataInputStream istream, boolean isDelta, AllowPreexistingFS allowPreexistingFS) throws java.io.IOException
- Throws:
java.io.IOException
-
readArray
private void readArray(boolean storeIt, TypeImpl srcType, TypeImpl tgtType) throws java.io.IOException
- Parameters:
storeIt
- -
srcType
- may be null if there's no source type for the target when deserializing
tgtType
- the type being deserialized
- Throws:
java.io.IOException
-
getRefVal
private TOP getRefVal(int tgtSeq)
-
readArrayLength
private int readArrayLength() throws java.io.IOException
- Throws:
java.io.IOException
-
readByKind
private void readByKind(TOP fs, FeatureImpl tgtFeat, FeatureImpl srcFeat, boolean storeIt, TypeImpl tgtType) throws java.io.IOException
- Parameters:
fs
- the feature structure to set the feature value in; may be null if it was deferred, which happens for Sofas and subtypes of AnnotationBase because those have "final" values. For Sofa, these are the sofaid (String) and sofanum (int). For AnnotationBase, this is the sofaRef (and the view).
tgtFeat
- the Feature being read
srcFeat
- the Feature being set (may be null if the feature doesn't exist)
storeIt
- false causes storing of values to be skipped
- Throws:
java.io.IOException
- passthru
-
maybeStoreOrDefer
private void maybeStoreOrDefer(boolean storeIt, TOP fs, java.util.function.Consumer<TOP> doStore)
-
maybeStoreOrDefer_slotFixups
private void maybeStoreOrDefer_slotFixups(int tgtSeq, java.util.function.Consumer<TOP> r)
FS Ref slot fixups.
- Parameters:
tgtSeq
- the int value of the target seq number
r
- is sofa-or-lfs.setFeatureValue-or-setLocalSofaData(TOP ref-d-fs)
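The store-or-defer idea behind the slot fixups can be sketched in isolation: a reference slot is set immediately when the referenced object already exists, and otherwise the store is queued and run later. The class RefFixups, its generic element type, and its method names are hypothetical; this is a sketch of the general technique, not the logic of maybeStoreOrDefer_slotFixups itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical illustration only; not the BinaryCasSerDes6 implementation.
final class RefFixups<T> {
  private final Map<Integer, T> bySeq = new HashMap<>();
  private final List<Runnable> pending = new ArrayList<>();

  /** Registers an object once it has been created for a sequence number. */
  void define(int seq, T obj) {
    bySeq.put(seq, obj);
  }

  /** Stores now if the referenced object exists; otherwise defers the store. */
  void storeOrDefer(int tgtSeq, Consumer<T> store) {
    T ref = bySeq.get(tgtSeq);
    if (ref != null) {
      store.accept(ref);
    } else {
      // Look the object up again later; passes null if it was never defined.
      pending.add(() -> store.accept(bySeq.get(tgtSeq)));
    }
  }

  /** Runs every deferred store; call after all objects are defined. */
  void runFixups() {
    pending.forEach(Runnable::run);
    pending.clear();
  }
}
```

Deferring the store this way lets forward references be resolved in a single pass over the input followed by one pass over the pending list.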
-
readIndexedFeatureStructures
private void readIndexedFeatureStructures() throws java.io.IOException
Processes index information to re-index things.
- Throws:
java.io.IOException
-
readFsxPart
private void readFsxPart(IntVector fsIndexes) throws java.io.IOException
Each FS index is sorted, and output is by delta.
- Throws:
java.io.IOException
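Writing a sorted index "by delta" generally means storing each entry as its distance from the previous one, which keeps the numbers small. A minimal sketch of that idea follows; the class SortedDelta and the choice of 0 as the starting value are assumptions for illustration, not details of readFsxPart.

```java
// Hypothetical illustration only; not the BinaryCasSerDes6 implementation.
final class SortedDelta {
  /** Converts ascending values into differences from the previous value. */
  static int[] toDeltas(int[] sorted) {
    int[] deltas = new int[sorted.length];
    int prev = 0; // assumed starting point
    for (int i = 0; i < sorted.length; i++) {
      deltas[i] = sorted[i] - prev;
      prev = sorted[i];
    }
    return deltas;
  }

  /** Rebuilds the original values by keeping a running sum of the deltas. */
  static int[] fromDeltas(int[] deltas) {
    int[] values = new int[deltas.length];
    int prev = 0;
    for (int i = 0; i < deltas.length; i++) {
      prev += deltas[i];
      values[i] = prev;
    }
    return values;
  }
}
```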
-
getInputStream
private java.io.DataInput getInputStream(SlotKinds.SlotKind kind)
-
readVnumber
private int readVnumber(java.io.DataInputStream dis) throws java.io.IOException
- Throws:
java.io.IOException
-
readVlong
private long readVlong(java.io.DataInputStream dis) throws java.io.IOException
- Throws:
java.io.IOException
-
readIntoByteArray
private void readIntoByteArray(byte[] array, int length, boolean storeIt) throws java.io.IOException
- Throws:
java.io.IOException
-
readIntoShortArray
private void readIntoShortArray(short[] array, int length, boolean storeIt) throws java.io.IOException
- Throws:
java.io.IOException
-
readIntoLongArray
private void readIntoLongArray(long[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) throws java.io.IOException
- Throws:
java.io.IOException
-
readIntoDoubleArray
private void readIntoDoubleArray(double[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) throws java.io.IOException
- Throws:
java.io.IOException
-
readDiff
private int readDiff(SlotKinds.SlotKind kind, int prev) throws java.io.IOException
- Throws:
java.io.IOException
-
readDiffIntSlot
private int readDiffIntSlot(boolean storeIt, int featOffset, SlotKinds.SlotKind kind, TypeImpl tgtType) throws java.io.IOException
- Throws:
java.io.IOException
-
readDiff
private int readDiff(java.io.DataInput in, int prev) throws java.io.IOException
- Throws:
java.io.IOException
-
readLongOrDouble
private long readLongOrDouble(SlotKinds.SlotKind kind, long prev) throws java.io.IOException
- Throws:
java.io.IOException
-
skipLong
private void skipLong(int length) throws java.io.IOException
- Throws:
java.io.IOException
-
skipDouble
private void skipDouble(int length) throws java.io.IOException
- Throws:
java.io.IOException
-
readFloat
private int readFloat() throws java.io.IOException
- Throws:
java.io.IOException
-
decodeIntSign
private int decodeIntSign(int v)
-
readDouble
private long readDouble() throws java.io.IOException
- Throws:
java.io.IOException
-
decodeDouble
private long decodeDouble(long mants, int exponent)
-
readVlong
private long readVlong(java.io.DataInput dis) throws java.io.IOException
- Throws:
java.io.IOException
-
readString
private java.lang.String readString(boolean storeIt) throws java.io.IOException
- Parameters:
storeIt
- true to store value, false to skip it
- Returns:
- the string
- Throws:
java.io.IOException
-
skipBytes
static void skipBytes(java.io.DataInputStream stream, int skipNumber) throws java.io.IOException
- Throws:
java.io.IOException
-
processIndexedFeatureStructures
private void processIndexedFeatureStructures(CASImpl cas1, boolean isWrite) throws java.io.IOException
- Throws:
java.io.IOException
-
processFSsForView
private void processFSsForView(boolean isEnqueue, java.util.stream.Stream<TOP> fss)
Processes one view's worth of feature structures.
- Parameters:
fsIndexes
- -
fsNdxStart
- -
isDoingEnqueue
- -
isWrite
- -
- Throws:
java.io.IOException
-
enqueueFS
private void enqueueFS(TOP fs)
Adds the FS to toBeProcessed and sets the foundxxx bit; skipped if the FS's type doesn't exist in the target type system.
- Parameters:
fs
- -
-
isTypeInTgt
private boolean isTypeInTgt(TOP fs)
-
initSrcTgtIdMapsAndStrings
private void initSrcTgtIdMapsAndStrings()
Serializing: called at the beginning of serialize; scans the whole CAS or just the delta CAS. If doing delta serialization, fsStartIndexes is passed in, pre-initialized with a copy of the map info below the line.
-
addStringsFromFS
private void addStringsFromFS(TOP fs)
Adds all the strings referenced by this FS.
- if it is a string array, do all the array items
- else scan the features and do all string-valued features, in feature offset order
For delta, this isn't done here; another routine driven by FsChange info does this.
-
compareCASes
public boolean compareCASes(CASImpl c1, CASImpl c2)
Compares 2 CASes, with perhaps different type systems. If the type systems are different, constructs a type mapper and uses it to selectively ignore types or features not in the other type system. The mapper is from CAS1 -> CAS2. When computing the things to compare from CAS1, filters out feature structures not reachable via indexes or refs.
- Parameters:
c1
- CAS to compare
c2
- CAS to compare
- Returns:
- true if equal (for types / features in both)
-
makeDataOutputStream
private static java.io.DataOutputStream makeDataOutputStream(java.lang.Object f) throws java.io.FileNotFoundException
- Parameters:
f
- can be a DataOutputStream, an OutputStream, or a File
- Returns:
- a data output stream
- Throws:
java.io.FileNotFoundException
- passthru
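Accepting a DataOutputStream, an OutputStream, or a File through a single Object parameter typically comes down to a type dispatch like the one sketched below. The class OutputTargets, the buffering of file output, and the IllegalArgumentException for unsupported types are assumptions; the real makeDataOutputStream may behave differently for other inputs.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStream;

// Hypothetical illustration only; not the BinaryCasSerDes6 implementation.
final class OutputTargets {
  /** Wraps the given target in a DataOutputStream, reusing it if it already is one. */
  static DataOutputStream toDataOutputStream(Object f) throws FileNotFoundException {
    // Check the most specific type first: DataOutputStream is also an OutputStream.
    if (f instanceof DataOutputStream) {
      return (DataOutputStream) f;
    }
    if (f instanceof OutputStream) {
      return new DataOutputStream((OutputStream) f);
    }
    if (f instanceof File) {
      return new DataOutputStream(new BufferedOutputStream(new FileOutputStream((File) f)));
    }
    throw new IllegalArgumentException("Unsupported output target: " + f);
  }
}
```

Only the File branch can raise FileNotFoundException, which matches the single checked exception declared in the signature above.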
-
setupOutputStreams
private void setupOutputStreams(java.lang.Object out) throws java.io.FileNotFoundException
Sets up the output streams.
- Throws:
java.io.FileNotFoundException
- passthru
-
setupOutputStreams
static void setupOutputStreams(CASImpl cas, java.io.ByteArrayOutputStream[] baosZipSources, java.io.DataOutputStream[] dosZipSources)
-
setupOutputStream
private static void setupOutputStream(int i, int size, java.io.ByteArrayOutputStream[] baosZipSources, java.io.DataOutputStream[] dosZipSources)
-
setupReadStreams
private void setupReadStreams() throws java.io.IOException
- Throws:
java.io.IOException
-
setupReadStream
private void setupReadStream(int slotIndex, int bytesCompr, int bytesOrig) throws java.io.IOException
- Throws:
java.io.IOException
-
closeDataInputs
private void closeDataInputs()
-
readHeader
private CommonSerDes.Header readHeader(java.io.InputStream istream) throws java.io.IOException
Reads the header from the input stream.
- Throws:
java.io.IOException
- passthru
-
writeStringInfo
private void writeStringInfo() throws java.io.IOException
- Throws:
java.io.IOException
-
getTgtSeqFromSrcFS
private int getTgtSeqFromSrcFS(TOP fs)
For serialization only. Maps a src FS to a tgt seq number:
- fs == null -> 0
- type not in target -> 0
- otherwise, map src fs._id to tgt seq
- Parameters:
fs
- -
- Returns:
- 0 or the mapped src id
-
getTgtTs
TypeSystemImpl getTgtTs()
-
-