Class BinaryCasSerDes6

java.lang.Object
org.apache.uima.cas.impl.BinaryCasSerDes6
All Implemented Interfaces:
SlotKindsConstants

public class BinaryCasSerDes6 extends Object implements SlotKindsConstants
User-callable serialization and deserialization of the CAS in a compressed binary format.

This serializes/deserializes the state of the CAS. It has the capability to map type systems, so the sending and receiving type systems do not have to be the same.
  - types and features are matched by name, and features must have the same range (slot kind)
  - types and/or features in one type system that are not in the other are skipped over

The header specifies to the reader the format and the compression level.

How to Serialize:
  1) Create an instance of this class.
     a) If doing a delta serialization, pass in the mark and a ReuseInfo object that was created after deserializing this CAS initially.
     b) If serializing to a target with a different type system, pass the target's type system impl object so the serialization can filter the types for the target.
  2) Call serialize() to serialize the CAS.
  3) If serializing to a target from which you expect to receive back a delta CAS, create a ReuseInfo object from this object and reuse it for deserializing the delta CAS.

TypeSystemImpl objects are lazily augmented by customized TypeInfo instances for each type encountered in serializing or deserializing. These are preserved for future calls, so their setup / initialization is only needed the first time.

TypeSystemImpl objects are also lazily augmented by typeMappers for individual different target type systems; these too are preserved and reused on future calls.

Compressed binary CASes are designed to be "self-describing": the format of the compressed binary CAS, including version info, is inserted at the beginning so that a proper deserialization method can be chosen automatically.

The compressed binary format implemented by this class supports type system mapping. Types in the source which are not in the target (or vice versa) are omitted. Types with "extra" features have their extra features omitted (or, on deserialization, set to their default value: null, 0, etc.). Feature slots which hold references to types not in the target type system are replaced with 0 (null).

How to Deserialize:
  1) Get an appropriate CAS to deserialize into. For a delta CAS, it does not have to be empty, but it must be the originating CAS from which the delta was produced.
  2) If the target type system == the CAS's, and the serialized form is not delta, call aCAS.reinit(source). Otherwise, create an instance of this class -> xxx.
     a) If the object being deserialized has a different type system, set the "target" type system to the TypeSystemImpl instance of the object being deserialized.
     b) If delta deserializing, pass in the ReuseInfo object created when the CAS was serialized.
  3) Call xxx.deserialize(inputStream).

Compression/Decompression works in two stages:
  - application of Zip/Unzip to particular sub-collections of CAS data, grouped according to similar data distribution
  - collection of like kinds of data (to make the zipping more effective)
There can be up to ~20 of these collections, such as control info, float-exponents, string chars.

Deserialization:
  - Read all bytes.
  - Create separate ByteArrayInputStreams for each segment.
  - Create appropriate unzip data input streams for these.

Slow but expensive data:
  - extra type system info: lazily created and added to the shared TypeSystemImpl object; set up per type actually referenced
  - mapper for type system: lazily created and added to the shared TypeSystemImpl object in an identity-map cache (size limit = 10 per source type system?); the key is the target TypeSystemImpl

Defaulting:
  - flags: doMeasurements, compressLevel, CompressStrategy
  - Per serialize call: cas, output, [target ts], [mark for delta]
  - Per deserialize call: cas, input, [target ts], whether-to-save-info-for-delta-serialization

CASImpl has an instance method with defaulting args for serialization. CASImpl has reinit, which works with compressed binary serialization objects if there is no type mapping. If there is type mapping, use:
  (new BinaryCasSerDes6(cas, marker-or-null, targetTypeSystem (for stream being deserialized), reuseInfo-or-null)).deserialize(in-stream)

Use Cases, filtering and delta

**************************************************************************
* (de)serialize * filter? * delta? * Use case
**************************************************************************
* serialize     *    N    *   N    * Saving a Cas,
*               *         *        * sending Cas to service with identical ts
**************************************************************************
* serialize     *    Y    *   N    * sending Cas to service with
*               *         *        * different ts (a guaranteed subset)
**************************************************************************
* serialize     *    N    *   Y    * returning Cas to client
*               *         *        * uses info saved when deserializing
*               *         *        * (?? saving just a delta to disk??)
**************************************************************************
* serialize     *    Y    *   Y    * NOT SUPPORTED (not needed)
**************************************************************************
* deserialize   *    N    *   N    * reading/(receiving) CAS, identical TS
**************************************************************************
* deserialize   *    Y    *   N    * reading/receiving CAS, different TS
*               *         *        * ts not guaranteed to be superset
*               *         *        * for "reading" case.
**************************************************************************
* deserialize   *    N    *   Y    * receiving CAS, identical TS
*               *         *        * uses info saved when serializing
**************************************************************************
* deserialize   *    Y    *   Y    * receiving CAS, different TS (tgt a feature subset)
*               *         *        * uses info saved when serializing
**************************************************************************
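The name-based matching rule from the class description can be pictured with a small stand-alone sketch. It does not use any UIMA classes: `TypeMappingSketch` and its maps from feature name to range name are hypothetical stand-ins for a type system and its mapper, and treating a range mismatch as "not matched" is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a type-system mapper; not the UIMA CasTypeSystemMapper.
class TypeMappingSketch {

  /**
   * Returns the source features that survive mapping to the target.
   * A feature is kept only if the target has a feature with the same
   * name and the same range; everything else is skipped.
   */
  static Map<String, String> mapFeatures(Map<String, String> srcFeatures,
                                         Map<String, String> tgtFeatures) {
    Map<String, String> kept = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : srcFeatures.entrySet()) {
      String tgtRange = tgtFeatures.get(e.getKey()); // null if the name is absent
      if (e.getValue().equals(tgtRange)) {
        kept.put(e.getKey(), e.getValue());
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, String> src = new LinkedHashMap<>();
    src.put("begin", "int");
    src.put("end", "int");
    src.put("score", "float");   // not in target: skipped
    src.put("label", "String");  // same name, different range: skipped here

    Map<String, String> tgt = new LinkedHashMap<>();
    tgt.put("begin", "int");
    tgt.put("end", "int");
    tgt.put("label", "int");

    System.out.println(mapFeatures(src, tgt).keySet()); // prints [begin, end]
  }
}
```

A real mapper would also map types and cache its result per target type system, as the description notes; the sketch only shows the per-feature decision.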
  • Field Details

    • EMPTY_STRING

      private static final String EMPTY_STRING
    • TRACE_SER

      private static final boolean TRACE_SER
    • TRACE_DES

      private static final boolean TRACE_DES
    • TRACE_MOD_SER

      private static final boolean TRACE_MOD_SER
    • TRACE_MOD_DES

      private static final boolean TRACE_MOD_DES
    • TRACE_STR_ARRAY

      private static final boolean TRACE_STR_ARRAY
    • srcTs

      private TypeSystemImpl srcTs
      Things set up for one instance of this class
    • tgtTs

      private final TypeSystemImpl tgtTs
    • compressLevel

      private final BinaryCasSerDes6.CompressLevel compressLevel
    • compressStrategy

      private final BinaryCasSerDes6.CompressStrat compressStrategy
    • cas

      private final CASImpl cas
      Things for both serialization and Deserialization
    • bcsd

      private final BinaryCasSerDes bcsd
    • stringHeapObj

      private final StringHeap stringHeapObj
    • nextFsId

      private int nextFsId
    • isSerializingDelta

      private final boolean isSerializingDelta
    • isDelta

      private boolean isDelta
    • isReadingDelta

      private boolean isReadingDelta
    • mark

      private final MarkerImpl mark
    • fsStartIndexes

      private final CasSeqAddrMaps fsStartIndexes
      Maps from src id <-> tgt id. For deserialization: if the src type does not exist, tgt to src is 0.
    • reuseInfoProvided

      private final boolean reuseInfoProvided
    • doMeasurements

      private final boolean doMeasurements
    • os

      private OptimizeStrings os
    • only1CommonString

      private boolean only1CommonString
    • isTsIncluded

      private boolean isTsIncluded
    • isTsiIncluded

      private boolean isTsiIncluded
    • typeMapper

      private final CasTypeSystemMapper typeMapper
    • isTypeMapping

      private boolean isTypeMapping
      This is the version of isTypeMapping that is actually used. It is normally == isTypeMappingCmn, but compareCASes sets it to false temporarily while setting up the compare.
    • prevHeapInstanceWithIntValues

      private final int[][] prevHeapInstanceWithIntValues
      Holds the previous instance of FSs which have non-array FSRef slots, to allow computing these to match the case where a 0 value is used because of type filtering, and also to allow for forward references.
      Note: we can't use the actual prev FS, because with type filtering it may not exist, and even if it exists, it may not be fixed up (forward ref not yet deserialized).
        - for each target typecode, only set if the type has 1 or more non-array fsref
        - set only for non-filtered domain types
        - set only for non-0 values
        - if fsRef is to a filtered type, the value serialized will be 0, but this slot is not set
        - on deserialization: if the value is 0, skip setting
      first index: key is type code
      2nd index: key is slot-offset number (0-based)
      Also used for array refs sometimes, for the 1st entry in the array: feature slot 0 is used for this when reading (not when writing - could be made more uniform).
    • prevFsWithLongValues

      private final Int2ObjHashMap<long[],long[]> prevFsWithLongValues
      Holds the previous values of "long" slots, by type, for instances of FSs which are non-arrays containing slots which have long values; used for differencing. The actual FS instance is not used because, during deserialization, it may not be deserialized due to type filtering.
        - set only for non-filtered domain types
        - set only for non-0 values
        - if fsRef is to a filtered type, the value serialized will be 0, but this slot is not set
        - on deserialization: if the value is 0, skip setting
      first index: key is type code
      2nd index: key is slot-offset number (0-based)
    • foundFSs

      private PositiveIntSet foundFSs
      Ordered set of FSs found in indexes or linked from other found FSs. Used to control loops/recursion when locating things.
    • foundFSsBelowMark

      private PositiveIntSet foundFSsBelowMark
      Ordered set of FSs found in indexes or linked from other found FSs, which are below the mark. Used to control loops/recursion when locating things.
    • fssToSerialize

      private List<TOP> fssToSerialize
      FSs being serialized. For delta, just the deltas above the delta line. Constructed from indexed plus reachable, above the delta line.
    • uimaSerializableSavedToCas

      private PositiveIntSet uimaSerializableSavedToCas
      Set of FSes on which UimaSerializable _save_to_cas_data has already been called.
    • toBeScanned

      private final List<TOP> toBeScanned
      FSs being processed, including below-the-line deltas.
    • debugEOF

      private final boolean debugEOF
    • serializedOut

      private DataOutputStream serializedOut
      Things for just serialization
    • sm

      private final SerializationMeasures sm
    • baosZipSources

      private final ByteArrayOutputStream[] baosZipSources
    • dosZipSources

      private final DataOutputStream[] dosZipSources
    • byte_dos

      private DataOutputStream byte_dos
    • typeCode_dos

      private DataOutputStream typeCode_dos
    • strOffset_dos

      private DataOutputStream strOffset_dos
    • strLength_dos

      private DataOutputStream strLength_dos
    • float_Mantissa_Sign_dos

      private DataOutputStream float_Mantissa_Sign_dos
    • float_Exponent_dos

      private DataOutputStream float_Exponent_dos
    • double_Mantissa_Sign_dos

      private DataOutputStream double_Mantissa_Sign_dos
    • double_Exponent_dos

      private DataOutputStream double_Exponent_dos
    • fsIndexes_dos

      private DataOutputStream fsIndexes_dos
    • control_dos

      private DataOutputStream control_dos
    • strSeg_dos

      private DataOutputStream strSeg_dos
    • allowPreexistingFS

      private AllowPreexistingFS allowPreexistingFS
      Things for just deserialization
    • deserIn

      private DataInputStream deserIn
    • version

      private int version
    • dataInputs

      private final DataInputStream[] dataInputs
    • inflaters

      private final Inflater[] inflaters
    • fixupsNeeded

      private final List<Runnable> fixupsNeeded
      The "fixups" for relative heap refs; the actions set slot values.
    • uimaSerializableFixups

      private final List<Runnable> uimaSerializableFixups
    • singleFsDefer

      private final List<Runnable> singleFsDefer
      Deferred actions to set feature slots of feature structures. The deferrals are needed when deserializing a subtype of AnnotationBase before the sofa is known, and also for Sofa creation, where some fields are final.
    • sofaNum

      private int sofaNum
      used for deferred creation
    • sofaName

      private String sofaName
    • sofaMimeType

      private String sofaMimeType
    • sofaRef

      private Sofa sofaRef
    • currentFs

      private TOP currentFs
      the FS being deserialized
    • isUpdatePrevOK

      private boolean isUpdatePrevOK
    • readCommonString

      private String[] readCommonString
    • arrayLength_dis

      private DataInputStream arrayLength_dis
    • heapRef_dis

      private DataInputStream heapRef_dis
    • int_dis

      private DataInputStream int_dis
    • byte_dis

      private DataInputStream byte_dis
    • short_dis

      private DataInputStream short_dis
    • typeCode_dis

      private DataInputStream typeCode_dis
    • strOffset_dis

      private DataInputStream strOffset_dis
    • strLength_dis

      private DataInputStream strLength_dis
    • long_High_dis

      private DataInputStream long_High_dis
    • long_Low_dis

      private DataInputStream long_Low_dis
    • float_Mantissa_Sign_dis

      private DataInputStream float_Mantissa_Sign_dis
    • float_Exponent_dis

      private DataInputStream float_Exponent_dis
    • double_Mantissa_Sign_dis

      private DataInputStream double_Mantissa_Sign_dis
    • double_Exponent_dis

      private DataInputStream double_Exponent_dis
    • fsIndexes_dis

      private DataInputStream fsIndexes_dis
    • strChars_dis

      private DataInputStream strChars_dis
    • control_dis

      private DataInputStream control_dis
    • strSeg_dis

      private DataInputStream strSeg_dis
    • lastArrayLength

      private int lastArrayLength
  • Constructor Details

  • Method Details

    • getReuseInfo

      public BinaryCasSerDes6.ReuseInfo getReuseInfo()
    • serialize

      public SerializationMeasures serialize(Object out) throws IOException
      Serializes the CAS.
      Parameters:
      out - the output target (see makeDataOutputStream for the kinds of object accepted)
      Returns:
      null or serialization measurements (depending on setting of doMeasurements)
      Throws:
      IOException - passthru
    • serializeArray

      private void serializeArray(TOP fs) throws IOException
      Throws:
      IOException
    • serializeByKind

      private void serializeByKind(TOP fs, FeatureImpl feat) throws IOException
      Serialize one feature structure, which is guaranteed not to be null and guaranteed to exist in the target if there is type mapping. The caller iterates over target slots, but the feat arg is for the corresponding src feature.
      Parameters:
      fs - the FS whose slot "feat" is to be serialized
      feat - the corresponding source feature slot to serialize
      Throws:
      IOException
    • serializeArrayLength

      private int serializeArrayLength(CommonArrayFS array) throws IOException
      Throws:
      IOException
    • serializeDiffWithPrevTypeSlot

      private void serializeDiffWithPrevTypeSlot(SlotKinds.SlotKind kind, TOP fs, FeatureImpl feat, int newValue) throws IOException
      Throws:
      IOException
    • updatePrevIntValue

      private void updatePrevIntValue(TypeImpl ti, int featOffset, int newValue)
      Called for non-arrays
      Parameters:
      featOffset - offset to the slot
      newValue - for heap refs, is the converted-from-addr-to-seq-number value
      ti - the type (of the FS)
    • updatePrevLongValue

      private void updatePrevLongValue(TypeImpl ti, int featOffset, long newValue)
    • updatePrevArray0IntValue

      private void updatePrevArray0IntValue(TypeImpl ti, int newValue)
      version called for arrays, captures the 0th value
      Parameters:
      ti -
      newValue -
    • initPrevIntValue

      private int[] initPrevIntValue(TypeImpl ti)
      Get, and lazily initialize if needed, the feature cache values for a type. For serializing, the type belongs to the srcTs. For deserializing, the type belongs to the tgtTs.
      Parameters:
      ti - the type
      Returns:
      the int feature cache
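As a rough illustration of the "previous value" cache that this method and prevHeapInstanceWithIntValues describe, the following stand-alone sketch keeps one lazily allocated int array per type code, indexed by a 0-based slot offset. The class name, the constructor argument and the rule of never recording 0 are assumptions drawn from the field description above, not the actual implementation.

```java
// Hypothetical sketch of a lazily filled "previous value" cache, keyed by
// type code and then by slot offset; not the actual UIMA implementation.
class PrevValueCacheSketch {
  private final int[][] prevByType;  // first index: type code
  private final int[] slotsPerType;  // number of slots for each type code

  PrevValueCacheSketch(int[] slotsPerType) {
    this.slotsPerType = slotsPerType;
    this.prevByType = new int[slotsPerType.length][];
  }

  /** Returns the per-type array, allocating it on first use. */
  int[] initPrev(int typeCode) {
    int[] prev = prevByType[typeCode];
    if (prev == null) {
      prev = new int[slotsPerType[typeCode]];
      prevByType[typeCode] = prev;
    }
    return prev;
  }

  /** Previous value for a slot, or 0 if nothing was recorded yet. */
  int getPrev(int typeCode, int slotOffset) {
    int[] prev = prevByType[typeCode];
    return (prev == null) ? 0 : prev[slotOffset];
  }

  /** Records a value; 0 is never recorded ("set only for non-0 values"). */
  void updatePrev(int typeCode, int slotOffset, int newValue) {
    if (newValue != 0) {
      initPrev(typeCode)[slotOffset] = newValue;
    }
  }

  public static void main(String[] args) {
    PrevValueCacheSketch cache = new PrevValueCacheSketch(new int[] {0, 2, 3});
    cache.updatePrev(2, 1, 40);
    cache.updatePrev(2, 1, 0);               // ignored: 0 does not overwrite
    System.out.println(cache.getPrev(2, 1)); // prints 40
    System.out.println(cache.getPrev(1, 0)); // prints 0
  }
}
```

Allocating per type on first use keeps the cache small when only a few types actually occur in a CAS.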
    • initPrevLongValue

      private long[] initPrevLongValue(TypeImpl ti)
      Get, and lazily initialize if needed, the long values for a type. For serializing and deserializing, the type belongs to the tgtTs.
      Parameters:
      ti - the type
      Returns:
      the long feature cache
    • getPrevIntValue

      private int getPrevIntValue(int typeCode, int featOffset)
      For heaprefs this gets the previously serialized int value
      Parameters:
      typeCode - the type code
      featOffset - true offset, 1 = first feature...
      Returns:
      the previous int value for use in difference calculations
    • getPrevLongValue

      private long getPrevLongValue(int typeCode, int featOffset)
    • collectAndZip

      private void collectAndZip() throws IOException
      Method:
        - write with deflation into a single byte array stream
          - skip if not worth deflating
          - skip the Slot_Control stream
          - record in the Slot_Control stream, for each deflated stream:
            - the Slot index
            - the number of compressed bytes
            - the number of uncompressed bytes
        - add to header:
          - nbr of compressed entries
          - the Slot_Control stream size
          - the Slot_Control stream
          - all the zipped streams
      Throws:
      IOException - passthru
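The method outlined above can be sketched with the standard java.util.zip streams. This is a simplified, stand-alone illustration: the class and method names are hypothetical, every field is written as a 4-byte int, and "not worth deflating" is reduced to "empty", so the byte layout is an assumption and not the actual wire format of this class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch; the record layout is illustrative only.
class SegmentZipSketch {

  static byte[] deflate(byte[] data) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (DeflaterOutputStream z = new DeflaterOutputStream(out)) {
      z.write(data);
    }
    return out.toByteArray();
  }

  static byte[] inflate(byte[] compressed, int originalLength) throws IOException {
    byte[] result = new byte[originalLength];
    try (DataInputStream in = new DataInputStream(
        new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
      in.readFully(result);
    }
    return result;
  }

  /** Writes: entry count, control size, control records, then all zipped segments. */
  static byte[] pack(byte[][] segments) {
    try {
      ByteArrayOutputStream control = new ByteArrayOutputStream();
      DataOutputStream controlOut = new DataOutputStream(control);
      ByteArrayOutputStream zippedAll = new ByteArrayOutputStream();
      int entries = 0;
      for (int i = 0; i < segments.length; i++) {
        if (segments[i].length == 0) {
          continue;                              // nothing worth deflating
        }
        byte[] z = deflate(segments[i]);
        controlOut.writeInt(i);                  // slot index
        controlOut.writeInt(z.length);           // compressed bytes
        controlOut.writeInt(segments[i].length); // uncompressed bytes
        zippedAll.write(z, 0, z.length);
        entries++;
      }
      ByteArrayOutputStream result = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(result);
      out.writeInt(entries);
      out.writeInt(control.size());
      control.writeTo(out);
      zippedAll.writeTo(out);
      out.flush();
      return result.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static byte[][] unpack(byte[] packed, int nbrSegments) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed));
      int entries = in.readInt();
      byte[] control = new byte[in.readInt()];
      in.readFully(control);
      DataInputStream rec = new DataInputStream(new ByteArrayInputStream(control));
      byte[][] segments = new byte[nbrSegments][0]; // skipped slots stay empty
      for (int k = 0; k < entries; k++) {
        int slot = rec.readInt();
        byte[] z = new byte[rec.readInt()];
        int originalLength = rec.readInt();
        in.readFully(z);
        segments[slot] = inflate(z, originalLength);
      }
      return segments;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[][] segments = { {1, 1, 1, 1, 1, 1, 1, 1}, {}, {7, 8, 9} };
    byte[][] back = unpack(pack(segments), segments.length);
    System.out.println(Arrays.deepEquals(segments, back)); // prints true
  }
}
```

Recording both the compressed and the uncompressed size lets a reader slice the byte stream into segments and size its buffers before inflating anything.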
    • writeLong

      private void writeLong(long v, long prev) throws IOException
      Throws:
      IOException
    • writeString

      private void writeString(String s) throws IOException
      Throws:
      IOException
    • writeFloat

      private void writeFloat(int raw) throws IOException
      Throws:
      IOException
    • writeVnumber

      private void writeVnumber(int kind, int v) throws IOException
      Throws:
      IOException
    • writeVnumber

      private void writeVnumber(int kind, long v) throws IOException
      Throws:
      IOException
    • writeVnumber

      private void writeVnumber(DataOutputStream s, int v) throws IOException
      Throws:
      IOException
    • writeVnumber

      private void writeVnumber(DataOutputStream s, long v) throws IOException
      Throws:
      IOException
    • writeUnsignedByte

      private void writeUnsignedByte(DataOutputStream s, int v) throws IOException
      Throws:
      IOException
    • writeDouble

      private void writeDouble(long raw) throws IOException
      Throws:
      IOException
    • encodeIntSign

      private int encodeIntSign(int v)
    • writeDiff

      private int writeDiff(int kind, int v, int prev) throws IOException
      Encoding: bit 6 = sign (1 = negative); bit 7 = delta (1 = delta)
      Parameters:
      kind - selects the stream to write to
      v - runs from iHeap + 3 to end of array
      prev - for difference encoding. Sets isUpdatePrevOK to true if it is ok to update prev, or to false if writing 0 for any reason, or the max negative number
      Throws:
      IOException - passthru
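One way to read the encoding note above is sketched here. It assumes that "bit 6" and "bit 7" are counted from the most significant bit of a byte, so that they are the two lowest-order bits of the encoded number; that reading, the class name, and the rule "use the delta only when its magnitude is smaller" are assumptions of this sketch. The special handling of 0 and of the maximum negative number mentioned for the prev parameter is left out.

```java
// Hypothetical sketch of difference encoding with two flag bits.
// Assumption: "bit 6" and "bit 7" are the two lowest-order bits.
class DiffCodecSketch {
  static final long DELTA_FLAG = 1L; // "bit 7": 1 = value is a delta from prev
  static final long SIGN_FLAG = 2L;  // "bit 6": 1 = negative

  /** Encodes v directly or as (v - prev), whichever has the smaller magnitude. */
  static long encode(int v, int prev) {
    long abs = Math.abs((long) v);
    long diff = (long) v - prev;
    long absDiff = Math.abs(diff);
    if (absDiff < abs) {
      return (absDiff << 2) | (diff < 0 ? SIGN_FLAG : 0L) | DELTA_FLAG;
    }
    return (abs << 2) | (v < 0 ? SIGN_FLAG : 0L);
  }

  /** Reverses encode, given the same prev value. */
  static int decode(long code, int prev) {
    long magnitude = code >>> 2;
    long signed = ((code & SIGN_FLAG) != 0) ? -magnitude : magnitude;
    return (int) (((code & DELTA_FLAG) != 0) ? prev + signed : signed);
  }

  public static void main(String[] args) {
    long code = encode(105, 100);          // delta of +5 is shorter than 105
    System.out.println(code);              // prints 21
    System.out.println(decode(code, 100)); // prints 105
  }
}
```

Small magnitudes matter because the result is then written as a variable-length number, so a value close to its predecessor tends to fit in a single byte.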
    • write0

      private void write0(int kind) throws IOException
      Throws:
      IOException
    • deserialize

      public void deserialize(InputStream istream) throws IOException
      Parameters:
      istream - the input stream
      Throws:
      IOException - passthru
    • deserialize

      public void deserialize(InputStream istream, AllowPreexistingFS allowPreexistingFS) throws IOException
      Version used by uima-as to read delta cas from remote parallel steps
      Parameters:
      istream - input stream
      allowPreexistingFS - what to do if item already exists below the mark
      Throws:
      IOException - passthru
    • deserializeAfterVersion

      public void deserializeAfterVersion(DataInputStream istream, boolean isDelta, AllowPreexistingFS allowPreexistingFS) throws IOException
      Throws:
      IOException
    • createCurrentFs

      private void createCurrentFs(TypeImpl type, CASImpl view)
    • readArray

      private void readArray(boolean storeIt, TypeImpl srcType, TypeImpl tgtType) throws IOException
      Parameters:
      storeIt -
      srcType - may be null if there's no source type for target when deserializing
      tgtType - the type being deserialized
      Throws:
      IOException
    • getRefVal

      private TOP getRefVal(int tgtSeq)
    • readArrayLength

      private int readArrayLength() throws IOException
      Throws:
      IOException
    • readByKind

      private void readByKind(TOP fs, FeatureImpl tgtFeat, FeatureImpl srcFeat, boolean storeIt, TypeImpl tgtType) throws IOException
      Parameters:
      tgtFeat - the Feature being read
      srcFeat - the Feature being set (may be null if the feature doesn't exist)
      storeIt - false causes storing of values to be skipped
      fs - the feature structure to set the feature value in; may be null if it was deferred, which happens for Sofas and subtypes of AnnotationBase because those have "final" values. For Sofa: these are the sofaid (String) and sofanum (int). For AnnotationBase: this is the sofaRef (and the view).
      Throws:
      IOException - passthru
    • maybeStoreOrDefer

      private void maybeStoreOrDefer(boolean storeIt, TOP fs, Consumer<TOP> doStore)
    • maybeStoreOrDefer_slotFixups

      private void maybeStoreOrDefer_slotFixups(int tgtSeq, Consumer<TOP> r)
      FS Ref slots fixups
      Parameters:
      tgtSeq - the int value of the target seq number
      r - is sofa-or-lfs.setFeatureValue-or-setLocalSofaData(TOP ref-d-fs)
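The idea of a slot fixup, storing a reference immediately when its target already exists and deferring the store otherwise, can be sketched without any UIMA types. The `Node` class, the 1-based sequence numbers and the method names below are hypothetical; only the pattern of a `List<Runnable>` that is drained once everything has been read is taken from the fields described on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of deferred "fixups" for forward references.
class FixupSketch {
  static final class Node {
    final int seq;
    Node ref;
    Node(int seq) { this.seq = seq; }
  }

  private final List<Node> bySeq = new ArrayList<>();           // index = seq - 1
  private final List<Runnable> fixupsNeeded = new ArrayList<>();

  /** Stores the referenced node now if it exists, otherwise defers the store. */
  void storeOrDefer(int tgtSeq, Consumer<Node> setter) {
    if (tgtSeq == 0) {
      return;                                  // 0 means null: nothing to set
    }
    if (tgtSeq <= bySeq.size()) {
      setter.accept(bySeq.get(tgtSeq - 1));    // target already deserialized
    } else {
      fixupsNeeded.add(() -> setter.accept(bySeq.get(tgtSeq - 1)));
    }
  }

  /** refs[i] is the seq number (1-based, 0 = null) referenced by node i + 1. */
  List<Node> read(int[] refs) {
    for (int i = 0; i < refs.length; i++) {
      Node n = new Node(i + 1);
      bySeq.add(n);
      storeOrDefer(refs[i], target -> n.ref = target);
    }
    fixupsNeeded.forEach(Runnable::run);       // every node exists by now
    fixupsNeeded.clear();
    return bySeq;
  }

  public static void main(String[] args) {
    List<Node> nodes = new FixupSketch().read(new int[] {2, 0, 1});
    System.out.println(nodes.get(0).ref.seq);  // prints 2 (was a forward ref)
  }
}
```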
    • readIndexedFeatureStructures

      private void readIndexedFeatureStructures() throws IOException
      process index information to re-index things
      Throws:
      IOException
    • readFsxPart

      private void readFsxPart(IntVector fsIndexes) throws IOException
      Each FS index is sorted, and output is by delta
      Throws:
      IOException
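Writing a sorted list "by delta" means storing each entry as its difference from the previous one. A minimal stand-alone sketch, with a hypothetical class name and plain int arrays standing in for the index contents:

```java
import java.util.Arrays;

// Hypothetical sketch of delta-encoding a sorted list of ids.
class SortedDeltaSketch {

  /** Sorts a copy of the ids and returns the gaps between neighbours. */
  static int[] toDeltas(int[] ids) {
    int[] sorted = ids.clone();
    Arrays.sort(sorted);
    int[] deltas = new int[sorted.length];
    int prev = 0;
    for (int i = 0; i < sorted.length; i++) {
      deltas[i] = sorted[i] - prev;
      prev = sorted[i];
    }
    return deltas;
  }

  /** Rebuilds the sorted ids by summing the gaps. */
  static int[] fromDeltas(int[] deltas) {
    int[] ids = new int[deltas.length];
    int prev = 0;
    for (int i = 0; i < deltas.length; i++) {
      prev += deltas[i];
      ids[i] = prev;
    }
    return ids;
  }

  public static void main(String[] args) {
    int[] deltas = toDeltas(new int[] {17, 3, 9, 10});
    System.out.println(Arrays.toString(deltas));             // prints [3, 6, 1, 7]
    System.out.println(Arrays.toString(fromDeltas(deltas))); // prints [3, 9, 10, 17]
  }
}
```

Sorting first keeps every gap non-negative and usually small, which suits a variable-length number encoding.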
    • getInputStream

      private DataInput getInputStream(SlotKinds.SlotKind kind)
    • readVnumber

      private int readVnumber(DataInputStream dis) throws IOException
      Throws:
      IOException
    • readVlong

      private long readVlong(DataInputStream dis) throws IOException
      Throws:
      IOException
    • readIntoByteArray

      private void readIntoByteArray(byte[] array, int length, boolean storeIt) throws IOException
      Throws:
      IOException
    • readIntoShortArray

      private void readIntoShortArray(short[] array, int length, boolean storeIt) throws IOException
      Throws:
      IOException
    • readIntoLongArray

      private void readIntoLongArray(long[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) throws IOException
      Throws:
      IOException
    • readIntoDoubleArray

      private void readIntoDoubleArray(double[] array, SlotKinds.SlotKind kind, int length, boolean storeIt) throws IOException
      Throws:
      IOException
    • readDiff

      private int readDiff(SlotKinds.SlotKind kind, int prev) throws IOException
      Throws:
      IOException
    • readDiffIntSlot

      private int readDiffIntSlot(boolean storeIt, int featOffset, SlotKinds.SlotKind kind, TypeImpl tgtType) throws IOException
      Throws:
      IOException
    • readDiff

      private int readDiff(DataInput in, int prev) throws IOException
      Throws:
      IOException
    • readLongOrDouble

      private long readLongOrDouble(SlotKinds.SlotKind kind, long prev) throws IOException
      Throws:
      IOException
    • skipLong

      private void skipLong(int length) throws IOException
      Throws:
      IOException
    • skipDouble

      private void skipDouble(int length) throws IOException
      Throws:
      IOException
    • readFloat

      private int readFloat() throws IOException
      Throws:
      IOException
    • decodeIntSign

      private int decodeIntSign(int v)
    • readDouble

      private long readDouble() throws IOException
      Throws:
      IOException
    • decodeDouble

      private long decodeDouble(long mants, int exponent)
    • readVlong

      private long readVlong(DataInput dis) throws IOException
      Throws:
      IOException
    • readString

      private String readString(boolean storeIt) throws IOException
      Parameters:
      storeIt - true to store value, false to skip it
      Returns:
      the string
      Throws:
      IOException
    • skipBytes

      static void skipBytes(DataInputStream stream, int skipNumber) throws IOException
      Throws:
      IOException
    • processIndexedFeatureStructures

      private void processIndexedFeatureStructures(CASImpl cas1, boolean isWrite) throws IOException
      Throws:
      IOException
    • processFSsForView

      private void processFSsForView(boolean isEnqueue, Stream<TOP> fss)
      processes one view's worth of feature structures
      Parameters:
      fsIndexes -
      fsNdxStart -
      isDoingEnqueue -
      isWrite -
      Throws:
      IOException
    • enqueueFS

      private void enqueueFS(TOP fs)
      Add the FS to toBeProcessed and set the foundxxx bit; skip this if its type doesn't exist in the target type system
      Parameters:
      fs -
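The combination of a "found" set and a to-be-scanned list is a standard worklist traversal. The sketch below shows that pattern on plain integer ids; the class name and the `refs` adjacency arrays are hypothetical, and the check against the target type system is omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: collect everything reachable from the indexed items,
// using a "found" set so that cycles and shared references are visited once.
class ReachabilitySketch {

  /** refs[i] lists the ids referenced by item i; indexed holds the starting ids. */
  static List<Integer> collect(int[][] refs, int[] indexed) {
    BitSet found = new BitSet(refs.length);
    Deque<Integer> toBeScanned = new ArrayDeque<>();
    List<Integer> result = new ArrayList<>();
    for (int id : indexed) {
      enqueue(id, found, toBeScanned);
    }
    while (!toBeScanned.isEmpty()) {
      int id = toBeScanned.removeFirst();
      result.add(id);
      for (int ref : refs[id]) {
        enqueue(ref, found, toBeScanned);
      }
    }
    return result;
  }

  /** Adds id to the worklist unless it has been found before. */
  private static void enqueue(int id, BitSet found, Deque<Integer> toBeScanned) {
    if (!found.get(id)) {
      found.set(id);
      toBeScanned.addLast(id);
    }
  }

  public static void main(String[] args) {
    int[][] refs = { {1}, {2}, {0}, {} };  // 0 -> 1 -> 2 -> 0 is a cycle; 3 is unreachable
    System.out.println(collect(refs, new int[] {0})); // prints [0, 1, 2]
  }
}
```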
    • isTypeInTgt

      private boolean isTypeInTgt(TOP fs)
    • initSrcTgtIdMapsAndStrings

      private void initSrcTgtIdMapsAndStrings()
      Serializing: called at the beginning of serialize; scans the whole CAS or just the delta CAS. If doing delta serialization, fsStartIndexes is passed in, pre-initialized with a copy of the map info below the line.
    • addStringsFromFS

      private void addStringsFromFS(TOP fs)
      Add all the strings ref'd by this FS.
        - if it is a string array, do all the array items
        - else scan the features and do all string-valued features, in feature offset order
      For delta, this isn't done here - another routine driven by FsChange info does this.
    • compareCASes

      public boolean compareCASes(CASImpl c1, CASImpl c2)
      Compare 2 CASes, with perhaps different type systems. If the type systems are different, construct a type mapper and use that to selectively ignore types or features not in the other type system. The mapper is from CAS1 -> CAS2. When computing the things to compare from CAS1, filter to remove feature structures not reachable via indexes or refs.
      Parameters:
      c1 - CAS to compare
      c2 - CAS to compare
      Returns:
      true if equal (for types / features in both)
    • makeDataOutputStream

      private static DataOutputStream makeDataOutputStream(Object f) throws FileNotFoundException
      Parameters:
      f - can be a DataOutputStream, an OutputStream, or a File
      Returns:
      a data output stream
      Throws:
      FileNotFoundException - passthru
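Accepting an Object and dispatching on its runtime type can be sketched as follows. The class and method names are hypothetical, and both the buffering of the File case and the exception for unsupported arguments are assumptions of this sketch rather than documented behavior.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch of turning several kinds of target into a DataOutputStream.
class OutputTargetSketch {

  static DataOutputStream toDataOutputStream(Object f) throws FileNotFoundException {
    if (f instanceof DataOutputStream) {
      return (DataOutputStream) f;                   // already the right kind
    }
    if (f instanceof OutputStream) {
      return new DataOutputStream((OutputStream) f); // wrap a plain stream
    }
    if (f instanceof File) {
      // Assumption: buffer file output; opening the file may fail
      return new DataOutputStream(new BufferedOutputStream(new FileOutputStream((File) f)));
    }
    throw new IllegalArgumentException("unsupported output target: " + f);
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    DataOutputStream out = toDataOutputStream(sink);
    out.writeInt(42);
    out.flush();
    System.out.println(sink.size()); // prints 4
  }
}
```

The DataOutputStream check has to come first, because a DataOutputStream is itself an OutputStream and would otherwise be wrapped a second time.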
    • setupOutputStreams

      private void setupOutputStreams(Object out) throws FileNotFoundException
      Set up Streams
      Throws:
      FileNotFoundException - passthru
    • setupOutputStreams

      static void setupOutputStreams(CASImpl cas, ByteArrayOutputStream[] baosZipSources, DataOutputStream[] dosZipSources)
    • setupOutputStream

      private static void setupOutputStream(int i, int size, ByteArrayOutputStream[] baosZipSources, DataOutputStream[] dosZipSources)
    • setupReadStreams

      private void setupReadStreams() throws IOException
      Throws:
      IOException
    • setupReadStream

      private void setupReadStream(int slotIndex, int bytesCompr, int bytesOrig) throws IOException
      Throws:
      IOException
    • closeDataInputs

      private void closeDataInputs()
    • readHeader

      private CommonSerDes.Header readHeader(InputStream istream) throws IOException
      Reads the header.
      Throws:
      IOException - passthru
    • writeStringInfo

      private void writeStringInfo() throws IOException
      Throws:
      IOException
    • getTgtSeqFromSrcFS

      private int getTgtSeqFromSrcFS(TOP fs)
      For Serialization only. Map src FS to tgt seq number:
        - fs == null -> 0
        - type not in target -> 0
        - otherwise, map src fs._id to tgt seq
      Parameters:
      fs -
      Returns:
      0 or the mapped src id
    • getTgtTs

      TypeSystemImpl getTgtTs()