Class BinaryCasSerDes

java.lang.Object
org.apache.uima.cas.impl.BinaryCasSerDes

public class BinaryCasSerDes extends Object
Binary (mostly non-compressed) CAS deserialization.

The methods in this class were originally part of CASImpl and were moved to this class for v3. Binary non-compressed CAS serialization is in class CASSerializer, but that class uses routines and data structures in this class.

There is one instance of this class per CAS (shared by all views of that CAS), created at the same time the CAS is created. This instance also holds data needed for binary serialization and deserialization. For binary delta deserialization, it uses the data computed on a previous serialization or, if there is none, recomputes it; see the scanAllFSsForBinarySerialization method. The data is computed lazily and is reset on CAS reset.

Lifecycle: created when a CAS (any view) is first created, as part of the shared view data for that CAS. Never re-created.

Data created when non-delta serializing, in case it is needed when delta-deserializing later:
  - xxxAuxAddr2fsa: maps aux arrays to FSs
  - heaps and nextXXXHeapAddrAfterMark (in this case the mark is the end)

Reset: on CAS reset (see clear).

Instance data:
  - baseCas: reference to the corresponding CAS (final)
  - tsi: the CAS's type system impl (can change; each use sets it from the CAS API)
  - heaps: there is 1 main heap and 4 aux heaps (Byte, Short, Long, and String). Some uses of this class require that these be materialized. (May be input or output.)
  - for delta deserialization: 5 ints, representing the first free address in the above 5 heaps, after the mark
  - for delta deserialization: maps for aux arrays representing updatable arrays (not String), from the starting address in the aux array to the corresponding V3 FS object
  • Field Details

    • TRACE_DESER

      private static final boolean TRACE_DESER
    • SOFA_IN_NORMAL_ORDER

      private static final boolean SOFA_IN_NORMAL_ORDER
    • SOFA_AHEAD_OF_NORMAL_ORDER

      private static final boolean SOFA_AHEAD_OF_NORMAL_ORDER
    • arrayLengthFeatOffset

      private static final int arrayLengthFeatOffset
      The offset for the array length cell. An array consists of length+2 number of cells, where the first cell contains the type, the second one the length, and the rest the actual content of the array.
    • arrayContentOffset

      private static final int arrayContentOffset
      The number of cells we need to skip to get to the array contents. That is, if we have an array starting at addr, the first content cell is at addr+arrayContentOffset.
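The array cell layout described for arrayLengthFeatOffset and arrayContentOffset can be sketched as follows. This is an illustration only, not the UIMA implementation: the class and method names are hypothetical, the constant values 1 and 2 are assumptions inferred from the description (type cell, then length cell, then contents), and the heap is modeled as a plain int[] with address 0 reserved for null.

```java
final class ArrayCellLayout {
    // Assumed values, inferred from the field descriptions: cell 0 of an array
    // holds the type, cell 1 the length, and cells 2.. the content.
    static final int ARRAY_LENGTH_FEAT_OFFSET = 1;
    static final int ARRAY_CONTENT_OFFSET = 2;

    /** Reads the length cell of the array starting at addr. */
    static int arrayLength(int[] heap, int addr) {
        return heap[addr + ARRAY_LENGTH_FEAT_OFFSET];
    }

    /** Reads element i of the array starting at addr. */
    static int arrayElement(int[] heap, int addr, int i) {
        return heap[addr + ARRAY_CONTENT_OFFSET + i];
    }

    /** Total number of cells the array occupies: length + 2. */
    static int cellsUsed(int[] heap, int addr) {
        return arrayLength(heap, addr) + ARRAY_CONTENT_OFFSET;
    }

    public static void main(String[] args) {
        // Cell 0 is unused (address 0 stands for null); the array starts at
        // address 1 with a hypothetical type code 7, length 3, then contents.
        int[] heap = {0, 7, 3, 10, 20, 30};
        System.out.println(arrayLength(heap, 1));     // 3
        System.out.println(arrayElement(heap, 1, 2)); // 30
        System.out.println(cellsUsed(heap, 1));       // 5
    }
}
```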
    • baseCas

      private final CASImpl baseCas
    • tsi

      private TypeSystemImpl tsi
    • heap

      Heap heap
    • byteHeap

      ByteHeap byteHeap
    • shortHeap

      ShortHeap shortHeap
    • longHeap

      LongHeap longHeap
    • stringHeap

      StringHeap stringHeap
    • nextHeapAddrAfterMark

      int nextHeapAddrAfterMark
      This field and the four that follow are for delta (de)serialization. They identify the first slot in the aux or string tables for new FS data when a mark is set. The values are read by CASSerializer when doing delta serialization, and set at the end of a matching binary deserialization. When serializing a delta, the heaps store just the delta, so any offsets they yield are adjusted by adding these values; when the delta is deserialized (and augments the existing heaps), the references are then correct with respect to the deserialized heap model.
    • nextStringHeapAddrAfterMark

      int nextStringHeapAddrAfterMark
    • nextByteHeapAddrAfterMark

      int nextByteHeapAddrAfterMark
    • nextShortHeapAddrAfterMark

      int nextShortHeapAddrAfterMark
    • nextLongHeapAddrAfterMark

      int nextLongHeapAddrAfterMark
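The offset adjustment described for the nextXXXHeapAddrAfterMark fields can be illustrated with a small sketch. It is not the UIMA code: the class and method names are hypothetical, and a long[] stands in for one aux heap. The idea shown is only that an offset within the delta-only heap becomes a valid address in the receiver's heap once the "next address after mark" value is added.

```java
import java.util.Arrays;

final class DeltaOffsetSketch {
    /** Converts an offset within the delta-only heap into an address in the full heap. */
    static int toFullHeapAddr(int deltaOffset, int nextAddrAfterMark) {
        return deltaOffset + nextAddrAfterMark;
    }

    /** Appends the delta heap to the existing heap, as the receiver of a delta would. */
    static long[] append(long[] existing, long[] delta) {
        long[] merged = Arrays.copyOf(existing, existing.length + delta.length);
        System.arraycopy(delta, 0, merged, existing.length, delta.length);
        return merged;
    }

    public static void main(String[] args) {
        long[] existing = {0L, 111L, 222L}; // heap content below the mark
        int nextAddrAfterMark = existing.length; // first free slot after the mark
        long[] delta = {333L, 444L}; // only the new values are serialized

        // A reference to delta[1] is written as 1 + 3 = 4 ...
        int serializedRef = toFullHeapAddr(1, nextAddrAfterMark);
        // ... which is the right slot once the delta augments the existing heap.
        long[] merged = append(existing, delta);
        System.out.println(merged[serializedRef]); // 444
    }
}
```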
    • byteAuxAddr2fsa

      private final Int2ObjHashMap<TOP,TOP> byteAuxAddr2fsa
      Map from the starting address of an array of boolean/byte/short/long/double in an aux heap to the V3 FS.
        key = simulated starting address in the aux heap for the array
        value = FS having that array
      When deserializing a modification, it is used to find the V3 FS and the offset in the array to modify.
        - created when serializing (in case a delta deserialization is received back)
        - created when delta deserializing, if not available from a previous serialization
        - updated when delta deserializing
        - reset at the end of delta deserialization, because multiple mods are not supported
    • shortAuxAddr2fsa

      private final Int2ObjHashMap<TOP,TOP> shortAuxAddr2fsa
    • longAuxAddr2fsa

      private final Int2ObjHashMap<TOP,TOP> longAuxAddr2fsa
    • isBeforeV3

      boolean isBeforeV3
      Used to calculate the total heap size.
  • Constructor Details

    • BinaryCasSerDes

      public BinaryCasSerDes(CASImpl baseCAS)
  • Method Details

    • reinit

      public void reinit(CASSerializer ser)
      Deserializer for Java-object serialized instance of CASSerializer.
      Parameters:
      ser - The instance to convert back to a CAS
    • reinit

      void reinit(int[] heapMetadata, int[] heapArray, String[] stringTable, int[] fsIndex, byte[] byteHeapArray, short[] shortHeapArray, long[] longHeapArray)
      This is for deserializing (never delta) from a serialized Java object representation, or maybe from the JNI bridge. Both callers do a CAS reset of some kind.
      Parameters:
      heapMetadata -
      heapArray -
      stringTable -
      fsIndex -
      byteHeapArray -
      shortHeapArray -
      longHeapArray -
    • setupCasFromCasMgrSerializer

      public CASImpl setupCasFromCasMgrSerializer(CASMgrSerializer casMgrSerializer)
    • reinit

      public void reinit(CASCompleteSerializer casCompSer)
      Deserializer for CASCompleteSerializer instances; includes type system and index definitions. Never delta.
      Parameters:
      casCompSer -
    • reinit

      public SerialFormat reinit(InputStream istream) throws CASRuntimeException
      See Blob Format in CASSerializer. This reads in and deserializes CAS data from a stream. Byte swapping may be needed if the blob is from C++, because C++ blob serialization writes data in native byte order. Supports delta deserialization; for that, the csds from the serialization event must be used.
      Parameters:
      istream -
      Returns:
      the format of the input stream detected
      Throws:
      CASRuntimeException - wraps IOException
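The byte swapping mentioned for blobs written by C++ in native byte order can be sketched in plain Java. This is a generic illustration, not the logic used by reinit: the class and method names are hypothetical, and the swap flag is assumed to have been determined elsewhere (how the byte order is detected is not described here).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteSwapSketch {
    /**
     * Decodes 4 bytes as an int. ByteBuffer reads big-endian by default, so data
     * written in little-endian order has to be byte-swapped after reading.
     */
    static int decodeInt(byte[] fourBytes, boolean swap) {
        int v = ByteBuffer.wrap(fourBytes).getInt();
        return swap ? Integer.reverseBytes(v) : v;
    }

    public static void main(String[] args) {
        // Simulate a value written by a little-endian producer.
        byte[] littleEndian = ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(0x01020304)
                .array();
        System.out.println(Integer.toHexString(decodeInt(littleEndian, false))); // 4030201
        System.out.println(Integer.toHexString(decodeInt(littleEndian, true)));  // 1020304
    }
}
```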
    • reinit

      public SerialFormat reinit(CommonSerDes.Header h, InputStream istream, CASMgrSerializer casMgrSerializer, CasLoadMode casLoadMode, BinaryCasSerDes6 f6, AllowPreexistingFS allowPreexistingFS, TypeSystemImpl ts) throws CASRuntimeException
      Deserializes a binary input stream after the header has been read, optionally using an externally provided type system and index specification previously used in compressed form 6 serialization. This reads in and deserializes CAS data from a stream. Byte swapping may be needed if the blob is from C++, because C++ blob serialization writes data in native byte order. The corresponding serialization code is in org.apache.uima.cas.impl.Serialization; also see CasIOUtils.
      Parameters:
      h -
      istream -
      casMgrSerializer - null or the Java object representing the externally supplied type and maybe indexes definition (TSI)
      casLoadMode - DEFAULT or REINIT. REINIT is required with compressed form 6 to reinitialize the CAS's type system and index definition.
      f6 - only used for form 6 where an instance of BinaryCasSerDes6 has been initialized
      allowPreexistingFS - only used for form 6 delta deserialization
      ts - the type system
      Returns:
      the format that was deserialized
      Throws:
      CASRuntimeException - wraps IOException
    • maybeReadEmbeddedTSI

      static CASMgrSerializer maybeReadEmbeddedTSI(CommonSerDes.Header h, DataInputStream dis)
    • binaryDeserialization

      private SerialFormat binaryDeserialization(CommonSerDes.Header h)
      Builds a model of the heap, string and aux heaps. For delta deserialization, this is presumed to be in response to a previous serialization for delta; these heaps can hold just the new data read into them. The V3 feature structures are then recreated or updated from this data.

      Delta CAS supported use case:
        CAS(1) -> binary serialize -> binary deserialize -> CAS(2).
        CAS(2) has mark set (before any new activity in the deserialized CAS).
        CAS(2) has updates: new FSs, and mods to existing ones.
        CAS(2) -> delta binary ser -> delta binary deser -> CAS(1).

      V3 supports the above scenario by retaining some information in CAS(2) at the end of the initial deserialization, including the model heap size/cellsUsed. This is needed to properly do a compatible-with-v2 delta serialization.

      Delta CAS edge use cases not supported: serialize (not binary), then receive delta binary serialization.

      Both v2 and v3 assume that the delta mark is set immediately after binary deserialization; otherwise, subsequent binary deserialization of the delta will fail.

      This method assumes a previous binary serialization was done, and that the following data structures are still valid (i.e. no CAS altering operations have been done):
        - (these are reset: heap, stringHeap, byteHeap, shortHeap, longHeap)
        - csds
        - [string/byte/short/long]auxAddr2fs (for array mods)
        - nextHeapAddrAfterMark, next[string/byte/short/long]HeapAddrAfterMark
      Parameters:
      h - the Header (read by the caller)
      Returns:
      the format of the incoming serialized data
    • setHeapExtents

      void setHeapExtents()
    • updateAuxArrayMods

      int updateAuxArrayMods(CommonSerDes.Reading r, Int2ObjHashMap<TOP,TOP> auxAddr2fsa, Consumer_T_int_withIOException<TOP> setter) throws IOException
      Called 3 times to process non-compressed binary deserialization of aux array modifications: once each for byte/boolean, short, and long/double.
      Returns:
      heapsz (used by caller to do word alignment)
      Throws:
      IOException
    • reinitIndexedFSs

      void reinitIndexedFSs(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr)
      This routine is used by several of the deserializers. Each one may have a different way to go from the addr to the fs, e.g.
        - Compressed form 6: fsStartIndexes.getSrcFsFromTgtSeq(...)
        - plain binary: addr2fs.get(...)

      Steps:
        - gets the number of views and the number of sofas
        - for all sofas: adds them to the index repo in the base index, and registers the sofa
        - ensures the initial view is created
        - for all views: does the view action and updates the documentannotation
      Parameters:
      fsIndex - array of fsRefs and counts, for sofas, and all views
      isDeltaMods - true for calls which are for delta mods; these have adds/removes
    • reinitIndexedFSs

      void reinitIndexedFSs(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr, IntFunction<TOP> getSofaFromAddr)
    • reinitIndexedFSsSofas

      int reinitIndexedFSsSofas(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr)
    • reinitIndexedFSs

      void reinitIndexedFSs(int[] fsIndex, boolean isDeltaMods, IntFunction<TOP> getFsFromAddr, int numViews, int idx)
    • reinitDeltaIndexedFSsInner

      void reinitDeltaIndexedFSsInner(FSIndexRepositoryImpl ir, int[] fsindexes, int idx, int length, boolean isAdd, IntFunction<TOP> getFsFromAddr)
      Given a list of FSs, a starting index, and a length: iterate over the FSs, and add or remove each one from the indexes.
      Parameters:
      ir - index repository
      length - the length
      isAdd - true to add, false to remove
      fsindexes - the list having the fss
      idx - the starting index
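The add-or-remove loop over a slice of addresses, with a caller-supplied function that turns an address into an FS, might look like the sketch below. It is a simplified stand-in, not the UIMA method: a Set plays the role of the index repository, strings play the role of feature structures, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntFunction;

final class DeltaIndexUpdateSketch {
    /**
     * Walks fsAddrs[idx .. idx+length-1], maps each address to its FS using the
     * supplied function, and adds it to or removes it from the index.
     */
    static <T> void applySlice(Set<T> index, int[] fsAddrs, int idx, int length,
                               boolean isAdd, IntFunction<T> getFsFromAddr) {
        for (int i = idx; i < idx + length; i++) {
            T fs = getFsFromAddr.apply(fsAddrs[i]);
            if (isAdd) {
                index.add(fs);
            } else {
                index.remove(fs);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<>();
        index.add("fs3");
        int[] fsAddrs = {1, 2, 3, 4};
        IntFunction<String> getFsFromAddr = addr -> "fs" + addr; // hypothetical lookup

        applySlice(index, fsAddrs, 0, 2, true, getFsFromAddr);  // adds fs1, fs2
        applySlice(index, fsAddrs, 2, 1, false, getFsFromAddr); // removes fs3
        System.out.println(index.contains("fs1") && index.contains("fs2")); // true
        System.out.println(index.contains("fs3")); // false
    }
}
```

Passing the address-to-FS lookup as an IntFunction mirrors the signature above, where each deserializer supplies its own way of resolving addresses.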
    • getSortedArrayAddrsIndex

      private int getSortedArrayAddrsIndex(int[] sortedArrayAddrs, int auxAddr, int sortedArrayAddrsIndex)
      Given an aux address representing an element of an array, find the start of the array. There is a fast path for the case where the array is the same as the one found before; otherwise a binary search of the subsequent ones is done. (The addresses in the serializations are not sorted.)
      Parameters:
      auxAddr - the address being updated
      sortedArrayAddrs - the sorted array of start addresses
      sortedArrayAddrsIndex - the last value found, for the fast path
      Returns:
      index into sortedArrayAddrs
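A lookup of this kind (fast path on the previously found array, binary search otherwise) can be sketched as follows. This is an illustration under assumptions, not the private method itself: the names are hypothetical, and for simplicity the fallback searches the whole sorted array instead of only the entries after the previous hit.

```java
import java.util.Arrays;

final class SortedStartsLookup {
    /**
     * Returns the index in sortedStarts of the array containing auxAddr,
     * or -1 if auxAddr lies before the first start address.
     */
    static int findArrayIndex(int[] sortedStarts, int auxAddr, int lastIndex) {
        // Fast path: auxAddr falls inside the array found by the previous call.
        if (lastIndex >= 0 && lastIndex < sortedStarts.length
                && sortedStarts[lastIndex] <= auxAddr
                && (lastIndex + 1 == sortedStarts.length
                    || auxAddr < sortedStarts[lastIndex + 1])) {
            return lastIndex;
        }
        // Slow path: binary search. A negative result encodes the insertion
        // point as -(insertionPoint) - 1; the containing array is the one
        // just before the insertion point.
        int r = Arrays.binarySearch(sortedStarts, auxAddr);
        return r >= 0 ? r : (-r - 1) - 1;
    }

    public static void main(String[] args) {
        int[] starts = {1, 5, 9}; // arrays start at aux addresses 1, 5 and 9
        System.out.println(findArrayIndex(starts, 6, 0)); // 1 (binary search)
        System.out.println(findArrayIndex(starts, 7, 1)); // 1 (fast path)
        System.out.println(findArrayIndex(starts, 9, 1)); // 2 (exact match)
    }
}
```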
    • getIndexedFSs

      int[] getIndexedFSs(Obj2IntIdentityHashMap<TOP> fs2addr)
      Serialization support
    • addIdsToIntVector

      void addIdsToIntVector(Collection<TOP> fss, IntVector v, Obj2IntIdentityHashMap<TOP> fs2addr)
    • addIdsToIntVector

      void addIdsToIntVector(Set<TOP> fss, IntVector v, Obj2IntIdentityHashMap<TOP> fs2addr)
    • getDeltaIndexedFSs

      int[] getDeltaIndexedFSs(MarkerImpl mark, Obj2IntIdentityHashMap<TOP> fs2addr)
    • createStringTableFromArray

      void createStringTableFromArray(String[] stringTable)
    • getFsSpaceReq

      public static int getFsSpaceReq(TOP fs, TypeImpl type)
    • scanAllFSsForBinarySerialization

      List<TOP> scanAllFSsForBinarySerialization(MarkerImpl mark, CommonSerDesSequential csds)
      Called when serializing a CAS, or when deserializing a delta CAS if the data was not saved from a previous binary serialization (in that case, the scan is done as if it is doing a non-delta serialization).

      Initializes the serialization model for binary serialization in CASSerializer from a CAS. Does 2 scans, each by walking all the reachable FSs:
        - The first one processes all FSs (including, for delta, those below the line).
            - computes the fs to addr map and its inverse, based on the size of each FS
            - done by the CommonSerDesSequential class's "setup" method
        - The second one computes the values of the main and aux heaps and string heaps, except for delta mods.
            - for delta, the heaps only have "new" values that binary serialization will write out as arrays
            - mods are computed from FsChange info and added to the appropriate heaps, later
        - For byte/short/long/string array use, computes the auxAddr2fsa maps. This is used when deserializing delta mod info, to locate the fs to update.

      For delta serialization, the heaps are populated only with the new values. The "nextXXHeapAddrAfterMark" values are added to main heap refs to aux heaps and to string tables, so they are correct after delta deserialization adds the aux heap and string heap info to the existing heaps. This is also done for the main heap refs, so that refs to existing FSs below the line and above the line are treated uniformly.

      The results must be retained for the use case of subsequently receiving back a delta CAS.
      Parameters:
      mark - null, or the mark used by delta CAS to separate the new FSs from the previously existing ones
      csds - the CommonSerDesSequential instance used to record the results of the scan
      Returns:
      null or, for delta, all the found FSs
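The first scan, which derives an fs-to-addr map and its inverse from the size of each FS, can be illustrated with a toy model. This is not the CommonSerDesSequential implementation: the Fs class, the field names, and the choice to start addresses at 1 (leaving 0 for null) are assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

final class AddrAssignmentSketch {
    /** Hypothetical stand-in for a feature structure occupying some heap cells. */
    static final class Fs {
        final String name;
        final int cells;
        Fs(String name, int cells) { this.name = name; this.cells = cells; }
    }

    final Map<Fs, Integer> fs2addr = new IdentityHashMap<>();
    final Map<Integer, Fs> addr2fs = new HashMap<>();
    int heapEnd; // first free address after the last FS

    /** Assigns consecutive addresses to the FSs in order, based on their sizes. */
    static AddrAssignmentSketch scan(List<Fs> fss) {
        AddrAssignmentSketch result = new AddrAssignmentSketch();
        int next = 1; // assumption: address 0 is reserved for null
        for (Fs fs : fss) {
            result.fs2addr.put(fs, next);
            result.addr2fs.put(next, fs);
            next += fs.cells;
        }
        result.heapEnd = next;
        return result;
    }

    public static void main(String[] args) {
        Fs a = new Fs("a", 3), b = new Fs("b", 2), c = new Fs("c", 4);
        AddrAssignmentSketch s = scan(Arrays.asList(a, b, c));
        System.out.println(s.fs2addr.get(b));    // 4
        System.out.println(s.addr2fs.get(6).name); // c
        System.out.println(s.heapEnd);           // 10
    }
}
```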
    • extractFsToV2Heaps

      private void extractFsToV2Heaps(TOP fs, boolean isMarkSet, Obj2IntIdentityHashMap<TOP> fs2addr)
      Called in fs._id order to populate heaps from all FSs. For delta CAS, only called for new above-the-line FSs.
      Parameters:
      fs - Feature Structure to use to set heaps
      isMarkSet - true if mark is set, used to compute first
    • createFSsFromHeaps

      private void createFSsFromHeaps(boolean isDelta, int startPos, CommonSerDesSequential csds)
      Given the deserialized main heap, byte heap, short heap, long heap and string heap,
        a) create the corresponding FSs, populating a
        b) addr2fs map, key = fsAddr, value = FS
        c) auxAddr2fs map, key = aux Array Start addr, value = FS corresponding to that primitive bool/byte/short/long/double array

      For some use cases, the byte / short / long heaps have not yet been initialized.
        - when data is available, deserialization will update the values in the fs directly

      Each new fs created augments the addr2fs map.
        - forward fs refs are put into the deferred update list deferModFs
      Each new fs created which is a Boolean/Byte/Short/Long/Double array updates the auxAddr2fsa map if the aux data is not available (the update is put on a deferred list): deferModByte, deferModShort, deferModLong.
      Each new fs created which has a slot referencing a long/double not yet read in creates a deferred update specifying the fs and the slot, indexed by the addr in the aux table; see deferModStr, deferModLong, deferModDouble.

      Notes:
        - Subtypes of AnnotationBase are created in the right view.
        - DocumentAnnotation - update out-of-indexes
        - FSs that are not subtypes of AnnotationBase are **all** associated with the initial view.
        - Delta serialization: this routine adds just the new (above-the-line) FSs, and augments the existing addr2fs and auxAddr2fsa.
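The deferred handling of forward references can be sketched with a list of Runnable fixups, in the spirit of the fixups4forwardFsRefs parameter seen in the signatures on this page. The sketch is a toy model, not the UIMA code: Node, build, and the refs encoding are hypothetical, and each object has only a single reference slot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ForwardRefFixups {
    /** Hypothetical stand-in for an FS with a single reference slot. */
    static final class Node {
        final int addr;
        Node ref;
        Node(int addr) { this.addr = addr; }
    }

    /**
     * refs[i] is the address referenced by the node at address i + 1 (0 = null).
     * Nodes are created in address order; a reference to a node that does not
     * exist yet is recorded as a fixup and resolved after all nodes are created.
     */
    static Map<Integer, Node> build(int[] refs) {
        Map<Integer, Node> addr2fs = new HashMap<>();
        List<Runnable> fixups = new ArrayList<>();
        for (int i = 0; i < refs.length; i++) {
            Node n = new Node(i + 1);
            addr2fs.put(n.addr, n);
            int target = refs[i];
            if (target == 0) {
                continue; // null reference, nothing to set
            }
            Node existing = addr2fs.get(target);
            if (existing != null) {
                n.ref = existing; // backward (or self) reference: set directly
            } else {
                fixups.add(() -> n.ref = addr2fs.get(target)); // forward reference
            }
        }
        for (Runnable fixup : fixups) {
            fixup.run();
        }
        return addr2fs;
    }

    public static void main(String[] args) {
        // Node 1 refers forward to node 2; node 3 refers back to node 1.
        Map<Integer, Node> nodes = build(new int[] {2, 0, 1});
        System.out.println(nodes.get(1).ref.addr); // 2
        System.out.println(nodes.get(3).ref.addr); // 1
    }
}
```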
    • setFeatOrDefer

      private void setFeatOrDefer(int heapIndex, FeatureImpl feat, List<Runnable> fixups4forwardFsRefs, Consumer<TOP> setter, Int2ObjHashMap<TOP,TOP> addr2fs)
    • heapFeat

      private int heapFeat(int nextFsAddr, FeatureImpl feat)
    • getSofaFromAnnotBase

      private Sofa getSofaFromAnnotBase(int annotBaseAddr, StringHeap stringHeap2, Int2ObjHashMap<TOP,TOP> addr2fs, CommonSerDesSequential csds)
    • makeSofaFromHeap

      private Sofa makeSofaFromHeap(int sofaAddr, StringHeap stringHeap2, CommonSerDesSequential csds, boolean isUnordered)
    • updateHeapSlot

      private void updateHeapSlot(BinaryCasSerDes.BinDeserSupport bds, int slotAddr, int slotValue, Int2ObjHashMap<TOP,TOP> addr2fs)
      Does updates for delta CAS for existing objects. Cases:
        - item in heap-stored array: update the corresponding item in the FS
        - non-ref in feature slot: update the corresponding feature
        - ref (to long/double value, to string)
            - these always reference entries in long/string tables that are new (above the line)
            - these have already been deserialized
        - ref (to main heap): can update this directly

      NOTE: entire aux arrays never have their refs to the aux heaps updated, for arrays of boolean, byte, short, long, double.

      NOTE: Slot updates for FS refs always point to addrs which are in the addr2fs table or are 0 (null), because if the ref is to a new one, those have already been deserialized by this point, and if the ref is to a below-the-line one, those are already put into the addr2fs table.
      Parameters:
      bds - helper data
      slotAddr - the main heap slot addr being updated
      slotValue - the new value
    • updateStringFeature

      private boolean updateStringFeature(TOP fs, FeatureImpl feat, String s, List<Runnable> fixups4forwardFsRefs)
      Parameters:
      fs -
      feat -
      s -
      fixups4forwardFsRefs -
      Returns:
      true if caller needs to do an appropriate fs._setStringValue...
    • getCas

      CASImpl getCas()
    • clearDeltaOffsets

      private void clearDeltaOffsets()
    • clearAuxAddr2fsa

      private void clearAuxAddr2fsa()
    • clear

      public void clear()
      Called by CAS reset.