Class BinaryCasSerDes


  • public class BinaryCasSerDes
    extends java.lang.Object
    Binary (mostly non-compressed) CAS deserialization.

    The methods in this class were originally part of CASImpl and were moved here for v3. Binary non-compressed CAS serialization is in class CASSerializer, but that class uses routines and data structures in this class.

    There is one instance of this class per CAS (shared by all views of that CAS), created at the same time the CAS is created. This instance also holds data needed for binary serialization and deserialization. For binary delta deserialization, it uses the data computed on a previous serialization or, if there is none, recomputes it; see the scanAllFSsForBinarySerialization method. The data is computed lazily and is reset by a CAS reset.

    Lifecycle:
      - Created when a CAS (any view) is first created, as part of the shared view data for that CAS. Never re-created.
      - Data created when non-delta serializing, in case it is needed when delta-deserializing later: the xxxAuxAddr2fsa maps (aux arrays to FSs), the heaps, and nextXXXHeapAddrAfterMark (in this case the mark is the end).
      - Reset:

    Instance data:
      - baseCas: reference to the corresponding CAS (final).
      - tsi: the CAS's type system impl (can change; each use sets it from the CAS API).
      - heaps: there is 1 main heap and 4 aux heaps (Byte, Short, Long, and String). Some uses of this class require these to be materialized. (May be input or output.)
      - For delta deserialization: 5 ints representing the first free address in the above 5 heaps, after the mark.
      - For delta deserialization: maps for aux arrays representing updatable arrays (not String), from the starting address in the aux array to the corresponding V3 FS object.
    • Field Detail

      • SOFA_IN_NORMAL_ORDER

        private static final boolean SOFA_IN_NORMAL_ORDER
        See Also:
        Constant Field Values
      • SOFA_AHEAD_OF_NORMAL_ORDER

        private static final boolean SOFA_AHEAD_OF_NORMAL_ORDER
        See Also:
        Constant Field Values
      • arrayLengthFeatOffset

        private static final int arrayLengthFeatOffset
        The offset for the array length cell. An array consists of length+2 number of cells, where the first cell contains the type, the second one the length, and the rest the actual content of the array.
        See Also:
        Constant Field Values
      • arrayContentOffset

        private static final int arrayContentOffset
        The number of cells we need to skip to get to the array contents. That is, if we have an array starting at addr, the first content cell is at addr+arrayContentOffset.
        See Also:
        Constant Field Values
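        The cell layout described for arrayLengthFeatOffset and arrayContentOffset can be sketched on a plain int[] heap. The constant values 1 and 2 are inferred from the description (type cell, then length cell, then content); the class name ArrayCellLayout, the type code, and the sample heap are made up for illustration and are not taken from this class.

```java
// Illustrative only: models the described array cell layout on a plain int[] heap.
final class ArrayCellLayout {
    // Assumed values, inferred from "type cell, length cell, then content".
    static final int ARRAY_LENGTH_FEAT_OFFSET = 1;
    static final int ARRAY_CONTENT_OFFSET = 2;

    /** Returns the length stored in the second cell of the array at addr. */
    static int arrayLength(int[] heap, int addr) {
        return heap[addr + ARRAY_LENGTH_FEAT_OFFSET];
    }

    /** Returns the i-th content cell of the array at addr. */
    static int arrayElement(int[] heap, int addr, int i) {
        if (i < 0 || i >= arrayLength(heap, addr)) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return heap[addr + ARRAY_CONTENT_OFFSET + i];
    }

    /** Total cells occupied by the array: length + 2. */
    static int cellsUsed(int[] heap, int addr) {
        return arrayLength(heap, addr) + ARRAY_CONTENT_OFFSET;
    }

    public static void main(String[] args) {
        int typeCode = 7; // hypothetical type code
        // Cell 0 is left unused here; a 3-element array starts at address 1.
        int[] heap = {0, typeCode, 3, 10, 20, 30};
        System.out.println(arrayLength(heap, 1));
        System.out.println(arrayElement(heap, 1, 2));
        System.out.println(cellsUsed(heap, 1));
    }
}
```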
      • baseCas

        private final CASImpl baseCas
      • nextHeapAddrAfterMark

        int nextHeapAddrAfterMark
        These fields are for delta (de)serialization; they identify the first slot in the aux or string tables for new FS data when a mark is set. The values are read by CASSerializer when doing delta serialization, and are set at the end of a matching binary deserialization. When serializing a delta, the heaps hold just the delta, so any offsets they yield are adjusted by adding these values; when the delta is deserialized (and its contents augment the existing heaps), the references are then correct with respect to the deserialized heap model.
      • nextStringHeapAddrAfterMark

        int nextStringHeapAddrAfterMark
      • nextByteHeapAddrAfterMark

        int nextByteHeapAddrAfterMark
      • nextShortHeapAddrAfterMark

        int nextShortHeapAddrAfterMark
      • nextLongHeapAddrAfterMark

        int nextLongHeapAddrAfterMark
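        The offset adjustment that the nextXxxHeapAddrAfterMark fields exist for can be sketched as follows. This is a minimal illustration under the assumption that a heap is a plain array and that the delta is appended directly after the existing cells; DeltaOffsetSketch and its methods are hypothetical names, not part of this class.

```java
import java.util.Arrays;

// Illustrative sketch only; not the UIMA implementation.
final class DeltaOffsetSketch {
    /** Rebases an offset within the delta-only heap to an address in the combined heap. */
    static int toCombinedAddr(int deltaOffset, int nextHeapAddrAfterMark) {
        return deltaOffset + nextHeapAddrAfterMark;
    }

    /** Appends the delta cells after the existing cells, as a delta deserializer might. */
    static long[] append(long[] existing, long[] delta) {
        long[] combined = Arrays.copyOf(existing, existing.length + delta.length);
        System.arraycopy(delta, 0, combined, existing.length, delta.length);
        return combined;
    }

    public static void main(String[] args) {
        long[] existing = {0L, 11L, 22L};                // receiver's long heap before the delta
        int nextLongHeapAddrAfterMark = existing.length; // first free address after the mark
        long[] delta = {33L, 44L};                       // the delta holds only the new cells
        int addr = toCombinedAddr(1, nextLongHeapAddrAfterMark);
        // The rebased address selects the same value in the combined heap.
        System.out.println(append(existing, delta)[addr] == delta[1]);
    }
}
```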
      • byteAuxAddr2fsa

        private final Int2ObjHashMap<TOP,​TOP> byteAuxAddr2fsa
        Map from the aux-heap starting address of an array of boolean/byte/short/long/double to the V3 FS.
          key = simulated starting address in the aux heap for the array
          value = FS having that array
        When deserializing a modification, this map is used to find the V3 FS and the offset in the array to modify.
          - Created when serializing (in case a delta deserialization is received back).
          - Created when delta deserializing, if not available from a previous serialization.
          - Updated when delta deserializing.
          - Reset at the end of delta deserialization, because multiple mods are not supported.
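        A rough sketch of how such a start-address map could be built and used follows. It makes several assumptions for illustration: java.util.Map stands in for Int2ObjHashMap, a byte[] stands in for the V3 FS holding the array, and aux address 0 is treated as reserved. AuxAddrMapSketch and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: Map stands in for Int2ObjHashMap, byte[] for the V3 FS.
final class AuxAddrMapSketch {
    private final Map<Integer, byte[]> auxAddr2fsa = new HashMap<>();
    private int nextFree = 1; // assumption: aux address 0 is reserved

    /** Records the array at its simulated aux-heap start address and returns that address. */
    int register(byte[] array) {
        int start = nextFree;
        auxAddr2fsa.put(start, array);
        nextFree += array.length;
        return start;
    }

    /** Applies a modification addressed by array start plus offset within the array. */
    void applyMod(int arrayStart, int offset, byte newValue) {
        byte[] target = auxAddr2fsa.get(arrayStart);
        if (target == null) {
            throw new IllegalArgumentException("no array starts at " + arrayStart);
        }
        target[offset] = newValue;
    }

    public static void main(String[] args) {
        AuxAddrMapSketch sketch = new AuxAddrMapSketch();
        byte[] first = {1, 2, 3};
        byte[] second = {4, 5};
        sketch.register(first);
        int secondStart = sketch.register(second);
        sketch.applyMod(secondStart, 1, (byte) 9); // modifies second[1] only
        System.out.println(second[1]);
    }
}
```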
      • isBeforeV3

        boolean isBeforeV3
        used to calculate total heap size
    • Constructor Detail

      • BinaryCasSerDes

        public BinaryCasSerDes​(CASImpl baseCAS)
    • Method Detail

      • reinit

        public void reinit​(CASSerializer ser)
        Deserializer for Java-object serialized instance of CASSerializer.
        Parameters:
        ser - The instance to convert back to a CAS
      • reinit

        void reinit​(int[] heapMetadata,
                    int[] heapArray,
                    java.lang.String[] stringTable,
                    int[] fsIndex,
                    byte[] byteHeapArray,
                    short[] shortHeapArray,
                    long[] longHeapArray)
        Deserializes (never delta) from a serialized Java object representation, or possibly from the JNI bridge. Both callers do a CAS reset of some kind.
        Parameters:
        heapMetadata - -
        heapArray - -
        stringTable - -
        fsIndex - -
        byteHeapArray - -
        shortHeapArray - -
        longHeapArray - -
      • setupCasFromCasMgrSerializer

        public CASImpl setupCasFromCasMgrSerializer​(CASMgrSerializer casMgrSerializer)
      • reinit

        public void reinit​(CASCompleteSerializer casCompSer)
        Deserializer for CASCompleteSerializer instances, which include the type system and index definitions. Never delta.
        Parameters:
        casCompSer - -
      • reinit

        public SerialFormat reinit​(java.io.InputStream istream)
                            throws CASRuntimeException
        See Blob Format in CASSerializer. This reads in and deserializes CAS data from a stream. Byte swapping may be needed if the blob is from C++, because C++ blob serialization writes data in native byte order. Supports delta deserialization; for that, the csds from the serialization event must be used.
        Parameters:
        istream - -
        Returns:
        the detected format of the input stream
        Throws:
        CASRuntimeException - wraps IOException
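        The byte swapping mentioned above can be illustrated with standard JDK calls. This is only a sketch of the general technique, not the code this class uses; ByteSwapSketch and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: shows why a blob written in native (little-endian) order
// needs swapping when read by a big-endian reader.
final class ByteSwapSketch {
    /** Reads the first 4 bytes of buf as an int using the given byte order. */
    static int readInt(byte[] buf, boolean littleEndian) {
        ByteBuffer bb = ByteBuffer.wrap(buf);
        bb.order(littleEndian ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN);
        return bb.getInt();
    }

    /** Corrects a value that was read with the wrong byte order. */
    static int swap(int readWithWrongOrder) {
        return Integer.reverseBytes(readWithWrongOrder);
    }

    public static void main(String[] args) {
        byte[] littleEndianOne = {0x01, 0x00, 0x00, 0x00}; // the int 1, low byte first
        int wrong = readInt(littleEndianOne, false);       // read as big-endian
        System.out.println(swap(wrong) == readInt(littleEndianOne, true));
    }
}
```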
      • reinit

        public SerialFormat reinit​(CommonSerDes.Header h,
                                   java.io.InputStream istream,
                                   CASMgrSerializer casMgrSerializer,
                                   CasLoadMode casLoadMode,
                                   BinaryCasSerDes6 f6,
                                   AllowPreexistingFS allowPreexistingFS,
                                   TypeSystemImpl ts)
                            throws CASRuntimeException
        Deserialize a binary input stream after its header has been read, optionally using an externally provided type system and index specification that was previously used in compressed form 6 serialization. This reads in and deserializes CAS data from a stream. Byte swapping may be needed if the blob is from C++, because C++ blob serialization writes data in native byte order. The corresponding serialization code is in org.apache.uima.cas.impl.Serialization; see also CasIOUtils.
        Parameters:
        h - -
        istream - -
        casMgrSerializer - null or the Java object representing the externally supplied type and maybe indexes definition (TSI)
        casLoadMode - DEFAULT or REINIT. REINIT is required with compressed form 6 to reinitialize the CAS's type system and index definition.
        f6 - only used for form 6 where an instance of BinaryCasSerDes6 has been initialized
        allowPreexistingFS - only used for form 6 delta deserialization
        ts - the type system
        Returns:
        the format that was deserialized
        Throws:
        CASRuntimeException - wraps IOException
      • binaryDeserialization

        private SerialFormat binaryDeserialization​(CommonSerDes.Header h)
        Builds a model of the main heap, the string heap, and the aux heaps, then recreates or updates V3 feature structures from this data. For delta deserialization, which is presumed to be in response to a previous serialization for delta, these heaps can hold just the new data read in.

        Supported delta CAS use case:
          - CAS(1) -> binary serialize -> binary deserialize -> CAS(2).
          - CAS(2) has the mark set (before any new activity in the deserialized CAS).
          - CAS(2) has updates: new FSs, and mods to existing ones.
          - CAS(2) -> delta binary ser -> delta binary deser -> CAS(1).

        V3 supports the above scenario by retaining some information in CAS(2) at the end of the initial deserialization, including the model heap size/cellsUsed. This is needed to properly do a compatible-with-v2 delta serialization.

        Delta CAS edge use cases not supported:
          - serialize (not binary), then receive a delta binary serialization.

        Both v2 and v3 assume that the delta mark is set immediately after binary deserialization; otherwise, subsequent binary deserialization of the delta will fail.

        This method assumes a previous binary serialization was done, and that the following data structures are still valid (i.e. no CAS-altering operations have been done). (These are reset: heap, stringHeap, byteHeap, shortHeap, longHeap.)
          - csds
          - [string/byte/short/long]auxAddr2fs (for array mods)
          - nextHeapAddrAfterMark, next[string/byte/short/long]HeapAddrAfterMark
        Parameters:
        h - the Header (read by the caller)
        Returns:
        the format of the incoming serialized data
      • setHeapExtents

        void setHeapExtents()
      • updateAuxArrayMods

        int updateAuxArrayMods​(CommonSerDes.Reading r,
                               Int2ObjHashMap<TOP,​TOP> auxAddr2fsa,
                               Consumer_T_int_withIOException<TOP> setter)
                        throws java.io.IOException
        Called 3 times to process non-compressed binary deserialization of aux array modifications: once each for byte/boolean, short, and long/double.
        Returns:
        heapsz (used by caller to do word alignment)
        Throws:
        java.io.IOException
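        The word alignment that the returned heapsz is used for can be sketched as a padding computation. This assumes a 4-byte word and that padding is counted in bytes; both are assumptions made for illustration, and WordAlignSketch is a hypothetical helper rather than code from this class.

```java
// Illustrative only: computes how many pad bytes follow a run of fixed-size elements
// so that the next item starts on a word boundary.
final class WordAlignSketch {
    static final int WORD_BYTES = 4; // assumption: a word is 4 bytes

    /** Returns the number of pad bytes needed after count elements of elementBytes each. */
    static int padding(int count, int elementBytes) {
        int used = (count * elementBytes) % WORD_BYTES;
        return (WORD_BYTES - used) % WORD_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(padding(5, 1)); // five bytes
        System.out.println(padding(3, 2)); // three shorts
        System.out.println(padding(7, 8)); // seven longs
    }
}
```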
      • reinitIndexedFSs

        void reinitIndexedFSs​(int[] fsIndex,
                              boolean isDeltaMods,
                              java.util.function.IntFunction<TOP> getFsFromAddr)
        This routine is used by several of the deserializers. Each one may have a different way to go from the addr to the fs, e.g.
          - Compressed form 6: fsStartIndexes.getSrcFsFromTgtSeq(...)
          - plain binary: addr2fs.get(...)
        It gets the number of views and the number of sofas.
        For all sofas, it
          - adds them to the index repo in the base index
          - registers the sofa
          - ensures the initial view is created
        For all views, it
          - does the view action and updates the document annotation
        Parameters:
        fsIndex - array of fsRefs and counts, for sofas, and all views
        isDeltaMods - true for calls which are for delta mods - these have adds/removes
      • reinitIndexedFSs

        void reinitIndexedFSs​(int[] fsIndex,
                              boolean isDeltaMods,
                              java.util.function.IntFunction<TOP> getFsFromAddr,
                              java.util.function.IntFunction<TOP> getSofaFromAddr)
      • reinitIndexedFSsSofas

        int reinitIndexedFSsSofas​(int[] fsIndex,
                                  boolean isDeltaMods,
                                  java.util.function.IntFunction<TOP> getFsFromAddr)
      • reinitIndexedFSs

        void reinitIndexedFSs​(int[] fsIndex,
                              boolean isDeltaMods,
                              java.util.function.IntFunction<TOP> getFsFromAddr,
                              int numViews,
                              int idx)
      • reinitDeltaIndexedFSsInner

        void reinitDeltaIndexedFSsInner​(FSIndexRepositoryImpl ir,
                                        int[] fsindexes,
                                        int idx,
                                        int length,
                                        boolean isAdd,
                                        java.util.function.IntFunction<TOP> getFsFromAddr)
        Given a list of FSs, a starting index, and a length, iterates over the FSs and adds each one to, or removes it from, the indexes.
        Parameters:
        ir - index repository
        fsindexes - the list having the FSs
        idx - the starting index
        length - the length
        isAdd - true to add, false to remove
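        The add-or-remove loop over a slice of references can be sketched as follows. A java.util.Set stands in for the index repository and strings stand in for FSs; these stand-ins and the name DeltaIndexSketch are assumptions for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.IntFunction;

// Illustrative only: a Set stands in for FSIndexRepositoryImpl, T for TOP.
final class DeltaIndexSketch {
    /** Adds to or removes from index the items referenced by refs[idx .. idx+length-1]. */
    static <T> void applySlice(Set<T> index, int[] refs, int idx, int length,
                               boolean isAdd, IntFunction<T> getFromAddr) {
        for (int i = idx; i < idx + length; i++) {
            T item = getFromAddr.apply(refs[i]);
            if (isAdd) {
                index.add(item);
            } else {
                index.remove(item);
            }
        }
    }

    public static void main(String[] args) {
        Set<String> index = new LinkedHashSet<>();
        index.add("fs10");
        int[] refs = {99, 20, 30, 10};
        applySlice(index, refs, 1, 2, true, addr -> "fs" + addr);  // adds fs20, fs30
        applySlice(index, refs, 3, 1, false, addr -> "fs" + addr); // removes fs10
        System.out.println(index);
    }
}
```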
      • getSortedArrayAddrsIndex

        private int getSortedArrayAddrsIndex​(int[] sortedArrayAddrs,
                                             int auxAddr,
                                             int sortedArrayAddrsIndex)
        Given an aux address representing an element of an array, finds the start of the array. There is a fast path for the case where it is the same array as before; otherwise a binary search is done (the addresses in the serializations are not sorted).
        Parameters:
        sortedArrayAddrs - the sorted array of start addresses
        auxAddr - the address being updated
        sortedArrayAddrsIndex - the last value found, for the fast path
        Returns:
        index into sortedArrayAddrs
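        The lookup described for getSortedArrayAddrsIndex (a fast path for the previously found array, then a binary search) might look roughly like the sketch below. It uses java.util.Arrays.binarySearch and is an independent illustration, not the method's actual body; SortedStartsSketch and findStartIndex are hypothetical names.

```java
import java.util.Arrays;

// Illustrative only: finds which array (by sorted start address) contains an aux address.
final class SortedStartsSketch {
    /** Returns the index of the greatest start address that is <= auxAddr. */
    static int findStartIndex(int[] sortedArrayAddrs, int auxAddr, int lastIndex) {
        // Fast path: the address falls inside the same array as the previous lookup.
        if (lastIndex >= 0 && lastIndex < sortedArrayAddrs.length
                && sortedArrayAddrs[lastIndex] <= auxAddr
                && (lastIndex + 1 == sortedArrayAddrs.length
                    || auxAddr < sortedArrayAddrs[lastIndex + 1])) {
            return lastIndex;
        }
        int r = Arrays.binarySearch(sortedArrayAddrs, auxAddr);
        if (r >= 0) {
            return r; // auxAddr is exactly the start of an array
        }
        int idx = -r - 2; // insertion point minus one
        if (idx < 0) {
            throw new IllegalArgumentException("address precedes all arrays: " + auxAddr);
        }
        return idx;
    }

    public static void main(String[] args) {
        int[] starts = {1, 4, 6};
        int idx = findStartIndex(starts, 5, -1);
        int offsetInArray = 5 - starts[idx];
        System.out.println(idx + " " + offsetInArray);
    }
}
```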
      • createStringTableFromArray

        void createStringTableFromArray​(java.lang.String[] stringTable)
      • getFsSpaceReq

        public static int getFsSpaceReq​(TOP fs,
                                        TypeImpl type)
      • scanAllFSsForBinarySerialization

        java.util.List<TOP> scanAllFSsForBinarySerialization​(MarkerImpl mark,
                                                             CommonSerDesSequential csds)
        Called when serializing a CAS, or when deserializing a delta CAS if the data was not saved from a previous binary serialization (in that case, the scan is done as if it were doing a non-delta serialization).

        Initializes the serialization model for binary serialization in CASSerializer from a CAS, doing 2 scans, each by walking all the reachable FSs.
          - The first one processes all FSs (including, for delta, those below the line).
            -- It computes the fs-to-addr map and its inverse, based on the size of each FS.
            -- This is done by the CommonSerDesSequential class's "setup" method.
          - The second one computes the values of the main heap, the aux heaps, and the string heap, except for delta mods.
            -- For delta, the heaps only have "new" values that binary serialization will write out as arrays.
            --- Mods are computed from FsChange info and added to the appropriate heaps later.
          - For byte/short/long/string array use, it computes the auxAddr2fsa maps. These are used when deserializing delta mod info, to locate the FS to update.

        For delta serialization, the heaps are populated only with the new values.
          - The "nextXXHeapAddrAfterMark" values are added to main heap refs to aux heaps and to string tables, so they are correct after delta deserialization adds the aux heap and string heap info to the existing heaps. This is also done for the main heap refs, so that refs to existing FSs below the line and above the line are treated uniformly.

        The results must be retained for the use case of subsequently receiving back a delta CAS.
        Parameters:
        cs - the CASSerializer instance used to record the results of the scan
        mark - null or the mark to use for separating the new from the previously existing; used by delta CAS.
        Returns:
        null or for delta, all the found FSs
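        The first scan's address assignment (an fs-to-addr map and its inverse, based on the size of each FS) can be sketched as below. The sketch assumes addresses start at 1 with 0 reserved for null, uses plain JDK maps, and takes the size of each FS from a caller-supplied function; AddrAssignSketch is a hypothetical class and does not reproduce CommonSerDesSequential.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

// Illustrative only: assigns sequential heap addresses to objects by their size in cells.
final class AddrAssignSketch<T> {
    final Map<T, Integer> fs2addr = new IdentityHashMap<>();
    final Map<Integer, T> addr2fs = new HashMap<>();
    int heapEnd = 1; // assumption: address 0 is reserved for null

    /** Walks the FSs in order, giving each the next free address. */
    void setup(List<T> fss, ToIntFunction<T> spaceReq) {
        for (T fs : fss) {
            fs2addr.put(fs, heapEnd);
            addr2fs.put(heapEnd, fs);
            heapEnd += spaceReq.applyAsInt(fs);
        }
    }

    public static void main(String[] args) {
        // int[] stands in for an FS; its size is taken as one type cell plus one cell per slot.
        int[] a = new int[2];
        int[] b = new int[3];
        AddrAssignSketch<int[]> sketch = new AddrAssignSketch<>();
        sketch.setup(Arrays.asList(a, b), fs -> fs.length + 1);
        System.out.println(sketch.fs2addr.get(b) + " " + sketch.heapEnd);
    }
}
```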
      • extractFsToV2Heaps

        private void extractFsToV2Heaps​(TOP fs,
                                        boolean isMarkSet,
                                        Obj2IntIdentityHashMap<TOP> fs2addr)
        Called in fs._id order to populate heaps from all FSs. For delta CAS, only called for new above-the-line FSs.
        Parameters:
        fs - Feature Structure to use to set heaps
        isMarkSet - true if mark is set, used to compute first
      • createFSsFromHeaps

        private void createFSsFromHeaps​(boolean isDelta,
                                        int startPos,
                                        CommonSerDesSequential csds)
        Given the deserialized main heap, byte heap, short heap, long heap, and string heap,
          a) creates the corresponding FSs, populating
          b) the addr2fs map: key = fsAddr, value = FS
          c) the auxAddr2fs map: key = aux array start addr, value = FS corresponding to that primitive bool/byte/short/long/double array

        For some use cases, the byte / short / long heaps have not yet been initialized.
          - When the data is available, deserialization will update the values in the FS directly.

        Each new FS created augments the addr2fs map.
          - Forward FS refs are put into a deferred update list: deferModFs

        Each new FS created which is a Boolean/Byte/Short/Long/Double array updates the auxAddr2fsa map if the aux data is not available (the update is put on a deferred list): deferModByte, deferModShort, deferModLong

        Each new FS created which has a slot referencing a long/double not yet read in creates a deferred update specifying the FS and the slot, indexed by the addr in the aux table. See deferModStr, deferModLong, deferModDouble.

        Notes:
          - Subtypes of AnnotationBase are created in the right view.
          - DocumentAnnotation: update out-of-indexes.
          - FSs that are not subtypes of AnnotationBase are **all** associated with the initial view.

        Delta serialization: this routine adds just the new (above-the-line) FSs, and augments the existing addr2fs and auxAddr2fsa maps.
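        The handling of forward FS references (a reference to an FS that has not been created yet is deferred and applied later) can be sketched with a list of Runnable fixups. The record format {addr, refAddr}, the Node class, and ForwardRefSketch itself are assumptions made for illustration; the real method works from the heap model instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: builds objects in address order and defers forward references.
final class ForwardRefSketch {
    static final class Node {
        final int addr;
        Node next; // the single reference slot of this stand-in FS
        Node(int addr) { this.addr = addr; }
    }

    /** Each record is {addr, refAddr}; refAddr 0 means a null reference. */
    static Map<Integer, Node> build(int[][] records) {
        Map<Integer, Node> addr2fs = new HashMap<>();
        List<Runnable> fixups = new ArrayList<>();
        for (int[] rec : records) {
            Node n = new Node(rec[0]);
            addr2fs.put(rec[0], n);
            int ref = rec[1];
            if (ref == 0) {
                continue; // null reference, nothing to set
            }
            Node target = addr2fs.get(ref);
            if (target != null) {
                n.next = target; // backward (or self) reference: set directly
            } else {
                fixups.add(() -> n.next = addr2fs.get(ref)); // forward reference: defer
            }
        }
        for (Runnable fixup : fixups) {
            fixup.run(); // every target exists by now
        }
        return addr2fs;
    }

    public static void main(String[] args) {
        Map<Integer, Node> fss = build(new int[][] {{1, 3}, {3, 1}, {5, 0}});
        System.out.println(fss.get(1).next.addr);
    }
}
```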
      • setFeatOrDefer

        private void setFeatOrDefer​(int heapIndex,
                                    FeatureImpl feat,
                                    java.util.List<java.lang.Runnable> fixups4forwardFsRefs,
                                    java.util.function.Consumer<TOP> setter,
                                    Int2ObjHashMap<TOP,​TOP> addr2fs)
      • heapFeat

        private int heapFeat​(int nextFsAddr,
                             FeatureImpl feat)
      • updateHeapSlot

        private void updateHeapSlot​(BinaryCasSerDes.BinDeserSupport bds,
                                    int slotAddr,
                                    int slotValue,
                                    Int2ObjHashMap<TOP,​TOP> addr2fs)
        Does updates for delta CAS for existing objects. Cases:
          - item in a heap-stored array: update the corresponding item in the FS
          - non-ref in a feature slot: update the corresponding feature
          - ref (to a long/double value, or to a string)
            -- these always reference entries in the long/string tables that are new (above the line)
            -- these have already been deserialized
          - ref (to the main heap): can update this directly

        NOTE: entire aux arrays never have their refs to the aux heaps updated, for arrays of boolean, byte, short, long, double.

        NOTE: Slot updates for FS refs always point to addrs which are in the addr2fs table or are 0 (null), because if the ref is to a new FS, it has already been deserialized by this point, and if the ref is to a below-the-line FS, it has already been put into the addr2fs table.
        Parameters:
        bds - helper data
        slotAddr - the main heap slot addr being updated
        slotValue - the new value
      • updateStringFeature

        private boolean updateStringFeature​(TOP fs,
                                            FeatureImpl feat,
                                            java.lang.String s,
                                            java.util.List<java.lang.Runnable> fixups4forwardFsRefs)
        Parameters:
        fs -
        feat -
        s -
        fixups4forwardFsRefs -
        Returns:
        true if caller needs to do an appropriate fs._setStringValue...
      • clearDeltaOffsets

        private void clearDeltaOffsets()
      • clearAuxAddr2fsa

        private void clearAuxAddr2fsa()
      • clear

        public void clear()
        Called by CAS reset.