Class CASSerializer

  • All Implemented Interfaces:
    java.io.Serializable

    public class CASSerializer
    extends java.lang.Object
    implements java.io.Serializable
    This object has two purposes:
      - It can hold a collection of individually Java-object-serializable objects representing a CAS plus the list of FSs indexed in the CAS.
      - It has special methods (versions of addCAS) that perform a custom binary serialization (no compression) of a CAS plus the lists of its indexed FSs.

    One use of this class follows this form:
      1) Create an instance of this class.
      2) Add a CAS to it (via the addCAS methods).
      3) Use the instance as the argument to anObjectOutputStream.writeObject(anInstanceOfThisClass).
    In UIMA this is done in the SerializationUtils class; it appears to be used for Vinci service adapters.

    There are also custom serialization methods that serialize to output streams.

    The serialized data is in one of two formats: normal Java object serialization, or custom binary serialization. The custom binary serialization has two forms:
      - full: the entire CAS
      - delta: only the differences from a previous "mark" are serialized
    This class produces only the uncompressed forms of custom binary serialization.

    This class is for internal use. Some of the serialized formats are readable by the C++ implementation and are used for efficiently transferring CASes between Java frameworks and other ones. Others are used with Vinci to communicate with remote annotators.

    To serialize the type definition and index specifications for a CAS, see CASMgrSerializer.
    See Also:
    CASMgrSerializer, Serialized Form
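The three-step Java object serialization pattern described above can be sketched with the standard library alone. This is a minimal illustration, not UIMA code: `CasDataHolder` is a hypothetical stand-in for CASSerializer (it only borrows three of the public field names), and filling its arrays by hand stands in for a call to addCAS.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class HolderRoundTrip {
    /** Hypothetical stand-in for CASSerializer: a Serializable holder of plain arrays. */
    static class CasDataHolder implements Serializable {
        private static final long serialVersionUID = 1L;
        public int[] heapArray;
        public String[] stringTable;
        public int[] fsIndex;
    }

    /** Step 3 of the pattern: hand the filled holder to ObjectOutputStream.writeObject. */
    static byte[] serialize(CasDataHolder holder) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(holder);
        }
        return bytes.toByteArray();
    }

    /** The receiving side reads the holder back with ObjectInputStream.readObject. */
    static CasDataHolder deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (CasDataHolder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CasDataHolder holder = new CasDataHolder();          // 1) create an instance
        holder.heapArray = new int[] {0, 7, 42};             // 2) stands in for addCAS(...)
        holder.stringTable = new String[] {null, "token"};
        holder.fsIndex = new int[] {1, 1};
        CasDataHolder copy = deserialize(serialize(holder)); // 3) writeObject, then read back
        System.out.println(copy.heapArray.length + " " + copy.stringTable[1]);
    }
}
```

Because the holder contains only arrays of primitives and strings, default Java serialization handles it without custom writeObject logic.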
    • Field Detail

      • heapArray

        public int[] heapArray
      • heapMetaData

        public int[] heapMetaData
      • stringTable

        public java.lang.String[] stringTable
      • fsIndex

        public int[] fsIndex
      • byteHeapArray

        public byte[] byteHeapArray
      • shortHeapArray

        public short[] shortHeapArray
      • longHeapArray

        public long[] longHeapArray
    • Constructor Detail

      • CASSerializer

        public CASSerializer()
        Constructor for CASSerializer.
    • Method Detail

      • addNoMetaData

        public void addNoMetaData​(CASImpl casImpl)
        Serialize CAS data without heap-internal meta data. Currently used for serialization to C++.
        Parameters:
        casImpl - The CAS to be serialized.
      • addCAS

        public void addCAS​(CASImpl cas)
        Add the CAS to be serialized. Note that the implementation class is required here; the interface is not enough.
        Parameters:
        cas - The CAS to be serialized.
      • addCAS

        public void addCAS​(CASImpl cas,
                           boolean addMetaData)
        Add the CAS to be serialized.
        Parameters:
        cas - The CAS to be serialized.
        addMetaData - true to include metadata
      • addTsiCAS

        void addTsiCAS​(CASImpl cas,
                       java.io.OutputStream ostream)
      • addCAS

        public void addCAS​(CASImpl cas,
                           java.io.OutputStream ostream)
        Serializes the CAS data and writes it to the output stream.

        Blob Format
        ---------------------------------------------------------------------
        Element Size   Number of   Description
        (bytes)        Elements
        ------------   ---------   --------------------------------
        4              1           Blob key = "UIMA" in utf-8
        4              1           Version (currently = 1)
        4              1           size of 32-bit FS Heap array = s32H
        4              s32H        32-bit FS heap array
        4              1           size of 16-bit string Heap array = sSH
        2              sSH         16-bit string heap array
        4              1           size of string Ref Heap array = sSRH
        4              2*sSRH      string ref offsets and lengths
        4              1           size of FS index array = sFSI
        4              sFSI        FS index array
        4              1           size of 8-bit Heap array = s8H
        1              s8H         8-bit Heap array
        4              1           size of 16-bit Heap array = s16H
        2              s16H        16-bit Heap array
        4              1           size of 64-bit Heap array = s64H
        8              s64H        64-bit Heap array
        ---------------------------------------------------------------------
        Parameters:
        cas - The CAS to be serialized.
        ostream - The output stream.
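To make the blob layout concrete, here is a minimal sketch that writes the fields in the order listed in the table, using `java.io.DataOutputStream`. It follows the table literally and is not the UIMA implementation: the class name, method name, and parameters are hypothetical, it assumes big-endian output (what `DataOutputStream` produces), and it does not reproduce any alignment padding or byte-order handling the real serializer may perform.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class FullBlobSketch {
    /**
     * Writes each section as "element count, then elements", in table order.
     * stringOffsetsAndLengths holds 2*sSRH ints (an offset and a length per string).
     */
    static byte[] writeFullBlob(int[] fsHeap, char[] stringHeap, int[] stringOffsetsAndLengths,
                                int[] fsIndex, byte[] byteHeap, short[] shortHeap,
                                long[] longHeap) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);

        out.write("UIMA".getBytes(StandardCharsets.UTF_8)); // 4-byte blob key
        out.writeInt(1);                                    // version

        out.writeInt(fsHeap.length);                        // s32H
        for (int cell : fsHeap) out.writeInt(cell);

        out.writeInt(stringHeap.length);                    // sSH
        for (char c : stringHeap) out.writeChar(c);         // 2 bytes per element

        out.writeInt(stringOffsetsAndLengths.length / 2);   // sSRH
        for (int v : stringOffsetsAndLengths) out.writeInt(v);

        out.writeInt(fsIndex.length);                       // sFSI
        for (int v : fsIndex) out.writeInt(v);

        out.writeInt(byteHeap.length);                      // s8H
        out.write(byteHeap);

        out.writeInt(shortHeap.length);                     // s16H
        for (short s : shortHeap) out.writeShort(s);

        out.writeInt(longHeap.length);                      // s64H
        for (long l : longHeap) out.writeLong(l);

        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] blob = writeFullBlob(new int[] {0, 5, 6}, "hi".toCharArray(), new int[] {0, 2},
                new int[] {1}, new byte[] {9}, new short[0], new long[] {77L});
        System.out.println("blob length in bytes: " + blob.length);
    }
}
```

Every section is length-prefixed, so a reader can consume the blob in a single forward pass without knowing the heap sizes in advance.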
      • addCAS

        public void addCAS​(CASImpl cas,
                           java.io.OutputStream ostream,
                           boolean includeTsi)
      • addCAS

        public void addCAS​(CASImpl cas,
                           java.io.OutputStream ostream,
                           Marker trackingMark)
        Serializes only new and modified FS and index operations made after the tracking mark is created. Serializes CAS data in the binary Delta format described below and writes it to the output stream.

        ElementSize  NumberOfElements  Description
        -----------  ----------------  ---------------------------------------------------------
        4            1                 Blob key = "UIMA" in utf-8 (byte order flag)
        4            1                 Version (1 = complete cas, 2 = delta cas)
        4            1                 size of 32-bit heap array = s32H
        4            s32H              32-bit FS heap array (new elements)
        4            1                 size of 16-bit string Heap array = sSH
        2            sSH               16-bit string heap array (new strings)
        4            1                 size of string Ref Heap array = sSRH
        4            2*sSRH            string ref offsets and lengths (for new strings)
        4            1                 number of modified, preexisting 32-bit FS heap elements = sM32H
        4            2*sM32H           32-bit heap offset and value (preexisting cells modified)
        4            1                 size of FS index array = sFSI
        4            sFSI              FS index array in Delta format
        4            1                 size of 8-bit Heap array = s8H
        1            s8H               8-bit Heap array (new elements)
        4            1                 size of 16-bit Heap array = s16H
        2            s16H              16-bit Heap array (new elements)
        4            1                 size of 64-bit Heap array = s64H
        8            s64H              64-bit Heap array (new elements)
        4            1                 number of modified, preexisting 8-bit heap elements = sM8H
        4            sM8H              8-bit heap offsets (preexisting cells modified)
        1            sM8H              8-bit heap values (preexisting cells modified)
        4            1                 number of modified, preexisting 16-bit heap elements = sM16H
        4            sM16H             16-bit heap offsets (preexisting cells modified)
        2            sM16H             16-bit heap values (preexisting cells modified)
        4            1                 number of modified, preexisting 64-bit heap elements = sM64H
        4            sM64H             64-bit heap offsets (preexisting cells modified)
        8            sM64H             64-bit heap values (preexisting cells modified)
        Parameters:
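        cas - the CAS to be serialized
        ostream - the output stream
        trackingMark - the marker; only changes made after it was created are serialized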
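Since the full and delta blobs share the same two leading fields, a consumer can tell them apart from the first eight bytes. The sketch below reads the key and version with `java.io.DataInputStream`. It is an illustration under stated assumptions, not UIMA's reader: the class and method names are hypothetical, and it assumes a big-endian blob whose key bytes appear in the order "UIMA" (the table notes that the key doubles as a byte order flag, which this sketch does not handle).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class BlobHeaderSketch {
    /** Returns true for a delta blob (version 2), false for a complete blob (version 1). */
    static boolean isDelta(byte[] blob) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));

        byte[] key = new byte[4];
        in.readFully(key);                       // 4-byte blob key
        if (!"UIMA".equals(new String(key, StandardCharsets.UTF_8))) {
            throw new IOException("not a UIMA blob (or a byte order this sketch does not handle)");
        }

        int version = in.readInt();              // 1 = complete cas, 2 = delta cas
        switch (version) {
            case 1:
                return false;
            case 2:
                return true;
            default:
                throw new IOException("unexpected version: " + version);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] deltaHeader = {'U', 'I', 'M', 'A', 0, 0, 0, 2};
        System.out.println("delta? " + isDelta(deltaHeader));
    }
}
```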
      • convertArrayIndexToAuxHeapAddr

        private static int convertArrayIndexToAuxHeapAddr​(BinaryCasSerDes bcsd,
                                                          int index,
                                                          TOP fs,
                                                          Obj2IntIdentityHashMap<TOP> fs2auxOffset)
        The offset in the modeled heaps:
        Parameters:
        index - the 0-based index into the array
        fs - the feature structure representing the array
        Returns:
        the addr into an aux array or main heap
      • convertArrayIndexToMainHeapAddr

        private static int convertArrayIndexToMainHeapAddr​(int index,
                                                           TOP fs,
                                                           Obj2IntIdentityHashMap<TOP> fs2addr)
      • scanModifications

        static void scanModifications​(BinaryCasSerDes bcsd,
                                      CommonSerDesSequential csds,
                                      CASImpl.FsChange[] fssModified,
                                      Obj2IntIdentityHashMap<TOP> fs2auxOffset,
                                      java.util.List<CASSerializer.AddrPlusValue> chgMainAvs,
                                      java.util.List<CASSerializer.AddrPlusValue> chgByteAvs,
                                      java.util.List<CASSerializer.AddrPlusValue> chgShortAvs,
                                      java.util.List<CASSerializer.AddrPlusValue> chgLongAvs)
        Scans the v3 fsChange info and produces v2-style info in chgXxxAddr and chgXxxValue. A prescan is needed so that the number of modifications can be written before the values (which, unfortunately, were written to the same stream in V2).
        Parameters:
        bcsd - holds the model needed for v2 aux arrays
        cas - the cas to use for the delta serialization
        chgMainHeapAddr - an ordered collection of changed addresses as an array for the main heap
        chgByteAddr - an ordered collection of changed addresses as an array for the aux byte heap
        chgShortAddr - an ordered collection of changed addresses as an array for the aux short heap
        chgLongAddr - an ordered collection of changed addresses as an array for the aux long heap
        chgMainHeapValue - corresponding values
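The reason for the prescan can be shown in a few lines: the count of modified cells has to appear in the stream before the cells themselves, so the changes must be collected first. The following sketch does this for the 8-bit heap section of the delta format (count, then offsets, then values). All names are hypothetical; `AddrValue` merely stands in for CASSerializer.AddrPlusValue, and sorting by address is an assumption based on the "ordered collection" wording in the parameter descriptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PrescanSketch {
    /** Hypothetical stand-in for CASSerializer.AddrPlusValue. */
    static final class AddrValue {
        final int addr;
        final long value;

        AddrValue(int addr, long value) {
            this.addr = addr;
            this.value = value;
        }
    }

    /** Prescan: gather every change up front so the count is known before writing. */
    static List<AddrValue> prescan(int[] changedAddrs, byte[] heap) {
        List<AddrValue> changes = new ArrayList<>();
        for (int addr : changedAddrs) {
            changes.add(new AddrValue(addr, heap[addr]));
        }
        changes.sort(Comparator.comparingInt(c -> c.addr)); // assumed ordering by address
        return changes;
    }

    /** Writes sM8H, then sM8H 4-byte offsets, then sM8H 1-byte values. */
    static byte[] writeByteHeapChanges(List<AddrValue> changes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(changes.size());
        for (AddrValue c : changes) out.writeInt(c.addr);
        for (AddrValue c : changes) out.writeByte((int) c.value);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] heap = {0, 0, 20, 0, 0, 50, 0, 0};
        byte[] section = writeByteHeapChanges(prescan(new int[] {5, 2}, heap));
        System.out.println("section length in bytes: " + section.length);
    }
}
```

Without the prescan, the writer would need to seek back and patch the count, which a plain OutputStream does not support.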
      • getHeapMetadata

        int[] getHeapMetadata()
      • getHeapArray

        int[] getHeapArray()
      • getStringTable

        java.lang.String[] getStringTable()
      • getFSIndex

        int[] getFSIndex()
      • getByteArray

        byte[] getByteArray()
      • getShortArray

        short[] getShortArray()
      • getLongArray

        long[] getLongArray()
      • copyHeapsToArrays

        private void copyHeapsToArrays​(BinaryCasSerDes bcsd)