Class ArtifactProducer

  • All Implemented Interfaces:
    java.lang.Runnable

    public class ArtifactProducer
    extends java.lang.Object
    implements java.lang.Runnable
    Component responsible for continuously filling a work queue with bundles containing Cas'es. The queue is shared with a Processing Pipeline that consumes bundles of Cas. As soon as the the bundle is removed from the queue, this component fetches data from configured Collection Reader and enques it onto the queue. This component facilitates asynchronous reading and processing of CAS by seperate threads running in the CPE. When end of processing is reached due to CPM shutdown or max number of entities are processed a special token, called EOFToken is placed onto a queue. It marks end of processing for Processing Units. No more data is expected to be placed on the work queue. The Processing Threads upon seeing the EOFToken are expected to complete processing and do necessary cleanup.
    • Field Detail

      • threadState

        public int threadState
        The thread state.
      • casPool

        private CPECasPool casPool
        The cas pool.
      • readerFetchSize

        private int readerFetchSize
        The reader fetch size.
      • casList

        private CAS[] casList
        The cas list.
      • entityCount

        private long entityCount
        The entity count.
      • maxToProcess

        private long maxToProcess
        The max to process.
      • cpmStatTable

        private java.util.Map cpmStatTable
        The cpm stat table.
      • lastDocId

        private java.lang.String[] lastDocId
        The last doc id.
      • totalFetchTime

        private long totalFetchTime
        The total fetch time.
      • callbackListeners

        private java.util.ArrayList callbackListeners
        The callback listeners.
      • timedoutDocs

        private java.util.Hashtable timedoutDocs
        The timedout docs.
      • isRunning

        private boolean isRunning
        The is running.
      • globalSharedProcessTrace

        private ProcessTrace globalSharedProcessTrace
        The global shared process trace.
    • Constructor Detail

      • ArtifactProducer

        public ArtifactProducer​(CPMEngine acpm)
        Instantiates and initializes this instance.
        Parameters:
        acpm - the acpm
      • ArtifactProducer

        public ArtifactProducer​(CPMEngine acpm,
                                CPECasPool aPool)
        Construct instance of this class with a reference to the cpe engine and a pool of cas'es.
        Parameters:
        acpm - - reference to the cpe
        aPool - - pool of cases
    • Method Detail

      • isRunning

        public boolean isRunning()
        Checks if is running.
        Returns:
        true, if is running
      • setUimaTimer

        public void setUimaTimer​(UimaTimer aTimer)
        Plug in Custom Timer to time events.
        Parameters:
        aTimer - - custom timer
      • setProcessTrace

        public void setProcessTrace​(ProcessTrace aProcTrace)
        Sets the process trace.
        Parameters:
        aProcTrace - the new process trace
      • getCollectionReaderTotalFetchTime

        public long getCollectionReaderTotalFetchTime()
        Returns total time spent when fetching entities from a CollectionReader. This provides a way of gauging throughput of a particular CR.
        Returns:
        total time spent when fetching entities. -1 when the fetch time is unknown.
      • cleanup

        public void cleanup()
        Null out fields of this object. Call this only when this object is no longer needed.
      • setNumEntitiesToProcess

        public void setNumEntitiesToProcess​(long aNumToProcess)
        Assign total number of entities to process.
        Parameters:
        aNumToProcess - - number of entities to read from the Collection Reader
      • setCollectionReader

        public void setCollectionReader​(BaseCollectionReader aCollectionReader)
        Assign CollectionReader to be used for reading.
        Parameters:
        aCollectionReader - - collection reader as source of data
      • setWorkQueue

        public void setWorkQueue​(BoundedWorkQueue aQueue)
        Assigns a queue where the artifacts produced by this component will be deposited.
        Parameters:
        aQueue - - queue for the artifacts this class is producing
      • setCPMStatTable

        public void setCPMStatTable​(java.util.Map aStatTable)
        Add table that will contain statistics gathered while reading entities from a Collection This table is used for non-uima reports.
        Parameters:
        aStatTable - the new CPM stat table
      • endOfProcessingReached

        private boolean endOfProcessingReached()
        Determines if the CPM has processed configured number of entities. Called after each fetch from the Collection Reader.
        Returns:
        true - all configurted entities processed, false otherwise
      • fillQueue

        public void fillQueue()
                       throws java.lang.Exception
        Fills the queue up to capacity. This is called before activating ProcessingPipeline as means of optimizing processing. When pipelines start up there are already entities in the work queue to process.
        Throws:
        java.lang.Exception - the exception
      • readNext

        private java.lang.Object[] readNext​(int fetchSize)
                                     throws java.io.IOException,
                                            CollectionException
        Reads next set of entities from the CollectionReader. This method may return more than one Cas at a time.
        Parameters:
        fetchSize - the fetch size
        Returns:
        - The Object returned from the method depends on the type of the CollectionReader. Either CASData[] or CASObject[] initialized with document metadata and content is returned. If the CollectionReader has no more entities (EOF), null is returned.
        Throws:
        java.io.IOException - - error while reading corpus
        CollectionException - -
      • run

        public void run()
        Runs this thread until the CPM halts or the CollectionReader has no more entities. It continuously fills the work queue with entities returned by the CollectionReader.
        Specified by:
        run in interface java.lang.Runnable
      • notifyListeners

        private void notifyListeners​(CAS aCas,
                                     java.lang.Exception anException)
        Notify registered callback listeners of a given exception.
        Parameters:
        aCas - the a cas
        anException - - exception to propagate to callback listeners
      • placeEOFToken

        private void placeEOFToken()
        Place terminating EOFToken into a Work Queue. Any thread reading this token from the queue is responsible for terminating itself.
      • getLastDocId

        public java.lang.String getLastDocId()
        Gets the last doc id.
        Returns:
        the last doc id
      • invalidate

        public void invalidate​(CAS[] aCasList)
        Invalidate.
        Parameters:
        aCasList - the a cas list