Class ArtifactProducer
java.lang.Object
    org.apache.uima.collection.impl.cpm.engine.ArtifactProducer
All Implemented Interfaces: java.lang.Runnable
public class ArtifactProducer extends java.lang.Object implements java.lang.Runnable
Component responsible for continuously filling a work queue with bundles containing CASes. The queue is shared with a Processing Pipeline that consumes the bundles. As soon as a bundle is removed from the queue, this component fetches data from the configured Collection Reader and enqueues it onto the queue. This component facilitates asynchronous reading and processing of CASes by separate threads running in the CPE. When the end of processing is reached, either because the CPM is shutting down or because the maximum number of entities has been processed, a special token called EOFToken is placed onto the queue. It marks the end of processing for the Processing Units: no more data is expected to be placed on the work queue. Upon seeing the EOFToken, the Processing Threads are expected to complete processing and do any necessary cleanup.
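The producer/consumer hand-off described above can be illustrated with a minimal sketch. It does not use the CPM classes: the class name `EofTokenSketch`, the `EOF_TOKEN` sentinel, and the use of `java.util.concurrent.ArrayBlockingQueue` in place of BoundedWorkQueue are all assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class EofTokenSketch {
    // Hypothetical sentinel standing in for the EOFToken described above.
    private static final Object EOF_TOKEN = new Object();

    /** Produces maxToProcess items, then the EOF token; returns how many items the consumer saw. */
    public static int runDemo(int maxToProcess, int queueCapacity) throws InterruptedException {
        BlockingQueue<Object> workQueue = new ArrayBlockingQueue<>(queueCapacity);
        AtomicInteger consumed = new AtomicInteger();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < maxToProcess; i++) {
                    workQueue.put("doc-" + i); // blocks while the queue is full
                }
                workQueue.put(EOF_TOKEN);      // signals that no more data will follow
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Object item = workQueue.take(); // blocks while the queue is empty
                    if (item == EOF_TOKEN) {
                        break;                      // the consumer terminates itself
                    }
                    consumed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return consumed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(5, 2));
    }
}
```

With several consumer threads, one token per consumer (or re-queuing the token after reading it) would be needed; the sketch uses a single consumer to keep the idea visible.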
-
-
Field Summary
Modifier and Type | Field | Description
private java.util.ArrayList | callbackListeners | The callback listeners.
private CAS[] | casList | The cas list.
private CPECasPool | casPool | The cas pool.
private BaseCollectionReader | collectionReader | The collection reader.
private CPMEngine | cpm | The cpm.
private java.util.Map | cpmStatTable | The cpm stat table.
private long | entityCount | The entity count.
private ProcessTrace | globalSharedProcessTrace | The global shared process trace.
private boolean | isRunning | Whether this component is running.
private java.lang.String[] | lastDocId | The last doc id.
private long | maxToProcess | The max to process.
private int | readerFetchSize | The reader fetch size.
int | threadState | The thread state.
private java.util.Hashtable | timedoutDocs | The timed-out docs.
private UimaTimer | timer | The timer.
private long | totalFetchTime | The total fetch time.
private BoundedWorkQueue | workQueue | The work queue.
-
Constructor Summary
Constructor | Description
ArtifactProducer(CPMEngine acpm) | Instantiates and initializes this instance.
ArtifactProducer(CPMEngine acpm, CPECasPool aPool) | Constructs an instance of this class with a reference to the CPE engine and a pool of CASes.
-
Method Summary
Modifier and Type | Method | Description
void | cleanup() | Null out fields of this object.
private boolean | endOfProcessingReached() | Determines if the CPM has processed the configured number of entities.
void | fillQueue() | Fills the queue up to capacity.
long | getCollectionReaderTotalFetchTime() | Returns total time spent when fetching entities from a CollectionReader.
java.lang.String | getLastDocId() | Gets the last doc id.
void | invalidate(CAS[] aCasList) | Invalidate.
boolean | isRunning() | Checks whether this component is running.
private void | notifyListeners(CAS aCas, java.lang.Exception anException) | Notify registered callback listeners of a given exception.
private void | placeEOFToken() | Place terminating EOFToken into a Work Queue.
private java.lang.Object[] | readNext(int fetchSize) | Reads the next set of entities from the CollectionReader.
void | run() | Runs this thread until the CPM halts or the CollectionReader has no more entities.
void | setCollectionReader(BaseCollectionReader aCollectionReader) | Assign CollectionReader to be used for reading.
void | setCPMStatTable(java.util.Map aStatTable) | Adds the table that will contain statistics gathered while reading entities from a Collection. This table is used for non-UIMA reports.
void | setNumEntitiesToProcess(long aNumToProcess) | Assign total number of entities to process.
void | setProcessTrace(ProcessTrace aProcTrace) | Sets the process trace.
void | setUimaTimer(UimaTimer aTimer) | Plug in Custom Timer to time events.
void | setWorkQueue(BoundedWorkQueue aQueue) | Assigns a queue where the artifacts produced by this component will be deposited.
-
-
-
Field Detail
-
threadState
public int threadState
The thread state.
-
casPool
private CPECasPool casPool
The cas pool.
-
workQueue
private BoundedWorkQueue workQueue
The work queue.
-
collectionReader
private BaseCollectionReader collectionReader
The collection reader.
-
readerFetchSize
private int readerFetchSize
The reader fetch size.
-
casList
private CAS[] casList
The cas list.
-
entityCount
private long entityCount
The entity count.
-
maxToProcess
private long maxToProcess
The max to process.
-
cpm
private CPMEngine cpm
The cpm.
-
cpmStatTable
private java.util.Map cpmStatTable
The cpm stat table.
-
lastDocId
private java.lang.String[] lastDocId
The last doc id.
-
totalFetchTime
private long totalFetchTime
The total fetch time.
-
timer
private UimaTimer timer
The timer.
-
callbackListeners
private java.util.ArrayList callbackListeners
The callback listeners.
-
timedoutDocs
private java.util.Hashtable timedoutDocs
The timed-out docs.
-
isRunning
private boolean isRunning
Whether this component is running.
-
globalSharedProcessTrace
private ProcessTrace globalSharedProcessTrace
The global shared process trace.
-
-
Constructor Detail
-
ArtifactProducer
public ArtifactProducer(CPMEngine acpm)
Instantiates and initializes this instance.
Parameters:
acpm - the acpm
-
ArtifactProducer
public ArtifactProducer(CPMEngine acpm, CPECasPool aPool)
Constructs an instance of this class with a reference to the CPE engine and a pool of CASes.
Parameters:
acpm - reference to the CPE
aPool - pool of CASes
-
-
Method Detail
-
isRunning
public boolean isRunning()
Checks whether this component is running.
Returns:
true if it is running
-
setUimaTimer
public void setUimaTimer(UimaTimer aTimer)
Plug in Custom Timer to time events.
Parameters:
aTimer - custom timer
-
setProcessTrace
public void setProcessTrace(ProcessTrace aProcTrace)
Sets the process trace.
Parameters:
aProcTrace - the new process trace
-
getCollectionReaderTotalFetchTime
public long getCollectionReaderTotalFetchTime()
Returns the total time spent fetching entities from a CollectionReader. This provides a way of gauging the throughput of a particular CR.
Returns:
total time spent fetching entities, or -1 when the fetch time is unknown
-
cleanup
public void cleanup()
Null out fields of this object. Call this only when this object is no longer needed.
-
setNumEntitiesToProcess
public void setNumEntitiesToProcess(long aNumToProcess)
Assign total number of entities to process.
Parameters:
aNumToProcess - number of entities to read from the Collection Reader
-
setCollectionReader
public void setCollectionReader(BaseCollectionReader aCollectionReader)
Assign CollectionReader to be used for reading.
Parameters:
aCollectionReader - collection reader as source of data
-
setWorkQueue
public void setWorkQueue(BoundedWorkQueue aQueue)
Assigns a queue where the artifacts produced by this component will be deposited.
Parameters:
aQueue - queue for the artifacts this class is producing
-
setCPMStatTable
public void setCPMStatTable(java.util.Map aStatTable)
Adds the table that will contain statistics gathered while reading entities from a Collection. This table is used for non-UIMA reports.
Parameters:
aStatTable - the new CPM stat table
-
endOfProcessingReached
private boolean endOfProcessingReached()
Determines if the CPM has processed the configured number of entities. Called after each fetch from the Collection Reader.
Returns:
true if all configured entities have been processed, false otherwise
-
fillQueue
public void fillQueue() throws java.lang.Exception
Fills the queue up to capacity. This is called before activating the ProcessingPipeline as a means of optimizing processing: when the pipelines start up, there are already entities in the work queue to process.
Throws:
java.lang.Exception - the exception
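The pre-fill step can be sketched as follows. This is not the CPM implementation: `PrefillSketch`, `prefill`, the `Iterator<String>` source, and the `ArrayBlockingQueue` are hypothetical stand-ins for the Collection Reader and BoundedWorkQueue, and the sketch assumes no consumer is running yet.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PrefillSketch {
    /**
     * Moves items from the source into the queue until the queue is full
     * or the source is exhausted. Returns the number of items queued.
     * Intended to be called before any consumer thread has started.
     */
    public static int prefill(Iterator<String> source, BlockingQueue<String> queue) {
        int queued = 0;
        // Check capacity first so that no item is taken from the source and then dropped.
        while (queue.remainingCapacity() > 0 && source.hasNext()) {
            queue.add(source.next());
            queued++;
        }
        return queued;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(3);
        Iterator<String> source = Arrays.asList("a", "b", "c", "d", "e").iterator();
        System.out.println(prefill(source, queue)); // queue capacity is 3, so 3 items are queued
    }
}
```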
-
readNext
private java.lang.Object[] readNext(int fetchSize) throws java.io.IOException, CollectionException
Reads the next set of entities from the CollectionReader. This method may return more than one CAS at a time.
Parameters:
fetchSize - the fetch size
Returns:
The Object returned from the method depends on the type of the CollectionReader. Either CASData[] or CASObject[], initialized with document metadata and content, is returned. If the CollectionReader has no more entities (EOF), null is returned.
Throws:
java.io.IOException - error while reading corpus
CollectionException
-
run
public void run()
Runs this thread until the CPM halts or the CollectionReader has no more entities. It continuously fills the work queue with entities returned by the CollectionReader.
Specified by:
run in interface java.lang.Runnable
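A rough sketch of how such a loop might combine batch reads with the end-of-processing check is shown below. Everything here is hypothetical: `RunLoopSketch`, its `readNext` and `produce` methods, the `Iterator<String>` reader, and the assumption that the limit is a positive count are illustrative only and are not taken from the CPM source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RunLoopSketch {
    /** Hypothetical stand-in for readNext: up to fetchSize items, or null at end of input. */
    public static String[] readNext(Iterator<String> reader, int fetchSize) {
        if (!reader.hasNext()) {
            return null; // EOF: no more entities
        }
        List<String> batch = new ArrayList<>();
        while (batch.size() < fetchSize && reader.hasNext()) {
            batch.add(reader.next());
        }
        return batch.toArray(new String[0]);
    }

    /**
     * Reads batches until the input is exhausted or maxToProcess entities
     * have been read. Each batch is appended to sink, which stands in for
     * the work queue. Returns the number of entities read.
     */
    public static long produce(Iterator<String> reader, int fetchSize,
                               long maxToProcess, List<String[]> sink) {
        long entityCount = 0;
        // The limit is checked once per fetch, so the last batch may overshoot it.
        while (entityCount < maxToProcess) {
            String[] batch = readNext(reader, fetchSize);
            if (batch == null) {
                break;
            }
            sink.add(batch);
            entityCount += batch.length;
        }
        return entityCount;
    }

    public static void main(String[] args) {
        List<String[]> sink = new ArrayList<>();
        Iterator<String> reader = Arrays.asList("a", "b", "c").iterator();
        System.out.println(produce(reader, 2, 100, sink));
    }
}
```

Because the check runs only after a whole batch has been fetched, a fetch size that does not divide the limit evenly lets the sketch read slightly more than the limit.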
-
notifyListeners
private void notifyListeners(CAS aCas, java.lang.Exception anException)
Notify registered callback listeners of a given exception.
Parameters:
aCas - the CAS
anException - exception to propagate to callback listeners
-
placeEOFToken
private void placeEOFToken()
Place terminating EOFToken into a Work Queue. Any thread reading this token from the queue is responsible for terminating itself.
-
getLastDocId
public java.lang.String getLastDocId()
Gets the last doc id.
Returns:
the last doc id
-
invalidate
public void invalidate(CAS[] aCasList)
Invalidate.
Parameters:
aCasList - the CAS list
-
-