Package org.apache.uima.tools.components
Class FileSystemCollectionReader
- java.lang.Object
  - org.apache.uima.resource.Resource_ImplBase
    - org.apache.uima.resource.ConfigurableResource_ImplBase
      - org.apache.uima.collection.CollectionReader_ImplBase
        - org.apache.uima.tools.components.FileSystemCollectionReader
All Implemented Interfaces:
BaseCollectionReader, CollectionReader, ConfigurableResource, Resource
public class FileSystemCollectionReader extends CollectionReader_ImplBase
A simple collection reader that reads documents from a directory in the filesystem. It can be configured with the following parameters:
- InputDirectory - path to directory containing files
- Encoding (optional) - character encoding of the input files
- Language (optional) - language of the input documents
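The behavior described above can be pictured with a small stand-alone sketch. It is not the source of FileSystemCollectionReader and uses no UIMA types: the class name DirectoryReaderSketch, its constructor arguments, and the choice to return document text as a String (rather than populating a CAS) are all assumptions made for illustration. It only shows the general pattern of listing a directory once, then iterating with hasNext/getNext and decoding each file with either the configured or the platform default encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a directory-backed reader; not the UIMA class itself. */
public class DirectoryReaderSketch {
    private final List<Path> files = new ArrayList<>();
    private final Charset encoding;
    private int currentIndex = 0;

    /** encodingName may be null, in which case the platform default charset is used. */
    public DirectoryReaderSketch(Path inputDirectory, String encodingName) throws IOException {
        if (!Files.isDirectory(inputDirectory)) {
            throw new IOException("Not a directory: " + inputDirectory);
        }
        this.encoding = (encodingName == null)
                ? Charset.defaultCharset()
                : Charset.forName(encodingName);
        // Collect regular files only; subdirectories are skipped in this sketch.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDirectory)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files); // deterministic order for the example
    }

    public boolean hasNext() {
        return currentIndex < files.size();
    }

    /** Returns the decoded text of the next file and advances the cursor. */
    public String getNext() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException("No more documents");
        }
        Path file = files.get(currentIndex++);
        return new String(Files.readAllBytes(file), encoding);
    }

    public int getNumberOfDocuments() {
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("reader-sketch");
        Files.write(dir.resolve("a.txt"), "first".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "second".getBytes(StandardCharsets.UTF_8));
        DirectoryReaderSketch reader = new DirectoryReaderSketch(dir, "UTF-8");
        while (reader.hasNext()) {
            System.out.println(reader.getNext());
        }
    }
}
```

Listing the directory up front is what makes a document count available before any file has been read; the trade-off is that files added to the directory afterwards are not seen by this sketch.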
Field Summary
Modifier and Type, Field, Description:
- private boolean lenient
- private int mCurrentIndex
- private java.lang.String mEncoding
- private java.util.ArrayList mFiles
- private java.lang.String mLanguage
- private boolean mTEXT
- private java.lang.String mXCAS
- static java.lang.String PARAM_ENCODING - Name of configuration parameter that contains the character encoding used by the input files.
- static java.lang.String PARAM_INPUTDIR - Name of configuration parameter that must be set to the path of a directory containing input files.
- static java.lang.String PARAM_LANGUAGE - Name of optional configuration parameter that contains the language of the documents in the input directory.
- static java.lang.String PARAM_LENIENT - Name of the configuration parameter that must be set to indicate whether execution proceeds when an unknown type is encountered.
- static java.lang.String PARAM_XCAS - Optional configuration parameter that specifies XCAS input files.
Fields inherited from interface org.apache.uima.resource.Resource
PARAM_AGGREGATE_SOFA_MAPPINGS, PARAM_CONFIG_MANAGER, PARAM_CONFIG_PARAM_SETTINGS, PARAM_EXTERNAL_OVERRIDE_SETTINGS, PARAM_PERFORMANCE_TUNING_SETTINGS, PARAM_RESOURCE_MANAGER, PARAM_UIMA_CONTEXT
Constructor Summary
- FileSystemCollectionReader()
Method Summary
Modifier and Type, Method, Description:
- void close() - Closes this CollectionReader, after which it may no longer be used.
- static CollectionReaderDescription getDescription() - Parses and returns the descriptor for this collection reader.
- static java.net.URL getDescriptorURL()
- void getNext(CAS aCAS) - Gets the next element of the collection.
- int getNumberOfDocuments() - Gets the total number of documents that will be returned by this collection reader.
- Progress[] getProgress() - Gets information about the number of entities and/or amount of data that has been read from this CollectionReader, and the total amount that remains (if that information is available).
- boolean hasNext() - Gets whether there are any elements remaining to be read from this CollectionReader.
- void initialize() - This method is called during initialization, and does nothing by default.
Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase
destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
-
Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
-
Methods inherited from class org.apache.uima.resource.Resource_ImplBase
getCasManager, getLogger, getMetaData, getRelativePathResolver, getResourceManager, getUimaContext, getUimaContextAdmin, loadUserClass, loadUserClassOrThrow, setContextHolder, setContextHolderX, setLogger, setMetaData, withContextHolder
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.uima.resource.ConfigurableResource
getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
-
Methods inherited from interface org.apache.uima.resource.Resource
getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger
-
Field Detail
-
PARAM_INPUTDIR
public static final java.lang.String PARAM_INPUTDIR
Name of configuration parameter that must be set to the path of a directory containing input files.
- See Also: Constant Field Values
-
PARAM_ENCODING
public static final java.lang.String PARAM_ENCODING
Name of configuration parameter that contains the character encoding used by the input files. If not specified, the default system encoding will be used.
- See Also: Constant Field Values
-
PARAM_LANGUAGE
public static final java.lang.String PARAM_LANGUAGE
Name of optional configuration parameter that contains the language of the documents in the input directory. If specified, this information will be added to the CAS.
- See Also: Constant Field Values
-
PARAM_XCAS
public static final java.lang.String PARAM_XCAS
Optional configuration parameter that specifies XCAS input files.
- See Also: Constant Field Values
-
PARAM_LENIENT
public static final java.lang.String PARAM_LENIENT
Name of the configuration parameter that must be set to indicate whether execution proceeds when an unknown type is encountered.
- See Also: Constant Field Values
-
mFiles
private java.util.ArrayList mFiles
-
mEncoding
private java.lang.String mEncoding
-
mLanguage
private java.lang.String mLanguage
-
mCurrentIndex
private int mCurrentIndex
-
mTEXT
private boolean mTEXT
-
mXCAS
private java.lang.String mXCAS
-
lenient
private boolean lenient
-
Method Detail
-
initialize
public void initialize() throws ResourceInitializationException
Description copied from class: CollectionReader_ImplBase
This method is called during initialization, and does nothing by default. Subclasses should override it to perform one-time startup logic.
- Overrides: initialize in class CollectionReader_ImplBase
- Throws: ResourceInitializationException - if a failure occurs during initialization.
- See Also: CollectionReader_ImplBase.initialize()
-
hasNext
public boolean hasNext()
Description copied from interface: BaseCollectionReader
Gets whether there are any elements remaining to be read from this CollectionReader.
- Returns: true if and only if there are more elements available from this CollectionReader.
- See Also: BaseCollectionReader.hasNext()
-
getNext
public void getNext(CAS aCAS) throws java.io.IOException, CollectionException
Description copied from interface: CollectionReader
Gets the next element of the collection. The element will be stored in the provided CAS object. If this is a consuming CollectionReader (see BaseCollectionReader.isConsuming()), this element will also be removed from the collection.
- Parameters: aCAS - the CAS to populate with the next element of the collection
- Throws: java.io.IOException - if an I/O failure occurs
- Throws: CollectionException - if there is some other problem with reading from the Collection
- See Also: CollectionReader.getNext(org.apache.uima.cas.CAS)
-
close
public void close() throws java.io.IOException
Description copied from interface: BaseCollectionReader
Closes this CollectionReader, after which it may no longer be used.
- Throws: java.io.IOException - if an I/O failure occurs
- See Also: BaseCollectionReader.close()
-
getProgress
public Progress[] getProgress()
Description copied from interface: BaseCollectionReader
Gets information about the number of entities and/or amount of data that has been read from this CollectionReader, and the total amount that remains (if that information is available).
This method returns an array of Progress objects so that results can be reported using different units. For example, the CollectionReader could report progress in terms of the number of documents that have been read and also in terms of the number of bytes that have been read. In many cases, it will be sufficient to return just one Progress object.
- Returns: an array of Progress objects. Each object may have different units (for example number of entities or bytes).
- See Also: BaseCollectionReader.getProgress()
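To make the "one array, several units" idea concrete, here is a minimal sketch that reports the same position in two units. It deliberately defines its own small value type instead of using the UIMA Progress interface, so the names ProgressSketch, Entry, and progressFor, as well as the unit strings, are assumptions for illustration and not part of the documented API.

```java
public class ProgressSketch {
    /** Hypothetical value type: how much is done, how much there is in total, and in which unit. */
    static final class Entry {
        final long completed;
        final long total;
        final String unit;

        Entry(long completed, long total, String unit) {
            this.completed = completed;
            this.total = total;
            this.unit = unit;
        }

        @Override
        public String toString() {
            return completed + " of " + total + " " + unit;
        }
    }

    /** Builds one entry per unit so a caller can pick whichever it understands. */
    static Entry[] progressFor(int documentsRead, int documentCount, long bytesRead, long totalBytes) {
        return new Entry[] {
            new Entry(documentsRead, documentCount, "entities"),
            new Entry(bytesRead, totalBytes, "bytes")
        };
    }

    public static void main(String[] args) {
        for (Entry e : progressFor(3, 10, 2048L, 8192L)) {
            System.out.println(e);
        }
        // prints:
        // 3 of 10 entities
        // 2048 of 8192 bytes
    }
}
```

A reader that only tracks an index into a file list could return a single-element array with the document count alone, which matches the note above that one Progress object is often sufficient.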
-
getNumberOfDocuments
public int getNumberOfDocuments()
Gets the total number of documents that will be returned by this collection reader. This is not part of the general collection reader interface.
- Returns: the number of documents in the collection
-
getDescription
public static CollectionReaderDescription getDescription() throws InvalidXMLException
Parses and returns the descriptor for this collection reader. The descriptor is stored in the uima.jar file and located using the ClassLoader.
- Returns: an object containing all of the information parsed from the descriptor.
- Throws: InvalidXMLException - if the descriptor is invalid or missing
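Locating a file "using the ClassLoader" generally means asking a class for a resource by name and receiving a URL, or null when nothing matches. The sketch below shows only that standard Java mechanism; the class ResourceLookupSketch, the helper locate, and the resource name NoSuchDescriptor.xml are hypothetical, and the actual descriptor file name used by this reader is not stated here.

```java
import java.net.URL;

public class ResourceLookupSketch {
    /**
     * Looks up a resource relative to the package of the given class.
     * Returns null when the resource is not on the classpath.
     */
    static URL locate(Class<?> anchor, String resourceName) {
        return anchor.getResource(resourceName);
    }

    public static void main(String[] args) {
        // Hypothetical name, chosen so that the lookup fails.
        URL missing = locate(ResourceLookupSketch.class, "NoSuchDescriptor.xml");
        if (missing == null) {
            System.out.println("descriptor not found on the classpath");
        } else {
            System.out.println("descriptor at " + missing);
        }
    }
}
```

Because a failed lookup yields null rather than an exception, a caller that goes on to parse the URL needs to handle the missing case explicitly, which is consistent with getDescription() documenting an exception for a missing descriptor.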
-
getDescriptorURL
public static java.net.URL getDescriptorURL()