Class XmlDetagger

  • All Implemented Interfaces:
    AnalysisComponent

    public class XmlDetagger
    extends CasAnnotator_ImplBase
    A multi-sofa annotator that does XML detagging. Reads XML data from the input Sofa (named "xmlDocument"); this data can be stored in the CAS as a string or array, or it can be a URI to a remote file. The XML is parsed using the JVM's default parser, and the plain-text content is written to a new sofa called "plainTextDocument".
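The detagging step described above can be sketched with nothing but the JDK. This is a minimal illustration under stated assumptions, not this class's actual implementation. It assumes the XML is already available as a `String`, so Sofa handling and remote URIs are omitted. It parses the XML with `SAXParserFactory.newInstance()`, which the private `parserFactory` field suggests, and keeps only character data. The class name `DetagSketch` is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: not the XmlDetagger source.
public class DetagSketch {
    // Appends every character event; the parser itself drops tags and attributes.
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Returns the plain-text content of an XML string, parsed with the JVM's default SAX parser.
    public static String detag(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextCollector collector = new TextCollector();
            parser.parse(new InputSource(new StringReader(xml)), collector);
            return collector.text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Entities are decoded by the parser, so "&amp;" becomes "&".
        System.out.println(detag("<doc><title>Hi</title> there &amp; back</doc>"));
    }
}
```

Running `main` prints `Hi there & back`. In the real annotator the resulting string would become the content of the "plainTextDocument" Sofa instead of being printed.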
    • Field Detail

      • PARAM_TEXT_TAG

        public static final java.lang.String PARAM_TEXT_TAG
        Name of the optional configuration parameter that contains the name of an XML tag appearing in the input file. Only text that falls within this XML tag will be considered part of the "document" that is added to the CAS by this annotator. If not specified, the entire file will be considered the document.
        See Also:
        Constant Field Values
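The tag-scoping rule described for PARAM_TEXT_TAG can be illustrated with a SAX handler that counts how many matching elements are currently open. It keeps character data only while that count is positive. This is a hedged sketch. It assumes the tag name is passed in directly rather than read from the configuration parameter, whose string value is not shown here, and it assumes `null` means "no tag configured". `TextTagSketch` and `extract` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of "only text inside <textTag> counts as the document".
public class TextTagSketch {
    static class TagScopedCollector extends DefaultHandler {
        private final String textTag;   // null means no tag configured: keep all text
        private int depth = 0;          // number of open textTag elements around the current position
        final StringBuilder text = new StringBuilder();

        TagScopedCollector(String textTag) { this.textTag = textTag; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals(textTag)) depth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals(textTag)) depth--;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (textTag == null || depth > 0) text.append(ch, start, length);
        }
    }

    public static String extract(String xml, String textTag) {
        try {
            TagScopedCollector collector = new TagScopedCollector(textTag);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
            return collector.text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }
}
```

For `<doc><head>skip</head><body>keep <b>this</b></body></doc>`, `extract(xml, "body")` yields `keep this`, while `extract(xml, null)` keeps all text. The depth counter, rather than a boolean flag, keeps the logic correct if the configured tag is nested inside itself.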
      • parserFactory

        private javax.xml.parsers.SAXParserFactory parserFactory
      • sourceDocInfoType

        private Type sourceDocInfoType
      • mXmlTagContainingText

        private java.lang.String mXmlTagContainingText
    • Constructor Detail

      • XmlDetagger

        public XmlDetagger()
    • Method Detail

      • initialize

        public void initialize(UimaContext aContext)
                        throws ResourceInitializationException
        Description copied from interface: AnalysisComponent
        Performs any startup tasks required by this component. The framework calls this method only once, just after the AnalysisComponent has been instantiated.

        The framework supplies this AnalysisComponent with a reference to the UimaContext that it will use, for example to access configuration settings or resources. This AnalysisComponent should store a reference to its UimaContext for later use.

        Specified by:
        initialize in interface AnalysisComponent
        Overrides:
        initialize in class AnalysisComponent_ImplBase
        Parameters:
        aContext - Provides access to services and resources managed by the framework. This includes configuration parameters, logging, and access to external resources.
        Throws:
        ResourceInitializationException - if this AnalysisComponent cannot initialize successfully.
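A one-time setup step of this kind might create the parser factory once and read the optional text-tag setting. In UIMA the setting would typically be read with `UimaContext.getConfigParameterValue(String)`. To keep the sketch runnable without the UIMA jar, it models the configuration as a plain `Map`. That stand-in, the `InitializeSketch` class, and its getters are all hypothetical, as is the key used in the usage example.

```java
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical sketch: a Map stands in for the configuration a UimaContext would expose.
public class InitializeSketch {
    private SAXParserFactory parserFactory;
    private String xmlTagContainingText;

    public void initialize(Map<String, Object> config, String textTagParamName) {
        // Create the factory once; it can be reused for every document processed later.
        parserFactory = SAXParserFactory.newInstance();
        // Optional parameter: a missing or non-String value leaves the tag null ("use the whole file").
        Object value = config.get(textTagParamName);
        xmlTagContainingText = (value instanceof String) ? (String) value : null;
    }

    public String getXmlTagContainingText() { return xmlTagContainingText; }

    public SAXParserFactory getParserFactory() { return parserFactory; }
}
```

Doing this work in `initialize` rather than per document follows the contract above. The framework calls the method once, so the cost of creating the factory and reading configuration is paid only once.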
      • typeSystemInit

        public void typeSystemInit(TypeSystem aTypeSystem)
                            throws AnalysisEngineProcessException
        Description copied from class: CasAnnotator_ImplBase
        Informs this annotator that the CAS TypeSystem has changed. The Analysis Engine calls this method indirectly: PrimitiveAnalysisEngine_impl calls CasAnnotator_ImplBase.process, which calls checkTypeSystemChange, which in turn calls this method.

        In this method, the Annotator should use the TypeSystem to resolve the names of Types and Features to the actual Type and Feature objects, which can then be used during processing.

        Overrides:
        typeSystemInit in class CasAnnotator_ImplBase
        Parameters:
        aTypeSystem - the new type system to use as input to your initialization
        Throws:
        AnalysisEngineProcessException - if the provided type system is missing types or features required by this annotator
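The resolve-once, fail-fast pattern described above can be sketched as follows. In UIMA the lookup would normally be `TypeSystem.getType(String)`, which returns `null` for an unknown name, and the failure would be reported as an `AnalysisEngineProcessException`. This self-contained sketch instead models the type system as a `Map` and throws `IllegalStateException`. `TypeHandle`, `TypeResolveSketch`, and the type name in the usage example are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch: a Map stands in for the CAS TypeSystem.
public class TypeResolveSketch {
    // Stand-in for a UIMA Type object.
    public static final class TypeHandle {
        final String name;
        public TypeHandle(String name) { this.name = name; }
    }

    private TypeHandle sourceDocInfoType;

    // Resolve the name once per type-system change and fail fast if it is missing.
    public void typeSystemInit(Map<String, TypeHandle> typeSystem, String requiredTypeName) {
        TypeHandle resolved = typeSystem.get(requiredTypeName);
        if (resolved == null) {
            throw new IllegalStateException("Type system is missing required type " + requiredTypeName);
        }
        sourceDocInfoType = resolved;
    }

    public TypeHandle getSourceDocInfoType() { return sourceDocInfoType; }
}
```

Caching the resolved handle in a field, as the private `sourceDocInfoType` field above suggests, avoids repeating the name lookup for every document.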
      • getDescription

        public static AnalysisEngineDescription getDescription()
                                                        throws InvalidXMLException
        Parses and returns the descriptor for this Analysis Engine. The descriptor is stored in the uima-core.jar file and located using the ClassLoader.
        Returns:
        an object containing all of the information parsed from the descriptor.
        Throws:
        InvalidXMLException - if the descriptor is invalid or missing
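Locating a descriptor "using the ClassLoader" generally means resolving a classpath-relative resource name to a `URL`. The sketch below shows only that lookup step. The descriptor's actual path inside uima-core.jar is not given here, so the path in the usage example is hypothetical. In UIMA, the returned URL would typically be wrapped in an `XMLInputSource` and handed to the framework's XML parser, which is omitted here.

```java
import java.net.URL;

// Hypothetical sketch of classpath-based descriptor lookup.
public class DescriptorLookupSketch {
    // Returns the resource URL, or null if the resource is not on the classpath.
    public static URL findDescriptor(String resourcePath) {
        return DescriptorLookupSketch.class.getClassLoader().getResource(resourcePath);
    }

    // Like findDescriptor, but treats a missing descriptor as an error.
    public static URL requireDescriptor(String resourcePath) {
        URL url = findDescriptor(resourcePath);
        if (url == null) {
            throw new IllegalStateException("Descriptor not found on classpath: " + resourcePath);
        }
        return url;
    }
}
```

Resolving through the class's own ClassLoader, rather than a file path, is what lets a descriptor packaged inside a jar be found wherever that jar is deployed.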
      • getDescriptorURL

        public static java.net.URL getDescriptorURL()