Class ElementRemover

  • All Implemented Interfaces:
    org.apache.xerces.xni.parser.XMLComponent, org.apache.xerces.xni.parser.XMLDocumentFilter, org.apache.xerces.xni.parser.XMLDocumentSource, org.apache.xerces.xni.XMLDocumentHandler, HTMLComponent

    public class ElementRemover
    extends DefaultFilter
    This class is a document filter capable of removing specified elements from the processing stream. There are two options for processing document elements:
    • specifying those elements which should be accepted and, optionally, which attributes of that element should be kept; and
    • specifying those elements whose tags and content should be completely removed from the event stream.

    The first option allows the application to specify which elements appearing in the event stream should be accepted and, therefore, passed on to the next stage in the pipeline. All elements not in the list of acceptable elements have their start and end tags stripped from the event stream unless those elements appear in the list of elements to be removed.

    The second option allows the application to specify which elements should be completely removed from the event stream. When an element appears that is to be removed, the element's start and end tag as well as all of that element's content is removed from the event stream.
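
    The two options can be modelled without any parser at all. The following is a minimal sketch, not the ElementRemover implementation: it assumes a flat list of string tokens in place of XNI events, and the class FilterModesSketch and its methods are hypothetical names. It also assumes that names match case-insensitively and that an accepted element takes precedence over a removed one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified, hypothetical model of the accept / strip / remove rules.
// It is NOT the ElementRemover implementation and does not use XNI.
final class FilterModesSketch {
    private final Set<String> accepted = new HashSet<>();
    private final Set<String> removed = new HashSet<>();

    void accept(String name) { accepted.add(name.toLowerCase(Locale.ROOT)); }
    void remove(String name) { removed.add(name.toLowerCase(Locale.ROOT)); }

    // Tokens are "<name>", "</name>", or plain text; input is assumed balanced.
    List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int depth = 0;          // current element nesting depth
        int removalDepth = -1;  // depth where removal started, -1 when inactive
        for (String t : tokens) {
            if (t.startsWith("</")) {
                String name = t.substring(2, t.length() - 1).toLowerCase(Locale.ROOT);
                depth--;
                if (removalDepth >= 0) {
                    // Stop dropping once the element that started removal closes.
                    if (depth == removalDepth) removalDepth = -1;
                } else if (accepted.contains(name)) {
                    out.add(t); // accepted: end tag passes through
                }               // otherwise stripped: end tag dropped
            } else if (t.startsWith("<")) {
                String name = t.substring(1, t.length() - 1).toLowerCase(Locale.ROOT);
                if (removalDepth < 0) {
                    if (accepted.contains(name)) {
                        out.add(t);           // accepted: start tag passes through
                    } else if (removed.contains(name)) {
                        removalDepth = depth; // start dropping tags and content
                    }                         // otherwise stripped: tag dropped
                }
                depth++;
            } else if (removalDepth < 0) {
                out.add(t); // text survives unless inside a removed element
            }
        }
        return out;
    }

    public static void main(String[] args) {
        FilterModesSketch f = new FilterModesSketch();
        f.accept("b");
        f.remove("script");
        List<String> in = Arrays.asList("<div>", "Hello ", "<b>", "bold", "</b>",
                "<script>", "alert(1)", "</script>", "</div>");
        System.out.println(String.join("", f.filter(in))); // prints: Hello <b>bold</b>
    }
}
```

    In this model the div element is stripped (its tags disappear but its text stays), the b element is accepted, and the script element is removed together with its content.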

    A common use of this filter is to allow only rich-text and linking elements, along with character content, to pass through; all other elements are stripped. The following code shows how to configure this filter to perform this task:

      ElementRemover remover = new ElementRemover();
      remover.acceptElement("b", null);
      remover.acceptElement("i", null);
      remover.acceptElement("u", null);
      remover.acceptElement("a", new String[] { "href" });
     

    However, this would still allow the text content of other elements to pass through, which may not be desirable. In order to further "clean" the input, the removeElement option can be used. The following piece of code adds the ability to completely remove any <SCRIPT> tags and content from the stream.

      remover.removeElement("script");
     

    Note: All text and accepted element children of a stripped element are retained. To completely remove an element's content, use the removeElement method.

    Note: Care should be taken when using this filter because the output may not be a well-balanced tree. Specifically, if the application removes the <HTML> element (with or without retaining its children), the resulting document event stream will no longer be well-formed.
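
    To make the well-formedness concern concrete, the sketch below counts the top-level items in a token stream before and after the root element's tags are stripped. It is an illustration only: RootCountSketch and topLevelItems are hypothetical names, and the string tokens stand in for XNI events.

```java
import java.util.List;

// Hypothetical helper, not part of the library: counts top-level items
// (elements or text runs) in a balanced token stream.
final class RootCountSketch {
    static int topLevelItems(List<String> tokens) {
        int depth = 0;
        int count = 0;
        for (String t : tokens) {
            if (t.startsWith("</")) {
                depth--;
            } else if (t.startsWith("<")) {
                if (depth == 0) count++; // a new top-level element begins
                depth++;
            } else if (depth == 0) {
                count++;                 // text outside any element
            }
        }
        return count;
    }
}
```

    With the tokens "<html>", "<b>", "x", "</b>", "y", "</html>" the count is 1. Once the html tags are stripped, the remaining tokens "<b>", "x", "</b>", "y" give a count of 2, so the stream no longer has a single root element.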

    Version:
    $Id: ElementRemover.java,v 1.5 2005/02/14 03:56:54 andyc Exp $
    Author:
    Andy Clark
    • Constructor Summary

      Constructors 
      Constructor Description
      ElementRemover()  
    • Method Summary

      Modifier and Type Method Description
      void acceptElement​(java.lang.String element, java.lang.String[] attributes)
      Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.
      void characters​(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
      Characters.
      void comment​(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
      Comment.
      protected boolean elementAccepted​(java.lang.String element)
      Returns true if the specified element is accepted.
      protected boolean elementRemoved​(java.lang.String element)
      Returns true if the specified element should be removed.
      void emptyElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs)
      Empty element.
      void endCDATA​(org.apache.xerces.xni.Augmentations augs)
      End CDATA section.
      void endElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs)
      End element.
      void endGeneralEntity​(java.lang.String name, org.apache.xerces.xni.Augmentations augs)
      End general entity.
      void endPrefixMapping​(java.lang.String prefix, org.apache.xerces.xni.Augmentations augs)
      End prefix mapping.
      protected boolean handleOpenTag​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes)
      Handles an open tag.
      void ignorableWhitespace​(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs)
      Ignorable whitespace.
      void processingInstruction​(java.lang.String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs)
      Processing instruction.
      void removeElement​(java.lang.String element)
      Specifies that the given element should be completely removed.
      void startCDATA​(org.apache.xerces.xni.Augmentations augs)
      Start CDATA section.
      void startDocument​(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
      Start document.
      void startDocument​(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs)
      Start document.
      void startElement​(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs)
      Start element.
      void startGeneralEntity​(java.lang.String name, org.apache.xerces.xni.XMLResourceIdentifier id, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
      Start general entity.
      void startPrefixMapping​(java.lang.String prefix, java.lang.String uri, org.apache.xerces.xni.Augmentations augs)
      Start prefix mapping.
      void textDecl​(java.lang.String version, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs)
      Text declaration.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • NULL

        protected static final java.lang.Object NULL
        A "null" object.
      • fAcceptedElements

        protected java.util.Hashtable fAcceptedElements
        Accepted elements.
      • fRemovedElements

        protected java.util.Hashtable fRemovedElements
        Removed elements.
      • fElementDepth

        protected int fElementDepth
        The element depth.
      • fRemovalElementDepth

        protected int fRemovalElementDepth
        The element depth at element removal.
    • Constructor Detail

      • ElementRemover

        public ElementRemover()
    • Method Detail

      • acceptElement

        public void acceptElement​(java.lang.String element,
                                  java.lang.String[] attributes)
        Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.
        Parameters:
        element - The element to accept.
        attributes - The list of attributes to be kept or null if no attributes should be kept for this element.
        See Also:
        removeElement(java.lang.String)
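
        The effect of the attributes parameter can be sketched on a plain map. This is an illustration under stated assumptions, not the library's code: AttributeFilterSketch and keepOnly are hypothetical names, a java.util.Map stands in for XMLAttributes, and attribute names are assumed to match case-insensitively.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model of attribute filtering for an accepted element.
final class AttributeFilterSketch {
    // Returns a copy of attrs holding only the names listed in kept.
    // A null kept array means no attributes are kept.
    static Map<String, String> keepOnly(Map<String, String> attrs, String[] kept) {
        Map<String, String> out = new LinkedHashMap<>();
        if (kept == null) {
            return out;
        }
        Set<String> names = new HashSet<>();
        for (String k : kept) {
            names.add(k.toLowerCase(Locale.ROOT));
        }
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            if (names.contains(e.getKey().toLowerCase(Locale.ROOT))) {
                out.put(e.getKey(), e.getValue()); // listed attribute survives
            }
        }
        return out;
    }
}
```

        For an anchor carrying href and onclick attributes, keepOnly(attrs, new String[] { "href" }) retains only href, which mirrors the acceptElement("a", new String[] { "href" }) configuration in the class description.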
      • removeElement

        public void removeElement​(java.lang.String element)
        Specifies that the given element should be completely removed. If an element on the remove list is encountered during processing, the element's start and end tags, as well as all of the content contained within the element, will be removed from the processing stream.
        Parameters:
        element - The element to completely remove.
      • startDocument

        public void startDocument​(org.apache.xerces.xni.XMLLocator locator,
                                  java.lang.String encoding,
                                  org.apache.xerces.xni.NamespaceContext nscontext,
                                  org.apache.xerces.xni.Augmentations augs)
                           throws org.apache.xerces.xni.XNIException
        Start document.
        Specified by:
        startDocument in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        startDocument in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • startDocument

        public void startDocument​(org.apache.xerces.xni.XMLLocator locator,
                                  java.lang.String encoding,
                                  org.apache.xerces.xni.Augmentations augs)
                           throws org.apache.xerces.xni.XNIException
        Start document.
        Overrides:
        startDocument in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • startPrefixMapping

        public void startPrefixMapping​(java.lang.String prefix,
                                       java.lang.String uri,
                                       org.apache.xerces.xni.Augmentations augs)
                                throws org.apache.xerces.xni.XNIException
        Start prefix mapping.
        Overrides:
        startPrefixMapping in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • startElement

        public void startElement​(org.apache.xerces.xni.QName element,
                                 org.apache.xerces.xni.XMLAttributes attributes,
                                 org.apache.xerces.xni.Augmentations augs)
                          throws org.apache.xerces.xni.XNIException
        Start element.
        Specified by:
        startElement in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        startElement in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • emptyElement

        public void emptyElement​(org.apache.xerces.xni.QName element,
                                 org.apache.xerces.xni.XMLAttributes attributes,
                                 org.apache.xerces.xni.Augmentations augs)
                          throws org.apache.xerces.xni.XNIException
        Empty element.
        Specified by:
        emptyElement in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        emptyElement in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • comment

        public void comment​(org.apache.xerces.xni.XMLString text,
                            org.apache.xerces.xni.Augmentations augs)
                     throws org.apache.xerces.xni.XNIException
        Comment.
        Specified by:
        comment in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        comment in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • processingInstruction

        public void processingInstruction​(java.lang.String target,
                                          org.apache.xerces.xni.XMLString data,
                                          org.apache.xerces.xni.Augmentations augs)
                                   throws org.apache.xerces.xni.XNIException
        Processing instruction.
        Specified by:
        processingInstruction in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        processingInstruction in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • characters

        public void characters​(org.apache.xerces.xni.XMLString text,
                               org.apache.xerces.xni.Augmentations augs)
                        throws org.apache.xerces.xni.XNIException
        Characters.
        Specified by:
        characters in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        characters in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • ignorableWhitespace

        public void ignorableWhitespace​(org.apache.xerces.xni.XMLString text,
                                        org.apache.xerces.xni.Augmentations augs)
                                 throws org.apache.xerces.xni.XNIException
        Ignorable whitespace.
        Specified by:
        ignorableWhitespace in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        ignorableWhitespace in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • startGeneralEntity

        public void startGeneralEntity​(java.lang.String name,
                                       org.apache.xerces.xni.XMLResourceIdentifier id,
                                       java.lang.String encoding,
                                       org.apache.xerces.xni.Augmentations augs)
                                throws org.apache.xerces.xni.XNIException
        Start general entity.
        Specified by:
        startGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        startGeneralEntity in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • textDecl

        public void textDecl​(java.lang.String version,
                             java.lang.String encoding,
                             org.apache.xerces.xni.Augmentations augs)
                      throws org.apache.xerces.xni.XNIException
        Text declaration.
        Specified by:
        textDecl in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        textDecl in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • endGeneralEntity

        public void endGeneralEntity​(java.lang.String name,
                                     org.apache.xerces.xni.Augmentations augs)
                              throws org.apache.xerces.xni.XNIException
        End general entity.
        Specified by:
        endGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        endGeneralEntity in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • startCDATA

        public void startCDATA​(org.apache.xerces.xni.Augmentations augs)
                        throws org.apache.xerces.xni.XNIException
        Start CDATA section.
        Specified by:
        startCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        startCDATA in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • endCDATA

        public void endCDATA​(org.apache.xerces.xni.Augmentations augs)
                      throws org.apache.xerces.xni.XNIException
        End CDATA section.
        Specified by:
        endCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        endCDATA in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • endElement

        public void endElement​(org.apache.xerces.xni.QName element,
                               org.apache.xerces.xni.Augmentations augs)
                        throws org.apache.xerces.xni.XNIException
        End element.
        Specified by:
        endElement in interface org.apache.xerces.xni.XMLDocumentHandler
        Overrides:
        endElement in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • endPrefixMapping

        public void endPrefixMapping​(java.lang.String prefix,
                                     org.apache.xerces.xni.Augmentations augs)
                              throws org.apache.xerces.xni.XNIException
        End prefix mapping.
        Overrides:
        endPrefixMapping in class DefaultFilter
        Throws:
        org.apache.xerces.xni.XNIException
      • elementAccepted

        protected boolean elementAccepted​(java.lang.String element)
        Returns true if the specified element is accepted.
      • elementRemoved

        protected boolean elementRemoved​(java.lang.String element)
        Returns true if the specified element should be removed.
      • handleOpenTag

        protected boolean handleOpenTag​(org.apache.xerces.xni.QName element,
                                        org.apache.xerces.xni.XMLAttributes attributes)
        Handles an open tag.