Class ElementRemover
java.lang.Object
  org.cyberneko.html.filters.DefaultFilter
    org.cyberneko.html.filters.ElementRemover
All Implemented Interfaces:
org.apache.xerces.xni.parser.XMLComponent, org.apache.xerces.xni.parser.XMLDocumentFilter, org.apache.xerces.xni.parser.XMLDocumentSource, org.apache.xerces.xni.XMLDocumentHandler, HTMLComponent
public class ElementRemover extends DefaultFilter
This class is a document filter capable of removing specified elements from the processing stream. There are two options for processing document elements:
- specifying those elements which should be accepted and, optionally, which attributes of that element should be kept; and
- specifying those elements whose tags and content should be completely removed from the event stream.
The first option allows the application to specify which elements appearing in the event stream should be accepted and, therefore, passed on to the next stage in the pipeline. All elements not in the list of acceptable elements have their start and end tags stripped from the event stream unless those elements appear in the list of elements to be removed.
The second option allows the application to specify which elements should be completely removed from the event stream. When an element appears that is to be removed, the element's start and end tag as well as all of that element's content is removed from the event stream.
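The two options above give each element one of three outcomes: accepted, stripped (tags dropped, content kept), or removed with its content. The following is a minimal sketch of that decision, not the library's implementation. The class `ElementPolicySketch`, its `Outcome` enum, and the choice to consult the accept list before the remove list and to compare names case-insensitively are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical model of the three outcomes described above; not the library's code.
final class ElementPolicySketch {
    enum Outcome { ACCEPT, STRIP_TAGS, REMOVE_WITH_CONTENT }

    private final Set<String> accepted = new HashSet<>();
    private final Set<String> removed = new HashSet<>();

    // Assumption: element names are compared case-insensitively.
    void accept(String element) { accepted.add(element.toLowerCase(Locale.ROOT)); }
    void remove(String element) { removed.add(element.toLowerCase(Locale.ROOT)); }

    // Assumption: the accept list is consulted first, then the remove list.
    // Anything on neither list has only its tags stripped.
    Outcome classify(String element) {
        String key = element.toLowerCase(Locale.ROOT);
        if (accepted.contains(key)) {
            return Outcome.ACCEPT;
        }
        if (removed.contains(key)) {
            return Outcome.REMOVE_WITH_CONTENT;
        }
        return Outcome.STRIP_TAGS;
    }

    public static void main(String[] args) {
        ElementPolicySketch policy = new ElementPolicySketch();
        policy.accept("b");
        policy.remove("script");
        for (String name : new String[] { "b", "SCRIPT", "div" }) {
            System.out.println(name + " -> " + policy.classify(name));
        }
    }
}
```

In this model an element that is on neither list falls through to `STRIP_TAGS`, which matches the description that unlisted elements lose their start and end tags but not their content.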
A common use of this filter is to allow only rich-text and linking elements, along with character content, to pass through the filter; all other elements are stripped. The following code shows how to configure this filter to perform this task:
    ElementRemover remover = new ElementRemover();
    remover.acceptElement("b", null);
    remover.acceptElement("i", null);
    remover.acceptElement("u", null);
    remover.acceptElement("a", new String[] { "href" });
However, this would still allow the text content of other elements to pass through, which may not be desirable. To further "clean" the input, the removeElement option can be used. The following code completely removes any <SCRIPT> tags and their content from the stream:
    remover.removeElement("script");
Note: All text and accepted element children of a stripped element are retained. To completely remove an element's content, use the removeElement method.
Note: Care should be taken when using this filter because the output may not be a well-balanced tree. Specifically, if the application removes the <HTML> element (with or without retaining its children), the resulting document event stream will no longer be well-formed.
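One way to see why stripping the root element breaks well-formedness is to check whether the filtered output still has exactly one top-level element. The sketch below is a hypothetical checker over a simplified event list in which each string is a start tag such as `<b>`, an end tag such as `</b>`, or text. The class `SingleRootSketch` and this event encoding are assumptions for illustration and are not part of the library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical checker; events are "<name>", "</name>", or plain text.
final class SingleRootSketch {
    // True only if the events contain exactly one top-level element,
    // no non-whitespace text outside it, and balanced tags.
    static boolean hasSingleRoot(List<String> events) {
        int depth = 0;
        int roots = 0;
        for (String event : events) {
            if (event.startsWith("</")) {
                depth--;
            } else if (event.startsWith("<")) {
                if (depth == 0) {
                    roots++;
                }
                depth++;
            } else if (depth == 0 && !event.trim().isEmpty()) {
                return false; // text outside any element
            }
        }
        return roots == 1 && depth == 0;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("<html>", "Hi ", "<b>", "x", "</b>", "</html>");
        // Same stream after the <html> tags have been stripped.
        List<String> stripped = Arrays.asList("Hi ", "<b>", "x", "</b>");
        System.out.println(hasSingleRoot(original));
        System.out.println(hasSingleRoot(stripped));
    }
}
```

An application that needs a well-formed result could wrap the filtered output in its own root element before passing it on.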
Version: $Id: ElementRemover.java,v 1.5 2005/02/14 03:56:54 andyc Exp $
Author: Andy Clark
Field Summary
Modifier and Type | Field | Description
protected java.util.Hashtable | fAcceptedElements | Accepted elements.
protected int | fElementDepth | The element depth.
protected int | fRemovalElementDepth | The element depth at element removal.
protected java.util.Hashtable | fRemovedElements | Removed elements.
protected static java.lang.Object | NULL | A "null" object.
Fields inherited from class org.cyberneko.html.filters.DefaultFilter
fDocumentHandler, fDocumentSource
Constructor Summary
Constructor
ElementRemover()
Method Summary
Modifier and Type | Method | Description
void | acceptElement(java.lang.String element, java.lang.String[] attributes) | Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.
void | characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) | Characters.
void | comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) | Comment.
protected boolean | elementAccepted(java.lang.String element) | Returns true if the specified element is accepted.
protected boolean | elementRemoved(java.lang.String element) | Returns true if the specified element should be removed.
void | emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) | Empty element.
void | endCDATA(org.apache.xerces.xni.Augmentations augs) | End CDATA section.
void | endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) | End element.
void | endGeneralEntity(java.lang.String name, org.apache.xerces.xni.Augmentations augs) | End general entity.
void | endPrefixMapping(java.lang.String prefix, org.apache.xerces.xni.Augmentations augs) | End prefix mapping.
protected boolean | handleOpenTag(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes) | Handles an open tag.
void | ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) | Ignorable whitespace.
void | processingInstruction(java.lang.String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs) | Processing instruction.
void | removeElement(java.lang.String element) | Specifies that the given element should be completely removed.
void | startCDATA(org.apache.xerces.xni.Augmentations augs) | Start CDATA section.
void | startDocument(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs) | Start document.
void | startDocument(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs) | Start document.
void | startElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) | Start element.
void | startGeneralEntity(java.lang.String name, org.apache.xerces.xni.XMLResourceIdentifier id, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs) | Start general entity.
void | startPrefixMapping(java.lang.String prefix, java.lang.String uri, org.apache.xerces.xni.Augmentations augs) | Start prefix mapping.
void | textDecl(java.lang.String version, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs) | Text declaration.
Methods inherited from class org.cyberneko.html.filters.DefaultFilter
doctypeDecl, endDocument, getDocumentHandler, getDocumentSource, getFeatureDefault, getPropertyDefault, getRecognizedFeatures, getRecognizedProperties, merge, reset, setDocumentHandler, setDocumentSource, setFeature, setProperty, xmlDecl
Field Detail
-
NULL
protected static final java.lang.Object NULL
A "null" object.
-
fAcceptedElements
protected java.util.Hashtable fAcceptedElements
Accepted elements.
-
fRemovedElements
protected java.util.Hashtable fRemovedElements
Removed elements.
-
fElementDepth
protected int fElementDepth
The element depth.
-
fRemovalElementDepth
protected int fRemovalElementDepth
The element depth at element removal.
-
-
Method Detail
-
acceptElement
public void acceptElement(java.lang.String element, java.lang.String[] attributes)
Specifies that the given element should be accepted and, optionally, which attributes of that element should be kept.
Parameters:
element - The element to accept.
attributes - The list of attributes to be kept, or null if no attributes should be kept for this element.
See Also: removeElement(java.lang.String)
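The effect of the attributes parameter can be illustrated with a small, self-contained model: given an element's attributes and the list passed to acceptElement, only the listed attributes survive, and a null list keeps none. This is a sketch under stated assumptions, not the library's code. The class `AttributeFilterSketch`, the use of a `Map` in place of XMLAttributes, and the case-insensitive name comparison are all hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model of attribute filtering; a Map stands in for XMLAttributes.
final class AttributeFilterSketch {
    // Returns a new map holding only the attributes named in kept.
    // A null kept array means "keep no attributes", as the parameter description states.
    static Map<String, String> retain(Map<String, String> attrs, String[] kept) {
        Map<String, String> result = new LinkedHashMap<>();
        if (kept == null) {
            return result;
        }
        // Assumption: attribute names are compared case-insensitively.
        Set<String> allowed = new HashSet<>();
        for (String name : kept) {
            allowed.add(name.toLowerCase(Locale.ROOT));
        }
        for (Map.Entry<String, String> entry : attrs.entrySet()) {
            if (allowed.contains(entry.getKey().toLowerCase(Locale.ROOT))) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "/x");
        attrs.put("onclick", "evil()");
        System.out.println(retain(attrs, new String[] { "href" }));
        System.out.println(retain(attrs, null));
    }
}
```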
-
removeElement
public void removeElement(java.lang.String element)
Specifies that the given element should be completely removed. If an element on the remove list is encountered during processing, the element's start and end tags, as well as all of the content contained within the element, are removed from the processing stream.
Parameters:
element - The element to completely remove.
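Removing an element together with its content requires remembering where the removal started, which is plausibly the role of the fElementDepth and fRemovalElementDepth fields. The sketch below shows one way such depth tracking could work over a simplified event list of start tags, end tags, and text. It is an assumption about the technique, inferred from the field descriptions, and not the class's actual source. `RemovalDepthSketch` and its string-based events are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical depth-tracking model; events are "<name>", "</name>", or text.
final class RemovalDepthSketch {
    static String filter(List<String> events, Set<String> accepted, Set<String> removed) {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        // MAX_VALUE is a sentinel meaning "not currently inside a removed element".
        int removalDepth = Integer.MAX_VALUE;
        for (String event : events) {
            if (event.startsWith("</")) {
                String name = event.substring(2, event.length() - 1);
                if (depth <= removalDepth && accepted.contains(name)) {
                    out.append(event);
                }
                depth--;
                if (depth == removalDepth) {
                    removalDepth = Integer.MAX_VALUE; // left the removed element
                }
            } else if (event.startsWith("<")) {
                String name = event.substring(1, event.length() - 1);
                if (depth <= removalDepth) {
                    if (accepted.contains(name)) {
                        out.append(event);
                    } else if (removed.contains(name)) {
                        removalDepth = depth; // suppress everything nested deeper
                    }
                }
                depth++;
            } else if (depth <= removalDepth) {
                out.append(event); // text outside any removed element is kept
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList(
            "<div>", "Hello ", "<b>", "world", "</b>",
            "<script>", "alert(1)", "</script>", "</div>");
        String result = filter(events,
            Collections.singleton("b"), Collections.singleton("script"));
        System.out.println(result); // prints: Hello <b>world</b>
    }
}
```

In this model the `<div>` tags are stripped because `div` is on neither list, while the `<script>` element disappears along with its text.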
-
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.NamespaceContext nscontext, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start document.
Specified by: startDocument in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: startDocument in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
startDocument
public void startDocument(org.apache.xerces.xni.XMLLocator locator, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start document.
Overrides: startDocument in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
startPrefixMapping
public void startPrefixMapping(java.lang.String prefix, java.lang.String uri, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start prefix mapping.
Overrides: startPrefixMapping in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
startElement
public void startElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start element.
Specified by: startElement in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: startElement in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
emptyElement
public void emptyElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Empty element.
Specified by: emptyElement in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: emptyElement in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
comment
public void comment(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Comment.
Specified by: comment in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: comment in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
processingInstruction
public void processingInstruction(java.lang.String target, org.apache.xerces.xni.XMLString data, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Processing instruction.
Specified by: processingInstruction in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: processingInstruction in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
characters
public void characters(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Characters.
Specified by: characters in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: characters in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
ignorableWhitespace
public void ignorableWhitespace(org.apache.xerces.xni.XMLString text, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Ignorable whitespace.
Specified by: ignorableWhitespace in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: ignorableWhitespace in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
startGeneralEntity
public void startGeneralEntity(java.lang.String name, org.apache.xerces.xni.XMLResourceIdentifier id, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start general entity.
Specified by: startGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: startGeneralEntity in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
textDecl
public void textDecl(java.lang.String version, java.lang.String encoding, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Text declaration.
Specified by: textDecl in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: textDecl in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
endGeneralEntity
public void endGeneralEntity(java.lang.String name, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End general entity.
Specified by: endGeneralEntity in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: endGeneralEntity in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
startCDATA
public void startCDATA(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
Start CDATA section.
Specified by: startCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: startCDATA in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
endCDATA
public void endCDATA(org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End CDATA section.
Specified by: endCDATA in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: endCDATA in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
endElement
public void endElement(org.apache.xerces.xni.QName element, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End element.
Specified by: endElement in interface org.apache.xerces.xni.XMLDocumentHandler
Overrides: endElement in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
endPrefixMapping
public void endPrefixMapping(java.lang.String prefix, org.apache.xerces.xni.Augmentations augs) throws org.apache.xerces.xni.XNIException
End prefix mapping.
Overrides: endPrefixMapping in class DefaultFilter
Throws: org.apache.xerces.xni.XNIException
-
elementAccepted
protected boolean elementAccepted(java.lang.String element)
Returns true if the specified element is accepted.
-
elementRemoved
protected boolean elementRemoved(java.lang.String element)
Returns true if the specified element should be removed.
-
handleOpenTag
protected boolean handleOpenTag(org.apache.xerces.xni.QName element, org.apache.xerces.xni.XMLAttributes attributes)
Handles an open tag.
-
-