Package com.itextpdf.text.pdf.parser
Class TaggedPdfReaderTool
java.lang.Object
    com.itextpdf.text.pdf.parser.TaggedPdfReaderTool

Direct Known Subclasses:
CompareTool.CmpTaggedPdfReaderTool

public class TaggedPdfReaderTool extends java.lang.Object
Converts a tagged PDF document into an XML file.
Since:
5.0.2

Constructor Summary
Constructor | Description
TaggedPdfReaderTool() |

Method Summary
Modifier and Type | Method | Description
void | convertToXml(PdfReader reader, java.io.OutputStream os) | Converts the tagged content of a PDF into XML.
void | convertToXml(PdfReader reader, java.io.OutputStream os, java.lang.String charset) | Converts the tagged content of a PDF into XML, encoding the output with the given charset.
private static java.lang.String | fixTagName(java.lang.String tag) |
void | inspectChild(PdfObject k) | Inspects a child of a structured element.
void | inspectChildArray(PdfArray k) | If the child of a structured element is an array, loops over its elements.
void | inspectChildDictionary(PdfDictionary k) | If the child of a structured element is a dictionary, inspects the child and may also write a tag.
void | inspectChildDictionary(PdfDictionary k, boolean inspectAttributes) | If the child of a structured element is a dictionary, inspects the child and may also write a tag.
void | parseTag(java.lang.String tag, PdfObject object, PdfDictionary page) | Searches for a tag in a page.
protected java.lang.String | xmlName(PdfName name) |

Field Detail
reader
protected PdfReader reader
The reader object from which the content streams are read.

out
protected java.io.PrintWriter out
The writer object to which the XML will be written.

Method Detail
convertToXml
public void convertToXml(PdfReader reader, java.io.OutputStream os, java.lang.String charset) throws java.io.IOException
Converts the tagged content of a PDF into XML, encoding the output with the given charset.
Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
charset - the charset used to encode the output
Throws:
java.io.IOException
Since:
5.0.5
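
The caller supplies an OutputStream and a charset, while the out field is a java.io.PrintWriter. The following is a minimal sketch of how the two can be combined using only the standard library. It is not taken from the iText source: the class CharsetWriterSketch and its methods openWriter and writeSample are hypothetical names chosen for illustration, and unlike convertToXml the sketch reports an unknown charset with an unchecked exception.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper, not part of iText: shows how an OutputStream and a
// charset name can be turned into a PrintWriter with java.io alone.
final class CharsetWriterSketch {

    // Wraps the stream in a writer that encodes characters with the named charset.
    // Charset.forName throws an unchecked exception for unknown or illegal names.
    static PrintWriter openWriter(OutputStream os, String charset) {
        return new PrintWriter(new OutputStreamWriter(os, Charset.forName(charset)));
    }

    // Writes a tiny XML fragment and returns the encoded bytes.
    static byte[] writeSample(String text, String charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter out = openWriter(bytes, charset);
        out.print("<P>" + text + "</P>");
        out.flush(); // the writer buffers, so flush before reading the bytes
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8, one in ISO-8859-1.
        System.out.println(writeSample("\u00e9", "UTF-8").length);
        System.out.println(writeSample("\u00e9", "ISO-8859-1").length);
    }
}
```

The same text yields different byte sequences depending on the charset, which is why the three-argument overload lets the caller state the encoding explicitly instead of relying on the environment.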

convertToXml
public void convertToXml(PdfReader reader, java.io.OutputStream os) throws java.io.IOException
Converts the tagged content of a PDF into XML. The output is written using the platform default charset.
Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
Throws:
java.io.IOException

inspectChild
public void inspectChild(PdfObject k) throws java.io.IOException
Inspects a child of a structured element. This can be an array or a dictionary.
Parameters:
k - the child to inspect
Throws:
java.io.IOException

inspectChildArray
public void inspectChildArray(PdfArray k) throws java.io.IOException
If the child of a structured element is an array, loops over its elements and inspects each one.
Parameters:
k - the child array to inspect
Throws:
java.io.IOException
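
The dispatch described for inspectChild and inspectChildArray can be sketched with stand-in types. This is an illustration only and not the iText implementation: java.util.List stands in for PdfArray and java.util.Map for PdfDictionary, the class InspectChildSketch and its helper element are hypothetical, and the keys "S" (structure type) and "K" (kids) are assumptions borrowed from the PDF structure tree conventions rather than from this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: List plays the role of PdfArray, Map the role of
// PdfDictionary. None of this is iText code.
final class InspectChildSketch {

    // Returns the "S" entry of every dictionary reached, in visiting order.
    static List<String> inspect(Object k) {
        List<String> visited = new ArrayList<>();
        inspectChild(k, visited);
        return visited;
    }

    // Dispatches on the kind of child: an array or a dictionary.
    static void inspectChild(Object k, List<String> visited) {
        if (k instanceof List) {
            inspectChildArray((List<?>) k, visited);
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k, visited);
        }
        // Anything else (null, numbers, ...) is ignored in this sketch.
    }

    // An array child: loop over the elements and inspect each one.
    static void inspectChildArray(List<?> k, List<String> visited) {
        for (Object element : k) {
            inspectChild(element, visited);
        }
    }

    // A dictionary child: record its tag, then descend into its own kids.
    static void inspectChildDictionary(Map<?, ?> k, List<String> visited) {
        Object tag = k.get("S");
        if (tag != null) {
            visited.add(tag.toString());
        }
        inspectChild(k.get("K"), visited);
    }

    // Builds a stand-in structure element with a tag and optional kids.
    static Map<String, Object> element(String tag, Object kids) {
        Map<String, Object> dict = new LinkedHashMap<>();
        dict.put("S", tag);
        if (kids != null) {
            dict.put("K", kids);
        }
        return dict;
    }

    public static void main(String[] args) {
        Object tree = element("Document",
                Arrays.asList(element("H1", null), element("P", null)));
        System.out.println(inspect(tree)); // prints [Document, H1, P]
    }
}
```

Because the array case simply calls back into inspectChild, nested arrays and dictionaries are handled by the same three methods without any extra bookkeeping.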

inspectChildDictionary
public void inspectChildDictionary(PdfDictionary k) throws java.io.IOException
If the child of a structured element is a dictionary, inspects the child; a tag may also be written to the output.
Parameters:
k - the child dictionary to inspect
Throws:
java.io.IOException

inspectChildDictionary
public void inspectChildDictionary(PdfDictionary k, boolean inspectAttributes) throws java.io.IOException
If the child of a structured element is a dictionary, inspects the child; a tag may also be written to the output.
Parameters:
k - the child dictionary to inspect
inspectAttributes - whether the attributes of the dictionary should also be inspected
Throws:
java.io.IOException

xmlName
protected java.lang.String xmlName(PdfName name)

fixTagName
private static java.lang.String fixTagName(java.lang.String tag)

parseTag
public void parseTag(java.lang.String tag, PdfObject object, PdfDictionary page) throws java.io.IOException
Searches for a tag in a page.
Parameters:
tag - the name of the tag
object - an identifier to find the marked content
page - a page dictionary
Throws:
java.io.IOException