Package com.itextpdf.kernel.utils
Class TaggedPdfReaderTool
java.lang.Object
  com.itextpdf.kernel.utils.TaggedPdfReaderTool
public class TaggedPdfReaderTool extends java.lang.Object
Converts a tagged PDF document into an XML file.
-
-
Nested Class Summary
private class TaggedPdfReaderTool.MarkedContentEventListener
-
Field Summary
protected PdfDocument document
private java.util.Set<PdfObject> inspectedStructTreeElems
protected java.io.OutputStreamWriter out
protected java.util.Map<PdfDictionary,java.util.Map<java.lang.Integer,java.lang.String>> parsedTags
protected java.lang.String rootTag
-
Constructor Summary
TaggedPdfReaderTool(PdfDocument document)
Constructs a TaggedPdfReaderTool via a given PdfDocument.
-
Method Summary
void convertToXml(java.io.OutputStream os)
Converts the current tag structure into an XML file with default encoding (UTF-8).
void convertToXml(java.io.OutputStream os, java.lang.String charset)
Converts the current tag structure into an XML file with the provided encoding.
protected static java.lang.String escapeXML(java.lang.String s, boolean onlyASCII)
Escapes a string with the appropriate XML codes. NOTE: copied from the iText 5 XMLUtils class.
protected static java.lang.String fixTagName(java.lang.String tag)
Fixes the specified tag name so that it is a valid XML tag name.
protected void inspectAttributes(PdfStructElem kid)
Inspects the attributes dictionary of a StructTreeRoot child.
protected void inspectKid(IStructureNode kid)
Inspects a child of the StructTreeRoot.
protected void inspectKids(java.util.List<IStructureNode> kids)
Inspects the children of the StructTreeRoot.
static boolean isValidCharacterValue(int c)
Checks if a character value should be escaped/unescaped.
protected void parseTag(PdfMcr kid)
Parses the tag of a Marked Content Reference (MCR) kid of the StructTreeRoot.
TaggedPdfReaderTool setRootTag(java.lang.String rootTagName)
Sets the name of the root tag of the resultant XML file.
-
-
-
Field Detail
-
document
protected PdfDocument document
-
out
protected java.io.OutputStreamWriter out
-
rootTag
protected java.lang.String rootTag
-
parsedTags
protected java.util.Map<PdfDictionary,java.util.Map<java.lang.Integer,java.lang.String>> parsedTags
-
inspectedStructTreeElems
private final java.util.Set<PdfObject> inspectedStructTreeElems
-
-
Constructor Detail
-
TaggedPdfReaderTool
public TaggedPdfReaderTool(PdfDocument document)
Constructs a TaggedPdfReaderTool via a given PdfDocument.
Parameters:
document - the document to read tag structure from
-
-
Method Detail
-
isValidCharacterValue
public static boolean isValidCharacterValue(int c)
Checks if a character value should be escaped/unescaped.
Parameters:
c - a character value
Returns:
true if it's OK to escape or unescape this value.
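The exact ranges this check accepts are not stated here. As a minimal sketch, assuming the check follows the XML 1.0 Char production (an assumption, not taken from the library source), a stand-alone version could look like this. The class name CharacterValueSketch is hypothetical.

```java
final class CharacterValueSketch {

    // Assumption: accepts exactly the code points allowed by the XML 1.0 "Char" production.
    static boolean isValidCharacterValue(int c) {
        return c == 0x9 || c == 0xA || c == 0xD      // tab, line feed, carriage return
                || (c >= 0x20 && c <= 0xD7FF)        // BMP below the surrogate block
                || (c >= 0xE000 && c <= 0xFFFD)      // BMP above the surrogate block
                || (c >= 0x10000 && c <= 0x10FFFF);  // supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(isValidCharacterValue('A'));
        System.out.println(isValidCharacterValue(0x0));
    }
}
```

Under this assumption, control characters such as NUL and the non-characters U+FFFE and U+FFFF are rejected, while ordinary text, tabs and line breaks pass.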
-
convertToXml
public void convertToXml(java.io.OutputStream os) throws java.io.IOException
Converts the current tag structure into an XML file with default encoding (UTF-8).
Parameters:
os - the output stream to save XML file to
Throws:
java.io.IOException - in case of any I/O error
-
convertToXml
public void convertToXml(java.io.OutputStream os, java.lang.String charset) throws java.io.IOException
Converts the current tag structure into an XML file with the provided encoding.
Parameters:
os - the output stream to save XML file to
charset - the charset of the resultant XML file
Throws:
java.io.IOException - in case of any I/O error
-
setRootTag
public TaggedPdfReaderTool setRootTag(java.lang.String rootTagName)
Sets the name of the root tag of the resultant XML file.
Parameters:
rootTagName - the name of the root tag
Returns:
this object
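Because setRootTag returns this object, it can be chained directly in front of either convertToXml overload. The following stand-alone sketch mimics that shape with the standard library only. XmlExportSketch, its body field and demo() are hypothetical stand-ins, and the way the root tag wraps the output is an assumption rather than the library's documented behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class XmlExportSketch {
    private final String body;   // stand-in for the XML produced from the tag structure
    private String rootTag;      // assumption: null means "no wrapping root element"

    XmlExportSketch(String body) {
        this.body = body;
    }

    // Fluent setter: returning this allows call chaining.
    XmlExportSketch setRootTag(String rootTagName) {
        this.rootTag = rootTagName;
        return this;
    }

    // Default-encoding overload delegates to the charset-aware one.
    void convertToXml(OutputStream os) throws IOException {
        convertToXml(os, "UTF-8");
    }

    void convertToXml(OutputStream os, String charset) throws IOException {
        Writer out = new OutputStreamWriter(os, Charset.forName(charset));
        if (rootTag != null) {
            out.write("<" + rootTag + ">");
        }
        out.write(body);
        if (rootTag != null) {
            out.write("</" + rootTag + ">");
        }
        // The sketch only flushes; whether the real method also closes the stream is not stated here.
        out.flush();
    }

    static String demo() {
        try {
            ByteArrayOutputStream os = new ByteArrayOutputStream();
            new XmlExportSketch("<P>Hello</P>").setRootTag("root").convertToXml(os);
            return new String(os.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With iText on the classpath, the signatures documented on this page suggest the equivalent chain `new TaggedPdfReaderTool(pdfDocument).setRootTag("root").convertToXml(os)`, where pdfDocument and os are supplied by the caller.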
-
inspectKids
protected void inspectKids(java.util.List<IStructureNode> kids)
Inspects the children of the StructTreeRoot.
Parameters:
kids - list of the direct kids of the StructTreeRoot
-
inspectKid
protected void inspectKid(IStructureNode kid)
Inspects a child of the StructTreeRoot.
Parameters:
kid - the direct kid of the StructTreeRoot
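The pairing of inspectKids and inspectKid, together with the inspectedStructTreeElems set, suggests a recursive walk that skips elements it has already visited. The sketch below illustrates that pattern with a hypothetical Node type in place of IStructureNode. The recursion and the use of the set as a cycle guard are assumptions, not behaviour confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class StructTreeWalkSketch {

    // Hypothetical stand-in for a structure node: a role name plus child nodes.
    static final class Node {
        final String role;
        final List<Node> kids = new ArrayList<>();

        Node(String role) {
            this.role = role;
        }

        Node add(Node kid) {
            kids.add(kid);
            return this;
        }
    }

    // Identity-based set of nodes already visited (assumed purpose: avoid re-inspecting shared nodes).
    private final Set<Node> inspected =
            Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
    private final StringBuilder out = new StringBuilder();

    void inspectKids(List<Node> kids) {
        for (Node kid : kids) {
            inspectKid(kid);
        }
    }

    void inspectKid(Node kid) {
        if (!inspected.add(kid)) {
            return; // already inspected, skip to avoid infinite loops
        }
        out.append('<').append(kid.role).append('>');
        inspectKids(kid.kids);
        out.append("</").append(kid.role).append('>');
    }

    String result() {
        return out.toString();
    }

    static String demo() {
        Node doc = new Node("Document").add(new Node("H1")).add(new Node("P"));
        StructTreeWalkSketch walk = new StructTreeWalkSketch();
        walk.inspectKids(Collections.singletonList(doc));
        return walk.result();
    }
}
```

Each node opens a tag named after its role, recurses into its kids, then closes the tag, so the nesting of the output mirrors the nesting of the tree.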
-
inspectAttributes
protected void inspectAttributes(PdfStructElem kid)
Inspects the attributes dictionary of a StructTreeRoot child.
Parameters:
kid - the direct kid of the StructTreeRoot
-
parseTag
protected void parseTag(PdfMcr kid)
Parses the tag of a Marked Content Reference (MCR) kid of the StructTreeRoot.
Parameters:
kid - the direct PdfMcr kid of the StructTreeRoot
-
fixTagName
protected static java.lang.String fixTagName(java.lang.String tag)
Fixes the specified tag name so that it is a valid XML tag name.
Parameters:
tag - tag name to fix
Returns:
fixed tag name.
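How invalid characters are repaired is not described here. A minimal sketch, assuming an ASCII-only subset of the XML name rules and assuming that an invalid first character becomes '_' and any later invalid character becomes '-', might look as follows. FixTagNameSketch and both replacement characters are illustrative choices, not the library's confirmed behaviour.

```java
final class FixTagNameSketch {

    // ASCII subset of the XML NameStartChar production.
    static boolean isNameStart(char c) {
        return c == ':' || c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // ASCII subset of the XML NameChar production.
    static boolean isNameChar(char c) {
        return isNameStart(c) || c == '-' || c == '.' || (c >= '0' && c <= '9');
    }

    // Assumption: invalid characters are replaced rather than dropped, so the length is preserved.
    static String fixTagName(String tag) {
        StringBuilder sb = new StringBuilder(tag.length());
        for (int i = 0; i < tag.length(); i++) {
            char c = tag.charAt(i);
            if (i == 0) {
                sb.append(isNameStart(c) ? c : '_');
            } else {
                sb.append(isNameChar(c) ? c : '-');
            }
        }
        return sb.toString();
    }
}
```

Names that are already valid, such as H1, pass through unchanged. A name like "1st item" is rewritten because XML names cannot start with a digit or contain spaces.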
-
escapeXML
protected static java.lang.String escapeXML(java.lang.String s, boolean onlyASCII)
Escapes a string with the appropriate XML codes. NOTE: copied from the iText 5 XMLUtils class.
Parameters:
s - the string to be escaped
onlyASCII - if true, codes above 127 will always be escaped with &#nn;
Returns:
the escaped string
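To illustrate the effect of the onlyASCII flag, here is a small stand-alone sketch of this kind of escaping. It is not the library's code: EscapeXmlSketch is a hypothetical class, and the set of named entities it uses (the five predefined XML entities) is an assumption.

```java
final class EscapeXmlSketch {

    // Hypothetical stand-in for escapeXML(String, boolean).
    static String escapeXml(String s, boolean onlyAscii) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (onlyAscii && c > 127) {
                        // Numeric character reference using the decimal code, as in &#nn;
                        sb.append("&#").append((int) c).append(';');
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

With onlyAscii set to true, a character such as U+00E9 is written as a decimal reference. With false, it is copied through and left to the output charset to encode.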
-
-