Class TagStructureContext
- java.lang.Object
-
- com.itextpdf.kernel.pdf.tagutils.TagStructureContext
-
public class TagStructureContext extends java.lang.Object
The TagStructureContext class tracks the information needed to maintain a document's tag structure. It is also used to make global modifications to the tag tree, such as removing or flushing page tags; however, these and most other methods are called automatically and are mostly intended for internal use.
There shall be only one instance of this class per PdfDocument. To obtain an instance of this class, use PdfDocument.getTagStructureContext().
-
-
Field Summary
Fields (each entry lists modifier and type, then the field name):
private static java.util.Set<java.lang.String>
ALLOWED_ROOT_TAG_ROLES
protected TagTreePointer
autoTaggingPointer
private PdfDocument
document
private PdfNamespace
documentDefaultNamespace
private boolean
forbidUnknownRoles
private java.util.Set<PdfDictionary>
namespaces
private java.util.Map<java.lang.String,PdfNamespace>
nameToNamespace
private PdfStructElem
rootTagElement
private PdfVersion
tagStructureTargetVersion
private WaitingTagsManager
waitingTagsManager
-
Constructor Summary
Constructors (each entry lists the constructor, then its description):
TagStructureContext(PdfDocument document)
Do not use this constructor; use the PdfDocument.getTagStructureContext() method instead.
TagStructureContext(PdfDocument document, PdfVersion tagStructureTargetVersion)
Do not use this constructor; use the PdfDocument.getTagStructureContext() method instead.
-
Method Summary
Methods (each entry lists modifier and type, then the method signature, then its description where one exists):
private void
actualizeNamespacesInStructTreeRoot()
boolean
checkIfRoleShallBeMappedToStandardRole(java.lang.String role, PdfNamespace namespace)
Checks whether the given role in the given namespace is required to be mapped to the standard structure namespace in order to be a valid role in Tagged PDF.
private java.lang.String
composeExceptionBasedOnNamespacePresence(java.lang.String role, PdfNamespace namespace, java.lang.String withoutNsEx, java.lang.String withNsEx)
private java.lang.String
composeInvalidRoleException(java.lang.String role, PdfNamespace namespace)
private java.lang.String
composeTooMuchTransitiveMappingsException(java.lang.String role, PdfNamespace namespace)
TagTreePointer
createPointerForStructElem(PdfStructElem structElem)
Creates a new TagTreePointer which points at the given PdfStructElem.
(package private) void
ensureNamespaceRegistered(PdfNamespace namespace)
PdfNamespace
fetchNamespace(java.lang.String namespaceName)
This method defines a recommended way to obtain PdfNamespace class instances.
TagStructureContext
flushPageTags(PdfPage page)
Flushes the tags which are considered to belong to the given page.
(package private) void
flushParentIfBelongsToPage(PdfStructElem parent, PdfPage currentPage)
TagTreePointer
getAutoTaggingPointer()
All tagging logic performed by iText automatically (along with the addition of content, annotations, etc.) uses the TagTreePointer returned by this method to manipulate the tag structure.
(package private) PdfDocument
getDocument()
PdfNamespace
getDocumentDefaultNamespace()
A namespace that is used as the default for tagging by any newly created TagTreePointer (including the pointer returned by getAutoTaggingPointer(), which implies that the automatically created tag structure will be in this namespace by default).
PdfStructElem
getPointerStructElem(TagTreePointer pointer)
Gets the PdfStructElem at which the TagTreePointer points.
IRoleMappingResolver
getRoleMappingResolver(java.lang.String role)
Gets an instance of the IRoleMappingResolver corresponding to the current tag structure target version.
IRoleMappingResolver
getRoleMappingResolver(java.lang.String role, PdfNamespace namespace)
Gets an instance of the IRoleMappingResolver corresponding to the current tag structure target version.
(package private) PdfStructElem
getRootTag()
TagTreePointer
getTagPointerById(byte[] id)
Retrieve a pointer to a structure element by ID.
TagTreePointer
getTagPointerByIdString(java.lang.String id)
Retrieve a pointer to a structure element by ID.
PdfVersion
getTagStructureTargetVersion()
Gets the version of the PDF standard to which the tag structure shall adhere.
WaitingTagsManager
getWaitingTagsManager()
Gets the WaitingTagsManager for the current document.
private void
initRegisteredNamespaces()
private boolean
isRoleAllowedToBeRoot(java.lang.String role)
void
normalizeDocumentRootTag()
Transforms root tags in a way that complies with the tagged PDF specification.
void
prepareToDocumentClosing()
A utility method that prepares the current instance of the TagStructureContext for the closing of the document.
TagTreePointer
removeAnnotationTag(PdfAnnotation annotation)
Removes the annotation content item from the tag structure.
TagTreePointer
removeAnnotationTag(PdfAnnotation annotation, boolean setAutoTaggingPointer)
Removes the annotation content item from the tag structure and sets autoTaggingPointer if true is passed.
TagTreePointer
removeContentItem(PdfPage page, int mcid)
Removes a content item from the tag structure.
private void
removePageTagFromParent(IStructureNode pageTag, IStructureNode parent)
TagStructureContext
removePageTags(PdfPage page)
Removes all tags that belong only to this page.
IRoleMappingResolver
resolveMappingToStandardOrDomainSpecificRole(java.lang.String role, PdfNamespace namespace)
Gets an instance of the IRoleMappingResolver which is already in the "resolved" state: its IRoleMappingResolver.getRole() and IRoleMappingResolver.getNamespace() methods return the role in the standard or domain-specific namespace to which the given role is mapped. Returns null if the given role is not mapped to a standard or domain-specific one.
TagStructureContext
setDocumentDefaultNamespace(PdfNamespace namespace)
Sets a namespace that will be used as the default for tagging by any newly created TagTreePointer.
TagStructureContext
setForbidUnknownRoles(boolean forbidUnknownRoles)
If forbidUnknownRoles is set to true, an exception will be raised when you try to add a new tag whose role is not a standard role and is not mapped through the RoleMap.
private void
setNamespaceForNewTagsBasedOnExistingRoot()
(package private) boolean
targetTagStructureVersionIs2()
(package private) void
throwExceptionIfRoleIsInvalid(AccessibilityProperties properties, PdfNamespace pointerCurrentNamespace)
(package private) void
throwExceptionIfRoleIsInvalid(java.lang.String role, PdfNamespace namespace)
-
-
-
Field Detail
-
ALLOWED_ROOT_TAG_ROLES
private static final java.util.Set<java.lang.String> ALLOWED_ROOT_TAG_ROLES
-
document
private final PdfDocument document
-
tagStructureTargetVersion
private final PdfVersion tagStructureTargetVersion
-
waitingTagsManager
private final WaitingTagsManager waitingTagsManager
-
namespaces
private final java.util.Set<PdfDictionary> namespaces
-
nameToNamespace
private final java.util.Map<java.lang.String,PdfNamespace> nameToNamespace
-
autoTaggingPointer
protected TagTreePointer autoTaggingPointer
-
rootTagElement
private PdfStructElem rootTagElement
-
forbidUnknownRoles
private boolean forbidUnknownRoles
-
documentDefaultNamespace
private PdfNamespace documentDefaultNamespace
-
-
Constructor Detail
-
TagStructureContext
public TagStructureContext(PdfDocument document)
Do not use this constructor; use the PdfDocument.getTagStructureContext() method instead.
Creates a TagStructureContext for the document. There shall be only one instance of this class per PdfDocument.
Parameters:
document - the document whose tag structure will be manipulated with this class.
-
TagStructureContext
public TagStructureContext(PdfDocument document, PdfVersion tagStructureTargetVersion)
Do not use this constructor; use the PdfDocument.getTagStructureContext() method instead.
Creates a TagStructureContext for the document. There shall be only one instance of this class per PdfDocument.
Parameters:
document - the document whose tag structure will be manipulated with this class.
tagStructureTargetVersion - the version of the PDF standard to which the tag structure shall adhere.
-
-
Method Detail
-
setForbidUnknownRoles
public TagStructureContext setForbidUnknownRoles(boolean forbidUnknownRoles)
If forbidUnknownRoles is set to true, an exception will be raised when you try to add a new tag whose role is not a standard role and is not mapped through the RoleMap. Default value: true.
Parameters:
forbidUnknownRoles - new value of the flag
Returns:
current TagStructureContext instance.
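The rule described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not iText's implementation: the class UnknownRoleCheckSketch, its tiny STANDARD_ROLES set, the plain Map used as a role map, and the choice of IllegalArgumentException are all hypothetical stand-ins. Following chains of mappings transitively is likewise an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the forbidUnknownRoles rule; not iText code.
class UnknownRoleCheckSketch {
    // Assumption: a tiny subset of standard structure roles, for illustration only.
    static final Set<String> STANDARD_ROLES =
            new HashSet<String>(Arrays.asList("Document", "Sect", "P", "H1", "Span"));

    private final Map<String, String> roleMap;   // custom role -> role it is mapped to
    private boolean forbidUnknownRoles = true;   // the documentation states the default is true

    UnknownRoleCheckSketch(Map<String, String> roleMap) {
        this.roleMap = roleMap;
    }

    UnknownRoleCheckSketch setForbidUnknownRoles(boolean forbidUnknownRoles) {
        this.forbidUnknownRoles = forbidUnknownRoles;
        return this;   // returns the current instance, mirroring the documented setter
    }

    // Follows the role map until a standard role is found or the chain ends.
    boolean isStandardOrMapped(String role) {
        String current = role;
        // Bound the number of hops so that a cyclic role map cannot loop forever.
        for (int hops = 0; current != null && hops <= roleMap.size(); hops++) {
            if (STANDARD_ROLES.contains(current)) {
                return true;
            }
            current = roleMap.get(current);
        }
        return false;
    }

    // Throws only when the flag is set and the role is neither standard nor mapped.
    void checkRole(String role) {
        if (forbidUnknownRoles && !isStandardOrMapped(role)) {
            throw new IllegalArgumentException(
                    "Role \"" + role + "\" is not standard and is not mapped through the role map");
        }
    }
}
```

With a role map of Note to Paragraph and Paragraph to P, checkRole("Note") passes, while checkRole("Mystery") throws until setForbidUnknownRoles(false) is called.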
-
getTagStructureTargetVersion
public PdfVersion getTagStructureTargetVersion()
Gets the version of the PDF standard to which the tag structure shall adhere.
Returns:
the tag structure target version
-
getAutoTaggingPointer
public TagTreePointer getAutoTaggingPointer()
All tagging logic performed by iText automatically (along with the addition of content, annotations, etc.) uses the TagTreePointer returned by this method to manipulate the tag structure. Typically it points at the root tag. This pointer can also be used to tweak the auto-tagging process (e.g. moving this pointer to a Section tag results in all automatically tagged content being placed under that Section tag).
Returns:
the TagTreePointer which is used for all automatic tagging of the document.
-
getWaitingTagsManager
public WaitingTagsManager getWaitingTagsManager()
Gets the WaitingTagsManager for the current document. It allows marking tags as waiting, which indicates that they are incomplete and not ready to be flushed.
Returns:
the document's WaitingTagsManager class instance.
-
getDocumentDefaultNamespace
public PdfNamespace getDocumentDefaultNamespace()
A namespace that is used as the default for tagging by any newly created TagTreePointer (including the pointer returned by getAutoTaggingPointer(), which implies that the automatically created tag structure will be in this namespace by default).
By default, this value is defined based on the PDF document version and the existing tag structure inside the document. For new, empty PDF 2.0 documents this namespace is set to StandardNamespaces.PDF_2_0.
This value has meaning only for PDF documents of version 2.0 and higher.
Returns:
a PdfNamespace which is used as a default value for the document tagging.
-
setDocumentDefaultNamespace
public TagStructureContext setDocumentDefaultNamespace(PdfNamespace namespace)
Sets a namespace that will be used as the default for tagging by any newly created TagTreePointer. See getDocumentDefaultNamespace() for more info.
Be careful when changing this property value. It is strongly recommended to do so right after the PdfDocument is created, before any content is added. Changing this value after content has been added might result in a tag structure that mixes namespaces. To keep the document consistent in a namespace other than the default, set this value before any modifications are made to the document and before the getAutoTaggingPointer() method is called for the first time.
This value has meaning only for PDF documents of version 2.0 and higher.
Parameters:
namespace - a PdfNamespace which is to be used as a default value for the document tagging.
Returns:
current TagStructureContext instance.
-
fetchNamespace
public PdfNamespace fetchNamespace(java.lang.String namespaceName)
This method defines a recommended way to obtain PdfNamespace class instances.
Returns a wrapper over either an already existing namespace dictionary in the document or a new one if such a namespace has not been encountered before. Calling this method counts as encountering a namespace, i.e. two sequential calls to this method will return the same namespace instance. This does not hold in general for two arbitrary calls: for instance, if several namespace instances with the same name are created via PdfNamespace constructors and set on elements of the tag structure, the last encountered one will be returned by this method. However, encountered namespaces will not be added to the document's structure tree root /Namespaces array unless they were set on some element of the tag structure.
Parameters:
namespaceName - a String defining the namespace name (conventionally a uniform resource identifier, or URI).
Returns:
PdfNamespace wrapper over either an already existing namespace object or a new one.
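The "encountering" behaviour described above can be modelled with a plain name-to-instance map. The sketch below is a hypothetical illustration and not iText's code: Namespace stands in for PdfNamespace, and encounter(...) is an invented method that represents a namespace instance being seen on a tag structure element. Only the map name nameToNamespace is taken from the field list of this class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the fetch/encounter semantics; not iText code.
class NamespaceRegistrySketch {
    // Stand-in for PdfNamespace; only the name matters in this sketch.
    static final class Namespace {
        final String name;

        Namespace(String name) {
            this.name = name;
        }
    }

    private final Map<String, Namespace> nameToNamespace = new HashMap<String, Namespace>();

    // Returns the known instance for this name, creating and remembering one if needed.
    // Two sequential calls with the same name therefore return the same instance.
    Namespace fetchNamespace(String namespaceName) {
        Namespace known = nameToNamespace.get(namespaceName);
        if (known == null) {
            known = new Namespace(namespaceName);
            nameToNamespace.put(namespaceName, known);
        }
        return known;
    }

    // Models encountering an instance that was created elsewhere: the last one wins.
    void encounter(Namespace namespace) {
        nameToNamespace.put(namespace.name, namespace);
    }
}
```

In this model, an instance created directly through the constructor and then passed to encounter(...) replaces the one previously returned by fetchNamespace for the same name, which mirrors the "last encountered one" wording.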
-
getRoleMappingResolver
public IRoleMappingResolver getRoleMappingResolver(java.lang.String role)
Gets an instance of the IRoleMappingResolver corresponding to the current tag structure target version. This method implies that the role is in the default standard structure namespace.
Parameters:
role - a role in the default standard structure namespace whose mapping is to be resolved.
Returns:
an IRoleMappingResolver instance, with the given role as current.
-
getRoleMappingResolver
public IRoleMappingResolver getRoleMappingResolver(java.lang.String role, PdfNamespace namespace)
Gets an instance of the IRoleMappingResolver corresponding to the current tag structure target version.
Parameters:
role - a role in the given namespace whose mapping is to be resolved.
namespace - a PdfNamespace which this role belongs to.
Returns:
an IRoleMappingResolver instance, with the given role in the given PdfNamespace as current.
-
checkIfRoleShallBeMappedToStandardRole
public boolean checkIfRoleShallBeMappedToStandardRole(java.lang.String role, PdfNamespace namespace)
Checks whether the given role in the given namespace is required to be mapped to the standard structure namespace in order to be a valid role in Tagged PDF.
Parameters:
role - a role in the given namespace whose need for mapping is to be checked
namespace - a PdfNamespace which this role belongs to; a null value refers to the default standard structure namespace
Returns:
true if the given role in the given namespace is either mapped to a standard structure role or does not have to be; otherwise false, which means that the role is not mapped to the standard or a domain-specific namespace and shall be mapped to a standard role to become valid in Tagged PDF
-
resolveMappingToStandardOrDomainSpecificRole
public IRoleMappingResolver resolveMappingToStandardOrDomainSpecificRole(java.lang.String role, PdfNamespace namespace)
Gets an instance of the IRoleMappingResolver which is already in the "resolved" state: its IRoleMappingResolver.getRole() and IRoleMappingResolver.getNamespace() methods return the role in the standard or domain-specific namespace to which the given role is mapped. Returns null if the given role is not mapped to a standard or domain-specific one.
Parameters:
role - a role in the given namespace whose mapping is to be resolved.
namespace - a PdfNamespace which this role belongs to.
Returns:
an instance of the IRoleMappingResolver which returns false for the IRoleMappingResolver.currentRoleShallBeMappedToStandard() method call; if the mapping cannot be resolved to this state, this method returns null, which means that the given role in the specified namespace is not mapped to the standard role in the standard namespace.
-
removeAnnotationTag
public TagTreePointer removeAnnotationTag(PdfAnnotation annotation)
Removes the annotation content item from the tag structure. If the annotation is not added to the document or is not tagged, nothing happens.
Parameters:
annotation - the PdfAnnotation that will be removed from the tag structure
Returns:
TagTreePointer instance which points at the annotation tag's parent if the annotation was removed, otherwise null
-
removeAnnotationTag
public TagTreePointer removeAnnotationTag(PdfAnnotation annotation, boolean setAutoTaggingPointer)
Removes the annotation content item from the tag structure and sets autoTaggingPointer if true is passed. If the annotation is not added to the document or is not tagged, nothing happens.
Parameters:
annotation - the PdfAnnotation that will be removed from the tag structure
setAutoTaggingPointer - true if the TagTreePointer should be set to autoTaggingPointer
Returns:
TagTreePointer instance which points at the annotation tag's parent if the annotation was removed, otherwise null
-
removeContentItem
public TagTreePointer removeContentItem(PdfPage page, int mcid)
Removes a content item from the tag structure.
Nothing happens if there is no such mcid on the given page.
Parameters:
page - the page which contains this content item
mcid - marked content id of this content item
Returns:
TagTreePointer which points at the parent of the removed content item, or null if there is no such mcid on the given page.
-
removePageTags
public TagStructureContext removePageTags(PdfPage page)
Removes all tags that belong only to this page. The logic which defines whether a tag belongs to the page is described at flushPageTags(PdfPage).
Parameters:
page - page that defines which tags are to be removed
Returns:
current TagStructureContext instance
-
flushPageTags
public TagStructureContext flushPageTags(PdfPage page)
Flushes the tags which are considered to belong to the given page. The logic that defines whether a given tag (structure element) belongs to the page is the following: if all the marked content references (dictionary or number references) that are descendants of the given structure element belong to the current page, the tag is considered to belong to the page. If a tag has descendants on several pages, it is flushed only if all of those pages other than the current one have already been flushed.
If some of the page's tags are in the waiting state (see WaitingTagsManager), these tags are considered not yet finished, and neither they nor their children will be flushed.
Parameters:
page - a page whose tags will be flushed
Returns:
current TagStructureContext instance
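The page-ownership rule for flushing can be restated as two small predicates. The following is a minimal standalone sketch, assuming each tag is summarised by the set of page numbers on which its marked content references live; PageTagFlushSketch and its method names are hypothetical and do not correspond to iText internals. Treating a tag with no content as belonging to no page is an assumption of the sketch, not something the documentation states.

```java
import java.util.Set;

// Hypothetical model of the "belongs to page" and flush rules; not iText code.
class PageTagFlushSketch {
    // A tag belongs to a page when every marked content reference under it is on that page.
    // Assumption of this sketch: a tag without any content belongs to no page.
    static boolean belongsToPage(Set<Integer> pagesOfContent, int page) {
        if (pagesOfContent.isEmpty()) {
            return false;
        }
        for (int contentPage : pagesOfContent) {
            if (contentPage != page) {
                return false;
            }
        }
        return true;
    }

    // A tag can be flushed together with currentPage when it has content on that page,
    // every other page it touches is already flushed, and it is not in the waiting state.
    // The waiting flag is meant to be true for a waiting tag and for its children alike.
    static boolean canFlush(Set<Integer> pagesOfContent, int currentPage,
                            Set<Integer> flushedPages, boolean waiting) {
        if (waiting || !pagesOfContent.contains(currentPage)) {
            return false;
        }
        for (int contentPage : pagesOfContent) {
            if (contentPage != currentPage && !flushedPages.contains(contentPage)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a tag with content on pages 2 and 3 does not belong to page 3, and it can be flushed with page 3 only once page 2 has already been flushed.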
-
normalizeDocumentRootTag
public void normalizeDocumentRootTag()
Transforms root tags in a way that complies with the tagged PDF specification. The behaviour may differ depending on the PDF version.
ISO 32000-1 (PDF 1.7 and lower) 14.8.4.2 Grouping Elements
"In a tagged PDF document, the structure tree shall contain a single top-level element; that is, the structure tree root (identified by the StructTreeRoot entry in the document catalogue) shall have only one child in its K (kids) array. If the PDF file contains a complete document, the structure type Document should be used for this top-level element in the logical structure hierarchy. If the file contains a well-formed document fragment, one of the structure types Part, Art, Sect, or Div may be used instead."
For PDF 2.0 and higher, the root tag is allowed to have only the Document role.
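The version-dependent rule for root roles can be written as a small check. This is a sketch only: RootRoleSketch and the boolean targetIsPdf2 parameter are hypothetical, and the role set is taken from the ISO 32000-1 passage quoted above, not from the actual contents of ALLOWED_ROOT_TAG_ROLES, which this page does not list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the root role rule; not iText code.
class RootRoleSketch {
    // Roles that ISO 32000-1, 14.8.4.2 names for the single top-level element.
    static final Set<String> PDF_1_7_ROOT_ROLES = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("Document", "Part", "Art", "Sect", "Div")));

    // For PDF 2.0 only Document is accepted; otherwise any of the roles listed above.
    static boolean isRoleAllowedToBeRoot(String role, boolean targetIsPdf2) {
        if (targetIsPdf2) {
            return "Document".equals(role);
        }
        return PDF_1_7_ROOT_ROLES.contains(role);
    }
}
```

Under this model, Sect is an acceptable root role for a PDF 1.7 target but not for a PDF 2.0 target, while a role such as P is rejected in both cases.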
-
prepareToDocumentClosing
public void prepareToDocumentClosing()
A utility method that prepares the current instance of the TagStructureContext for the closing of the document. Essentially, it flushes all the "hanging" information to the document.
-
getPointerStructElem
public PdfStructElem getPointerStructElem(TagTreePointer pointer)
Gets the PdfStructElem at which the TagTreePointer points.
NOTE: Be aware that PdfStructElem is a low-level class; use it carefully, especially in conjunction with the high-level TagTreePointer and TagStructureContext classes.
Parameters:
pointer - a TagTreePointer which points at the desired PdfStructElem.
Returns:
a PdfStructElem at which the given TagTreePointer points.
-
getTagPointerById
public TagTreePointer getTagPointerById(byte[] id)
Retrieve a pointer to a structure element by ID.
Parameters:
id - the ID of the element to retrieve
Returns:
a TagTreePointer to the element in question, or null if there is none.
-
getTagPointerByIdString
public TagTreePointer getTagPointerByIdString(java.lang.String id)
Retrieve a pointer to a structure element by ID. The ID will be encoded as a UTF-8 string and passed to getTagPointerById(byte[]).
Parameters:
id - the ID of the element to retrieve
Returns:
a TagTreePointer to the element in question, or null if there is none.
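The relationship between the string and byte[] variants comes down to a UTF-8 encoding step. A minimal sketch of that conversion using only the Java standard library follows; the class StructElemIdSketch and the method toIdBytes are hypothetical names and are not part of the iText API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing the UTF-8 encoding step; not iText code.
class StructElemIdSketch {
    // Encodes a textual ID as UTF-8 bytes, the form expected by a byte[]-based lookup.
    static byte[] toIdBytes(String id) {
        return id.getBytes(StandardCharsets.UTF_8);
    }
}
```

Note that non-ASCII characters occupy more than one byte after encoding, so the byte array can be longer than the string.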
-
createPointerForStructElem
public TagTreePointer createPointerForStructElem(PdfStructElem structElem)
Creates a new TagTreePointer which points at the given PdfStructElem.
Parameters:
structElem - a PdfStructElem for which the TagTreePointer will be created.
Returns:
a new TagTreePointer.
-
getRootTag
PdfStructElem getRootTag()
-
getDocument
PdfDocument getDocument()
-
ensureNamespaceRegistered
void ensureNamespaceRegistered(PdfNamespace namespace)
-
throwExceptionIfRoleIsInvalid
void throwExceptionIfRoleIsInvalid(AccessibilityProperties properties, PdfNamespace pointerCurrentNamespace)
-
throwExceptionIfRoleIsInvalid
void throwExceptionIfRoleIsInvalid(java.lang.String role, PdfNamespace namespace)
-
targetTagStructureVersionIs2
boolean targetTagStructureVersionIs2()
-
flushParentIfBelongsToPage
void flushParentIfBelongsToPage(PdfStructElem parent, PdfPage currentPage)
-
isRoleAllowedToBeRoot
private boolean isRoleAllowedToBeRoot(java.lang.String role)
-
setNamespaceForNewTagsBasedOnExistingRoot
private void setNamespaceForNewTagsBasedOnExistingRoot()
-
composeInvalidRoleException
private java.lang.String composeInvalidRoleException(java.lang.String role, PdfNamespace namespace)
-
composeTooMuchTransitiveMappingsException
private java.lang.String composeTooMuchTransitiveMappingsException(java.lang.String role, PdfNamespace namespace)
-
initRegisteredNamespaces
private void initRegisteredNamespaces()
-
actualizeNamespacesInStructTreeRoot
private void actualizeNamespacesInStructTreeRoot()
-
removePageTagFromParent
private void removePageTagFromParent(IStructureNode pageTag, IStructureNode parent)
-
composeExceptionBasedOnNamespacePresence
private java.lang.String composeExceptionBasedOnNamespacePresence(java.lang.String role, PdfNamespace namespace, java.lang.String withoutNsEx, java.lang.String withNsEx)
-
-