Class RegexBasedLocationExtractionStrategy

java.lang.Object
com.itextpdf.kernel.pdf.canvas.parser.listener.RegexBasedLocationExtractionStrategy
All Implemented Interfaces:
IEventListener, ILocationExtractionStrategy

public class RegexBasedLocationExtractionStrategy extends Object implements ILocationExtractionStrategy
This class searches for occurrences of a regular expression and returns the resulting rectangles. Note that an instance keeps every text location it receives, so a single instance cannot be used to process multiple pages. To extract text from several pages of a PDF document, create a new instance of RegexBasedLocationExtractionStrategy for each page.

Here is an example of usage with a new instance for each page:

    PdfDocument document = new PdfDocument(new PdfReader("..."));
    for (int i = 1; i <= document.getNumberOfPages(); ++i) {
        RegexBasedLocationExtractionStrategy extractionStrategy = new RegexBasedLocationExtractionStrategy("");
        PdfCanvasProcessor processor = new PdfCanvasProcessor(extractionStrategy);
        processor.processPageContent(document.getPage(i));
        for (IPdfTextLocation location : extractionStrategy.getResultantLocations()) {
            // process locations ...
        }
    }
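
The reason for one instance per page can be illustrated with a small, library-free sketch. It is a toy model only: ToyRegexStrategy, processPage, getMatches and the sample page strings are hypothetical names made up for this illustration, not part of iText, and the sketch does not reproduce how RegexBasedLocationExtractionStrategy is actually implemented. It only assumes what is stated above, namely that the strategy keeps everything it has been fed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PerPageStrategySketch {

    // Hypothetical stand-in for a stateful extraction strategy; NOT an iText class.
    static final class ToyRegexStrategy {
        private final Pattern pattern;
        private final StringBuilder collectedText = new StringBuilder();

        ToyRegexStrategy(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        // Plays the role of a processor feeding one page's content to the listener.
        void processPage(String pageText) {
            collectedText.append(pageText).append('\n');
        }

        // Matches against everything collected so far, not only the last page.
        List<String> getMatches() {
            List<String> matches = new ArrayList<>();
            Matcher matcher = pattern.matcher(collectedText);
            while (matcher.find()) {
                matches.add(matcher.group());
            }
            return matches;
        }
    }

    // Recommended pattern: a fresh strategy for every page.
    static List<List<String>> matchesWithFreshInstances(List<String> pages, String regex) {
        List<List<String>> perPage = new ArrayList<>();
        for (String page : pages) {
            ToyRegexStrategy strategy = new ToyRegexStrategy(regex);
            strategy.processPage(page);
            perPage.add(strategy.getMatches());
        }
        return perPage;
    }

    // Anti-pattern: one strategy reused, so earlier pages leak into later results.
    static List<List<String>> matchesWithSharedInstance(List<String> pages, String regex) {
        ToyRegexStrategy shared = new ToyRegexStrategy(regex);
        List<List<String>> perPage = new ArrayList<>();
        for (String page : pages) {
            shared.processPage(page);
            perPage.add(shared.getMatches());
        }
        return perPage;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("INV-001 due", "INV-002 paid");
        // prints [[INV-001], [INV-002]]
        System.out.println(matchesWithFreshInstances(pages, "INV-\\d+"));
        // prints [[INV-001], [INV-001, INV-002]]
        System.out.println(matchesWithSharedInstance(pages, "INV-\\d+"));
    }
}
```

With the shared instance, the result reported after the second page still contains the match from the first page, which is the kind of mixing the per-page instantiation above avoids.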