Using DynaPDF parser to find characters
Please check out the DynaPDFParserMBS class in the MBS Xojo DynaPDF Plugin. Among other things, this class lets you search for text on a PDF page.
You can limit the search to a part of the page or run it over the whole page, and set options such as whether the text search is case insensitive.
Today we want to show you how to identify the exact position of any character in a PDF. In our example, every character gets a box drawn around it, even in mirrored or rotated text.
Let us show the code for this. You may open the example project "Text Positions with parser" and see where we load the PDF. Once it is loaded, we initialize the DynaPDFParserMBS object. We pass kstMatchAlways here so the parser does not look for a particular text, but reports the position of every character:
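A minimal sketch of what this setup and search loop might look like. DynaPDFParserMBS, kstMatchAlways, FindText, SelText and SelBBox2 are named in this article. Everything else is an assumption to verify against the plugin documentation and the example project: the pdf variable, the ParsePage call with kcpfEnableTextSelection, the parameter order of FindText, and Nil meaning "whole page".

```xojo
// Sketch only, not the code from the example project.
// Assumes pdf is a DynaPDFMBS object with the PDF already loaded.
Dim parser As New DynaPDFParserMBS(pdf)

Dim texts() As String   // one entry per character found
Dim boxes() As Variant  // one array of corner points per character

// Assumption: the page must be parsed with text selection enabled first
If parser.ParsePage(1, DynaPDFParserMBS.kcpfEnableTextSelection) Then
  
  // Assumption: Nil area searches the whole page.
  // kstMatchAlways reports every character instead of matching a text.
  Dim found As Boolean = parser.FindText(Nil, DynaPDFParserMBS.kstMatchAlways, "", False)
  
  While found
    // Remember the character and the corner points of its box
    Dim corners As Variant = parser.SelBBox2
    texts.Append parser.SelText
    boxes.Append corners
    
    // Pass True to continue the search after the last hit
    found = parser.FindText(Nil, DynaPDFParserMBS.kstMatchAlways, "", True)
  Wend
End If
```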
The loop runs while we have more text. For each character, we get the selection text and the bounding box as an array of points. You can of course just get the rectangle, but a plain rectangle cannot describe rotated text. We continue the loop by calling FindText again, passing true to continue the search.
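To illustrate why the plain rectangle falls short for rotated text, here is a small sketch in plain Xojo. EnclosingRect is a hypothetical helper, not a plugin function. It takes the corner points as two parallel arrays and returns the smallest upright rectangle around them.

```xojo
// Hypothetical helper, not part of the plugin: returns the axis-aligned
// rectangle enclosing the given corner points as left, bottom, right, top.
Function EnclosingRect(xs() As Double, ys() As Double) As Double()
  Dim minX As Double = xs(0)
  Dim maxX As Double = xs(0)
  Dim minY As Double = ys(0)
  Dim maxY As Double = ys(0)
  
  // Widen the rectangle until it covers every remaining corner
  For i As Integer = 1 To xs.Ubound
    minX = Min(minX, xs(i))
    maxX = Max(maxX, xs(i))
    minY = Min(minY, ys(i))
    maxY = Max(maxY, ys(i))
  Next
  
  Return Array(minX, minY, maxX, maxY)
End Function
```

For a box rotated by 45 degrees with the corners (5, 0), (10, 5), (5, 10) and (0, 5), this returns 0, 0, 10, 10. That rectangle covers 100 square units, while the character box itself covers only 50, so the rectangle overlaps neighboring text. The four points keep the exact shape.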
In the paint event of the window, we draw the PDF page first. Then we loop over the found text pieces and show each character surrounded by the box drawn from the points we got:
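A minimal sketch of the box drawing step in plain Xojo. DrawQuad is a hypothetical helper, not taken from the example project. It assumes the corner points are stored as parallel arrays in PDF points, listed in order around the box, with bottom-up PDF coordinates, and that the page is drawn at the top left of the graphics object with a uniform zoom factor.

```xojo
// Hypothetical helper for a paint event: outlines one character box.
// pageHeight is the page height in PDF points, zoom is pixels per point.
Sub DrawQuad(g As Graphics, xs() As Double, ys() As Double, pageHeight As Double, zoom As Double)
  Dim count As Integer = xs.Ubound + 1
  
  For i As Integer = 0 To count - 1
    // Connect each corner to the next one; the last edge closes the shape
    Dim j As Integer = (i + 1) Mod count
    
    // Flip the y axis, because Xojo graphics have the origin at the top left
    Dim x1 As Double = xs(i) * zoom
    Dim y1 As Double = (pageHeight - ys(i)) * zoom
    Dim x2 As Double = xs(j) * zoom
    Dim y2 As Double = (pageHeight - ys(j)) * zoom
    
    g.DrawLine(x1, y1, x2, y2)
  Next
End Sub
```

Because the outline follows the four points rather than an upright rectangle, it stays tight around rotated and mirrored characters.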
As shown, you can find out exactly where each character is located. You may use the DeleteText function to cut text precisely and remove individual characters from the PDF page. Or you can annotate the PDF page. For example, you could add WebLinks to specific words once you know the surrounding rectangle.
Please try the example project and let us know what questions you have. The SelBBox2 and SelText properties were added in v24.1 in response to customer requests.