DynaPDF Content Parser
As PDF documents contain pages with content streams, you may be interested in inspecting the content with our DynaPDFParserMBS class. You open a PDF document, import pages into memory, and then parse each page. Once you have parsed a page, you can access its content objects. That's great for a few things:
- Extract text or vector graphics
- Remove unwanted elements
- Modify drawings
- Get bounding boxes and coordinates for every item.
- Check which font is active for which text fragment.
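To illustrate the last point, here is a rough, language-neutral sketch in Python. It does not use the plugin: the `Content` class is a hypothetical stand-in for the parsed content objects, and only the operator values (kopSaveGS = 23, kopRestoreGS = 22, kopSetFont = 30, kopShowText = 43) come from the operator table. It walks the objects in order and remembers the most recently set font, saving and restoring it together with the graphics state, because in PDF the font belongs to the text state that is saved and restored with the graphics state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Operator values taken from the operator table.
KOP_RESTORE_GS = 22
KOP_SAVE_GS = 23
KOP_SET_FONT = 30
KOP_SHOW_TEXT = 43

@dataclass
class Content:
    """Hypothetical stand-in for a parsed content object (not the plugin API)."""
    op: int
    font_name: Optional[str] = None  # used by KOP_SET_FONT
    text: Optional[str] = None       # used by KOP_SHOW_TEXT

def text_with_fonts(items: List[Content]) -> List[Tuple[Optional[str], str]]:
    """Return (active font, text) pairs in content-stream order."""
    font: Optional[str] = None
    saved: List[Optional[str]] = []  # the font is part of the saved graphics state
    result = []
    for item in items:
        if item.op == KOP_SAVE_GS:
            saved.append(font)
        elif item.op == KOP_RESTORE_GS:
            if saved:
                font = saved.pop()
        elif item.op == KOP_SET_FONT:
            font = item.font_name
        elif item.op == KOP_SHOW_TEXT:
            result.append((font, item.text or ""))
    return result

page = [
    Content(KOP_SET_FONT, font_name="Helvetica"),
    Content(KOP_SHOW_TEXT, text="Hello"),
    Content(KOP_SAVE_GS),
    Content(KOP_SET_FONT, font_name="Times-Bold"),
    Content(KOP_SHOW_TEXT, text="World"),
    Content(KOP_RESTORE_GS),
    Content(KOP_SHOW_TEXT, text="again"),
]
print(text_with_fonts(page))
# [('Helvetica', 'Hello'), ('Times-Bold', 'World'), ('Helvetica', 'again')]
```

Note how "again" is attributed to Helvetica again: the font change happened inside a saved graphics state and is undone by the restore.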
Many properties in the classes are settable, so you can, for example, easily change a color, adjust a coordinate in a vector graphic, or change the line width.
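A minimal sketch of such edits, again modeled in plain Python rather than with the plugin's classes: `Content`, `value` and `color` are hypothetical names, and only the operator values kopSetFillColor = 26 and kopSetLineWidth = 34 come from the operator table. The RGB interpretation of the color components is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List

KOP_SET_FILL_COLOR = 26  # values from the operator table
KOP_SET_LINE_WIDTH = 34

@dataclass
class Content:
    """Hypothetical stand-in for a parsed content object (not the plugin API)."""
    op: int
    value: float = 0.0                                # line width for KOP_SET_LINE_WIDTH
    color: List[float] = field(default_factory=list)  # components for KOP_SET_FILL_COLOR

def thicken_lines(items: List[Content], minimum: float) -> int:
    """Raise every line width below `minimum`; return how many objects changed."""
    changed = 0
    for item in items:
        if item.op == KOP_SET_LINE_WIDTH and item.value < minimum:
            item.value = minimum
            changed += 1
    return changed

def replace_fill_color(items: List[Content], old: List[float], new: List[float]) -> int:
    """Swap one fill color for another; return how many objects changed."""
    changed = 0
    for item in items:
        if item.op == KOP_SET_FILL_COLOR and item.color == old:
            item.color = list(new)
            changed += 1
    return changed

page = [
    Content(KOP_SET_LINE_WIDTH, value=0.25),
    Content(KOP_SET_LINE_WIDTH, value=2.0),
    Content(KOP_SET_FILL_COLOR, color=[1.0, 0.0, 0.0]),  # red, assuming RGB
]
thicken_lines(page, 1.0)                                  # widens only the first one
replace_fill_color(page, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])  # red -> blue
```

Because the objects are modified in place, nothing else in the list has to be rebuilt in this model.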
Or, when you want to place a template on top of an existing page, you may need to modify the content to remove a rectangle in the background, so you can see through the template to the content behind it.
Here is a sample that marks all images for deletion and then writes the page back.
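The same idea can be modeled in plain Python; this is not the plugin's code. `Content` is hypothetical, and marking a node as deleted is modeled by switching it to kopNull (0), which the operator table describes as a deleted node; whether the plugin marks deletions exactly this way is an assumption here. Writing back is modeled as keeping every node except the kopNull ones. The sketch treats both kopDrawImage (6) and kopDrawInlineImage (7) as images.

```python
from dataclasses import dataclass
from typing import List

KOP_NULL = 0               # "represents a deleted node"
KOP_DRAW_IMAGE = 6
KOP_DRAW_INLINE_IMAGE = 7

@dataclass
class Content:
    """Hypothetical stand-in for a parsed content object (not the plugin API)."""
    op: int

def mark_images_deleted(items: List[Content]) -> int:
    """Turn every image-drawing node into a deleted node; return the count."""
    count = 0
    for item in items:
        if item.op in (KOP_DRAW_IMAGE, KOP_DRAW_INLINE_IMAGE):
            item.op = KOP_NULL
            count += 1
    return count

def write_back(items: List[Content]) -> List[int]:
    """Model of writing the page back: keep the operators of surviving nodes."""
    return [item.op for item in items if item.op != KOP_NULL]

# SaveGS, MulMatrix, DrawImage, RestoreGS, DrawInlineImage, ShowText
page = [Content(op) for op in (23, 20, 6, 22, 7, 43)]
print(mark_images_deleted(page))  # 2
print(write_back(page))           # [23, 20, 22, 43]
```

The now empty kopSaveGS / kopMulMatrix / kopRestoreGS bracket around the removed image is left in place; it no longer draws anything.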
Here is the list of operators and the classes used for them. By default, you get a DynaPDFParserContentMBS object. If the operator has parameters, you get the matching subclass instead, so you can access its properties.
| Constant | Value | Description | Class |
|---|---|---|---|
| kopBeginCompatibility | 1 | Begin a compatibility section: unknown operators are ignored until the section is terminated with kopEndCompatibility. | no parameters |
| kopBeginMarkedContent | 2 | Begin marked content. | DynaPDFParserContentBeginMarkedContentMBS |
| kopBeginText | 3 | Begin text. | no parameters |
| kopClipPath | 4 | Clip the current path. | DynaPDFParserContentClipPathMBS |
| kopClipPathExt | 5 | Clip a path with extended options. | DynaPDFParserContentClipPathExtMBS |
| kopDrawImage | 6 | Draw an image. | DynaPDFParserContentDrawImageMBS |
| kopDrawInlineImage | 7 | Draw an inline image. | DynaPDFParserContentDrawInlineImageMBS |
| kopDrawPath | 8 | Draw a path. | DynaPDFParserContentDrawPathMBS |
| kopDrawPathExt | 9 | Draw a path with extended options. | DynaPDFParserContentDrawPathExtMBS |
| kopDrawShading | 10 | Draw a shading. | DynaPDFParserContentDrawShadingMBS |
| kopDrawTemplate | 11 | Draw a template. | DynaPDFParserContentDrawTemplateMBS |
| kopDrawTranspGroup | 12 | Draw a transparency group. | DynaPDFParserContentDrawGroupMBS |
| kopEndCompatibility | 13 | End the compatibility section. | no parameters |
| kopEndMarkedContent | 14 | End marked content. | no parameters |
| kopEndText | 15 | End text. | no parameters |
| kopInitType3Glyph0 | 16 | Initialize a Type 3 glyph. | DynaPDFParserContentInitType3GlyphMBS |
| kopInitType3Glyph1 | 17 | Initialize a Type 3 glyph. | DynaPDFParserContentInitType3GlyphMBS |
| kopInsertPostscript | 18 | Insert PostScript code, which may be used when printing to a PostScript device. | DynaPDFParserContentInsertPostscriptMBS |
| kopMarkedContPoint | 19 | Marked-content point. | DynaPDFParserContentMarkedContPntMBS |
| kopMulMatrix | 20 | Multiply the current transformation matrix. | DynaPDFParserContentMulMatrixMBS |
| kopNull | 0 | A deleted node. | none |
| kopPageHeader | 21 | Page header. | DynaPDFParserContentPageHeaderMBS |
| kopRestoreGS | 22 | Restore the graphics state. | no parameters |
| kopSaveGS | 23 | Save the graphics state. | no parameters |
| kopSetCharSpacing | 24 | Set character spacing. | DynaPDFParserContentFloatMBS |
| kopSetExtGState | 25 | Set extended graphics state. | DynaPDFParserContentExtGStateMBS |
| kopSetFillColor | 26 | Set fill color. | DynaPDFParserContentColorMBS |
| kopSetFillColorSpace | 27 | Set fill color space. | DynaPDFParserContentColorSpaceMBS |
| kopSetFillPattern | 28 | Set fill pattern. | DynaPDFParserContentPatternMBS |
| kopSetFlatnessTolerance | 29 | Set flatness tolerance. | DynaPDFParserContentFloatMBS |
| kopSetFont | 30 | Set font. | DynaPDFParserContentFontMBS |
| kopSetLineCapStyle | 31 | Set line cap style. | DynaPDFParserContentIntMBS |
| kopSetLineDashPattern | 32 | Set line dash pattern. | DynaPDFParserContentLineDashPatternMBS |
| kopSetLineJoinStyle | 33 | Set line join style. | DynaPDFParserContentIntMBS |
| kopSetLineWidth | 34 | Set line width. | DynaPDFParserContentFloatMBS |
| kopSetMiterLimit | 35 | Set miter limit. | DynaPDFParserContentFloatMBS |
| kopSetRenderingIntent | 36 | Set rendering intent. | DynaPDFParserContentIntMBS |
| kopSetStrokeColor | 37 | Set stroke color. | DynaPDFParserContentColorMBS |
| kopSetStrokeColorSpace | 38 | Set stroke color space. | DynaPDFParserContentColorSpaceMBS |
| kopSetStrokePattern | 39 | Set stroke pattern. | DynaPDFParserContentPatternMBS |
| kopSetTextDrawMode | 40 | Set text drawing mode. | DynaPDFParserContentIntMBS |
| kopSetTextScale | 41 | Set text scale. | DynaPDFParserContentFloatMBS |
| kopSetWordSpacing | 42 | Set word spacing. | DynaPDFParserContentFloatMBS |
| kopShowText | 43 | Show text. | DynaPDFParserContentShowTextMBS |
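Because the table is simply a mapping from operator value to class, it can be captured as data. The following Python sketch transcribes the table into a dictionary and answers two questions: which class a given operator value should produce, and which operators share a class. The helper names `class_for` and `ops_using` are hypothetical; returning None for kopNull reflects that the table lists no class for a deleted node.

```python
from typing import Dict, List, Optional, Tuple

PREFIX, SUFFIX = "DynaPDFParserContent", "MBS"
BASE_CLASS = PREFIX + SUFFIX  # DynaPDFParserContentMBS, used when there are no parameters

# value -> (constant, class stem); stem None means "no parameters" in the table
OPERATORS: Dict[int, Tuple[str, Optional[str]]] = {
    0: ("kopNull", None), 1: ("kopBeginCompatibility", None),
    2: ("kopBeginMarkedContent", "BeginMarkedContent"), 3: ("kopBeginText", None),
    4: ("kopClipPath", "ClipPath"), 5: ("kopClipPathExt", "ClipPathExt"),
    6: ("kopDrawImage", "DrawImage"), 7: ("kopDrawInlineImage", "DrawInlineImage"),
    8: ("kopDrawPath", "DrawPath"), 9: ("kopDrawPathExt", "DrawPathExt"),
    10: ("kopDrawShading", "DrawShading"), 11: ("kopDrawTemplate", "DrawTemplate"),
    12: ("kopDrawTranspGroup", "DrawGroup"), 13: ("kopEndCompatibility", None),
    14: ("kopEndMarkedContent", None), 15: ("kopEndText", None),
    16: ("kopInitType3Glyph0", "InitType3Glyph"), 17: ("kopInitType3Glyph1", "InitType3Glyph"),
    18: ("kopInsertPostscript", "InsertPostscript"), 19: ("kopMarkedContPoint", "MarkedContPnt"),
    20: ("kopMulMatrix", "MulMatrix"), 21: ("kopPageHeader", "PageHeader"),
    22: ("kopRestoreGS", None), 23: ("kopSaveGS", None),
    24: ("kopSetCharSpacing", "Float"), 25: ("kopSetExtGState", "ExtGState"),
    26: ("kopSetFillColor", "Color"), 27: ("kopSetFillColorSpace", "ColorSpace"),
    28: ("kopSetFillPattern", "Pattern"), 29: ("kopSetFlatnessTolerance", "Float"),
    30: ("kopSetFont", "Font"), 31: ("kopSetLineCapStyle", "Int"),
    32: ("kopSetLineDashPattern", "LineDashPattern"), 33: ("kopSetLineJoinStyle", "Int"),
    34: ("kopSetLineWidth", "Float"), 35: ("kopSetMiterLimit", "Float"),
    36: ("kopSetRenderingIntent", "Int"), 37: ("kopSetStrokeColor", "Color"),
    38: ("kopSetStrokeColorSpace", "ColorSpace"), 39: ("kopSetStrokePattern", "Pattern"),
    40: ("kopSetTextDrawMode", "Int"), 41: ("kopSetTextScale", "Float"),
    42: ("kopSetWordSpacing", "Float"), 43: ("kopShowText", "ShowText"),
}

def class_for(op: int) -> Optional[str]:
    """Class name listed for an operator value; None for kopNull (deleted node)."""
    constant, stem = OPERATORS[op]
    if constant == "kopNull":
        return None
    return PREFIX + stem + SUFFIX if stem else BASE_CLASS

def ops_using(class_name: str) -> List[int]:
    """All operator values whose objects use the given class, in ascending order."""
    return [op for op in sorted(OPERATORS) if class_for(op) == class_name]

print(class_for(12))                            # DynaPDFParserContentDrawGroupMBS
print(ops_using("DynaPDFParserContentIntMBS"))  # [31, 33, 36, 40]
```

Storing only the class stem keeps the transcription short; the full name is rebuilt from the common prefix and suffix.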
Here is a sample where we check whether a content object is a DynaPDFParserContentDrawImageMBS object, so we can assign it to a variable of that type and access its properties:
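The pattern is a type check followed by an assignment to a variable of the subclass type. As a rough analogy in Python, with hypothetical classes standing in for DynaPDFParserContentMBS and DynaPDFParserContentDrawImageMBS (the `image_name` property is invented for illustration), it could look like this:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ParserContent:
    """Stands in for the base class DynaPDFParserContentMBS (hypothetical model)."""
    op: int

@dataclass
class ParserContentDrawImage(ParserContent):
    """Stands in for DynaPDFParserContentDrawImageMBS; `image_name` is invented."""
    image_name: str = ""

def image_names(items: List[ParserContent]) -> List[str]:
    names = []
    for item in items:
        # Check the subclass first; only then are image-specific properties available.
        if isinstance(item, ParserContentDrawImage):
            image: ParserContentDrawImage = item
            names.append(image.image_name)
    return names

page = [ParserContent(23), ParserContentDrawImage(6, image_name="Im1"), ParserContent(22)]
print(image_names(page))  # ['Im1']
```

Objects of the base class simply fall through the check, so the loop can run over the whole content list without looking at operator values.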
Please try it. You may enjoy walking through all the content of the pages in your PDF documents and making interesting adjustments.