Extract PDF Data using categorized annotations


Are there any existing tools that extract data from PDF files using the method below? Let's say I have 3 categories:

  1. Image - a rectangle annotation; the tool crops the area of the page covered by the annotation.
  2. Title - another rectangle annotation; the tool extracts only the text inside that rectangle. If the area is an image, its text is extracted through OCR.
  3. Author - same as #2, but mapped to the author field.
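
To make the three categories concrete, here is a minimal, stdlib-only Python sketch of the per-annotation step. It assumes the word boxes have already been read from the page by a PDF library (for example, PyMuPDF's `page.get_text("words")` returns tuples whose first five fields are x0, y0, x1, y1 and the word), and that OCR is a callable you supply, such as a wrapper around Tesseract. All function names here are hypothetical. `pdf_rect_to_pixel_box` covers item 1: it converts a raw annotation `/Rect`, which PDF stores in points (72 per inch) with a bottom-left origin, into a pixel crop box on a page image rendered at a given DPI. If your library already reports top-left coordinates, skip the y-flip.

```python
from typing import Callable, List, Optional, Tuple

# Rectangle as (x0, y0, x1, y1), in PDF points.
Rect = Tuple[float, float, float, float]
# Word box as (x0, y0, x1, y1, text) with a top-left origin (assumed input format).
Word = Tuple[float, float, float, float, str]


def pdf_rect_to_pixel_box(rect: Rect, page_height: float, dpi: int = 150) -> Tuple[int, int, int, int]:
    """Item 1 (Image): map a raw PDF /Rect (bottom-left origin) to a pixel
    crop box (left, top, right, bottom) on a page rendered at `dpi`."""
    x0, y0, x1, y1 = rect
    scale = dpi / 72.0  # PDF user space has 72 points per inch
    left = round(min(x0, x1) * scale)
    right = round(max(x0, x1) * scale)
    # Flip the y-axis: PDF y grows upward, image y grows downward.
    top = round((page_height - max(y0, y1)) * scale)
    bottom = round((page_height - min(y0, y1)) * scale)
    return left, top, right, bottom


def text_in_rect(words: List[Word], rect: Rect) -> str:
    """Items 2 and 3: join the words whose center lies inside `rect`
    (both in top-left-origin coordinates)."""
    x0, y0, x1, y1 = rect
    picked = []
    for wx0, wy0, wx1, wy1, text in words:
        cx, cy = (wx0 + wx1) / 2, (wy0 + wy1) / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            picked.append((wy0, wx0, text))
    picked.sort()  # approximate reading order: top to bottom, then left to right
    return " ".join(text for _, _, text in picked)


def extract_field(words: List[Word], rect: Rect,
                  ocr: Optional[Callable[[Rect], str]] = None) -> str:
    """Use the PDF text layer first; fall back to the supplied OCR callable
    when the rectangle contains no text (e.g. a scanned image)."""
    text = text_in_rect(words, rect)
    if not text and ocr is not None:
        text = ocr(rect)
    return text.strip()
```

Testing the word's center, rather than requiring the whole word box to sit inside the rectangle, tolerates annotations that were drawn slightly too tight.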

The tool would then produce an output file, let's say a CSV.


These categories (fields) should also be grouped into records, so the output has one record per row.
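
The grouping and CSV step can be sketched with Python's standard `csv` module. This assumes each extracted annotation has already been reduced to a `(record_key, category, value)` tuple, where `record_key` is a hypothetical grouping key (for example the page number, if each page holds one record) and the Image value is the path of the cropped image file. Missing fields are left empty, and if a record has two annotations of the same category, the later one wins.

```python
import csv
import io
from typing import Dict, Hashable, Iterable, List, Tuple

FIELDS = ["Image", "Title", "Author"]

# One extracted annotation: (record_key, category, value).
Extracted = Tuple[Hashable, str, str]


def group_into_records(items: Iterable[Extracted]) -> List[Dict[str, str]]:
    """Collect fields that share a record key into one row, keeping the
    order in which records are first seen (dicts preserve insertion order)."""
    records: Dict[Hashable, Dict[str, str]] = {}
    for key, category, value in items:
        if category not in FIELDS:
            raise ValueError(f"unknown category: {category!r}")
        row = records.setdefault(key, {field: "" for field in FIELDS})
        row[category] = value  # a later duplicate overwrites an earlier one
    return list(records.values())


def records_to_csv(records: List[Dict[str, str]]) -> str:
    """Serialize rows as CSV text: one header line, then one record per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

`csv.DictWriter` quotes values that contain commas, so a title like "Graphs, Trees" stays in a single column.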

If there is no existing tool that does this, what tools or programming APIs/SDKs could help me build one?


Posted 2017-02-19T17:11:19.487

Reputation: 153

No answers