Apache PDFBox
Apache PDFBox is an open-source, pure-Java library that can be used to create, render, print, split, merge, alter, and verify PDF files, and to extract their text and metadata.
Developer(s) | Apache Software Foundation
---|---
Stable release |
Repository | PDFBox Repository (Mirror)
Written in | Java
Operating system | Cross-platform
Type | Portable Document Format (PDF)
License | Apache License 2.0
Website | pdfbox
Open Hub reports over 11,000 commits (counted from the project's start at Apache) by 18 contributors, representing more than 140,000 lines of code. It describes PDFBox as a well-established, mature codebase maintained by an average-sized development team with increasing year-over-year commits. Under the COCOMO model, the codebase represents an estimated 46 person-years of effort.[1]
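As an illustration of how a COCOMO-style estimate is derived from a line count, the following is a minimal sketch of the Basic COCOMO formula, effort = a × KLOC^b person-months. The coefficients used here (a = 2.4, b = 1.05, the classic "organic" mode values) and the class and method names are assumptions for illustration; the source does not state which parameters or exact line count Open Hub applied.

```java
// Minimal sketch of a Basic COCOMO effort estimate.
// Assumption: organic-mode coefficients a = 2.4, b = 1.05. These are
// illustrative and are not confirmed as the parameters behind the cited figure.
class CocomoSketch {
    static final double A = 2.4;            // effort multiplier (assumed)
    static final double B = 1.05;           // size exponent (assumed)
    static final double MONTHS_PER_YEAR = 12.0;

    // Effort in person-months for a codebase of `kloc` thousand lines of code.
    static double personMonths(double kloc) {
        return A * Math.pow(kloc, B);
    }

    // Same estimate converted to person-years.
    static double personYears(double kloc) {
        return personMonths(kloc) / MONTHS_PER_YEAR;
    }

    public static void main(String[] args) {
        double kloc = 140.0; // "more than 140,000 lines of code"
        System.out.printf("%.0f KLOC -> %.1f person-years%n", kloc, personYears(kloc));
    }
}
```

With these assumed coefficients, 140 KLOC works out to roughly 36 person-years. The inputs behind the reported 46 person-years are not given, so the two numbers need not match.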
Structure
Apache PDFBox has these components:
- PDFBox: the main part
- FontBox: handles font information
- XmpBox: handles XMP metadata
- Preflight (optional): checks PDF files for PDF/A-1b conformity.
History
PDFBox was started in 2002 on SourceForge by Ben Litchfield, who wanted to be able to extract text from PDF files for Lucene.[2] It became an Apache Incubator project in 2008 and an Apache top-level project in 2009.[3]
Preflight was originally named PaDaF and developed by Atos Worldline; it was donated to the project in 2011.[4]
In February 2015, Apache PDFBox was named an Open Source Partner Organization of the PDF Association.[5]
References
- "The Apache PDFBox Open Source Project on Open Hub". openhub.net. 2017-03-18. Retrieved 2017-03-18.
- "Apache PDFBox and FontBox 1.0.0 released". The H Open. 2010-02-16.
- "PDFBox Project Incubation Status".
- "PaDaF Preflight Codebase Intellectual Property (IP) Clearance Status".
- "Apache™ PDFBox™ named an Open Source Partner Organization of the PDF Association". 2015-02-03.