BFO adds text extraction to PDF Library

BFO (Big Faceless Organization), a global supplier of Java reporting solutions, strengthens the acclaimed Big Faceless PDF Library with the addition of text and image extraction.
The 2.6.2 release adds the ability to extract text and bitmap images from PDF documents, and to index PDFs using the Apache Lucene search engine. The library extracts and indexes Unicode text from form fields, annotations and document metadata as well as the document body, at roughly 50 pages a second for large documents.
The speed and accuracy of text extraction, coupled with the existing features of the PDF Library, make it a wise choice for developers involved in data mining, content management systems and form processing environments, as well as in settings that require searching or extracting text from large numbers of PDF files.
Text and image extraction requires the Big Faceless PDF Library Extended Edition plus Viewer license, which can be downloaded from BFO’s website.

28.10.2005, BFO

