The OCR system within the Extended DMS powers the following functionality:
- Automatically save incoming paper correspondence to a case by receiving the pages from the scanning machine
- Make all scanned documents searchable by the keywords they contain
Default OCR Engine
OCR is enabled by default and uses the Tesseract OCR engine (maintained by Google).
Enable/Disable OCR
You can disable the OCR functionality as follows:
curl -H 'Content-Type:application/json+nxrequest' -X POST -d '{"params":{"enabled":false},"context":{}}' -u Administrator:Administrator http://yournuxeoserver.com:8080/nuxeo/site/automation/SetOcrEnabledOperation
You can re-enable the OCR functionality by calling this REST command:
curl -H 'Content-Type:application/json+nxrequest' -X POST -d '{"params":{"enabled":true},"context":{}}' -u Administrator:Administrator http://yournuxeoserver.com:8080/nuxeo/site/automation/SetOcrEnabledOperation
You can check whether OCR is currently enabled by calling this REST command:
curl -H 'Content-Type:application/json+nxrequest' -X POST -d '{"params":{},"context":{}}' -u Administrator:Administrator http://yournuxeoserver.com:8080/nuxeo/site/automation/IsOcrEnabledOperation
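If you run these calls often, they can be wrapped in a few shell functions. The following is a minimal sketch, not part of the product: the function names and the NUXEO_URL and NUXEO_AUTH variables are hypothetical, and the endpoints, header, and request bodies are copied from the commands above. The format of the server's response is not documented here, so the sketch simply prints whatever curl returns.

```shell
#!/bin/sh
# Hypothetical helpers around the automation calls shown above.
# NUXEO_URL and NUXEO_AUTH are illustrative names; override them for your server.
NUXEO_URL="${NUXEO_URL:-http://yournuxeoserver.com:8080/nuxeo}"
NUXEO_AUTH="${NUXEO_AUTH:-Administrator:Administrator}"

# Build the JSON request body; $1 is the "params" object (defaults to {}).
ocr_body() {
    params="${1:-}"
    [ -n "$params" ] || params='{}'
    printf '{"params":%s,"context":{}}' "$params"
}

# POST to an automation operation; $1 is the operation name, $2 the params object.
ocr_call() {
    curl -s -H 'Content-Type:application/json+nxrequest' -X POST \
         -d "$(ocr_body "${2:-}")" -u "$NUXEO_AUTH" \
         "$NUXEO_URL/site/automation/$1"
}

ocr_enable()  { ocr_call SetOcrEnabledOperation '{"enabled":true}'; }
ocr_disable() { ocr_call SetOcrEnabledOperation '{"enabled":false}'; }
ocr_status()  { ocr_call IsOcrEnabledOperation; }
```

After sourcing the file, `ocr_disable`, `ocr_enable`, and `ocr_status` send the same requests as the three curl commands above. Keeping the credentials in a variable also avoids repeating them in every command.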
Commercial OCR Applications
It is also possible to use any OCR engine that can be called from the command line or through an SDK. Commercial OCR vendors charge licence fees, but their engines typically produce the most accurate OCR results. Please contact Patrix if you would like to set up a commercial OCR system as part of your DMS installation.
For an OCR tool to be used, it must be able to convert PDFs or images to text from the command line.
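To illustrate what "convert PDF or images to text from the command line" means in practice, here is a rough sketch using the default Tesseract engine. It assumes the `tesseract` CLI is on the PATH, plus `pdftoppm` (from Poppler) to rasterise PDFs, since Tesseract does not read PDF input directly. The function name `ocr_to_text` is hypothetical, and this is not necessarily how the DMS itself invokes the engine.

```shell
#!/bin/sh
# Sketch: print the recognised text of an image or PDF to stdout.
# Assumes `tesseract` and, for PDFs, `pdftoppm` (Poppler) are installed.
ocr_to_text() {
    input="$1"
    case "$input" in
        *.pdf|*.PDF)
            # Rasterise each page to PNG at 300 dpi, then OCR the pages in order.
            tmpdir=$(mktemp -d) || return 1
            pdftoppm -r 300 -png "$input" "$tmpdir/page" || { rm -rf "$tmpdir"; return 1; }
            for page in "$tmpdir"/page*.png; do
                [ -e "$page" ] || continue
                tesseract "$page" stdout
            done
            rm -rf "$tmpdir"
            ;;
        *)
            # Images go straight to Tesseract; "stdout" prints the text
            # instead of writing an output file.
            tesseract "$input" stdout
            ;;
    esac
}
```

For example, `ocr_to_text letter.pdf > letter.txt` would write the recognised text to a file. A commercial engine would need to offer an equivalent command-line entry point.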
Here are some options that are available:
- OCRKit (~$75, for OS X)
- Ephesoft OCR (Windows, varied pricing for cloud versus on-premise versus community edition)
- ABBYY OCR for Linux (version 9 CLI, competitive pricing)
- ABBYY SDK/Runtime (all platforms, USD$5,000+)
- Nuance (all platforms, USD$5,000+)
- LEADTOOLS (all platforms, USD$5,000+)
See a comparison of OCR offerings at Wikipedia.