The OCR system within the Extended DMS powers the following functionality:
The OCR functionality is enabled by default and uses the Tesseract OCR engine (maintained by Google).
You can disable the OCR functionality as follows:
curl -H 'Content-Type:application/json+nxrequest' -X POST \
  -d '{"params":{"enabled":false},"context":{}}' \
  -u Administrator:Administrator \
  http://yournuxeoserver.com:8080/nuxeo/site/automation/SetOcrEnabledOperation
You can re-enable the OCR functionality by calling this REST command:
curl -H 'Content-Type:application/json+nxrequest' -X POST \
  -d '{"params":{"enabled":true},"context":{}}' \
  -u Administrator:Administrator \
  http://yournuxeoserver.com:8080/nuxeo/site/automation/SetOcrEnabledOperation
You can check whether OCR is currently enabled by calling this REST command:
curl -H 'Content-Type:application/json+nxrequest' -X POST \
  -d '{"params":{},"context":{}}' \
  -u Administrator:Administrator \
  http://yournuxeoserver.com:8080/nuxeo/site/automation/IsOcrEnabledOperation
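If you call these operations often, the three commands can be wrapped in a few shell functions. The following is only a sketch: the function names (ocr_payload, ocr_call, ocr_enable, ocr_disable, ocr_status) and the NUXEO_URL and NUXEO_AUTH variables are hypothetical names chosen for illustration and are not part of the product. The server address and credentials are the placeholders used in the commands above and should be replaced with your own.

```shell
#!/bin/sh
# Hypothetical helper functions wrapping the automation calls shown above.
# NUXEO_URL and NUXEO_AUTH are assumed names; adjust them for your installation.
NUXEO_URL="${NUXEO_URL:-http://yournuxeoserver.com:8080/nuxeo}"
NUXEO_AUTH="${NUXEO_AUTH:-Administrator:Administrator}"

# Build the JSON request body. With an argument (true or false) it sets the
# "enabled" parameter; with no argument it sends empty params.
ocr_payload() {
    if [ "$#" -eq 0 ]; then
        printf '{"params":{},"context":{}}'
    else
        case "$1" in
            true|false) printf '{"params":{"enabled":%s},"context":{}}' "$1" ;;
            *) echo "usage: ocr_payload [true|false]" >&2; return 1 ;;
        esac
    fi
}

# POST a payload to the named automation operation.
ocr_call() {
    operation="$1"
    payload="$2"
    curl -H 'Content-Type:application/json+nxrequest' -X POST \
        -d "$payload" -u "$NUXEO_AUTH" \
        "$NUXEO_URL/site/automation/$operation"
}

ocr_enable()  { ocr_call SetOcrEnabledOperation "$(ocr_payload true)"; }
ocr_disable() { ocr_call SetOcrEnabledOperation "$(ocr_payload false)"; }
ocr_status()  { ocr_call IsOcrEnabledOperation "$(ocr_payload)"; }
```

After sourcing the file, running ocr_disable, ocr_enable or ocr_status sends the same request as the corresponding curl command. Keeping the payload construction in its own function makes it easy to check the request body without contacting the server.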
It is also possible to use any other OCR engine that can be invoked from the command line or through an SDK, including commercial engines. Commercial OCR vendors charge fees for their software; however, their engines typically produce the most accurate OCR results. Please contact Patrix if you would like to set up a commercial OCR system as part of your DMS installation.
To be usable, an OCR tool must be able to convert PDFs or images to text from the command line.
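As an illustration of what such a command-line conversion can look like, here is a minimal sketch using the Tesseract command-line tool, with pdftoppm (from the Poppler utilities) to turn PDF pages into images first. It assumes both tools are installed and on the PATH. The function names ocr_input_kind and ocr_to_text are hypothetical, and this is not necessarily how the Extended DMS invokes its OCR engine.

```shell
#!/bin/sh
# Classify an input file by extension: prints "pdf" or "image",
# or returns non-zero for anything else.
ocr_input_kind() {
    case "$1" in
        *.pdf|*.PDF) echo pdf ;;
        *.png|*.PNG|*.jpg|*.JPG|*.jpeg|*.JPEG|*.tif|*.TIF|*.tiff|*.TIFF) echo image ;;
        *) return 1 ;;
    esac
}

# Write the recognised text of a PDF or image to standard output.
ocr_to_text() {
    input="$1"
    kind=$(ocr_input_kind "$input") || {
        echo "unsupported file type: $input" >&2
        return 1
    }
    if [ "$kind" = image ]; then
        # "stdout" as the output base makes tesseract print the text.
        tesseract "$input" stdout
    else
        tmpdir=$(mktemp -d) || return 1
        # Rasterise each page at 300 dpi, then OCR the pages in order.
        pdftoppm -r 300 -png "$input" "$tmpdir/page" &&
        for page in "$tmpdir"/page-*.png; do
            tesseract "$page" stdout
        done
        status=$?
        rm -rf "$tmpdir"
        return "$status"
    fi
}
```

For example, ocr_to_text scan.pdf > scan.txt would write the recognised text of scan.pdf to scan.txt. A commercial engine could be substituted by replacing the tesseract calls with that vendor's own command.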
Here are some options that are available:
See a comparison of OCR offerings at Wikipedia.