Auto-save incoming paper correspondence to a case by receiving the pages from the scanning machine
Enable all scanned documents to be searched using keywords that the documents contain

Default OCR Engine

...

The OCR functionality is enabled by default, using the Tesseract OCR engine (maintained by Google).

...

Tesseract 3 is included in the Nuxeo container and is used for OCR by default. This is recommended only for small document volumes and a reasonably low throughput of image-based PDF documents.

Alternatives to default options

Dockerized Tesseract appliance: Configuring external dockerized tesseract appliance for DMS
OCRKit appliance (e.g. ABBYY OCR for Linux): Configuring OCR appliance for DMS using OCRKit
Unspecified appliance (must be controllable through SSH and must mount an NFS share from the DMS appliance): Configuring OCR appliance for DMS
Google Cloud Vision online service, which performs the OCR: Configuring Google Cloud Vision OCR appliance for DMS

...

You can disable the OCR functionality by calling this REST command:

curl -H 'Content-Type:application/json+nxrequest' -X POST -d '{"params":{"enabled":false},"context":{}}' -u Administrator:Administrator http://yournuxeoserver.com:8080/nuxeo/site/automation/SetOcrEnabledOperation

You can re-enable the OCR functionality by calling this REST command:

curl -H 'Content-Type:application/json+nxrequest' -X POST -d '{"params":{"enabled":true},"context":{}}' -u Administrator:Administrator http://yournuxeoserver.com:8080/nuxeo/site/automation/SetOcrEnabledOperation

You can check whether OCR is currently enabled by calling this REST command:

curl -H 'Content-Type:application/json+nxrequest' -X POST -d '{"params":{},"context":{}}' -u Administrator:Administrator http://yournuxeoserver.com:8080/nuxeo/site/automation/IsOcrEnabledOperation
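
If you prefer to script these calls rather than type them by hand, the same three requests can be sketched in Python. This is a minimal sketch, not part of the DMS product: the helper names (build_request, set_ocr_enabled, is_ocr_enabled_request, send) are hypothetical, and the server URL and Administrator credentials are the placeholders from the curl commands above. The format of the server's response is not described on this page, so send simply returns the raw response text.

```python
import base64
import json
import urllib.request

# Placeholder values copied from the curl examples; replace with your own.
BASE_URL = "http://yournuxeoserver.com:8080/nuxeo/site/automation"


def build_request(operation, params, user="Administrator", password="Administrator"):
    """Build (but do not send) the POST request for one automation operation."""
    body = json.dumps({"params": params, "context": {}}).encode("utf-8")
    # Equivalent of curl's -u user:password (HTTP basic authentication).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/{operation}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json+nxrequest",
            "Authorization": f"Basic {token}",
        },
    )


def set_ocr_enabled(enabled):
    """Request that enables (True) or disables (False) OCR."""
    return build_request("SetOcrEnabledOperation", {"enabled": bool(enabled)})


def is_ocr_enabled_request():
    """Request that asks whether OCR is currently enabled."""
    return build_request("IsOcrEnabledOperation", {})


def send(request, timeout=30):
    """Send a prepared request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, send(set_ocr_enabled(False)) would disable OCR, and send(is_ocr_enabled_request()) would return whatever the server reports for the current state. Building the request separately from sending it makes it easy to inspect the URL and body before anything reaches the server.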

...

Commercial OCR Applications

It is also possible to use any OCR engine that can be called from the command line (e.g. via SSH). The OCR software vendors levy fees; however, commercial engines typically produce the most accurate OCR results. Please contact Patrix if you would like to arrange to set up an alternative commercial OCR system as part of your DMS installation.

For the OCR tools to be used, they must be able to convert PDF or images to text from the command line.
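
To illustrate what "callable from the command line" means in practice, here is a minimal Python sketch of a wrapper that fills an input and output path into a command template and runs it. It is not how DMS invokes its OCR engine; the function names and the {input}/{output} template convention are assumptions made for this example only.

```python
import shlex
import subprocess
import sys


def build_ocr_command(template, input_path, output_path):
    """Split the template into arguments, then fill in the two paths.

    Splitting first means a path containing spaces stays a single argument.
    """
    return [
        part.format(input=input_path, output=output_path)
        for part in shlex.split(template)
    ]


def run_ocr(template, input_path, output_path, timeout=300):
    """Run the OCR command and return whatever it printed to standard output.

    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    command = build_ocr_command(template, input_path, output_path)
    result = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout
```

With Tesseract installed, the template "tesseract {input} {output}" would write the recognised text to the output base name with a .txt extension. Any other engine qualifies in the same way, provided it accepts an image or PDF path and produces text without needing a graphical interface. Prefixing the template with an ssh command (to a host of your own choosing) would cover an engine that runs on a separate appliance.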

Here are some options that are available:

OCRKit (~$75, for OS X)
Ephesoft OCR (Windows; pricing varies between cloud, on-premise and community editions). Integration as a stand-alone OCR appliance is possible: see here
ABBYY OCR for Linux (version 11 CLI, competitive pricing)
ABBYY SDK/Runtime (all platforms, USD$5,000+)
Nuance (all platforms, USD$5,000+)
LEADTOOLS (all platforms, USD$5,000+; pricing varies according to the volume OCRed)

See a comparison of OCR offerings at Wikipedia.

...
