This page describes how to set up an independent OCR appliance for use with the Extended DMS environment. DMS accesses the OCR appliance and issues commands to perform OCR. The advantage is that OCR processing does not consume the DMS appliance's resources (CPU, RAM).

OCRKit Professional is a low-cost, multi-language OCR utility with good output quality. Thanks to its multi-core architecture, it also performs well. While OCRKit is available for MacOS and Windows, this guide only covers a setup using the MacOS version because the current Windows version does not include CLI access. OCRKit Professional is available at http://www.ocrkit.com.

Components and setup required:

  1. Recent version of MacOS X running in VM or on bare metal and recent version of OCRKit Professional installed in /Applications/ ("OCRKit appliance")
  2. OCRKit appliance must be set up to allow remote administration over ssh from the DMS appliance's console using public key authentication (Sharing setup: "Remote Login") for a user with administrative rights (<<adminuser>>)
  3. DMS appliance needs /<storagepath>/nuxeo/data/ and /<storagepath>/nuxeo/tmp/ NFS-exports
  4. OCRKit appliance must mount these NFS shares to /var/lib/nuxeo/data/ and /opt/nuxeo/server/tmp/ respectively and make sure these mounts are auto-mounted (kept alive)
  5. In DMS appliance, place the below script "ocrKit" in /<storagepath>/nuxeo/scripts/ and make executable (chmod +x)
  6. Create ssh key in DMS appliance using "ssh-keygen" command, copy public key (id_rsa.pub) and paste into /Users/<<adminuser>>/.ssh/authorized_keys file of OCR appliance
  7. Log in from the DMS appliance to the OCR appliance using ssh. After successful login, move the DMS appliance's key files "id_rsa", "id_rsa.pub" and "known_hosts" from ~/.ssh/ to /<storagepath>/nuxeo/ssh/
  8. Set key "ocr.engine.name" in PAT_DMS_SETTINGS table of Patricia db to "ocrkitmac"
  9. Add below change commands to the "commands.conf" file of auto-deploy client specific repository (also found in ~/deploy/config/)
  10. Start re-deploy of DMS using ~/deploy/deploy_script/deploy.sh command as outlined here: Auto-deploy script
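
Steps 3 and 4 could be sketched roughly as follows. This is an illustration under assumptions, not a tested procedure: the host names `dms.example.internal` and `ocr.example.internal`, the export options, and the `resvport` mount option (often needed when a macOS client mounts from a Linux NFS server) are not taken from the setup described here. The helper functions only print the lines for review; nothing is changed on either machine.

```shell
#!/bin/bash
# Hypothetical helpers: print (do not apply) the NFS settings for steps 3 and 4.

# Print one /etc/exports line for the DMS appliance (Linux NFS server assumed).
# $1 = exported directory, $2 = client host allowed to mount it
nfs_export_line() {
  printf '%s %s(rw,sync,no_subtree_check)\n' "$1" "$2"
}

# Print the mount command to run on the OCRKit appliance (macOS).
# $1 = NFS server, $2 = exported directory, $3 = local mount point
nfs_mount_command() {
  printf 'sudo mount -t nfs -o resvport,rw %s:%s %s\n' "$1" "$2" "$3"
}

STORAGE_PATH="${STORAGE_PATH:-/storagepath}"    # stands in for /<storagepath>
DMS_HOST="${DMS_HOST:-dms.example.internal}"    # assumed host name
OCR_HOST="${OCR_HOST:-ocr.example.internal}"    # assumed host name

echo "# /etc/exports on the DMS appliance (apply with: exportfs -ra):"
nfs_export_line "$STORAGE_PATH/nuxeo/data" "$OCR_HOST"
nfs_export_line "$STORAGE_PATH/nuxeo/tmp" "$OCR_HOST"

echo "# on the OCRKit appliance:"
nfs_mount_command "$DMS_HOST" "$STORAGE_PATH/nuxeo/data" /var/lib/nuxeo/data
nfs_mount_command "$DMS_HOST" "$STORAGE_PATH/nuxeo/tmp" /opt/nuxeo/server/tmp
```

A one-off `mount` does not survive a reboot. Keeping the shares mounted, as step 4 requires, needs an automount or fstab entry on the macOS side, which is not shown here.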

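Steps 6 and 7 might look like the sketch below when run on the DMS appliance. The function names, the 4096-bit key size, and the empty passphrase (assumed here because DMS has to log in without user interaction) are illustrative choices rather than part of the documented setup. The pipe into `authorized_keys` replaces the manual copy and paste of step 6 and asks for the admin user's password once.

```shell
#!/bin/bash
# Sketch of steps 6 and 7, run on the DMS appliance. Function names are illustrative.

# Step 6a: create an RSA key pair without passphrase unless one already exists.
# $1 = ssh directory (normally ~/.ssh)
generate_key() {
  mkdir -p "$1" && chmod 700 "$1" || return 1
  [ -f "$1/id_rsa" ] || ssh-keygen -t rsa -b 4096 -N "" -f "$1/id_rsa"
}

# Step 6b: append the public key to authorized_keys on the OCR appliance.
# $1 = ssh directory, $2 = user@host of the OCR appliance
install_key() {
  ssh "$2" 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys' < "$1/id_rsa.pub"
}

# Step 7a: log in once with the new key so the host key is recorded in known_hosts.
first_login() {
  ssh -i "$1/id_rsa" -o UserKnownHostsFile="$1/known_hosts" "$2" true
}

# Step 7b: move key files and known_hosts to the shared storage path.
# $1 = ssh directory, $2 = storage path (stands in for /<storagepath>)
move_keys() {
  mkdir -p "$2/nuxeo/ssh" || return 1
  for f in id_rsa id_rsa.pub known_hosts; do
    mv "$1/$f" "$2/nuxeo/ssh/" || return 1
  done
}

# Run all steps only when called as: <script> <user@host> <storagepath>
if [ "$#" -eq 2 ]; then
  generate_key "$HOME/.ssh" && install_key "$HOME/.ssh" "$1" &&
    first_login "$HOME/.ssh" "$1" && move_keys "$HOME/.ssh" "$2"
fi
```
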
Scripts:

Script "ocrKit" (step 5):

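#!/bin/bash
# Usage: ocrKit <output-file> <input-file>; runs OCRKit Pro on the OCR appliance over ssh.
# The single quotes keep paths containing spaces intact in the remote shell.
ssh <<adminuser>>@<<OCRKit.appliance.name>> "/Applications/OCRKit\ Pro.app/Contents/MacOS/OCRKit\ Pro --format text --output '$1' '$2'"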
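
A more defensive variant of the wrapper might build the remote command line in a separate function, so that paths containing spaces or single quotes survive the second round of shell parsing on the OCR appliance. This is only a sketch: `OCR_USER` and `OCR_HOST` are hypothetical environment variables standing in for <<adminuser>> and <<OCRKit.appliance.name>>, and the OCRKit options are copied unchanged from the script above.

```shell
#!/bin/bash
# Sketch of a more defensive ocrKit wrapper (helper names are illustrative).

# Path of the OCRKit binary, with spaces already escaped for the remote shell.
OCRKIT_BIN='/Applications/OCRKit\ Pro.app/Contents/MacOS/OCRKit\ Pro'

# Wrap a string in single quotes for the remote shell, escaping embedded quotes.
quote_for_remote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print the command line that will run on the OCR appliance.
# $1 = output text file, $2 = input document
build_ocr_command() {
  printf '%s --format text --output %s %s\n' "$OCRKIT_BIN" \
    "$(quote_for_remote "$1")" "$(quote_for_remote "$2")"
}

# Run OCR remotely only when both paths are given; fail if the host is not configured.
if [ "$#" -eq 2 ]; then
  ssh "${OCR_USER:?}@${OCR_HOST:?}" "$(build_ocr_command "$1" "$2")"
fi
```

Calling `build_ocr_command` on its own prints the remote command without connecting, which makes it easy to inspect the quoting before the wrapper is deployed.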

Additions to "commands.conf" (step 9):

  add_volume "/<storagepath>/nuxeo/ssh" "/home/nuxeo/.ssh"
  add_volume "/<storagepath>/nuxeo/scripts/ocrKit" "/usr/bin/ocrKit"
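
After the re-deploy, a quick check along the following lines could confirm that both volumes arrived where DMS expects them. It is a sketch that assumes `add_volume` maps the first path on the appliance to the second path inside the DMS container, and that the check is run inside that container; the function names and the `--check` switch are made up for illustration.

```shell
#!/bin/bash
# Hypothetical post-deploy check, meant to run inside the DMS container.

# Succeeds if the key files exist and the private key is accessible to its owner only.
# $1 = ssh directory (assumed to be /home/nuxeo/.ssh inside the container)
check_ssh_dir() {
  for f in id_rsa id_rsa.pub known_hosts; do
    [ -f "$1/$f" ] || { echo "missing: $1/$f" >&2; return 1; }
  done
  # Characters 5-10 of the ls mode string are the group and other permission bits.
  [ "$(ls -ld "$1/id_rsa" | cut -c5-10)" = "------" ] ||
    { echo "private key is accessible to group or others" >&2; return 1; }
}

# Succeeds if the wrapper script is present and executable.
# $1 = path of the wrapper (assumed to be /usr/bin/ocrKit inside the container)
check_wrapper() {
  [ -x "$1" ] || { echo "not executable: $1" >&2; return 1; }
}

# Run both checks only when called with --check.
if [ "${1:-}" = "--check" ]; then
  check_ssh_dir /home/nuxeo/.ssh && check_wrapper /usr/bin/ocrKit && echo "ok"
fi
```

The permission test is included because ssh refuses to use a private key file that other users can read.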