This page describes how to set up an independent OCR appliance for use with the Extended DMS environment. The DMS accesses the OCR appliance over the network and issues commands so that OCR is performed there. The advantage is that the DMS appliance's resources (CPU, RAM) are not consumed by OCR processing.

OCRKit Professional is a low-cost, multi-language OCR utility with good output quality. Thanks to its multi-core architecture, it also performs well. While OCRKit is available for MacOS and Windows, this guide only covers a setup using the MacOS version, because the current Windows version does not include CLI access. Get OCRKit Professional from http://www.ocrkit.com.

Components and setup required:

  1. Recent version of MacOS X running in VM or on bare metal and recent version of OCRKit Professional installed in /Applications/ ("OCRKit appliance")
  2. OCRKit appliance must be set up to allow remote login using ssh from the DMS appliance's console with public key authentication (Sharing setup: "Remote Login") for a user with administrative rights (<<adminuser>>)
  3. DMS appliance needs /<storagepath>/nuxeo/data/ and /<storagepath>/nuxeo/tmp/ NFS-exports
  4. OCRKit appliance must mount these NFS shares at /var/lib/nuxeo/data/ and /opt/nuxeo/server/tmp/ respectively, and the mounts must be re-established automatically: either keep them alive with the AppleScript described below, or follow https://coderwall.com/p/fuoa-g/automounting-nfs-share-in-os-x-into-volumes (which appears to work in High Sierra and newer versions of MacOS)
  5. In DMS appliance, place the below script "ocrKit" in /<storagepath>/nuxeo/scripts/ and make it executable (chmod +x)
  6. Create an ssh key pair in DMS appliance using the "ssh-keygen" command, then copy the public key (id_rsa.pub) and paste it into the /Users/<<adminuser>>/.ssh/authorized_keys file of the OCR appliance. Alternatively, you can use ssh-copy-id as per the steps outlined here: https://help.dreamhost.com/hc/en-us/articles/216499537-How-to-configure-passwordless-login-in-Mac-OS-X-and-Linux
  7. Log in from DMS appliance to OCR appliance using ssh. After a successful login, copy the DMS appliance's key files "id_rsa", "id_rsa.pub" and "known_hosts" from ~/.ssh/ to /<storagepath>/nuxeo/ssh/. Make sure this folder has ownership root:root. Alternatively, you can "docker exec" into the nuxeo container, create the ssh keys inside the container and transfer the public key to the OCR appliance using ssh-copy-id as per the above link.
  8. Set key "ocr.engine.name" in PAT_DMS_SETTINGS table of Patricia db to "ocrkitmac"
  9. Add the commands listed below to the "commands.conf" file of the auto-deploy client-specific repository (also found in ~/deploy/config/)
  10. Start re-deploy of DMS using ~/deploy/deploy_script/deploy.sh command as outlined here: Auto-deploy script
  11. In your DMS VM, you may need to open the necessary firewall ports for NFS services to run. See below.
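
Steps 6 and 7 above can be sketched as a small shell helper, run on the DMS appliance. This is an illustration only: ADMIN_USER, OCR_HOST and STORAGE_PATH are placeholder values (assumptions, not names from the deployment), the empty passphrase is assumed so that the DMS can log in unattended, and by default the helper only prints the commands it would run.

```shell
#!/bin/bash
# Sketch of steps 6 and 7: create a key pair on the DMS appliance, install the
# public key on the OCR appliance, then stage the key files for the container.
# ADMIN_USER, OCR_HOST and STORAGE_PATH are placeholders (assumptions).
ADMIN_USER="${ADMIN_USER:-adminuser}"
OCR_HOST="${OCR_HOST:-ocrkit.example.com}"
STORAGE_PATH="${STORAGE_PATH:-/storagepath}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_ocr_ssh() {
  target="${ADMIN_USER}@${OCR_HOST}"
  ssh_dir="${STORAGE_PATH}/nuxeo/ssh"
  # Empty passphrase (-N "") is an assumption for unattended logins.
  run ssh-keygen -t rsa -f "$HOME/.ssh/id_rsa" -N ""
  run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$target"
  # Test login; connecting also records the host key in known_hosts.
  run ssh "$target" true
  run mkdir -p "$ssh_dir"
  run cp "$HOME/.ssh/id_rsa" "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/known_hosts" "$ssh_dir/"
  run chown -R root:root "$ssh_dir"
}
```

Calling setup_ocr_ssh prints the six commands for review; calling it with DRY_RUN=0 (as root, because of the chown) executes them.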

Scripts:

  • Make sure you replace <<adminuser>> with the user name of an administrative user in MacOS, and <<OCRKit.appliance.name>> with the FQDN or IP address of the OCRKit appliance:

 

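#!/bin/bash
# Run OCRKit Pro on the OCR appliance over ssh at reduced priority.
# $1 is passed to --output, $2 is the input file; both are single-quoted for the remote shell.
ssh <<adminuser>>@<<OCRKit.appliance.name>> "nice -n 10 /Applications/OCRKit\ Pro.app/Contents/MacOS/OCRKit\ Pro --format text --output '$1' '$2'"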
  • Make sure that, in the "commands.conf" file of the auto-deploy client-specific repository, the following commands are added to the nuxeo container definition (under section "elif [ ${1} = "NUXEO" ] then") so that they become part of the container run command:

 

  add_volume "/<storagepath>/nuxeo/ssh" "/home/nuxeo/.ssh"
  add_volume "/<storagepath>/nuxeo/scripts/ocrKit" "/usr/bin/ocrKit"
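
The "ocrKit" wrapper script passes its two path arguments through two rounds of shell parsing, once locally and once on the OCR appliance, so paths containing spaces or quote characters need escaping to arrive intact. The following is a minimal sketch of one way to build the remote command line; shell_quote and build_ocr_command are hypothetical helper names, not part of OCRKit or the DMS.

```shell
#!/bin/sh
# Path of the OCRKit Pro binary on the OCR appliance (as used in the wrapper script).
OCRKIT_BIN='/Applications/OCRKit Pro.app/Contents/MacOS/OCRKit Pro'

# Wrap a string in single quotes, escaping embedded single quotes, so that the
# remote shell receives it as exactly one word.
shell_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Build the command line to run remotely: $1 = output path, $2 = input path.
build_ocr_command() {
  printf 'nice -n 10 %s --format text --output %s %s\n' \
    "$(shell_quote "$OCRKIT_BIN")" "$(shell_quote "$1")" "$(shell_quote "$2")"
}
```

Inside the wrapper, the ssh call would then take the form ssh <<adminuser>>@<<OCRKit.appliance.name>> "$(build_ocr_command "$1" "$2")".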

Persistent NFS mount:

To simplify setting up the persistent NFS mounts on the Mac (and to work around a bug that had been in the NFS stack for a while), the following scripts can be used. Edit them in Script Editor to fill in the missing parameters, then define them as startup items on your Mac; each script checks whether its NFS volume is mounted and, if not, mounts it as soon as it is available.

The scripts below are provided without prejudice, liability, warranty or support. If you do not agree, please do not use them.

Before using the scripts, make sure the /var/lib/nuxeo/data/ and /opt/nuxeo/server/tmp/ paths have been created on the OCRKit appliance.

Note: You need to run both versions of the script, one for the tmp mount point and one for the data mount point.

Remount scripts.zip
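
For readers who prefer the command line, the check-and-mount logic described above can be approximated in shell. This is a sketch only and not the content of the attached scripts: DMS_HOST and STORAGE_PATH are placeholder values (assumptions), and the resvport mount option is assumed because Linux NFS servers commonly require requests from a reserved port.

```shell
#!/bin/sh
# Sketch only; not the AppleScript shipped in "Remount scripts.zip".
# DMS_HOST and STORAGE_PATH are placeholders (assumptions); adjust before use.
DMS_HOST="${DMS_HOST:-dms.example.com}"
STORAGE_PATH="${STORAGE_PATH:-/storagepath}"

# Succeeds if the given NFS export (host:/path) appears in the mount table.
is_mounted() {
  mount | grep -qF "$1 on "
}

# Mount export $1 on mount point $2 unless it is already mounted.
ensure_mounted() {
  if is_mounted "$1"; then
    return 0
  fi
  # Give up for now if the DMS appliance is unreachable or not exporting yet.
  showmount -e "$DMS_HOST" >/dev/null 2>&1 || return 1
  # resvport: connect from a reserved port (assumed to be required by the server).
  sudo mount -t nfs -o resvport "$1" "$2"
}

# Check both shares used by the OCR setup.
remount_all() {
  ensure_mounted "$DMS_HOST:$STORAGE_PATH/nuxeo/data" /var/lib/nuxeo/data
  ensure_mounted "$DMS_HOST:$STORAGE_PATH/nuxeo/tmp" /opt/nuxeo/server/tmp
}
```

Running remount_all at startup and again at intervals gives roughly the same effect as the two startup scripts: a share that is already mounted is left alone, and a missing one is mounted once the DMS appliance answers.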

DMS VM firewall settings:

If your DMS has strict firewall settings (the NFS mount points are still not mounted after several minutes and "showmount -e <your_dms_ip>" returns an error), you may need to open the firewall for all NFS-related services. To do this, run the following in your DMS VM:

firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload 
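
To confirm that the three services are open after the reload, a small check along the following lines can be run in the DMS VM. It is a sketch; missing_nfs_services is a hypothetical helper name, and it relies only on the --query-service option of firewall-cmd.

```shell
#!/bin/sh
# Print the name of every NFS-related firewalld service that is still closed.
# Prints nothing when nfs, mountd and rpc-bind are all allowed.
missing_nfs_services() {
  for svc in nfs mountd rpc-bind; do
    firewall-cmd --query-service="$svc" >/dev/null 2>&1 || printf '%s\n' "$svc"
  done
}
```

If missing_nfs_services prints nothing, "showmount -e <your_dms_ip>" run from the OCRKit appliance should be able to reach the DMS and list its exports.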