This page describes how to set up an independent OCR appliance for use with the Extended DMS environment. The DMS accesses the OCR appliance and issues the commands that perform OCR. The advantage is that the DMS appliance's resources (CPU, RAM) are not consumed by OCR processing.
The method described here is not specific to a given OCR engine. The prerequisite is that the OCR engine runs in a VM that is accessible by ssh (or on the host system that runs the DMS). If a separate VM is used, file system mounts may be needed between the VMs hosting the DMS and the OCR engine.
Given these advantages, the instructions below describe the setup of an OCR engine in a separate VM.
Components and setup required:
- Provide the OCR engine in a VM (the "OCR appliance").
- Set up the OCR appliance to allow remote administration over ssh from the DMS appliance's console, using public key authentication for a user with administrative rights (<<adminuser>>).
- On the DMS appliance, export /<storagepath>/nuxeo/data/ and /<storagepath>/nuxeo/tmp/ via NFS.
- On the OCR appliance, mount these NFS shares at /var/lib/nuxeo/data/ and /opt/nuxeo/server/tmp/ respectively, and make sure the mounts are mounted automatically (kept alive).
- On the DMS appliance, place a script named "piocr" in /<storagepath>/nuxeo/scripts/ and make it executable (chmod +x). This script contains the commands that drive the OCR process for each file.
- Create an ssh key pair on the DMS appliance with the "ssh-keygen" command, then copy the public key (id_rsa.pub) into the <<adminuserhome>>/.ssh/authorized_keys file on the OCR appliance.
- Log in from the DMS appliance to the OCR appliance using ssh. After a successful login, move the DMS appliance's key files "id_rsa", "id_rsa.pub" and "known_hosts" from ~/.ssh/ to /<storagepath>/nuxeo/ssh/.
- Set the key "ocr.engine.name" in the PAT_DMS_SETTINGS table of the Patricia db to "piocr".
- Add the commands listed under "Deploy script changes" below to the "commands.conf" file of the auto-deploy client specific repository (also found in ~/deploy/config/).
- Start a re-deploy of the DMS with the ~/deploy/deploy_script/deploy.sh command, as outlined here: Auto-deploy script
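The key-handling step in the list above (moving the key files after the first successful login) can be sketched as a small shell helper. This is a minimal sketch and not part of the original setup: the function name `stage_ssh_keys` is hypothetical, and the check that all three files exist before anything is moved is an added safety assumption.

```shell
#!/bin/bash
# Hypothetical helper: move the DMS appliance's ssh files into the
# directory that is later mounted into the nuxeo container.
# Usage: stage_ssh_keys <source_dir> <target_dir>
stage_ssh_keys() {
    local src="$1" dst="$2" f
    mkdir -p "$dst" || return 1
    # Refuse to move anything unless all three files are present.
    for f in id_rsa id_rsa.pub known_hosts; do
        if [ ! -f "$src/$f" ]; then
            echo "missing: $src/$f" >&2
            return 1
        fi
    done
    for f in id_rsa id_rsa.pub known_hosts; do
        mv "$src/$f" "$dst/$f" || return 1
    done
}
```

It would be called as `stage_ssh_keys ~/.ssh /<storagepath>/nuxeo/ssh`, with <storagepath> replaced by the actual storage path. Depending on how the container maps users, file ownership may need adjusting so that the nuxeo user can read the key; this page does not cover that.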
Scripts:
#!/bin/bash
ssh <<adminuser>>@<<ocr.appliance.name>> "nice -n 10 /path/to/ocr-command <<parameters>> $1 $2"
Make sure you replace <<adminuser>> with the user name of an administrative user on the OCR appliance, and <<ocr.appliance.name>> with the proper FQDN or IP address.
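A slightly more defensive variant of the script is sketched below. It is an illustration, not the original piocr: the helper name `build_remote_cmd` and the `OCR_CMD` variable are hypothetical, and the <<parameters>> placeholder is left out for brevity. The sketch checks the argument count and escapes both file paths with bash's `printf %q`, so that paths containing spaces survive the remote shell, assuming the login shell on the OCR appliance is bash-compatible.

```shell
#!/bin/bash
# Hypothetical variable: path of the OCR command on the OCR appliance.
OCR_CMD="${OCR_CMD:-/path/to/ocr-command}"

# Build the command line that is run on the OCR appliance.
# $1 = input file, $2 = output file
build_remote_cmd() {
    if [ "$#" -ne 2 ]; then
        echo "usage: piocr <input> <output>" >&2
        return 2
    fi
    # %q escapes each path so the remote shell sees it as one word.
    printf 'nice -n 10 %s %q %q\n' "$OCR_CMD" "$1" "$2"
}

# In the real script the result would be handed to ssh, for example:
#   cmd=$(build_remote_cmd "$@") || exit
#   ssh <<adminuser>>@<<ocr.appliance.name>> "$cmd"
```

Without the escaping, a file name such as "my scan.pdf" would be split into two arguments on the OCR appliance, because ssh joins the command into a single string that the remote shell parses again.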
Deploy script changes:
Make sure that the following commands are added to the nuxeo container definition in the "commands.conf" file of the auto-deploy client specific repository (under the section "elif [${1} = "NUXEO"] then"), so that they are added to the container run command:
add_volume "/<storagepath>/nuxeo/ssh" "/home/nuxeo/.ssh"
add_volume "/<storagepath>/nuxeo/scripts/piocr" "/usr/local/bin/piocr"
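After the re-deploy, a quick check inside the nuxeo container can confirm that both volumes arrived where piocr expects them. The sketch below is only an illustration: the function name `check_ocr_wiring` is hypothetical, and it tests nothing beyond the presence of the files named on this page.

```shell
#!/bin/bash
# Hypothetical check, meant to be run inside the nuxeo container.
# $1 = ssh directory (default /home/nuxeo/.ssh)
# $2 = path of the piocr script (default /usr/local/bin/piocr)
check_ocr_wiring() {
    local ssh_dir="${1:-/home/nuxeo/.ssh}"
    local script="${2:-/usr/local/bin/piocr}"
    local status=0 f
    for f in id_rsa known_hosts; do
        if [ ! -r "$ssh_dir/$f" ]; then
            echo "not readable: $ssh_dir/$f" >&2
            status=1
        fi
    done
    if [ ! -x "$script" ]; then
        echo "not executable: $script" >&2
        status=1
    fi
    return "$status"
}
```

Calling `check_ocr_wiring` without arguments uses the container paths from the add_volume commands above; a non-zero exit status means at least one file is missing or has the wrong mode.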