This page describes the setup of an independent OCR appliance for use with the Extended DMS environment. DMS accesses the OCR appliance and issues commands so that OCR is performed there. The advantage is that the DMS appliance's resources (CPU, RAM) are not consumed by OCR processing.
The instructions below describe the setup of the appliance on the DMS host machine. This is unlikely to affect DMS performance, because the scripts only make small HTTP calls to the Google API and do no processing themselves.
Create two permanent environment variables, for example by adding lines to /etc/environment (replace $gcloud_storage_bucket_name and $path_to_json with actual values):
GCLOUD_OCR_BUCKET=$gcloud_storage_bucket_name
GOOGLE_APPLICATION_CREDENTIALS=$path_to_json
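Variables added to /etc/environment are normally picked up only at the next login, so it can help to verify them before going further. The following is a minimal sketch, not part of the original setup; the function name check_ocr_env is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original setup): reports problems with
# the two variables the OCR scripts rely on and returns non-zero if any exist.
check_ocr_env() {
    local rc=0
    if [ -z "${GCLOUD_OCR_BUCKET:-}" ]; then
        echo "GCLOUD_OCR_BUCKET is not set"
        rc=1
    fi
    if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS is not set"
        rc=1
    elif [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
        # The variable must point at a readable JSON key file.
        echo "credentials file is not readable: $GOOGLE_APPLICATION_CREDENTIALS"
        rc=1
    fi
    return "$rc"
}
```

Running check_ocr_env in a fresh login shell prints nothing and returns 0 when both variables are usable.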
Download pi-gcloud-ocr.jar (the link will be provided after successful testing) to /opt/pi-gcloud-ocr.jar.
The DMS host needs /<storagepath>/nuxeo/data linked to /var/lib/nuxeo/data, and /<storagepath>/nuxeo/tmp linked to /opt/nuxeo/server/tmp.
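Create the piocr script (shown below) in /<storagepath>/nuxeo/scripts/ and make it executable (chmod +x). This script contains the commands to access the pi-google-ocr script from inside the nuxeo container.
Add the public SSH key to the <<adminuserhome>>/.ssh/authorized_keys file of the DMS host VM.
Copy the SSH key files from ~/.ssh/ to /<storagepath>/nuxeo/ssh/.
Add the volume mounts listed at the end of this page to the deployment configuration in ~/deploy/config/.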
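pi-google-ocr script
#!/usr/bin/env bash
# Usage: pi-google-ocr <output-file> <input-file>
# Upload the input file to the OCR bucket.
gsutil cp "$2" "gs://${GCLOUD_OCR_BUCKET}"
input_filename=$(basename "$2")
output_filename=$(basename "$1")
# Run OCR on the uploaded object; the jar path must match the download location.
json=$(java -jar /opt/pi-gcloud-ocr.jar "gs://${GCLOUD_OCR_BUCKET}/${input_filename}" "gs://${GCLOUD_OCR_BUCKET}/${output_filename}")
# Write the result to the local output file.
echo "$json" > "$1"
# Remove the temporary objects from the bucket.
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${input_filename}"
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${output_filename}output-1-to-1.json"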
piocr script
#!/bin/bash
echo "variables $1 $2"
ssh <<adminuser>>@<<dms.host.name>> "pi-google-ocr $1 $2"
$1 is the path and filename of the output file, $2 is the path and filename of the input file that are handed over when the DMS calls the piocr script. The command in piocr script must be such that the OCR engine reads the input file (pointed to by $2) and writes to the output file (pointed to by $1).
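Because piocr passes $1 and $2 inside a single string to the remote shell, a path containing spaces would be split into several arguments on the DMS host. The sketch below shows one way to build the remote command with each argument escaped, assuming bash on both sides; the function name build_remote_command is hypothetical and not part of the original scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the command string handed to ssh, with each
# argument escaped so the remote shell sees exactly two arguments.
build_remote_command() {
    if [ "$#" -ne 2 ]; then
        echo "usage: build_remote_command <output-file> <input-file>" >&2
        return 2
    fi
    local quoted
    # bash's printf reuses the format for every argument; %q escapes each one
    # so that it can be safely re-read by a shell.
    quoted=$(printf ' %q' "$@")
    printf '%s\n' "pi-google-ocr${quoted}"
}
```

Inside piocr this could be used as ssh <<adminuser>>@<<dms.host.name>> "$(build_remote_command "$1" "$2")" in place of the plain string.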
Make sure you replace <<adminuser>> with the user name of an administrative user on the OCR appliance, and <<dms.host.name>> with the proper FQDN or IP address.
In the deployment configuration, add the following volume mounts after the line "elif [ ${1} = "NUXEO" ] then" so that they are added to the container run command:
add_volume "/<storagepath>/nuxeo/ssh" "/home/nuxeo/.ssh"
add_volume "/<storagepath>/nuxeo/scripts/piocr" "/usr/local/bin/piocr"
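The ssh call in piocr can only run unattended if the key mounted into the container at /home/nuxeo/.ssh is accepted by the DMS host. A minimal sketch of adding a public key to an authorized_keys file without creating duplicates follows; the function name authorize_key is hypothetical, and the key file name in the usage note is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: appends the public key in $1 to the authorized_keys
# file in $2 unless that exact line is already present.
authorize_key() {
    local pubkey_file="$1" auth_file="$2"
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # sshd ignores authorized_keys files with overly permissive modes.
    chmod 600 "$auth_file"
    # -F: fixed string, -x: whole-line match, -f: read the pattern from the key file.
    if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

A call might look like authorize_key /<storagepath>/nuxeo/ssh/id_ed25519.pub <<adminuserhome>>/.ssh/authorized_keys, where id_ed25519.pub stands for whatever key file is actually used.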