...
- Configure the Google Cloud account: enable the Google Vision API and create a Google Cloud Storage bucket for the temporary files created during processing.
- Configure the Google Cloud SDK on the DMS host machine: (i) see https://cloud.google.com/sdk/install and note that if you installed your DMS from the base OVA image provided by Patrix/Practice Insight, your operating system is CentOS. (ii) Further see https://cloud.google.com/sdk/docs/initializing and https://cloud.google.com/sdk/docs/authorizing.
- Prepare service account key in JSON format for Google Cloud here: https://console.cloud.google.com/apis/credentials/serviceaccountkey
- Create two permanent environment variables. For example, add the following two lines to /etc/environment:
    GCLOUD_OCR_BUCKET=<$gcloud_storage_bucket_name>
    GOOGLE_APPLICATION_CREDENTIALS=<$path_to_json>
  Replace <$gcloud_storage_bucket_name> and <$path_to_json> with their respective values.
- Install the Java Development Kit in the DMS VM by running:
    yum install java-1.8.0-openjdk
- Download pi-google-ocr.jar (https://pi-cdn-sg.s3.amazonaws.com/dmswww.pace-ip.com/edms/downloads/components/pi-google-ocr.jar) and copy it to /opt/pi-google-ocr.jar on the DMS VM host machine. This is the path that the "pi-google-ocr" script below expects.
- Copy the "pi-google-ocr" script below to /usr/bin/ on the DMS VM host machine and make it executable (chmod +x). This script contains the commands that drive the OCR process for each file.
- Copy the "piocr" script below to /<storagepath>/nuxeo/scripts/ and make it executable (chmod +x). This script contains the commands to call the pi-google-ocr script from inside the DMS.
- Create an SSH key on the DMS VM host machine using the "ssh-keygen" command, then copy the public key (id_rsa.pub) and paste it into the <<adminuserhome>>/.ssh/authorized_keys file of the DMS VM host machine (i.e. the same box).
- The DMS host needs /<storagepath>/nuxeo/data mounted to /var/lib/nuxeo/data and /<storagepath>/nuxeo/tmp mounted to /opt/nuxeo/server/tmp (see the section "Deploy script changes" below).
- Log in from the DMS VM host machine to itself using ssh. After a successful login, move the DMS VM host machine's key files "id_rsa", "id_rsa.pub" and "known_hosts" from ~/.ssh/ to /<storagepath>/nuxeo/ssh/.
- Set the ownership of the files in the /<storagepath>/nuxeo/ssh/ folder to 1000:1000 (user ID:group ID).
- Set the key "ocr.engine.name" in the PAT_DMS_SETTINGS table of the Patricia database to "piocr".
- Add the change commands below to the "commands.conf" file of the auto-deploy client-specific repository (also found in ~/deploy/config/).
- Start a re-deploy of the DMS using the ~/deploy/deploy_script/deploy.sh command as outlined in "Auto-deploy script".
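The host-side steps in the list above can be verified before starting the re-deploy. The following is a minimal sketch, not part of the DMS tooling: the function name check_ocr_host_setup and its two arguments (the storage path and the jar location) are assumptions chosen for illustration. It only checks that the files and variables named in the list exist; it does not contact Google Cloud.

```bash
# Preflight sketch: report anything from the setup list that looks missing.
# check_ocr_host_setup and its arguments are illustrative, not part of the DMS.
check_ocr_host_setup() (
    storage_path=$1                          # the value used for /<storagepath>
    jar_path=${2:-/opt/pi-google-ocr.jar}    # path called by the pi-google-ocr script
    problems=0

    fail() { echo "MISSING: $1"; problems=$((problems + 1)); }

    # Environment variables from /etc/environment.
    [ -n "${GCLOUD_OCR_BUCKET:-}" ] || fail "GCLOUD_OCR_BUCKET is not set"
    [ -r "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] ||
        fail "GOOGLE_APPLICATION_CREDENTIALS does not point to a readable file"

    # Jar, wrapper script and SSH key files.
    [ -f "$jar_path" ] || fail "$jar_path"
    [ -x "$storage_path/nuxeo/scripts/piocr" ] ||
        fail "$storage_path/nuxeo/scripts/piocr (must exist and be executable)"
    for key_file in id_rsa id_rsa.pub known_hosts; do
        [ -f "$storage_path/nuxeo/ssh/$key_file" ] ||
            fail "$storage_path/nuxeo/ssh/$key_file"
    done

    if [ "$problems" -eq 0 ]; then
        echo "OK: host setup looks complete"
    fi
    # Exit status is 0 only when nothing was reported missing.
    [ "$problems" -eq 0 ]
)
```

Called as `check_ocr_host_setup /your/storage/path`, the function prints one "MISSING:" line per problem and returns a non-zero status if any were found.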
Scripts:
pi-google-ocr (copy to /usr/bin/):

    #!/usr/bin/env bash
    # Usage: pi-google-ocr <output_file> <input_file>
    # Upload the input file to the temporary bucket.
    gsutil cp "$2" "gs://${GCLOUD_OCR_BUCKET}"
    input_filename=$(basename "$2")
    output_filename=$(basename "$1")
    # Run the OCR jar and capture what it prints.
    json=$(java -jar /opt/pi-google-ocr.jar "gs://${GCLOUD_OCR_BUCKET}/${input_filename}" "gs://${GCLOUD_OCR_BUCKET}/${output_filename}")
    echo "$json" > "$1"
    # Remove the temporary objects from the bucket.
    gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${input_filename}"
    gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${output_filename}output-1-to-1.json*"
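The script above that calls gsutil builds its bucket object names from the base names of its two arguments only. The sketch below makes that mapping visible without touching Google Cloud; ocr_bucket_objects is a hypothetical helper name introduced here for illustration, and it assumes GCLOUD_OCR_BUCKET is set as described earlier.

```bash
# Sketch: print the bucket objects that one call would touch.
# ocr_bucket_objects is an illustrative helper, not part of the DMS.
ocr_bucket_objects() (
    output_path=$1    # first argument: local file that receives the JSON
    input_path=$2     # second argument: local file that is uploaded
    bucket=${GCLOUD_OCR_BUCKET:?GCLOUD_OCR_BUCKET must be set}

    # Only the base names are used, so directory parts are dropped.
    echo "input=gs://${bucket}/$(basename "$input_path")"
    echo "output=gs://${bucket}/$(basename "$output_path")"
    echo "cleanup=gs://${bucket}/$(basename "$output_path")output-1-to-1.json*"
)
```

With GCLOUD_OCR_BUCKET=my-bucket, calling `ocr_bucket_objects /tmp/a/result.json /tmp/b/scan.pdf` prints `input=gs://my-bucket/scan.pdf`, `output=gs://my-bucket/result.json` and `cleanup=gs://my-bucket/result.jsonoutput-1-to-1.json*`. Because the directory part is discarded, two files with the same base name processed at the same time would share object names in the bucket.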
piocr (copy to /<storagepath>/nuxeo/scripts/):

    #!/bin/bash
    echo "variables $1 $2"
    # Run pi-google-ocr on the DMS host over ssh, passing both arguments through.
    touch ~/.ssh_config && ssh -F ~/.ssh_config <<adminuser>>@<<dms.host.name>> "pi-google-ocr $1 $2"
Make sure you replace <<adminuser>> with the correct user name of an administrative user of the DMS host VM, and <<dms.host.name>> with the FQDN or IP address of the DMS host VM.
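A forgotten placeholder is easy to miss, so it may help to scan the edited script for leftover tokens. The following is a minimal sketch; find_unreplaced_placeholders is a hypothetical helper name, and the pattern assumes placeholders are written as <<name>> with only letters, dots, underscores and hyphens inside.

```bash
# Sketch: list placeholder tokens of the form <<name>> still present in a file.
# find_unreplaced_placeholders is an illustrative helper, not part of the DMS.
find_unreplaced_placeholders() {
    # grep -o prints each match on its own line; sort -u removes duplicates.
    grep -o '<<[A-Za-z._-]*>>' "$1" | sort -u
}
```

Running `find_unreplaced_placeholders /<storagepath>/nuxeo/scripts/piocr` should print nothing once both values have been filled in.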
Deploy script changes:
Make sure that in the "commands.conf" file of the auto-deploy client-specific repository, the following commands are added to the nuxeo container definition (under the section "elif [${1} = "NUXEO"] then") so that they are added to the container run command:
    add_volume "/<storagepath>/nuxeo/ssh" "/home/nuxeo/.ssh"
    add_volume "/<storagepath>/nuxeo/scripts/piocr" "/usr/local/bin/piocr"
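Before re-deploying, it can be worth confirming that both volume lines actually made it into commands.conf. The sketch below is one way to do that; has_ocr_volumes is a hypothetical helper name, and the match is literal, so it assumes the lines are written with single spaces exactly as shown above and with the real storage path substituted.

```bash
# Sketch: succeed only if a commands.conf file contains both volume lines.
# has_ocr_volumes is an illustrative helper; pass the real storage path as $2.
has_ocr_volumes() (
    conf_file=$1
    storage_path=$2
    # grep -F matches the text literally; -q suppresses output.
    grep -qF "add_volume \"$storage_path/nuxeo/ssh\" \"/home/nuxeo/.ssh\"" "$conf_file" &&
    grep -qF "add_volume \"$storage_path/nuxeo/scripts/piocr\" \"/usr/local/bin/piocr\"" "$conf_file"
)
```

For example, `has_ocr_volumes ~/deploy/config/commands.conf /your/storage/path && echo "volumes configured"` prints the message only when both lines are present.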