...
- Configure Google Cloud account: enable Google Vision API there and create Google Cloud Storage bucket for temp files created while processing
- Configure Google Cloud SDK on DMS host machine: (i) see https://cloud.google.com/sdk/install and note that if you installed your DMS from the base OVA image provided by Patrix/Practice Insight, your operating system is CentOS. (ii) Further see https://cloud.google.com/sdk/docs/initializing and https://cloud.google.com/sdk/docs/authorizing.
- Prepare service account key in JSON format for Google Cloud here: https://console.cloud.google.com/apis/credentials/serviceaccountkey
- Create 2 permanent environment variables. For example, add the following 2 lines to /etc/environment:

  ```bash
  GCLOUD_OCR_BUCKET=<$gcloud_storage_bucket_name>
  GOOGLE_APPLICATION_CREDENTIALS=<$path_to_json>
  ```

  Replace <$gcloud_storage_bucket_name> and <$path_to_json> with their respective values.
- Download pi-gcloud-ocr.jar (Link will be provided after successful testing) and copy it to /opt/pi-gcloud-ocr.jar of the DMS VM host machine.
- Copy the below "pi-google-ocr" script to /usr/bin/ of the DMS VM host machine and make it executable (chmod +x). This script contains the commands to drive the OCR process for each file.
- Copy the below "piocr" script to /<storagepath>/nuxeo/scripts/ and make it executable (chmod +x). This script contains the commands to access the pi-google-ocr script from inside the DMS.
- Create an ssh key in the DMS appliance using the "ssh-keygen" command, copy the public key (id_rsa.pub) and paste it into the <<adminuserhome>>/.ssh/authorized_keys file of the DMS host VM.
- Log in from the DMS appliance to itself using ssh. After successful login, move the DMS appliance's key files "id_rsa", "id_rsa.pub" and "known_hosts" from ~/.ssh/ to /<storagepath>/nuxeo/ssh/
- Set key "ocr.engine.name" in PAT_DMS_SETTINGS table of Patricia db to "piocr"
- Add the change commands below to the "commands.conf" file of the auto-deploy client specific repository (also found in ~/deploy/config/).
- Start a re-deploy of the DMS using the ~/deploy/deploy_script/deploy.sh command as outlined here: Auto-deploy script
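
Before re-deploying, it can help to confirm that the two environment variables from the steps above are visible to the shell and that the key file is readable. The following is a minimal sketch under that assumption; check_ocr_env is a hypothetical helper name and is not part of the DMS or the Google Cloud SDK.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the DMS): checks the two variables
# that the steps above add to /etc/environment.
check_ocr_env() {
    local status=0

    if [ -z "${GCLOUD_OCR_BUCKET:-}" ]; then
        echo "GCLOUD_OCR_BUCKET is not set" >&2
        status=1
    fi

    if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
        status=1
    elif [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then
        # The variable must point at the service account key in JSON format.
        echo "Key file not readable: ${GOOGLE_APPLICATION_CREDENTIALS}" >&2
        status=1
    fi

    return "$status"
}
```

Entries in /etc/environment are normally picked up at login, so the helper is best run in a fresh login shell of the administrative user. It returns 0 only if both variables are set and the key file can be read.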
Scripts:
- pi-google-ocr
```bash
#!/usr/bin/env bash
# $1 = output file, $2 = input file

# Upload the input file to the temporary bucket
gsutil cp "$2" "gs://${GCLOUD_OCR_BUCKET}"
input_filename=$(basename "$2")
output_filename=$(basename "$1")

# Run the OCR JAR (path must match where the JAR was copied) and capture its output
json=$(java -jar /opt/pi-gcloud-ocr.jar "gs://${GCLOUD_OCR_BUCKET}/${input_filename}" "gs://${GCLOUD_OCR_BUCKET}/${output_filename}")
echo "$json" > "$1"

# Remove the temporary objects from the bucket
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${input_filename}"
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${output_filename}output-1-to-1.json"
```
- piocr

```bash
#!/bin/bash
echo "variables $1 $2"
ssh <<adminuser>>@<<dms.host.name>> "pi-google-ocr $1 $2"
```
$1 is the path and filename of the output file and $2 is the path and filename of the input file; both are handed over when the DMS calls the piocr script. The command in the piocr script must be such that the OCR engine reads the input file (pointed to by $2) and writes to the output file (pointed to by $1).
Make sure you replace <<adminuser>> with the correct user name of an administrative user of the DMS host VM, and <<dms.host.name>> with the FQDN or IP address of the DMS host VM.
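
To test the wiring between the DMS and piocr without calling Google Cloud, a stand-in with the same argument order can be useful. The sketch below is an assumption for illustration only: fake_ocr_engine is a hypothetical name, and the text it writes is a placeholder rather than real OCR output.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for an OCR engine command. It follows the piocr
# argument order: $1 = output file, $2 = input file.
fake_ocr_engine() {
    if [ "$#" -ne 2 ]; then
        echo "usage: fake_ocr_engine <output_file> <input_file>" >&2
        return 2
    fi

    local output_file="$1"
    local input_file="$2"

    if [ ! -r "$input_file" ]; then
        echo "input file not readable: $input_file" >&2
        return 1
    fi

    # A real engine would write the recognised text; this placeholder only
    # records the input file name and its size in bytes.
    printf 'stub OCR result for %s (%s bytes)\n' \
        "$(basename "$input_file")" \
        "$(wc -c < "$input_file" | tr -d ' ')" > "$output_file"
}
```

Calling fake_ocr_engine /tmp/out.txt /tmp/in.pdf reads the input and writes one line to the output file, which is enough to confirm that both paths arrive in the expected order.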
Deploy script changes:
Make sure that in the "commands.conf" file of the auto-deploy client specific repository, the following commands are added to the nuxeo container definition (under section "elif [${1} = "NUXEO"] then") so that they are added to the container run command:
```bash
add_volume "/<storage_path>/nuxeo/ssh" "/home/nuxeo/.ssh"
add_volume "/<storage_path>/nuxeo/scripts/piocr" "/usr/local/bin/piocr"
```
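
Since both volumes are mounted from host paths, it may be worth checking those paths before starting the re-deploy. The following is a rough sketch under the assumption that the key files and the piocr script were placed as described in the steps above; check_ocr_mounts is a hypothetical helper, and its argument is the storage path used on your host.

```bash
#!/usr/bin/env bash
# Hypothetical helper: checks the host-side sources of the two volumes
# mounted into the nuxeo container. $1 = storage path on the host.
check_ocr_mounts() {
    local storage_path="$1"
    local status=0
    local f

    # Key files moved from ~/.ssh/ of the DMS appliance
    for f in id_rsa id_rsa.pub known_hosts; do
        if [ ! -f "${storage_path}/nuxeo/ssh/${f}" ]; then
            echo "missing: ${storage_path}/nuxeo/ssh/${f}" >&2
            status=1
        fi
    done

    # The piocr script must exist and be executable (chmod +x)
    if [ ! -x "${storage_path}/nuxeo/scripts/piocr" ]; then
        echo "missing or not executable: ${storage_path}/nuxeo/scripts/piocr" >&2
        status=1
    fi

    return "$status"
}
```

The helper prints each missing item to standard error and returns a non-zero status if anything is absent, so it can be placed in front of the deploy.sh call.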