  1. Configure your Google Cloud account: enable the Cloud Vision API and create a Google Cloud Storage bucket for the temporary files created during processing.
  2. Configure the Google Cloud SDK on the DMS host machine: (i) see https://cloud.google.com/sdk/install and note that if you installed your DMS from the base OVA image provided by Patrix/Practice Insight, your operating system is CentOS. (ii) Then see https://cloud.google.com/sdk/docs/initializing and https://cloud.google.com/sdk/docs/authorizing.
  3. Prepare a service account key in JSON format for Google Cloud here: https://console.cloud.google.com/apis/credentials/serviceaccountkey
  4. Create two permanent environment variables. For example, add the following two lines to /etc/environment:

    Code Block
    languagebash
    GCLOUD_OCR_BUCKET=<$gcloud_storage_bucket_name>
    GOOGLE_APPLICATION_CREDENTIALS=<$path_to_json>

    Replace <$gcloud_storage_bucket_name> and <$path_to_json> with their respective values.

  5. Install the OpenJDK 8 Java runtime on the DMS VM (this is sufficient to run the JAR) by running:

    Code Block
    languagebash
    yum install java-1.8.0-openjdk


  6. Download pi-google-ocr.jar (https://pi-cdn-sg.s3.amazonaws.com/dmswww.pace-ip.com/edms/downloads/components/pi-google-ocr.jar) and copy it to /opt/pi-google-ocr.jar on the DMS VM host machine. This is the path that the "pi-google-ocr" script below expects.
  7. Copy the "pi-google-ocr" script below to /usr/bin/ on the DMS VM host machine and make it executable (chmod +x). This script contains the commands that drive the OCR process for each file.
  8. Copy the "piocr" script below to /<storagepath>/nuxeo/scripts/ and make it executable (chmod +x). This script contains the commands that call the pi-google-ocr script from inside the DMS.
  9. Create an SSH key on the DMS VM host machine using the "ssh-keygen" command, then copy the public key (id_rsa.pub) and paste it into the <<adminuserhome>>/.ssh/authorized_keys file on the DMS VM host machine (i.e. the same box).
  10. The DMS host needs /<storagepath>/nuxeo/data mounted to /var/lib/nuxeo/data and /<storagepath>/nuxeo/tmp mounted to /opt/nuxeo/server/tmp (see the section "Deploy script changes" below).
  11. Log in from the DMS VM host machine to itself using ssh. After a successful login, move the key files "id_rsa", "id_rsa.pub" and "known_hosts" from ~/.ssh/ to /<storagepath>/nuxeo/ssh/.
  12. Set the ownership of the files in the /<storagepath>/nuxeo/ssh/ folder to 1000:1000 (user:group, for example with chown).
  13. Set the key "ocr.engine.name" in the PAT_DMS_SETTINGS table of the Patricia database to "piocr".
  14. Add the commands listed under "Deploy script changes" below to the "commands.conf" file of the auto-deploy client-specific repository (also found in ~/deploy/config/).
  15. Start a re-deploy of the DMS using the ~/deploy/deploy_script/deploy.sh command, as outlined here: Auto-deploy script
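
Before starting the re-deploy, the host-side prerequisites (the two environment variables, the SDK and Java commands, the JAR and the pi-google-ocr script) can be checked with a short script. The following is a minimal sketch and not part of the official procedure: the helper names `require_var`, `require_cmd`, `require_file` and `preflight` are hypothetical, and the JAR path is assumed to be the one referenced inside the pi-google-ocr script.

```shell
#!/usr/bin/env bash
# Preflight sketch for the DMS host (hypothetical helper, not shipped with the DMS).

# Succeeds if the named variable is set and non-empty.
require_var() {
  local value
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "missing environment variable: $1" >&2
    return 1
  fi
}

# Succeeds if the command is found on PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "command not found: $1" >&2
    return 1
  fi
}

# Succeeds if the file exists and is readable.
require_file() {
  if [ ! -r "$1" ]; then
    echo "file not readable: $1" >&2
    return 1
  fi
}

# Runs every check and returns the number of failed checks.
preflight() {
  local failures=0
  require_var GCLOUD_OCR_BUCKET || failures=$((failures + 1))
  if require_var GOOGLE_APPLICATION_CREDENTIALS; then
    require_file "$GOOGLE_APPLICATION_CREDENTIALS" || failures=$((failures + 1))
  else
    failures=$((failures + 1))
  fi
  require_cmd gsutil || failures=$((failures + 1))
  require_cmd java || failures=$((failures + 1))
  # Assumption: JAR path as referenced by the pi-google-ocr script.
  require_file /opt/pi-google-ocr.jar || failures=$((failures + 1))
  require_cmd pi-google-ocr || failures=$((failures + 1))
  return "$failures"
}

preflight && echo "preflight ok" || echo "preflight found problems" >&2
```

Each failed check prints one line to stderr, so the output doubles as a to-do list of steps that still need attention.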

Scripts:

Code Block
languagebash
titlepi-google-ocr
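#!/usr/bin/env bash
# Usage: pi-google-ocr <output_file> <input_file>
# Upload the input file to the temporary Cloud Storage bucket.
gsutil cp "$2" "gs://${GCLOUD_OCR_BUCKET}"
input_filename=$(basename "$2")
output_filename=$(basename "$1")
# Run the OCR JAR on the uploaded object and capture what it prints.
json=$(java -jar /opt/pi-google-ocr.jar "gs://${GCLOUD_OCR_BUCKET}/${input_filename}" "gs://${GCLOUD_OCR_BUCKET}/${output_filename}")
echo "$json" > "$1"
# Remove the temporary objects from the bucket.
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${input_filename}"
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${output_filename}output-1-to-1.json*"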
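
To make the naming used by the pi-google-ocr script concrete: the input file is uploaded under its base name, a bucket path built from the output file's base name is passed to the JAR as its second argument, and the cleanup removes objects whose names start with that output name followed by output-1-to-1.json. The sketch below only prints these names and calls neither gsutil nor java; the function name `ocr_object_names` and the example bucket and paths are hypothetical.

```shell
#!/usr/bin/env bash
# Print the bucket object names that pi-google-ocr derives from its arguments.
# Arguments: <bucket> <output_file> <input_file> (output first, as in the script).
ocr_object_names() {
  local bucket="$1"
  local output_name
  local input_name
  output_name=$(basename "$2")
  input_name=$(basename "$3")
  echo "upload: gs://${bucket}/${input_name}"
  echo "output: gs://${bucket}/${output_name}"
  echo "cleanup: gs://${bucket}/${output_name}output-1-to-1.json*"
}

# Example values only; replace them with a real bucket and real paths.
ocr_object_names "example-bucket" "/tmp/out/scan.txt" "/tmp/in/scan.pdf"
```

Because only base names are used, two files with the same name in different directories map to the same bucket object, which is worth keeping in mind if OCR jobs can run concurrently.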


Code Block
languagebash
titlepiocr
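#!/bin/bash
# Usage: piocr <output_file> <input_file>
echo "variables $1 $2"
# Run pi-google-ocr on the DMS host over SSH; the single quotes keep paths with spaces intact.
touch ~/.ssh_config && ssh -F ~/.ssh_config <<adminuser>>@<<dms.host.name>> "pi-google-ocr '$1' '$2'"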

Make sure you replace <<adminuser>> with the correct user name of an administrative user of the DMS host VM, and <<dms.host.name>> with the FQDN or IP address of the DMS host VM.
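
Because piocr runs non-interactively, the key-based login it relies on has to succeed without a password or passphrase prompt. One way to confirm this is sketched below, assuming an OpenSSH client; the function name `check_ssh_login` is hypothetical, and the user and host are passed as arguments rather than hard-coded.

```shell
#!/usr/bin/env bash
# Return 0 if a non-interactive SSH login as <user>@<host> succeeds.
# BatchMode=yes makes ssh fail instead of prompting for a password or passphrase.
check_ssh_login() {
  local user="$1"
  local host="$2"
  ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true
}

# Only run the check when a user and a host are supplied on the command line.
if [ "$#" -eq 2 ]; then
  if check_ssh_login "$1" "$2"; then
    echo "ssh login ok for $1@$2"
  else
    echo "ssh login failed for $1@$2" >&2
    exit 1
  fi
fi
```

For a meaningful result, run the check as the same user and from the same environment that executes piocr, passing the values used for <<adminuser>> and <<dms.host.name>>.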

Deploy script changes:

Make sure that, in the "commands.conf" file of the auto-deploy client-specific repository, the following commands are added to the nuxeo container definition (under the section "elif [${1} = "NUXEO"] then") so that they are added to the container run command:


Code Block
  add_volume "/<storagepath>/nuxeo/ssh" "/home/nuxeo/.ssh"
  add_volume "/<storagepath>/nuxeo/scripts/piocr" "/usr/local/bin/piocr"
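
After the re-deploy, the effect of these two volumes can be verified from inside the nuxeo container: the SSH directory should contain the private key that was moved to the storage path, and piocr should be executable. A minimal sketch, with the hypothetical function name `check_ocr_mounts` and the container-side paths from the add_volume lines above as defaults:

```shell
#!/usr/bin/env bash
# Check that the SSH key and the piocr script are visible inside the container.
# Arguments (both optional): <ssh_dir> <piocr_path>
check_ocr_mounts() {
  local ssh_dir="${1:-/home/nuxeo/.ssh}"
  local piocr_path="${2:-/usr/local/bin/piocr}"
  local status=0
  if [ ! -r "${ssh_dir}/id_rsa" ]; then
    echo "private key not readable: ${ssh_dir}/id_rsa" >&2
    status=1
  fi
  if [ ! -x "${piocr_path}" ]; then
    echo "piocr not executable: ${piocr_path}" >&2
    status=1
  fi
  return "$status"
}

if check_ocr_mounts "$@"; then
  echo "OCR mounts look fine"
else
  echo "OCR mounts incomplete" >&2
fi
```

A missing key usually points back to the key move and ownership steps, while a non-executable piocr points back to the chmod +x in the step that installs the script.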