...

  1. Configure the Google Cloud account: enable the Google Cloud Vision API and create a Google Cloud Storage bucket for the temporary files created during processing.
  2. Configure the Google Cloud SDK on the DMS host machine: (i) see https://cloud.google.com/sdk/install; note that if you installed your DMS from the base OVA image provided by Patrix/Practice Insight, your operating system is CentOS. (ii) See also https://cloud.google.com/sdk/docs/initializing and https://cloud.google.com/sdk/docs/authorizing.
  3. Prepare a service account key in JSON format for Google Cloud here: https://console.cloud.google.com/apis/credentials/serviceaccountkey
  4. Create 2 permanent environment variables. For example, add the following 2 lines to /etc/environment:

    GCLOUD_OCR_BUCKET=<$gcloud_storage_bucket_name>
    GOOGLE_APPLICATION_CREDENTIALS=<$path_to_json>

    Replace <$gcloud_storage_bucket_name> and <$path_to_json> with their respective values.

  5. Install Java (OpenJDK 1.8) on the DMS VM by running:

    yum install java-1.8.0-openjdk


  6. Download pi-google-ocr.jar (https://pi-cdn-sg.s3.amazonaws.com/dms/pi-google-ocr.jar) and copy it to /opt/pi-google-ocr.jar on the DMS VM host machine, the path the "pi-google-ocr" script expects.
  7. Copy the "pi-google-ocr" script below to /usr/bin/ on the DMS VM host machine and make it executable (chmod +x). This script contains the commands that drive the OCR process for each file.
  8. Copy the "piocr" script below to /<storagepath>/nuxeo/scripts/ and make it executable (chmod +x). This script contains the commands that call the pi-google-ocr script from inside the DMS.
  9. Create an SSH key on the DMS VM host machine using the "ssh-keygen" command, then copy the public key (id_rsa.pub) into the <<adminuserhome>>/.ssh/authorized_keys file of the DMS VM host machine (i.e. the same box).
  10. The DMS host needs /<storagepath>/nuxeo/data mounted to /var/lib/nuxeo/data and /<storagepath>/nuxeo/tmp mounted to /opt/nuxeo/server/tmp.
  11. Log in from the DMS VM host machine to itself using ssh. After a successful login, move the host machine's key files "id_rsa", "id_rsa.pub" and "known_hosts" from ~/.ssh/ to /<storagepath>/nuxeo/ssh/.
  12. Set the key "ocr.engine.name" in the PAT_DMS_SETTINGS table of the Patricia database to "piocr".
  13. Add the change commands below to the "commands.conf" file of the auto-deploy client-specific repository (also found in ~/deploy/config/).
  14. Start a re-deploy of the DMS using the ~/deploy/deploy_script/deploy.sh command, as outlined here: Auto-deploy script
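
Before starting the re-deploy, it can help to confirm that the host-side prerequisites from steps 2 to 7 are in place. The following is a minimal sketch, not part of the official setup: the function name preflight and the variables JAR_PATH and SCRIPT_PATH are hypothetical names introduced here, and their defaults assume the jar sits at the path the pi-google-ocr script reads (/opt/pi-google-ocr.jar) and the script at /usr/bin/pi-google-ocr. The piocr script from step 8 is not checked because its location depends on <storagepath>.

```shell
#!/usr/bin/env bash
# Preflight sketch: print one line per missing prerequisite.
# JAR_PATH and SCRIPT_PATH are hypothetical overrides; the defaults are the
# locations assumed from the steps above.
JAR_PATH="${JAR_PATH:-/opt/pi-google-ocr.jar}"
SCRIPT_PATH="${SCRIPT_PATH:-/usr/bin/pi-google-ocr}"

preflight() {
  local missing=0
  local cmd

  # Step 4: both environment variables must be set and non-empty.
  [ -n "${GCLOUD_OCR_BUCKET:-}" ] || { echo "MISSING env: GCLOUD_OCR_BUCKET"; missing=1; }
  [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] || { echo "MISSING env: GOOGLE_APPLICATION_CREDENTIALS"; missing=1; }

  # Step 3: the service account key file must be readable.
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] && [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then
    echo "MISSING file: ${GOOGLE_APPLICATION_CREDENTIALS}"; missing=1
  fi

  # Steps 2 and 5: gsutil (Google Cloud SDK) and java must be on the PATH.
  for cmd in gsutil java; do
    command -v "$cmd" >/dev/null 2>&1 || { echo "MISSING command: $cmd"; missing=1; }
  done

  # Step 6: the OCR jar must exist.
  [ -f "$JAR_PATH" ] || { echo "MISSING file: $JAR_PATH"; missing=1; }

  # Step 7: the driver script must exist and be executable.
  [ -x "$SCRIPT_PATH" ] || { echo "NOT executable: $SCRIPT_PATH"; missing=1; }

  # Return 0 when everything was found, 1 otherwise.
  return "$missing"
}
```

After sourcing the file on the DMS host, `preflight && echo "all prerequisites present"` prints the confirmation only when nothing is missing. Note that variables added to /etc/environment typically become visible only in a new login session.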

Scripts:

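pi-google-ocr:

#!/usr/bin/env bash
# Arguments: $1 = local output file, $2 = local input file.
# Upload the input file to the temporary Cloud Storage bucket.
gsutil cp "$2" "gs://${GCLOUD_OCR_BUCKET}"
input_filename=$(basename "$2")
output_filename=$(basename "$1")
# Run the OCR jar on the uploaded object and capture its standard output.
json=$(java -jar /opt/pi-google-ocr.jar "gs://${GCLOUD_OCR_BUCKET}/${input_filename}" "gs://${GCLOUD_OCR_BUCKET}/${output_filename}")
echo "$json" > "$1"
# Remove the temporary objects from the bucket.
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${input_filename}"
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${output_filename}output-1-to-1.json"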
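
To make the bucket housekeeping in the script above easier to follow, here is a small dry-run sketch that only prints the two Cloud Storage objects the script uploads and later deletes for a given call. The helper name ocr_objects is hypothetical and is not part of the DMS; it takes the same argument order as pi-google-ocr (output file first, input file second) and contacts no Google service.

```shell
#!/usr/bin/env bash
# Dry-run sketch: list the bucket objects pi-google-ocr would touch.
# ocr_objects is a hypothetical helper introduced for illustration only.
ocr_objects() {
  local output_file="$1"
  local input_file="$2"
  # Abort with a message if the bucket variable from step 4 is not set.
  local bucket="${GCLOUD_OCR_BUCKET:?GCLOUD_OCR_BUCKET is not set}"

  # Uploaded copy of the input file (removed after OCR).
  echo "gs://${bucket}/$(basename "$input_file")"
  # Result object: output file name followed by the "output-1-to-1.json" suffix.
  echo "gs://${bucket}/$(basename "$output_file")output-1-to-1.json"
}
```

For example, with GCLOUD_OCR_BUCKET=my-bucket, calling `ocr_objects /tmp/out/result.txt /tmp/in/scan.pdf` prints gs://my-bucket/scan.pdf and gs://my-bucket/result.txtoutput-1-to-1.json. Because only the base name of each path is used, two files with the same name in different directories would map to the same bucket object if processed at the same time.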

...