
  1. Configure the Google Cloud account: enable the Google Vision API and create a Google Cloud Storage bucket for the temporary files created during processing
  2. Configure the Google Cloud SDK on the DMS host machine
  3. Prepare a service account key in JSON format for Google Cloud here: https://console.cloud.google.com/apis/credentials/serviceaccountkey
     Save JSON as /opt/gcloud.
  4. Create 2 permanent environment variables, for example by adding lines to /etc/environment (replace $gcloud_storage_bucket_name and $path_to_json with actual values):

    1. GCLOUD_OCR_BUCKET=$gcloud_storage_bucket_name
    2. GOOGLE_APPLICATION_CREDENTIALS=$path_to_json
  5. Download pi-gcloud-ocr.jar (Link will be provided after successful testing) to /opt/pi-gcloud-ocr.jar

  6. DMS host needs /<storagepath>/nuxeo/data linked to /var/lib/nuxeo/data and /<storagepath>/nuxeo/tmp linked to /opt/nuxeo/server/tmp

  7. Place the script below, named "pi-google-ocr", in /usr/bin/ and make it executable (chmod +x). This script contains the commands that drive the OCR process for each file.
  8. Place the script below, named "piocr", in /<storagepath>/nuxeo/scripts/ and make it executable (chmod +x). This script contains the commands that call the pi-google-ocr script from inside the nuxeo container.
  9. Create an SSH key on the DMS appliance using the "ssh-keygen" command, then copy the public key (id_rsa.pub) and paste it into the <<adminuserhome>>/.ssh/authorized_keys file of the DMS host VM
  10. Log in from the DMS appliance to itself using ssh. After a successful login, move the DMS appliance's key files "id_rsa", "id_rsa.pub" and "known_hosts" from ~/.ssh/ to /<storagepath>/nuxeo/ssh/
  11. Set the key "ocr.engine.name" in the PAT_DMS_SETTINGS table of the Patricia db to "piocr"
  12. Add the change commands below to the "commands.conf" file of the auto-deploy client-specific repository (also found in ~/deploy/config/)
  13. Start a re-deploy of the DMS using the ~/deploy/deploy_script/deploy.sh command, as outlined here: Auto-deploy script
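
Before starting the re-deploy, it may help to confirm that the prerequisites from the steps above are in place on the DMS host. The following is a minimal sketch, not part of the original setup: the function name check_ocr_prereqs is hypothetical, and it only assumes that the two environment variables from step 4 are set and that the tools passed as arguments should be on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (illustration only, not part of the DMS tooling).
# Prints one line per problem found and returns non-zero if any check fails.
# Arguments: names of commands that must be available on PATH.
check_ocr_prereqs() {
    local status=0
    local tool

    # Step 4: both variables must be set and non-empty.
    if [ -z "${GCLOUD_OCR_BUCKET:-}" ]; then
        echo "GCLOUD_OCR_BUCKET is not set"
        status=1
    fi
    if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
        echo "GOOGLE_APPLICATION_CREDENTIALS is not set"
        status=1
    elif [ ! -r "${GOOGLE_APPLICATION_CREDENTIALS}" ]; then
        # Step 3: the service account key file must be readable.
        echo "credentials file not readable: ${GOOGLE_APPLICATION_CREDENTIALS}"
        status=1
    fi

    # The commands used by the OCR scripts must be on PATH.
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing command: ${tool}"
            status=1
        fi
    done

    return "$status"
}
```

A possible invocation on the DMS host would be: check_ocr_prereqs gsutil java ssh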

Scripts:

  • pi-google-ocr (the bucket name is read from the GCLOUD_OCR_BUCKET environment variable)
Code Block
languagebash
#!/usr/bin/env bash
# $1 = path of the output file, $2 = path of the input file (passed through by piocr)
# Use the credentials path from the environment; fall back to /opt/gcloud.json
export GOOGLE_APPLICATION_CREDENTIALS="${GOOGLE_APPLICATION_CREDENTIALS:-/opt/gcloud.json}"
# Upload the input file to the temporary bucket
gsutil cp "$2" "gs://${GCLOUD_OCR_BUCKET}"
input_filename=$(basename "$2")
output_filename=$(basename "$1")
# Run the OCR on the uploaded file and capture the result
json=$(java -jar /opt/pi-google-ocr.jar "gs://${GCLOUD_OCR_BUCKET}/${input_filename}" "gs://${GCLOUD_OCR_BUCKET}/${output_filename}")
echo "$json" > "$1"
# Remove the temporary objects from the bucket
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${input_filename}"
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${output_filename}output-1-to-1.json"

  • piocr


Code Block
languagebash
#!/bin/bash
echo "variables $1 $2"
ssh <<adminuser>>@<<dms.host.name>> "pi-google-ocr $1 $2"

$1 is the path and filename of the output file and $2 is the path and filename of the input file; both are handed over when the DMS calls the piocr script. The command in the piocr script must make the OCR engine read the input file (pointed to by $2) and write to the output file (pointed to by $1).
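
To illustrate how the two arguments map to objects in the temporary bucket, here is a small sketch. The helper name ocr_bucket_uris and the example bucket name are hypothetical; the naming scheme (basename of each path, plus the "output-1-to-1.json" suffix for the object that is cleaned up) follows the pi-google-ocr script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only): prints the bucket URIs that
# correspond to one OCR call.
# Arguments: $1 = output file path, $2 = input file path, $3 = bucket name.
ocr_bucket_uris() {
    local output_path="$1"
    local input_path="$2"
    local bucket="$3"
    local input_filename
    local output_filename

    # Only the file names are used inside the bucket, not the directories.
    input_filename=$(basename "$input_path")
    output_filename=$(basename "$output_path")

    printf 'input=gs://%s/%s\n' "$bucket" "$input_filename"
    printf 'output=gs://%s/%s\n' "$bucket" "$output_filename"
    # Result object that the script removes after the OCR run.
    printf 'cleanup=gs://%s/%soutput-1-to-1.json\n' "$bucket" "$output_filename"
}
```

For example, ocr_bucket_uris /tmp/out/result.txt /tmp/in/scan.pdf my-bucket prints the three URIs gs://my-bucket/scan.pdf, gs://my-bucket/result.txt and gs://my-bucket/result.txtoutput-1-to-1.json, each prefixed with its role.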

Make sure you replace <<adminuser>> with the user name of an administrative user on the OCR appliance, and <<dms.host.name>> with the proper FQDN or IP address.

  • Make sure that, in the "commands.conf" file of the auto-deploy client-specific repository, the following commands are added to the nuxeo container definition (under the section "elif [ ${1} = "NUXEO" ] then") so that they are added to the container run command:


Code Block
  add_volume "/<storage_path>/nuxeo/ssh" "/home/nuxeo/.ssh"
  add_volume "/<storage_path>/nuxeo/scripts/piocr" "/usr/local/bin/piocr"
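
After the re-deploy, the two mounts can be checked from inside the nuxeo container. The sketch below is an assumption-laden illustration, not part of the deploy tooling: the function name check_ocr_mounts is hypothetical, and the default paths are the container-side targets of the two add_volume lines above.

```shell
#!/usr/bin/env bash
# Hypothetical check (illustration only): verifies that the SSH key files
# and the piocr script are visible at the mounted locations.
# Arguments (optional): $1 = ssh directory, $2 = path of the piocr script.
check_ocr_mounts() {
    local ssh_dir="${1:-/home/nuxeo/.ssh}"
    local piocr_path="${2:-/usr/local/bin/piocr}"
    local status=0
    local name

    # Key files moved to the shared ssh directory in step 10.
    for name in id_rsa id_rsa.pub known_hosts; do
        if [ ! -f "${ssh_dir}/${name}" ]; then
            echo "missing: ${ssh_dir}/${name}"
            status=1
        fi
    done

    # The piocr script must be executable (chmod +x in step 8).
    if [ ! -x "${piocr_path}" ]; then
        echo "not executable: ${piocr_path}"
        status=1
    fi

    return "$status"
}
```

Called without arguments inside the container, check_ocr_mounts uses the default paths and prints nothing when everything is in place.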