...
The instructions below describe how to set up the appliance on the DMS host machine. The setup is unlikely to affect DMS performance, because the scripts only make small HTTP calls to the Google API and do no processing on the host.
Components and setup required:
- Configure the Google Cloud account: enable the Google Vision API and create a Google Cloud Storage bucket for the temporary files created during processing
- Configure the Google Cloud SDK on the DMS host machine
- Prepare a service account key in JSON format for Google Cloud here: https://console.cloud.google.com/apis/credentials/serviceaccountkey
- Create 2 permanent environment variables (for example by adding lines to /etc/environment), replacing $gcloud_storage_bucket_name and $path_to_json with actual values:
GCLOUD_OCR_BUCKET=$gcloud_storage_bucket_name
GOOGLE_APPLICATION_CREDENTIALS=$path_to_json
- Download pi-gcloud-ocr.jar (link will be provided after successful testing) to /opt/pi-gcloud-ocr.jar
- The DMS host needs
/<storagepath>/nuxeo/data
linked to /var/lib/nuxeo/data, and
/<storagepath>/nuxeo/tmp
linked to /opt/nuxeo/server/tmp
- Place the script below, named "pi-google-ocr", in /usr/bin/ and make it executable (chmod +x). This script contains the commands that drive the OCR process for each file.
- Place the script below, named "piocr", in
/<storagepath>/nuxeo/scripts/
and make it executable (chmod +x). This script contains the commands to call the pi-google-ocr script from inside the nuxeo container.
- Create an SSH key in the DMS appliance using the "ssh-keygen" command, then copy the public key (id_rsa.pub) and paste it into the
<<adminuserhome>>/.ssh/authorized_keys
file of the DMS host VM
- Log in from the DMS appliance to itself using ssh. After a successful login, move the DMS appliance's key files "id_rsa", "id_rsa.pub" and "known_hosts" from
~/.ssh/
to
/<storagepath>/nuxeo/ssh/
- Set the key "ocr.engine.name" in the PAT_DMS_SETTINGS table of the Patricia DB to "piocr"
- Add the change commands below to the "commands.conf" file of the auto-deploy client-specific repository (also found in
~/deploy/config/
)
- Start a re-deploy of DMS using the ~/deploy/deploy_script/deploy.sh command, as outlined here: Auto-deploy script
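Before starting the re-deploy, the host-side prerequisites from the list above can be checked with a small preflight function. This is a minimal sketch and not part of the original setup: the function name `ocr_preflight` and its two optional path arguments are assumptions made for illustration, and the defaults are simply the paths named in the steps above.

```shell
#!/usr/bin/env bash
# Preflight sketch (hypothetical helper, not shipped with the appliance).
# Arguments (both optional, both assumptions of this sketch):
#   $1  path to the OCR jar           (default: /opt/pi-gcloud-ocr.jar)
#   $2  path to the driver script     (default: /usr/bin/pi-google-ocr)
ocr_preflight() {
  local jar="${1:-/opt/pi-gcloud-ocr.jar}"
  local driver="${2:-/usr/bin/pi-google-ocr}"
  local failed=0

  # Both environment variables from the setup steps must be set.
  if [ -z "${GCLOUD_OCR_BUCKET:-}" ]; then
    echo "GCLOUD_OCR_BUCKET is not set" >&2
    failed=1
  fi
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    failed=1
  elif [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "key file is not readable: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    failed=1
  fi

  # The jar must exist and the driver script must be executable.
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    failed=1
  fi
  if [ ! -x "$driver" ]; then
    echo "driver script missing or not executable: $driver" >&2
    failed=1
  fi

  return "$failed"
}
```

After sourcing the file, `ocr_preflight && echo "OCR prerequisites look fine"` reports any missing piece on stderr. The check only inspects the local environment and files; it does not contact Google Cloud.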
Scripts:
- pi-google-ocr
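```shell
#!/usr/bin/env bash
# Usage: pi-google-ocr <output_file> <input_file>

# Upload the input file to the temporary bucket.
gsutil cp "$2" "gs://${GCLOUD_OCR_BUCKET}"

input_filename=$(basename "$2")
output_filename=$(basename "$1")

# Run the OCR jar on the uploaded object and capture what it prints.
# The jar path matches the download location given in the setup steps.
json=$(java -jar /opt/pi-gcloud-ocr.jar "gs://${GCLOUD_OCR_BUCKET}/${input_filename}" "gs://${GCLOUD_OCR_BUCKET}/${output_filename}")
echo "$json" > "$1"

# Remove the temporary objects from the bucket.
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${input_filename}"
gsutil rm "gs://${GCLOUD_OCR_BUCKET}/${output_filename}output-1-to-1.json"
```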
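To see which bucket objects the pi-google-ocr script touches for a given call without running gsutil or java, the name derivation can be reproduced on its own. This is a dry-run sketch under stated assumptions: `ocr_object_uris` is a hypothetical helper name, and the `output-1-to-1.json` suffix is taken from the cleanup line of the script, not from any documented naming rule.

```shell
#!/usr/bin/env bash
# Dry-run sketch (hypothetical helper): prints the two bucket objects that
# pi-google-ocr removes at the end of a run, using the same argument order
# as the script (<output_file> <input_file>). Nothing is uploaded or deleted.
ocr_object_uris() {
  local output_path="$1"
  local input_path="$2"

  if [ -z "${GCLOUD_OCR_BUCKET:-}" ]; then
    echo "GCLOUD_OCR_BUCKET is not set" >&2
    return 1
  fi

  local input_filename output_filename
  input_filename=$(basename -- "$input_path")
  output_filename=$(basename -- "$output_path")

  # Line 1: the uploaded copy of the input file.
  echo "gs://${GCLOUD_OCR_BUCKET}/${input_filename}"
  # Line 2: the result object, with the suffix used in the cleanup line.
  echo "gs://${GCLOUD_OCR_BUCKET}/${output_filename}output-1-to-1.json"
}
```

Only the file names are used, so two input files with the same base name in different directories map to the same bucket object. Running two such OCR jobs at the same time could therefore overwrite each other's temporary files.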
...