The new importer differs from the old one by adding a staging path: all import documents are first moved to staging and then, in a second step, moved to their final destinations based on the mapping file. This is much faster than the usual import process (file import to staging is about 5-10 times faster) because it uses one open transaction for the whole import instead of opening a transaction per file. It also requires no file duplication, because the import sweeper moves the staged files to their correct destination paths.
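The two-phase idea can be illustrated with a small local-filesystem analogy. This is only a sketch and not the Nuxeo importer itself: the function names `stage_all` and `sweep_staging` are hypothetical, and the mapping layout (`<file name>|<destination directory>`) is an assumption, since only the pipe delimiter is specified here.

```shell
# Illustrative analogy only; this is NOT the Nuxeo importer.
# Assumed (hypothetical) mapping layout: <file name>|<destination directory>

# Phase 1: move every file from the source directory into one staging directory.
stage_all() {
  local source_dir="$1" staging_dir="$2"
  mkdir -p "$staging_dir"
  find "$source_dir" -maxdepth 1 -type f -exec mv {} "$staging_dir"/ \;
}

# Phase 2: move each staged file to the destination named in the mapping file.
sweep_staging() {
  local staging_dir="$1" mapping_file="$2" name dest
  # The "|| [ -n "$name" ]" part also handles a last line without a newline.
  while IFS='|' read -r name dest || [ -n "$name" ]; do
    [ -n "$name" ] || continue                 # skip empty lines
    [ -f "$staging_dir/$name" ] || continue    # skip entries that were not staged
    mkdir -p "$dest"
    mv "$staging_dir/$name" "$dest/"
  done < "$mapping_file"
}
```

Because the second phase only moves files that are already staged, nothing has to be copied twice, which mirrors the "no file duplication" property described above.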
The instructions for testing the new importer are as follows:
(1) Run the import ("./import.sh fs"). This step imports all the files into a staging folder in Nuxeo (/default-domain/Workspaces/Patricia/Import/).
(2) Place the mapping file in the /storage/nuxeo/data/import_mappings/ folder. The import_mappings folder and its content must be owned by the "1000:1000" user ("chown 1000:1000 /storage/nuxeo/data/import_mappings"). The delimiter for the mapping file is the pipe ("|").
(3) The import sweeper then moves the files from /default-domain/Workspaces/Patricia/Import/ to the correct paths according to the mapping file. You can monitor the progress at INFO level in /storage/logs/nuxeo/server.log. Users can use the system in production mode while the import sweeper is running; however, the system will of course require some resources to move the data around. The import sweeper has a relatively low footprint, though.
(4) When the import sweeper has finished, 'mapping.txt' in /storage/nuxeo/data/import_mappings/ is renamed to 'mapping.txt.completed'.
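The steps above could be scripted roughly as follows. The paths and commands come from the instructions; the helper names `import_completed` and `mapping_uses_pipe` are hypothetical, and `-R` is added to `chown` on the assumption that the folder's content needs the same owner as the folder itself.

```shell
# Paths are taken from the instructions; helper function names are hypothetical.
MAPPINGS_DIR="${MAPPINGS_DIR:-/storage/nuxeo/data/import_mappings}"
SERVER_LOG="${SERVER_LOG:-/storage/logs/nuxeo/server.log}"

# Succeeds once mapping.txt.completed exists in the mappings folder,
# i.e. once the import sweeper has renamed mapping.txt.
import_completed() {
  local dir="${1:-$MAPPINGS_DIR}"
  [ -f "$dir/mapping.txt.completed" ]
}

# Succeeds if every non-empty line of the mapping file contains a pipe.
mapping_uses_pipe() {
  local file="$1"
  ! grep -v '^[[:space:]]*$' "$file" | grep -qv '|'
}

# Typical sequence (commented out so that sourcing this file has no side effects):
#   ./import.sh fs
#   chown -R 1000:1000 "$MAPPINGS_DIR"
#   tail -f "$SERVER_LOG" | grep INFO      # in a second terminal
#   until import_completed; do sleep 60; done
```

Polling for the renamed file is one simple way to detect completion without parsing the server log; the 60-second interval is arbitrary.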
A word of caution regarding sweepers in general: it is possible, but not recommended, to run the sweeper process during production time. The normal sweepers can run in parallel while the import sweeper is working (and while users are working). This is, however, truly challenging from a CPU, memory and storage I/O perspective, so we highly recommend following the usual controlled sweeping process:
(a) Turn off all sweepers and set both do.not.modify.case.onupdate and do.not.ocr.onupdate to TRUE.
(b) When the import sweeper is done, turn on the email sweeper and wait until it has finished.
(c) Then turn on the metadata sweeper,
(d) then the full textsync sweeper,
(e) then preview and, finally,
(f) set do.not.modify.case.onupdate to FALSE again and turn on text extraction.
We highly recommend waiting at least until step (d) has concluded before running another import.
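The property changes in steps (a) and (f) could be applied with a small helper like the one below. This is a sketch under assumptions: it treats the settings as entries in a Java-style properties file, and both the function name `set_property` and the `$CONF` path are hypothetical, since this text does not say where these properties are configured or whether a restart is needed afterwards.

```shell
# Sets key=value in a properties-style file: replaces an existing entry or
# appends a new one. Intended for simple values such as TRUE/FALSE; values
# containing "|" or "&" would need extra escaping for sed.
set_property() {
  local file="$1" key="$2" value="$3"
  local escaped_key="${key//./\\.}"   # escape dots so sed/grep match them literally
  if grep -q "^${escaped_key}=" "$file"; then
    # -i.bak works with both GNU and BSD sed and leaves a backup copy.
    sed -i.bak "s|^${escaped_key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Step (a), before the import ($CONF is a hypothetical path):
#   set_property "$CONF" do.not.modify.case.onupdate TRUE
#   set_property "$CONF" do.not.ocr.onupdate TRUE
# Step (f), after the other sweepers have finished:
#   set_property "$CONF" do.not.modify.case.onupdate FALSE
```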