in-place import and FILE_UPLOAD_STARTED

Posted: Wed Feb 03, 2016 3:24 pm
by rdecoster
What happens during an in-place import with hard linking (-- --transfer=ln) at the moment the log prints FILE_UPLOAD_STARTED?
I had a 4.7G TIFF file that was "uploading" for 15 minutes. One would think that with hard linking the upload would take a split second, so presumably more than an upload happens here. Could someone explain what happens at this stage?
We have the binary repository mounted on a network volume, and during this upload the CIFS daemon is constantly reading at around 10M/s. Once the upload is done, higher values are observed (up to 30M/s).

Below is a snippet of the log.
Thanks in advance!
Raf

Code:
2016-02-03 15:24:34,674 3208       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Attempting initial SSL connection to localhost:4064
2016-02-03 15:24:35,611 4145       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Insecure connection requested, falling back
2016-02-03 15:24:36,144 4678       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Server: 5.2.1
2016-02-03 15:24:36,144 4678       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Client: 5.2.1-ice35-b15
2016-02-03 15:24:36,145 4679       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Java Version: 1.7.0_79
2016-02-03 15:24:36,145 4679       [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Name: Linux
2016-02-03 15:24:36,145 4679       [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Arch: amd64
2016-02-03 15:24:36,145 4679       [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Version: 3.16.0-49-generic
2016-02-03 15:24:36,363 4897       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Call context: {omero.group:4}
2016-02-03 15:24:36,394 4928       [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_PREPARATION
2016-02-03 15:24:37,237 5771       [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_START
2016-02-03 15:24:37,266 5800       [      main] INFO   .importer.transfers.HardlinkFileTransfer - Transferring /media/GBW-0004_CMEVIB_OMERO/0002_PAVE/Ann_Geens/Big files/Ctrl7_run2_1704_stack.tif...
2016-02-03 15:24:37,385 5919       [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILE_UPLOAD_STARTED: /media/GBW-0004_CMEVIB_OMERO/0002_PAVE/Ann_Geens/Big files/Ctrl7_run2_1704_stack.tif
2016-02-03 15:40:01,676 930210     [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILE_UPLOAD_COMPLETE: /media/GBW-0004_CMEVIB_OMERO/0002_PAVE/Ann_Geens/Big files/Ctrl7_run2_1704_stack.tif
2016-02-03 15:41:04,178 992712     [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_END
2016-02-03 15:41:04,364 992898     [      main] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_STARTED Logfile: 7001
2016-02-03 15:41:06,092 994626     [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5  Logfile: 7001
2016-02-03 15:44:09,728 1178262    [.Client-15] INFO   ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5  Logfile: 7001
2016-02-03 15:44:12,105 1180639    [.Client-15] INFO   ormats.importer.cli.LoggingImportMonitor - THUMBNAILS_GENERATED Step: 3 of 5  Logfile: 7001
2016-02-03 15:44:12,138 1180672    [.Client-15] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_PROCESSED Step: 4 of 5  Logfile: 7001
2016-02-03 15:44:12,165 1180699    [.Client-17] INFO   ormats.importer.cli.LoggingImportMonitor - OBJECTS_RETURNED Step: 5 of 5  Logfile: 7001
2016-02-03 15:44:12,435 1180969    [.Client-15] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_DONE Imported file: /media/GBW-0004_CMEVIB_OMERO/0002_PAVE/Ann_Geens/Big files/Ctrl7_run2_1704_stack.tif
Imported pixels:
3201
Other imported objects:
Fileset:2251
Image:3201
2016-02-03 15:44:12,436 1180970    [.Client-15] INFO      ome.formats.importer.cli.ErrorHandler - Number of errors: 0
2016-02-03 15:44:12,486 1181020    [      main] INFO       ome.formats.OMEROMetadataStoreClient - Call context: {omero.group:4}

Re: in-place import and FILE_UPLOAD_STARTED

Posted: Wed Feb 03, 2016 4:57 pm
by cblackburn
Hi Raf,

rdecoster wrote: What happens during an in-place import with hard linking (-- --transfer=ln) at the moment the log prints FILE_UPLOAD_STARTED?
I had a 4.7G TIFF file that was "uploading" for 15 minutes. One would think that with hard linking the upload would take a split second, so presumably more than an upload happens here. Could someone explain what happens at this stage?


Yes, more than just an upload happens during this phase of the import. To ensure the integrity of the uploaded file, it is checksummed before upload (client-side) and then again after upload (server-side). This means the whole file is read twice before the server-side import starts, even when the transfer itself is only a hard link. There are some details of the import workflow here:

http://www.openmicroscopy.org/site/supp ... t-overview
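
As a rough sanity check, the two checksum passes can be compared with the numbers in the original post. The sketch below uses the FILE_UPLOAD_STARTED and FILE_UPLOAD_COMPLETE timestamps from the log; the byte count (4.7 * 10**9) and the idea that both passes read the file through the same network mount are assumptions, not something the log confirms.

```python
from datetime import datetime

# Timestamps copied from the FILE_UPLOAD_STARTED / FILE_UPLOAD_COMPLETE log lines.
FMT = "%Y-%m-%d %H:%M:%S,%f"
started = datetime.strptime("2016-02-03 15:24:37,385", FMT)
completed = datetime.strptime("2016-02-03 15:40:01,676", FMT)
elapsed_s = (completed - started).total_seconds()

# Assumption: "4.7G" means 4.7 * 10**9 bytes.
file_bytes = 4.7e9
# Client-side checksum plus server-side checksum: two full reads of the file.
passes = 2

read_rate_mb_s = passes * file_bytes / elapsed_s / 1e6
print(f"elapsed: {elapsed_s:.1f} s")
print(f"implied read rate for {passes} passes: {read_rate_mb_s:.1f} MB/s")
```

Under those assumptions, two full reads in the logged interval imply a read rate close to the roughly 10M/s seen on the CIFS mount, which is consistent with the time being spent on checksumming rather than on the transfer.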

However, the checksum algorithm is configurable, so a faster algorithm may be more appropriate for hard-linked imports. The default algorithm on a vanilla system is SHA1-160, which is relatively slow. The fastest checksum is File-Size-64, though this really does just check the file size. See:

http://www.openmicroscopy.org/site/supp ... #checksums

and

http://www.openmicroscopy.org/site/supp ... ng-started

Some of these advanced import options are also available via the
Code:
  --skip {all,checksum,minmax,thumbnails,upgrade}
                                        Optional step to skip during import

option, see:

http://www.openmicroscopy.org/site/supp ... mport.html

I'd be interested to hear about your experience using these options.

Cheers,

Colin

Re: in-place import and FILE_UPLOAD_STARTED

Posted: Wed Feb 03, 2016 8:08 pm
by rdecoster
Since we are talking about a hard link at the file system level, one could skip the checksum altogether without any problems, no?
I will set it to the fastest checksum available for now.
I just had three TIFF files to import: roughly 12GB, 9GB and 5GB. It took 1h48.
I'll run the test again with the other checksum algorithm and let you know.

Thanks for your response.
Best,
Raf

Re: in-place import and FILE_UPLOAD_STARTED

Posted: Wed Feb 03, 2016 8:36 pm
by rdecoster
The same set of files now gets imported in 13 min! I'm impressed ... :)

Is the import checksum test related to the FilenameExclusion checksum? Do they use the same algorithm?

Code:
2016-02-03 20:53:00,695 67191      [      main] INFO   ts.importer.exclusions.FilenameExclusion - Checksum match for filename: Ctrl7_run2_1704_stack.tif


Cheers,
Raf

Re: in-place import and FILE_UPLOAD_STARTED

Posted: Thu Feb 04, 2016 8:25 am
by cblackburn
Hi Raf,

rdecoster wrote: The same set of files now gets imported in 13 min! I'm impressed ... :)


I'm glad to hear that!

rdecoster wrote: Is the import checksum test related to the FilenameExclusion checksum? Do they use the same algorithm?

Code:
2016-02-03 20:53:00,695 67191      [      main] INFO   ts.importer.exclusions.FilenameExclusion - Checksum match for filename: Ctrl7_run2_1704_stack.tif



Yes, they use the same algorithm. The algorithm used at the upload stage is stored in the database along with the checksum so that any future checks are comparing like with like.

Cheers,

Colin

Re: in-place import and FILE_UPLOAD_STARTED

Posted: Wed Aug 17, 2016 3:30 pm
by ehrenfeu
Hi Colin et al,

I'm wondering whether it would make sense to have checksumming disabled by default for hard-link imports (or even in-place imports in general). We have only just realized that we more or less wasted hours running a large hard-link import, and put considerable extra load on our storage, computing rather useless checksums.

Cheers, and thanks a lot for this useful thread!! :)
~Niko

Re: in-place import and FILE_UPLOAD_STARTED

Posted: Wed Aug 17, 2016 4:11 pm
by cblackburn
Hi Niko,

ehrenfeu wrote: I'm wondering whether it would make sense to have checksumming disabled by default for hard-link imports (or even in-place imports in general). We have only just realized that we more or less wasted hours running a large hard-link import, and put considerable extra load on our storage, computing rather useless checksums.


It certainly has merit and is something we should consider. I'll raise it for discussion with some of the team and get back to you as soon as I can.

ehrenfeu wrote: Cheers, and thanks a lot for this useful thread!! :)


No problem!

Cheers,

Colin

Re: in-place import and FILE_UPLOAD_STARTED

Posted: Thu Aug 18, 2016 7:20 am
by ehrenfeu
Thanks, Colin! 8-)