in-place import and FILE_UPLOAD_STARTED

Post by rdecoster » Wed Feb 03, 2016 3:24 pm

What happens during an in-place import with hard linking (-- --transfer=ln) at the point where the log prints FILE_UPLOAD_STARTED?
I had a 4.7 GB TIFF file that spent 15 minutes in the upload phase. One would expect a hard-linked upload to take a split second, so presumably more than an upload happens here. Could someone explain what happens at this stage?
We have the binary repository mounted on a network volume, and during this upload the cifs daemon is constantly reading at around 10M/s. Once the upload is done, higher values are observed (up to 30M/s).

Below is a snippet of the log.
Thanks in advance!
Raf

Code:
2016-02-03 15:24:34,674 3208       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Attempting initial SSL connection to localhost:4064
2016-02-03 15:24:35,611 4145       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Insecure connection requested, falling back
2016-02-03 15:24:36,144 4678       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Server: 5.2.1
2016-02-03 15:24:36,144 4678       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Client: 5.2.1-ice35-b15
2016-02-03 15:24:36,145 4679       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Java Version: 1.7.0_79
2016-02-03 15:24:36,145 4679       [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Name: Linux
2016-02-03 15:24:36,145 4679       [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Arch: amd64
2016-02-03 15:24:36,145 4679       [      main] INFO       ome.formats.OMEROMetadataStoreClient - OS Version: 3.16.0-49-generic
2016-02-03 15:24:36,363 4897       [      main] INFO       ome.formats.OMEROMetadataStoreClient - Call context: {omero.group:4}
2016-02-03 15:24:36,394 4928       [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_PREPARATION
2016-02-03 15:24:37,237 5771       [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_START
2016-02-03 15:24:37,266 5800       [      main] INFO   .importer.transfers.HardlinkFileTransfer - Transferring /media/GBW-0004_CMEVIB_OMERO/0002_PAVE/Ann_Geens/Big files/Ctrl7_run2_1704_stack.tif...
2016-02-03 15:24:37,385 5919       [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILE_UPLOAD_STARTED: /media/GBW-0004_CMEVIB_OMERO/0002_PAVE/Ann_Geens/Big files/Ctrl7_run2_1704_stack.tif
2016-02-03 15:40:01,676 930210     [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILE_UPLOAD_COMPLETE: /media/GBW-0004_CMEVIB_OMERO/0002_PAVE/Ann_Geens/Big files/Ctrl7_run2_1704_stack.tif
2016-02-03 15:41:04,178 992712     [      main] INFO   ormats.importer.cli.LoggingImportMonitor - FILESET_UPLOAD_END
2016-02-03 15:41:04,364 992898     [      main] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_STARTED Logfile: 7001
2016-02-03 15:41:06,092 994626     [l.Client-0] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_IMPORTED Step: 1 of 5  Logfile: 7001
2016-02-03 15:44:09,728 1178262    [.Client-15] INFO   ormats.importer.cli.LoggingImportMonitor - PIXELDATA_PROCESSED Step: 2 of 5  Logfile: 7001
2016-02-03 15:44:12,105 1180639    [.Client-15] INFO   ormats.importer.cli.LoggingImportMonitor - THUMBNAILS_GENERATED Step: 3 of 5  Logfile: 7001
2016-02-03 15:44:12,138 1180672    [.Client-15] INFO   ormats.importer.cli.LoggingImportMonitor - METADATA_PROCESSED Step: 4 of 5  Logfile: 7001
2016-02-03 15:44:12,165 1180699    [.Client-17] INFO   ormats.importer.cli.LoggingImportMonitor - OBJECTS_RETURNED Step: 5 of 5  Logfile: 7001
2016-02-03 15:44:12,435 1180969    [.Client-15] INFO   ormats.importer.cli.LoggingImportMonitor - IMPORT_DONE Imported file: /media/GBW-0004_CMEVIB_OMERO/0002_PAVE/Ann_Geens/Big files/Ctrl7_run2_1704_stack.tif
Imported pixels:
3201
Other imported objects:
Fileset:2251
Image:3201
2016-02-03 15:44:12,436 1180970    [.Client-15] INFO      ome.formats.importer.cli.ErrorHandler - Number of errors: 0
2016-02-03 15:44:12,486 1181020    [      main] INFO       ome.formats.OMEROMetadataStoreClient - Call context: {omero.group:4}
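
As a reading aid for the log above, the time spent in each phase can be worked out from the timestamps. The following is only an illustrative sketch: the timestamps are copied from the log, while the helper names (`parse`, `phase_durations`) are hypothetical and not part of OMERO.

```python
from datetime import datetime

# Timestamps copied from the log above (event name, wall-clock time).
EVENTS = [
    ("FILE_UPLOAD_STARTED", "2016-02-03 15:24:37,385"),
    ("FILE_UPLOAD_COMPLETE", "2016-02-03 15:40:01,676"),
    ("FILESET_UPLOAD_END", "2016-02-03 15:41:04,178"),
    ("METADATA_IMPORTED", "2016-02-03 15:41:06,092"),
    ("PIXELDATA_PROCESSED", "2016-02-03 15:44:09,728"),
    ("IMPORT_DONE", "2016-02-03 15:44:12,435"),
]


def parse(stamp: str) -> datetime:
    """Parse a log timestamp such as '2016-02-03 15:24:37,385'."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def phase_durations(events):
    """Return (from_event, to_event, seconds) for each consecutive pair."""
    durations = []
    for (name_a, time_a), (name_b, time_b) in zip(events, events[1:]):
        seconds = (parse(time_b) - parse(time_a)).total_seconds()
        durations.append((name_a, name_b, seconds))
    return durations


if __name__ == "__main__":
    for start, end, seconds in phase_durations(EVENTS):
        print(f"{start} -> {end}: {seconds:.1f} s")
```

The first interval, FILE_UPLOAD_STARTED to FILE_UPLOAD_COMPLETE, comes to a little over 15 minutes, which is the phase the question is about.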

Re: in-place import and FILE_UPLOAD_STARTED

Post by cblackburn » Wed Feb 03, 2016 4:57 pm

Hi Raf,

rdecoster wrote: What happens during an in-place import with hard linking (-- --transfer=ln) at the point where the log prints FILE_UPLOAD_STARTED?
I had a 4.7 GB TIFF file that spent 15 minutes in the upload phase. One would expect a hard-linked upload to take a split second, so presumably more than an upload happens here. Could someone explain what happens at this stage?


Yes, more than just an upload happens during this phase of the import. To ensure the integrity of the uploaded file, it is checksummed before upload (client-side) and again after upload (server-side). This means the whole file is read twice before the server-side import starts. There are some details of the import workflow here:

http://www.openmicroscopy.org/site/supp ... t-overview
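
A back-of-the-envelope check suggests that two full reads could account for most of the 15 minutes reported above. The sketch below assumes that "10M/s" means roughly 10 MB/s, that both checksum passes read the file at that rate, and that nothing else dominates the phase. The function name is hypothetical.

```python
def full_read_seconds(size_bytes: float, rate_bytes_per_s: float, passes: int = 1) -> float:
    """Time needed to read a file of the given size `passes` times at a steady rate."""
    return passes * size_bytes / rate_bytes_per_s


GB = 1e9  # decimal gigabyte (assumption: "4.7G" is about 4.7e9 bytes)
MB = 1e6  # decimal megabyte (assumption: "10M/s" is about 1e7 bytes per second)

# Two passes: one client-side checksum, one server-side checksum.
estimate = full_read_seconds(4.7 * GB, 10 * MB, passes=2)

if __name__ == "__main__":
    print(f"estimated checksum time: {estimate / 60:.1f} min")
```

Under these assumptions the estimate comes out at roughly 15.7 minutes, close to the observed upload phase.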

However, the checksum algorithm can be configured to speed up this process, so a faster algorithm may be more appropriate for hard-linked imports. The default algorithm on a vanilla system is SHA1-160, which is relatively slow. The fastest checksum is File-Size-64, though this really does just check the file size. See:

http://www.openmicroscopy.org/site/supp ... #checksums

and

http://www.openmicroscopy.org/site/supp ... ng-started
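
To illustrate why the choice of algorithm matters so much, here is a minimal sketch contrasting a streaming SHA-1 with a size-only check. It uses Python's standard library rather than OMERO's own implementation, and the function names are hypothetical. The point is only that a SHA-1 has to read every byte of the file, whereas a size check needs nothing more than the file's metadata.

```python
import hashlib
import os


def sha1_checksum(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Read the whole file in chunks and return its SHA-1 hex digest."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def size_only_check(path: str) -> int:
    """Return the file size in bytes without reading the file contents."""
    return os.path.getsize(path)
```

On a network-mounted repository, the first function costs a full read of the file over the network for each pass, while the second costs a single metadata lookup.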

Some of these advanced import options are also available via the
Code:
  --skip {all,checksum,minmax,thumbnails,upgrade}
                                        Optional step to skip during import

option, see:

http://www.openmicroscopy.org/site/supp ... mport.html
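
For scripted imports, the two options mentioned in this thread could be combined roughly as follows. This is a sketch under assumptions: the `bin/omero import` executable path, the placement of `--skip` before the `--` separator, and the helper name are not confirmed by anything above, so check them against the import documentation for your version. Only `--skip checksum` and `-- --transfer=ln` are taken from the posts.

```python
import shlex


def build_import_command(path, skip="checksum", transfer="ln", executable="bin/omero"):
    """Build an argument list for an in-place import (hypothetical helper).

    Assumption: --skip goes before the "--" separator and --transfer after it.
    """
    return [executable, "import", "--skip", skip, "--", f"--transfer={transfer}", path]


if __name__ == "__main__":
    command = build_import_command("Big files/Ctrl7_run2_1704_stack.tif")
    # shlex.join (Python 3.8+) quotes the path, which contains a space.
    print(shlex.join(command))
    # The list form can be passed to subprocess.run(command) without shell quoting issues.
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces, such as the "Big files" directory in the log above.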

I'd be interested to hear of your experience using these options.

Cheers,

Colin

Re: in-place import and FILE_UPLOAD_STARTED

Post by rdecoster » Wed Feb 03, 2016 8:08 pm

Since we are talking about a hard link at the file-system level, one could skip the checksum altogether without any problems, no?
I will set it to the fastest checksum available at present.
I have now imported a set of three TIFF files: roughly 12 GB, 9 GB and 5 GB. It took 1h48.
I'll run the test again with the other checksum algorithm and let you know.

Thanks for your response.
Best,
Raf

Re: in-place import and FILE_UPLOAD_STARTED

Post by rdecoster » Wed Feb 03, 2016 8:36 pm

The same set of files now gets imported in 13 min! I'm impressed ... :)

Is the import checksum test related to the FilenameExclusion checksum? Do they use the same algorithm?

Code:
2016-02-03 20:53:00,695 67191      [      main] INFO   ts.importer.exclusions.FilenameExclusion - Checksum match for filename: Ctrl7_run2_1704_stack.tif


Cheers,
Raf

Re: in-place import and FILE_UPLOAD_STARTED

Post by cblackburn » Thu Feb 04, 2016 8:25 am

Hi Raf,

rdecoster wrote: The same set of files now gets imported in 13 min! I'm impressed ... :)


I'm glad to hear that!

rdecoster wrote:Is the import checksum test related to the FilenameExclusion checksum? Do they use the same algorithm?

Code:
2016-02-03 20:53:00,695 67191      [      main] INFO   ts.importer.exclusions.FilenameExclusion - Checksum match for filename: Ctrl7_run2_1704_stack.tif



Yes, they use the same algorithm. The algorithm used at the upload stage is stored in the database along with the checksum so that any future checks are comparing like with like.
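
The idea of storing the algorithm next to the checksum can be sketched as below. This is not OMERO's actual schema or API: the class and function names are hypothetical, and Python's `hashlib` algorithm names stand in for OMERO's checksum identifiers. It only shows why recording the algorithm lets a later check compare like with like.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredChecksum:
    """A checksum recorded together with the algorithm that produced it."""
    algorithm: str   # a hashlib algorithm name, e.g. "sha1" (stand-in for OMERO's names)
    hex_digest: str


def compute(data: bytes, algorithm: str) -> StoredChecksum:
    """Checksum the data with the named algorithm and record both."""
    return StoredChecksum(algorithm, hashlib.new(algorithm, data).hexdigest())


def matches(data: bytes, stored: StoredChecksum) -> bool:
    """Re-checksum with the stored algorithm, so the comparison is like with like."""
    return compute(data, stored.algorithm) == stored


if __name__ == "__main__":
    original = compute(b"pixel data", "sha1")
    print(matches(b"pixel data", original))      # same bytes, same algorithm
    print(matches(b"other pixel data", original))
```

Without the stored algorithm name, a later check run under a different default algorithm would report a mismatch even for identical files.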

Cheers,

Colin

Re: in-place import and FILE_UPLOAD_STARTED

Post by ehrenfeu » Wed Aug 17, 2016 3:30 pm

Hi Colin et al,

I'm wondering whether it would make sense to have the checksumming disabled by default for hardlink imports (or even inplace imports in general). We just realized now that we kind of wasted hours running a large hardlink-import, additionally creating quite some load on our storage for doing rather useless checksums.

Cheers, and thanks a lot for this useful thread!! :)
~Niko

Re: in-place import and FILE_UPLOAD_STARTED

Post by cblackburn » Wed Aug 17, 2016 4:11 pm

Hi Niko,

ehrenfeu wrote:I'm wondering whether it would make sense to have the checksumming disabled by default for hardlink imports (or even inplace imports in general). We just realized now that we kind of wasted hours running a large hardlink-import, additionally creating quite some load on our storage for doing rather useless checksums.


It's certainly something that has merit and that we should consider. I'll raise it for discussion with some of the team and get back to you as soon as I can.

ehrenfeu wrote: Cheers, and thanks a lot for this useful thread!! :)


No problem!

Cheers,

Colin

Re: in-place import and FILE_UPLOAD_STARTED

Post by ehrenfeu » Thu Aug 18, 2016 7:20 am

Thanks, Colin! 8-)