We're Hiring!

import speed

General user discussion about using the OMERO platform to its fullest. Please ask new questions at https://forum.image.sc/tags/omero
Please note:
Historical discussions about OMERO. Please look for and ask new questions at https://forum.image.sc/tags/omero

There are workflow guides for various OMERO functions on our help site - http://help.openmicroscopy.org

You should find answers to any basic questions about using the clients there.

Re: import speed

Postby helmerj » Fri Feb 04, 2011 1:33 pm

cxallan wrote:Hi Juergen,

That's sort of telling then. I'd like to make sure we're not barking up completely the wrong tree here. So, it would be good to have a look at the importer logs. PostgreSQL CPU usage is sort of expected, the INSERT volume is reasonable and a lot of that is going to be I/O wait rather than actual processing overhead. Looking at the server logs for the INSERTs they're consistently, in this rather high load test < 100ms per image. That's less than 2 of your 18+ minutes for the entire 1402 images.

Adding --debug to importer-cli or --debug=ALL to bin/omero import and then piping that to a file for each worker would be perfect.

-Chris


Hi Chris,

I will try that. Unfortunately as of now I am not able to get the stdout of each worker into a file. Whatever I do it does end up in the console in which I did start the workers. But there must be a way. So let me try a bit more.

Cheers Juergen
User avatar
helmerj
 
Posts: 41
Joined: Mon Sep 20, 2010 9:39 am
Location: Berlin, Germany

Re: import speed

Postby cxallan » Fri Feb 04, 2011 4:42 pm

That's likely because most, if not all, of the logging will be going to STDERR. If you're executing with a subshell, something like the following should work:

Code: Select all
./importer-cli ... file &> out.log


If you're using popen(), popen3() or similar you'll need to open a file handle for both STDOUT and STDERR redirection.
cxallan
Site Admin
 
Posts: 509
Joined: Fri May 01, 2009 8:07 am

Re: import speed

Postby helmerj » Fri Feb 04, 2011 5:37 pm

Hi

I tried that already...

1> file.log
2> file.log
1&2> file.log
&> file.log

all dont work

I also tried to get the output from the worker console but all it does is giving me the command it calls but not the out put of that...

Cheers juergen
User avatar
helmerj
 
Posts: 41
Joined: Mon Sep 20, 2010 9:39 am
Location: Berlin, Germany

Re: import speed

Postby jmoore » Fri Feb 04, 2011 8:30 pm

Hi Juergen,

If you look at "bin/omero import --help", you'll see two parameters:
Code: Select all
  ---file [FILE]      File for storing the standard out of the Java process
  ---errs [ERRS]      File for storing the standard err of the Java process

which DropBox uses to capture this output.
Cheers,
~Josh.
User avatar
jmoore
Site Admin
 
Posts: 1591
Joined: Fri May 22, 2009 1:29 pm
Location: Germany

Re: import speed

Postby helmerj » Sat Feb 05, 2011 8:26 pm

Hi!

So I made it with the help of my colleague. These are the log files, each import job done by the 6 workers started its own log file, so the 7th log is the second of the first worker...

Download them here:

http://home.arcor.de/helmerj/omero_logs.tar.bz2

Cheers Juergen
User avatar
helmerj
 
Posts: 41
Joined: Mon Sep 20, 2010 9:39 am
Location: Berlin, Germany

Re: import speed

Postby cxallan » Thu Feb 10, 2011 5:42 pm

Hi Juergen,

Had a good look at your numbers. If you'd like to see the performance summaries you can do so with a little script under components/tools/OmeroPy/src/omero/util/importperf.py which will tell you exactly how long each step took to complete.

An example:

Code: Select all
$ ~/code/ome.git/components/tools/OmeroPy/src/omero/util/importperf.py 1583.stderr.log
Import(20100825 rbilly_A01_w1E3AF3878-C79B-4C1C-B5C7-E98DA18167AF.ome.tif) 0 start: 2011-02-05 20:55:39.28 end: 2011-02-05 20:55:49.14 elapsed: 9.867sec
setId() start: 2011-02-05 20:55:39.28 end: 2011-02-05 20:55:42.29 elapsed: 3.016sec
Post process start: 2011-02-05 20:55:42.42 end: 2011-02-05 20:55:42.49 elapsed: 0.0690000000031sec
Save to DB start: 2011-02-05 20:55:42.49 end: 2011-02-05 20:55:44.88 elapsed: 2.389sec
Image I/O start: 2011-02-05 20:55:44.97 end: 2011-02-05 20:55:45.78 elapsed: 0.810000000012sec
Overlays start: 2011-02-05 20:55:47.68 end: 2011-02-05 20:55:47.68 elapsed: 0.00100000000384sec
Thumbnailing start: 2011-02-05 20:55:47.68 end: 2011-02-05 20:55:49.14 elapsed: 1.46799999999sec
Import(20100825 rbilly_A01_w29CE10270-E29D-4FB6-A3AE-A0A539164A1D.ome.tif) 1 start: 2011-02-05 20:55:49.16 end: 2011-02-05 20:55:54.86 elapsed: 5.7sec
setId() start: 2011-02-05 20:55:49.16 end: 2011-02-05 20:55:50.93 elapsed: 1.769sec
Post process start: 2011-02-05 20:55:50.95 end: 2011-02-05 20:55:50.95 elapsed: 0.00100000000384sec
Save to DB start: 2011-02-05 20:55:50.95 end: 2011-02-05 20:55:51.74 elapsed: 0.792000000001sec
Image I/O start: 2011-02-05 20:55:51.79 end: 2011-02-05 20:55:52.30 elapsed: 0.513999999996sec
Overlays start: 2011-02-05 20:55:53.91 end: 2011-02-05 20:55:53.91 elapsed: 0.00099999998929sec
Thumbnailing start: 2011-02-05 20:55:53.91 end: 2011-02-05 20:55:54.86 elapsed: 0.952000000005sec
...


What's probably most telling is that you're spending a lot of time thumbnailing, and that really comes down to the amount of work that has to be done during thumbnail creation. Thumbnail creation, when required to be done on large numbers of images, is expecting to be done in a batch fashion. This batch thumbnail creation happens with all our imports of HCS data because none of them are ever detected as separate imports, where yours are.

I have a ticket in the next sprint (which will be completed inside the next 2 weeks) to provide command line importer functionality so that you can turn thumbnail creation during import off. This will mean that thumbnails will be created when you first view the image and not at import. The ticket:

http://trac.openmicroscopy.org.uk/ome/ticket/4339

Once we get that done it's going to start to become harder and harder to eek performance out of things with the strategy you're using and we might be best served helping you associate your OME-TIFF files so that the imports will be detected and happen in batches automatically.
cxallan
Site Admin
 
Posts: 509
Joined: Fri May 01, 2009 8:07 am

Re: import speed

Postby helmerj » Fri Feb 11, 2011 8:02 am

Hi,

it would be great if that would be possible. Due the shear number of files, any time we can save would have a huge impact. Right now we have moved our testing to a more powerful machine, which will be the working horse during our beta testing phase (8 core mac pro 8GB RAM). We can import 5000 images in batches of 100 images in 16 minutes, which would mean we can import 80.000 images in just over 4 hours. I will have to test if the upscaling truly is linear but that is the ball park. Speed will slightly improve once we move the postgresql server to an external machine and use a raid file system but I am afraid that those changes will have only minor impact on the overall speed of import.

In addition we did notice, that java classes are loaded for each and every image resulting in an overhang.

Code: Select all
800ms - ome.formats.importer.cli.ErrorHandler  - SCANNING: Depth:0
Num:    5 Tot:  100 File: F5B3CE43.ome.tif
700ms - loci.common.services.ServiceFactory  - Added interface
interface loci.formats.services.MDBService and implementation class
loci.formats.services.MDBServiceImpl


Not being too familiar with you code we were wondering what are those steps for and why do they take so long?

As a matter of fact the images are all the same. The only difference being the filename (on disk and meta-data), UUID and of course pixel-data. I woudl ove to import them as a set letting the importer know that they are HCS data. Question is how would I do that?

Is there a OMERO routine I could call after import that would generate missing thumbnails? I think it would be good to not do that during first view, being a rather disappointing user experience leaving the user waiting for the generation of 80.000 to 100.000 thumbnails...

Thanks for opening that ticket!!!

Cheers Juergen
User avatar
helmerj
 
Posts: 41
Joined: Mon Sep 20, 2010 9:39 am
Location: Berlin, Germany

Re: import speed

Postby ajpatterson » Tue Feb 22, 2011 12:12 pm

Hello Juergen,

We have a couple of examples of how to structure HCS data:

http://www.ome-xml.org/browser/Document ... p4w2ws.ome
http://www.ome-xml.org/browser/Document ... 6.ome.tiff
http://www.ome-xml.org/browser/Document ... p4w2ws.ome
http://www.ome-xml.org/browser/Document ... 6.ome.tiff

There are other general OME-XML samples in:
http://www.ome-xml.org/browser/Document ... s/2010-06/

These are very small files designed to show the structure of the metadata and be small enough to read in a text editor. They are 6 by 4 pixel images, that have a black and white pattern in the first 2 columns, followed by a grey scale pattern in the next 4 columns.

They have the entire tiny 4 well plate stored in a single tiff file. In you real examples you would be using multiple tiff files. From talking to people the best general approach would seam to be to store all the images associated with a single well in a single tiff file.

If you have any questions about the structure please get back to me.

Cheers,

Andrew
--
Andrew Patterson
Software Developer, Open Microscopy Environment
Wellcome Trust Centre for Gene Regulation and Expression
University of Dundee
ajpatterson
 
Posts: 50
Joined: Fri May 01, 2009 11:18 am

Re: import speed

Postby mlinkert » Wed Feb 23, 2011 9:10 pm

Hi Juergen,

In addition we did notice, that java classes are loaded for each and every image resulting in an overhang.

Code: Select all
    800ms - ome.formats.importer.cli.ErrorHandler  - SCANNING: Depth:0
    Num:    5 Tot:  100 File: F5B3CE43.ome.tif
    700ms - loci.common.services.ServiceFactory  - Added interface
    interface loci.formats.services.MDBService and implementation class
    loci.formats.services.MDBServiceImpl


Not being too familiar with you code we were wondering what are those steps for and why do they take so long?


Thanks for catching this. I've just put a fix in Bio-Formats so that the loci.formats.services.* classes are only loaded once per JVM instance. See here:

http://git.openmicroscopy.org/?p=biofor ... f09d41009f

As noted in the commit message, this gives a sizable performance improvement in use cases like yours, where many thousands of files are being imported in a single JVM instance. This fix will eventually make it into the importer (no later than the next stable release), at which point your imports should be several minutes faster.

Regards,
-Melissa
User avatar
mlinkert
Team Member
 
Posts: 353
Joined: Fri May 29, 2009 2:12 pm
Location: Southwest Wisconsin

Previous

Return to User Discussion

Who is online

Users browsing this forum: Google [Bot] and 1 guest

cron