Page 3 of 3

Re: import speed

PostPosted: Fri Feb 04, 2011 1:33 pm
by helmerj
cxallan wrote:Hi Juergen,

That's sort of telling then. I'd like to make sure we're not barking up completely the wrong tree here. So, it would be good to have a look at the importer logs. PostgreSQL CPU usage is sort of expected, the INSERT volume is reasonable and a lot of that is going to be I/O wait rather than actual processing overhead. Looking at the server logs for the INSERTs they're consistently, in this rather high load test < 100ms per image. That's less than 2 of your 18+ minutes for the entire 1402 images.

Adding --debug to importer-cli or --debug=ALL to bin/omero import and then piping that to a file for each worker would be perfect.

-Chris


Hi Chris,

I will try that. Unfortunately as of now I am not able to get the stdout of each worker into a file. Whatever I do it does end up in the console in which I did start the workers. But there must be a way. So let me try a bit more.

Cheers Juergen

Re: import speed

PostPosted: Fri Feb 04, 2011 4:42 pm
by cxallan
That's likely because most, if not all, of the logging will be going to STDERR. If you're executing with a subshell, something like the following should work:

Code: Select all
./importer-cli ... file &> out.log


If you're using popen(), popen3() or similar you'll need to open a file handle for both STDOUT and STDERR redirection.

Re: import speed

PostPosted: Fri Feb 04, 2011 5:37 pm
by helmerj
Hi

I tried that already...

1> file.log
2> file.log
1&2> file.log
&> file.log

all dont work

I also tried to get the output from the worker console but all it does is giving me the command it calls but not the out put of that...

Cheers juergen

Re: import speed

PostPosted: Fri Feb 04, 2011 8:30 pm
by jmoore
Hi Juergen,

If you look at "bin/omero import --help", you'll see two parameters:
Code: Select all
  ---file [FILE]      File for storing the standard out of the Java process
  ---errs [ERRS]      File for storing the standard err of the Java process

which DropBox uses to capture this output.
Cheers,
~Josh.

Re: import speed

PostPosted: Sat Feb 05, 2011 8:26 pm
by helmerj
Hi!

So I made it with the help of my colleague. These are the log files, each import job done by the 6 workers started its own log file, so the 7th log is the second of the first worker...

Download them here:

http://home.arcor.de/helmerj/omero_logs.tar.bz2

Cheers Juergen

Re: import speed

PostPosted: Thu Feb 10, 2011 5:42 pm
by cxallan
Hi Juergen,

Had a good look at your numbers. If you'd like to see the performance summaries you can do so with a little script under components/tools/OmeroPy/src/omero/util/importperf.py which will tell you exactly how long each step took to complete.

An example:

Code: Select all
$ ~/code/ome.git/components/tools/OmeroPy/src/omero/util/importperf.py 1583.stderr.log
Import(20100825 rbilly_A01_w1E3AF3878-C79B-4C1C-B5C7-E98DA18167AF.ome.tif) 0 start: 2011-02-05 20:55:39.28 end: 2011-02-05 20:55:49.14 elapsed: 9.867sec
setId() start: 2011-02-05 20:55:39.28 end: 2011-02-05 20:55:42.29 elapsed: 3.016sec
Post process start: 2011-02-05 20:55:42.42 end: 2011-02-05 20:55:42.49 elapsed: 0.0690000000031sec
Save to DB start: 2011-02-05 20:55:42.49 end: 2011-02-05 20:55:44.88 elapsed: 2.389sec
Image I/O start: 2011-02-05 20:55:44.97 end: 2011-02-05 20:55:45.78 elapsed: 0.810000000012sec
Overlays start: 2011-02-05 20:55:47.68 end: 2011-02-05 20:55:47.68 elapsed: 0.00100000000384sec
Thumbnailing start: 2011-02-05 20:55:47.68 end: 2011-02-05 20:55:49.14 elapsed: 1.46799999999sec
Import(20100825 rbilly_A01_w29CE10270-E29D-4FB6-A3AE-A0A539164A1D.ome.tif) 1 start: 2011-02-05 20:55:49.16 end: 2011-02-05 20:55:54.86 elapsed: 5.7sec
setId() start: 2011-02-05 20:55:49.16 end: 2011-02-05 20:55:50.93 elapsed: 1.769sec
Post process start: 2011-02-05 20:55:50.95 end: 2011-02-05 20:55:50.95 elapsed: 0.00100000000384sec
Save to DB start: 2011-02-05 20:55:50.95 end: 2011-02-05 20:55:51.74 elapsed: 0.792000000001sec
Image I/O start: 2011-02-05 20:55:51.79 end: 2011-02-05 20:55:52.30 elapsed: 0.513999999996sec
Overlays start: 2011-02-05 20:55:53.91 end: 2011-02-05 20:55:53.91 elapsed: 0.00099999998929sec
Thumbnailing start: 2011-02-05 20:55:53.91 end: 2011-02-05 20:55:54.86 elapsed: 0.952000000005sec
...


What's probably most telling is that you're spending a lot of time thumbnailing, and that really comes down to the amount of work that has to be done during thumbnail creation. Thumbnail creation, when required to be done on large numbers of images, is expecting to be done in a batch fashion. This batch thumbnail creation happens with all our imports of HCS data because none of them are ever detected as separate imports, where yours are.

I have a ticket in the next sprint (which will be completed inside the next 2 weeks) to provide command line importer functionality so that you can turn thumbnail creation during import off. This will mean that thumbnails will be created when you first view the image and not at import. The ticket:

http://trac.openmicroscopy.org.uk/ome/ticket/4339

Once we get that done it's going to start to become harder and harder to eek performance out of things with the strategy you're using and we might be best served helping you associate your OME-TIFF files so that the imports will be detected and happen in batches automatically.

Re: import speed

PostPosted: Fri Feb 11, 2011 8:02 am
by helmerj
Hi,

it would be great if that would be possible. Due the shear number of files, any time we can save would have a huge impact. Right now we have moved our testing to a more powerful machine, which will be the working horse during our beta testing phase (8 core mac pro 8GB RAM). We can import 5000 images in batches of 100 images in 16 minutes, which would mean we can import 80.000 images in just over 4 hours. I will have to test if the upscaling truly is linear but that is the ball park. Speed will slightly improve once we move the postgresql server to an external machine and use a raid file system but I am afraid that those changes will have only minor impact on the overall speed of import.

In addition we did notice, that java classes are loaded for each and every image resulting in an overhang.

Code: Select all
800ms - ome.formats.importer.cli.ErrorHandler  - SCANNING: Depth:0
Num:    5 Tot:  100 File: F5B3CE43.ome.tif
700ms - loci.common.services.ServiceFactory  - Added interface
interface loci.formats.services.MDBService and implementation class
loci.formats.services.MDBServiceImpl


Not being too familiar with you code we were wondering what are those steps for and why do they take so long?

As a matter of fact the images are all the same. The only difference being the filename (on disk and meta-data), UUID and of course pixel-data. I woudl ove to import them as a set letting the importer know that they are HCS data. Question is how would I do that?

Is there a OMERO routine I could call after import that would generate missing thumbnails? I think it would be good to not do that during first view, being a rather disappointing user experience leaving the user waiting for the generation of 80.000 to 100.000 thumbnails...

Thanks for opening that ticket!!!

Cheers Juergen

Re: import speed

PostPosted: Tue Feb 22, 2011 12:12 pm
by ajpatterson
Hello Juergen,

We have a couple of examples of how to structure HCS data:

http://www.ome-xml.org/browser/Document ... p4w2ws.ome
http://www.ome-xml.org/browser/Document ... 6.ome.tiff
http://www.ome-xml.org/browser/Document ... p4w2ws.ome
http://www.ome-xml.org/browser/Document ... 6.ome.tiff

There are other general OME-XML samples in:
http://www.ome-xml.org/browser/Document ... s/2010-06/

These are very small files designed to show the structure of the metadata and be small enough to read in a text editor. They are 6 by 4 pixel images, that have a black and white pattern in the first 2 columns, followed by a grey scale pattern in the next 4 columns.

They have the entire tiny 4 well plate stored in a single tiff file. In you real examples you would be using multiple tiff files. From talking to people the best general approach would seam to be to store all the images associated with a single well in a single tiff file.

If you have any questions about the structure please get back to me.

Cheers,

Andrew

Re: import speed

PostPosted: Wed Feb 23, 2011 9:10 pm
by mlinkert
Hi Juergen,

In addition we did notice, that java classes are loaded for each and every image resulting in an overhang.

Code: Select all
    800ms - ome.formats.importer.cli.ErrorHandler  - SCANNING: Depth:0
    Num:    5 Tot:  100 File: F5B3CE43.ome.tif
    700ms - loci.common.services.ServiceFactory  - Added interface
    interface loci.formats.services.MDBService and implementation class
    loci.formats.services.MDBServiceImpl


Not being too familiar with you code we were wondering what are those steps for and why do they take so long?


Thanks for catching this. I've just put a fix in Bio-Formats so that the loci.formats.services.* classes are only loaded once per JVM instance. See here:

http://git.openmicroscopy.org/?p=biofor ... f09d41009f

As noted in the commit message, this gives a sizable performance improvement in use cases like yours, where many thousands of files are being imported in a single JVM instance. This fix will eventually make it into the importer (no later than the next stable release), at which point your imports should be several minutes faster.

Regards,
-Melissa