
Olympus .oir subfiles

Posted: Tue Sep 05, 2017 8:13 am
by andalman
I'm using OME's recently added support for .oir files. Thank you so much for adding it.

When a .oir file grows larger than 1 GB, Olympus automatically begins splitting it into sub-files; e.g. 'datafile.oir' contains time points 0 through 399, 'datafile_00001' contains time points 400 through 799, and so on.
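If every sub-file held the same number of time points (400 in the example above), a global time index could be mapped to a file and a local index with simple integer division. The sketch below assumes that fixed count and the `<stem>_NNNNN` naming described in this thread; the helper names `subfile_name` and `locate_timepoint` are hypothetical. In practice the per-file count should come from each file's own metadata, since the final sub-file may well be shorter.

```python
from typing import Tuple


def subfile_name(base: str, part: int) -> str:
    """Return the file name of part `part` of a split .oir acquisition.

    Assumes part 0 is the primary file (e.g. 'datafile.oir') and later parts
    are '<stem>_00001', '<stem>_00002', ... with no extension (hypothetical
    helper based on the naming described in this thread).
    """
    if part == 0:
        return base
    stem = base[:-4] if base.lower().endswith(".oir") else base
    return f"{stem}_{part:05d}"


def locate_timepoint(t: int, frames_per_file: int) -> Tuple[int, int]:
    """Map a global time index to (part number, local time index).

    Assumes a constant number of time points per file.
    """
    if t < 0 or frames_per_file <= 0:
        raise ValueError("t must be >= 0 and frames_per_file must be > 0")
    part, local_t = divmod(t, frames_per_file)
    return part, local_t


if __name__ == "__main__":
    part, local_t = locate_timepoint(450, frames_per_file=400)
    print(subfile_name("datafile.oir", part), local_t)  # datafile_00001 50
```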

It would be great if Bio-Formats automatically detected these sub-files and treated them as one large dataset. In the meantime, I'm trying to work around this by opening each sub-file individually.
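As a first step toward treating the parts as one dataset, the files belonging to one acquisition can be collected and ordered by their numeric suffix. A minimal sketch, assuming the sub-files sit next to the primary file and are named `<stem>_NNNNN` (five digits, no extension) as described in this thread; `find_oir_parts` is a hypothetical helper, and sub-files that have already been renamed with an added `.oir` are deliberately not matched.

```python
import re
from pathlib import Path
from typing import List


def find_oir_parts(primary: str) -> List[Path]:
    """Return the primary .oir file followed by its numbered sub-files.

    Assumes sub-files are in the same directory as the primary file and are
    named '<stem>_NNNNN' with no extension. Parts are ordered by their
    numeric suffix rather than by directory listing order.
    """
    primary_path = Path(primary)
    stem = primary_path.stem  # 'datafile' for 'datafile.oir'
    pattern = re.compile(re.escape(stem) + r"_(\d{5})$")
    numbered = []
    for candidate in primary_path.parent.iterdir():
        match = pattern.match(candidate.name)
        if match and candidate.is_file():
            numbered.append((int(match.group(1)), candidate))
    numbered.sort()
    return [primary_path] + [path for _, path in numbered]
```

Each returned path can then be opened individually, and the time offset of part `k` is the sum of the time-point counts of the parts before it.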

The problem is that I'm having trouble reading the sub-files. I'm working in Python and would ideally like to use the very nice BioformatsReader from the PIMS package. The reader has no problem reading primary .oir data files, but when it opens a sub-file I always get an exception. The exception occurs when setId is called on the ChannelSeparator reader, and for some reason it causes an out-of-memory error no matter how large the heap.

If I use the python-bioformats package instead of PIMS, I can read both primary data files and sub-files. In that case, however, the returned data is floating point (instead of int16), and I'm not sure how to scale it appropriately.

Any thoughts on:
1) why the reader PIMS uses is crashing, or
2) why python-bioformats returns floating-point data, and how to determine the appropriate way to rescale it?

As a separate aside, both python-bioformats and the Bio-Formats command line tools often throw the following exception when working with .oir files:
[Fatal Error] :1:35: Character reference "&#0" is an invalid XML character.
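For background: XML 1.0 does not allow U+0000 anywhere in a document, not even as a numeric character reference, so a conforming parser rejects `&#0;`. The Python sketch below reproduces the error with the standard-library parser and shows one generic workaround, stripping references to code points outside the XML 1.0 `Char` production before parsing. It is an illustration only, not what Bio-Formats does internally, and silently dropping characters could hide genuine corruption in the metadata.

```python
import re
import xml.etree.ElementTree as ET

# Decimal (&#0;) or hexadecimal (&#x0;) numeric character references.
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _is_xml10_char(cp: int) -> bool:
    """True if code point `cp` is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_char_refs(text: str) -> str:
    """Remove numeric character references that XML 1.0 forbids."""
    def replace(match: re.Match) -> str:
        ref = match.group(1)
        cp = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        # Keep valid references untouched; drop invalid ones entirely.
        return match.group(0) if _is_xml10_char(cp) else ""

    return _CHAR_REF.sub(replace, text)


if __name__ == "__main__":
    broken = "<meta><name>scan&#0;</name></meta>"
    try:
        ET.fromstring(broken)
    except ET.ParseError as err:
        print("parser rejects it:", err)
    print(ET.fromstring(strip_invalid_char_refs(broken)).find("name").text)  # scan
```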

Thanks,
Aaron

Re: Olympus .oir subfiles

Posted: Tue Sep 05, 2017 2:40 pm
by sbesson
Hi Aaron,

andalman wrote:
When a .oir file grows larger than 1 GB, Olympus automatically begins splitting it into sub-files; e.g. 'datafile.oir' contains time points 0 through 399, 'datafile_00001' contains time points 400 through 799, and so on.

It would be great if Bio-Formats automatically detected these sub-files and treated them as one large dataset. In the meantime, I'm trying to work around this by opening each sub-file individually.


Interestingly, we have no example of such split filesets in our data repository, including the Olympus OIR samples we used to add support for the format a few months ago.

Do you have any representative files below 2GB that you would be able to share with us by uploading them at http://qa.openmicroscopy.org.uk/qa/upload/?

andalman wrote:
The problem is that I'm having trouble reading the sub-files. I'm working in Python and would ideally like to use the very nice BioformatsReader from the PIMS package. The reader has no problem reading primary .oir data files, but when it opens a sub-file I always get an exception. The exception occurs when setId is called on the ChannelSeparator reader, and for some reason it causes an out-of-memory error no matter how large the heap.


For cross-reference purposes, the corresponding PIMS issue was raised in https://github.com/soft-matter/pims/issues/274.

andalman wrote:
Any thoughts on:
1) why the reader PIMS uses is crashing, or


Assuming the issue happens independently of the memory passed to the JVM as suggested above, can you reproduce the OOM using one of your sample files and the Bio-Formats command line tools?

andalman wrote:
2) why python-bioformats returns floating-point data, and how to determine the appropriate way to rescale it?


Which version of `python-bioformats` have you been using to read the data, and do you have an example script showing the set of commands to reproduce the issue? Without knowing much about it yet, this might be related to file format detection, and it may be worth liaising with the CellProfiler team.

Best,
Sebastien

Re: Olympus .oir subfiles

Posted: Tue Sep 05, 2017 6:15 pm
by andalman
Thank you for the reply.

Per your request, I've now uploaded example files using the QA uploader. I've uploaded two files:
f10010.oir and its first sub-file, f10010_00001 (sub-files are named without a file extension, but they are .oir files). Both files are the same size (~1 GB). They were generated by Olympus's 2P Apollo scope.
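Because the sub-files carry no extension, the workaround used in this thread is to add `.oir` to the name so that the OIR reader is selected. A minimal sketch that avoids renaming or copying the ~1 GB originals by creating a symbolic link with the extension instead; the helper name is hypothetical, symlinks need a filesystem that supports them (on Windows this may require extra privileges), and the thread does not confirm how every reader behaves when opened through a link.

```python
import os
from pathlib import Path
from typing import Optional


def link_with_oir_extension(subfile: str, link_dir: Optional[str] = None) -> Path:
    """Create '<name>.oir' as a symlink to an extension-less .oir sub-file.

    The original file is left untouched. If `link_dir` is given, the link is
    created there instead of next to the original (useful for read-only
    data directories). An existing file or link of that name is returned
    unchanged rather than overwritten.
    """
    target = Path(subfile).resolve()
    directory = Path(link_dir) if link_dir else target.parent
    link = directory / (target.name + ".oir")
    if not (link.exists() or link.is_symlink()):
        os.symlink(target, link)
    return link
```

The returned path can then be passed to whichever reader is in use in place of the extension-less original.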

I can reproduce the PIMS error with the Bio-Formats command line tools. First I add the .oir extension to the sub-file, then I run the latest version of bfconvert:
./bfconvert f10010_00001_ext_added.oir f10010_00001.tif

OIRReader initializing /data2/Data/MPzfish/drn_hb/f10010/f10010_00001_extadded.oir
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at loci.common.RandomAccessInputStream.readString(RandomAccessInputStream.java:559)
at loci.formats.in.OIRReader.readXMLBlock(OIRReader.java:538)
at loci.formats.in.OIRReader.initFile(OIRReader.java:250)
at loci.formats.FormatReader.setId(FormatReader.java:1397)
at loci.formats.ImageReader.setId(ImageReader.java:839)
at loci.formats.tools.ImageConverter.testConvert(ImageConverter.java:385)
at loci.formats.tools.ImageConverter.main(ImageConverter.java:884)
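The trace shows the failure inside `OIRReader.readXMLBlock` while reading a string. One generic way such an error can arise is a reader taking a length field from the file and allocating a buffer of that size: a length read from the wrong offset can request more memory than any heap provides, which would fit the report that a larger heap does not help. That is a hypothesis, not a diagnosis from this thread. The Python sketch below illustrates the defensive pattern of validating a length prefix against the bytes actually remaining before allocating; the 4-byte little-endian length header is an assumed layout for illustration, not the documented OIR block structure.

```python
import struct
from typing import BinaryIO


class CorruptBlockError(ValueError):
    """Raised when a length-prefixed block cannot fit in the remaining data."""


def read_length_prefixed(stream: BinaryIO, total_size: int) -> bytes:
    """Read one block laid out as <uint32 little-endian length><payload>.

    The length is checked against the bytes left in the stream before any
    buffer is allocated, so a garbage length fails fast instead of
    exhausting memory, and a truncated header fails instead of reading past
    the end of the data.
    """
    start = stream.tell()
    header = stream.read(4)
    if len(header) < 4:
        raise CorruptBlockError(f"truncated length header at offset {start}")
    (length,) = struct.unpack("<I", header)
    remaining = total_size - stream.tell()
    if length > remaining:
        raise CorruptBlockError(
            f"block at offset {start} claims {length} bytes, only {remaining} remain"
        )
    return stream.read(length)
```

The truncated-header check corresponds to the kind of situation that surfaces as an `EOFException` in a Java reader.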


Note that bfconvert also crashes on every primary .oir file I've tried, but with a different exception:
./bfconvert f10010.oir f10010.tif

Exception in thread "main" java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 35; Character reference "&#0" is an invalid XML character.


I can read primary files (f10010.oir) with python-bioformats, Fiji, and PIMS. If I load the first image (t=0, z=0) of the example primary file with Fiji or PIMS, the mean value is 85 (min/max: 0, 1023). However, if I load the same image with python-bioformats using the following Python snippet, the mean value is only 0.0012992 (min/max: 0.0, 0.01561):
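import javabridge
import bioformats

javabridge.start_vm(class_path=bioformats.JARS)
try:
    fn = 'f10010.oir'
    imgr = bioformats.ImageReader(fn)
    # Read the first plane (channel 0, z 0, t 0) with the default arguments.
    img = imgr.read(c=0, z=0, t=0)
    print(img.mean(), img.min(), img.max())
    imgr.close()
finally:
    javabridge.kill_vm()  # shut down the JVM started above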

(Perhaps I can use some metadata to determine the appropriate rescaling?)
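The reported numbers are consistent with python-bioformats' default `rescale=True` in `ImageReader.read()`, which scales integer pixels into the 0 to 1 range by dividing by the maximum of the pixel type (65535 for unsigned 16-bit data): 1023 / 65535 ≈ 0.01561 and 85 / 65535 ≈ 0.0013. Calling `imgr.read(c=0, z=0, t=0, rescale=False)` should return the native integer values directly. For arrays already read with the default, the NumPy sketch below undoes the scaling; it assumes unsigned 16-bit source data, and `undo_rescale` is a hypothetical helper.

```python
import numpy as np

# Divisor assumed for unsigned 16-bit pixels when rescale=True
# (check the pixel type reported for your own files).
UINT16_MAX = int(np.iinfo(np.uint16).max)  # 65535


def undo_rescale(plane: np.ndarray, scale: int = UINT16_MAX) -> np.ndarray:
    """Convert a plane rescaled to [0, 1] back to integer uint16 counts."""
    counts = np.rint(np.asarray(plane, dtype=np.float64) * scale)
    return counts.astype(np.uint16)


if __name__ == "__main__":
    rescaled = np.array([0.0, 85 / 65535, 1023 / 65535], dtype=np.float32)
    print(undo_rescale(rescaled))
```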

With sub-files, I can only read them successfully using python-bioformats. PIMS throws the exception described in my previous post, and Fiji throws a different exception related to indexing. I can use the same snippet as above to read f10010_00001 once I add the .oir extension.

Thanks for your help.

Re: Olympus .oir subfiles

Posted: Tue Sep 05, 2017 6:19 pm
by andalman
Forgot to mention: I'm using the newest version of python-bioformats. In response to an issue I recently posted on GitHub, python-bioformats now uses Bio-Formats 5.5 instead of 5.1.

-Aaron

Re: Olympus .oir subfiles

Posted: Tue Sep 05, 2017 7:34 pm
by andalman
One correction regarding reading .oir sub-files:

In my last post I said I could read .oir sub-files with python-bioformats but not with PIMS or Fiji. I need to correct this.

It turns out that sub-file behavior is not consistent: I originally tested Fiji and python-bioformats with different example files, f10010_00001 and f10187_00001 respectively.

After further testing, python-bioformats and Fiji (ImageJ) seem to give consistent results with each other, but the outcome depends on the specific file:

With f10187_00001 (.oir extension added), both python-bioformats and Fiji read the file successfully. The same is true for the second sub-file of this dataset, f10187_00002.

With f10010_00001 (.oir extension added), both give the following exception:
Exception in thread "Thread-0" java.lang.IllegalArgumentException: 0 must not be null and positive.
at ome.xml.model.primitives.PositiveInteger.<init>(PositiveInteger.java:48)
at loci.formats.MetadataTools.populatePixelsOnly(MetadataTools.java:291)
at loci.formats.MetadataTools.populateMetadata(MetadataTools.java:251)
at loci.formats.MetadataTools.populatePixels(MetadataTools.java:151)
at loci.formats.MetadataTools.populatePixels(MetadataTools.java:97)
at loci.formats.in.OIRReader.initFile(OIRReader.java:399)
at loci.formats.FormatReader.setId(FormatReader.java:1397)


And with f10188_00001, python-bioformats gives the following exception (I haven't tested Fiji):
Exception in thread "Thread-0" java.io.EOFException: Attempting to read beyond end of file.
at loci.common.NIOFileHandle.readInt(NIOFileHandle.java:378)
at loci.common.RandomAccessInputStream.readInt(RandomAccessInputStream.java:507)
at loci.formats.in.OIRReader.readXMLBlock(OIRReader.java:505)
at loci.formats.in.OIRReader.initFile(OIRReader.java:250)
at loci.formats.FormatReader.setId(FormatReader.java:1397)
Caused by: java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:498)
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:355)
at loci.common.NIOFileHandle.readInt(NIOFileHandle.java:376)
... 4 more


It appears some kinks in reading .oir files (and sub-files) still need to be worked out, and I would very much appreciate your assistance in getting this fixed. I'm in the process of uploading the other example files I mentioned. All of the files were produced by the same Olympus microscope and software.

Re: Olympus .oir subfiles

Posted: Thu Sep 07, 2017 3:49 pm
by sbesson
See viewtopic.php?f=13&t=8362&p=18504#p18502 for a continuation of this topic