
OMERO and Google Cloud

General user discussion about using the OMERO platform to its fullest. Please ask new questions at https://forum.image.sc/tags/omero



Post by sherey » Thu Feb 23, 2017 1:45 pm

Hi all,

I am brand new to this group, thanks for the add :)

I am part of a US-based / NCI-funded "cancer genomics cloud" pilot project, and we have recently added a large set of images (radiology images in DICOM format and pathology images in SVS format) to our data repository in Google Cloud Storage (GCS). We are looking for ways that we could collaborate with the OME community to develop and encourage usage of this cloud-based resource. All of the images are in an open-access bucket. You can find more information in our documentation: http://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/TCGA-images.html

I saw a few other posts by one or two groups using AWS -- I'd be interested in hearing more about your experience, or any other thoughts or recommendations you might have about using OMERO with data in GCS.

thanks!

Sheila

Re: OMERO and Google Cloud

Post by jmoore » Fri Feb 24, 2017 2:46 pm

sherey wrote: I am brand new to this group, thanks for the add :)


The more, the merrier!

sherey wrote: I am part of a US-based / NCI-funded "cancer genomics cloud" pilot project, and we have recently added a large set of images (radiology images in DICOM format and pathology images in SVS format) to our data repository in Google Cloud Storage (GCS).


Very nice! Do you have an estimate of the size of the collections? I can see the width/height metadata for the Aperio files, but nothing similar for DICOM.

sherey wrote: We are looking for ways that we could collaborate with the OME community to develop and encourage usage of this cloud-based resource. All of the images are in an open-access bucket. You can find more information in our documentation: http://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/TCGA-images.html


Options that I can think of which may be more or less useful to different members of our combined communities:

  • script access: having a simple wrapper (Python, etc.) which takes a bucket name, downloads the image locally, and provides a minimal toolbox might be a quick way to allow someone to run an analysis. A base Docker image like https://hub.docker.com/r/fiji/fiji/ or https://hub.docker.com/r/openmicroscopy ... ts-octave/ may simplify this. This would prevent tools from needing to be rewritten for working with buckets. (Some investigation of that has taken place, e.g. https://github.com/dpwrussell/bfs3, but almost all tools still assume local access.)
  • The next more advanced step would be a BucketReader: this would allow tools to work with your data without needing to download it. It would also allow spinning up an OMERO in the cloud (though this could be done as well by downloading the buckets at some cost).
  • downloadable OMERO index: Once a single OMERO exists where your images have been imported, then that database could be used as a "visual index". A download of such an index would include thumbnails that users could use to preview the images and links back to the original data. This is part of the strategy we're employing with the Image Data Repository (IDR): http://biorxiv.org/content/early/2016/11/24/089359
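As a rough illustration of the "script access" option above, here is a minimal Python sketch of such a wrapper. It assumes the objects are publicly readable and reachable at `https://storage.googleapis.com/<bucket>/<object>`; the function names and the bucket/object names in the usage note are made up for the example and are not part of any existing OMERO or ISB-CGC tooling.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def public_gcs_url(bucket: str, object_name: str) -> str:
    """Build the HTTPS URL for an object in a publicly readable GCS bucket."""
    # Percent-encode the object name but keep "/" so nested paths stay intact.
    return "https://storage.googleapis.com/{}/{}".format(
        bucket, urllib.parse.quote(object_name, safe="/")
    )


def download_object(bucket: str, object_name: str, dest_dir: str = ".") -> Path:
    """Download one object into dest_dir and return the local path."""
    dest = Path(dest_dir) / Path(object_name).name
    dest.parent.mkdir(parents=True, exist_ok=True)
    url = public_gcs_url(bucket, object_name)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        # Stream to disk in chunks, since a single SVS file can be several GB.
        shutil.copyfileobj(resp, out)
    return dest
```

Usage would look something like `local = download_object("example-bucket", "path/to/slide.svs", "/tmp/images")` (hypothetical names), after which existing tools can open `local` as an ordinary file.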

I saw a few other posts by one or two groups using AWS -- I'd be interested in hearing more about your experience, or any other thoughts or recommendations you might have about using OMERO with data in GCS.


I'll let others from the community comment here, but there are a couple of proposals open at the moment for working on "Horizontal Scaling" for OMERO that you might want to take a look at. See https://github.com/openmicroscopy/design/issues?utf8=%E2%9C%93&q=is%3Aissue%20%22horizontal%20scaling%22

All the best,
~Josh

Re: OMERO and Google Cloud

Post by sherey » Fri Feb 24, 2017 6:18 pm

Thanks Josh,

Regarding dataset sizes -- the DICOM dataset consists of ~20,000 series with a total of close to 1.5M images combined, but they seem pretty small / low-res to me as the entire dataset is only about 0.5 TB. The largest single .dcm file is about 17 MB, median size is ~0.5 MB. They are the TCGA images from the TCIA (cancerimagingarchive.net) if that means something to you. The SVS images on the other hand are much larger -- over 30k images, totaling about 17 TB (largest image ~5 GB, median ~250 MB).
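For a rough sense of scale, the mean file sizes implied by those totals can be worked out as below. This is only back-of-the-envelope arithmetic on the approximate figures above, assuming decimal units (1 TB = 1,000,000 MB) and treating "over 30k" as exactly 30,000.

```python
# All inputs are the approximate figures quoted in the post.
MB_PER_TB = 1_000_000  # assumption: decimal units


def mean_size_mb(total_tb: float, n_files: int) -> float:
    """Mean file size in MB for a collection of n_files totalling total_tb."""
    return total_tb * MB_PER_TB / n_files


dicom_mean = mean_size_mb(0.5, 1_500_000)  # roughly 0.33 MB per .dcm file
svs_mean = mean_size_mb(17, 30_000)        # roughly 567 MB per SVS file

print(f"DICOM mean: {dicom_mean:.2f} MB, SVS mean: {svs_mean:.0f} MB")
```

So the SVS files are on average more than a thousand times larger than the DICOM files, which matters for any approach that downloads images before processing them.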

I will look into / think about your other suggestions. Are you aware of "gcsfuse" which lets you mount a bucket and make it look like a file system (on a Linux or Mac OS X machine)? Latency can be an issue with that, though, and it might be simpler to have a wrapper that downloads files since that's basically what gcsfuse is doing under the hood.
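One way the download-wrapper idea could look is a small fetch-on-first-access cache, sketched below in Python. This is an illustration only, not a description of how gcsfuse is implemented: `LocalCache` and the `fetch` callable are hypothetical names, and the actual transfer from the bucket is left to whatever `fetch` function is supplied.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


class LocalCache:
    """Keep local copies of remote objects, fetching each one at most once."""

    def __init__(self, cache_dir: str, fetch: Callable[[str, Path], None]):
        self.cache_dir = Path(cache_dir)
        # fetch(object_name, dest) must write the object's bytes to dest.
        self.fetch = fetch

    def get(self, object_name: str) -> Path:
        """Return a local path for object_name, downloading it if needed."""
        dest = self.cache_dir / object_name
        if not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Fetch into a temporary file first so that an interrupted
            # download never leaves a partial file under the final name.
            fd, tmp = tempfile.mkstemp(dir=str(dest.parent))
            os.close(fd)
            try:
                self.fetch(object_name, Path(tmp))
                os.replace(tmp, dest)
            except BaseException:
                Path(tmp).unlink(missing_ok=True)
                raise
        return dest
```

The first `cache.get("slides/a.svs")` pays the full download latency; later calls return the local file immediately, so analysis tools that assume local access keep working unchanged.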

cheers,

Sheila

Re: OMERO and Google Cloud

Post by jmoore » Mon Feb 27, 2017 11:44 am

Hi Sheila,

sherey wrote: Regarding dataset sizes -- the DICOM dataset consists of ~20,000 series with a total of close to 1.5M images combined, but they seem pretty small / low-res to me as the entire dataset is only about 0.5 TB. The largest single .dcm file is about 17 MB, median size is ~0.5 MB. They are the TCGA images from the TCIA (cancerimagingarchive.net) if that means something to you. The SVS images on the other hand are much larger -- over 30k images, totaling about 17 TB (largest image ~5 GB, median ~250 MB).


Thanks for the info.

sherey wrote: I will look into / think about your other suggestions.


Great. Keep us posted.

sherey wrote: Are you aware of "gcsfuse" which lets you mount a bucket and make it look like a file system (on a Linux or Mac OS X machine)? Latency can be an issue with that, though, and it might be simpler to have a wrapper that downloads files since that's basically what gcsfuse is doing under the hood.


I've never used gcsfuse specifically, but the fuse pattern would certainly enable a number of workflows.

Cheers,
~Josh

