
MessageSize and server OOM (possibly) related problems

PostPosted: Tue Nov 05, 2013 2:06 pm
by dpwrussell
Public Continuation of discussions and OMERO QAs 7651 and 7659

Basically, there is a client-side Ice::MemoryLimitException (see 7651). After this client error, the server continues to operate.

There is also a server-side java.lang.OutOfMemoryError: Java heap space, which is likely related. I've uploaded two server OOM logs in a tar.bz2 file here: https://www.openmicroscopy.org/qa2/qa/feedback/7693/

Existing relevant settings:

<property name="Ice.MessageSizeMax" value="131072"/>
<option>-Xmx2048M</option>
<option>-XX:MaxPermSize=1024M</option>
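
For reference, Ice.MessageSizeMax is specified in kilobytes, so the settings above translate roughly as follows (a quick sketch of the arithmetic only, not OMERO-specific advice):
Code:
# Current per-message limit: 131072 KB
echo $((131072 * 1024))   # 134217728 bytes, i.e. 128 MB per Ice message
# The Blitz heap (-Xmx2048M) is 2 GB in total, shared by all server activity.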

Re: MessageSize and server OOM (possibly) related problems

PostPosted: Tue Nov 05, 2013 3:42 pm
by jmoore
Here are the OOMs in the provided log files:
Code:
OOM.log:2013-10-08 07:39:34,525
OOM2.log:2013-10-14 17:34:59,807
OOM2.log:2013-10-14 17:35:08,463
OOM2.log:2013-10-14 17:42:03,802
OOM2.log:2013-10-14 17:45:36,377


Ignoring the one from the 8th, I noticed that this starts during a delete operation (perhaps by another user). Helio, can you try to describe specifically the steps that led to the crash? Also, at what times did the other failures occur?

Cheers,
~Josh

Re: MessageSize and server OOM (possibly) related problems

PostPosted: Tue Nov 05, 2013 4:25 pm
by hroque
The deletion operation was made by me at the same time. The deletion concluded successfully while the import failed.
I've tried these imports at different times of day, but mostly in the evening, leaving them running overnight (I have also done it during the day and it failed there too). I've tried this with both the standalone Importer and the Insight importer.
The import spends quite a few hours analyzing the data before trying to import anything.
Sorry for not being more precise, but I hope this helps.
Helio

Re: MessageSize and server OOM (possibly) related problems

PostPosted: Wed Nov 06, 2013 11:49 am
by jmoore
Looking through the related QA feedback items, I can only assume that for now, it actually is going to require more memory to get this dataset in. If you can provide a heap dump of the OOM, then we can investigate how to prevent this in the future. You can activate heap dumps via:
Code:
bin/omero admin deploy heap-dump

or
Code:
bin/omero admin deploy heap-dump-tmp


See https://github.com/openmicroscopy/openmicroscopy/blob/v.4.4.9/etc/grid/templates.xml#L184

This will restart the server. On the next restart or the next call to:
Code:
bin/omero admin deploy

(with no options), heap dumps will be disabled again to save disk space.
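
For reference, judging from the templates.xml linked above, the heap-dump target essentially starts the Blitz JVM with the standard HotSpot heap-dump options, along these lines (illustrative only; the exact options and paths come from the template itself):
Code:
<option>-XX:+HeapDumpOnOutOfMemoryError</option>
<option>-XX:HeapDumpPath=/tmp</option> <!-- illustrative path; the heap-dump and heap-dump-tmp targets differ mainly in where the dump is written -->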

Cheers,
~Josh

Re: MessageSize and server OOM (possibly) related problems

PostPosted: Wed Nov 06, 2013 4:09 pm
by dpwrussell
Ok, I've changed the settings to:
Code:
<option>-Xmx16384M</option>
<option>-XX:MaxPermSize=1024M</option>
<property name="Ice.MessageSizeMax" value="524288"/>


I've left -XX:MaxPermSize alone, as it's clearly not the classes themselves that are exhausting memory.
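
As a side note, one quick way to confirm that the redeployed Blitz process actually picked up the new heap size is to check its command line; a sketch (the grep pattern may need adjusting for your install):
Code:
# Look for the -Xmx option on the running OMERO.blitz JVM
ps auxww | grep -i blitz | grep -o -e '-Xmx[0-9]*[MG]'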

To recap:

This should deal with the Ice::MemoryLimitException, since the metadata is being sent as a single message that exceeds Ice's current limit? I guess something to look at would be breaking large metadata up into multiple messages, or whatever Ice magic can give you.

The OOM still remains somewhat of a mystery. I've activated heap-dump, so the next time it happens we'll hopefully have more data. Although I guess it may now never happen if it's a concurrency problem rather than a leak, at least not until more users start doing more things like this at the same time.

Re: MessageSize and server OOM (possibly) related problems

PostPosted: Thu Nov 07, 2013 7:48 am
by jmoore
Thanks, Douglas. Sounds like a plan. FYI: in OMERO5, there will be a server-side import queue, so there should not be an uncontrolled number of "saveToDB" actions at any one time.

Cheers,
~Josh

Re: MessageSize and server OOM (possibly) related problems

PostPosted: Thu Nov 07, 2013 9:42 am
by hroque
Hi all,

I redid the import and it failed again. I started the import yesterday evening around 19:00 and it failed at some point during the night. The server wasn't down, from what I can tell.
Here is the error:

Code:
Ice.ConnectionLostException
    error = 0
   at IceInternal.Outgoing.invoke(Outgoing.java:147)
   at omero.api._ServiceFactoryDelM.getAdminService(_ServiceFactoryDelM.java:627)
   at omero.api.ServiceFactoryPrxHelper.getAdminService(ServiceFactoryPrxHelper.java:705)
   at omero.api.ServiceFactoryPrxHelper.getAdminService(ServiceFactoryPrxHelper.java:677)
   at ome.formats.OMEROMetadataStoreClient.initializeServices(OMEROMetadataStoreClient.java:413)
   at ome.formats.OMEROMetadataStoreClient.createRoot(OMEROMetadataStoreClient.java:1049)
   at ome.formats.importer.ImportLibrary.importImage(ImportLibrary.java:769)
   at org.openmicroscopy.shoola.env.data.OMEROGateway.importImage(OMEROGateway.java:6736)
   at org.openmicroscopy.shoola.env.data.OmeroImageServiceImpl.importCandidates(OmeroImageServiceImpl.java:230)
   at org.openmicroscopy.shoola.env.data.OmeroImageServiceImpl.importFile(OmeroImageServiceImpl.java:1475)
   at org.openmicroscopy.shoola.env.data.views.calls.ImagesImporter.importFile(ImagesImporter.java:77)
   at org.openmicroscopy.shoola.env.data.views.calls.ImagesImporter.access$000(ImagesImporter.java:53)
   at org.openmicroscopy.shoola.env.data.views.calls.ImagesImporter$1.doCall(ImagesImporter.java:102)
   at org.openmicroscopy.shoola.env.data.views.BatchCall.doStep(BatchCall.java:144)
   at org.openmicroscopy.shoola.util.concur.tasks.CompositeTask.doStep(CompositeTask.java:226)
   at org.openmicroscopy.shoola.env.data.views.CompositeBatchCall.doStep(CompositeBatchCall.java:126)
   at org.openmicroscopy.shoola.util.concur.tasks.ExecCommand.exec(ExecCommand.java:165)
   at org.openmicroscopy.shoola.util.concur.tasks.ExecCommand.run(ExecCommand.java:276)
   at org.openmicroscopy.shoola.util.concur.tasks.AsyncProcessor$Runner.run(AsyncProcessor.java:91)
   at java.lang.Thread.run(Thread.java:695)

   at org.openmicroscopy.shoola.env.data.OMEROGateway.importImage(OMEROGateway.java:6790)
   at org.openmicroscopy.shoola.env.data.OmeroImageServiceImpl.importCandidates(OmeroImageServiceImpl.java:230)
   at org.openmicroscopy.shoola.env.data.OmeroImageServiceImpl.importFile(OmeroImageServiceImpl.java:1475)
   at org.openmicroscopy.shoola.env.data.views.calls.ImagesImporter.importFile(ImagesImporter.java:77)
   at org.openmicroscopy.shoola.env.data.views.calls.ImagesImporter.access$000(ImagesImporter.java:53)
   at org.openmicroscopy.shoola.env.data.views.calls.ImagesImporter$1.doCall(ImagesImporter.java:102)
   at org.openmicroscopy.shoola.env.data.views.BatchCall.doStep(BatchCall.java:144)
   at org.openmicroscopy.shoola.util.concur.tasks.CompositeTask.doStep(CompositeTask.java:226)
   at org.openmicroscopy.shoola.env.data.views.CompositeBatchCall.doStep(CompositeBatchCall.java:126)
   at org.openmicroscopy.shoola.util.concur.tasks.ExecCommand.exec(ExecCommand.java:165)
   at org.openmicroscopy.shoola.util.concur.tasks.ExecCommand.run(ExecCommand.java:276)
   at org.openmicroscopy.shoola.util.concur.tasks.AsyncProcessor$Runner.run(AsyncProcessor.java:91)
   at java.lang.Thread.run(Thread.java:695)
Caused by: Ice.ConnectionLostException
    error = 0
   at IceInternal.Outgoing.invoke(Outgoing.java:147)
   at omero.api._ServiceFactoryDelM.getAdminService(_ServiceFactoryDelM.java:627)
   at omero.api.ServiceFactoryPrxHelper.getAdminService(ServiceFactoryPrxHelper.java:705)
   at omero.api.ServiceFactoryPrxHelper.getAdminService(ServiceFactoryPrxHelper.java:677)
   at ome.formats.OMEROMetadataStoreClient.initializeServices(OMEROMetadataStoreClient.java:413)
   at ome.formats.OMEROMetadataStoreClient.createRoot(OMEROMetadataStoreClient.java:1049)
   at ome.formats.importer.ImportLibrary.importImage(ImportLibrary.java:769)
   at org.openmicroscopy.shoola.env.data.OMEROGateway.importImage(OMEROGateway.java:6736)
   ... 12 more

Re: MessageSize and server OOM (possibly) related problems

PostPosted: Fri Nov 08, 2013 7:27 am
by jmoore
Could I get your ~/omero/log/omeroinsight.log file as well as the server logs and the heap dump if available? Thanks, ~Josh

Re: MessageSize and server OOM (possibly) related problems

PostPosted: Fri Nov 08, 2013 10:11 am
by hroque
Not sure how to upload the files here; it does not seem to work. Is there a size limit?
Anyway, here is a link to the log.

https://www.dropbox.com/s/mrd347eyjylyy ... nsight.log

Re: MessageSize and server OOM (possibly) related problems

PostPosted: Fri Nov 08, 2013 4:11 pm
by jmoore
Thanks for the log, Helio.
Code:
...SNIP...
2013-11-06 18:58:01,300 INFO  [   ome.formats.importer.ImportCandidates] ( Thread-33) 74589 file(s) parsed into 1 group(s) with 2 call(s) to setId in 183097ms. (304743ms total) [1 unknowns]
2013-11-06 18:58:01,590 INFO  [       ome.formats.importer.ImportConfig] ( Thread-33) OMERO Version: 4.4.8-ice33-b256
2013-11-06 18:58:01,590 INFO  [       ome.formats.importer.ImportConfig] ( Thread-33) Bioformats version: 4.4.8 revision: 660f607 date: 1 May 2013
...SNIP...
Caused by: Ice.ConnectionLostException
    error = 0
   at IceInternal.Outgoing.invoke(Outgoing.java:147)
   at omero.api._ServiceFactoryDelM.getAdminService(_ServiceFactoryDelM.java:627)
   at omero.api.ServiceFactoryPrxHelper.getAdminService(ServiceFactoryPrxHelper.java:705)
   at omero.api.ServiceFactoryPrxHelper.getAdminService(ServiceFactoryPrxHelper.java:677)
   at ome.formats.OMEROMetadataStoreClient.initializeServices(OMEROMetadataStoreClient.java:413)
   at ome.formats.OMEROMetadataStoreClient.createRoot(OMEROMetadataStoreClient.java:1049)
   at ome.formats.importer.ImportLibrary.importImage(ImportLibrary.java:769)
   at org.openmicroscopy.shoola.env.data.OMEROGateway.importImage(OMEROGateway.java:6736)
   ... 12 more
Exception in thread "Thread-33"

The above looks like an error in the long-running connection code, which we fixed in 4.4.9. Could you try upgrading to the latest release (4.4.9) and see if that gets you past this issue?
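
If it helps, after upgrading you can double-check which version is actually in use, e.g. with:
Code:
bin/omero version

The "OMERO Version" line that ImportConfig writes to the Insight log (as in the snippet above) should also change accordingly.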

Thanks for your patience!
~Josh