Advice to Debug Upload Failure

General user discussion about using the OMERO platform to its fullest. Please ask new questions at https://forum.image.sc/tags/omero
Postby austinMLB » Tue Dec 11, 2018 6:06 pm

Hi, All,
I've got a folder with 144 OME-TIFF files, each about 158 MB. If I try to upload all of them through the Insight uploader, or through a script I have written, the upload (usually?) eventually freezes and makes no further progress. My database is on an NFS-mounted drive. My data was either local to the server or on that same mounted drive, with similar results.
The server appears to be running out of some resource, though (assuming that is the case) I haven't determined which one. I have raised the ulimit -n into the hundreds of thousands. Do you have any advice on which steps I should take to further investigate the root cause of this problem?
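
One way to test the open-files theory is to compare the number of descriptors the server process actually holds with the limit that applies to that process, because a `ulimit -n` raised in one shell only affects processes started from it. The following is a minimal sketch, assuming a Linux host with `/proc` mounted and enough permission to read the server's entries. The function names and the `proc_root` parameter are hypothetical helpers, not part of OMERO.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count the file descriptors a process currently holds open (Linux only)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def open_files_limit(pid, proc_root="/proc"):
    """Return the soft 'Max open files' limit of a process, or None if unlimited or not listed."""
    label = "Max open files"
    with open(os.path.join(proc_root, str(pid), "limits")) as handle:
        for line in handle:
            if line.startswith(label):
                # Columns after the label: soft limit, hard limit, units.
                soft = line[len(label):].split()[0]
                return None if soft == "unlimited" else int(soft)
    return None
```

Calling both functions with the Blitz-0 pid every few seconds during an import would show whether the descriptor count climbs toward the limit before the freeze. If it stays flat, the bottleneck is probably elsewhere, for example memory or NFS.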
Thanks,
Michael

My configuration:

omero.data.dir=/media/gsn/omero_data/
omero.db.name=<redacted>
omero.db.pass=<redacted>
omero.db.user=<redacted>
omero.fs.repo.path=%group%/%user%_%userId%//%year%-%month%/%day%/%time%
omero.jvmcfg.heap_dump=tmp
omero.jvmcfg.heap_size=1g
omero.launcher.jython=/home/omero/jython/bin/jython
omero.scripts.timeout=172800000
omero.web.application_server=wsgi-tcp
omero.web.apps=["omero_fpbioimage", "omero_iviewer"]
omero.web.open_with=[["Image viewer", "webgateway", {"supported_objects": ["image"], "script_url": "webclient/javascript/ome.openwith_viewer.js"}], ["omero_fpbioimage", "fpbioimage_index", {"supported_objects": ["image"], "script_url": "fpbioimage/openwith.js", "label": "FPBioimage"}], ["omero_iviewer", "omero_iviewer_index", {"supported_objects": ["images", "dataset", "well"], "script_url": "omero_iviewer/openwith.js", "label": "OMERO.iviewer"}]]
omero.web.viewer.view=omero_iviewer.views.index

My Diagnostics:

================================================================================
OMERO Diagnostics (admin) 5.4.9-ice36-b101
================================================================================

Commands: java -version 1.8.0 (/usr/bin/java)
Commands: python -V 2.7.12 (/home/omero/omerowebvenv/bin/python -- 2 others)
Commands: icegridnode --version 3.6.4 (/usr/bin/icegridnode)
Commands: icegridadmin --version 3.6.4 (/usr/bin/icegridadmin)
Commands: psql --version 9.6.10 (/usr/bin/psql)

Server: icegridnode running
Server: Blitz-0 active (pid = 7019, enabled)
Server: DropBox active (pid = 7054, enabled)
Server: FileServer active (pid = 7063, enabled)
Server: Indexer-0 active (pid = 7066, enabled)
Server: MonitorServer active (pid = 7068, enabled)
Server: OMERO.Glacier2 active (pid = 7140, enabled)
Server: OMERO.IceStorm active (pid = 7142, enabled)
Server: PixelData-0 active (pid = 7167, enabled)
Server: Processor-0 active (pid = 7184, enabled)
Server: Tables-0 active (pid = 7222, enabled)
Server: TestDropBox inactive (enabled)

Log dir: /home/omero/OMERO.server-5.4.9-ice36-b101/var/log exists
Log files: Blitz-0.log 21.0 MB errors=0 warnings=6
Log files: DropBox.log 3.0 KB errors=0 warnings=1
Log files: FileServer.log 0.0 KB
Log files: Indexer-0.log 17.0 KB
Log files: MonitorServer.log 2.0 KB
Log files: OMEROweb.lock 0.0 KB
Log files: OMEROweb.log 297.0 KB errors=0 warnings=3
Log files: PixelData-0.log 6.0 KB
Log files: Processor-0.log 332.0 KB
Log files: Tables-0.log 2.0 KB
Log files: TestDropBox.log n/a
Log files: master.err 3.0 KB errors=1 warnings=2
Log files: master.out 0.0 KB
Log files: Total size 22.43 MB


Environment:OMERO_HOME=(unset)
Environment:OMERO_NODE=(unset)
Environment:OMERO_MASTER=(unset)
Environment:OMERO_USERDIR=(unset)
Environment:OMERO_TMPDIR=(unset)
Environment:PATH=/home/omero/omerowebvenv/bin:/home/omero/bin:/home/omero/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin
Environment:PYTHONPATH=(unset)
Environment:ICE_HOME=(unset)
Environment:LD_LIBRARY_PATH=(unset)
Environment:DYLD_LIBRARY_PATH=(unset)

OMERO SSL port:4064
OMERO TCP port:4063
OMERO data dir:'/media/gsn/omero_data/' Exists? True Is writable? True
OMERO temp dir:'/home/omero/omero/tmp' Exists? True Is writable? True (Size: 1700749)

JVM settings: Blitz-${index} -Xmx1g -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions -XX:HeapDumpPath=/tmp
JVM settings: Indexer-${index} -Xmx1g -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions -XX:HeapDumpPath=/tmp
JVM settings: PixelData-${index} -Xmx1g -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions -XX:HeapDumpPath=/tmp
JVM settings: Repository-${index} -Xmx1g -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=1g -XX:+IgnoreUnrecognizedVMOptions -XX:HeapDumpPath=/tmp
~
austinMLB
 
Posts: 19
Joined: Wed Jul 25, 2018 3:26 pm

Re: Advice to Debug Upload Failure

Postby mtbc » Wed Dec 12, 2018 5:38 am

I don't remember hearing of such an issue before. Your fileset is large but not unusually so. Is there any clue in how long it takes before the freeze? I wonder if some timeout could be relevant, whether in the OMERO stack or some intervening router or firewall: a consistent "round number" would be suspicious. If you would like to zip up your server's var/log/ directory and submit it to https://www.openmicroscopy.org/qa2/qa/upload/ then we would be glad to comb it for clues: your resource exhaustion theory is plausible.
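
The "round number" reasoning above can be made concrete: if several stalled imports all freeze after nearly the same elapsed time, and that time sits close to a whole number of minutes, a timeout is a likely suspect. This is only an illustrative sketch. The function name, the tolerance and the sample durations in the usage note are all hypothetical and not taken from this thread.

```python
def suspicious_timeout(stall_seconds, tolerance=5.0, round_unit=60.0):
    """Return the round-number duration (seconds) that all stalls cluster around, or None."""
    if len(stall_seconds) < 2:
        return None  # a single observation cannot show consistency
    if max(stall_seconds) - min(stall_seconds) > 2 * tolerance:
        return None  # stall times vary too much to point at a fixed timeout
    mean = sum(stall_seconds) / len(stall_seconds)
    nearest = round(mean / round_unit) * round_unit
    if nearest > 0 and abs(mean - nearest) <= tolerance:
        return nearest
    return None
```

For example, `suspicious_timeout([598.0, 601.5, 600.2])` returns `600.0`, hinting at a ten-minute timeout, while widely scattered stall times return `None`.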

Is your script using the CLI import or are you using the OMERO managed repository API directly? Perhaps there might also be opportunity for adding something usefully diagnostic among your code. Are you setting any non-default options for the import? I don't think we've seen any odd behavior with various transfer methods, etc., but there's always a first time. The CLI "parallel" options are still relatively new but Insight does not yet offer similar so I guess they aren't the culprit.

Cheers,
Mark
mtbc
Team Member
 
Posts: 282
Joined: Tue Oct 23, 2012 10:59 am
Location: Dundee, Scotland

Re: Advice to Debug Upload Failure

Postby austinMLB » Thu Dec 13, 2018 1:23 am

Thanks for the feedback, Mark. I attempted to upload my zip. If it wasn't successful, I'll look into that more. The problem is not consistently failing on a single file or anything of that nature, so I don't think it is a problematic image. I feel it is likely my configuration of the machine or of one of the services, but I haven't found many clues. In the logs, I actually started two uploads. The first failed without actually uploading any files, then the second failed after uploading several files. At that point, I was unable to ssh into the machine. Any thoughts would be greatly appreciated. Thanks, Michael
austinMLB
 
Posts: 19
Joined: Wed Jul 25, 2018 3:26 pm

Re: Advice to Debug Upload Failure

Postby mtbc » Thu Dec 13, 2018 2:36 pm

Dear Michael,

We got the upload fine but I am thinking that it may not be the log we need: Blitz-0.log starts from 1pm yesterday which I guess may postdate your latest stalled import?

I am starting to think that we need some documentation on how to run jstack on the server and client side of imports to help judge what is going on when they stall. Do you happen to already have some familiarity with it?

Cheers,
Mark
mtbc
Team Member
 
Posts: 282
Joined: Tue Oct 23, 2012 10:59 am
Location: Dundee, Scotland

Re: Advice to Debug Upload Failure

Postby austinMLB » Thu Dec 13, 2018 3:01 pm

Thanks, Mark.

What I had intended to do was
1. Move my old log directory out of the way
2. Start Omero (Server and Web)
3. Trigger my script
4. Once the machine became unresponsive, restart it
5. Zip the new log directory.
Assuming I successfully did those steps as intended, does that sound like it would have given you the right log files?
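
Steps 1 and 5 of the list above can be scripted so that the archive only ever contains logs from the failing run. This is a minimal sketch, assuming the server is stopped while the old directory is moved. The function names are hypothetical, and recreating an empty log directory is a precaution rather than a documented OMERO requirement.

```python
import os
import shutil
import time


def set_aside_logs(log_dir):
    """Step 1: move the existing log directory out of the way and recreate it empty."""
    log_dir = os.path.abspath(log_dir)
    backup = "%s.%s" % (log_dir, time.strftime("%Y%m%d-%H%M%S"))
    shutil.move(log_dir, backup)
    os.makedirs(log_dir)
    return backup


def zip_logs(log_dir, archive_base):
    """Step 5: zip the fresh log directory and return the path of the .zip file."""
    parent, name = os.path.split(os.path.abspath(log_dir))
    return shutil.make_archive(archive_base, "zip", root_dir=parent, base_dir=name)
```

With the server installed as in the diagnostics above, `log_dir` would be the `var/log` directory under the OMERO.server folder.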

As for jstack, while I have Java experience, I haven't used jstack a lot. I'm more familiar with jvisualvm. But, I could give those types of tools a try. Thanks for that recommendation. Unfortunately, it may be Monday before I make any progress on this.

I appreciate your help,
Michael
austinMLB
 
Posts: 19
Joined: Wed Jul 25, 2018 3:26 pm

Re: Advice to Debug Upload Failure

Postby mtbc » Thu Dec 13, 2018 3:04 pm

Dear Michael,

Aha, that makes sense: so the log thus ends when the server becomes unresponsive? We'll see what we can figure out to ask next!

Cheers,
Mark
mtbc
Team Member
 
Posts: 282
Joined: Tue Oct 23, 2012 10:59 am
Location: Dundee, Scotland

Re: Advice to Debug Upload Failure

Postby jmoore » Fri Dec 14, 2018 11:34 am

Hi Michael,

Your steps 1-5 sound like a good plan. In the meantime, I do notice that your Blitz server only has 1GB of heap, which seems low. Have you experimented with other values? How much memory does the OS have?
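
To put the 1GB heap in context, it can help to express the configured `omero.jvmcfg.heap_size` value as a fraction of the machine's physical memory. The following is a rough sketch, assuming a POSIX system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES` (true on Linux). The helper names are hypothetical.

```python
import os

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_bytes(setting):
    """Convert a JVM-style size such as '1g' or '512m' to bytes."""
    text = setting.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # a plain number is taken as bytes


def physical_memory_bytes():
    """Total RAM as reported by the operating system."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def heap_fraction(setting):
    """Fraction of physical memory that one JVM with this heap setting may use."""
    return float(heap_bytes(setting)) / physical_memory_bytes()
```

Note that the diagnostics above show four services, each started with `-Xmx1g`, so the combined ceiling is several times the single-service figure.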

As for jstack, it's a simple tool that will show you a single snapshot of what the server is doing. If you get the process id for the Blitz server (e.g. via `bin/omero admin ice server pid Blitz-0`) then `jstack $PID` will show you the stack traces. If that doesn't work, you can send `kill -QUIT $PID`, which will print the same output to var/log/master.out.
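
Since one jstack run is a single snapshot, taking a few in a row during a stall makes it easier to see which threads stay parked in the same place. The following is a minimal sketch of that idea, assuming `jstack` is on the PATH and is run as the same user as the server. The helper names, the snapshot count and the interval are hypothetical choices, not an OMERO recommendation.

```python
import subprocess
import time


def jstack_command(pid):
    """Build the jstack invocation for a given JVM process id."""
    return ["jstack", str(pid)]


def take_snapshots(pid, count=3, interval=10.0,
                   run=subprocess.check_output, sleep=time.sleep):
    """Collect `count` thread dumps from the JVM, `interval` seconds apart."""
    dumps = []
    for index in range(count):
        if index:
            sleep(interval)  # wait between snapshots, not before the first
        dumps.append(run(jstack_command(pid)))
    return dumps
```

The `run` and `sleep` parameters exist only so the function can be exercised without a live JVM. In normal use, `take_snapshots(pid)` with the Blitz-0 pid is enough, and the returned dumps can be written to files and compared.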

Regards,
~Josh
jmoore
Site Admin
 
Posts: 1591
Joined: Fri May 22, 2009 1:29 pm
Location: Germany

Re: Advice to Debug Upload Failure

Postby austinMLB » Wed Jan 02, 2019 3:58 pm

Josh and Mark,
Thanks for the advice on this. Because I wasn't able to determine the cause, I have since moved our Omero installation to different hardware. We will need to do more extensive testing, but we have not yet seen these problems on the new machine. It is possible the problem was hardware-related in some way. For the new hardware, I do have some questions about tuning the configuration, but I will post that as a different question.
Thanks again for your time,
Michael
austinMLB
 
Posts: 19
Joined: Wed Jul 25, 2018 3:26 pm

