'Failed to shutdown some components'

Posted: Wed Jan 11, 2017 2:52 pm
by achessel
Hi all,

Following a brief 'disk full' event on the disk the OMERO server runs from, the server is in a weird state. It also does not seem to want to shut down, saying it can't reach node 'master'.

Is there a way to force it down cleanishly, or can I just kill the process? Is there anything I should do before I start it up again if I just kill it, or can I expect some things to be broken? (I don't think anything was going on when the original error happened.)

Thanks
A.

Re: 'Failed to shutdown some components'

Posted: Thu Jan 12, 2017 9:47 am
by mtbc
"can't reach node 'master'" is a new one for us, I'm afraid. I'd recommend simply killing the icebox and icegridnode processes with SIGTERM (give them some seconds before resorting to SIGKILL). Then, once you are sure the processes have gone, remove the .lock files from inside your binary repository, e.g., probably those reported by: find /OMERO -name .lock
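As a rough sketch of the "SIGTERM first, SIGKILL only if needed" step above (not an official OMERO procedure): the function name stop_gracefully and the GRACE_SECONDS variable are hypothetical, and the use of pgrep to find the process IDs is an assumption about your system.

```shell
#!/bin/sh
# Sketch only: stop one process politely, escalating to SIGKILL if it lingers.
# stop_gracefully and GRACE_SECONDS are hypothetical names, not part of OMERO.

GRACE_SECONDS="${GRACE_SECONDS:-10}"

stop_gracefully() {
    target_pid="$1"

    # kill -0 sends no signal; it only tests whether the process still exists.
    kill -0 "$target_pid" 2>/dev/null || return 0

    # Ask the process to exit cleanly.
    kill -TERM "$target_pid" 2>/dev/null

    # Poll once per second until it is gone or the grace period runs out.
    waited=0
    while kill -0 "$target_pid" 2>/dev/null && [ "$waited" -lt "$GRACE_SECONDS" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # Last resort: the process ignored SIGTERM for the whole grace period.
    if kill -0 "$target_pid" 2>/dev/null; then
        kill -KILL "$target_pid" 2>/dev/null
    fi
    return 0
}

# Example use (assumes pgrep is installed and matches only the intended processes):
#   for p in $(pgrep -f icegridnode); do stop_gracefully "$p"; done
#   find /OMERO -name .lock    # review this list before removing anything
```

Listing the .lock files first, as in the last comment, and removing them only after checking that no server process is left keeps the cleanup reversible up to that point.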

I think you should then expect to be able to start up the server processes just fine. Still, weirdness can happen when one runs out of space, so the cautious may wish to back up before restarting the server, and if you do see any oddness, or require any elaboration of the above, then please don't hesitate to ask.

Cheers, and good luck,
Mark

Re: 'Failed to shutdown some components'

Posted: Thu Jan 12, 2017 10:19 am
by kennethgillen
achessel wrote: Following a brief 'disk full' event on the disk the OMERO server runs from, the server is in a weird state. It also does not seem to want to shut down, saying it can't reach node 'master'.


I'd also recommend adding some monitoring to your server, something like Check_MK, Munin, or some other system of which there are plenty to choose from. [1]

OME have experience of monitoring OMERO servers with Check_MK and Munin, and others in the community may well have experience with other tools.
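Until a proper monitoring tool is in place, even a small disk-space check run from cron could give early warning of a 'disk full' event. The following is a minimal sketch, not a substitute for Check_MK or Munin: the function names disk_usage_percent and check_disk and the default 90% limit are hypothetical, and /OMERO in the usage comment is only the example path used earlier in the thread.

```shell
#!/bin/sh
# Sketch only: warn when a filesystem is nearly full.
# disk_usage_percent, check_disk and the 90% default are hypothetical choices.

disk_usage_percent() {
    # df -P prints one POSIX-format line per filesystem;
    # field 5 of the second line is the capacity, e.g. "42%".
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

check_disk() {
    path="$1"
    limit="${2:-90}"
    used=$(disk_usage_percent "$path")

    # Bail out with status 2 if df gave nothing usable (bad path, odd filesystem).
    case "$used" in
        ''|*[!0-9]*) echo "UNKNOWN: could not read usage for $path"; return 2 ;;
    esac

    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
    return 0
}

# Example use, e.g. from a cron job:
#   check_disk /OMERO 90
```

The return codes (0 for OK, 1 for warning, 2 for unknown) are chosen so that a cron wrapper or another script can react to the result without parsing the message.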

[1] https://github.com/kahun/awesome-sysadmin#monitoring

All the best,

Kenny

Re: 'Failed to shutdown some components'

Posted: Fri Jan 13, 2017 3:47 pm
by achessel
Thanks for the info.
I sent SIGTERM to icegridnode, which killed everything omero related except icegridnode (?), so I then had to SIGKILL it.
But the server is back up and seems fine. It would definitely help to have more monitoring; I'll look into it. But we are currently functioning without a dedicated IT person in the lab, which is not helping...