
Moving ManagedRepository directory II

Posted: Fri Jun 26, 2015 9:35 pm
by ClayB
New machines, similar problem.

After copying the ManagedRepository to a new location, setting the new location with the "bin/omero config" command, and restarting the server, I got this message when I tried to import an image:

Code:
2015-06-26 12:32:54,384 12185      [      main] ERROR        ome.formats.importer.ImportLibrary - Error on import
java.lang.RuntimeException: Cannot exclusively use the managed repository.


Looking deeper, I found that the confirmation message of the data directory move was NOT in the Blitz-0.log. I tried the config command again and no message was printed on the screen. However, looking into the Blitz-0.log file I saw:

Code:
2015-06-26 14:03:37,375 ERROR [      o.s.blitz.repo.AbstractRepositoryI] (2-thread-1) Failed during repository takeover


(Most recent 5000 lines of Blitz-0.log file have been attached.)

There were no image files imported into the ManagedRepository before the move was attempted. (On a sister server that is now having the same problem, a test image was loaded and copied over to the new location.) All directories on the path from / down to the OMERO directory that I want to hold the ManagedRepository files are owned by (Linux) root, with drwxr-xr-x permissions. The OMERO directory itself is owned by the (Linux) omero account:

Code:
drwxr-xr-x 3 omero ccc 4096 Jun 26 11:47 OMERO
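One way to double-check the ownership and permissions described above is to list every directory component between / and the target, roughly what `namei -l` from util-linux does. The sketch below uses only `readlink -f`, `stat` and `dirname`; the function name `show_path_perms` is hypothetical, and it should be run as the account that runs OMERO so that the final writability check is meaningful.

```shell
#!/usr/bin/env bash
# show_path_perms DIR: print mode, owner, group and name for DIR and each
# parent directory up to /, then report whether the current user can write DIR.
# Hypothetical helper; run it as the OMERO account (e.g. omero).
show_path_perms() {
  local path p
  path=$(readlink -f -- "$1") || return 1
  p=$path
  while :; do
    stat -c '%A %U %G %n' -- "$p"
    [ "$p" = / ] && break
    p=$(dirname -- "$p")
  done
  if [ -w "$path" ]; then
    echo "writable by $(id -un): $path"
  else
    echo "NOT writable by $(id -un): $path"
  fi
}

# Example (path from this thread):
# show_path_perms /cluster_share/tools/imaging/OMERO
```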


Any ideas on how to get this set up as I need or what I did wrong (this time)?

Re: Moving ManagedRepository directory II

Posted: Mon Jun 29, 2015 10:41 am
by jmoore
Hi Clay,

A couple of questions before a full response:
  • From where to where are you moving the data.dir?
  • What type of file systems are involved?
  • Can you describe the exact steps you took?

Cheers,
~Josh

Re: Moving ManagedRepository directory II

Posted: Mon Jun 29, 2015 3:29 pm
by ClayB
jmoore wrote: A couple of questions before a full response:

  • From where to where are you moving the data.dir?

    The move is from the original location specified during installation (/mnt/app_hdd/omero/omero_server) to /cluster_share/tools/imaging/OMERO.

  • What type of file systems are involved?

    The original site is NFS while the target file system is Lustre.

  • Can you describe the exact steps you took?
    a. Create OMERO directory in /cluster_share

    > mkdir /cluster_share/tools/imaging/OMERO

    b. Copy current ManagedRepository to shared area

    > cp -r omero_server/ManagedRepository /cluster_share/tools/imaging/OMERO

    c. Configure OMERO server to point at new MR location

    > OMERO.server/bin/omero config set omero.managed.dir /cluster_share/tools/imaging/OMERO/ManagedRepository

    d. Restart OMERO server (with new location of MR)

    > OMERO.server/bin/omero admin restart
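Step b uses `cp -r`, which copies the contents but does not preserve timestamps or ownership. A slightly more defensive version of steps b to d is sketched below, assuming the server is stopped before copying (as the reproduction scripts later in this thread do). It uses `cp -a` and then compares the two trees with `diff -r` before `omero.managed.dir` is pointed at the copy. The helper name `copy_managed_repo` is hypothetical.

```shell
#!/usr/bin/env bash
# copy_managed_repo SRC DEST_PARENT: copy the ManagedRepository directory SRC
# into DEST_PARENT, preserving modes and timestamps, then verify the copy.
# Hypothetical helper; stop the OMERO server before running it.
copy_managed_repo() {
  local src=$1 dest_parent=$2
  local dest="$dest_parent/$(basename -- "$src")"
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi
  mkdir -p -- "$dest_parent" || return 1
  cp -a -- "$src" "$dest_parent/" || return 1
  # diff -r exits non-zero if any file differs or is missing
  if diff -r "$src" "$dest" >/dev/null; then
    echo "copy verified: $dest"
  else
    echo "copy differs from source: $dest" >&2
    return 1
  fi
}

# Usage, following steps b to d above (paths from this thread):
# OMERO.server/bin/omero admin stop
# copy_managed_repo omero_server/ManagedRepository /cluster_share/tools/imaging/OMERO
# OMERO.server/bin/omero config set omero.managed.dir /cluster_share/tools/imaging/OMERO/ManagedRepository
# OMERO.server/bin/omero admin start
```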

Re: Moving ManagedRepository directory II

Posted: Mon Jun 29, 2015 3:51 pm
by ClayB
ClayB wrote: The original site is NFS while the target file system is Lustre.


My bad. I just checked with the sysadmin: the original site's file system is EXT4.

The move is necessary since the compute nodes in the cluster don't have access to /mnt/app_hdd, but do have access to everything in the /cluster_share system.

Re: Moving ManagedRepository directory II

Posted: Tue Jun 30, 2015 9:46 am
by jmoore
ClayB wrote:
jmoore wrote: A couple of questions before a full response:

  • From where to where are you moving the data.dir?

    The move is from the original location specified during installation (/mnt/app_hdd/omero/omero_server) to /cluster_share/tools/imaging/OMERO.


Makes sense. And there's no change to ${omero.data.dir} itself, correct?



  • What type of file systems are involved?

    The original site is EXT4 while the target file system is Lustre.


Thanks. I was worried that we were running into NFS issues.



Thanks for the detailed steps, Clay! I've tried to reproduce with the following:

Code:
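# default.sh: reset the server, database and data directory, then start fresh
NAME=ome9
OMERO="$(pwd)/dist/bin/omero"
# Stop any running server before deleting its var/ directory
# (set -e is not active yet, so a failed stop is tolerated)
"$OMERO" admin stop
rm -rf "$(pwd)/dist/var"

set -e
set -u

"$OMERO" version

# Recreate the database and load the OMERO schema into it
dropdb "$NAME"
createdb "$NAME"
"$OMERO" db script --password ome -f- | psql "$NAME"

# Fresh scratch directory for omero.data.dir
rm -rf "/tmp/$NAME"
mkdir "/tmp/$NAME"
cd "/tmp/$NAME"

mkdir data
"$OMERO" config set omero.data.dir "$(pwd)/data"
"$OMERO" admin start
"$OMERO" admin waitup
# List the repositories (the managed repository is created under data/)
"$OMERO" -s root@localhost -w ome fs repos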


and

Code:
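# copied.sh: move the managed repository of the server created by default.sh
set -e
set -u

NAME=ome9
OMERO="$(pwd)/dist/bin/omero"

# Stop the server before touching the repository
"$OMERO" admin stop

cd "/tmp/$NAME"
COPIED="$(pwd)/copied.dir/OMERO"
mkdir -p "$COPIED"
# Point omero.managed.dir at the new location and copy the existing repository there
"$OMERO" config set omero.managed.dir "$COPIED/ManagedRepository"
cp -r data/ManagedRepository "$COPIED"

# On startup, Blitz should log "Data directory moved: ... updated to ..."
"$OMERO" admin start
"$OMERO" admin waitup
"$OMERO" -s root@localhost -w ome fs repos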


But when I do so, I see the expected message in my logs:
Code:
/opt/ome9$ grep "updated to" dist/var/log/Blitz-0.log
2015-06-30 11:30:11,841 WARN  [      o.s.blitz.repo.AbstractRepositoryI] (2-thread-3) Data directory moved: /tmp/ome9/data/ManagedRepository updated to /tmp/ome9/copied.dir/OMERO/ManagedRepository

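When comparing a working setup with a failing one, it can help to search Blitz-0.log for both the success message shown above and the error from the first post. A minimal sketch follows; the function name `takeover_status` is hypothetical, and the two message strings are copied from the log lines quoted in this thread.

```shell
#!/usr/bin/env bash
# takeover_status LOG: report the latest managed-repository event in a Blitz
# log: "moved" for "Data directory moved", "failed" for a failed takeover,
# or "none" if neither message appears.
takeover_status() {
  local log=$1 last
  last=$(grep -E -e 'Data directory moved|Failed during repository takeover' "$log" | tail -n 1)
  case "$last" in
    *"Data directory moved"*) echo "moved" ;;
    *"Failed during repository takeover"*) echo "failed" ;;
    *) echo "none" ;;
  esac
}

# Example: takeover_status dist/var/log/Blitz-0.log
```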

There may be issues with Lustre similar to those with NFS. Could you attach your logs zipped? (I'm wondering if there are any other WARNs or ERRORs.)

It might also be useful to have a jstack output from the Blitz process:
Code:
jstack $(bin/omero admin ice server pid Blitz-0)


If this is related to the filesystem and locking, you will likely need to move /cluster_share/tools/imaging/OMERO/ManagedRepository/.omero onto a non-Lustre file system, unless someone can fix locking in Lustre itself.

ClayB wrote: The move is necessary since the compute nodes in the cluster don't have access to /mnt/app_hdd, but do have access to everything in the /cluster_share system.


That also makes sense. If this isn't a file locking issue as it is with NFS, then perhaps you could either:

  • create the ManagedRepository directory yourself and set the property before your first startup?
  • use a symlink from the old location to the new? (omero_server/ManagedRepository -> /cluster/ ....) and not set the property?
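The first option, creating the directory and setting the property before the server is first started, could look like the following step in an install script. It is a sketch: `preconfigure_managed_dir` and its arguments are hypothetical, and the only OMERO command it runs is the `bin/omero config set omero.managed.dir ...` call already used in this thread. As noted above, it only helps if file locking is not the problem, because ManagedRepository/.omero would still live on the new file system.

```shell
#!/usr/bin/env bash
# preconfigure_managed_dir OMERO_BIN DIR: create the managed repository
# directory and set omero.managed.dir before the server's first startup.
# Hypothetical install-script step; run it as the account that runs OMERO.
preconfigure_managed_dir() {
  local omero_bin=$1 dir=$2
  mkdir -p -- "$dir" || return 1
  "$omero_bin" config set omero.managed.dir "$dir"
}

# Example (paths from this thread); start the server only afterwards:
# preconfigure_managed_dir OMERO.server/bin/omero /cluster_share/tools/imaging/OMERO/ManagedRepository
# OMERO.server/bin/omero admin start
```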

Thanks for helping us to track this down.
~Josh.

Re: Moving ManagedRepository directory II

Posted: Tue Jun 30, 2015 4:20 pm
by ClayB
jmoore wrote: Makes sense. And there's no change to ${omero.data.dir} itself, correct?


The etc/grid/config.xml file shows

Code:
<property name="omero.managed.dir" value="/cluster_share/tools/imaging/OMERO/ManagedRepository" />


Attached are the first parts of the Blitz-0.log file. (I had to split the file and compress each piece with BZIP2, splitting the 00 piece once more, to get under the 256MB file limit. The remainder of the log file and the jstack output are in the following message.)

We can (probably) reinstall and reset the ManagedRepository before startup. We are trying to set up an automated installation, so we'd need to modify the script(s) if that turns out to be the solution. The symlink variation might be a better fit for that.
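For the symlink variation, an install script could create the repository on the shared file system and link the default location to it, leaving `omero.managed.dir` unset so the server uses its default ManagedRepository under `omero.data.dir`, as suggested above. The helper `link_managed_repo` is hypothetical; it refuses to replace an existing real directory so that no data is silently overwritten. As with the other option, `.omero` would then still live on Lustre.

```shell
#!/usr/bin/env bash
# link_managed_repo LINK TARGET: make LINK (the default ManagedRepository
# path under omero.data.dir) a symlink to TARGET on the shared file system.
# Hypothetical install-script step; the parent directory of LINK must exist.
link_managed_repo() {
  local link=$1 target=$2
  mkdir -p -- "$target" || return 1
  if [ -L "$link" ]; then
    # Already a link: accept it only if it points at TARGET
    [ "$(readlink -- "$link")" = "$target" ] && return 0
    echo "$link already links elsewhere: $(readlink -- "$link")" >&2
    return 1
  elif [ -e "$link" ]; then
    echo "refusing to replace existing $link; move its contents first" >&2
    return 1
  fi
  ln -s -- "$target" "$link"
}

# Example (paths from this thread):
# link_managed_repo /mnt/app_hdd/omero/omero_server/ManagedRepository \
#                   /cluster_share/tools/imaging/OMERO/ManagedRepository
```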

For the current installation, I was hoping to ultimately store the image files in-place. I guess this means that I don't really need to move the ManagedRepository, but it does mean that ALL files would need to be stored in the shared location and imported in-place. This is likely going to be harder to enforce than setting things up right from the start.

I've tried to do an import in-place, but the error is preventing this since it can't get exclusive access to the ManagedRepository directory. (I'm hoping that solving the MR placement will fix this issue, too.)

Re: Moving ManagedRepository directory II

Posted: Tue Jun 30, 2015 4:24 pm
by ClayB
Rest of the files.

Re: Moving ManagedRepository directory II

Posted: Wed Jul 01, 2015 9:53 am
by jmoore
Hi Clay,

Ah, finally a smoking gun! Thanks for the logs:

Code:
2015-06-29 13:58:37,381 INFO  [        ome.services.util.ServiceHandler] (2-thread-2)  Rslt:    java.io.IOException: Function not implemented
2015-06-29 13:58:37,383 ERROR [      o.s.blitz.repo.AbstractRepositoryI] (2-thread-2) Failed during repository takeover
java.io.IOException: Function not implemented
        at sun.nio.ch.FileDispatcherImpl.lock0(Native Method) ~[na:1.7.0_79]
        at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:91) ~[na:1.7.0_79]
        at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1022) ~[na:1.7.0_79]
        at java.nio.channels.FileChannel.lock(FileChannel.java:1052) ~[na:1.7.0_79]
        at ome.services.blitz.repo.FileMaker.getLine(FileMaker.java:95) ~[blitz.jar:na]
        at ome.services.blitz.repo.AbstractRepositoryI$GetOrCreateRepo.doWork(AbstractRepositoryI.java:310) ~[blitz.jar:na]


And indeed I find comments along the lines of "The issue is that parallel distributed file systems such as Lustre and NFS do not implement lock0, but ... seems to rely on it."
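A quick way to check whether a directory's file system supports locking, before pointing OMERO at it, is to try to take a lock there. The sketch below uses the `flock` command from util-linux, with `findmnt` to show the mount type and options. Caveats: `flock` exercises flock(2), whereas Java's `FileChannel.lock` (the call failing in the trace above) uses POSIX fcntl(2) locks on Linux, so a pass is indicative rather than conclusive. On Lustre clients, lock support is governed by the `flock`/`localflock`/`noflock` mount options, which is worth raising with the sysadmin. The function name `can_lock` is hypothetical.

```shell
#!/usr/bin/env bash
# can_lock DIR: try to take an exclusive, non-blocking lock on a scratch file
# in DIR. Returns 0 if the lock succeeded, 1 if locking failed, 2 if no
# scratch file could be created.
# Note: flock(1) uses flock(2); Java's FileChannel.lock uses fcntl(2) locks.
can_lock() {
  local dir=$1 probe rc
  probe=$(mktemp "$dir/.lockprobe.XXXXXX") || return 2
  if flock -n -x "$probe" true; then
    rc=0
  else
    rc=1   # e.g. "Function not implemented" when locks are unsupported
  fi
  rm -f -- "$probe"
  return "$rc"
}

# Example (path from this thread): show the file system, then probe it
# findmnt -T /cluster_share/tools/imaging/OMERO -o TARGET,FSTYPE,OPTIONS
# can_lock /cluster_share/tools/imaging/OMERO && echo "lock OK" || echo "lock FAILED"
```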

I'd try one of two things first:

Cheers,
~Josh.

Re: Moving ManagedRepository directory II

Posted: Wed Jul 01, 2015 2:59 pm
by ClayB
jmoore wrote: I'd try one of two things first:


Not sure I've gotten enough info from the second option URL, so let me try the first option first.

I can leave the ManagedRepository/.omero in its original directory, with a symlink in the parallel file system pointing back to it from the copied location. Once that's done, do I need to run the "config set" command again or just restart the server?

--clay

Re: Moving ManagedRepository directory II

Posted: Thu Jul 02, 2015 7:55 am
by jmoore
If your managed.dir configuration points at the /cluster_share and that directory contains a .omero symlink going to a local filesystem, a restart should suffice.
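The arrangement described here (omero.managed.dir on /cluster_share, with ManagedRepository/.omero a symlink to a directory on a local file system) could be set up roughly as follows while the server is stopped. This is a sketch rather than a tested procedure: the helper `link_dot_omero` is hypothetical, it keeps the copied `.omero` as `.omero.bak` instead of deleting it, and it assumes the original `.omero` left under /mnt/app_hdd is the one to keep.

```shell
#!/usr/bin/env bash
# link_dot_omero REPO LOCAL_DOT_OMERO: replace REPO/.omero (on the shared,
# non-locking file system) with a symlink to LOCAL_DOT_OMERO on a local disk.
# Hypothetical helper; stop the OMERO server before running it.
link_dot_omero() {
  local repo=$1 local_dir=$2
  if [ ! -d "$local_dir" ]; then
    echo "local .omero directory not found: $local_dir" >&2
    return 1
  fi
  if [ -L "$repo/.omero" ]; then
    rm -- "$repo/.omero"            # replace an existing link
  elif [ -e "$repo/.omero" ]; then
    if [ -e "$repo/.omero.bak" ]; then
      echo "backup already exists: $repo/.omero.bak" >&2
      return 1
    fi
    mv -- "$repo/.omero" "$repo/.omero.bak"   # keep the copy rather than delete it
  fi
  ln -s -- "$local_dir" "$repo/.omero"
}

# Example (paths from this thread), followed by a restart:
# OMERO.server/bin/omero admin stop
# link_dot_omero /cluster_share/tools/imaging/OMERO/ManagedRepository \
#                /mnt/app_hdd/omero/omero_server/ManagedRepository/.omero
# OMERO.server/bin/omero admin start
```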

~Josh.