Upgrade from 4.4 to 5.0

Historical discussions about OMERO. Please look for and ask new questions at https://forum.image.sc/tags/omero

Re: Upgrade from 4.4 to 5.0

Postby ppouchin » Thu Apr 17, 2014 3:28 pm

Hi again,

I've been thinking about this upgrade, and I'm wondering how I should set things up in the future...

Until now, I have used the Dropbox feature in conjunction with Samba & LDAP to let users put their original files on the server. That way, no client was required to upload files, create a backup (useful for scientists wanting to grab their data before they leave the lab) or access the original files (useful for incompatible software).

However, with the managed repository, I might try to do things differently...
Would it be possible to give read-only access (through Samba) to the user's folder in this repository ?

The problem I see is that omero users come from LDAP, but their IDs are appended to their names in the managed repository which makes it hard to map them to a restricted share.
And although usernames should always be unique in our case (since they come from LDAP and are linked to a DN for authentication), it seems impossible to alter "omero.fs.repo.path" to reflect this property.

But maybe I should drop the idea of giving direct access to the original files...



About the images migration, I also wondered...
Would the API function "isFSImage()" really allow me to determine if an image is managed by OMERO 5 or if it was imported before the migration (legacy images) ?
If that's the case, I could upgrade the server (and the workflow) soon, and try to make a set of scripts to copy the annotations to duplicate images and delete processed "legacy" images afterwards.

I could try to match images based on the size and the content (hash ?) instead of the name.
For each user, I would store a table with multiple columns: dimensions, hash, old ids (array ?), new ids (array ?), processed (boolean). Then, I would copy annotations from old ids to new ids, and if everything goes well, mark them as "processed". Finally, I would delete the old images that have been processed.
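
A minimal sketch of the per-user table described above, assuming the hash is computed over the original file on disk (hashing pixel data instead would be a separate design choice). The names `file_sha1`, `MatchRow` and `add_image` are hypothetical and not part of the OMERO API; only the standard library is used.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Dimensions = Tuple[int, int, int, int, int]  # sizeX, sizeY, sizeZ, sizeC, sizeT


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that large images need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class MatchRow:
    """One row of the table: dimensions, hash, old ids, new ids, processed."""
    dimensions: Dimensions
    content_hash: str
    old_ids: List[int] = field(default_factory=list)
    new_ids: List[int] = field(default_factory=list)
    processed: bool = False


def add_image(table: Dict[Tuple[Dimensions, str], MatchRow],
              dimensions: Dimensions, content_hash: str,
              image_id: int, is_legacy: bool) -> MatchRow:
    """Record an image id under its (dimensions, hash) key."""
    key = (dimensions, content_hash)
    row = table.setdefault(key, MatchRow(dimensions, content_hash))
    (row.old_ids if is_legacy else row.new_ids).append(image_id)
    return row
```

A row would only be marked `processed` once the annotations have been copied from every id in `old_ids` to the ids in `new_ids`; deleting the old images would then be a separate pass over processed rows.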

Or maybe I should try to make a more general script to "merge" annotations from duplicate images (based on hash), and then another script to just delete legacy images...

Would that approach work ?
ppouchin
 
Posts: 98
Joined: Thu Dec 02, 2010 2:08 pm

Re: Upgrade from 4.4 to 5.0

Postby mtbc » Fri Apr 18, 2014 9:16 am

Thank you for your interesting and thoughtful questions. It does help us to know how users deploy and use OMERO, and how they would like to.

With regard to mapping users' managed repository directories, perhaps http://trac.openmicroscopy.org.uk/ome/ticket/12160 is relevant: I would be happy to add you to the cc: on that ticket should you wish.

The code for `isFSImage()` looks good to me: I think you can trust it. In terms of merging images based on hash, it would be necessary to be aware of groups too, perhaps avoiding merging images if they are in different groups, though you could move the set of annotations all into the same group in cases where the group permissions actually allow those annotations (e.g., if from other users).

Unfortunately many people have now disappeared for Easter: it could well be into next week before you get a good answer, but in the meantime I hope I helped a little.

Cheers,
Mark
mtbc
Team Member
 
Posts: 282
Joined: Tue Oct 23, 2012 10:59 am
Location: Dundee, Scotland

Re: Upgrade from 4.4 to 5.0

Postby mtbc » Fri Apr 18, 2014 9:22 am

Thinking further, there are images that get into the system in ways other than import: e.g., via the script that generates kymographs. They might not be "FS images" exactly: even if generated under 5.0, they may look like legacy pre-FS images. I don't know. If such images are something you have to deal with, it would probably be worth including them in your migration testing to make sure they are handled appropriately.

Cheers,
Mark
mtbc
Team Member
 
Posts: 282
Joined: Tue Oct 23, 2012 10:59 am
Location: Dundee, Scotland

Re: Upgrade from 4.4 to 5.0

Postby ppouchin » Fri Apr 18, 2014 10:51 am

Ok. Thank you for your advice.

I will try to be careful with group permissions... Few users belong to more than 1 group, but I should indeed process images user by user, group by group, size by size, hash by hash (and try to make sure a collision means a match through metadata, or pixel values...).
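
The "user by user, group by group, size by size, hash by hash" ordering can be sketched as a single composite key, so that images in different groups never land in the same bucket. This is only an illustration on plain Python records: `ImageInfo` and `candidate_matches` are hypothetical names, and filling in the fields from a real server is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class ImageInfo(NamedTuple):
    """Hypothetical summary of one image, gathered beforehand."""
    image_id: int
    owner: str
    group: str
    dimensions: Tuple[int, ...]
    content_hash: str
    has_fileset: bool  # False for legacy (pre-FS) images


def candidate_matches(images: Iterable[ImageInfo]) -> Dict[tuple, Dict[str, List[int]]]:
    """Bucket images by (owner, group, dimensions, hash).

    Only buckets holding at least one legacy image and at least one
    FS image are returned; those are the candidates for copying
    annotations from the old image to the new one.
    """
    buckets = defaultdict(lambda: {"legacy": [], "fs": []})
    for img in images:
        key = (img.owner, img.group, img.dimensions, img.content_hash)
        side = "fs" if img.has_fileset else "legacy"
        buckets[key][side].append(img.image_id)
    return {k: v for k, v in buckets.items() if v["legacy"] and v["fs"]}
```

A bucket is only a candidate: as noted above, a hash collision should still be confirmed through metadata or pixel values before any annotation is copied.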

And although I don't think anyone here would have images that were not imported, I should try to find a mechanism to mark "non-FS images" as processed if their annotations have been copied to "FS images"...
ppouchin
 
Posts: 98
Joined: Thu Dec 02, 2010 2:08 pm

Re: Upgrade from 4.4 to 5.0

Postby ppouchin » Tue Apr 22, 2014 8:36 am

Oh... And one last question (or two) about the managed repository: if I ever changed the repository path, nothing would change for the images that have already been imported, right ? So, another script would be needed to move already imported files ?

Also, if you need the OMERO user ID to make sure the path is unique, "username/userID" should work too, shouldn't it... ?
ppouchin
 
Posts: 98
Joined: Thu Dec 02, 2010 2:08 pm

Re: Upgrade from 4.4 to 5.0

Postby jmoore » Wed Apr 23, 2014 12:42 pm

Just catching up with this thread...

ppouchin wrote:Would it be possible to give read-only access (through Samba) to the user's folder in this repository ?
...
The problem I see is that omero users come from LDAP, but their IDs are appended to their names in the managed repository which makes it hard to map them to a restricted share.

Definitely an interesting idea, and the goal is to have the ticket Mark mentioned (12160) fixed for 5.0.2 if possible. If things pan out, would you be up for sharing examples of your smb.conf for others to mimic?

ppouchin wrote:But maybe I should drop the idea of giving direct access to the original files...

Nothing in the design prevents read-only access, but we've been careful in making suggestions on doing so across the board. Looking forward to your suggestions!


ppouchin wrote:About the images migration, I also wondered...
Would the API function "isFSImage()" really allow me to determine if an image is managed by OMERO 5 or if it was imported before the migration (legacy images) ?

The key definer for a post-FS as opposed to a pre-FS image is whether or not it has a fileset. This is just what ticket 12176 is about: writing a script to generate a fileset for one or more images. The Pixels file for the pre-FS images would need to also be cleaned up / archived.
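
The fileset test can be sketched as follows. This assumes image objects that expose a `getFileset()` method returning `None` when there is no fileset, as the Python BlitzGateway image wrapper is believed to do; please check that against your OMERO version. `partition_by_fileset` is a hypothetical helper name.

```python
from typing import Any, Iterable, List, Tuple


def partition_by_fileset(images: Iterable[Any]) -> Tuple[List[Any], List[Any]]:
    """Split images into (pre_fs, post_fs) according to whether they have a fileset.

    Assumption: each image has getFileset(), which returns None when
    the image has no fileset.
    """
    pre_fs, post_fs = [], []
    for image in images:
        if image.getFileset() is None:
            pre_fs.append(image)
        else:
            post_fs.append(image)
    return pre_fs, post_fs
```

Images created by scripts rather than by import (the kymograph case mentioned earlier in the thread) may also end up in the first list, so that list should not be read as "imported before OMERO 5" without checking.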

ppouchin wrote:If that's the case, I could upgrade the server (and the workflow) soon, and try to make a set of scripts to copy the annotations to duplicate images and delete processed "legacy" images afterwards.

I could try to match images based on the size and the content (hash ?) instead of the name.
For each user, I would store a table with multiple columns: dimensions, hash, old ids (array ?), new ids (array ?), processed (boolean). Then, I would copy annotations from old ids to new ids, and if everything goes well, mark them as "processed". Finally, I would delete the old images that have been processed.

Or maybe I should try to make a more general script to "merge" annotations from duplicate images (based on hash), and then another script to just delete legacy images...

Would that approach work ?


Where are the "duplicate images" coming from?



ppouchin wrote:Oh... And one last question (or two) about the managed repository: if I ever changed the repository path, nothing would change for the images that have already been imported, right ? So, another script would be needed to move already imported files ?

The path template restriction should only be for new imports. After that, the files should be findable in their original location, even if you change the template. If you mean moving the ManagedDirectory entirely, then no, that should at most need a single modification to the DB.

ppouchin wrote:Also, if you need the OMERO user ID to make sure the path is unique, "username/userID" should work too, shouldn't it... ?

Yes. The initial implementation was meant to be something surefire but was never made flexible enough.

Cheers,
~Josh.
jmoore
Site Admin
 
Posts: 1591
Joined: Fri May 22, 2009 1:29 pm
Location: Germany

Re: Upgrade from 4.4 to 5.0

Postby ppouchin » Wed Apr 23, 2014 5:04 pm

jmoore wrote:If things pan out, would you be up for sharing examples of your smb.conf for others to mimic?


Of course! I'm always willing to share whatever I get to work...
I think that if I can use "%user%" as the root of the repo path, I could use something like this to give users read-only access to the data:
[omero-images]
        browseable = yes
        read only = yes
        path = /home/omero/OMERO.data/ManagedRepository/%u
        comment = Managed repository

This should be relatively safe and should work as long as the ManagedRepository is world-readable...

jmoore wrote:Nothing in the design prevents read-only access, but we've been careful in making suggestions on doing so across the board. Looking forward to your suggestions!

Indeed, access rights can lead to complicated problems... In our lab, I'm not afraid of biologists logging on the server and copying data from others (they can do so on the microscope anyway...) and so I will keep the managed repository as world readable, but I suppose that if admins want to restrict the access, they have to change that... Which denies access to any user who does not belong to the "omero" group (running the server)...

jmoore wrote:The key definer for a post-FS as opposed to a pre-FS image is whether or not it has a fileset. This is just what ticket 12176 is about: writing a script to generate a fileset for one or more images. The Pixels file for the pre-FS images would need to also be cleaned up / archived.

jmoore wrote:Where are the "duplicate images" coming from?


Well, I think the simplest way to get our old images back into OMERO 5 (and stop data duplication) would be to re-import the images altogether, since they were stored outside OMERO, in the "Dropbox" folder anyway.
So at one point, a given image should be present twice in OMERO: the annotated image (that was imported before OMERO 5) and the newly imported image.

Earlier in the thread, I mentioned that I could use the image names to detect the location of the file to import, but then I wondered if it would be enough, as I'm not sure that users kept those names...
And since sometimes users do import the same image twice, I thought a script to batch-copy annotations to identical images (content-wise) could be a nice option.

But I reckon I don't know how I will import 2.5 TB of data. Probably user by user, but some of them do have several hundreds of GB of images. So reading the name of an old image, triggering a new import of the file and copying the annotations before deleting the image may still be the best way to go.

Either I find a way to re-import all the images (without making OMERO crash) and make a script to copy annotations from old images (and delete them), or I try to process each image one by one, but with the chance of "losing" some of them if they were renamed...


jmoore wrote:The path template restriction should only be for new imports. After that, the files should be findable in their original location, even if you change the template. If you mean moving the ManagedDirectory entirely, then no, that should at most need a single modification to the DB.


Ok, so if I change the template and want to apply it to old imports, can I copy/move the files to the new "repo.path" and use something like "setPathToFile" or "setParentFilePath", or would it be more complicated ? (to move a file from "%user%_%userId%" to "%user%/%userId%" for example)
ppouchin
 
Posts: 98
Joined: Thu Dec 02, 2010 2:08 pm

Re: Upgrade from 4.4 to 5.0

Postby jmoore » Thu Apr 24, 2014 10:49 pm

ppouchin wrote:
jmoore wrote:Where are the "duplicate images" coming from?


Well, I think the simplest way to get our old images back into OMERO 5 (and stop data duplication) would be to re-import the images altogether, since they were stored outside OMERO, in the "Dropbox" folder anyway.
So at one point, a given image should be present twice in OMERO: the annotated image (that was imported before OMERO 5) and the newly imported image.


Ah, I see. So there's certainly nothing stopping this. I would guess the real determiner is how much metadata has been attached to the image post import. The strategy we were going to investigate would be to essentially ask the server for a location (/OMERO/ManagedRepository/$user/..../$timestamp) and then upload the files there. Call it a "retro-active import".

ppouchin wrote:Earlier in the thread, I mentioned that I could use the image names to detect the location of the file to import, but then I wondered if it would be enough, as I'm not sure that users kept those names...
And since sometimes users do import the same image twice, I thought a script to batch-copy annotations to identical images (content-wise) could be a nice option.

That is an interesting twist that I hadn't thought about. But of course, multiple retro-active imports would also work. The real question there would be how long it takes us to come up with a template for how to do those. (ticket 12176)

ppouchin wrote:But I reckon I don't know how I will import 2.5 TB of data. Probably user by user, but some of them do have several hundreds of GB of images. So reading the name of an old image, triggering a new import of the file and copying the annotations before deleting the image may still be the best way to go.

In-place import is likely a solution to (at least a part of) the space issue. If it's feasible, it would help in either the duplicate or the retro version of this.


ppouchin wrote:
jmoore wrote:The path template restriction should only be for new imports. After that, the files should be findable in their original location, even if you change the template. If you mean moving the ManagedDirectory entirely, then no, that should at most need a single modification to the DB.


Ok, so if I change the template and want to apply it to old imports, can I copy/move the files to the new "repo.path" and use something like "setPathToFile" or "setParentFilePath", or would it be more complicated ? (to move a file from "%user%_%userId%" to "%user%/%userId%" for example)


Moving existing imports would be more complicated. I'll try to keep you posted on how much more complicated, ASAP. (ticket 12160)
~J.
jmoore
Site Admin
 
Posts: 1591
Joined: Fri May 22, 2009 1:29 pm
Location: Germany

Re: Upgrade from 4.4 to 5.0

Postby ppouchin » Fri Jun 13, 2014 1:33 pm

Hi,

I really need to go forward with all this, but I'm dealing with other things for the moment... and the 4.4 data is not going to disappear anytime soon...

jmoore wrote:
ppouchin wrote:
jmoore wrote:The path template restriction should only be for new imports. After that, the files should be findable in their original location, even if you change the template. If you mean moving the ManagedDirectory entirely, then no, that should at most need a single modification to the DB.


Ok, so if I change the template and want to apply it to old imports, can I copy/move the files to the new "repo.path" and use something like "setPathToFile" or "setParentFilePath", or would it be more complicated ? (to move a file from "%user%_%userId%" to "%user%/%userId%" for example)


Moving existing imports would be more complicated. I'll try to keep you posted on how much more complicated, ASAP. (ticket 12160)
~J.



I installed OMERO 5.0.2 when you released it, and it's nice to be able to change the template path so easily.

I've set it to:
%user%//%year%-%month%/%day%/%time%


Therefore, since my OMERO users are from my LDAP server (and are unique), I'm able to use Samba to easily give access to the ManagedRepository:
[omero]
        browseable = no
        read only = yes
        path = /data/omero/OMERO.data/ManagedRepository/%u
        comment = Managed repository


I don't use the user's home folder for this, because it's the "Dropbox" (although I've set it to read-only until I can migrate the old images to OMERO 5).
In the future, I think I will re-use the Dropbox feature for in-place imports with "ln_rm".
That way, scientists will be able to push their images quickly (or automatically) using a "drag'n'drop" on the microscope, but will still have the possibility to access them "read-only" after import, without any risk regarding overwriting or deleting original files...


However, between 5.0.1 and 5.0.2, some images were imported, and I'd like to move them to the new template path (call it OCD). All I need to do is to tell OMERO that there is no more "userId" in their path.
Would it be OK if I changed the values in the database (apparently: "path" in the "pixels" table and "templateprefix" in the "fileset" table) ?
Or is there a less barbaric way to do that?
ppouchin
 
Posts: 98
Joined: Thu Dec 02, 2010 2:08 pm

Re: Upgrade from 4.4 to 5.0

Postby mtbc » Fri Jun 13, 2014 7:18 pm

I'm not recommending it! But, definitely also the "path" column in the "originalfile" table.
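
For anyone who still wants to see what such a change would involve, here is a dry-run sketch that only computes the rewritten strings and touches neither the database nor the files on disk. It assumes the stored values are relative paths beginning with the "user_userId" directory, which should be checked against the actual rows first. `strip_user_id` and `plan_updates` are hypothetical helpers, and the user name and id in the usage note are made up.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, int, str]        # (table, column, row id, stored path)
Change = Tuple[str, str, int, str, str]  # same, plus the proposed new path


def strip_user_id(path: str, user: str, user_id: int) -> str:
    """Rewrite a leading 'user_userId/' directory to 'user/'.

    Paths that do not start with that exact prefix are returned unchanged.
    """
    old_prefix = f"{user}_{user_id}/"
    if path.startswith(old_prefix):
        return f"{user}/" + path[len(old_prefix):]
    return path


def plan_updates(rows: Iterable[Row], user: str, user_id: int) -> List[Change]:
    """List the rows whose path would change, with old and new values."""
    plan = []
    for table, column, row_id, path in rows:
        new_path = strip_user_id(path, user, user_id)
        if new_path != path:
            plan.append((table, column, row_id, path, new_path))
    return plan
```

For example, `strip_user_id("alice_52/2014-06/13/img.tif", "alice", 52)` gives `"alice/2014-06/13/img.tif"`. The rows fed to `plan_updates` would come from the three columns named in this thread ("path" in "pixels", "templateprefix" in "fileset", "path" in "originalfile"), and reviewing the resulting plan before changing anything seems the safer order of operations.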
mtbc
Team Member
 
Posts: 282
Joined: Tue Oct 23, 2012 10:59 am
Location: Dundee, Scotland
