OMERO saves and lists files using the import time stamp

General user discussion about using the OMERO platform to its fullest. Please ask new questions at https://forum.image.sc/tags/omero

There are workflow guides for various OMERO functions on our help site - http://help.openmicroscopy.org

You should find answers to any basic questions about using the clients there.

Postby stefanm » Wed Jun 22, 2016 1:54 pm

Hello,

we are testing OMERO 5.2.4 at the moment. What I noticed is that image files imported using OMERO.insight are saved in the repository with a new change/modification date identical to the time when the files were imported (not acquired!). The path within the managed repository also reflects that point in time. Of course, the acquisition time stamp is retained in the metadata, at least for the data formats that actually hold an acquisition time stamp. However, in a lab book all that matters is the acquisition time.

An example: I imported a file acquired on 2012-03-08 19:38:28. The change/modification date of the imported file in the repository was set to 2016-06-15 15:56:40.636 and consequently the path within the repository was

"ManagedRepository/stefanm/2016-06/15/15-56-40.636"

Moreover downloading the file later generates another change/modification date reflecting the export time.

When I discussed the behaviour with scientists over here, they simply said it's a show stopper. The main source for searching for data is their lab book, and there the acquisition date is critical. They also expect the file modification date to reflect that time point. This matters even more when they leave the lab and take their data with them on a disk (which we would first need to export from OMERO).

I suspect that OMERO.fs using the native file system directly could be one solution. But I also think that in the managed repository the file modification date should not change as long as the data/image file was unchanged.

I know that importing files via OMERO.insight is, strictly speaking, not the same as copying an image in a file system (where the modification time is normally preserved). On the other hand, if we want to preserve the modification time at the moment, we need to keep a copy of the original data, which destroys the big advantage that OMERO5 now handles the native data format directly.

Moreover, we were considering importing a lot of old data (several thousand files, some more than 4 years old) into OMERO. All the images would be time-stamped with their import date, which, at least for us, makes little or no sense.

Am I missing something, or could that be solved in a future version of OMERO?

Best regards
Stefan

Re: OMERO saves and lists files using the import time stamp

Postby jmoore » Thu Jun 23, 2016 9:29 am

stefanm wrote:
Hello,


Hi Stefan,

we are testing OMERO 5.2.4 at the moment. What I noticed is that image files imported using OMERO.insight are saved in the repository with a new change/modification date identical to the time when the files were imported (not acquired!). The path within the managed repository also reflects that point in time. Of course, the acquisition time stamp is retained in the metadata, at least for the data formats that actually hold an acquisition time stamp. However, in a lab book all that matters is the acquisition time.

An example: I imported a file acquired on 2012-03-08 19:38:28. The change/modification date of the imported file in the repository was set to 2016-06-15 15:56:40.636 and consequently the path within the repository was

"ManagedRepository/stefanm/2016-06/15/15-56-40.636"



This has both a historical and a social component. Earlier versions of Java didn't provide a method for setting the modification time, and so, to some extent, we didn't pursue options or feedback around timestamps. It's now possible, so that's certainly something we will need to do. See https://trello.com/c/WN1Ihwhf/107-preserve-file-acquisition-times. Thanks for getting this started.

On the other hand, there's the question of trust and intent. Even a tool like rsync doesn't copy the modification time automatically; an extra option is required for that. This reflects that the operation is fundamentally a copy, as is `bin/omero import`. My assumption, though, is that we could provide functionality similar to `rsync -a`. (I can imagine that some limitations may need to be put in place; for example, system administrators may not want users to have complete freedom with regard to file provenance.)
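The difference between a plain copy and an `rsync -a`-style copy is easy to see with Python's standard library: `shutil.copy` copies the data and permission bits, so the new file gets the current time as its modification time, while `shutil.copy2` also copies timestamps. A minimal sketch, assuming a local POSIX-like filesystem; the file names and the 2012 timestamp (the acquisition time from the example above, taken as UTC) are illustrative only:

```python
import os
import shutil
import tempfile
from typing import Tuple

# Illustrative "acquisition" time from the example above: 2012-03-08 19:38:28 UTC.
ACQUIRED = 1331235508


def copy_both_ways(workdir: str) -> Tuple[int, int, int]:
    """Return the mtimes of a source file, a plain copy and a preserving copy."""
    src = os.path.join(workdir, "image.tif")
    with open(src, "wb") as fh:
        fh.write(b"pixel data")
    # Pretend the file was last modified when it was acquired.
    os.utime(src, (ACQUIRED, ACQUIRED))

    # Plain copy: data and permission bits only, so mtime becomes "now" (like `cp`).
    plain = shutil.copy(src, os.path.join(workdir, "plain.tif"))
    # Preserving copy: also copies timestamps (roughly like `cp -p` or `rsync -a`).
    kept = shutil.copy2(src, os.path.join(workdir, "kept.tif"))

    return (
        int(os.stat(src).st_mtime),
        int(os.stat(plain).st_mtime),
        int(os.stat(kept).st_mtime),
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(copy_both_ways(tmp))
```

An import option equivalent to "copy2 instead of copy" is essentially what an `rsync -a`-like flag would amount to on the server side.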


Moreover downloading the file later generates another change/modification date reflecting the export time.


When downloading from the CLI and/or Java, we could similarly attempt to set the value if the appropriate flag has been set, but I don't know whether this will be possible from the web (see https://bugzilla.mozilla.org/show_bug.cgi?id=178506).
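On the client side, "set the value if the appropriate flag has been set" amounts to one extra call after the bytes have been written. A minimal Python sketch; `save_download`, its `preserve_mtime` flag and the assumption that the server sends the original modification time along with the file are all hypothetical, not part of any existing OMERO API:

```python
import os
import tempfile
from typing import Optional


def save_download(dest: str, data: bytes, original_mtime: Optional[float],
                  preserve_mtime: bool = False) -> None:
    """Write downloaded bytes, optionally restoring a recorded modification time.

    Hypothetical helper: `original_mtime` stands in for a value a server would
    have to send alongside the file; `preserve_mtime` is the opt-in flag.
    """
    with open(dest, "wb") as fh:
        fh.write(data)
    # Without the flag the file simply keeps the download time.
    if preserve_mtime and original_mtime is not None:
        # os.utime takes (access time, modification time) in epoch seconds.
        os.utime(dest, (original_mtime, original_mtime))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image.tif")
        # 1331235508 is 2012-03-08 19:38:28 UTC, the example acquisition time.
        save_download(path, b"pixel data", 1331235508.0, preserve_mtime=True)
        print(os.stat(path).st_mtime)
```

Keeping the flag off by default matches the current behaviour, so only users (or admins) who opt in get restamped files.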


When I discussed the behaviour with scientists over here, they simply said it's a show stopper. The main source for searching for data is their lab book, and there the acquisition date is critical. They also expect the file modification date to reflect that time point. This matters even more when they leave the lab and take their data with them on a disk (which we would first need to export from OMERO).


That being the case, I wonder whether we would be better advised to record the modification time in the database itself, so that it's queryable, etc. Even if OMERO properly implements a save-modification-time flag, there's always the possibility that the values will be lost when migrating between servers.
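Recording the original modification time at import makes it queryable no matter what later happens to the file on disk. A minimal sketch using Python's built-in `sqlite3`; the `imported_file` table and its columns are hypothetical and are not OMERO's actual database schema, and the repository path is only illustrative:

```python
import os
import sqlite3
import tempfile
import time
from datetime import datetime, timezone
from typing import List

# Hypothetical table: NOT OMERO's schema, just the idea of a queryable column.
SCHEMA = """
CREATE TABLE IF NOT EXISTS imported_file (
    repo_path      TEXT PRIMARY KEY,
    original_mtime REAL NOT NULL,  -- epoch seconds, read before import
    imported_at    REAL NOT NULL   -- epoch seconds, time of import
)
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_import(conn: sqlite3.Connection, src: str, repo_path: str) -> None:
    """Store the source file's mtime before it is copied into the repository."""
    conn.execute(
        "INSERT OR REPLACE INTO imported_file VALUES (?, ?, ?)",
        (repo_path, os.stat(src).st_mtime, time.time()),
    )
    conn.commit()


def modified_between(conn: sqlite3.Connection,
                     start: datetime, end: datetime) -> List[str]:
    """Look files up by original mtime, e.g. to match a lab-book date."""
    rows = conn.execute(
        "SELECT repo_path FROM imported_file"
        " WHERE original_mtime >= ? AND original_mtime < ?"
        " ORDER BY repo_path",
        (start.timestamp(), end.timestamp()),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "image.tif")
        open(src, "wb").close()
        os.utime(src, (1331235508, 1331235508))  # 2012-03-08 19:38:28 UTC
        conn = open_catalog()
        record_import(conn, src, "ManagedRepository/stefanm/2016-06/15/15-56-40.636/image.tif")
        day = datetime(2012, 3, 8, tzinfo=timezone.utc)
        print(modified_between(conn, day, datetime(2012, 3, 9, tzinfo=timezone.utc)))
```

Because the value lives in a database row, it survives server migrations and downloads that would otherwise reset the filesystem timestamp.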

I suspect that OMERO.fs using the native file system directly could be one solution. But I also think that in the managed repository the file modification date should not change as long as the data/image file was unchanged.


At the moment, OMERO.fs does provide a (partial) workaround: in-place import. It doesn't cover all the regular import workflows, but if you can use the `ln` (hardlink) or `ln_s` (symlink) options, then the original modification times will be preserved.
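The reason linking preserves the timestamp: a hard link is a second name for the same inode, so it shares the original's metadata, and `stat()` on a symbolic link follows it to the target. (These options correspond to the `--transfer=ln` and `--transfer=ln_s` settings of `bin/omero import`; the in-place import documentation for your version is authoritative.) A minimal Python sketch on a POSIX filesystem, with illustrative file names and timestamp:

```python
import os
import tempfile
from typing import Dict

ACQUIRED = 1331235508  # illustrative: 2012-03-08 19:38:28 UTC


def link_mtimes(workdir: str) -> Dict[str, int]:
    """Create a hard link and a symlink to a file and report what stat() sees."""
    original = os.path.join(workdir, "original.tif")
    with open(original, "wb") as fh:
        fh.write(b"pixel data")
    os.utime(original, (ACQUIRED, ACQUIRED))

    hard = os.path.join(workdir, "hardlink.tif")
    soft = os.path.join(workdir, "symlink.tif")
    os.link(original, hard)     # second name for the same inode, same metadata
    os.symlink(original, soft)  # separate small file pointing at the original

    return {
        "hardlink": int(os.stat(hard).st_mtime),
        "symlink (followed)": int(os.stat(soft).st_mtime),
        "symlink (itself)": int(os.lstat(soft).st_mtime),  # when the link was made
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, mtime in link_mtimes(tmp).items():
            print(f"{name:20} {mtime}")
```

The flip side, as noted later in the thread, is that the repository entry and the original are then the same bytes (or point at them), so both views of the data share one timestamp and one fate.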

I know that importing files via OMERO.insight is, strictly speaking, not the same as copying an image in a file system (where the modification time is normally preserved).


What OS / filesystem do you have in mind? I typically understand a copy (i.e. `cp`) to produce a new timestamp.

On the other hand, if we want to preserve the modification time at the moment, we need to keep a copy of the original data, which destroys the big advantage that OMERO5 now handles the native data format directly.


Agreed. I definitely see that "bin/omero import X && rm -rf X" currently poses a loss of information for you, and it's very much worth finding a way for that not to happen.

Moreover, we were considering importing a lot of old data (several thousand files, some more than 4 years old) into OMERO. All the images would be time-stamped with their import date, which, at least for us, makes little or no sense.


For this kind of bulk operation, I'd definitely suggest looking into in-place import which may solve multiple problems (storage, timestamps, etc). And then for newer acquisitions, the hopefully short period of time between acquisition and import to OMERO would be less of an issue.

Am I missing something, or could that be solved in a future version of OMERO?

Best regards
Stefan


Let's hope so!
Cheers,
~Josh.

Re: OMERO saves and lists files using the import time stamp

Postby stefanm » Tue Jul 05, 2016 1:17 pm

jmoore wrote:
Hi Stefan,


Hi Josh,

Sorry it took me some time to get back to you after your very timely response.


This has both a historical and a social component. Earlier versions of Java didn't provide a method for setting the modification time, and so, to some extent, we didn't pursue options or feedback around timestamps. It's now possible, so that's certainly something we will need to do. See https://trello.com/c/WN1Ihwhf/107-preserve-file-acquisition-times. Thanks for getting this started.

On the other hand, there's the question of trust and intent. Even a tool like rsync doesn't copy the modification time automatically; an extra option is required for that. This reflects that the operation is fundamentally a copy, as is `bin/omero import`. My assumption, though, is that we could provide functionality similar to `rsync -a`. (I can imagine that some limitations may need to be put in place; for example, system administrators may not want users to have complete freedom with regard to file provenance.)

A flag would be fine, something that admins could control, probably on a per-user basis, and maybe user-configurable.



When downloading from the CLI and/or Java, we could similarly attempt to set the value if the appropriate flag has been set, but I don't know whether this will be possible from the web (see https://bugzilla.mozilla.org/show_bug.cgi?id=178506).


Not directly, but offering zipped archives as an option for (mass) download could preserve the correct modification date.
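ZIP archives store a modification date for every entry, which is what would let an archive carry the original timestamps through a browser download. Two caveats: the stored value is a local time with two-second resolution, and Python's `zipfile` does not reapply it on extraction, so the unpacking side has to do that itself (common tools such as Info-ZIP's `unzip` do this by default). A minimal sketch with illustrative names and timestamp:

```python
import os
import tempfile
import time
import zipfile

ACQUIRED = 1331235508  # illustrative: 2012-03-08 19:38:28 UTC (ZIP needs even seconds)


def zip_round_trip(workdir: str) -> int:
    """Archive a file, extract it again and restore the date stored in the archive."""
    src = os.path.join(workdir, "image.tif")
    with open(src, "wb") as fh:
        fh.write(b"pixel data")
    os.utime(src, (ACQUIRED, ACQUIRED))

    archive = os.path.join(workdir, "download.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        # write() stores the file's mtime, as local time, in the entry header.
        zf.write(src, arcname="image.tif")

    with zipfile.ZipFile(archive) as zf:
        info = zf.getinfo("image.tif")
        path = zf.extract(info, os.path.join(workdir, "extracted"))

    # extract() leaves the extraction time in place, so reapply the stored date.
    # date_time is a 6-tuple; pad it to a 9-tuple with isdst=-1 for mktime().
    stamp = time.mktime(info.date_time + (0, 0, -1))
    os.utime(path, (stamp, stamp))
    return int(os.stat(path).st_mtime)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(zip_round_trip(tmp) == ACQUIRED)
```

Because ZIP dates are local time without a zone, archives created and unpacked in different time zones can shift by the zone offset; the database-side record discussed above avoids that ambiguity.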

That being the case, I wonder whether we would be better advised to record the modification time in the database itself, so that it's queryable, etc. Even if OMERO properly implements a save-modification-time flag, there's always the possibility that the values will be lost when migrating between servers.


That would be another option and probably a route that would be desirable, perhaps in addition to preserving the modification time as long as possible.


At the moment, OMERO.fs does provide a (partial) workaround: in-place import. It doesn't cover all the regular import workflows, but if you can use the `ln` (hardlink) or `ln_s` (symlink) options, then the original modification times will be preserved.


I assumed that it would, but that could lead to all sorts of inconsistencies, such as identical data popping up under different dates.

stefanm wrote:
I know that importing files via OMERO.insight is, strictly speaking, not the same as copying an image in a file system (where the modification time is normally preserved).


What OS / filesystem do you have in mind? I typically understand a copy (i.e. `cp`) to produce a new timestamp.


Correct for Linux/Unix (and I am so accustomed to using "-p" with cp that I almost forgot that feature), but Windows and macOS do carry the modification date over to the copied file (although the creation date may be updated). I tend to think that the modification date should only change when the file content changes, not when the file is duplicated at a new location. In fact, even "mv" under Linux does a copy (preserving the modification date) followed by an unlink when files are moved across file system boundaries.


Agreed. I definitely see that "bin/omero import X && rm -rf X" currently poses a loss of information for you, and it's very much worth finding a way for that not to happen.


That's good to know! If I can be of any help, let me know.

For this kind of bulk operation, I'd definitely suggest looking into in-place import which may solve multiple problems (storage, timestamps, etc). And then for newer acquisitions, the hopefully short period of time between acquisition and import to OMERO would be less of an issue.


Thanks for taking the time to respond and again for the hint above.

Best regards,
Stefan