Reporting on Disk Usage within the Binary Repository

Postby davemason » Wed Apr 02, 2014 1:16 pm

We hope to soon deploy OMERO 5 for a multi-user facility, and will likely be charging users based on storage space used (or at least keeping an eye on space used, for quota management).

I'm interested in whether and how people report on space used by individual users, and furthermore whether it's possible to change [omero.fs.repo.path] to include a %GroupID% as its first variable. Something like:

Code:
omero.fs.repo.path=%GroupID%/%user%_%userId%/%year%-%month%/%day%/%time%


That way I could just poll the group's usage from the root of a directory tree in Linux with:
Code:
du --max-depth=1 /OMERO.data/ManagedRepository
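
A slightly fuller sketch of the same idea, assuming GNU du and sort: it lists each first-level directory under a root with its size in MiB, largest first. The function name `usage_by_top_dir` is made up for illustration and is not part of OMERO.

```shell
#!/bin/bash
# Print "size_in_MiB<TAB>name" for each directory directly under the given
# root, largest first. Assumes GNU du and sort.
usage_by_top_dir() {
    local root="$1"
    # -mindepth/-maxdepth 1 selects only the first-level directories
    find "$root" -mindepth 1 -maxdepth 1 -type d -print0 |
        while IFS= read -r -d '' dir; do
            # du -s gives one total per directory; --block-size=1M reports MiB
            size="$(du -s --block-size=1M "$dir" | cut -f1)"
            printf '%s\t%s\n' "$size" "$(basename "$dir")"
        done | sort -k1,1nr -k2,2
}
```

It would be called as `usage_by_top_dir /OMERO.data/ManagedRepository`, using the path from the command above. What each first-level directory represents depends on the first component of omero.fs.repo.path.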


Any thoughts appreciated,

Dave
davemason
 
Posts: 47
Joined: Thu Mar 06, 2014 3:00 pm
Location: Liverpool, UK

Re: Reporting on Disk Usage within the Binary Repository

Postby jmoore » Wed Apr 02, 2014 8:01 pm

Hi Dave,

we're also interested in hearing about anyone who's begun or plans to implement such reporting. In the meantime, the variable "groupId" can be used in "omero.fs.repo.path", though I don't know of anyone currently using it. Let us know if that gets you what you need, or if there are other variables that would be more useful.

Best wishes,
~Josh.
jmoore
Site Admin
 
Posts: 1591
Joined: Fri May 22, 2009 1:29 pm
Location: Germany

Re: Reporting on Disk Usage within the Binary Repository

Postby davemason » Thu Apr 03, 2014 8:54 am

Cheers Josh,

I'll try implementing the altered path in a virtual machine and let you know. My only concern was the line in the documentation:
the first path component must be %user%_%userId%

From: [https://www.openmicroscopy.org/site/support/omero5/sysadmins/fs-upload-configuration.html?highlight=userid]

That said, the altered path would still resolve to unique upload directories, so I can't think it'd be a problem.

Dave

Re: Reporting on Disk Usage within the Binary Repository

Postby jmoore » Thu Apr 03, 2014 9:02 am

Hi Dave,

my apologies. You're of course right. We added that restriction (visible in ManagedRepositoryI.java) to be as safe as possible. I think at the moment, your only choice would be to put "groupId" UNDER the user directory. We'll look into the impact of removing this restriction.

Sorry for the confusion.
~Josh.

See https://trac.openmicroscopy.org.uk/ome/ticket/12160 for more information.

Re: Reporting on Disk Usage within the Binary Repository

Postby ppouchin » Fri Apr 11, 2014 8:58 am

Hi,

I'm starting to look at these options to work out what I should do when I migrate to OMERO 5, and I was wondering something similar: would it be possible to use only "%user%"?

I know it may not always be unique, but when using LDAP, for example, it should be (since even removing and re-adding a user with the same username would still lead OMERO to assume they're one and the same, based on the DN... forcing the administrator to find a way).
ppouchin
 
Posts: 98
Joined: Thu Dec 02, 2010 2:08 pm

Re: Reporting on Disk Usage within the Binary Repository

Postby jmoore » Fri Apr 11, 2014 9:43 am

At the moment, no. %userId% is added for exactly this reason. A user who is renamed may then end up with two directories:

  • "<old_user_name>_<userId>"
  • "<new_user_name>_<userId>"
but it will work fine.

We'll certainly include the "user/group rename" issue into the upcoming work.
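
For reporting purposes, a renamed user's directories can be merged by keying on the numeric ID after the last underscore. A rough sketch, assuming first-level directories of the form name_userId and GNU du and xargs; the function name `usage_by_user_id` is hypothetical:

```shell
#!/bin/bash
# Sum disk usage (in KiB) per userId across all "<name>_<userId>" directories
# directly under the given root, printing "userId<TAB>KiB" sorted by ID.
usage_by_user_id() {
    local root="$1"
    find "$root" -mindepth 1 -maxdepth 1 -type d -name '*_*' -print0 |
        xargs -0 -r du -s --block-size=1K |
        awk -F'\t' '{
            n = split($2, parts, "_")   # the ID is the text after the last "_"
            id = parts[n]
            if (id ~ /^[0-9]+$/) total[id] += $1
        }
        END { for (id in total) printf "%s\t%d\n", id, total[id] }' |
        sort -n
}
```

With this, two directories that differ only in the user name part but share an ID are reported as a single row.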

Cheers,
~Josh

Re: Reporting on Disk Usage within the Binary Repository

Postby davemason » Fri Apr 11, 2014 11:15 am

By way of an update on the original issue (for anyone who is interested): I've been playing around and have come up with a workable solution for polling disk usage on a per-group level. My test machine is Ubuntu 12.04.4, so YMMV depending upon distro (specifically, the "sort" command needs to be able to sort by version number, i.e. 1,2,3...10,11 instead of the default behaviour of 1,10,11,2,3...).
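
A quick way to check whether the local sort supports this (GNU coreutils spells the option `--version-sort`, or `-V`):

```shell
# Default sort compares text, so "10" and "11" land before "2"
printf '1\n10\n2\n3\n11\n' | sort
# Version sort orders the numbers numerically: 1 2 3 10 11
printf '1\n10\n2\n3\n11\n' | sort --version-sort
```

If the second command fails with an unrecognised option, the sort on that system lacks version sorting.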

The script below relies upon searching for the Group ID in the "omero.fs.repo.path". This should be set to the following (spaces added for clarity):
Code:
%userName%_%userId% / %groupId% / %groupName% / %year%%month%%day% / %time%

Briefly:
- Find the highest group ID in use
- For each group ID (starting at #3 - the first user added group), search for the value at position 2 in the directory structure. Report the sum of all of the folders for that group.
- Update the headers of an existing log file to include all of the current group names (if no log exists, create a new one with headers).
- Append a date stamp and the usage info for each group to the bottom of the log

Because the log is written as a CSV, you can just drop the file into Excel (or similar) and quickly report on monthly usage per group, even if new groups are added mid-month.

Happy to dump this into a git repository if anyone wants to branch the latest. Otherwise this should give you a rough idea:

Code:
#!/bin/bash
# Script to report on daily disk usage from the OMERO binary repository and
# write the output to a log file (groups as columns, days as rows)
# - Dave Mason, University of Liverpool, CCI. April 2014
#
# Requires the binary data repository to be organised with the following root:
# %userName%_%userId% / %groupId% / %groupName% / %year%%month%%day% / %time%

# Set the location of the Binary Data Store:
REPO_DIR=/data/OMERO.data/ManagedRepository

# Where to store the log file:
#LOG_PATH=~/logOmeroUsage.csv
# Consider using a monthly log file:
LOG_PATH=~/$(date +%Y-%m)-logOmeroUsage.csv
# Remember the working directory to come back to at the end
WORK_DIR="$(pwd)"
# Find the highest group ID in use (the second path component); stop if the repository is missing
cd "$REPO_DIR" || exit 1
NUM_GROUPS="$(find . -maxdepth 2|awk -F"/" '{print $3}'|grep -ve repository|sort -u --version-sort|tail -n 1)"

# Get a comma separated list of directory sizes (du's default block units) in order of gID.
# Only directories at depth 2 are matched, so deeper folders that happen to be
# named like a group ID are not counted a second time.
SIZE_VALUES="$(for ((n=3;n<=NUM_GROUPS;n++)); do find . -mindepth 2 -maxdepth 2 -type d -name "$n" -exec du -c '{}' +|tail -n 1|awk '{print $1}' ; done|xargs|sed 's/ /,/g')"

# Update the titles to represent the most up to date group list (if no log exists - create one with titles)
LOG_HEADER="DateStamp,$(find . -mindepth 3 -maxdepth 3|grep -ve repository|awk -F"/" '{OFS="";print $3,"_",$4}'|sort -u --version-sort|xargs|sed 's/ /,/g')"
if [ -f "$LOG_PATH" ];
then
# Have to use a temporary file here as you can't cat the same file into itself.
# The new header goes first and the old header (now line 2) is dropped.
   echo "$LOG_HEADER"|cat - "$LOG_PATH"|awk 'NR!=2'>"$LOG_PATH.temp"
   mv "$LOG_PATH.temp" "$LOG_PATH"
else
   echo "$LOG_HEADER">"$LOG_PATH"
fi

# Date stamp and append to the log
echo "$(date +%Y%m%d-%H%M),$SIZE_VALUES">>"$LOG_PATH"
# Return to the starting directory
cd "$WORK_DIR"
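
As an alternative to a spreadsheet, the growth per group over the period covered by one log file can be read straight from the CSV. A minimal sketch, assuming the log layout written by the script above (one header row, then one row per run); the function name `usage_growth` is made up for illustration:

```shell
#!/bin/bash
# Print "column<TAB>growth" for each group column of a usage log, where growth
# is the last row's value minus the first row's value, in du's block units.
# A group that first appears mid-log is treated as starting from zero.
usage_growth() {
    awk -F',' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; cols = NF; next }
        NR == 2 { for (i = 2; i <= NF; i++) first[i] = $i }
        NR >= 2 { for (i = 2; i <= NF; i++) last[i] = $i }
        END     { for (i = 2; i <= cols; i++) printf "%s\t%d\n", name[i], last[i] - first[i] }
    ' "$1"
}
```

For a monthly log this would be run as, for example, `usage_growth ~/2014-04-logOmeroUsage.csv`.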

Re: Reporting on Disk Usage within the Binary Repository

Postby jmoore » Mon Apr 14, 2014 10:10 am

Hi Dave,

thanks for the script! Seems like it could be of use to others. We'll keep you posted on getting the group_id earlier in the path.

Cheers,
~Josh.

Re: Reporting on Disk Usage within the Binary Repository

Postby wmoore » Mon Apr 14, 2014 10:26 am

Hi Dave,

FYI - there's a PR open for webadmin to display disk usage by group in the pie chart - see https://github.com/openmicroscopy/openm ... /pull/2270

Regards,

Will.
wmoore
Team Member
 
Posts: 674
Joined: Mon May 18, 2009 12:46 pm

Re: Reporting on Disk Usage within the Binary Repository

Postby davemason » Mon Apr 14, 2014 4:14 pm

Will;

Thanks for the link. I had a look and I think it would be a nice feature. For extra brownie points, it would be useful if the data could be downloaded as a file, although without historical data I wonder how useful it would be if you have to log in and manually report every period.

Best,
Dave