
Very large postgres database


Re: Very large postgres database

Postby ingvar » Tue Nov 06, 2012 12:11 pm

Hi Josh,

Made some progress on this. One of my instances is now back to a sensible size. I have started work on the second.

jmoore wrote:
ingvar wrote: Deleting from the event table is hard work. If I try to delete too much I get a timeout. I ended up using statements like:
DELETE FROM event * WHERE id > 2600000 and id < 2800000;
these take 10-15 min each, so I am looking at 18 h or so in total.


Adding indexes on the event columns would likely speed this up. Most of the time is likely spent checking the FK constraints.

Considering that the event table has something like 150 FKs, that sounds likely.
I guess that in theory it might be possible to set up the database to do cascading deletes, but there might be bad side effects.


Cascading deletes on Events would delete your entire DB. :)

That is what I call a bad side effect :)
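
For reference, the indexes Josh suggests would presumably go on the referencing columns of the other tables, i.e. the columns that point at event.id; without them, every delete on event has to scan each referencing table to check the constraint. A minimal sketch (eventlog.event is only an assumed example of such a column - the FK query further down lists the real ones for your schema):
CREATE INDEX eventlog_event_idx ON eventlog (event);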


Finally convinced systems that the checks can be done less frequently than every 3 s, so I should not have to go through this again.

Just in case someone runs into a similar situation:
a. Delete all rows from eventlog (a one-liner, shown after this list) - harmless unless it is used for billing, as Josh mentioned earlier.
b. Prune the event table. Any row that is not referenced by an FK constraint is OK to delete AFAIK. I had problems with timeouts when attempting to delete too much in one go (see the note after this list), so I did this in two stages. First by hand, to get it down to 200000 rows or so, using commands like:
delete from event * where id > 18000000 and id < 19000000;
A delete statement fails if the range contains a row that is still referenced by an FK constraint; in that case I reran the delete with a smaller range.

I then used this query:
SELECT
tc.constraint_name, tc.table_name, kcu.column_name,
ccu.table_name AS foreign_table_name,
ccu.column_name AS foreign_column_name
FROM
information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu ON
tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage AS ccu ON
ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY'
AND ccu.table_name='event'
AND ccu.column_name='id';

which lists all FK constraints referencing the event table. From the output you can set up a delete statement (I show the end result for the session table, as the one for event is very long):
DELETE FROM session WHERE NOT EXISTS (
SELECT NULL FROM count_session_annotationlinks_by_owner WHERE count_session_annotationlinks_by_owner.session_id = session.id UNION ALL
SELECT NULL FROM event WHERE event.session = session.id UNION ALL
SELECT NULL FROM sessionannotationlink WHERE sessionannotationlink.parent = session.id UNION ALL
SELECT NULL FROM share WHERE share.session_id = session.id
);

c. Do the same for the session table. As the session table has fewer FK constraints, this statement can probably be run without initial pruning.

d. Reindex and vacuum each of the three tables, e.g.:
REINDEX TABLE event;
VACUUM FULL event;
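
For step (a), clearing eventlog is a one-liner (the table can then be vacuumed along with the others in step d):
DELETE FROM eventlog;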
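
On the timeouts in step (b): if they are caused by statement_timeout rather than by the client (only a guess about the cause), the limit can be lifted for the session before running the big deletes:
SET statement_timeout = 0;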

The Postgres manual advises against 'VACUUM FULL' in most instances. The situation here is one of the exceptions: I need to reclaim disk space, and I do not expect the db to grow to the same size again.
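
To confirm that the space is actually reclaimed, the on-disk size can be compared before and after with the standard Postgres size functions, e.g. for the event table:
SELECT pg_size_pretty(pg_total_relation_size('event'));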

Cheers,
Ingvar