
Very large postgres database

Postby ingvar » Tue Oct 30, 2012 3:25 pm

Hello,

I recently noticed that my Omero postgres database is growing quite fast, approaching 10 GB. Three tables account for 3.3 GB: event, eventlog, and session. I cannot see where the other 6.5 GB are; is there some internal backup copy? The rest of the tables are all 40 kB or less, so they only add up to 1 MB or so.
Our systems group likes to test that my service is alive every 3 seconds or so, which more or less adds up to the 6 million rows I have in the event table since mid-June.
Is it safe to prune these tables, e.g. remove all rows that are more than a week old? Even better, is there a way to configure this so that these tables are pruned automatically?
I am still on my hybrid 4.3.3 version; the 2nd attempt to migrate did not go too well.

Kind Regards,
Ingvar

Re: Very large postgres database

Postby jmoore » Thu Nov 01, 2012 1:01 pm

Hi Ingvar,

ingvar wrote:
I recently noticed that my Omero postgres database is growing quite fast, approaching 10 GB. Three tables account for 3.3 GB: event, eventlog, and session. I cannot see where the other 6.5 GB are; is there some internal backup copy?

No, there shouldn't be anything OMERO-specific that's taking up space. Can you send me the output of:

Code:
SELECT
    table_name,
    pg_size_pretty(table_size) AS table_size,
    pg_size_pretty(indexes_size) AS indexes_size,
    pg_size_pretty(total_size) AS total_size
FROM (
    SELECT
        table_name,
        pg_table_size(table_name) AS table_size,
        pg_indexes_size(table_name) AS indexes_size,
        pg_total_relation_size(table_name) AS total_size
    FROM (
        SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
        FROM information_schema.tables
    ) AS all_tables WHERE table_name like '%public%'
    ORDER BY total_size DESC
) AS pretty_sizes;


See https://wiki.postgresql.org/wiki/Disk_Usage for more information. You may need vacuuming or similar.

ingvar wrote:
The rest of the tables are all 40 kB or less, so they only add up to 1 MB or so.
Our systems group likes to test that my service is alive every 3 seconds or so,

This sounds extreme...

ingvar wrote:
which more or less adds up to the 6 million rows I have in the event table since mid-June.
Is it safe to prune these tables, e.g. remove all rows that are more than a week old?


Removing (all) EventLogs is always safe, assuming you don't want them for auditing purposes. You may be hard-pressed to remove the sessions and events unless you use a more elaborate query.
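
If it comes to that, something along these lines might do it; the second statement assumes eventlog points at event through an event column, which is worth verifying against your schema first:

Code:
-- drop every EventLog row (fine if you don't need them for auditing)
DELETE FROM eventlog;

-- or keep only the last week (assumes eventlog.event references event.id)
DELETE FROM eventlog
 WHERE event IN (SELECT id FROM event WHERE time < now() - interval '7 days');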

ingvar wrote:
Even better, is there a way to configure this so that these tables are pruned automatically?
I am still on my hybrid 4.3.3 version; the 2nd attempt to migrate did not go too well.


Not at the moment, no.

Cheers,
~Josh

Re: Very large postgres database

Postby ingvar » Thu Nov 01, 2012 2:13 pm

Hi Josh,

I failed to run your query, partly because I am on postgres 8.4 and you used some 9.x functions, but even after changing to functions available in 8.4 I got some error.

Is there an allowed extension for raw text files? I get a "not allowed" message for .txt, .dat, and no extension when I try to upload a file with the output from \dtiv+.

Most of the space that was unaccounted for is in the indices for event, eventlog, and session.

I tried vacuumdb earlier; that had no noticeable effect, but it is likely to be needed after I delete old rows from the event, eventlog, and session tables.

Completely agree about the excessiveness of the checks, but all I got was a "company policy" reply when I queried the wisdom of this.

Anyway, the highlights from the output are:
 Schema | Name                      | Type  | Owner | Table    | Size
--------+---------------------------+-------+-------+----------+---------
 public | event                     | table | omero |          | 1675 MB
 public | event_external_id_key     | index | omero | event    | 512 MB
 public | event_pkey                | index | omero | event    | 408 MB
 public | eventlog                  | table | omero |          | 558 MB
 public | eventlog_action           | index | omero | eventlog | 171 MB
 public | eventlog_entityid         | index | omero | eventlog | 135 MB
 public | eventlog_entitytype       | index | omero | eventlog | 291 MB
 public | eventlog_external_id_key  | index | omero | eventlog | 171 MB
 public | eventlog_pkey             | index | omero | eventlog | 135 MB
 public | session                   | table | omero |          | 1171 MB
 public | session_external_id_key   | index | omero | session  | 209 MB
 public | session_pkey              | index | omero | session  | 216 MB
 public | session_uuid_key          | index | omero | session  | 490 MB

Cheers,
Ingvar

Re: Very large postgres database

Postby jmoore » Thu Nov 01, 2012 2:21 pm

ingvar wrote:
I failed to run your query, partly because I am on postgres 8.4 and you used some 9.x functions, but even after changing to functions available in 8.4 I got some error.


Ah, ok. Sorry about that.
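
For reference, a rough 8.4-friendly variant using only pg_relation_size and pg_total_relation_size (both of which should exist on 8.4) would be something like:

Code:
-- per-table sizes: heap only, everything else (indexes + TOAST), and total
SELECT n.nspname || '.' || c.relname                  AS table_name,
       pg_size_pretty(pg_relation_size(c.oid))        AS table_size,
       pg_size_pretty(pg_total_relation_size(c.oid)
                      - pg_relation_size(c.oid))      AS indexes_and_toast,
       pg_size_pretty(pg_total_relation_size(c.oid))  AS total_size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'public'
   AND c.relkind = 'r'
 ORDER BY pg_total_relation_size(c.oid) DESC;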

ingvar wrote:
Most of the space that was unaccounted for is in the indices for event, eventlog, and session.


Ok, so nothing's missing. Good to know.

ingvar wrote:
I tried vacuumdb earlier; that had no noticeable effect, but it is likely to be needed after I delete the old rows.


Agreed.

ingvar wrote:
Completely agree about the excessiveness of the checks, but all I got was a "company policy" reply when I queried the wisdom of this.


What check are you having them do?

Cheers,
~Josh

Re: Very large postgres database

Postby ingvar » Thu Nov 01, 2012 3:04 pm

All they do is access a page in Omero.web, but why that is needed every 3 seconds for each server, plus once more on the load balancer, is beyond me. Real user accesses are orders of magnitude less frequent at the moment.

After removing the excess rows, I assume that I will need to reindex. Do I reindex before vacuuming or the other way around?

Cheers,
Ingvar

Re: Very large postgres database

Postby jmoore » Thu Nov 01, 2012 3:44 pm

VACUUM ANALYZE and then optionally REINDEX seems to be the general suggestion, though I'm not sure the order is important.
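
Roughly, something like this on the tables you trimmed; note that plain VACUUM only marks the space for reuse, while VACUUM FULL actually shrinks the files but takes an exclusive lock:

Code:
-- reclaim dead-row space and refresh planner statistics
VACUUM ANALYZE event;
VACUUM ANALYZE eventlog;
VACUUM ANALYZE session;

-- then rebuild the now mostly-empty indexes
REINDEX TABLE event;
REINDEX TABLE eventlog;
REINDEX TABLE session;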

Re: Very large postgres database

Postby ingvar » Thu Nov 01, 2012 4:01 pm

Deleting the EventLog rows went OK, but event and session failed with foreign key constraint violations:

EMDBslice=# delete from event * where time < '2012-07-01';
ERROR: update or delete on table "event" violates foreign key constraint "fknamespace_creation_id_event" on table "namespace"
DETAIL: Key (id)=(1) is still referenced from table "namespace".

EMDBslice=# delete from session * where defaulteventtype='User' and started < '2012-07-01';
ERROR: update or delete on table "session" violates foreign key constraint "fkevent_session_session" on table "event"
DETAIL: Key (id)=(105) is still referenced from table "event".

Any idea of a way out?
Ingvar

Re: Very large postgres database

Postby jmoore » Thu Nov 01, 2012 4:13 pm

That definitely won't work. You'll have to identify the sessions/events that have no other objects linking to them and delete those. If you were using a particular user or similar, you could search for that.
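
As a sketch, assuming event.session is the only thing blocking you (other tables may reference session as well, so some rows may still refuse to go):

Code:
-- delete old sessions that no remaining event row still points at
DELETE FROM session
 WHERE started < '2012-07-01'
   AND NOT EXISTS (SELECT 1 FROM event e WHERE e.session = session.id);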

Re: Very large postgres database

Postby ingvar » Fri Nov 02, 2012 3:45 pm

Hi Josh,

Deleting from the event table is hard work. If I try to delete too much I get a timeout. I ended up using statements like:
DELETE FROM event * WHERE id > 2600000 and id < 2800000;
these take 10-15 min each, so I am looking at 18 h or so in total.
If any rows in the range have foreign key constraints on them, I bracket the delete statement further.

One of my colleagues prepared a statement that considers all foreign keys into the event table; that statement is quite large, about 150 lines long. I will try it once I have managed to remove the bulk of the rows.
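
In the meantime, just listing which tables have foreign keys into event is easy enough from the catalogs, e.g.:

Code:
-- which constraints / tables reference the event table?
SELECT conname            AS constraint_name,
       conrelid::regclass AS referencing_table
  FROM pg_constraint
 WHERE contype = 'f'
   AND confrelid = 'event'::regclass;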

I guess that in theory it might be possible to set up the database to do cascading deletes, but there might be bad side effects.

Cheers,
Ingvar

Re: Very large postgres database

Postby jmoore » Sat Nov 03, 2012 8:22 pm

ingvar wrote:
Deleting from the event table is hard work. If I try to delete too much I get a timeout. I ended up using statements like:
DELETE FROM event * WHERE id > 2600000 and id < 2800000;
these take 10-15 min each, so I am looking at 18 h or so in total.


Adding indexes on the columns that reference event would likely speed this up. Most of the time is likely spent checking the FK constraints.
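
For example (hypothetical index name, and again assuming eventlog references event via an event column; repeat for each referencing table/column the catalogs show):

Code:
-- index the referencing column so each DELETE on event avoids a sequential scan of eventlog
CREATE INDEX eventlog_event ON eventlog (event);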

ingvar wrote:
I guess that in theory it might be possible to set up the database to do cascading deletes, but there might be bad side effects.


Cascading deletes on Events would delete your entire DB. :)

~Josh
