I've got an auto-generated join table (three columns, two of them foreign keys to other tables) that was recently corrupted by an aborted migration. As a result, there are around 1 million duplicate rows that need to be removed. I've been reading up on the best way to do this and have run into conflicting advice. I've managed to get a list of the duplicate ids that need to be deleted using this query:
SELECT MIN(id)            -- one id per duplicated group
FROM my_join_table
WHERE site_id = 42        -- the migrations targeted a single site
GROUP BY content_id
HAVING COUNT(*) > 1;      -- only groups that actually contain duplicates
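For scale, the size of that set can be sanity-checked by wrapping the query in a count (the derived table and its alias are only there to make the wrapper valid SQL):

-- Should report roughly the 1 million ids mentioned above.
SELECT COUNT(*)
FROM (
    SELECT MIN(id) AS id
    FROM my_join_table
    WHERE site_id = 42
    GROUP BY content_id
    HAVING COUNT(*) > 1
) AS dupes;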
However, when I tried to run a simple DELETE with the above as a subquery, the command timed out after several hours:
DELETE FROM my_join_table
WHERE id IN ( <insert above query here> );
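Written out in full, that's:

DELETE FROM my_join_table
WHERE id IN (
    SELECT MIN(id)
    FROM my_join_table
    WHERE site_id = 42
    GROUP BY content_id
    HAVING COUNT(*) > 1
);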
So my question is two-part:
- Would breaking that large set (~1 million ids) up into smaller DELETE queries be more efficient?
- If so, what's a good candidate for the smaller batch size? From what I can see online, no one suggests breaking up sets of fewer than 100 ids, but I'm hoping I can get away with batches of 1,000 or even 10,000 (see the sketch below).
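To be concrete about what I mean by batching, here's a sketch. It assumes a database that supports the standard OFFSET ... FETCH syntax (e.g. SQL Server 2012+ or Postgres; on MySQL the equivalent would be LIMIT plus a derived-table wrapper), and the 10,000 is just a hypothetical batch size. The idea would be to run it in a loop until it deletes 0 rows:

-- One batch; repeat until 0 rows are affected.
DELETE FROM my_join_table
WHERE id IN (
    SELECT MIN(id)
    FROM my_join_table
    WHERE site_id = 42
    GROUP BY content_id
    HAVING COUNT(*) > 1
    ORDER BY MIN(id)                           -- SQL Server requires ORDER BY before OFFSET/FETCH
    OFFSET 0 ROWS FETCH FIRST 10000 ROWS ONLY  -- hypothetical batch size
);

Because the subquery is re-evaluated on each pass, a group that had more than two copies keeps getting trimmed on successive runs until only one row remains.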