I've got an auto-generated join table (three columns, two of them foreign keys to other tables) that was recently corrupted by an aborted migration. As a result, there are around 1 million duplicate rows that need to be removed. I've been reading up on the best way to do this and have run into conflicting advice. I've managed to get a list of the duplicate ids that need to be deleted using this query:
SELECT MIN(id)            -- one id per duplicated group
FROM my_join_table
WHERE site_id = 42        -- the migrations targeted a single site
GROUP BY content_id
HAVING COUNT(*) > 1;      -- only groups that actually contain duplicates
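For scale, the size of that set can be sanity-checked by wrapping the query in a count (the derived table and its alias are only there to make the wrapper valid SQL):

-- Should report roughly the 1 million ids mentioned above.
SELECT COUNT(*)
FROM (
    SELECT MIN(id) AS id
    FROM my_join_table
    WHERE site_id = 42
    GROUP BY content_id
    HAVING COUNT(*) > 1
) AS dupes;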
However, when I tried to run a simple DELETE with the above as a subquery, the command timed out after several hours:
DELETE FROM my_join_table
WHERE id IN ( <insert above query here> );
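Written out in full, that's:

DELETE FROM my_join_table
WHERE id IN (
    SELECT MIN(id)
    FROM my_join_table
    WHERE site_id = 42
    GROUP BY content_id
    HAVING COUNT(*) > 1
);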
So my question is two-part:
- Would breaking that large set (~1 million ids) up into smaller DELETE queries be more efficient?
- If so, what's a good candidate for the smaller batch size? From what I can see online, no one suggests breaking up sets of fewer than 100 ids, but I'm hoping I can get away with batches of 1,000 or even 10,000 (see the sketch below).
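To be concrete about what I mean by batching, here's a sketch. It assumes a database that supports the standard OFFSET ... FETCH syntax (e.g. SQL Server 2012+ or Postgres; on MySQL the equivalent would be LIMIT plus a derived-table wrapper), and the 10,000 is just a hypothetical batch size. The idea would be to run it in a loop until it deletes 0 rows:

-- One batch; repeat until 0 rows are affected.
DELETE FROM my_join_table
WHERE id IN (
    SELECT MIN(id)
    FROM my_join_table
    WHERE site_id = 42
    GROUP BY content_id
    HAVING COUNT(*) > 1
    ORDER BY MIN(id)                           -- SQL Server requires ORDER BY before OFFSET/FETCH
    OFFSET 0 ROWS FETCH FIRST 10000 ROWS ONLY  -- hypothetical batch size
);

Because the subquery is re-evaluated on each pass, a group that had more than two copies keeps getting trimmed on successive runs until only one row remains.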