1

I need to delete a huge chunk of my data in my production database, which runs about 100GB in size. If possible, i would like to minimize my downtime.

My selection criteria for deleting is likely to be

DELETE * FROM POSTING WHERE USER.ID=5 AND UPDATED_AT<100

What is the best way to delete it?

  • Build an index?
  • Write a sequential script that deletes via paginating through the rows 1000 at a time?
Warner
  • 23,440
  • 2
  • 57
  • 69
  • Did @Warner's answer work for you? My current employer is trying to prune 250GB * 20 of data held in InnoDB tables and its eating replication alive. – David Aug 28 '10 at 19:22

1 Answers1

2

You can minimize almost all downtime by preventing the table from being locked.

Use a SELECT INSERT to migrate the data you want to a temporary table.

Rename the tables.

You'll need to consider any INSERTS during the time the SELECT INSERT was running.

Finally, drop the table that contains the data you don't want.

Be aware that if you do not have innodb_file_per_table enabled in your configuration that the disk space allocated will not be freed.

Additionally, special considerations will be necessary for your application and specific situation.

Here's a procedure that I wrote using this method..

# Temp table, recreating forty_first_transaction_
#
CREATE TABLE working_table_temp_ LIKE working_table_;

# Increment if there's an auto_increment field.
#
ALTER TABLE working_table_temp_ AUTO_INCREMENT = 15000000;

# Testing inserts and select while this query was running resulted success.
# Verified no active lock.
#
INSERT INTO working_table_temp_ SELECT * FROM working_table_ WHERE id > $NUM;

RENAME TABLE working_table_ TO working_table_old_;
RENAME TABLE working_table_temp_ TO working_table_;

# Verify that all rows were caught
#
# The last row < 15000000 in both tables should be identical.
#
SELECT * FROM working_table_ where id < 15000000 order by id desc limit 5;
SELECT * FROM working_table_old_ where id < 15000000 order by id desc limit 5;

# If not, we need to move them !
#
# This query will need to be modified.
#
INSERT INTO working_table_ SELECT * FROM working_table_old_ WHERE id > 138376577;

# Verify application functionality, if necessary.
#
# LAST CHANCE FOR BACKOUT !!!
#
# Once verified, get rid of the old data to free up some space.
#
DROP TABLE working_table_old_;
^D

df -h




## BACKOUT ##

RENAME TABLE working_table_ TO working_table_new_;
RENAME TABLE working_table_old_ TO working_table_;
Warner
  • 23,440
  • 2
  • 57
  • 69