I have an extremely large table from which I would like to delete old rows. Example of table:
| customer_id | first_purchase_date | last_purchase_date |
|<primary key>| | <index> |
** I am using this example table for argument's sake. The table in question is not a customer table. The real table has grown to 28 GB in size over the past 2 months and is used to calculate something that requires only 2 weeks of historical data.
What I would like to do is delete customers from this table that have not purchased anything in the past year. I.e. delete from table where last_purchase_date < now() - interval 1 year;
Simpy deleting like this would be too costly on the database. I know that a partition can be used to truncate the old rows but I am not sure how to implement it effectively.
In addition, if a customer were to purchase something then that row could potentially move to a different partition by updating the last_purchase_date. Would this not be expensive as well?
Thank you in advance for any direction!
You are right in thinking that partitioning is the way forward, because:
Data that loses its usefulness can often be easily removed from a
partitioned table by dropping the partition (or partitions) containing
only that data. Conversely, the process of adding new data can in some
cases be greatly facilitated by adding one or more new partitions for
storing specifically that data.
And if this doesn't work for you, it's still possible to
In addition, MySQL 5.7 supports explicit partition selection for
queries. For example, SELECT * FROM t PARTITION (p0,p1) WHERE c < 5
selects only those rows in partitions p0 and p1 that match the WHERE
condition. In this case, MySQL does not check any other partitions of
table t; this can greatly speed up queries when you already know which
partition or partitions you wish to examine. Partition selection is
also supported for the data modification statements DELETE, INSERT,
REPLACE, UPDATE, and LOAD DATA, LOAD XML.
Since you want to delete stuff based on date and not the primary key, what you need is a RANGE partition scheme.
First find the oldest date and create partitions based on that
ALTER TABLE sales
PARTITION BY RANGE( TO_DAYS(last_purchase_date)) (
PARTITION p0 VALUES LESS THAN (TO_DAYS('2018-12-31')),
PARTITION p1 VALUES LESS THAN (TO_DAYS('2017-12-31')),
PARTITION p2 VALUES LESS THAN (TO_DAYS('2016-12-31')),
PARTITION p3 VALUES LESS THAN (TO_DAYS('2015-12-31')),
..
PARTITION p10 VALUES LESS THAN MAXVALUE));
Choose an appropriate number of partitions but don't worry too much because you can always change partitions later. When you partition, you might even find that the delete step isn't really needed after all.