I need to move millions of rows between identical mysql db's on two different rds instances. The approach I thought about is this:
- use data-pipeline to export data from the first instance to amazon-s3
- use data-pipeline to import data from amazon-s3 to the second instance
My problem is that I need to delete the data on the first instance at the end. Since we're talking about huge amounts of data I thought about creating a stored procedure to delete the rows in batches. Is there a way to achieve that in aws? Or are there any other solutions?
One other thing is that i need only to move some rows from a specific table, not the whole table or the whole database.
You can use AWS DMS service which is the easiest method to move the huge amount of data. Please follow the below steps.
First, you need to change some settings in Parameter group on both RDS instances.
'log_bin' = 'ON'
'binlog_format' = 'ROW'
'binlog_checksum' = 'NONE'
'log_bin_use_v1_row_events' = 'ON'
Take a dump of the database's schema from the first RDS instance.
Restore it on the second RDS.
Now start to configure the DMS.
Setup the Endpoints first.
Then create a task to import data from Source(first RDS) to
Destination(second RDS).
In the migration type, if you want to load existing data choose to
Migrate existing data or if you trying to sync real time data then
select ongoing changes.
Under task setting, select Target table preparation mode = Do
nothing.
Check Enable logging check box it'll help to debug in case of any
errors.
Once the task is started you can able to see the process in the
dashboard.
Using TRUNCATE TABLE instead of delete statement if you want to delete all the data in one table. It will save you a lot of time.
Data-pipeline is more for a recurring process. Seems like a lot of extra hassle if you just want to do a one-time operation. Maybe easier to launch an instance with decent network throughput, attach a big enough EBS volume to hold your data and use command line tools like mysqldump to move the data.
As far as cleanup goes, probably faster to come up with a query that copies the rows you want to keep to a temp table (or everything but the rows you don't want) and then use rename to swap out the temp table for the original. Then drop the original table.