-->

Data loading is slow while using “Insert/Update” s

2020-06-07 05:14发布

问题:

Data loading is slow while using "Insert/Update" step in pentaho 4.4.0

I am using pentaho 4.4.0. While using the "Insert/Update" step in kettle the speed of the data load is too slow compared to mysql. This step will scan through the entire records in table before inserting. If the record exist it will do a update. So what shall be done to optimize the performance while doing "Insert/Update" . and the process speed is 4 r/s, so totally my records will be above 1 lakh... The process takes 2 and half hours to complete the entire process.

回答1:

Based on your comments it sounds like you want the Merge rows (diff) step followed by a Synchronize after merge. Check the Pentaho wiki to see how these steps work.

Another thing that makes a big difference is how many of the rows result in an upsert vs how many total rows. If the number of rows resulting in writes is more than roughly 40%, @carexcer's last comment may be a better approach. If it's less, definitely try the Merge rows (diff) step.

4 - 25 rows per second sounds way slow. Be sure the fields you marked as keys are indexed, whichever step you choose.

If most of the rows result in an upsert, you may be better off with a full refresh. If that's the case, check out the MySQL bulk loaders. Pentaho has both a batch and streaming bulk loader, though I don't know how good they are.



回答2:

Try to set a big value on the field Transaction Size (Commit).

Depending on the number of rows you will upsert, set more or less on that field.

This improve so much the performance in my case.

500 would be a little value if you will upsert, in example, 100.000 rows, because it will have to be commited 200 times.

Less commits, faster execution.



回答3:

useServerPrepStmts - false

useCursorFetch - true

useCompression - true

You can try editing the Connection and put these parameters there. This will increase the performance.

Double Click on the Database Connection -> Options tab -> Add above parameters.



回答4:

With mysql all other options dont work well I have try with some cheat

Insert into A(a,b) values (1,2);
Insert into A(a,b) values (2,2);
Insert into A(a,b) values (3,2);

Change to

Insert into A(a,b) values (1,2), (2,2), (3,2);

I'ts perfect way