Has anyone ever tried to use DB2 import from within a Perl program?
My application is inserting 10 million rows, and connecting with DBI and doing the inserts row by row apparently takes forever.
DB2 import/load from the command line works great, but is there a better way than making system calls from the Perl program to invoke:
use IPC::System::Simple qw( systemx );
use autodie;
systemx( "db2 connect ..." );
systemx( "db2 import ..." );
etc?
thanks!
The fastest way to do it with DBI is to prepare your SQL insert statement once and then execute it multiple times. DBIx::Class's populate method does this internally.
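A minimal sketch of that pattern (the DSN, table, and column names here are made up):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical DSN and credentials -- substitute your own.
my $dbh = DBI->connect( 'dbi:DB2:MYDB', 'user', 'pass', { RaiseError => 1 } );

# Prepare the statement once...
my $sth = $dbh->prepare(
    'INSERT INTO myschema.mytable (id, name) VALUES (?, ?)'
);

# ...then execute it repeatedly with new bind values.
my @rows = ( [ 1, 'alpha' ], [ 2, 'beta' ] );    # stand-in for your real data source
$sth->execute(@$_) for @rows;

$dbh->disconnect;

DBI also provides execute_array, which takes all the bind values in one call; it is worth benchmarking against the plain loop.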
I have actually had exactly the problem you seem to be having. It seems that on some systems you have to explicitly perform a DB2 Connect prior to being able to import.
In fact, I found my scripts behaved the most consistently when I had a DB2 Connect string immediately prior to my import statement, but that may be a system dependent issue.
Suggested solution:
I started out with a connection statement like the following. Checking that this statement works also confirms that your DB path is valid.
db2 connect to $DB_NAME user $DB_USER using $DB_PASS
I ended up saving the string above as $connection_starter, since you will be using it in several places. I then made a system call like the following:

system($connection_starter . "; db2 import from $temp_file_path of del commitcount 5000 $insert_update_setting into tablespace.$table_name");
The commitcount value is not necessarily needed in your case (although it is generally well suited to very large imports), but I would suggest using it, since it causes DB2 to log a message on the status of the import every 5000 records if you are running the script via the command line/shell.

Error Checking
You can check the value of $? to see if an error was produced during the import command, since it should return 0 if the import succeeded. Annoyingly, the other return codes are not very helpful. I set up my import command to log the exact statement in case of a failure, so that I could manually review it at a later point.
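A minimal sketch of that check (the command string, table name, and log path here are illustrative, not from my original script):

#!/usr/bin/perl
use strict;
use warnings;

# Illustrative placeholders -- substitute your own values.
my $connection_starter = 'db2 connect to MYDB user me using secret';
my $temp_file_path     = '/tmp/rows.del';
my $table_name         = 'myschema.mytable';

my $import_cmd = "$connection_starter; "
               . "db2 import from $temp_file_path of del "
               . "commitcount 5000 insert into $table_name";

system($import_cmd);

# $? holds the raw wait status; the actual exit code is $? >> 8.
if ( $? != 0 ) {
    open my $log, '>>', 'db2_import_failures.log'
        or die "Cannot open log file: $!";
    print {$log} scalar localtime(), " import failed (exit code ",
        $? >> 8, "): $import_cmd\n";
    close $log;
}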
Hopefully that helps!
I can think of two basic approaches.
Option 1. Import via Perl DBI. Use the prepare method on your $dbh to prepare your insert statement. Turn off autocommit (usually at connect()), and go crazy. For memory reasons, you may want to COMMIT every so often (every 1000 inserts, perhaps). This is the tradeoff: the more often you commit, the less memory you use, but the more sync overhead you'll face with the disk. By default, with autocommit on, you're effectively committing after every insert, thus the slowdown.

Option 2. You can call the DB2 CLP against a DDL file that has the commands for connecting, importing, etc., in it. This reduces the overhead of multiple processes being spawned.
Personally, I'd try #1 first.
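Minimal sketches of both, with made-up DSN, file, and table names. Option 1, preparing once and committing every 1000 rows:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Autocommit is off so we control the commit cadence ourselves.
my $dbh = DBI->connect( 'dbi:DB2:MYDB', 'user', 'pass',
    { RaiseError => 1, AutoCommit => 0 } );

my $sth = $dbh->prepare(
    'INSERT INTO myschema.mytable (id, name) VALUES (?, ?)'
);

my $count = 0;
while ( my $line = <STDIN> ) {    # e.g. reading comma-delimited rows
    chomp $line;
    my ( $id, $name ) = split /,/, $line, 2;
    $sth->execute( $id, $name );
    $dbh->commit if ++$count % 1000 == 0;    # memory vs. disk-sync tradeoff
}
$dbh->commit;    # flush the final partial batch
$dbh->disconnect;

And option 2, writing the CLP commands to a single file and running the CLP once against it (db2 -tvf reads ;-terminated statements from a file):

#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '>', '/tmp/bulk_load.clp' or die "Cannot write CLP file: $!";
print {$fh} <<'END_CLP';
CONNECT TO MYDB USER me USING secret;
IMPORT FROM /tmp/rows.del OF DEL COMMITCOUNT 5000 INSERT INTO myschema.mytable;
CONNECT RESET;
END_CLP
close $fh;

system( 'db2', '-tvf', '/tmp/bulk_load.clp' ) == 0
    or die "db2 CLP run failed: $?";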
I don't have any experience with DB2. Based on my experience with Oracle and MySQL, I would expect a row-by-row insert to be slowest, a multi-row insert to be faster (if supported by DB2 and DBI), and the vendor-provided bulk tool to be fastest. Run some benchmarks to be sure.
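For reference, a multi-row insert is a single statement carrying several rows; a sketch, assuming your DB2 version accepts multiple rows in one VALUES clause (the table and columns are made up):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:DB2:MYDB', 'user', 'pass', { RaiseError => 1 } );

# One statement, three rows: fewer round trips than row-by-row inserts.
my $sth = $dbh->prepare(
    'INSERT INTO myschema.mytable (id, name) VALUES (?, ?), (?, ?), (?, ?)'
);
$sth->execute( 1, 'alpha', 2, 'beta', 3, 'gamma' );

$dbh->disconnect;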