Loading files into Teradata

Published 2019-08-23 03:21

Question:

Daily, I currently have to:

  1. Download 50 files: 40 in .csv (comma-separated values) format and 10 in .txt.
  2. Each .csv file has 5 columns (call them B, A, D, C, E), and the .txt files have a different 5 columns (call them G, H, I, J, K).
  3. Manually open each of the 40 .csv files in Excel and rearrange the columns into (A, B, C, D, E). The .txt files are already in the required format.
  4. Load them all into 15 Teradata tables, using BTEQ on Unix.

There are approximately 3,000 rows of data in total.

I have, with the help of stackoverflow and my trial and error, created VBA which:

  1. downloads all 50 files,
  2. rearranges the columns correctly for the .csv files,
  3. on a new sheet containing "INSERT INTO DBNAME.TABLENAME VALUES (A, B, C, D, E)" for the .csv files and "INSERT INTO DBNAME.TABLENAME VALUES (G, H, I, J, K)" for the .txt files, swaps A through K for the actual values from all 50 files,
  4. creates an ODBC connection to Teradata and loops through all 3,000 generated statements.
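The column rearrangement in step 2 can also be done without opening Excel at all. A minimal sketch in Python (the sample values and in-memory file are illustrative stand-ins for the downloaded .csv files; only the column orders come from the description above):

```python
import csv
import io

# The .csv files arrive with columns in the order B, A, D, C, E;
# the target order is A, B, C, D, E.
SOURCE_ORDER = ["B", "A", "D", "C", "E"]
TARGET_ORDER = ["A", "B", "C", "D", "E"]

# For each target position, find which source position holds that column.
MAPPING = [SOURCE_ORDER.index(col) for col in TARGET_ORDER]  # [1, 0, 3, 2, 4]

def rearrange_rows(rows):
    """Reorder each row from (B, A, D, C, E) to (A, B, C, D, E)."""
    return [[row[i] for i in MAPPING] for row in rows]

# In-memory sample standing in for one downloaded .csv file.
raw = "b1,a1,d1,c1,e1\nb2,a2,d2,c2,e2\n"
rows = list(csv.reader(io.StringIO(raw)))
fixed = rearrange_rows(rows)
print(fixed[0])  # ['a1', 'b1', 'c1', 'd1', 'e1']
```

The same index mapping works for all 40 files, since they share the one source column order.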

It works, but it's painfully slow because it loops through roughly 3,000 rows. Because the Teradata database is accessed by a large number of people at all times, every time an error occurs my insert statements go to the back of the queue, adding to the time taken to complete the procedure.

Is it possible to create VBA that batch loads the 50 files, rather than my current method of creating 3,000 insert statements and looping? I have read about a process that imports into Teradata using the 'Jet Provider', but I know very little about it and my efforts so far have failed.

Does anyone know about this method, or any other quicker one? I want to avoid BTEQ and MLOAD, and can't use FLOAD because the tables I load into are not empty. Thanks, everyone.

Answer 1:

Teradata is easily capable of handling massive inserts, no matter how many users are connected. However, Teradata is very bad at small inserts. You need to bundle them into large volumes that are loaded in one go; there is tooling for that delivered with Teradata (mload). Then you get top performance.
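The "bundle into large volumes" idea can be sketched generically with the DB-API `executemany`, which submits many rows in one call instead of one round trip per row. This is a runnable illustration only: `sqlite3` stands in for the Teradata ODBC connection, and the table and row values are made up:

```python
import sqlite3

# 3,000 fabricated rows standing in for the parsed file contents.
rows = [(f"a{i}", f"b{i}", f"c{i}", f"d{i}", f"e{i}") for i in range(3000)]

conn = sqlite3.connect(":memory:")  # stand-in for the real Teradata connection
conn.execute("CREATE TABLE tablename (a, b, c, d, e)")

# One batched call instead of 3,000 individual INSERT statements,
# committed as a single transaction.
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM tablename").fetchone()[0]
print(count)  # 3000
```

With an ODBC driver the pattern is the same; the driver can then hand the whole batch to the database in far fewer requests.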

As an alternative, perhaps less speedy, make sure you are using bind variables rather than explicit values in your SQL, since parsing can take a lot of time. You might also want to consider tools such as Kettle (an ETL tool) or Invantive Control (an Excel add-in; disclaimer: I work there) to load from Excel into Teradata. They can also load in parallel, improving performance.
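With bind variables the SQL text stays constant and only the values change, so the database can parse the statement once and reuse it for every row. A minimal sketch of the difference (again using `sqlite3` as a runnable stand-in, with a hypothetical table name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbname_tablename (g, h, i, j, k)")

# Slow pattern: a different literal SQL string per row, each needing a
# fresh parse, e.g.
#   INSERT INTO dbname_tablename VALUES ('g1','h1','i1','j1','k1')

# Bind-variable pattern: one parameterized statement, values bound
# separately on each execution.
sql = "INSERT INTO dbname_tablename VALUES (?, ?, ?, ?, ?)"
for row in [("g1", "h1", "i1", "j1", "k1"),
            ("g2", "h2", "i2", "j2", "k2")]:
    conn.execute(sql, row)
conn.commit()

loaded = conn.execute("SELECT COUNT(*) FROM dbname_tablename").fetchone()[0]
print(loaded)  # 2
```

Binding also sidesteps quoting and SQL-injection problems that arise when values are pasted into the statement text.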