
dashDB and DB2 Load operation

Published 2019-09-18 20:40

Question:

I am currently trying to use a dashDB database with the db2cli utility and ODBC (the values come from Connect / Connection Information on the dashDB web console). At the moment I can run SELECT or INSERT statements and fetch data from custom tables I have created without any problem, thanks to the command:

db2cli execsql -connstring "DRIVER={IBM DB2 ODBC DRIVER - IBMDBCL1}; DATABASE=BLUDB; HOSTNAME=yp-dashdb-small-01-lon02.services.eu-gb.bluemix.net; PORT=50000; PROTOCOL=TCPIP; UID=xxxxxx; PWD=xxxxxx" -inputsql /tmp/input.sql
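
For reference, /tmp/input.sql is just a plain text file of semicolon-terminated SQL statements, something like the following (MY_SCHEMA.MY_TABLE is only a placeholder name):

INSERT INTO MY_SCHEMA.MY_TABLE (ID, NAME) VALUES (1, 'test');
SELECT * FROM MY_SCHEMA.MY_TABLE;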

Now I am trying to do a DB2 LOAD operation through the db2cli utility, but I don't know how to proceed or even whether it is possible. The aim is to import data from a file without cataloging the DB2 dashDB database on my side, using only ODBC. Does anyone know whether this kind of operation is possible (with db2cli or another utility)?

Answer 1:

This is not possible. LOAD is not an SQL statement, so it cannot be executed via an SQL interface such as ODBC; it can only be run with the DB2 CLP, which in turn requires a cataloged database.

ADMIN_CMD() can be invoked via an SQL interface; however, it requires that the input file be on the server -- it won't work with a file stored on your workstation.
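
For illustration, a LOAD issued through ADMIN_CMD() would look roughly like the statement below (a sketch only; /path/on/server/data.csv and MY_SCHEMA.MY_TABLE are placeholders, and the path is resolved on the database server, which is exactly the limitation described above):

CALL SYSPROC.ADMIN_CMD('LOAD FROM /path/on/server/data.csv OF DEL MESSAGES ON SERVER INSERT INTO MY_SCHEMA.MY_TABLE');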

If JDBC is an option, you could use the CLPPlus IMPORT command.
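
A hedged sketch of that route (connection details and names are placeholders; check the CLPPlus IMPORT documentation for the exact syntax -- the point is that the file is read on the client side, so no cataloging is needed):

clpplus xxxxxx@yp-dashdb-small-01-lon02.services.eu-gb.bluemix.net:50000/BLUDB
SQL> IMPORT FROM /tmp/data.csv OF DEL INSERT INTO MY_SCHEMA.MY_TABLE;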



Answer 2:

The latest API version referenced from the Db2 on Cloud (formerly dashDB) dashboard is available here. It requires you first to call the /auth/tokens endpoint to generate an auth token from your Bluemix credentials, which is then used to authorize the API calls.
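
As a rough illustration of that token flow (the host is a placeholder, and the exact request and response fields should be checked against the API docs linked from the dashboard):

# request an auth token with your Bluemix (IBM Cloud) credentials
curl -X POST "https://<HOSTNAME>/dbapi/v3/auth/tokens" \
  -H "Content-Type: application/json" \
  -d '{"userid": "<USERID>", "password": "<PASSWORD>"}'

# the JSON response contains a token; pass it on subsequent calls as
#   -H "Authorization: Bearer <TOKEN>"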

I've recently published an npm module - db2-rest-client - to simplify the use of these operations. For example, to load data from a .csv file you can use the following commands:

# install the module globally
npm i db2-rest-client -g

# call the load job
export DB_USERID='<USERID>'
export DB_PASSWORD='<PASSWORD>'
export DB_URI='https://<HOSTNAME>/dbapi/v3'
export DEBUG=db2-rest-client:cli
db2-rest-client load --file=mydata.csv --table='MY_TABLE' --schema='MY_SCHEMA'

For the load job, a test on a Bluemix dedicated instance with a 70 MB source file of about 4 million rows took about 4 minutes. There are also other CLI options, such as executing an export statement, running comma-separated statements, and uploading files.



Answer 3:

You can try loading data using the REST API.

Example:

curl --user dashXXX:XXXXXX -H "Content-Type: multipart/form-data" -X POST -F loadFile1=@"/home/yogesh/Downloads/datasets/order_details_0.csv" "https://yp-dashdb-small-01-lon02.services.eu-gb.bluemix.net:8443/dashdb-api/load/local/del/dashXXX.ORDER_DETAILS?hasHeaderRow=true&timestampFormat=YYYY-MM-DD%20HH:MM:SS.U"


Answer 4:

I have used the REST API and have not seen any size limitations. In version 1.11 of dashDB Local (the warehouse database), external tables were introduced. As long as the file is on the container, it can be loaded. Also, a DB2 LOAD locks the table until the load is finished, whereas an external table load does not.



Answer 5:

There are a number of ways to get data into Db2 Warehouse on Cloud. From the command line you can use the Lift CLI (https://lift.ng.bluemix.net/), which provides the best performance for large data sets.

You can also use EXTERNAL TABLEs (https://www.ibm.com/support/knowledgecenter/ean/SS6NHC/com.ibm.swg.im.dashdb.sql.ref.doc/doc/r_create_ext_table.html), which are also high performance and offer many options.
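
As an illustration, a named external table over a client-side CSV file could look roughly like this (a sketch only; table and file names are placeholders, and the exact option names and values are documented on the CREATE EXTERNAL TABLE page linked above):

db2 "create external table ext_data(i int) using (dataobject '/tmp/mydata.csv' delimiter ',' remotesource yes)"
db2 "insert into my_table select * from ext_data"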

Here is a quick example using the transient form of an external table with a local file (not on the server), hence the REMOTESOURCE YES option:

db2 "create table foo(i int)"
echo "1" > /tmp/foo.csv
db2 "insert into foo select * from external '/tmp/foo.csv' using (REMOTESOURCE YES)"
db2 "select * from foo"

I          
-----------
          1

1 record(s) selected.

For large files, you can use gzip, either on the fly:

db2 "insert into foo select * from external '/tmp/foo.csv' using (REMOTESOURCE GZIP)"

or from gzipped files:

gzip /tmp/foo.csv
db2 "insert into foo select * from external '/tmp/foo.csv.gz' using (REMOTESOURCE YES)"