Batch downloading of table using cx_oracle

Posted 2019-08-18 09:50

Question:

I need to download a large table from an Oracle database onto a Python server, using cx_Oracle to do so. However, RAM is limited on the Python server, so I need to do it in batches.

I already know how to pull down a whole table in one go:

import cx_Oracle
import pandas as pd

usr = ''
pwd = ''
tns = '(Description = ...'
orcl = cx_Oracle.connect(usr, pwd, tns)

printHeader = True
tabletoget = 'BIGTABLE'
sql = "SELECT * FROM SCHEMA." + tabletoget

# pd.read_sql runs the query itself, so no separate cursor.execute() is needed
data = pd.read_sql(sql, orcl)
data.to_csv(tabletoget + '.csv', header=printHeader, index=False)

I'm not sure, though, how to load, say, 10,000 rows at a time, save each batch off to a CSV, and then rejoin them.

Answer 1:

You can use cx_Oracle directly to perform this sort of batching:

curs.arraysize = 10000                  # rows fetched per round trip to the database
curs.execute(sql)
while True:
    rows = curs.fetchmany()             # fetches at most curs.arraysize rows
    if rows:
        write_to_csv(rows)
    if len(rows) < curs.arraysize:      # a short (or empty) batch means we are done
        break
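
The write_to_csv helper is not defined here; a minimal sketch using Python's standard csv module (the file name and append mode are assumptions, not part of the original answer) could be:

import csv

def write_to_csv(rows, path='BIGTABLE.csv'):
    # append each fetched batch so only one batch is held in memory at a time
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerows(rows)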

If you are using Oracle Database 12c or higher, you can also use the OFFSET and FETCH NEXT ROWS options, like this:

offset = 0
numRowsInBatch = 10000
while True:
    curs.execute("select * from tabletoget "
                 "offset :offset rows fetch next :nrows rows only",
                 offset=offset, nrows=numRowsInBatch)
    rows = curs.fetchall()
    if rows:
        write_to_csv(rows)
    if len(rows) < numRowsInBatch:
        break
    offset += len(rows)
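
One caveat: OFFSET/FETCH paging is only consistent from batch to batch when the query has a deterministic ordering. If the table has a unique key (here assumed to be a column called ID, purely for illustration), adding an ORDER BY on it keeps the batches from overlapping or skipping rows:

    curs.execute("select * from tabletoget order by id "
                 "offset :offset rows fetch next :nrows rows only",
                 offset=offset, nrows=numRowsInBatch)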

This option isn't as efficient as the first one and gives the database more work to do, but it may be better for you depending on your circumstances.

Neither of these examples uses pandas directly. I am not particularly familiar with that package, but if you (or someone else) can adapt this appropriately, hopefully this will help!
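
If you would rather stay in pandas, one possible adaptation (a sketch only, reusing the usr/pwd/tns values from the question) is to pass the chunksize parameter to pandas.read_sql, which then yields DataFrames of at most that many rows instead of loading the whole table at once:

import cx_Oracle
import pandas as pd

orcl = cx_Oracle.connect(usr, pwd, tns)
sql = "SELECT * FROM SCHEMA.BIGTABLE"

first = True
for chunk in pd.read_sql(sql, orcl, chunksize=10000):
    # append each chunk to the CSV; write the header only for the first chunk
    chunk.to_csv('BIGTABLE.csv', mode='a', header=first, index=False)
    first = False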