I am using a Python script that inserts a batch of data into Teradata with the teradata Python module, similar to the one below. It uses an ODBC connection, and occasionally I get the following error:
[HY001][Teradata][ODBC Teradata Driver] Memory allocation error
What does this error mean? And do you have any suggestions on how to fix it?
connection.executemany(
    'INSERT INTO {}.{} ("{}") VALUES ({})'.format(
        database, table_name,
        '","'.join(column_names),
        ','.join(['?'] * len(column_names))),
    records_for_insert,
    batch=True
)
The Teradata team was able to help figure out what is causing this error.
ODBC limits the request size to 1 MB (if the request size exceeds 1 MB, it throws an error with error code 22001).
According to the Teradata team:
"The ODBC bug found by engineering deals with the request message which has a maximum 1MB size. If the data size is below 1MB, the ODBC driver appends 12 bytes (KeepResp parcel). The memory allocation error is returned if the additional bytes cause the 1MB request message limit to be exceeded. The solution for this case is to return the 1MB error, instead of the memory allocation error, which is handled by the application. The fix is tracked by ODBC-17861 and targeted for ODBC 15.10.01.07 due late November."
There are two ways I tend to avoid Teradata's 1 MB limit: (1) dividing my batch into smaller batches that are each less than 1 MB, or (2) changing the method to 'rest'. Example using the teradata module:
import teradata
import pandas as pd
import numpy as np

udaExec = teradata.UdaExec(appName="test", version="1.0", logConsole=False)
with udaExec.connect(method="odbc", system="DBName", username="UserName",
                     password="Password", driver="DriverName") as connect:
    # Divide huge_df into small chunks, e.g. 100 chunks
    chunks_df = np.array_split(huge_df, 100)
    # Insert each chunk into Teradata
    for i, chunk in enumerate(chunks_df):
        data = [tuple(x) for x in chunk.to_records(index=False)]
        connect.executemany("INSERT INTO DATABASE.TABLEWITH15COL "
                            "VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
                            data, batch=True)
        print('Batch', i + 1, 'completed')
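The 100 chunks above is an arbitrary count. As a rough heuristic (my own, and only a proxy, since pandas' in-memory footprint is not the ODBC wire size), you could derive the chunk count from the DataFrame instead:

import numpy as np

def chunk_count(df, target_bytes=512 * 1024):
    # Target well below the 1 MB limit, since memory_usage only
    # approximates how large the request message will be.
    approx_bytes = df.memory_usage(index=False, deep=True).sum()
    return max(1, int(np.ceil(approx_bytes / target_bytes)))

chunks_df = np.array_split(huge_df, chunk_count(huge_df))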
If you are using 'rest' as the method, you need the host IP address instead of the driver argument, e.g.:
import teradata
import pandas as pd

# HOST_IP can be found with `nslookup viewpoint`
udaExec = teradata.UdaExec(appName="test", version="1.0", logConsole=False)
with udaExec.connect(method="rest", system="DBName", username="UserName",
                     password="Password", host="HOST_IP_ADDRESS") as connect:
    data = [tuple(x) for x in huge_df.to_records(index=False)]
    with connect.cursor() as cursor:
        cursor.executemany("INSERT INTO DATABASE.TABLEWITH15COL "
                           "VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)",
                           data, batch=True)
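Note that the 'rest' method does not go through the ODBC driver at all, which is why it sidesteps the driver's 1 MB request-message handling described above; if memory becomes a concern with very large DataFrames, the same chunking approach still applies.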
I hope this helps you find ways to overcome that limitation. Here is a link to the Teradata Module Documentation, should you choose to use it.