I manage a number of ETL jobs, some of which connect to APIs like the Facebook Graph API and Google DoubleClick that use large numeric unique identifiers. We use an Oracle database to stage some of this data and combine it with our own data. The problem I keep running into is that when I insert these large numeric IDs into a NUMBER column via SQLAlchemy (which uses cx_Oracle), significant digits get truncated.
Example: 1234567890726531 becomes 1234567890726530
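Interestingly, the truncated value looks like the original rounded to 15 significant digits, which is roughly what survives a round-trip through an IEEE double, so a float conversion somewhere in the bind path is my working theory (not confirmed). A quick sanity check in plain Python:

```python
# Round the original id to 15 significant digits and compare with what
# actually landed in the Oracle column:
orig = 1234567890726531             # 16 significant digits
rounded = int(float('%.15g' % orig))  # keep only 15 significant digits
print(rounded)                      # 1234567890726530, matching the bad row
```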
Workaround: To get around this, I've been using VARCHAR2 columns to hold the text representation of the IDs, which retains all of the significant digits.
I believe it is related to these bug threads:
- regression in numeric precision behavior in 6.0 #68
- v6.0.2 Issue 68 Redux? #98
(I was unable to replicate the cursor code mentioned in these threads to test that solution in my situation)
Some sample code (the relevant parts extracted) for the Facebook example follows:
# [...code that populates the "buffer" list]
schema, table_name = 'some_schema', 'some_table'
engine = create_engine(enginestr)
metadata = MetaData()
table = Table(table_name,
              metadata,
              schema=schema,
              autoload=True,
              autoload_with=engine)
buf = []
for i in buffer:
    d = {
        'id': i[1]['id'],
        'id_char': i[1]['id'],
        'name': i[1]['name'],
        'status': i[1]['status'],
        'page_id': i[0]['id'],
    }
    buf += [d]
engine.execute(table.insert(), buf)
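One variation I have been considering (hypothetical, not yet verified against Oracle): since the Graph API returns the ids as JSON strings, casting them to int before building the bind dictionaries would rule out any string-to-float coercion inside SQLAlchemy or the driver. A self-contained sketch with made-up sample data in the same shape as a buffer entry:

```python
# Made-up sample in the shape of a (Page, LeadgenForm) buffer entry;
# note the ids arrive as strings.
sample_buffer = [
    ({'id': 'FacebookPageName'},
     {'id': '1234567890726531', 'name': 'Leadgen Form Name', 'status': 'ACTIVE'}),
]

buf = []
for page, form in sample_buffer:
    buf.append({
        'id': int(form['id']),    # bind a Python int instead of a str
        'id_char': form['id'],
        'name': form['name'],
        'status': form['status'],
        'page_id': page['id'],
    })
print(buf[0]['id'])  # 1234567890726531, as an int
```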
A buffer entry looks like this:
[(<Page> {
"id": "FacebookPageName"
}, <LeadgenForm> {
"id": "123456789012345",
"leadgen_export_csv_url": "https://www.facebook.com/ads/lead_gen/export_csv/?id=123456789012345&type=form&source_type=graph_api",
"locale": "en_US",
"name": "Leadgen Form Name",
"status": "ACTIVE"
})]
The table is essentially this:
create table some_schema.some_table (
    id      number primary key,
    name    varchar2(512 char),
    status  varchar2(30 char),
    updated timestamp with time zone default systimestamp,
    id_char varchar2(64 char)
);
After running the code above, this SQL should return no rows:
select to_char(t.id), t.id_char from some_schema.some_table t where t.id <> t.id_char;
however, it does return results (slightly altered here to preserve privacy):
NUMERIC_ID ID_CHAR
1234567890726530 1234567890726531
1234567890585300 1234567890585296
1234567890103880 1234567890103882
1234567890185790 1234567890185794
1234567890359660 1234567890359664
1234567890793130 1234567890793131
1234567890250270 1234567890250272
1234567890467220 1234567890467223
1234567890905240 1234567890905241
1234567890369260 1234567890369255
This DML corrects the data, which shows me that the problem lies on the Python side:
update some_schema.some_table t set t.id = t.id_char where t.id <> t.id_char;
Is there a better way to handle:
- Table reflection
- Data type conversion
- Inserting
- Anything
such that my large numeric values do not get truncated? The string-typed columns are working for now, but they aren't ideal in terms of storage space per row, which becomes a concern with larger datasets.
Supplemental Information:
- Python Version: 3.6.2
- SQLAlchemy Version: 1.2.0
- cx-Oracle Version: 6.1
Edit:
At the suggestion of Anthony Tuininga, I tried inserting the records directly via cx_Oracle, and it did not reproduce the rounding issue above. This leads me to conclude that the issue lies either in my use of SQLAlchemy or in the SQLAlchemy library itself.
buf = []
columns = ('id', 'id_char', 'name', 'status', 'page_id')
for i in buffer:
    d = (
        i[1]['id'],
        i[1]['id'],
        i[1]['name'],
        i[1]['status'],
        i[0]['id'],
    )
    buf += [d]

from ouplax.database import KEY  # internal helper for storing/retrieving credentials
import cx_Oracle

config = {
    'server': 'TNSName',
    'username': 'username',
    'schema': 'some_schema',
    'table_name': 'some_table',
    'columns': ','.join(columns),
    'binds': ','.join(':{}'.format(i) for i in range(1, len(columns) + 1)),
}
k = KEY(server=config['server'], username=config['username'], keyHeader='PYSQL')
connection = cx_Oracle.connect(config['username'], k.getpass(), config['server'])
cursor = connection.cursor()
stmt = 'truncate table {schema}.{table_name}'.format(**config)
print(stmt)
cursor.execute(stmt)
stmt = 'insert into {schema}.{table_name} ({columns}) values ({binds})'.format(**config)
print(stmt)
cursor.prepare(stmt)
cursor.executemany(None, buf)
connection.commit()
cursor.close()
connection.close()