SQLAlchemy Large Number Truncation/Rounding Issue

2019-08-09 16:28发布

问题:

I manage a number of ETL jobs and among them I have connections to APIs like Facebook graph and Google Doubleclick, which use large numeric unique identifiers. We use an Oracle database to stage some of this data and combine it with our own data. The problem I keep running in to is that when I go to insert these large numeric IDs into a number column via SQLAlchemy (which uses cx_Oracle) significant digits get truncated.

Example: 1234567890726531 becomes 1234567890726530

Workaround: To get around this, I've been using VARCHAR2 data types to hold the text representation of the IDs, since that retains all of the significant digits.

I believe it is related to these bug threads:

  • regression in numeric precision behavior in 6.0 #68
  • v6.0.2 Issue 68 Redux? #98

(I was unable to replicate the cursor code mentioned in these threads to test that solution in my situation)

Some sample code (or the relevant parts extracted from it) for the Facebook example is as follows:

[...code that populate the "buffer" list]
schema,table_name = 'some_schema','some_table'
engine = create_engine(enginestr)
metadata = MetaData()
table  = Table(table_name,
                 metadata,
                 schema=schema,
                 autoload=True,
                 autoload_with=self.engine)
buf=[]
for i in buffer:
    d={
        'id':i[1]['id'],
        'id_char':i[1]['id'],
        'name':i[1]['name'],
        'status':i[1]['status'],
        'page_id':i[0]['id']
    }
    buf+=[d]
engine.execute(table.insert(),buf)

A buffer entry looks like this:

[(<Page> {
      "id": "FacebookPageName"
  }, <LeadgenForm> {
      "id": "123456789012345",
      "leadgen_export_csv_url": "https://www.facebook.com/ads/lead_gen/export_csv/?id=123456789012345&type=form&source_type=graph_api",
      "locale": "en_US",
      "name": "Leadgen Form Name",
      "status": "ACTIVE"
  })]

The table is essentially this:

create table some_schema.some_table (
id number primary key,
name varchar2(512 char),
status varchar2(30 char),
updated timestamp with time zone default systimestamp,
id_char varchar2(64 char)
);

After running the code above, the result of this SQL should be nothing

select to_char(t.id) ,t.id_char from some_schema.some_table t where t.id<>t.id_Char;

however, it does return results (slightly altered to preserve privacy)

NUMERIC_ID  ID_CHAR
1234567890726530    1234567890726531
1234567890585300    1234567890585296
1234567890103880    1234567890103882
1234567890185790    1234567890185794
1234567890359660    1234567890359664
1234567890793130    1234567890793131
1234567890250270    1234567890250272
1234567890467220    1234567890467223
1234567890905240    1234567890905241
1234567890369260    1234567890369255

This DML updates it to show as expected as shows me the the problem lies on the python side:

update some_schema.some_table t set t.id = t.id_char where t.id<>t.id_Char;

Is there a better way to handle:

  • Table reflection
  • Data type conversion
  • Inserting
  • Anything

Such that my large numeric values do not get truncated? Right now using the string-type containers for the IDs seems to be working, but isn't the best in terms of storage space per row, which becomes a concern with larger datasets.

Supplemental Information:

  • Python Version: 3.6.2
  • SQLAlchemy Version: 1.2.0
  • cx-Oracle Version: 6.1

Edit:

At the suggestion of Anthony Tuininga, I tried directly inserting the records via cx-Oracle and it did not cause the rounding issue above. This leads me to conclude that my issue lies either in my implementation of SQLAlchemy or the SQLAlchemy library itself.

buf=[]
columns = ('id','id_char','name','status','page_id')
for i in buffer:
    d=(
        i[1]['id'],
        i[1]['id'],
        i[1]['name'],
        i[1]['status'],
        i[0]['id']
    )
    buf+=[d]

from ouplax.database import KEY
import cx_Oracle
config = {
    'server'     : 'TNSName',
    'username'   : 'username',
    'schema'     : 'some_schema',
    'table_name' : 'some_table',
    'columns'    : ','.join(columns),
    'binds'      : ','.join( [':{}'.format(i) for i in range(1,len(columns)+1)] )
}
k = KEY(server=config['server'],username=config['username'],keyHeader='PYSQL') #Object for storing/retrieving credentials
connection = cx_Oracle.connect(config['username'],k.getpass(),server)
cursor = cx_Oracle.Cursor(connection)
stmt = 'truncate table {schema}.{table_name}'.format(**config)
print(stmt)
cursor.execute(stmt)
stmt = 'insert into {schema}.{table_name} ({columns}) values ({binds})'.format(**config)
print(stmt)
cursor.prepare(stmt)
cursor.executemany(None, buf)
connection.commit()
cursor.close()
connection.close()

回答1:

I stumbled upon this problem myself and filed a bug report in SQLAlchemy repo (link). It seems the issue will be resolved in SQLAlchemy 1.2.11 release.