I'm currently working on some ETL code to incrementally load data from a source Oracle system to a Postgresql analysis environment with Python as the glue.
I am leveraging cx_Oracle as the driver to pull the Oracle data, but I am running into some strange issues with Oracle Date and Python datetime and year 0.
As far as I understand, neither Oracle nor Python support year 0. I've isolated a particular row and field with an offending Date:
SELECT
CAR_RENTAL_DATE,
EXTRACT(YEAR FROM CAR_RENTAL_DATE)
FROM
TRXNS_ALL
WHERE
TRX_ID = 50079327
Here are the results from the sqlplus CLI:
CAR_RENTA EXTRACT(YEARFROMCAR_RENTAL_DATE)
--------- --------------------------------
30-JUN-00 0
According to this post, it is actually possible to create an Oracle Date with year 0, but its not officially supported in the docs: http://rwijk.blogspot.com/2008/10/year-zero.html
The issue I am having is the Python representation that cx_Oracle is returning to me:
cur.execute('select CAR_RENTAL_DATE, extract(year from CAR_RENTAL_DATE) FROM TRXNS_ALL where TRX_ID = 50079327')
res = cur.fetchone()
print repr(res[0]), type(res[0])
Which results in the following:
datetime.datetime(0, 6, 30, 0, 0) <type 'datetime.datetime'>
Strangely (but expectedly), creating the same Python datetime gives me an error:
dt = datetime.datetime(0, 6, 30, 0, 0)
ValueError: year is out of range
When I attempt to insert this datetime to a Postgresql timestamp field with psycopg2, the insertion fails, as expected.
What is really going on here? How do Oracle and Python both support year 0 despite their documentation?
What is a good way to handle this aside from iterating over every tuple value returned by cx_Oracle to detect and handle year=0?