Amazon Redshift is based on ParAccel which is based on Postgres. From my research it seems that the preferred way to perform hexadecimal string to integer conversion in Postgres is via a bit field, as outlined in this answer.
In the case of bigint, this would be:
select ('x'||lpad('123456789abcdef',16,'0'))::bit(64)::bigint
Unfortunately, this fails on Redshift with:
ERROR: cannot cast type text to bit [SQL State=42846]
What other ways are there to perform this conversion in Postgres 8.1ish (that's close to the Redshift level of compatibility)? UDFs are not supported in Redshift, and neither are array functions, regex functions, or set-generating functions...
A likely explanation is that the cast from text to bit(n) relies on undocumented behavior; Tom Lane's quote on that point is reproduced in this closely related answer:

Convert hex in text representation to decimal number

And the Amazon derivative is obviously not allowing this undocumented feature. Not surprising, since it is based on Postgres 8.1, where there was no such cast at all.
It looks like they added a function for this at some point: STRTOL
For example, with the string to convert and then the base:
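    SELECT STRTOL('deadbeef', 16);  -- 'deadbeef' is the hex form of the value shown below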
Returns:
3735928559
Assuming that you want a simple digit-by-digit ordinal-position conversion (i.e. you're not worried about two's complement negatives, etc.), I think something along the lines of the sketch below should work on an 8.1-equivalent DB.
The function form is optional; it just makes it easier to avoid repeating the argument a bunch of times, and it should get inlined anyway. Efficiency will probably be awful, but most of the tools available to do this smarter don't seem to be available on versions that old, and this at least works.
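A rough sketch of the shape I mean (hex2dec does the work, hexdigit is just a helper name for the per-digit CASE, and the powers of 16 are written out as bigint literals so nothing goes through the double-precision ^ operator):

    CREATE OR REPLACE FUNCTION hexdigit(d text) RETURNS bigint AS $$
    -- Map a single hex digit to its value; anything else comes back NULL.
    SELECT CASE lower($1)
        WHEN '0' THEN 0  WHEN '1' THEN 1  WHEN '2' THEN 2  WHEN '3' THEN 3
        WHEN '4' THEN 4  WHEN '5' THEN 5  WHEN '6' THEN 6  WHEN '7' THEN 7
        WHEN '8' THEN 8  WHEN '9' THEN 9  WHEN 'a' THEN 10 WHEN 'b' THEN 11
        WHEN 'c' THEN 12 WHEN 'd' THEN 13 WHEN 'e' THEN 14 WHEN 'f' THEN 15
    END::bigint
    $$ LANGUAGE sql IMMUTABLE;

    CREATE OR REPLACE FUNCTION hex2dec(hexval text) RETURNS bigint AS $$
    -- Left-pad to 16 hex digits and sum digit * 16^position. Inputs with the
    -- top bit set overflow the signed bigint instead of wrapping negative.
    SELECT hexdigit(substr(lpad(lower($1), 16, '0'),  1, 1)) * 1152921504606846976
         + hexdigit(substr(lpad(lower($1), 16, '0'),  2, 1)) * 72057594037927936
         + hexdigit(substr(lpad(lower($1), 16, '0'),  3, 1)) * 4503599627370496
         + hexdigit(substr(lpad(lower($1), 16, '0'),  4, 1)) * 281474976710656
         + hexdigit(substr(lpad(lower($1), 16, '0'),  5, 1)) * 17592186044416
         + hexdigit(substr(lpad(lower($1), 16, '0'),  6, 1)) * 1099511627776
         + hexdigit(substr(lpad(lower($1), 16, '0'),  7, 1)) * 68719476736
         + hexdigit(substr(lpad(lower($1), 16, '0'),  8, 1)) * 4294967296
         + hexdigit(substr(lpad(lower($1), 16, '0'),  9, 1)) * 268435456
         + hexdigit(substr(lpad(lower($1), 16, '0'), 10, 1)) * 16777216
         + hexdigit(substr(lpad(lower($1), 16, '0'), 11, 1)) * 1048576
         + hexdigit(substr(lpad(lower($1), 16, '0'), 12, 1)) * 65536
         + hexdigit(substr(lpad(lower($1), 16, '0'), 13, 1)) * 4096
         + hexdigit(substr(lpad(lower($1), 16, '0'), 14, 1)) * 256
         + hexdigit(substr(lpad(lower($1), 16, '0'), 15, 1)) * 16
         + hexdigit(substr(lpad(lower($1), 16, '0'), 16, 1))
    $$ LANGUAGE sql IMMUTABLE;

    SELECT hex2dec('123456789abcdef');  -- 81985529216486895

On Redshift, where you can't create the functions, you'd paste the equivalent expression inline, repeating the column name in place of $1.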
If you can use regexp_split_to_array and generate_subscripts it might be faster. Or slower. I haven't tried. Another possible trick is to use a digit-mapping array instead of the CASE, along the lines of the sketch below.
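For instance (again just a sketch; hexdigit_arr is a placeholder name, indexing a constant array by each character's position in '0123456789abcdef'):

    CREATE OR REPLACE FUNCTION hexdigit_arr(d text) RETURNS bigint AS $$
    -- strpos gives 1..16 for valid hex digits and 0 otherwise;
    -- subscript 0 is out of bounds, so invalid characters yield NULL.
    SELECT ('{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}'::bigint[])
           [ strpos('0123456789abcdef', lower($1)) ]
    $$ LANGUAGE sql IMMUTABLE;

which you can use with:

    -- Drop-in replacement for the CASE-based helper above:
    SELECT hexdigit_arr('c');  -- 12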
Personally, I'd do it client-side instead, rather than wrangling the limited capabilities of an old PostgreSQL fork, especially one where you can't load your own sensible user-defined C functions, use PL/Perl, etc.
In real PostgreSQL I'd just use this:
hex2dec.c:
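Something along these lines (a sketch that leans on strtoll rather than a hand-rolled digit loop; not necessarily the exact file):

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/builtins.h"

    #include <errno.h>
    #include <stdlib.h>

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(hex2dec);

    Datum
    hex2dec(PG_FUNCTION_ARGS)
    {
        char   *hexstr = text_to_cstring(PG_GETARG_TEXT_PP(0));
        char   *endpos = NULL;
        int64   result;

        errno = 0;
        result = strtoll(hexstr, &endpos, 16);

        /* Reject empty input, trailing garbage and out-of-range values. */
        if (endpos == hexstr || *endpos != '\0' || errno == ERANGE)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                     errmsg("invalid hexadecimal bigint: \"%s\"", hexstr)));

        pfree(hexstr);

        PG_RETURN_INT64(result);
    }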
Makefile:
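A stock PGXS makefile should be enough here, roughly:

    MODULES = hex2dec
    EXTENSION = hex2dec
    DATA = hex2dec--1.0.sql

    PG_CONFIG = pg_config
    PGXS := $(shell $(PG_CONFIG) --pgxs)
    include $(PGXS)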
hex2dec.control:
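And a minimal control file, something like:

    comment = 'convert a hexadecimal string to bigint'
    default_version = '1.0'
    module_pathname = '$libdir/hex2dec'
    relocatable = true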
hex2dec--1.0.sql:
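With the extension script doing little more than declaring the C function:

    \echo Use "CREATE EXTENSION hex2dec" to load this file. \quit

    CREATE FUNCTION hex2dec(hexval text) RETURNS bigint
    AS 'MODULE_PATHNAME', 'hex2dec'
    LANGUAGE C IMMUTABLE STRICT;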
Usage:
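Assuming it builds and installs cleanly with make and make install, something like:

    CREATE EXTENSION hex2dec;

    SELECT hex2dec('123456789abcdef');
    -- 81985529216486895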
The performance difference is ... noteworthy. Given sample data:
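Say, a million random-ish hex strings (hexbench is just a throwaway name for this sketch; the exact generation matters less than the row count):

    -- One way to generate a million rows of hex input for timing.
    CREATE TABLE hexbench AS
    SELECT to_hex((random() * 1000000000000)::bigint) AS hex
    FROM generate_series(1, 1000000);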
Conversion from hex to decimal takes about 1.3 s from a warm cache using the C extension, which isn't great for a million rows. Reading the rows without any transformation takes 0.95 s. It took 36 seconds for the SQL-based hex2dec approach to process the same rows. Frankly, I'm really impressed that the SQL approach was as fast as that, and surprised the C extension was that slow.