Convert bytea to double precision in PostgreSQL

Posted 2019-06-19 07:53

Question:

I have a database where one of the tables stores a blob (bytea) of all kinds of generic data collected from another system. The bytea field can have anything in it. In order to know how to interpret the data, the table also has a format field. I wrote a Java application to read the bytea field from the database as a byte[] and then I can easily convert it to double[] or int[] or whatever the format field says by using ByteBuffer and the various views (DoubleBuffer, IntBuffer, etc.).

Now I have the situation where I need to do some manipulation of the data on the database itself within a trigger function in order to maintain integrity with another table. I can find conversions for just about any data type imaginable, but I can't find anything for going from bytea (or even bit) to double precision and back. A bytea can be broken up, converted to bits, and then converted to an int or bigint, but not a double precision. For example, x'deadbeefdeadbeef'::bit(64)::bigint will convert to -2401053088876216593 with no problems, but x'deadbeefdeadbeef'::bit(64)::double precision fails with "ERROR: cannot cast type bit to double precision" instead of giving the IEEE 754 answer of -1.1885959257070704E148.
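For reference, the two interpretations mentioned above can be reproduced outside the database. The following is a minimal Python sketch (Python is an assumption here; the original application uses Java) that reads the same eight bytes once as a signed 64-bit integer and once as an IEEE 754 double, both big-endian to match the bit-string cast:

```python
import struct

raw = bytes.fromhex("deadbeefdeadbeef")

# '>q' = big-endian signed 64-bit integer, the same reading as ::bit(64)::bigint
as_bigint = struct.unpack(">q", raw)[0]

# '>d' = big-endian IEEE 754 double, the reading the SQL cast does not offer
as_double = struct.unpack(">d", raw)[0]

print(as_bigint)
print(as_double)
```

The bytes are identical in both cases; only the interpretation changes, which is why a bit-level reinterpretation rather than a numeric cast is needed.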

I found this answer https://stackoverflow.com/a/11661849/5274457, which basically implements the IEEE standard to convert bits to double, but is there really not a basic conversion function in PostgreSQL to do this? Plus, I need to go backwards as well from double precision to bytea when I'm done manipulating the data and need to update the tables, which this answer doesn't provide.

Any ideas?

Answer 1:

Ok, I found an answer. PostgreSQL lets you write functions in Python through PL/Python. To enable it, you have to install the specific version of Python needed by your installation of PostgreSQL and make it available in the PATH environment variable. The installation notes say which version that is. I'm currently using PostgreSQL 9.6.5 on Windows, and it calls for Python 3.3. I initially tried the latest Python 3.6, but it wouldn't work, so I settled on the latest Python 3.3 for Windows, which is 3.3.5.

After installing Python, you enable it in PostgreSQL by executing CREATE EXTENSION plpython3u; on your database, as documented at https://www.postgresql.org/docs/current/static/plpython.html. From there, you can write functions with Python bodies.

For my specific case to convert from bytea to double precision[] and back, I wrote the following functions:

CREATE FUNCTION bytea_to_double_array(b bytea)
    RETURNS double precision[]
    LANGUAGE 'plpython3u'
AS $BODY$
  if 'struct' in GD:
    struct = GD['struct']
  else:
    import struct
    GD['struct'] = struct

  # One 'd' (8-byte double) per 8 bytes of input; // keeps the count an integer.
  return struct.unpack('<' + str(len(b) // 8) + 'd', b)
$BODY$;

CREATE FUNCTION double_array_to_bytea(dblarray double precision[])
    RETURNS bytea
    LANGUAGE 'plpython3u'
AS $BODY$
  if 'struct' in GD:
    struct = GD['struct']
  else:
    import struct
    GD['struct'] = struct

  # dblarray here is really a list.
  # PostgreSQL passes SQL arrays as Python lists
  return struct.pack('<' + str(len(dblarray)) + 'd', *dblarray)
$BODY$;

In my case, all the doubles are stored in little endian, so I use <. I also cache the import of the struct module in the global dictionary as described in https://stackoverflow.com/a/15025425/5274457. I used GD instead of SD because I want the import available in other functions I may write. For information about GD and SD, see https://www.postgresql.org/docs/current/static/plpython-sharing.html.
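The effect of the byte-order prefix is easy to check in plain Python, outside PL/Python. Here is a small sketch using the value 1.0 (chosen only for illustration), whose IEEE 754 encoding is 3ff0000000000000:

```python
import struct

big = struct.pack(">d", 1.0)     # most significant byte first
little = struct.pack("<d", 1.0)  # least significant byte first

print(big.hex())     # 3ff0000000000000
print(little.hex())  # 000000000000f03f

# The two encodings are byte-for-byte reverses of each other.
assert little == big[::-1]
```

Reading little-endian data with > does not raise an error. It silently yields a different number, so the prefix has to match how the blobs were written.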

To see it in action, keeping in mind that the blobs in my database are stored as little endian:

SELECT bytea_to_double_array(decode('efbeaddeefbeadde', 'hex')), encode(double_array_to_bytea(array[-1.1885959257070704E148]), 'hex');

And the answer I get is

bytea_to_double_array    | encode
double precision[]       | text
-------------------------+------------------
{-1.18859592570707e+148} | efbeaddeefbeadde

where 'efbeaddeefbeadde' is 'deadbeefdeadbeef' in little endian.
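One thing to keep in mind with the functions above: struct.unpack raises struct.error when the buffer length does not match the format, so a bytea whose length is not a multiple of 8 makes bytea_to_double_array fail. Below is a plain-Python sketch of the same pack/unpack logic with an explicit length check. The names unpack_doubles and pack_doubles are hypothetical, and the code is meant for experimenting outside the database, not as a drop-in replacement.

```python
import struct

def unpack_doubles(b: bytes) -> list:
    """Decode little-endian IEEE 754 doubles from a byte string."""
    if len(b) % 8 != 0:
        raise ValueError("length %d is not a multiple of 8" % len(b))
    return list(struct.unpack("<%dd" % (len(b) // 8), b))

def pack_doubles(values) -> bytes:
    """Encode a sequence of floats as little-endian IEEE 754 doubles."""
    return struct.pack("<%dd" % len(values), *values)

data = pack_doubles([1.0, -2.5, 0.0])
print(data.hex())
print(unpack_doubles(data))  # [1.0, -2.5, 0.0]
```

Whether a malformed blob should raise an error or be skipped inside the trigger depends on how strictly the format field is enforced, so that choice is left open here.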