Storing long binary (raw data) strings

2019-01-12 07:01发布

问题:

We are capturing a raw binary string that is variable in size (from 100k to 800k) and we would like to store these individual strings. They do not need to be indexed (duh) and there will be no queries on the contents of the field.

The quantity of these inserts will be very large (they are for archival purposes), let's say 10,000 per day. What is the best field type for large binary strings like these? Should it be text or blob or something else?

回答1:

As far as PostgreSQL is concerned, type text is out of the question. It is slower, uses more space and is more error-prone than bytea for the purpose.
There are basically 3 approaches:

  1. use type bytea (basically the pg equivalent of the SQL blob type)

  2. use "large objects"

  3. store blobs as files in the filesystem and only store the filename in the database.

Each has it's own advantages and disadvantages.

  1. is rather simple to handle but needs the most disk space. Some decoding and encoding is required, which makes it also slow-ish. Backups grow rapidly in size!

  2. is slightly awkward in handling, but you have your own infrastructure to manipulate the blobs - if you should need that. And you can more easily make separate backups.

  3. is by far the fastest way and uses the least disk space. But it does not provide the referential integrity that you get when you store inside the database.

I have a number of implementations like that for image files: store a small thumbnail in a bytea-field for referential integrity and quick reference. Store the original image as file in the file-system. Of course, you need to put some thought into when and how to delete outdated files, how to backup the external files and such.



回答2:

You should probably use blob in most databases. text columns will often be normalized or transformed according to a text encoding or locale; and so the actual bytes may not be preserved as expected. In almost all cases, this is the only difference between a text and blob column.



回答3:

We have user VARBINARY column to store upto 300 MB of files.