Storing long binary (raw data) strings

2019-01-12 07:03发布

We are capturing a raw binary string that is variable in size (from 100k to 800k) and we would like to store these individual strings. They do not need to be indexed (duh) and there will be no queries on the contents of the field.

The quantity of these inserts will be very large (they are for archival purposes), let's say 10,000 per day. What is the best field type for large binary strings like these? Should it be text or blob or something else?

3条回答
神经病院院长
2楼-- · 2019-01-12 07:26

As far as PostgreSQL is concerned, type text is out of the question. It is slower, uses more space and is more error-prone than bytea for the purpose.
There are basically 3 approaches:

  1. use type bytea (basically the pg equivalent of the SQL blob type)

  2. use "large objects"

  3. store blobs as files in the filesystem and only store the filename in the database.

Each has it's own advantages and disadvantages.

  1. is rather simple to handle but needs the most disk space. Some decoding and encoding is required, which makes it also slow-ish. Backups grow rapidly in size!

  2. is slightly awkward in handling, but you have your own infrastructure to manipulate the blobs - if you should need that. And you can more easily make separate backups.

  3. is by far the fastest way and uses the least disk space. But it does not provide the referential integrity that you get when you store inside the database.

I have a number of implementations like that for image files: store a small thumbnail in a bytea-field for referential integrity and quick reference. Store the original image as file in the file-system. Of course, you need to put some thought into when and how to delete outdated files, how to backup the external files and such.

查看更多
霸刀☆藐视天下
3楼-- · 2019-01-12 07:39

You should probably use blob in most databases. text columns will often be normalized or transformed according to a text encoding or locale; and so the actual bytes may not be preserved as expected. In almost all cases, this is the only difference between a text and blob column.

查看更多
等我变得足够好
4楼-- · 2019-01-12 07:39

We have user VARBINARY column to store upto 300 MB of files.

查看更多
登录 后发表回答