We are capturing a raw binary string that is variable in size (from 100k to 800k) and we would like to store these individual strings. They do not need to be indexed (duh) and there will be no queries on the contents of the field.
The quantity of these inserts will be very large (they are for archival purposes), let's say 10,000 per day. What is the best field type for large binary strings like these? Should it be text
or blob
or something else?
As far as PostgreSQL is concerned, type text
is out of the question. It is slower, uses more space and is more error-prone than bytea
for the purpose.
There are basically 3 approaches:
use type bytea (basically the pg equivalent of the SQL blob type)
use "large objects"
store blobs as files in the filesystem and only store the filename
in the database.
Each has it's own advantages and disadvantages.
is rather simple to handle but needs the most disk space. Some decoding and encoding is required, which makes it also slow-ish. Backups grow rapidly in size!
is slightly awkward in handling, but you have your own infrastructure to manipulate the blobs - if you should need that. And you can more easily make separate backups.
is by far the fastest way and uses the least disk space. But it does not provide the referential integrity that you get when you store inside the database.
I have a number of implementations like that for image files: store a small thumbnail in a bytea-field for referential integrity and quick reference. Store the original image as file in the file-system. Of course, you need to put some thought into when and how to delete outdated files, how to backup the external files and such.
You should probably use blob
in most databases. text
columns will often be normalized or transformed according to a text encoding or locale; and so the actual bytes may not be preserved as expected. In almost all cases, this is the only difference between a text
and blob
column.
We have user VARBINARY column to store upto 300 MB of files.