Would you store binary data in database or in file

2019-01-02 18:04发布

This is a question which has been asked before (large-text-and-images-in-sql) but mainly for data which will be changed. In my case the data will be stored and never changed. Just seems sensible to keep everything together.

Are there any reasons why I should not store static binary data in a database?

Assuming it is a sensible thing to do, are there any advantages to storing such data in separate tables? (You might begin to realise now that I'm not a DB expert...)

Clarify: There will probably be no more than 10-20 users but these will be in the US and in the UK. The binary data will have to be transfered in any case.

11条回答
梦该遗忘
2楼-- · 2019-01-02 18:19

Addressing the issue from a principles point of view, a relational database is (mainly) there for storing structured data. If you cannot make a query condition or join on a data element it probably doesn't belong in the database. I don't see an image BLOB being used in a WHERE clause, so I'd say keep it outside the database. A CLOB on the other hand can be used in queries.

查看更多
荒废的爱情
3楼-- · 2019-01-02 18:22

The advantage of storing data in the DB is taking advantage of DB security mechanisms and reducing maintanence cost (backups, ...). The disadvantage of it is increasing DB load and consuming connections (which might be expensive for per-connection licensed database servers). If you are using SQL Server 2008, FILESTREAM might be a nice alternative.

By the way, for Web apps (or any other apps that might need streaming the data), it's usually more sensible to store data outside DB.

查看更多
步步皆殇っ
4楼-- · 2019-01-02 18:26

The biggest dissadvantage if you are storing BLOBS is memory consumption. Can you imagine what select * from x would do for thousands of records with a 45k image in each?

As Mehrdad said there are also advantages. So if you decide to go with that approach you should try to design your database so that most queries return less results with BLOB data in them. Maybe for example make one to one relationships for this purpose.

查看更多
旧人旧事旧时光
5楼-- · 2019-01-02 18:29

I'm familiar with a fairly good-sized OSS project that made the decision at its inception to store images in the MySQL database, and it's proven to be among the top 3 bad ideas they have been coping with ever since. (Exacerbated by the fact the "refactor mercilessly" is anathema, but that's another story.)

Among the serious problems this has caused:

  1. Exceeding maximum efficient database size (mysql). (The total space required for images exceeds all others by a at least 2 orders of magnitude).

  2. Image files lose their "fileness". No dates sizes etc. unless stored (redundantly) as dates (which require code for management).

  3. Arbitrary byte sequences don't process nicely all the time, for either storage or manipulation.

  4. "We'll never need to access the images externally" is a dangerous assumption.

  5. Fragility. Because the whole arrangement is unnatural and touchy, and you don't know where it will bite next (contributing to the anti-refactor mentality).

The benefits? None that I can think of, except it might have been the path of least resistance at the time.

查看更多
不流泪的眼
6楼-- · 2019-01-02 18:29

We store attachments in our system, and you cannot change an attachment, so I think we're on the same page w/data that "will be stored and never changed." We specifically decided not to store it in the database. We did this for two reasons, simplicity, and backup/recovery time.

Simplicity first: In our case these attachments are uploaded from the end-user's browser, and it's simpler to just write them to a directory (on the DB server) than it is to then stream them down the SQL pipe. There is a record of them in the DB, but the DB just contains meta-information about the attachment, and the name of the file on disk (a guid in our case)

On the backup/recovery side: These blobs will likely become one of the largest pieces of your database. Whenever you run a full backup you'll be copying these bits over and over, even though you know then can never change. To us it just seemed much simpler to have (much) smaller backups, and do an xcopy of the attachment directory to a secondary server as the backup.

查看更多
浪荡孟婆
7楼-- · 2019-01-02 18:30

The performance issue here as been address above, so I won't repeat it. But I think a good tip if you are storing things that will be streamed out a lot (such as images/documents on a web-site) is to build in a caching system.

By this I mean store all the data in your database, but when someone requests that file, check if it exists on disk (based on a known filename, in a temp folder), if not, grab it from the DB and write it to the folder, and then stream that to the user. For the next request to the same file, since it exists on disk, it can be served from there without hitting the DB. But if you need to delete these files (or your web-server goes kapput!), it doesn't matter as they will be rebuilt again from the DB as people request them. This should be much quicker than serving each request for the same file from the DB.

查看更多
登录 后发表回答