Storing PDF files as binary objects in SQL Server,

Published 2019-03-18 02:37

Question:

I need to make a design decision for the following task:

I have a SQL Server database containing a table of orders. Users will upload PDF documents through a simple file upload on a web page, and each document is assigned to an order. Each order has at most one document (possibly none, never more than one). To upload, a user opens a web page, enters an order number, sees the order displayed and clicks an upload button, so I know which order the uploaded document belongs to.

Now I am considering two options to store the documents on the web server:

1) Extend my table of orders with a varbinary(MAX) column and store the PDF document directly in that binary field.

2) Save the PDF file in a specific folder on disk and give it a unique name related to the order (for instance my order number, which is the primary key in the database, or a GUID that I could store in an additional column of the order table). I might have to store the files in subfolders, one per month, and store the subfolder name in the order row in the database, to avoid ending up with many thousands of files in one folder.
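A minimal sketch of how option (2) could build the file path, assuming a hypothetical storage root and a `yyyy-mm` naming scheme for the monthly subfolders (the root path, the folder format and the helper names are illustrative, not from the question). `PurePosixPath` is used only so the output looks the same on every OS; on the Windows web server you would use `pathlib.Path`.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical storage root; in practice a folder on the web server.
STORAGE_ROOT = PurePosixPath("/data/order-pdfs")


def month_folder(upload_date: date) -> str:
    """Per-month subfolder name such as '2019-03'; this is the value stored in the order row."""
    return f"{upload_date.year:04d}-{upload_date.month:02d}"


def pdf_path_for_order(order_number: int, upload_date: date) -> PurePosixPath:
    """File path for an order's PDF: <root>/<yyyy-mm>/<order number>.pdf."""
    return STORAGE_ROOT / month_folder(upload_date) / f"{order_number}.pdf"


print(pdf_path_for_order(4711, date(2019, 3, 18)))
```

Using the order number (the primary key) as the file name keeps the mapping trivial; a GUID would work the same way but needs the extra column mentioned above.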

After the PDF files are stored they can be downloaded and viewed via browser after entering the related order number.

I'm leaning towards option (1) because data management seems easier to me with all relevant data in one database. But I am a bit afraid that I could run into performance issues over time, since my database will grow much faster than with option (2). Around 90% or even 95% of the total database size would consist of the stored PDF files alone.

Here is some additional information:

  • The PDF files will be around 100 kilobytes each
  • Around 1500 orders/PDF files per month
  • Windows Server 2008 R2 / IIS 7.5
  • SQL Server 2008 SP1 Express
  • Hardware: not quite sure, I believe one quad-core processor and 4 GB RAM
  • The application is written in ASP.NET Web Forms 3.5 SP1

(I am aware that with the numbers above I will reach the 4 GB database size limit of SQL Server Express after around 2 years. But we can disregard this here: either removing old data from the database or upgrading to a full license would be an option.)
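The "around 2 years" estimate can be checked with a quick back-of-the-envelope calculation. This counts the PDF data only (no orders table, indexes or log) and assumes binary units (1 GB = 1024 × 1024 KB):

```python
# Figures from the question: ~100 KB per PDF, ~1500 PDFs per month,
# 4 GB database size limit of SQL Server 2008 Express.
PDF_SIZE_KB = 100
PDFS_PER_MONTH = 1500
LIMIT_KB = 4 * 1024 * 1024  # 4 GB expressed in KB

growth_per_month_kb = PDF_SIZE_KB * PDFS_PER_MONTH  # 150,000 KB, roughly 146 MB per month
months_to_limit = LIMIT_KB / growth_per_month_kb    # a little under 28 months

print(f"growth per month: {growth_per_month_kb / 1024:.1f} MB")
print(f"months until 4 GB: {months_to_limit:.1f}")
```

So the limit is hit after roughly two and a half years of PDFs alone, somewhat sooner once the rest of the database is counted.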

My question is: what are the pros and cons of each option, and what would you recommend? Perhaps someone has had a similar task and can share their experience.

Thanks in advance for your replies!

Related:

Storing Images in DB - Yea or Nay?

Answer 1:

With SQL Server 2008, when your documents are mostly 1 MB or larger, the FILESTREAM feature is recommended. This is based on a paper published by Microsoft Research, To BLOB or not to BLOB, which analyzed the pros and cons of storing blobs in a database at great length; a great read!

For documents smaller than 256 KB on average, storing them in a VARBINARY(MAX) column seems to be the best fit.

Anything in between is a bit of a toss-up, really.
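The size thresholds above amount to a simple rule of thumb. A sketch of it as a helper function; the function name, the use of 1024-byte kilobytes and treating exactly 256 KB as "in between" are assumptions for illustration:

```python
KB = 1024
MB = 1024 * KB


def suggested_storage(avg_size_bytes: int) -> str:
    """Rule of thumb for SQL Server 2008 based on average document size:
    under 256 KB -> VARBINARY(MAX), 1 MB or more -> FILESTREAM, otherwise no clear winner."""
    if avg_size_bytes < 256 * KB:
        return "VARBINARY(MAX)"
    if avg_size_bytes >= 1 * MB:
        return "FILESTREAM"
    return "either (measure both)"


print(suggested_storage(100 * KB))  # the ~100 KB PDFs from the question
```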

You say your PDF documents will mostly be around 100 KB, so they will fit very nicely into a SQL Server table, no problem. One thing you might want to consider is a separate table for the documents, linked to your main (orders) table. That way the main table stays faster to use, and the documents don't get in the way of your other data.
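A minimal sketch of the separate-table layout, using Python's built-in `sqlite3` purely as a stand-in for SQL Server (in SQL Server the `content` column would be `VARBINARY(MAX)`). The table and column names are hypothetical. Making the order number both the primary key and the foreign key of the document table enforces "at most one document per order":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on
conn.executescript("""
CREATE TABLE orders (
    order_number INTEGER PRIMARY KEY,
    customer     TEXT NOT NULL
);
-- Primary key doubles as foreign key: zero or one document per order.
CREATE TABLE order_documents (
    order_number INTEGER PRIMARY KEY REFERENCES orders(order_number),
    file_name    TEXT NOT NULL,
    content      BLOB NOT NULL
);
""")

conn.execute("INSERT INTO orders VALUES (?, ?)", (4711, "ACME"))
pdf_bytes = b"%PDF-1.4 ..."  # placeholder bytes, not a real PDF
conn.execute("INSERT INTO order_documents VALUES (?, ?, ?)", (4711, "4711.pdf", pdf_bytes))

# Listing or searching orders never reads the BLOB column...
orders = conn.execute("SELECT order_number, customer FROM orders").fetchall()
# ...the document table is only touched on upload and download.
row = conn.execute(
    "SELECT content FROM order_documents WHERE order_number = ?", (4711,)
).fetchone()
```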



Answer 2:

This has been asked many times with regard to storing images, and the discussion there still applies:

  • https://stackoverflow.com/questions/805519/save-image-in-database
  • store image in database or in a system file?
  • Should I store my images in the database or folders?
  • Storing Images in DB - Yea or Nay?


Answer 3:

I would also create a separate table for the documents; that way the search data and key fields used for document retrieval will be more cacheable. The only time your database will need to touch the document table is during an insert or a download.



Answer 4:

I would recommend AGAINST storing the files in SQL Server. You add extra overhead when retrieving the files. IIS is very efficient at serving files, but with SQL Server as the storage facility you introduce a bottleneck, because you now have to hop from your web server to your SQL Server and back to get each file.

When you store your files on the web server, your process can determine the appropriate file based on the criteria you've listed, point to it and serve it. Document management systems such as Documentum and Alfresco store their files on a share, which gives you great flexibility with respect to backup and redundant storage.
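A sketch of the "determine the appropriate file and point to it" step, assuming the monthly-subfolder layout from the question (subfolder name read from the order row, order number typed in by the user). The function name is hypothetical. Because part of the path comes from user input, it is validated and checked against the storage root before use; in the ASP.NET application the returned path could then be handed to `HttpResponse.TransmitFile`.

```python
import re
from pathlib import Path
from typing import Optional


def find_order_pdf(storage_root: Path, month_folder: str, order_number: str) -> Optional[Path]:
    """Return the PDF path for an order, or None if the order has no document."""
    # The order number comes from user input, so accept plain digits only.
    if not re.fullmatch(r"[0-9]+", order_number):
        raise ValueError("order number must be numeric")
    root = storage_root.resolve()
    candidate = (root / month_folder / f"{order_number}.pdf").resolve()
    # Refuse anything that resolves outside the storage root (e.g. ".." in month_folder).
    if root not in candidate.parents:
        raise ValueError("path escapes storage root")
    # A missing file simply means no document has been uploaded for this order.
    return candidate if candidate.is_file() else None
```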



Answer 5:

I am sceptical about storing large blobs in SQL. Assuming the SQL page size is about 4K (off the top of my head; SQL Server actually uses 8 KB pages), the server has to reassemble the entire file from page-sized fragments when serving it back to the user. I am not sure whether this is actually the case.
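For scale, using SQL Server's 8 KB page size, a rough lower bound on the number of pages a stored document spans is its size divided by 8 KB, rounded up. Real pages hold slightly less than 8192 bytes of data because of page headers, so the true count can be a little higher; whether reading these pages back is an actual bottleneck is something to measure rather than assume.

```python
import math

SQL_SERVER_PAGE_BYTES = 8 * 1024  # SQL Server data pages are 8 KB


def min_pages_for_blob(size_bytes: int) -> int:
    """Lower bound on the number of 8 KB pages needed to hold a value of this size
    (ignores per-page header overhead, so the real number may be slightly higher)."""
    return math.ceil(size_bytes / SQL_SERVER_PAGE_BYTES)


print(min_pages_for_blob(100 * 1024))  # a ~100 KB PDF from the question
```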



Answer 6:

We ran into a similar situation, albeit only in principle. We needed a way for documents stored in SharePoint to be accessed via a link on a web page. Since everything is project-based with a unique project number, the solution was to implement a common naming convention for the documents. As the web page is created server-side, the links are generated dynamically: the code takes the base path of the SharePoint server and then appends the project number and the specifics of the document.

Example:

[SharePoint Base Path][Project Number][Project Document Name]
[http://mysharepoint.mycompany.com/213990/213990_PC.pdf]
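The link generation described above can be sketched as a small function that reproduces the example URL. The function name is hypothetical, and the document suffix (here `PC`) is treated as an opaque code, since the answer does not say what it stands for:

```python
from urllib.parse import quote


def project_document_url(base_url: str, project_number: str, doc_suffix: str) -> str:
    """Build a link following the convention
    <base path>/<project number>/<project number>_<document suffix>.pdf."""
    base = base_url.rstrip("/")
    file_name = f"{project_number}_{doc_suffix}.pdf"
    # quote() percent-encodes characters that are unsafe in a URL path segment.
    return f"{base}/{quote(project_number)}/{quote(file_name)}"


print(project_document_url("http://mysharepoint.mycompany.com", "213990", "PC"))
```

The same idea carries over to option (2) in the question: if the file location can be derived from the order number, the database only needs to know whether a document exists.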