Background:
I am aware of this SO question about Transactional NTFS (TxF) and this article describing how to use it, but I am looking for real-world experience with a reasonably high-volume enterprise system where lots of blob data (say, documents and/or photos) needs to be persisted transactionally once and read many times.
- We are expecting a few tens of thousands of documents written per day, with reads running at several tens of thousands per hour.
- We could store the indexes either in the file system or in SQL Server, but we must be able to scale this out over several boxes.
- We must retain the ability to back up and restore the data easily for disaster recovery.
The Question:
- Any real-world, enterprise-grade experience with Transactional NTFS (TxF)?
Related questions:
- Anyone tried distributed transactions using TxF where the same file is committed to two mirror servers at once?
- Anyone tried a distributed transaction spanning the file system and a database? (A rough sketch of how that wiring might look follows this list.)
- Any performance concerns, reliability concerns, or performance data you can share? Has anyone done something at this scale before where transactions are a concern?
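For that second question, the wiring I would expect to try looks roughly like the sketch below: pull the kernel transaction handle out of a DTC transaction via IKernelTransaction and hand it to CreateFileTransacted, while the database connection enlists in the same DTC transaction. This is untested on my part; the IKernelTransaction declaration and the linking details are assumptions drawn from published interop samples, and the path is a placeholder.

```cpp
// Hedged sketch: enlisting TxF in an MS DTC transaction so a file write
// and a database write share one two-phase commit. Untested; the
// IKernelTransaction declaration is reproduced from published interop
// samples and should be treated as an assumption, as should xolehlp.lib.
#include <windows.h>
#include <transact.h>   // ITransaction, ITransactionDispenser, XACTTC_SYNC
#include <xolehlp.h>    // DtcGetTransactionManager
#pragma comment(lib, "xolehlp.lib")

// KTM's COM bridge; GUID copied from interop samples (assumption).
MIDL_INTERFACE("79427A2B-F895-40e0-BE79-B57DC82ED231")
IKernelTransaction : public IUnknown
{
    virtual HRESULT STDMETHODCALLTYPE GetHandle(HANDLE *pHandle) = 0;
};

int wmain()
{
    // 1. Start a distributed transaction with the local MS DTC.
    ITransactionDispenser *pDispenser = nullptr;
    if (FAILED(DtcGetTransactionManager(nullptr, nullptr,
            __uuidof(ITransactionDispenser), 0, 0, nullptr,
            reinterpret_cast<void**>(&pDispenser))))
        return 1;

    ITransaction *pTx = nullptr;
    if (FAILED(pDispenser->BeginTransaction(nullptr,
            ISOLATIONLEVEL_SERIALIZABLE, 0, nullptr, &pTx)))
        return 1;

    // 2. Pull the kernel transaction handle out so TxF can enlist.
    IKernelTransaction *pKtm = nullptr;
    HANDLE hKtm = INVALID_HANDLE_VALUE;
    if (FAILED(pTx->QueryInterface(__uuidof(IKernelTransaction),
            reinterpret_cast<void**>(&pKtm))) ||
        FAILED(pKtm->GetHandle(&hKtm)))
        return 1;

    // 3. Transacted file work. A database connection enlisted in the
    //    same DTC transaction would do its writes here as well.
    HANDLE hFile = CreateFileTransactedW(
        L"C:\\store\\doc.bin", GENERIC_WRITE, 0, nullptr,
        CREATE_NEW, FILE_ATTRIBUTE_NORMAL, nullptr,
        hKtm, nullptr, nullptr);
    BOOL ok = (hFile != INVALID_HANDLE_VALUE);
    if (ok)
    {
        // WriteFile(hFile, ...) goes here.
        CloseHandle(hFile);
    }
    CloseHandle(hKtm);

    // 4. DTC drives two-phase commit across every enlisted manager.
    HRESULT hr = ok ? pTx->Commit(FALSE, XACTTC_SYNC, 0)
                    : pTx->Abort(nullptr, FALSE, FALSE);
    pKtm->Release();
    pTx->Release();
    pDispenser->Release();
    return SUCCEEDED(hr) ? 0 : 1;
}
```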
Edit: To be clear, I have researched other technologies, including SQL Server 2008's new FILESTREAM data type, but this question is specifically targeted at the transactional file system only.
More Resources:
- An MSDN Magazine article on TxF called "Enhance Your Apps With File System Transactions".
- A webcast called "Transactional Vista: Kernel Transaction Manager and friends (TxF, TxR)". This video quotes a TxF overhead of 2-5%, with the performance discussion starting about 25 minutes in. This is the first set of hard numbers I've found, and the video is a very good overview of how this works under the hood. At about 34:30, the speaker describes a scenario very similar to the one in this question.
- A Channel 9 screencast called "Surendra Verma: Vista Transactional File System". He talks about performance starting around 35 minutes in. No hard numbers.
- A list of TxF articles on the B# .NET Blog.
- A Channel 9 screencast called "Transactional NTFS".
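To make the scenario concrete, the single-machine write pattern I'm asking about boils down to the sketch below (a minimal illustration of the TxF API; the path, payload, and error handling are my own placeholders, not production code):

```cpp
// Minimal sketch of a transacted blob write: the document becomes
// visible to readers atomically at commit, or not at all.
#include <windows.h>
#include <ktmw32.h>                    // CreateTransaction, CommitTransaction
#pragma comment(lib, "KtmW32.lib")

int wmain()
{
    // One kernel (KTM) transaction per document write.
    HANDLE hTx = CreateTransaction(nullptr, nullptr, 0, 0, 0, 0,
                                   const_cast<LPWSTR>(L"doc write"));
    if (hTx == INVALID_HANDLE_VALUE)
        return 1;

    // The file handle is bound to the transaction; other handles see
    // the pre-transaction state until CommitTransaction succeeds.
    HANDLE hFile = CreateFileTransactedW(
        L"C:\\store\\doc42.bin", GENERIC_WRITE, 0, nullptr,
        CREATE_NEW, FILE_ATTRIBUTE_NORMAL, nullptr,
        hTx, nullptr, nullptr);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        CloseHandle(hTx);
        return 1;
    }

    const char blob[] = "...document bytes...";
    DWORD written = 0;
    BOOL ok = WriteFile(hFile, blob, sizeof blob, &written, nullptr);
    CloseHandle(hFile);

    // Commit publishes the file atomically; rollback (or closing the
    // transaction handle uncommitted) discards it entirely.
    if (ok)
        ok = CommitTransaction(hTx);
    else
        RollbackTransaction(hTx);
    CloseHandle(hTx);
    return ok ? 0 : 1;
}
```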
While I don't have extensive experience with TxF, I do have experience with MS DTC. TxF itself is fairly performant; it's when you throw in MS DTC to coordinate multiple resource managers across multiple machines that performance takes a considerable hit.
From your description, it sounds like you are storing and indexing very large volumes of unstructured data. I assume that you also need the ability to search for this data. As such, I would highly recommend looking into something like Microsoft's Dryad or Google's MapReduce and a high performance distributed file system to handle your unstructured data storage and indexing. The best examples of high-volume enterprise systems that store and index massive volumes of blob data are Internet search engines like Bing and Google.
There are quite a few resources available for managing high-throughput unstructured data, and they would probably solve your problem more effectively than SQL Server and NTFS.
I know it's a bit farther out of the box than you were probably looking for... but you did mention that you had already exhausted all other search avenues around the NTFS/TxF/SQL box. ;)
I suppose "real-world, enterprise-grade" experience is more subjective than it sounds.
Windows Update uses TxF, so it is being used quite heavily in terms of frequency. Now, it isn't doing any multi-node work, and it isn't going through DTC or anything fancy like that, but it is using TxF to manipulate file state, and it coordinates those changes with registry changes (TxR). Does that count?
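For anyone who hasn't seen that combination, a minimal sketch of the pattern follows: one KTM transaction covering a TxF file update and a TxR registry update, so both commit or roll back together. The paths, key names, and payload here are my own illustrative placeholders, not Windows Update's actual code.

```cpp
// Illustrative sketch: one KTM transaction spanning TxF and TxR.
// All names below are placeholders; error handling is abbreviated.
#include <windows.h>
#include <ktmw32.h>                    // CreateTransaction, CommitTransaction
#pragma comment(lib, "KtmW32.lib")
#pragma comment(lib, "Advapi32.lib")   // RegCreateKeyTransactedW

int wmain()
{
    HANDLE hTx = CreateTransaction(nullptr, nullptr, 0, 0, 0, 0,
                                   const_cast<LPWSTR>(L"file+registry"));
    if (hTx == INVALID_HANDLE_VALUE)
        return 1;

    // File side (TxF): replace a component inside the transaction.
    HANDLE hFile = CreateFileTransactedW(
        L"C:\\app\\component.bin", GENERIC_WRITE, 0, nullptr,
        CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr,
        hTx, nullptr, nullptr);

    // Registry side (TxR): record the new state under the same transaction.
    HKEY hKey = nullptr;
    LSTATUS status = RegCreateKeyTransactedW(
        HKEY_CURRENT_USER, L"Software\\MyApp", 0, nullptr,
        REG_OPTION_NON_VOLATILE, KEY_SET_VALUE, nullptr,
        &hKey, nullptr, hTx, nullptr);

    BOOL ok = (hFile != INVALID_HANDLE_VALUE) && (status == ERROR_SUCCESS);
    if (ok)
    {
        const DWORD version = 42;
        DWORD written = 0;
        ok = WriteFile(hFile, &version, sizeof version, &written, nullptr)
          && RegSetValueExW(hKey, L"Version", 0, REG_DWORD,
                            reinterpret_cast<const BYTE*>(&version),
                            sizeof version) == ERROR_SUCCESS;
    }
    if (hFile != INVALID_HANDLE_VALUE) CloseHandle(hFile);
    if (hKey) RegCloseKey(hKey);

    // Neither the file nor the registry value is visible until commit.
    if (ok)
        ok = CommitTransaction(hTx);
    else
        RollbackTransaction(hTx);
    CloseHandle(hTx);
    return ok ? 0 : 1;
}
```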
A colleague of mine presented this talk to SNIA, which is pretty frank about a lot of the work around TxF and might shed a little more light. If you're thinking of using TxF, it's worth a read.
Ronald: FileStream is layered on top of TxF.
JR: While Windows Update uses TxF/KTM and demonstrates its utility, it is not a high-throughput application.
Unfortunately, it appears that the answer is "No."
In nearly two weeks (one of them with a 100-point bounty) and 156 views, no one has answered that they have used TxF for any high-volume application as I described. I can't say this was unexpected, and of course I cannot prove a negative, but it appears this feature of Windows is not well known or frequently used, at least by active members of the SO community at the time of writing.
If I ever get around to writing some kind of proof of concept, I'll post here what I learn.
Have you considered FILESTREAM support in SQL Server 2008 (if you're using SQL Server 2008, of course)? I'm not sure about performance, but it offers transactionality and supports backup/restore.