After reading over my other question, Using a Relational Database for Schema-Less Data, I began to wonder if a filesystem is more appropriate than a relational database for storing and querying schemaless data.
Rather than just building a file system on top of MySQL, why not just save the data directly to the filesystem? Indexing needs to be figured out, but modern filesystems are very stable, have great features like replication, snapshot and backup facilities, and are flexible at storing schema-less data.
However, I can't find any examples of someone using a filesystem instead of a database.
Where can I find more resources on how to implement a schemaless (or "document-oriented") database as a layer on top of a filesystem? Is anyone using a modern filesystem as a schemaless database?
You are welcome to take a look at our Solid File System, which is a virtual file system product with built-in support for file metadata and SQL-like search mechanism that searches through this data. Also please read the article that describes the benefits of storing different types of data in different kinds of storages.
Yes a filesystem could be taken as a special case of a NOSQL-like database system. It may have some limitations that should be considered during any design decisions:
pros: - - simple, intuitive.
things to think about:
richness of metadata - what types of data does it store, how does it let you query them, can you have hierarchal or multivalued attributes
speed of querying metadata - not all fs's are particularly well optimized with anything other than size, dates.
inability to join queries (though that's pretty much common to NoSQL)
inefficient storage usage (unless the file system performs block suballocation, you'll typically blow 4-16K per item stored regardless of size)
One thing you may want to take into consideration is Oracle's BFILE datatype, which is a pointer to a file on disk. Perhaps that might be the best of both worlds? Microsoft SQL server doesn't seem to offer this capability.
There's a big example of an implementation at Amazon's S3.
http://aws.amazon.com/s3/
This sort of implementation is where a lot of companies are moving towards, because it scales fundamentally better than a relational database can. The approach is simple, and it works, and for some problems, it's a great solution. In the case of Amazon's S3, it's particularly nice for cloud storage, if you don't want to have to worry about the hassles of storing the data yourself.