Using a Filesystem (Not a Database!) for S

2019-03-17 23:56发布

问题:

After reading over my other question, Using a Relational Database for Schema-Less Data, I began to wonder if a filesystem is more appropriate than a relational database for storing and querying schemaless data.

Rather than just building a file system on top of MySQL, why not just save the data directly to the filesystem? Indexing needs to be figured out, but modern filesystems are very stable, have great features like replication, snapshot and backup facilities, and are flexible at storing schema-less data.

However, I can't find any examples of someone using a filesystem instead of a database.

Where can I find more resources on how to implement a schemaless (or "document-oriented") database as a layer on top of a filesystem? Is anyone using a modern filesystem as a schemaless database?

回答1:

Yes a filesystem could be taken as a special case of a NOSQL-like database system. It may have some limitations that should be considered during any design decisions:

pros: - - simple, intuitive.

  • takes advantage of years of tuning and caching algorithms
  • easy backup, potentially easy clustering

things to think about:

  • richness of metadata - what types of data does it store, how does it let you query them, can you have hierarchal or multivalued attributes

  • speed of querying metadata - not all fs's are particularly well optimized with anything other than size, dates.

  • inability to join queries (though that's pretty much common to NoSQL)

  • inefficient storage usage (unless the file system performs block suballocation, you'll typically blow 4-16K per item stored regardless of size)

  • May not have the kind of caching algorithm you want for it's directory structure
  • tends to be less tunable, etc.
  • backup solutions may have trouble depending on how you store things - too deep, too many items per node, etc - which might obviate an obvious advantage of such a structure. locking for a LOCAL filesystem works pretty well of course if you call the right routines, but not necessarily for a network base fileesytem (those problems have been solved in various ways, but it's certainly a design issue)


回答2:

You are welcome to take a look at our Solid File System, which is a virtual file system product with built-in support for file metadata and SQL-like search mechanism that searches through this data. Also please read the article that describes the benefits of storing different types of data in different kinds of storages.



回答3:

One thing you may want to take into consideration is Oracle's BFILE datatype, which is a pointer to a file on disk. Perhaps that might be the best of both worlds? Microsoft SQL server doesn't seem to offer this capability.



回答4:

There's a big example of an implementation at Amazon's S3.

http://aws.amazon.com/s3/

This sort of implementation is where a lot of companies are moving towards, because it scales fundamentally better than a relational database can. The approach is simple, and it works, and for some problems, it's a great solution. In the case of Amazon's S3, it's particularly nice for cloud storage, if you don't want to have to worry about the hassles of storing the data yourself.