Difference between Object Storage And File Storage

Posted 2019-03-08 04:30

Could someone please explain the difference between object storage and file storage?

I have read about object storage on Wikipedia, the Dell overview at http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf, Amazon's docs (S3), OpenStack Swift, etc. Could someone give me an example to help me understand it better?

Is the only difference that objects in 'object storage' carry more metadata?

For example, how would I store an image as an object using a programming language (for example, Python)?

Thanks.

10 answers
干净又极端
Post #2 · 2019-03-08 04:33

IMO, object storage has nothing to do with scale, because someone could build a file system (FS) capable of storing a huge number of files, even in a single directory.

It is also not about the access methods. HTTP access to data in file systems has been available in many well-known NAS systems.

Storage and access by OID (object ID) is a way to handle data without having to name it. It could be done with files too; I believe there is an NFS protocol extension that allows this.
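To make the OID idea concrete, here is a minimal in-memory sketch. It assumes the ID is derived from a SHA-256 hash of the content (content addressing); that is only one way to mint OIDs, and the `put`/`get` helpers and the dict-backed store are hypothetical, not any product's API.

```python
import hashlib
from typing import Dict

# Hypothetical in-memory store: OID -> raw bytes.
_store: Dict[str, bytes] = {}


def put(data: bytes) -> str:
    """Store bytes and return an OID; the caller never chooses a name."""
    # Assumption: the OID is the SHA-256 hex digest of the content.
    oid = hashlib.sha256(data).hexdigest()
    _store[oid] = data  # identical content always maps to the same OID
    return oid


def get(oid: str) -> bytes:
    """Fetch bytes by OID alone; there is no path or directory involved."""
    return _store[oid]


oid = put(b"fake image bytes")
print(oid, get(oid))
```

Because the ID is computed from the bytes, storing the same content twice yields the same OID; systems that hand out random or sequential IDs behave differently.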

I would put it this way: object storage is a new, different, "object-centric" way of thinking about data, its access, and its management.

Think about these points:

What are snapshots today? They are point-in-time copies of a volume. When a snapshot is taken, all files in the volume are snapped too, whether or not each of them needs it. A lot of space can be used (wasted?) by a complete volume snapshot when only a few files needed to be snapped.

In an object storage system you will rarely see volume snapshots; instead, individual objects are snapshotted, perhaps automatically. This is object versioning. Not all objects need to be versioned; each individual object can specify whether it is versioned.
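A toy model of per-object versioning as described above: each object carries its own flag deciding whether a write keeps history or overwrites. The `VersionedObject` class is hypothetical; real services may configure versioning at a coarser level (for example, per bucket) and store versions very differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedObject:
    """Toy object whose own flag decides whether writes keep history."""
    versioned: bool = False
    versions: List[bytes] = field(default_factory=list)

    def write(self, data: bytes) -> None:
        if self.versioned:
            self.versions.append(data)  # keep every earlier version
        else:
            self.versions = [data]  # overwrite; no history kept

    def read(self, version: int = -1) -> bytes:
        """Return a version by index; -1 means the latest."""
        return self.versions[version]


report = VersionedObject(versioned=True)
report.write(b"draft 1")
report.write(b"draft 2")

scratch = VersionedObject()  # not versioned: only the latest write survives
scratch.write(b"tmp 1")
scratch.write(b"tmp 2")
```

Only the object that asked for versioning pays the space cost, which is the contrast with a whole-volume snapshot.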

How are files and volumes protected from a disaster? Typically, in a disaster recovery (DR) setup, entire volumes or volume sets are set up for replication to a DR site. Again, this ignores whether individual files need to be replicated or not. The unit of disaster protection is the volume; files are small fry.

In an object storage system, DR is not volume-centric. An object's metadata can decide how many copies should exist and where (geographic locations, fault domains).

The same applies to other features:

  1. Tiering - Objects are placed in storage tiers/classes based on their own metadata, independently of other, unrelated objects.

  2. Lifecycle - Objects move between tiers, change their number of copies, etc., individually rather than as a group.

  3. Authentication - Individual objects can be authenticated against different authentication domains if required.
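The per-object DR, tiering, and lifecycle ideas above can be sketched as a policy function that looks only at one object's own metadata. The metadata keys (`tier`, `copies`, `regions`, `archive_after_days`) and region labels are hypothetical, chosen purely for illustration.

```python
from typing import Dict


def placement(metadata: Dict[str, str], age_days: int) -> Dict[str, object]:
    """Decide tier and replication for ONE object from its own metadata.

    The keys read here are hypothetical; no real product is implied.
    """
    tier = metadata.get("tier", "standard")
    archive_after = metadata.get("archive_after_days")
    # Lifecycle: move this object (and only this one) to archive when old enough.
    if archive_after is not None and age_days >= int(archive_after):
        tier = "archive"
    return {
        "tier": tier,
        "copies": int(metadata.get("copies", "2")),  # DR: replica count per object
        "regions": metadata.get("regions", "local").split(","),  # where copies live
    }


scan = {"tier": "standard", "copies": "3", "regions": "eu-west,us-east",
        "archive_after_days": "365"}
log = {"tier": "cheap", "copies": "1"}

print(placement(scan, age_days=30))
print(placement(scan, age_days=400))
print(placement(log, age_days=400))
```

Two objects in the same system end up with different copy counts, locations, and tiers without any volume-level configuration.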

As you can see, the change in thinking is that in an object store, everything is about an object.

Contrast this with the traditional approach, which thinks about, manages, and accesses larger containers such as volumes (containing files). That is not object storage.

The features above, and their object-centric nature, fit the requirements of unstructured data well; hence the interest.

If a storage system is object-centric (or file-centric) rather than volume-centric in its thinking, it is an object storage system, irrespective of the access protocol or the scale.

ら.Afraid
Post #3 · 2019-03-08 04:38

There are some very fundamental differences between File Storage and Object Storage.

File storage presents itself as a file system hierarchy with directories, sub-directories and files. It is great and works beautifully when the number of files is not very large. It also works well when you know exactly where your files are stored.

Object storage, on the other hand, typically presents itself via a RESTful API. There is no concept of a file system. Instead, an application saves an object (file data plus additional metadata) to the object store via the PUT API, and the object storage saves the object somewhere in the system. The object storage platform gives the application a unique key for that object (analogous to a valet ticket), which the application stores in its own database. To fetch that object later, all the application needs to do is pass the key to the GET API, and the object storage returns the object.
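The "valet ticket" flow can be sketched with a hypothetical in-memory store that issues a random key on PUT. `ValetObjectStore`, its methods, and the `app_db` dict (standing in for the application's database) are illustrative assumptions, not a real SDK.

```python
import uuid
from typing import Dict, Optional, Tuple


class ValetObjectStore:
    """Hypothetical in-memory object store that issues keys on PUT."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[bytes, Dict[str, str]]] = {}

    def put(self, data: bytes, metadata: Optional[Dict[str, str]] = None) -> str:
        # The store, not the application, picks the key (the "valet ticket").
        key = uuid.uuid4().hex
        self._objects[key] = (data, dict(metadata or {}))
        return key

    def get(self, key: str) -> bytes:
        # Only the key is needed; there is no directory path to walk.
        return self._objects[key][0]

    def head(self, key: str) -> Dict[str, str]:
        """Return the object's metadata without its data."""
        return dict(self._objects[key][1])


store = ValetObjectStore()
app_db: Dict[str, str] = {}  # stands in for the application's own database
app_db["user42/avatar"] = store.put(b"fake png bytes", {"content-type": "image/png"})

photo = store.get(app_db["user42/avatar"])
```

The application never learns where the bytes live; it only keeps the ticket.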

Hope this is now clear.

【Aperson】
Post #4 · 2019-03-08 04:40

Most companies with object-based solutions have a mix of block, file, and object storage, chosen based on performance and cost requirements.

From a use case perspective:

Ultimately, object storage was created to address unstructured data, which is growing explosively, far faster than structured data.

For example, if a database is structured data, unstructured data would be a Word document or a PDF.

How do you search 1 billion PDFs in a file system (if it could even store that many in the first place)?

How quickly could you search just the metadata of 1 billion files?

Object storage is currently used more for long-term or archival "cheap and deep" storage that keeps track of more detail about what the data is. This metadata becomes very powerful when searching or mining very large data sets. Sometimes you can get what you need from the metadata without even accessing the data itself. Object storage solutions can typically replicate automatically, with geographic failover built in.

The problem is that applications would have to be rewritten to use object access methods rather than a file hierarchy (which is simpler from an app dev perspective). It's really a change in the philosophy of data storage: storing more actionable information about the data, from a management standpoint as well as for usage.

A quick example might be an MRI scan image. On a file system you have the owner and creation date, but not much else. If it were an object, all of the information surrounding the MRI could be stored with it as metadata, such as the patient name, MRI center location, requesting doctor, insurance carrier, etc.
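A small sketch of the MRI example: each object carries user-defined metadata, and a query matches on that metadata without reading the scan bytes. The field names and values are invented placeholders, and the linear scan is only for illustration; a system holding a billion objects would need an indexed metadata search.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MriObject:
    data: bytes               # the scan itself (placeholder bytes here)
    metadata: Dict[str, str]  # user-defined metadata stored with the object


def find(objects: List[MriObject], **criteria: str) -> List[MriObject]:
    """Match on metadata only; the (large) scan data is never read."""
    return [
        obj for obj in objects
        if all(obj.metadata.get(k) == v for k, v in criteria.items())
    ]


scans = [
    MriObject(b"\x00" * 1024, {"patient": "P-001", "center": "North",
                               "requesting_dr": "Dr. A", "insurer": "Carrier X"}),
    MriObject(b"\x00" * 1024, {"patient": "P-002", "center": "South",
                               "requesting_dr": "Dr. B", "insurer": "Carrier X"}),
    MriObject(b"\x00" * 1024, {"patient": "P-003", "center": "North",
                               "requesting_dr": "Dr. B", "insurer": "Carrier Y"}),
]

north = find(scans, center="North")
```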

Block and file storage are better suited to local access or OLTP, where performance matters more than retention and cost.

For example, you would not want to wait minutes for a Word doc to open, but you could wait a few minutes for a data mining/business intelligence process to complete.

Another example would be a legal search where you have to search everything from 5 years ago to the present. With retention policies in place to shrink the active data set and cost, how would you even do that without restoring from tape?

Object storage is a great solution for replacing long term archival methods like tape.

Setting up replication and failover for block and file can get very expensive in the enterprise and usually requires very expensive software and services.

Note: At the lower level, object storage access happens via a RESTful API, which is more like a web request than accessing a file at the end of a path.
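To show what "more like a web request" means, this sketch only builds an HTTP PUT with Python's standard `urllib.request.Request` and never sends it. The endpoint, bucket, and key are hypothetical; the `x-amz-meta-` header prefix follows the S3 convention for user metadata, and a real S3 call would additionally need a signed `Authorization` header, which is omitted here.

```python
import urllib.request

# Hypothetical endpoint, bucket ("scans"), and key; nothing is sent.
url = "https://objects.example.com/scans/mri-0001.dcm"
body = b"fake scan bytes"

req = urllib.request.Request(
    url,
    data=body,
    method="PUT",  # store the object; a GET on the same URL would fetch it
    headers={
        "Content-Type": "application/octet-stream",
        # S3-style user-metadata header (x-amz-meta- prefix).
        "x-amz-meta-center": "North",
    },
)
# Authentication (e.g. request signing) is deliberately left out.
print(req.get_method(), req.full_url)
```

The object is addressed by a URL rather than a mounted path, and its metadata travels as request headers.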

再贱就再见
Post #6 · 2019-03-08 04:44

The simple answer is that object storage systems and services use APIs and other object access methods for storing, retrieving, and looking up data, as opposed to traditional file or NAS access. With file or NAS, for example, you access storage using NFS (Network File System) or CIFS (e.g. a Windows file share), aka SMB (Samba being a common implementation), where the file has a name/handle with associated metadata determined by the file system.

The metadata includes information about creation, access, modification, and other dates, as well as permissions, security, application or file type, and other attributes. Files are limited by the file system in terms of their size, as is the number of files per file system. Likewise, file systems are limited in their total or aggregate size, both in space capacity and in the number of files they hold.
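For contrast, this sketch reads the metadata a file system keeps for a file using the standard `os.stat`. The temporary file and the chosen subset of fields are just for illustration; the point is that `os.stat` reports a fixed set of attributes (size, timestamps, mode, and so on), with no slot for an application-defined field such as a patient name.

```python
import os
import tempfile
import time

DATA = b"fake image bytes"

# Write a throwaway file so there is something to stat.
with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as f:
    f.write(DATA)
    path = f.name

try:
    st = os.stat(path)
    # Only attributes the file system itself tracks are available here.
    fs_metadata = {
        "size": st.st_size,
        "modified": time.ctime(st.st_mtime),
        "permissions": oct(st.st_mode & 0o777),
    }
finally:
    os.remove(path)  # clean up the temporary file

print(fs_metadata)
```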

Object access is different: while file or NAS front ends, gateways, or plugins are available for many solutions or services, primary access is via an API, where an object can be of arbitrary size (up to the maximum of the object system) along with variable-sized metadata (depending on the object system/service implementation). With most object storage systems/services you can specify anywhere from a few KBytes to GBytes of user-defined metadata. What would you use GBytes of metadata for? How about, in addition to the normal info, data for policies, management, where other copies are located, thumbnails or small previews of videos, audio, etc.?
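A sketch of variable-sized user-defined metadata, including policy hints, copy locations, and a small base64 thumbnail. Since each system sets its own cap, `metadata_limit` is a hypothetical parameter, and measuring size as serialized JSON is an assumption; real services count metadata in their own way.

```python
import base64
import json
from typing import Dict


class MetadataTooLarge(ValueError):
    """Raised when user-defined metadata exceeds the (hypothetical) cap."""


def make_object(data: bytes, user_metadata: Dict[str, str],
                metadata_limit: int) -> Dict[str, object]:
    # Measure the metadata as serialized JSON; real systems may count differently.
    size = len(json.dumps(user_metadata).encode("utf-8"))
    if size > metadata_limit:
        raise MetadataTooLarge(f"{size} bytes of metadata exceeds {metadata_limit}")
    return {"data": data, "metadata": user_metadata, "metadata_bytes": size}


thumbnail = base64.b64encode(b"\x89PNG" + b"\x00" * 64).decode("ascii")
video = make_object(
    b"fake video bytes",
    {"policy": "keep-3-copies", "other_copies": "site-a,site-b",
     "thumbnail_b64": thumbnail},
    metadata_limit=2048,
)

try:
    make_object(b"", {"preview_b64": "A" * 5000}, metadata_limit=2048)
    too_big_rejected = False
except MetadataTooLarge:
    too_big_rejected = True
```

A system with a larger cap would simply accept the bigger preview; the object model itself does not fix the metadata size.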

Some examples of object access APIs or interfaces include Amazon Web Services (AWS) Simple Storage Service (S3) and other HTTP- and REST-based ones, and SNIA CDMI. Different solutions also support iOS (e.g. iPhone/iPad) access, SOAP, Torrent, WebDAV, JSON, and XAM, among others, plus NFS/CIFS. In addition, many object storage systems or services support programmatic bindings for Python, among other languages. The APIs essentially let you open a stream and then get, put, list, or call other functions supported by the API/system, depending on how you will use it.

For example, I use both Rackspace Cloud Files and Amazon S3 (in addition to EBS and Glacier) for backing up, storing, and archiving data. I can access the stored objects via a web browser or tools including Jungle Disk (JD), which is what I back up and synchronize files with. JD handles the object management and moves data to both Rackspace and Amazon for me. If I were inclined, I could also do some programming with the APIs and directly access either of those sites, supplying my security credentials, to work with my stored objects.

Here is a link to an object and cloud storage primer from a session I did in Holland last year, which has some simple examples of objects and access: http://storageio.com/DownloadItems/Nijkerk_Nov2012/SIO_IndustryTrends_CloudObjectStorage.pdf

Using the programmatic bindings, you would define your data structures or objects in your program and then use the APIs or calls for storing, retrieving, and listing data, accessing metadata, etc. If there is a particular object storage system, software, or service that you want to work with or need to program against, go to its site and you should find its SDK or API info with examples. With objects, once you create your initial bucket or container on a service or with a product/system, you then simply create and store additional objects as you go.
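To tie this back to the original question (storing an image as an object from Python), here is a sketch of the "create a bucket once, then keep adding objects" workflow against a hypothetical in-memory `ToyObjectService`. It only mirrors the general shape of real SDKs; with boto3 and S3, for instance, the analogous call is `s3.put_object(Bucket=..., Key=..., Body=..., Metadata={...})`, so check your service's SDK docs for the actual signatures.

```python
from typing import Dict, List, Optional, Tuple


class ToyObjectService:
    """Hypothetical in-memory stand-in for a bucket-based object service."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[str, Tuple[bytes, Dict[str, str]]]] = {}

    def create_bucket(self, bucket: str) -> None:
        self._buckets.setdefault(bucket, {})

    def put_object(self, bucket: str, key: str, body: bytes,
                   metadata: Optional[Dict[str, str]] = None) -> None:
        # A "/" in the key is just part of the name; there are no real directories.
        self._buckets[bucket][key] = (body, dict(metadata or {}))

    def list_objects(self, bucket: str, prefix: str = "") -> List[str]:
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))

    def get_object(self, bucket: str, key: str) -> Tuple[bytes, Dict[str, str]]:
        body, meta = self._buckets[bucket][key]
        return body, dict(meta)


svc = ToyObjectService()
svc.create_bucket("photos")  # create the bucket once...
# ...then keep adding objects. Real code would read the image bytes with
# something like open("holiday.jpg", "rb").read() (hypothetical file).
svc.put_object("photos", "2013/holiday.jpg", b"fake jpeg bytes",
               {"camera": "hypothetical-cam", "location": "Holland"})
svc.put_object("photos", "2013/office.jpg", b"more fake bytes")
svc.put_object("photos", "2014/party.jpg", b"even more bytes")

print(svc.list_objects("photos", prefix="2013/"))
body, meta = svc.get_object("photos", "2013/holiday.jpg")
```

The image bytes and their descriptive metadata are stored and retrieved together as one object, addressed by bucket and key.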

Here is a link as an example to AWS S3 API/programming: http://docs.aws.amazon.com/AmazonS3/latest/API/IntroductionAPI.html

In theory, object storage systems are talked about as having unlimited numbers of objects or unlimited object size. In reality, most systems, solutions, software, or services are limited by what they have tested or currently support, which can be billions of objects, with object sizes of 5 GBytes or larger. Pay attention to the limits of specific services or products: what is actually tested and supported vs. what is architecturally possible, or what is implemented only on WebEx or PowerPoint.

Again, the number of objects, the size of the objects, the size of the metadata, and the amount of data that can be moved in and out via the APIs are very product/service/software dependent. However, it is generally safe to assume that object storage can be much more scalable (depending on the implementation) than file systems (without using global namespaces, federation, file virtualization, or other techniques).

Also, in my book Cloud and Virtual Data Storage Networking (CRC Press), which is Intel Recommended Reading, you will find more information about cloud and object storage.

I will be adding more related material to www.objectstorage.us soon.

Cheers gs

啃猪蹄的小仙女
Post #7 · 2019-03-08 04:52

I think the white paper explains the idea of object storage quite well. I am not aware of any standard way to use object storage devices (in the sense of a SCSI OSD) from a user application.

Object storage is in use in some large-scale storage products, such as the Panasas storage appliances. However, these appliances then export a file system to the end user. It is IMHO fair to say that the T10 OSD idea never really gained momentum.

Related ideas to the OSD standard can be found in cloud storage systems like S3 and RADOS.
