Know Your Storage: Block, File & Object
Dealing with the tremendous amount of data generated today is a big challenge for the companies that create or consume that data, and for the tech companies that have to solve the related storage problems.
“Data is growing exponentially each year, and we find that the majority of data growth is due to increased consumption and industries adopting transformational projects to expand value. Certainly, the Internet of Things (IoT) has contributed greatly to data growth, but the key challenge for software-defined storage is how to address the use cases associated with data growth,” said Michael St. Jean, principal product marketing manager, Red Hat Storage.
Every challenge is an opportunity. “The deluge of data being generated by old and new sources today is certainly presenting us with opportunities to meet our customers’ escalating needs in the areas of scale, performance, resiliency, and governance,” said Tad Brockway, General Manager for Azure Storage, Media and Edge.
Trinity of modern software-defined storage
There are three different kinds of storage solutions -- block, file, and object -- each serving a different purpose while working with the others.
Block storage is the oldest form of data storage, where data is stored in fixed-length blocks or chunks. Block storage is used in enterprise storage environments and is usually accessed over a Fibre Channel or iSCSI interface. “Block storage requires an application to map where the data is stored on the storage device,” according to SUSE’s Larry Morris, Sr. Product Manager, Software Defined Storage.
In storage area networks (SANs) and software-defined storage systems, block storage is virtualized: volumes are abstracted logical devices that reside on shared hardware infrastructure and are created and presented to the host operating system of a server, virtual server, or hypervisor via protocols such as SCSI, SATA, SAS, FCP, FCoE, or iSCSI.
“Block storage splits a single storage volume (like a virtual or cloud storage node, or a good old fashioned hard disk) into individual instances known as blocks,” said St. Jean.
Each block exists independently and can be formatted with its own data transfer protocol and operating system, giving users complete configuration autonomy. Because block storage systems aren’t burdened with the same file-finding duties as file storage systems, block storage is faster. Pairing that speed with configuration flexibility makes block storage ideal for raw server storage or rich media databases.
Block storage can be used to host operating systems, applications, databases, entire virtual machines, and containers. Traditionally, block storage can be accessed only by the individual machine, or the machines in a cluster, to which it has been presented.
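To make the idea of fixed-length blocks concrete, here is a minimal Python sketch. It is a toy model, not how any real SAN or iSCSI target works: the `BlockDevice` class, the `store` and `load` helpers, and the 512-byte block size are all assumptions made for illustration. It shows the point Morris makes above: the device only knows block numbers, so the application has to remember which blocks hold its data.

```python
import io

BLOCK_SIZE = 512  # bytes per block; an assumed value for illustration


class BlockDevice:
    """A toy block device backed by an in-memory buffer (hypothetical)."""

    def __init__(self, num_blocks, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.num_blocks = num_blocks
        # Pre-allocate the whole device as zero-filled bytes.
        self._buf = io.BytesIO(bytes(num_blocks * block_size))

    def _check(self, index):
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")

    def write_block(self, index, data):
        self._check(index)
        if len(data) > self.block_size:
            raise ValueError("data larger than one block")
        # Blocks are always fixed-length, so pad short writes with zeros.
        self._buf.seek(index * self.block_size)
        self._buf.write(data.ljust(self.block_size, b"\x00"))

    def read_block(self, index):
        self._check(index)
        self._buf.seek(index * self.block_size)
        return self._buf.read(self.block_size)


def store(device, first_block, payload):
    """Split payload across consecutive blocks; return (block indices, length)."""
    bs = device.block_size
    chunks = [payload[i:i + bs] for i in range(0, len(payload), bs)]
    indices = list(range(first_block, first_block + len(chunks)))
    for index, chunk in zip(indices, chunks):
        device.write_block(index, chunk)
    return indices, len(payload)


def load(device, indices, length):
    """Reassemble a payload from the blocks the application recorded."""
    data = b"".join(device.read_block(i) for i in indices)
    return data[:length]  # drop the zero padding in the last block
```

The device itself has no notion of names or files. The `(indices, length)` pair returned by `store` is the application's own map, and losing it means losing track of the data even though the bytes are still on the device.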
File-based storage uses a filesystem to map where the data is stored on the storage device. It’s the dominant technology used on direct- and network-attached storage systems, and it takes care of two things: organizing data and representing it to users. “With file storage, data is arranged on the server side in the exact same format as the clients see it. This allows the user to request a file by some unique identifier (like a name, location, or URL), which is communicated to the storage system using specific data transfer protocols,” said St. Jean.
The result is a type of hierarchical file structure that can be navigated from top to bottom. File storage is layered on top of block storage, allowing users to see and access data as files and folders, but restricting access to the blocks that stand up those files and folders.
“File storage is typically represented by shared filesystems like NFS and CIFS/SMB that can be accessed by many servers over an IP network. Access can be controlled at a file, directory, and export level via user and group permissions. File storage can be used to store files needed by multiple users and machines, application binaries, databases, virtual machines, and can be used by containers,” explained Brockway.
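The hierarchical view described above can be sketched with Python's standard `pathlib` module. This is only an illustration on a local temporary directory, not an NFS or SMB client; the directory and file names (`projects`, `reports`, `q1.txt`, `notes.txt`) and the helper functions are made up for the example.

```python
import tempfile
from pathlib import Path


def build_tree(root):
    """Create a small folder hierarchy under root (names are hypothetical)."""
    reports = root / "projects" / "reports"
    reports.mkdir(parents=True)
    (reports / "q1.txt").write_text("first quarter")
    (root / "projects" / "notes.txt").write_text("todo")


def list_tree(root):
    """Walk the hierarchy from the top and return every path, relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


def read_by_path(root, relative_path):
    """Request a file by its location; the filesystem finds the blocks for us."""
    return (root / relative_path).read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_tree(root)
        for entry in list_tree(root):
            print(entry)
        print(read_by_path(root, "projects/reports/q1.txt"))
```

Note that the caller never deals with block numbers here. The path is the identifier, and the filesystem keeps the mapping from names to blocks out of sight.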
Object storage is the newest form of data storage. It provides a repository for unstructured data that separates the content from the indexing and allows the concatenation of multiple files into an object. An object is a piece of data paired with any associated metadata that provides context about the bytes contained within the object (things like how old or big the data is). Those two things together, the data and the metadata, make an object.
One advantage of object storage is the unique identifier associated with each piece of data. Accessing the data involves using the unique identifier and does not require the application or user to know where the data is actually stored. Object data is accessed through APIs.
“The data stored in objects is uncompressed and unencrypted, and the objects themselves are arranged in object stores (a central repository filled with many other objects) or containers (a package that contains all of the files an application needs to run). Objects, object stores, and containers are very flat in nature — compared to the hierarchical structure of file storage systems — which allow them to be accessed very quickly at huge scale,” explained St. Jean.
Object stores can scale to many petabytes to accommodate the largest datasets and are a great choice for images, audio, video, logs, backups, and data used by analytics services.
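As a rough illustration of the object model, the sketch below keeps objects in a flat, in-memory dictionary keyed by a generated identifier. It is not the API of any real object storage service: the `ObjectStore` class, its `put`, `get`, and `head` methods, and the metadata field names are all assumptions chosen for the example.

```python
import time
import uuid


class ObjectStore:
    """A toy flat object store: one namespace, no folders (hypothetical API)."""

    def __init__(self):
        self._objects = {}  # object ID -> {"data": ..., "metadata": ...}

    def put(self, data, **metadata):
        """Store bytes plus metadata and return a unique object ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = {
            "data": data,
            # System metadata (size, creation time) plus caller-supplied fields.
            "metadata": {"size": len(data), "created": time.time(), **metadata},
        }
        return object_id

    def get(self, object_id):
        """Fetch the data by ID alone; the caller never sees where it lives."""
        return self._objects[object_id]["data"]

    def head(self, object_id):
        """Return a copy of the metadata without fetching the data."""
        return dict(self._objects[object_id]["metadata"])


if __name__ == "__main__":
    store = ObjectStore()
    oid = store.put(b"\x89PNG...", content_type="image/png")
    print(oid, store.head(oid))
```

The only handle the caller holds is the identifier returned by `put`. There is no directory to walk and no block map to maintain, which is the property that lets real object stores spread data across many machines.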
Now you know about the various types of storage and how they are used. Stay tuned to learn more about software-defined storage as we examine the topic in the future.