SanDisk, in a joint venture with Toshiba, sells almost half of the world's flash technology, but enterprise storage users can be forgiven if they don't associate SanDisk with flash storage systems. Most of what SanDisk sells goes into mobile devices. You probably have flash memory in your smartphone and don't even realize it, or at least don't think about it. Going forward, SanDisk is determined to enter the systems business too, noted Allen Samuels, chief software architect of SanDisk's emerging storage solutions, in his keynote at the Linux Foundation's Vault storage conference this week in Boston.
Several factors are drawing the company into storage, starting with the explosion of data. Everybody knows the volume of data is expanding. What many don't realize is that this growth is, in part, changing the way enterprises build data centers and deploy servers, storage, and software.
Data center planners no longer view this expanding volume as a data processing, computational problem as they did in the past. Rather, it has become a "problem of moving data through the data center," said Samuels. How quickly you can capture data, get it to the systems that need it, and protect it has become the mission of today's data center. The computational part turns out to be easy by comparison. As a result, scalability has become today's byword, and the key to scalability is parallelism, he continued.
The explosion of data also is driving data center planners to rethink the essential components of the data center itself. “You need to rethink how you build servers. Think rack scale,” he advised. And not just the usual rack; he suggested a simplified disaggregated rack characterized by shared I/O and clustered systems.
And while you are at it, also rethink the fabric that connects the various pieces. The goal, according to Samuels, should be the independent scaling of compute, memory, storage, and networking.
Flash will drive storage
For storage, you will need to shift your thinking to object stores, scale-out block storage, NoSQL/KV stores, and the Hadoop file system. Certainly don't count on conventional hard drives.
"The performance-oriented hard drive is vanishing," he noted. Those are the costly 15K RPM devices aggregated by the hundreds or thousands to bring more spindles to bear on the IOPS challenge and to offset the degradation of hard drive efficiency that comes with increasing I/O. Instead, capacity-optimized hard disk drives (HDDs), such as shingled magnetic recording (SMR) and cloud drives, will move to the fore where you still need HDDs.
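The spindle arithmetic behind that claim is easy to sketch. The figures below are rough, illustrative assumptions (not Samuels's numbers): a 15K RPM drive delivers on the order of 200 random IOPS, while a single enterprise SSD delivers tens of thousands.

```python
import math

# Illustrative, assumed figures -- not benchmarks from the keynote.
HDD_15K_IOPS = 200      # rough random-IOPS ceiling of one 15K RPM spindle
SSD_IOPS = 50_000       # order of magnitude for one enterprise SSD

def spindles_needed(target_iops: int, iops_per_drive: int = HDD_15K_IOPS) -> int:
    """How many drives must be aggregated to reach a random-IOPS target."""
    return math.ceil(target_iops / iops_per_drive)

# Matching a single SSD's random IOPS takes hundreds of 15K spindles,
# which is why those arrays were deployed in the hundreds or thousands.
print(spindles_needed(SSD_IOPS))  # → 250
```

With numbers like these, the aggregated-spindle approach buys IOPS only at a steep cost in drives, power, and rack space, which is the economics pushing 15K drives out of the market.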
Flash, however, will drive storage architectures as flash cost reductions continue and performance, reliability, and durability steadily improve. It easily overcomes the IOPS challenge. Expect flash to cannibalize primary storage and take over enterprise storage, with HDD relegated to data retention and data protection. The big challenge will revolve around balancing and orchestrating it all.
Samuels then turned his attention to new caching and tiering solutions. Heterogeneous replication, for example, leaves one copy on flash for fast computation (computation isn't being completely overlooked), while additional copies reside on low-cost HDD for protection, both local and remote. Similarly, flash erasure coding can substantially reduce storage overhead compared to HDD RAID.
In the end, organizations want scalable open source storage software, but current scale-out OSS is cost-inefficient on flash. This results in a performance gap, which translates into significant expense at data center scale. For now, proprietary and in-house solutions are filling the gap, Samuels noted. SanDisk, he added, is bringing its own newly announced Ceph-based solution, which can achieve a 7-10x improvement for block reads, according to internal testing.
The flash-versus-HDD cost comparisons are a thing of the past. Nobody is focusing on cost per gigabyte when performance and IOPS are what count. Concluded Samuels: "Flash is primary storage today." And that is unlikely to change in the foreseeable future.