The Elephant in the Room

Storage is the elephant in the room of today’s enterprise IT.

A new combination of factors is forcing us to rethink the way we store and access data. The way we’re doing it now is running out of road, and yet organisations have been slow to respond. We add another silo to our already confused storage infrastructure to meet the short-term capacity requirement and hope that, in the long term, the problem of managing it all will go away.

Bigger, not better

First, storage requirements are growing exponentially, which puts pressure on organisations to reduce the cost of storing each gigabyte. Drive manufacturers have responded to this growth by building bigger and bigger drives. But bigger drives bring their own problems: performance per gigabyte falls, and RAID rebuild times grow with every increase in capacity. The technology vendors’ response to one problem is creating another. If your organisation relies on RAID arrays to protect against drive failure, you will know that this is manageable with perhaps ten drives. Once you are dealing with larger numbers of drives, however, the mean time to failure across the whole pool shrinks in proportion to the number of drives: failures stop being rare events and become routine, and that forces a rethink.
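To make the effect of drive count and drive size concrete, here is a back-of-the-envelope sketch in Python. It assumes drives fail independently at a constant rate (the simple exponential model), and the figures it uses, a 1,000,000-hour per-drive MTTF and a 100 MB/s rebuild rate, are hypothetical round numbers rather than vendor specifications. The model also ignores correlated failures and unrecoverable read errors during rebuilds, both of which make matters worse in practice.

```python
import math


def array_mttf_hours(drive_mttf_hours: float, n_drives: int) -> float:
    # With n independent drives each failing at a constant rate 1/MTTF,
    # the first failure somewhere in the pool arrives n times as often.
    return drive_mttf_hours / n_drives


def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    # Lower bound on rebuild time: every byte of the replacement drive is rewritten.
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / rebuild_mb_per_s / 3600


def p_failure_during_rebuild(surviving_drives: int, drive_mttf_hours: float,
                             rebuild_h: float) -> float:
    # Probability that at least one surviving drive fails before the rebuild
    # completes, under the same exponential (constant-rate) model.
    rate_per_hour = surviving_drives / drive_mttf_hours
    return 1.0 - math.exp(-rate_per_hour * rebuild_h)


if __name__ == "__main__":
    DRIVE_MTTF = 1_000_000  # hypothetical per-drive MTTF, in hours
    REBUILD_RATE = 100      # hypothetical sustained rebuild rate, in MB/s

    for n in (10, 100, 1000):
        h = array_mttf_hours(DRIVE_MTTF, n)
        print(f"{n:>5} drives: one failure roughly every {h:,.0f} h ({h / 24:,.0f} days)")

    for tb in (4, 16):
        h = rebuild_hours(tb, REBUILD_RATE)
        # A 10-drive RAID set with one drive already lost leaves 9 survivors.
        p = p_failure_during_rebuild(9, DRIVE_MTTF, h)
        print(f"{tb:>2} TB drives: rebuild >= {h:.1f} h, P(second failure) ~ {p:.4%}")
```

Under these assumptions, ten drives see a failure roughly every 100,000 hours, while a thousand drives see one roughly every 1,000 hours (about six weeks). Quadrupling drive capacity also quadruples the minimum rebuild window during which a second failure can strike.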

Hiding the issue

Then there is the second requirement: access to that data. Why are we spending so much on housing this data if we’re not going to use it? Yet the average organisation is trying to manage many different silos of information. Proprietary vendors have offered various solutions to the problem of SAN and NAS devices scattered all over the organisation, and they have tried to present that data to users in a sane way. But storage virtualisation and overlay management systems merely mask the underlying problem, and they do not deliver the unified approach needed to realise the potential cost savings and better utilisation of disk space.

What’s the answer?

When we began Data News Blog, we were in the lucky position of being able to design our architecture from scratch. We could look out at the world and find the technology that would deliver the best, most performant architecture. I guess we were spoilt: it’s the kind of architecture challenge an engineer dreams of, the chance to build something from the bottom up with no proprietary or legacy constraints. We saw that the biggest and most performant storage architectures ever built grew out of university research projects in the 1990s. They have been developed internally by companies like Google, Amazon and Facebook, and they look very different from the storage architecture typically found in enterprise IT today. Of course, what drove these internet businesses to create such architectures was simply the sheer volume of data they were dealing with.

But what happens now that the enterprise faces growing data demands of its own? Can we learn from Google and Facebook?

Instead of being built from proprietary SAN hardware and RAID arrays, these new storage architectures are designed from the bottom up as a single architecture for block, file and object storage. A single view of your storage and true global namespaces are among the main attractions of large-scale object storage. Decoupling the namespace from the underlying hardware means far less management overhead and simpler capacity upgrades: multiple data centres across the world can be seamlessly integrated into a single view of your storage, and adding capacity is as simple as adding new servers to the cluster. These systems are horizontally scalable, highly performant and self-healing, and they are designed to run on commodity hardware with cheap individual components. They tackle the cost per GB while also solving the problems of backup, redundancy and high availability, and they open up the possibility of data pre-processing.

For me, and for Data News Blog, object storage was a no-brainer. We chose Ceph because, in line with our thinking about open-source software, we felt it offered the best technical solution and the most active community to which we could contribute.
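Placing data by calculation rather than through a central lookup table is what lets these clusters grow simply by adding servers. Ceph does this with its own CRUSH algorithm, which is weighted and aware of failure domains; the Python sketch below swaps in a much simpler technique, rendezvous (highest-random-weight) hashing, purely to illustrate the principle. It is not CRUSH, and the server and object names are hypothetical.

```python
import hashlib


def _h(key: str) -> int:
    # 64-bit integer derived from SHA-256; stable across runs and machines.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def place(obj: str, servers: list[str], replicas: int = 3) -> list[str]:
    """Rendezvous hashing: rank servers by a per-(object, server) hash and keep
    the top `replicas`. Any client holding the server list computes the same
    placement, so no central lookup table is needed."""
    ranked = sorted(servers, key=lambda s: _h(f"{obj}@{s}"), reverse=True)
    return ranked[:replicas]


def rendezvous_primary(obj: str, servers: list[str]) -> str:
    # The highest-ranked server holds the primary copy.
    return place(obj, servers, replicas=1)[0]


def modulo_primary(obj: str, servers: list[str]) -> str:
    # Naive placement for comparison: hash the object name, take it modulo the server count.
    return servers[_h(obj) % len(servers)]


def moved_fraction(objects, before, after, placer) -> float:
    # Fraction of objects whose primary location changes when the server list changes.
    moved = sum(1 for o in objects if placer(o, before) != placer(o, after))
    return moved / len(objects)


if __name__ == "__main__":
    objects = [f"object-{i}" for i in range(10_000)]
    before = [f"node-{i}" for i in range(10)]
    after = before + ["node-10"]  # add one server to grow capacity

    print("replicas of object-42:", place("object-42", before))
    print(f"rendezvous: {moved_fraction(objects, before, after, rendezvous_primary):.1%} moved")
    print(f"modulo:     {moved_fraction(objects, before, after, modulo_primary):.1%} moved")
```

Growing from ten servers to eleven, rendezvous hashing should relocate only about one object in eleven, and only onto the new server, whereas naive modulo placement relocates roughly ten in eleven. That difference is why calculated placement makes "just add a server" a practical way to scale.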