Technology

The Elephant in the Room

Storage is the elephant in the room in today’s enterprise IT.

A new combination of factors is forcing us to rethink the way we store and access data. The way we’re doing it now is running out of road and yet organisations have been slow to respond. We add another silo to our already confused storage infrastructure to meet the short-term capacity requirement, and in the long term hope that the problem of managing it all will go away.

Bigger, not better

First, we have exponentially increasing storage requirements. This is putting pressure on organisations to reduce the per-GB cost of storing data. Drive manufacturers have responded to this growth in data requirements by focusing on developing bigger and bigger drives. But growing drive sizes present their own issues, not only in terms of decreasing speed and performance but also in terms of RAID rebuild times. We have a situation where the technology vendors’ response to one problem is creating another. If your organisation relies on RAID arrays to protect against drive failure, you know it is manageable with maybe just ten drives, but when you begin to deal with larger numbers of drives, the mean time to failure forces a rethink.

Hiding the issue

Then we have the secondary requirement to access that data – why are we spending so much on housing this data if we’re not going to use it? Yet the average organisation will be trying to manage many different silos of information. Proprietary vendors have offered different solutions to the problem of SANs and NAS devices scattered all over the organisation. They’ve tried to present that data to users in a sane way, but storage virtualisation and overlay management systems merely mask the underlying problem, and they are not delivering the unified approach that is required to benefit from potential cost savings and better utilisation of disk space.

What’s the answer?

When we began Data News Blog, we were in the lucky position of being able to design our architecture from scratch. We were able to look out at the world and find the technology that would deliver the best, most performant architecture. I guess we were spoilt; it’s the challenge an engineer dreams of: the possibility of building something from the bottom up with no proprietary or legacy constraints. We saw that the biggest and most performant storage architectures ever built have grown out of university research projects in the 1990s; they’ve been developed internally by companies like Google, Amazon and Facebook, and they look very different to the storage architecture typically found in enterprise IT today. Of course, the driver for the new internet businesses to create these storage architectures was simply the sheer volume of data they were dealing with.

But what happens when the enterprise now faces growing data demands? Can we learn from Google and Facebook?

Instead of being built with proprietary SAN hardware using RAID arrays, these new storage architectures are built from the bottom up as a single architecture for block, file and object storage. A single view of your storage and true global namespaces are some of the attractions of large-scale object storage architectures. Decoupling the namespace from the underlying hardware makes for far less management overhead and simpler capacity upgrades. Multiple Data News Blog sites across the world are seamlessly integrated into a single view of your storage, and adding capacity is as simple as adding new servers into the cluster. These systems are horizontally scalable, highly performant, self-healing and designed to be built with commodity hardware and cheap individual components. They deal with the issue of cost per GB whilst also solving the problems of backup, redundancy and high availability, and offering the possibility of data pre-processing. For me, for Data News Blog, object storage was a no-brainer. We chose Ceph because, in line with our thinking about open-source software, we felt it offered the best technical solution and the most active community to which we could contribute.

Technology

Review of Cloud Storage Pricing on GCloud VI

Now that GCloud VI has been published I have been reviewing our pricing and positioning against other suppliers. I thought it would be helpful to share this, given the variety of providers now available through the Digital Marketplace. Transparency is one of the great benefits of the Digital Marketplace, particularly for customers though also for suppliers. It saves the time and money spent on market research agencies doing mystery shopping. It also removes the time, effort and cost of negotiating individual contract terms and pricing on every solution, which creates an implicit lock-in with suppliers already on the vendor list. Sure, there is a place for that on large, complex IT solutions, but on the Digital Marketplace, if you like the service, price and SLA you can just buy.

In this blog I focus in particular on Cloud Storage and will review Cloud Compute at a later time.

The first comment to make is that when you search for Cloud Storage under IaaS there are 371 entries. That is a benefit in terms of choice, but also a potential source of confusion for customers. Of course many of these are not Cloud Storage services as such, but may be Cloud Compute services with a storage option built in, or backup services, and of course there are still the GCloud 5 duplicate entries. When you look a little further, quite a number of these are resellers who are selling third-party services, typically the large, well-known US public clouds. Skyscape, which has been a successful provider on GCloud to date, also has a lot of resellers listed. There is nothing wrong with that, and it can be useful for customers to bundle services with a single supplier from a service management perspective, even if some elements of the infrastructure are not owned by the supplier. However, for the purposes of this analysis I have excluded resellers. That takes the number of providers down to quite a manageable level.

I have included both hourly and monthly billing in the analysis, but I have ignored service providers who require annual contracts to access the service, as this may indicate that it is not a true cloud service with instant availability and burst capability. I have also weeded out those with service on-boarding fees, which tend to indicate some sort of dedicated deployment rather than a true Cloud service. I have excluded ‘Private Cloud’ solutions for the same reason – they have a place, but if they are not multi-tenant they are not really Cloud in my view, and they are more costly for that reason anyway. Our services are billed hourly; not all storage services are, and many are billed monthly.

Where there are discounts for volume or term commitment I have ignored those and used the unit pricing for usage contracts, to assist with comparability. It is certainly worth checking these out carefully with suppliers of interest to you, as the headline rates you see on the portal from some suppliers reflect pricing based on maximum commitment (volume or term) and the lowest service level. Our headline rate is based on usage, billed hourly.

Where suppliers have multiple service options I have included their lowest-tier option with their lowest SLA for the purposes of comparison. Tiered storage solutions are usually based on enterprise SAN solutions where you pay a high premium for performance; there may be a low-cost option, but with lower performance. Our services are based on commodity hardware, but as we have high-performance network connectivity to all nodes the performance of our storage is very good. We also offer a 99.99% availability SLA, based on the inherent three-way replication of data within the platform, which results in high levels of tolerance to node failure. Some of the providers offer lower SLAs.

I have included data transfer rates for public Internet where given. Most clouds recover the cost of Internet connections through transfer fees out of the platform, with inbound traffic typically uncharged. We take this approach. I have used the pricing for the lowest volume of transfers; many providers, including ourselves, offer volume discounts related to throughput. Some providers offer only dedicated Internet port options, which can take some time to configure for customers, have a fixed associated cost and can lack scalability for large transfers. That is a declining model.

I have not analysed private connectivity options. Our Cloud Service can be accessed both through the public Internet and through private networks such as PSN and N3. Where private networks are connected we don’t levy a throughput fee (though there may be a fixed cross-connect charge). Increasingly, encrypted Internet VPNs and encrypted storage will dominate over dedicated networks, due to cost and convenience, whilst providing adequate levels of security for most Official-class data.

Finally where providers offer separate block and object storage solutions I have used the block storage option. In our case we provide object and block storage from the same platform at the same rate, though many providers have differential pricing and these days object storage is of greatest interest for large data storage where cost will be important.

So here is the result.

The first thing to note is that Data News Blog’s pricing sits proudly amongst the pack of global cloud providers on cost of storage, and is the lowest cost of those that provide their services from within the UK. In fact our pricing is comparable with the market leader, Amazon. Google is a little cheaper (only available through resellers, but included for completeness). Somewhat surprisingly, to me at any rate, Microsoft is currently cheaper on storage than Amazon, but of course its service is delivered from Ireland and the Netherlands, adding latency, and supported by a US organisation.

So the message is clear: if you want UK-hosted Cloud Storage, supported by UK security-screened staff, and cost matters to you, there is only one choice. If cost doesn’t matter you can select a more expensive alternative!

Technology

It’s time to change the way we think about Storage: Five myths we should put out to pasture

The biggest and most performant storage architectures, which grew out of university research projects in the 1990s, were developed by companies like Google, Amazon and Facebook. The business and operational drivers which initially led these hyper-scale web operators to build such architectures were, of course, very different to those of a typical enterprise. The sheer volume of data those high-growth internet businesses were dealing with at the time forced them to look outside the box in a way that smaller enterprises were not required to do.

Given the rapidly increasing data volumes all businesses are experiencing today, is it now time to rethink how we manage data across the rest of the enterprise too?

Matt Jarvis, Head of Cloud Computing at Data News Blog, sheds light on five widely believed data storage myths.

Myth 1: We need to back up key data

New reality: Stop thinking about backups. Design systems that are so redundant we don’t need backups.

Above a certain size, it’s physically impossible to back up all that data within a typical backup window, so the concept of daily backups becomes impossible to achieve. Object storage architectures provide for distributed redundancy which, combined with clusters across geographic locations, handles the requirements for failure protection. Since object storage has no concept of file modification, only write and replace, the natural paradigm becomes versioning as opposed to keeping backup copies, and snapshotting into the cluster provides for rollback.
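To make that concrete, in a Ceph cluster snapshots and rollbacks are cheap, first-class operations rather than separate backup jobs. A minimal sketch, assuming an RBD image in a hypothetical pool (the names are illustrative, not from any real deployment):

# Take a point-in-time snapshot of an image held in the cluster...
rbd snap create mypool/myimage@before-upgrade
# ...and roll back to that point later if needed
rbd snap rollback mypool/myimage@before-upgrade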

Myth 2: Disk failure is a problem we need to deal with.

New Reality: Expect failure – it’s inevitable and it doesn’t matter.

Typical enterprise architecture relies on RAID arrays to protect against drive failure, and as drives get larger, RAID rebuild times get longer. When dealing with a large number of drives, this becomes unworkable as the average time to rebuild means the system will be in an error or performance-limited state for too long. The bigger your infrastructure gets, the more you must expect failure. Self-healing systems face this reality head on. We can learn from Netflix here: its use of the Chaos Monkey application to test its infrastructure offers some insight into how we need to start rethinking storage. It randomly switches things off and breaks things to ensure that failure does not affect the user experience of the Netflix service. In well-designed distributed object stores, all of the components are redundant and, because the data is copied multiple times, a huge amount of the cluster can fail without any interruption to service or data loss.

Myth 3: True global namespaces are impossible to achieve

New Reality: Decoupling the namespace from the underlying hardware makes for far less management overhead and simpler capacity upgrades

Almost every enterprise is fighting with the storage silo problem. Proprietary vendors have offered many different solutions to this, including storage virtualisation or overlay management systems, but this is really just hiding the issue. With large-scale object storage architectures, your data is presented as a global namespace wherever you view it. Multiple Data News Blog sites across the world are seamlessly integrated into a single view of your storage, and adding capacity is as simple as adding new servers into the cluster.

Myth 4: Users really need files.

New Reality: Files are largely irrelevant for most people.

Very little we do now as consumers involves using files, as we tend to interact through applications. We view photos on Facebook or Flickr, share workflows on Trello or Basecamp and watch videos on YouTube or Vimeo. Behind most of those applications is an object store, showing how, for consumers at least, the file metaphor is outdated. Increasingly, enterprise users will adopt applications to interact with data. In the past we may have used spreadsheets to generate business information from files of raw data. Increasingly, enterprises are using analytics applications to re-present their data in new and more visual ways, and a new breed of applications is emerging which automatically takes spreadsheets and raw data and turns them into graphs and charts. The file paradigm is increasingly an unnecessary intermediate step and is merely a legacy of how computer technology has evolved. As human beings we are far more naturally engaged with visual than numerical information – users in the future will interact with pre-processed information through analytic systems or other applications.

Myth 5: Storage is a bit like a bucket – a repository where I keep everything.

New Reality: Storage is more like a pipeline. A pipeline of transformation.

It is becoming increasingly apparent that there is a lot of unused processing power in a storage system and there is the potential to convert or process raw data while it sits in the storage system. Object stores are ideally placed for doing this because they are built out of general purpose servers. This blurs the lines between storage and computation and presents a different paradigm: instead of thinking of our storage system as a bucket, we can think of it as a pipeline that data moves through in order for people to use it. The open source object storage system, Ceph, is architected with this in mind and offers many hooks in the code for users to write custom plugins to execute on write or read. This kind of pre-processing is already happening in many industries, for example adding watermarks to images on ingest in the media industry, or first stage analysis of survey data in the oil and gas industries.

It’s time to change the way we think about storage

The growth in data requirements is now outpacing the growth in drive capacity. The bigger drives get, the slower they become, and emerging technologies like Shingled Magnetic Recording, designed to keep drive sizes increasing, will continue that trend. Unless there is a technological breakthrough we are going to have to rethink the way we architect existing hardware.

Object storage is the most obvious and common sense answer: highly performant, self-healing, highly scalable and cheap. Why aren’t more enterprises cottoning on to that fact?

Technology

G-Cloud reflections on Think G-Cloud for Vendors event

I attended an excellent vendor-focused event yesterday to discuss progress on G-Cloud, what has been achieved to date and how vendors can successfully participate in the market.

I have been involved with G-Cloud since G-Cloud II and have been involved in submissions at two previous companies as well as Data News Blog.

I’m very positive personally about the initiative, and the progress which has been made has been extraordinary in terms of changing the mind-set and approach within government to procuring IT services, and in particular how to tap into the capabilities and services available from small and midsized companies. To some extent this also mirrors changes in the wider IT landscape, where large, inflexible IT outsourcing contracts have fallen out of favour in a more dynamic, multi-sourced Cloud world.

That being said, I have yet to transact any business through G-Cloud at either of my previous companies or yet at Data News Blog (to be fair to us here we have been operating only for a relatively short period).

This highlights the first issue which was discussed yesterday – that spend has been concentrated amongst relatively few successful suppliers, particularly in certain categories of service where more than three quarters of vendors have yet to transact any business. The likes of Skyscape, Huddle, Kahootz and Memset have been particularly engaged in G-Cloud and, unsurprisingly, have been successful in winning business through it. I applaud them for their endeavours, thank them for their hard work in blazing a trail for small and midsized business into Government and helping to create a marketplace for others to participate in, and congratulate them on their success.

It does tend to become a little self-reinforcing, where category killers can dominate supply of a particular type of offering because buyers emulate what others have done. Kate Craig-Wood, CEO of Memset, herself highlighted the cosy-clubbery still inherent in Government procurement. This is understandable, and suppliers who offer good services and earn a good reputation should and will be successful in any market (a phenomenon not exclusive to Government). G-Cloud, however, needs a wider base of successful suppliers to be the vibrant marketplace that everyone would like it to be.

One of the changes which may assist in this is the change of security approach within Government. Personally, I’m a big fan of this change. Government has in general made a problem for itself by applying over-rigorous security regimes to data which really doesn’t need them, and has therefore imposed significantly higher costs and reduced supply options. I remember in the early days of G-Cloud recommending that Government make more use of industry standards such as ISO27001, which are used in the commercial sphere, and limit more rigorous security controls to data which merits them. I am therefore very pleased that this has essentially been done with the new Official level, and I also think that the open declaration approach (with optional audit), which draws upon work by the likes of the Cloud Security Alliance, is a major move forward. This actually allows a new vendor like Data News Blog to participate in the G-Cloud market much earlier than we otherwise would have done.

On the other hand, some attendees yesterday expressed the view that removing the old IL approach, with the potential for suppliers to work through Pan-Government Accreditation, increases uncertainty for buyers. They were concerned that this could create a drag on adoption, and that early movers who have spent time (up to a year in many cases) and money (tens of thousands of pounds on CLAS consultants) do have a short-term locked-in market advantage, as new suppliers can’t follow the same path.

Of course the change in security approach does open up the marketplace to wider competition from a broad base of international companies. For example, if your use case is public-facing data then, as Tony Richards, Head of Security at GDS, put it yesterday, location becomes less of an issue (apart perhaps from performance and latency), if it is one at all. This may conflict a little with the desire to encourage the UK SME base, but on the other hand, in these times of austerity, Government needs to procure cost-effective services, and SMEs, to have a future, need to ensure that their services are competitive in the wider market outside Government. One of my former CEOs once said that “Customers are like elephants. You hang around them for long enough and you begin to smell like them.” It is no good replacing a cosy club of System Integrators providing custom, expensive Government services with another cosy club of Cloud providers providing custom, expensive Cloud services. We at Data News Blog firmly believe that we need to create services which are price- and feature-competitive with those available in the commercial marketplace to have a long-term future. At the same time we need to make these readily available and accessible to customers in Central Government, Local Government, Agencies and Education.

The real meaning of PaaS (People as a Service). The third issue highlighted yesterday was that the majority of spend to date (more than three quarters) has been in the specialist cloud services category, with the G-Cloud framework being used as a convenient way of buying consulting. A charitable view says that this is natural: Government clients, shorn of internal expertise through previous outsourcing, are reaching out to knowledgeable companies in the private sector to help them embark on their Cloud journey, and much of this activity should result in a pull-through of services in the other categories (IaaS, PaaS, SaaS) with a lag. A less charitable view is that it has been abused a little as convenient body shopping in an era when internal staff numbers are being reduced and hiring contractors has been under close scrutiny. There is probably a little truth in both, particularly as revenue reported is billing and not contracted value, and cloud services by their very nature tend to grow in adoption once the first project is implemented successfully. The G-Cloud team have wisely put some controls in place to try to ensure that consulting purchased through the framework is actually related to Cloud delivery.

I finish with a comment which Chris Chant, who was instrumental in getting G-Cloud off the ground, made yesterday. He indicated that he did enjoy reading comments on Twitter and blogs about how G-Cloud wasn’t working right and needs this or that improving. He regarded this as very positive. He did remind people, however, that this complaining can often be made without insight into how difficult things were before G-Cloud. It is very difficult to imagine the Government IT space now without G-Cloud, and some of this achievement is probably under-appreciated.

The UK is very advanced in technology. It is the most competitive large IT economy in Europe and provides a melting pot and marketplace for suppliers from across the globe to participate in, be it from the US, India, mainland Europe or elsewhere. At Data News Blog we are very conscious that we therefore need to be constantly on our toes to make sure our services remain competitive on cost and features in this environment.

Technology

The Cloud Horizon

In an opinion article for Information Age this week I talked about the paradigm shift that the industry is currently going through.

Whilst many commentators believe that the battles being fought right now will define the entirety of that future, there are also those in computer science who think that we’ve only just started to break ground on the new frontiers in computing. If the latter holds true then, as at so many points in the history of computing, the giants of today may well become the footnotes of the future as more and more disruptive technologies emerge in the coming years. Putting aside for a moment that more radical viewpoint, there are a number of clearly emerging trends in the on-demand space which we think will be substantially influential in the near to mid term.

A race to the bottom

The cloud industry in 2014 is dominated by Amazon and Google, with Microsoft and IBM investing heavily to try to keep up. At this end of the market the big players are engaged in a race to the bottom over pricing, and that is a race that very few can run to the end. Most of the smaller players trying to compete in this space will end up being swallowed, and only those with the deepest pockets will be left standing at the end. Amazon and Google have vast revenue streams which they can leverage, whilst Microsoft are betting all on this last throw of the dice as they see their historical core revenue streams being slowly eroded. For IBM too, this has a smell of desperation as their hardware business implodes and the high-end consultancy market they rely on becomes increasingly fragmented. Most of the ‘non-native’ cloud offerings – the dominant model in the telco and ISP industry, essentially hosted virtualisation platforms built around tier 1 vendor hardware and VMware – are also ultimately doomed, as the economics of massive scalability are inherently opposed to the financial requirements of proprietary licensing.

Although many commentators see this competition and the inevitable consolidation of the market as the endgame for the cloud industry, I don’t believe that being the biggest is necessarily going to be all there is to the still emerging world of on-demand computing – this is just the beginning of the story for the cloud revolution.

It’s not all about price

For many organisations, the question of price is far from the most important issue around their transition to on-demand computing. As we engage with customers at Data News Blog, the conversations we’re having time and time again are about emerging operational and organisational problems. If you know you need 1,000 VMs to solve all your problems, then you’re already well served by Amazon or Google, but this is rarely the actual solution to any real-world problem.

The kinds of problems we hear from customers are about crunching complex data sets, storing and accessing huge amounts of data which is growing exponentially year on year, and about the uncertain nature and size of workloads into the future.

Different types of cloud required

Based on this, we see an emerging market for a new type of provider in the cloud computing space, companies who are much more deeply engaged with their customers in particular vertical markets and who share an understanding of the business problems they are facing.

Clouds designed for the broadcast media sector will have very different characteristics to those designed for academic research, and both will differ from the requirements of local and national government. These more specialised clouds will offer targeted software configurations, designed for the particular vertical market, and may also have very specific hardware characteristics. This is already starting with the deployment of GPU-based hardware, and the trend will continue into ARM-based platforms and more specialised hardware like FPGAs.

The specialisation will also extend to the network layer, with different requirements for inter-connectivity and routing, and for latency and throughput. As data volumes continue to grow exponentially, being close to the storage will be a key requirement. In the case of the as yet unclear demands of the internet of things, having computation available close to the creators, to the aggregation points or close to the repository that all the data is aggregated into will also matter. This naturally leads to a requirement for highly localised and regional cloud providers, which will also be driven by general concerns about data security. In many situations, there are strong reasons why using multi-national cloud providers is simply not an option for security or regulatory reasons, and this trend is set to continue with the ongoing emergence of information on surveillance programs and the disintegration of the Safe Harbour agreement.

More and more applications will become naturalised to the distributed environment, becoming massively parallelised through use of eventual consistency and the ability to work around partial failure states. This will lead to cloud brokerage emerging as the standard abstraction layer, with workloads automatically and dynamically allocated across many different physical cloud platforms depending on customer definable characteristics. Cost will undoubtedly be one of these, but performance will also be key and is very dependent on the type of workload. Federation like this depends on interoperability, and open standards like those around Openstack will be the key to participating in these emerging markets.

A new type of relationship

When we first started thinking about the ideas behind Data News Blog, the intention was never to compete directly with the mass-market players in the retail cloud space. Instead, we’ve always believed that there are emerging opportunities for a new kind of service provider and new kinds of collaborative relationships with customers. These kinds of relationships cross the traditional boundaries of service provision and consulting, and are based on mutual trust and an ambition to push the boundaries of both the traditional customer/supplier relationship and the technology itself in order to deliver solutions to complex problems.

To operate in these new spaces will require a very particular set of tools, people and approaches, and we think that the envelope of colocation provides the basic building blocks to do this. Colocation has always been about sharing – at its lowest level the sharing of space and power – but carrier-neutral colocation data centres are also by their nature highly connected hot spots in the fabric of the internet, and so this is the ideal jumping-off point for the direction we’re taking with Data News Blog. Whilst we definitely don’t know all the answers to the wide variety of technology-related problem spaces our customers talk to us about, our approach has been to work on assembling the tools we think are going to be needed to tackle these emerging challenges. The combination of multi-site, geographically specific colocation spaces, our own high-bandwidth, low-latency Metropolitan Area Networks connected into the full range of telco networks, a massively scalable on-demand storage and compute platform and a team of highly skilled engineers seems as good a place to start as any.

Technology

We love the misery of stacking servers

There has been a lot of discussion about cloud adoption and how it is beginning to make an impact on our lives. To date, many of us experience this more in our personal lives than in our working environments.

For my daughter experimenting with her first tablet, cloud services are natural to her. She has never experienced anything else and frankly older alternatives would seem quite clunky and old fashioned by comparison. She will never buy a CD when downloads are so easy, and now that we have access to a vast catalogue of movies on demand why would she ever buy a DVD?

I have noted also my wife playing music in the kitchen through her iPad because our expensive stereo is in the wrong place i.e. tolerating a lower quality for convenience. I’ve solved the problem by buying a Sonos (but don’t have the heart yet to chuck out the old stereo!)

I, of course, being of a certain age, still like to buy a CD. There are some practical reasons: it’s still easier to use in my car (technology lock-in) and I can digitise it anyway. But also, psychologically, I do like to have a physical package to hold and treasure. A download never seems the same to me. I guess some IT folks feel the same way about their servers! Even now, though, I’m beginning to wonder where I’m going to put all these CDs and DVDs I like. Many Enterprises are thinking the same about the huge investment of time and energy in maintaining IT space, just to put servers in.

The fundamental message is that if technology is made easy, people will use it – and this stuff is so easy that our kids are using it. If you’re looking for a fundamental driver for the adoption of cloud, it is that people are LAZY. If something is manifestly easier to use than an alternative, people will use it. The ease of use will overcome most objections. Witness, for example, the widespread use in companies of online services such as Dropbox, outside the control of IT departments and in many cases in contravention of security guidance. I met a client recently who was actually in charge of driving a change of culture within his organisation to improve data security, and he himself admitted that he used Dropbox against his own guidelines. Why? Laziness: it is so much easier than the alternatives available to him, and under time pressure people will always go with the easy choice.

So why then is the adoption of Cloud within the Enterprise and Government still more talked about than delivered?

Alan Mather talks on his blog about the partial corruption of G Cloud as a vehicle for selling consulting more so than real cloud (so that’s what PAAS means, people as a service!).

This is a bit of a shame. There are a lot of willing providers out there and a lot of goodwill and intent in Government, but the numbers aren’t there yet.

And while many Enterprises have now stated their intention to go on a Cloud journey, the progress often lags the rhetoric by quite a margin. Cloud is still the pimple on the Elephant of Enterprise IT. Sure, it is beginning to have an impact, yet the vast majority of data and compute capacity is still running in Enterprise data centres (or, increasingly, co-located).

So why is this?

It isn’t easy enough – yet. People are still lazy, but transition is still too complex for many, the applications still don’t port very well and there is a significant inertia lag.

But there is more to it.

People find it difficult to change and are tied in for a whole host of psychological and sociological reasons: reluctance to change habits, fear of the impact on roles, historic outsourcing agreements and genuine security concerns.

Cloud market strategists tend to talk a lot about the revolution that will be caused when the so-called Digital Natives take control of decision making. These are technical people under 30 who have grown up in a Cloud world and don’t expect anything different. I remember taking umbrage at this as an over-40 myself, both on my own behalf and on behalf of my clients, many of whom are of a similar generation and who I usually find to be thoughtful, capable and keen to embrace change. And yet we are all prisoners of our habits and history to some extent, like the embarrassing dads at the kids’ party.

In a great episode of Father Ted which is one of my all time favourites (yes I have the box set!), Mrs. Doyle contemplates technological substitution when Father Ted buys her a tea making machine. Of course this strikes to the heart of Mrs. Doyle’s self-perception and in the end she destroys the machine declaring “but I like the misery of making tea!”

Then you have a whole host of OEMs doing their best to support the status quo. As Mrs. Doyle would say, ‘Ah go on, will you have another server in your hand?’ There you go making it easy to buy more kit and correspondingly difficult to break the habit.

We, of course, as Cloud service providers have a role to play in making our cloud services easier to use and in helping our customers overcome the inertia, both real and psychological.

At Data News Blog we are doing this by developing our Cloud services to:

  • Enable clients to manage their infrastructure through an easy to use, intuitive portal, giving them the control and visibility they need to run their applications effectively.
  • Integrate the service with the network to allow controlled, secure access to the Cloud by whatever means is appropriate to the client situation and data.
  • Enable clients to collocate and manage their existing non-cloud infrastructure adjacent to our Cloud services.

And of course at Data News Blog, like Mrs Doyle we still enjoy the misery of stacking servers and changing tapes for our colocation customers.

Technology

Better Balanced Internet Infrastructure is Key to Regional Growth

In his recent “Powerhouse of the North” speech, Chancellor George Osborne stressed the need for a “Northern hub”, achieved in part through increased connectivity between great business cities such as Manchester, Liverpool, Newcastle and Leeds. His answer: a high speed rail connection, the so-called HS3.

The Chancellor’s vision is commendable and indeed necessary. London’s share of economic output in Britain has reached a record high of 22%, so few would argue against the need for a more even spread of wealth, jobs and infrastructure. However, the proposed HS3 solution does not go far enough to address this problem.

While transport infrastructure is of course essential to the development of the region, in today’s internet economy spurring stronger business growth in the North also requires its great cities to enjoy far better digital connections.

A recent report by the Federation of Small Businesses (FSB) suggests that UK broadband is not “fit for purpose” due to below-average internet speeds. With over 45,000 UK businesses still using dial-up and some 75% of all internet data routed through the London Internet Exchange (LINX), the current UK infrastructure is inefficient, and latency and resilience complications are widespread. The costs of this disruption are all too often passed on to businesses.

With global IP traffic set to grow at a CAGR of 23% to 2017 and expected to triple by 2018, the geographical spread of the internet is a vital step in rebalancing the UK economy. The emergence of tech hubs in Manchester and Birmingham means that the need for efficient, affordable internet services is even greater.

These issues have resulted in northern internet speeds trailing far behind those of London, where the current broadband speed is around 20.5Mbps. This is in stark contrast to cities further north: Manchester has targeted a city-wide minimum speed of merely 2.0Mbps – by 2020. To put this into context, South Korea is seeking standardised speeds of 1Gbps by 2017. The substandard internet speed outside London presents massive material costs for regional network operators, telecoms providers and internet users, both domestic and commercial. It is estimated to be cheaper to send data across the Atlantic than to transport it between Manchester and London.

Significantly, in addressing this issue, it is possible to resolve the much wider problem of an unbalanced internet infrastructure across the UK. Slow or expensive internet services outside of London are linked to the difficulty faced by providers in having to transport data via the capital – the historic data hub and previously the only internationally connected destination.

Now that international connections are available nationwide, a similar infrastructure should be replicated outside of London. A greater number of regional internet service providers (ISPs) would help to combat current latency and resilience issues, while reducing costs in the long term.

A corollary of this would also be increased security. Currently, data centralisation in London makes the UK vulnerable to cyber terrorism and hacking, such as the 1.2 billion usernames and passwords stolen by Russian hackers in August. It is encouraging that the London Internet Exchange (LINX) has recognised the additional expenses and inefficiencies borne by regional businesses and has established IXPs in Manchester and Edinburgh to allow more efficient access to content and to cut costs for operators and users.

A greater number of regional IXPs and ISPs will not only reduce data transport costs for northern businesses but will also make these areas far more attractive investment opportunities for prospective network operators and telecoms firms.

HS3 alone can only go so far to make the “Northern hub” a reality. With the Chancellor’s transport links possibly decades away, there is a need to invest now in highly connected hubs of digital activity using shared resources and clusters of digital businesses around them.

Investment in the internet economy will underpin more efficient road and rail investment, help accelerate the North’s economic recovery and allow the region to make a greater contribution to the UK’s future economic growth.

Programming, Technology

Unravelling Logs – Part 2

In my first post in this series, I talked about the ELK stack of Elasticsearch, Logstash and Kibana and how they provide the first steps into automated logfile analysis. In this post, I’m going to deal with the first step in this process, how we get logs into this system in the first place. I’m not going to go into the process of installing logstash and elasticsearch, that’s pretty well covered elsewhere on the internet, so this post assumes you’ve already done that.

Logstash can accept logs in a number of different ways, one of which is plain old syslog format. The first thing to integrate, and probably the easiest, is syslog, which is where the majority of logging happens anyway. We started off doing this by configuring rsyslog in exactly the way we would to centralise logging to another rsyslog server.

In /etc/rsyslog.conf (or in a separate config file in /etc/rsyslog.d/, depending on your distribution of choice) you’d use one of the following:

# Provides UDP forwarding. The IP is the server's IP address
*.* @192.168.1.1:514

# Provides TCP forwarding.
*.* @@192.168.1.1:514
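
Having added one of those lines, rsyslog needs to be restarted to pick up the new forwarding rule. The exact command depends on your init system; on a Debian or Ubuntu box of this vintage it would be something like:

# Apply the new forwarding configuration
sudo service rsyslog restart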

On the logstash server, we then need to define an input to handle syslog:

input {
  syslog {
    port => 514
    type => "native_syslog"
  }
}

The type entry is arbitrary; it just adds a tag to any incoming entries on this input. But because the input itself is defined as a syslog input, logstash will use its built-in filters for syslog in order to structure the data when it’s pushed into ElasticSearch.

The traditional method of doing this would be to send logs over UDP, which has less of a CPU overhead than TCP. But as the saying goes – I’d tell you a joke about UDP but you might not get it… Once we start to treat our log data as something we need to see in real time, as opposed to a historical record that may or may not be useful, then not receiving it becomes critical. We found that rsyslog would occasionally stop sending logs over the network when using UDP, and we had no way of automatically detecting whether it was broken or not.

Switching to TCP guarantees delivery at the network layer, but had its own set of problems – we hit another situation where, if rsyslog had a problem sending logs, it could also hang without logging locally. From an auditability and compliance perspective it’s critical that we continue to log locally, so architecturally we decided we needed to split log generation from log shipping so that the two things can’t possibly interfere with each other and we’re guaranteed local logging under all circumstances.

You can actually install logstash directly on your clients and use that as your log shipper, but a more lightweight alternative is to use logstash-forwarder. This is a shipping agent, written in Go, which pushes logs from your clients into a central logstash server. It’s designed for minimal resource usage, is secured using certs, and allows you to add arbitrary fields to log entries as it ships them. Since it uses TCP, we also wrote a Nagios NRPE check which uses netstat to confirm that logstash-forwarder is connected to our logstash server, and we get alerted if there are any problems with the connectivity.
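The check itself is nothing clever. A simplified sketch of the sort of script NRPE might run (the real check, and the port number on your setup, may differ) looks like this:

#!/bin/bash
# Alert if logstash-forwarder has no established TCP connection
# to the logstash server on the lumberjack port (55515 here)
if netstat -tn | grep ':55515' | grep -q ESTABLISHED; then
  echo "OK: logstash-forwarder is connected to logstash"
  exit 0
else
  echo "CRITICAL: logstash-forwarder is not connected to logstash"
  exit 2
fi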

The logstash-forwarder configuration file is in JSON and is fairly simple, although you do need to understand a bit about SSL certs in order to configure it. We just set up the server details and certs, and define which logs to ship:

{
  "network": {
    "servers": [ "logstash.yourdomain.com:55515" ],
    "ssl certificate": "/var/lib/puppet/ssl/certs/yourhost.yourdomain.com.pem",
    "ssl key": "/var/lib/puppet/ssl/private_keys/yourhost.yourdomain.com.pem",
    "ssl ca": "/var/lib/puppet/ssl/certs/ca.pem",
    "timeout": 15
  },

  "files": [
  {
    "paths": [ "/var/log/syslog" ],
    "fields": {"shipper":"logstash-forwarder","type":"syslog"}
  }
  ]
}

In the files section, you can see we’ve added two arbitrary fields to each entry that gets shipped: one which defines the shipper used, and one which tags the entries with a type of syslog. These make it easy for us to identify the type of the log for further processing later in the chain.

On the logstash server side, our configuration is slightly different: the input is defined as a lumberjack input, which is the name of the protocol used for transport, and we define the certs:

input {
  lumberjack {
    port => 55515
    ssl_certificate => "/var/lib/puppet/ssl/certs/logstash.yourdomain.com.pem"
    ssl_key => "/var/lib/puppet/ssl/private_keys/logstash.yourdomain.com.pem"
    type => "lumberjack"
  }
}

As we’re not using the syslog input, we also need to tell logstash how to split up the log data for this data type. We do that by using logstash’s built-in filters, in this case a grok and a date filter, in combination with the tags we added when we shipped the log entries. One important thing to note here is that logstash processes the config file in order, so you need to have your filter section after your input section for incoming data to flow through the filter.

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "<%{POSINT:syslog_pri}>%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
      add_field => [ "program", "%{syslog_program}" ]
      add_field => [ "timestamp", "%{syslog_timestamp}" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

What this filter does is look for messages tagged with the tag we added when we shipped the entry with logstash-forwarder and, if it finds that tag, pass the message on through the filter chain. The grok filter basically defines the layout of the message and the fields which logstash should assign the sections to. We’re also adding a few extra fields which we use internally at Data News Blog, and finally we pass the message through a date filter, which tells logstash what format the timestamp is in so it can set the internal timestamp correctly. The final step is very important so that you have consistent timestamps across all your different log formats.
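A quick way to sanity-check the pattern, once the forwarder and logstash are both running, is to write a test message into the local syslog on a client and confirm it turns up in Kibana with the fields above split out – purely an illustration:

# Generate a test syslog entry; logstash-forwarder ships it from /var/log/syslog
logger -t testapp "hello from logstash-forwarder"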

Once we’ve configured both sides, we just start the logstash-forwarder service; it will watch the files we’ve defined and send any new entries over the network to our logstash server, which now knows how to process them. In the next post in this series I’ll talk about some of the more advanced filtering and munging we can do in logstash, and also introduce Riemann, a fantastic stream processing engine which can be combined with logstash to do much more complex real-time analysis.

Technology

Unravelling Logs – Part 1

Traditionally in enterprise IT environments, logs have tended to be used mainly for auditability purposes and to enable post-incident root cause analysis. Log collection is centralised if you’re lucky, and that in itself is regarded as a fairly major technical achievement. This works OK when systems are mainly stand-alone and problems can be diagnosed by looking at an individual machine. Even in those situations though, logs tend to be the second port of call after an incident has already occurred and you’ve been notified about some failure by your service monitoring system – I’m sure many of us are familiar with the situation of investigating an outage, only to find that the machine involved had been logging partial failures for a long time without anyone noticing.

When you’re trying to manage large-scale distributed systems, logs present a very different kind of problem, and a different kind of opportunity. For a start, there’s a vast amount of data, far too much for any sane person to trawl through, and secondly, for any particular point in time to make any sense you potentially need to correlate log entries across a large number of services and machines. Log data also starts to become much more important, as it gives you access to real-time information about the system state and often holds information that service monitoring won’t necessarily tell you, about events that may be transient or trending but have an impact on things like performance. We’ve started seeing logs as the heartbeat of our platform, and being able to sort and present that data in useful ways, and also act on it automatically, is one of the key development focuses for our team.

In order to develop ways of making our logs more useful, there are a couple of key things we need to do. Firstly we need a reliable mechanism for gathering and centralising log data, and that data then needs to be structured so we can do search and analysis. In this context structuring means splitting each entry up into its constituent parts, such as timestamp, log message, originating process and so on, and storing it in some kind of database format. This is obviously totally dependent on understanding the format of the log files, and this in itself can be massively variable – standards in this space are fairly fluid and by no means widespread, so the format of log files varies tremendously.

Whilst there are proprietary solutions out there which address some of these issues, Splunk for example, these can be very complex and are often extremely expensive. Luckily for us, the open source world has a set of fantastic tools for doing exactly what we need – the ELK stack of ElasticSearch, Logstash and Kibana.

ElasticSearch is a distributed search engine, written in Java and built on top of the Apache Lucene search technology. It’s massively horizontally scalable and super fast. Kibana provides a highly configurable graphical frontend onto ElasticSearch, and last but not least Logstash is a log collection and processing system. Over this short series of blog posts I’ll go end to end across some of the approaches we’re adopting, including how we’re using the ELK stack, which will hopefully be of some use to others out there starting to unravel their logs.

Technology

First Steps in Openstack

Into The Abyss…

It’s no secret I like cloud platforms; hence I work here. The fascination stems from my time at Transitive. Even way back in the early 2000s our testing platform evolved into a cloud. Starting from a load of bare-metal servers running jobs, we realised that we could get better utilisation by leveraging VMware to spin up machines on the fly, which could co-exist on the same physical host.

From a development perspective it’s fantastic to have a limitless number of fresh machines to play with. Gone are the days where your laptop would fill to bursting point with pointless transient applications which were required for the duration of the task at hand. Just fire up a VM, install the software, do the work and throw it away.

Orchestration and automation are the next big leap. Not only do you spin up a VM, but you can have it provision itself, installing software and configuring itself. All with a single command! This brings me to the topic of this post: how do you perform this voodoo magic?

Networking

We as the cloud provider worry about authentication; it is assumed you’ve got the openstack command line clients installed and the environment configured so you can talk to openstack. The first task is to set up a network so your VM can talk to the outside world, and you can talk to it.

https://gist.github.com/spjmurray/3c1f1de74782bdc75267

Here I’ve created a new network called simon, then created a new subnet within it. When you create a subnet, neutron (openstack’s network component) also creates a DHCP server within that broadcast domain. Thus when you create a VM, it will get an IP address allocated to it out of that address pool, learn the netmask and broadcast address, and pick up a name service from Google (that’s what the 8.8.8.8 and 8.8.4.4 addresses are all about, for my less technical readers). Local area network setup: nice and easy. Next up we need to create a router to forward packets between that and the wide area network, i.e. the internet!
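For anyone who doesn’t want to follow the gist links, the commands are roughly of this shape – a sketch using the neutron CLI of the time, with the names and addresses taken from the description above:

# Create the network and a subnet; neutron also starts a DHCP server for the range
neutron net-create simon
neutron subnet-create simon 10.0.0.0/24 --name simon-subnet \
  --dns-nameserver 8.8.8.8 --dns-nameserver 8.8.4.4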

https://gist.github.com/spjmurray/6f9c11dca405a9cda232

Yeah, it looks scary, but not really. A lot of this is hidden away if you use the web interface, but we’re really interested in scripting all of this and adding more automation! First we ask neutron which external networks onto the internet exist. Next we create a router (which, like all commands, returns a resource ID). The final two steps connect the router to the external network, so if it doesn’t know about an address it will just fire it at the internet, and connect the router to our LAN. Looking at our subnet, any VMs will be allocated an address from 10.0.0.2 to 10.0.0.254, will query Google’s name servers for name resolution, and all packets not destined for machines within 10.0.0.0/24 get sent to 10.0.0.1, our router, and forwarded on to the internet. Easy!
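Again, for readers skipping the gists, the router steps look roughly like this (a sketch; the external network ID comes back from the first command):

# Find the internet-facing network, create a router, give it a gateway
# on that network and plug our subnet into it
neutron net-external-list
neutron router-create simon-router
neutron router-gateway-set simon-router <EXTERNAL_NET_ID>
neutron router-interface-add simon-router simon-subnet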

https://gist.github.com/spjmurray/67b1c05486fc3c8ca7d7

Creating and Provisioning a Virtual Machine

This is where the fun really begins. My current task was, coincidentally, to create a puppet orchestration server within the cloud to manage this website. How was this accomplished?

https://gist.github.com/spjmurray/3258c1d5d83fb90f4cd7

One command… I did warn you! A bit of explanation is required here. The flavour of a machine relates to the number of CPUs it has, how much RAM and how much disk. The key relates to my SSH public key; this gets automagically installed on the machine during boot so I can log in later. The image is the ID of a disk image I have uploaded into the glance service, in this case an Ubuntu 14.04 amd64 cloud image (the ‘cloud image’ bit is responsible for adding SSH keys and performing provisioning). The NIC parameter tells the VM to create a network adapter and attach it to my network, user-data is a custom script that is run on first boot, poll simply waits for the server to be created, and finally puppet is the hostname of the machine.
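In other words, something along these lines – a sketch, with the flavour name and the image and network IDs as placeholders:

# Boot an Ubuntu 14.04 cloud image named "puppet", inject my SSH key,
# attach it to the simon network and hand it a first-boot script
nova boot --flavor m1.small --key-name simon-key \
  --image <UBUNTU_1404_IMAGE_ID> --nic net-id=<SIMON_NET_ID> \
  --user-data bootstrap.sh --poll puppet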

So the real magic is the user data I passed in; this sets the machine up for me automatically. After all, isn’t that the dream? Pressing the go button and surfing the web for the rest of the day!

https://gist.github.com/spjmurray/e7e50ce334e5f7948e08

For the more astute of you, no, that’s not the actual production key – nice try! So basically I am installing puppet, r10k and git, then checking out my puppet repository from GitHub and applying the manifests for that particular host. This has the result of provisioning a puppet master for my subnet and a name server, as openstack does not provide DNS as a service yet. Finally I override the DNS server supplied by DHCP. You can at this point change this in your subnet configuration so it gets picked up automatically.
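The user data itself is just a shell script that cloud-init runs on first boot. A trimmed-down sketch of the sort of thing described above – the repository URL and the name server address are illustrative, not the production setup:

#!/bin/bash
# Install the tooling, pull the puppet repository and apply this host's manifests
apt-get update
apt-get install -y puppet git
gem install r10k
git clone https://github.com/example/puppet.git /etc/puppet
puppet apply /etc/puppet/manifests/site.pp
# Override the DHCP-supplied resolver with the name server this host now runs
echo "nameserver 10.0.0.3" > /etc/resolv.conf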

Is There Anybody Out There?

Not quite finished yet as we can’t talk to the VM from the internet. The VM has an IP address on the local subnet, but nothing that can be uniquely addressed from the internet. Cue floating IPs:

https://gist.github.com/spjmurray/9743e48bb6c271f9c124

Floating IPs are allocated from a pool and are expensive, so keep that in mind – the first thing you will want to provision is a VPN endpoint. The floating IP is associated with the network port on my VM. Under the hood this creates a rule to DNAT incoming packets from the internet to my server at 10.0.0.3. The final thing is to create firewall rules that allow access to the floating IP address, and you can now log in with SSH!
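The commands behind those steps are roughly as follows – a sketch, with the network, server and rule values as placeholders:

# Allocate a floating IP from the external pool and map it to the VM's port
neutron floatingip-create <EXTERNAL_NET_ID>
neutron port-list --device-id <SERVER_ID>
neutron floatingip-associate <FLOATINGIP_ID> <PORT_ID>
# Open SSH in the default security group so the address is actually reachable
nova secgroup-add-rule default tcp 22 22 0.0.0.0/0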

https://gist.github.com/spjmurray/60396ccd36402ef8674a

Looking Forward

So as you can see, I’ve managed to build a LAN, connect it to the internet via a router, and create and install a machine which I can log in to, in 11 commands. This is the power of the cloud: no more waiting weeks while finance um and ah over buying equipment, IT install it and then finally you can set it up and get on with your work. Once the provisioning code was written the whole process end to end took 5 minutes.

The provisioning is the bottleneck now. We want to give a world-class experience in simplicity and usability. We don’t want you to need a degree in computer science to use this stuff, so looking forward we want to provide this functionality for you: imagine a world where you click a button and your entire business creates, installs and configures itself. Your infrastructure constantly monitors and heals itself, adapting to demands. There are a lot of cool ideas out there and we want to hear them!
