The importance of hosting resilience
The recent week-long outage at US-based hosting provider HostV highlights the importance of choosing a provider that can demonstrate true resilience, particularly if whatever you are hosting is critical to your business. And let’s face it: nowadays most hosting is business critical in some way, shape or form.
You simply cannot afford to pay for a hosting service that will be offline for days at a time. Many problems can bring down a hosting infrastructure: server hardware failure, storage failure, datacentre power failure, network connectivity issues and so on. In HostV’s case, the cause appears to have been a RAID failure. As modern storage systems become ever more capacious, providers can achieve significant economies by packing many customers’ virtual machine instances onto a single storage array. If that array develops a problem, however, every one of those customers suffers, and recovery can take a long time because of the sheer volume of data involved.
The risk of each of these types of failure can be mitigated, and most providers address at least some of it to give their infrastructure a degree of resilience.
Few, however, go as far as we have at Cloud Data. We fully replicate our infrastructure across geographically separate datacentres, and we include this level of protection as standard in all of our hosting products. If we suffer a major failure, we simply switch all of our services to another location and continue to provide service as normal. In the very worst cases, we may have to roll back to a snapshot (we take one every 15 minutes) before resuming service. We don’t think resilience should be optional. It’s a necessity, as this outage has shown many HostV customers.
