Jul 21, 2008

Lukas Biewald lays bare his frustrations with Amazon’s S3 service, particularly after the recent S3 outage that left his FaceStat business offline for more than 7 hours. Lukas has actually posted on this issue twice. On his own blog he offers a much more scathing criticism of S3: “Amazon S3 Screws Us Over”.

Lukas says he’s had it with S3’s reliability problems and is looking for a replacement, but he isn’t all that impressed with the available alternatives in the scalable online hosting space. Google’s App Engine has drawn similar criticism after a June 17 outage, reported on TechCrunch in “Google App Engine Goes Down and Stays Down”. So Lukas expects he’ll have to go back to the old solution of building dedicated servers, shouldering all the associated costs and risks that massive online hosting data centers were supposed to do away with.

It sounds like the market is ripe for someone, anyone, who can deliver scalable services with ironclad uptime and service guarantees that pay out the true value of the business lost when the service goes belly up. It’s not about the cost of the service itself. It’s about the value of the business lost when the service goes down, which is almost always significantly higher than the cost of the service.

The first comment on Lukas’s Dolores Labs blog post suggests that if a business can’t survive a 7-hour outage, then something must be wrong with its products or with the business itself.

Service outages do more damage than just lost sales. They damage your site’s reputation, which is much harder to recover than lost sales. Web consumers are flighty and finicky, and their attachment to their favorite web sites often resembles addiction. When they can’t get to their favorite sites, they get cranky. When they can’t get to their favorite sites for hours at a time, they find a new favorite and probably never return. Worse, they may turn from advocates of your web site into vocal critics, indirectly steering traffic away from your site through their influence on their social networks.

Few online services today lack a close competitor, and the barrier to switching is usually little more than inconvenience and emotion. The best way to retain your current customers is to not give them a reason to go looking at alternatives. Consumers are couch potatoes: as long as they’re satisfied enough with what they have, they aren’t likely to pay attention to alternatives, even when they have criticisms of the product and even when the alternatives are superior. It’s the same with TV: if you like the current program, you’re less likely to switch channels. (That works great until the most disruptive element in commercial television arrives: the commercial break.)

I think part of the problem is what’s missing from hosting providers such as Amazon S3 and Google App Engine: service agreements that impose meaningful consequences when the service fails. Google offers no service guarantees. It’s up when it’s up, and when it breaks, they’ll fix it when they can. The Google engineers clearly put their hearts and souls into fixing things ASAP, but Google the corporation protects its coffers and makes no promises. Lukas indicates that Amazon S3 offers a 25% refund, in the form of future service credits, in the event of outages. That 25% might take a bite out of Amazon’s profit margin, but it doesn’t come anywhere close to the kind of horrific damages provided by “utility grade” service agreements.

Hosting services don’t yet conduct themselves as true utility-grade operations. Compare the service level agreements of any online service with those of, say, an electric power company. If you’re a multi-megawatt industrial power customer, chances are good that you can demand service level agreements that are downright terrifying to the service provider. Friends and colleagues tell me stories of manufacturing plants in Silicon Valley that crank out a million dollars’ worth of product per hour, every hour, 24 hours a day, 360 days a year (allowing for only 5 days of systemwide downtime). When the power fails, the service agreement requires the power company to make up for the plant’s losses: the power company is held responsible for that million dollars per hour until it fixes the problem and restores power.

Now that’s what I call an incentive. The power company either invests in redundant systems, continuous monitoring, and rapid response teams, or it goes broke within minutes of the first outage.

Hosting provider service level agreements won’t leap to the extreme of 2x loss-of-business damages, but as new players enter the hosting market, they will have to do something to differentiate themselves from existing hosting services. There’s not a lot of room for differentiation in the hardware or the service itself, so the newcomers will have to distinguish themselves on cost and service guarantees. New hosting providers will likely make deeper concessions to customers in their service agreements than the old guard offers. Google and Amazon don’t need to offer significant service level guarantees, primarily because no one is forcing their hand by offering better ones. Over time, as the hosting market becomes thoroughly commoditized, competitive pressure will force online hosting providers to improve their service level agreements.

All we need now are more competitors.