In my last post I wrote about the problems with Cloud Service Level Agreements (SLAs), and why they tend to be less useful than one would like. OK, so what can you do about it?
1) Ask Questions — If your service provider offers an SLA, study it and ask questions. “Is that all you can promise? What’s your historical performance? Can you show me some documentation of that from your monitoring tools? Have you had any downtime incidents in the last five years? What were the root causes, and what did you do to address those?” The discussion will be useful in filling in your picture of their capabilities. In the end, an SLA is just a number (or a few numbers), and it doesn’t give you much insight into the provider’s ability to deliver. An SLA is a bit like a car warranty—it’s nice to know you have it, but your real hope is that it won’t be needed and the thing won’t have any significant breakdowns.
2) Try to Negotiate — Good luck on this, but give it a try. I’ve found that the salesman normally doesn’t have the discretion to change the penalties in the SLA. However, if the vendor doesn’t offer an SLA at all, just asking might get you somewhere. In the logistics software space, a company called SMC3 makes a system for calculating the rate of a less-than-truckload (LTL) shipment, and most companies in the industry use it. They recently converted from a traditional licensed software model to a cloud model. I was at their conference in Atlanta a few days ago and asked about their SLA policy. They don’t promise an out-of-the-box SLA, but they say they’ll negotiate one on occasion. So it seems that if you are a big enough customer, you’ve got a chance of getting an SLA in writing. The point is not to push the SLA up to a level the vendor isn’t comfortable with, because a bigger number on paper won’t change their real-world ability to deliver. The point is to understand what they can deliver and to get comfortable (or not) with that level of performance, especially when the vendor’s service is critical to your business. In the case of SMC3, if you move LTL shipments for a living and need to quote prices to your customers, you really depend on SMC3: if their service goes down, you can’t provide price quotes. So you owe it to yourself to understand their capabilities.
3) Look at Audit Reports – If the vendor has been audited (e.g., using the SAS 70 standard), get a copy of the audit report. Read it. DO NOT just treat this as a check mark: “great, they passed an audit.” Look where that got investors in Enron and WorldCom. Like most standards of this nature, the audit “control objectives” for SAS 70 are essentially whatever the cloud vendor says they are. The cloud vendor tells the audit firm its control objectives, and the auditor checks that there is reasonable assurance the vendor would achieve those objectives (the auditor will state very clearly, though, that this is not a guarantee). The auditor will make sure the control objectives the cloud vendor uses are not completely stupid, but I find that a lot of people assume (wrongly) that SAS 70 is some kind of seal of approval that the vendor is doing “all the right things”. SAS 70, for example, says nothing about providing disaster recovery (DR) capabilities. A vendor can provide zero DR and pass a SAS 70 audit with flying colors, simply by not stating any control objectives regarding DR while showing compliance with whatever control objectives they do have.
4) Do Your Own Audit — While it might make you feel good to host your systems in-house, because you feel you have more control, chances are that you can’t do as good a job as a strong cloud vendor. You probably know someone who feels comfortable driving a car, since they control it, but who doesn’t like flying in an airplane because they have to trust the pilot, even though flying is statistically safer than driving a car. So maybe you should find out if your cloud vendor knows how to “run an airline”. If the service provider is critical to you, arrange an audit where you visit them and understand their capabilities. Are they doing the basics? Do they have multiple power supplies on each server, or only on some of them? Do they have multiple NICs on every server? Ask them to show you. Do they have current maintenance contracts with their key vendors (server hardware, network hardware, etc.) or did they “forget” to renew them? How fast would that vendor promise to rush in a part required to fix their box if it went down? Next business day? Hmmmm. One of my customers once did an audit like this on my data center, and while it was painful at the time, ultimately it was beneficial for me and my team, and for the customer, since they had a much deeper knowledge of our real capabilities, and in the end they grew comfortable that we knew what we were doing.
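Going back to point 1: since an SLA is just a number, it can help to translate that number into minutes of downtime, and to compare it with what the vendor's incident history actually shows. Here is a rough Python sketch of that arithmetic. The function names and the outage figures are made up for illustration, and real SLAs often exclude things like scheduled maintenance, so treat this as a back-of-the-envelope aid and not as how any particular vendor calculates its numbers.

```python
from datetime import timedelta


def allowed_downtime(sla_percent: float, period: timedelta) -> timedelta:
    """Downtime a provider could have in `period` and still meet the SLA."""
    return period * (1 - sla_percent / 100)


def measured_availability(outages: list[timedelta], period: timedelta) -> float:
    """Availability (in percent) actually achieved, given outage durations."""
    total_down = sum(outages, timedelta())
    return 100 * (1 - total_down / period)


if __name__ == "__main__":
    # What does a promised 99.9% allow over a 30-day month?
    print(allowed_downtime(99.9, timedelta(days=30)))

    # Hypothetical incident history for one year (illustrative values only).
    outages = [timedelta(hours=2), timedelta(minutes=30)]
    print(f"{measured_availability(outages, timedelta(days=365)):.3f}%")
```

A 99.9% SLA over a 30-day month works out to about 43 minutes of permitted downtime. If the vendor's own monitoring history shows outages well beyond what the SLA allows, that is a good prompt for the root-cause questions above.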