This is a guest post by Frédéric Faure (architect at Ysance) on the differences between using a cloud infrastructure and building your own. Frédéric was kind enough to translate the original French version of this article into English.
I’ve been noticing many questions about the differences inherent in choosing between a Cloud infrastructure such as AWS (Amazon Web Services) and a traditional physical infrastructure. Firstly, there are a certain number of preconceived notions on this subject that I will attempt to decode for you. Then, it must be understood that each infrastructure has its advantages and disadvantages: a Cloud-type infrastructure does not necessarily fulfill your requirements in every case, however, it can satisfy some of them by optimizing or facilitating the features offered by a traditional physical infrastructure. I will therefore demonstrate the differences between the two that I have noticed, in order to help you make up your own mind.
There are several types of Cloud possibilities and I will stick with the AWS types which are infrastructure-oriented services, rather than the Google-type services (GAE – Google App Engine), to mention just one, which offers a running environment for your web applications developed with the APIs provided (similar to a framework). In fact, regarding the latter, we can’t speak for clients (they are the ones holding the credit card) about infrastructure management, because we upload our application using the APIs provided and leave the entire infrastructure management to the service provider. It doesn’t mean it’s less about Cloud computing, but simply about a Cloud service that’s more PaaS-oriented than infrastructure-oriented.
As far as physical infrastructure is concerned, I will examine the notion of self-hosted infrastructure and equally the notion of infrastructure supported by a hosting provider. Similarly, I’ll also look at infrastructures based directly on hardware, as well as those based on virtualized environments. Cloud computing is also based on virtualization, but we are not so interested in that technology here, rather the way in which it is provided to clients (you). In fact, you can simply start up instances via a console, as you do for EC2s if you have an ESX (VMware) for example, but it involves “only” a hypervisor which partitions the physical servers into several virtual computers. You will still have to take care of buying equipment (blades, etc.), configuring the network, etc. But we will come back to these details later.
Cloud computing = Systems administrators marked down?
Yes, the sales are on! Are you looking for a sweater, a jacket, … a systems administrator? I have often come across people who think that Cloud (in the case of AWS) will enable them to get by without an experienced systems administrator, and to build an infrastructure with inferior competence. The answer is obvious: WRONG!
Perhaps a clever sales pitch can convince you that the various services are so user-friendly, you can do it all yourself, and that prepackaged AMIs (Amazon Machine Image) will make life easy, but it goes without saying that once you have started up the EC2 instances, you connect to the computers (SSH / port 22 for Linux and TSE / port 3389 for Windows), then you have to set the parameters, do the fine-tuning, etc.
What applies to systems administrators faced with AWS applies equally to systems architects in the context of Cloud computing services providing access to higher layers (PaaS like Google App Engine). A person with experience in the field, able to set up the infrastructure requirement is needed: the tool may change but the skills must still be available. Note however that if you use GAEs, you don’t need a systems administrator for the application. If the Cloud service editor is offering a service in a given layer (HaaS, IaaS, PaaS, etc.), there is no longer any need for people to deal with the lower layers. However, we do accept the framework supplied by the Cloud editor.
The systems administrator cannot be done away with, but his role is changing: he is becoming more and more of a developer. Indeed, being able to pull up resources on the fly will enable infrastructure management to be scheduled and automated via scripts which will call up the APIs provided by Amazon enabling communication with its web services. Everything at Amazon is web service: EC2, EBS, SQS, S3, SimpleDB: the only non-SOAP or REST operations that exist are when you connect directly to EC2 instances that you called up via web service or when EC2 instances dialogue with the EBS that you called up via … I’ll let you guess.
The administrator can then, rather than going into the computer room, add a disk, connect a server (which would be the case with a physical architecture) or else pick up the phone and ask the host to do it (fetch a coffee… call again… take a Xanax or a Prozac pill…), request resources via a script in Ruby or Python… You can then take automation of a Cloud infrastructure much, much further, with a set of scripts and tools.
The systems administrator’s craft is therefore evolving between a physical infrastructure and an AWS-type Cloud infrastructure: he is becoming more and more of a developer. But the systems administrator remains essential nevertheless.
Elastic is fantastic!
As I mentioned earlier, one of the crucial differences between the two types of infrastructure is the flexibility and dynamism provided by the Cloud solution, compared to a traditional physical architecture (whether it is based on virtualization or not). That means the elimination of the time it takes to install and set up the logistics (equipment purchase, installation of the OS, connecting to the network – the physical network and the configuration of the interfaces, etc.). Likewise, when you no longer need an item of hardware (EC2 virtual instance, EBS volume, S3 object, etc.), you return it to the resource pool: it will be reinitialized so that none of your data can be retrieved, and made available again until the next web service call.
You also have complete access to certain elements such as security groups (firewall) set for each instance… And that’s very useful. It’s very practical, particularly compared to a traditional hosting provider: do you remember the last time you had to change the firewall rules?
But it’s not simply about the pros and cons of purchasing servers versus running instances. AWS are backed by datacenters which are already industrially organized and tested. All the safety standards that need to be met in terms of fire protection, computer cooling, redundant electrical power supply, physical security against break-ins, distributing the hardware across 2 or more physical datacenters for disaster recovery, etc. entail a colossal initial investment and even when everything is installed, you still won’t be able to recreate the same quality within your company (99% of the time, in any case). You can get all that however, or part of it, with a traditional hosting provider.
But there are also all the more software-like services, such as data durability management (redundancy/replication as on the EBS and on S3), accessibility or high availability, monitoring hardware (to be alerted when physical components are showing signs of weakening), procedures for breakdowns, etc. I will let you read The 9 Principles of S3 (French version), so you understand just how many concepts are included. You won’t get all that with a traditional hosting provider (and forget about having it @Home). The quality of the S3 service is effectively a huge advantage, especially compared to current pricing… Let’s talk about prices!
There are no fixed rules. With Cloud, you pay for the resource by the hour and when you stop using it, you stop paying. Instances (Linux at the time I’am writing the article, but Windows will come soon) can also be reserved on the AWS for one year or for three years: this is known as Reserved Instances: you pay a one-time fee at the start, and afterwards you pay the hourly usage charge at a discounted rate, which leads you to a tipping point starting from a certain percentage of the resource use over the year (or three years). For more information, click here. To easily calculate how much your infrastructure will cost you, take advantage of the new calculator provided by Amazon: Simple Monthly Calculator. In all cases one part is related to the hourly usage and another to your traffic/volume stored.
You can do the comparison with the cost of your local infrastructure or with that of your hosting provider. The Cloud set price is particularly attractive in the following cases:
- The set price is unbeatable for POCs, demonstrations/presentations or architecture load/validation tests.
- It is very attractive for applications or APIs mounted on an SaaS-type economic model and for which you need to spend money on their resources only when the clients pay to use the abovementioned APIs.
- It is a good price for the social applications found on Facebook, for example, and which can take off overnight thanks to the social media phenomenon and may experience a boom (or a drop) in hits.
- It is also cost-effective when you are launching a new company, or a specific project within a larger entity and you do not wish to invest heavily in logistics right at the start.
For all the many other specific cases among you, you’ll have to do your own calculations.
Slacking off? No, of course not …
Usually, no matter what type of infrastructure you have, the same components and mechanisms should be installed. However, it must be acknowledged that often the cosy aspects of “home”-hosted infrastructure can lead to a certain lack of rigor on many issues. The fact that Amazon, with its AWS, offers a dynamic and volatile solution for these EC2s, compels you to install mechanisms (which should be standard) in order to consider failure or disaster recovery plans more seriously, given the volatile nature of the tool, and to identify the important data with the aim of ensuring data durability (EBS, S3 backups, etc).
Network and shared resources
The network is an important feature … and in more than one way! Effectively, the Cloud configuration is already prepared for you, and that’s convenient. But that also means that you don’t control it, and consequently you cannot monitor it yourself and therefore diagnose, for example, the causes of a slow-down. There is a similar lack of transparency concerning the shared resources at Cloud: at our level it is impossible to estimate the impact of other people using the resource we are sharing (such as the physical computer on top of which is based our EC2 instance, the physical device we use as an EBS network-attached device, the network bandwidth, etc.). The only network monitoring possible is limited to input/output on the instance (for example, an EBS is a network-attached device, so … there is no means of verifying the connectivity conditions, you will have to do with the I/O disks). The monitoring you can do is concentrated on the EC2 virtual instance itself (the instance is managed by a hypervisor and based on a Xen virtualization). Total visibility of our infrastructure is therefore not possible on Cloud. This must be taken into account… and accepted, if you wish to implement your architecture on Cloud. You must also weigh up the same lack of transparency for other shared resources as previously mentioned (EBS use, etc.).
Likewise, a multicast is not possible when it comes to communication protocols: bear that in mind for certain clusters. This constraint is understandable, given the far-reaching impact that a mismanaged multicast can have.
That is due to Cloud’s own way of operating: it has easy ways which mask a certain number of elements you no longer control.
On-call support, monitoring, BCP (Business Continuity Planning) and penalties
One question I’ve been asked frequently is “Does Amazon provide on-call support (regarding your own application/infrastructure your are running on top of the AWS) ?” The answer is no. AWS must be seen as a set of tools offered by Amazon which ensures that these tools are always up and well working. They maintain these tools and the development of their various functions. However, you are responsible for your use of the tools (in any case they do not possess the private keys of your EC2 instances…), so there is no monitoring / on-call support / BCP (Business Continuity Planning) package.
Unlike the specific contracts you may sign with your hosting providers, you must provide for these elements yourself, or regarding on-call support for example, you could out-source it to a facilities management company. Ditto for monitoring: Amazon offers Amazon CloudWatch, but the information (% of CPU, read/written bytes onto disk and in/out bytes on the network) is too brief for genuine monitoring as provided by Centreon/Nagios, Cacti, Zabbix or Munin. The CloudWatch information is used by the Auto Scaling function, but does not replace true monitoring. Some traditional hosting providers offer packaged monitoring with their services.
As far as the BCP and penalties are concerned, it amounts to being internally hosted: you are responsible for your resources and you manage failure / disaster recovery in line with the tool’s capacities (the AWS). This is where it’s important to understand the global nature of the architecture of Amazon services: if you don’t understand how the tool works, you will not be able to implement an effective BCP. As for penalties, there’s nothing unusual: it simply means a smaller bill for the month when you come under the ‘Service unavailable’ category as defined by Amazon criteria. This has nothing to do with the penalties based on the amounts lost due to unavailability.
It is imperative to consider Amazon web services as a tool. In spite of it being an accessible AWS support (that you can call for tools’ level questions and issues) that you can pay for, you will not get the full contractual potential you would have with a more traditional hosting provider, and broadly speaking, you will be responsible for your architecture on all levels (including the security of your instances: don’t lose your keys!).
Security… frequently a taboo topic, as soon as we start talking about Cloud computing. I don’t mean the integrity of stored data or even access management on the virtual instances we are responsible for, I’m talking about the confidentiality of the stored data on the different services (S3, EBS, EC2, SQS, etc.) or data which transit between those services.
What provokes my scepticism is that Cloud is not easily audited. You have to have faith. It is no more risky than placing your trust in a traditional hosting provider, or in your own internal teams… But it’s brand new! So be careful! A normal reaction. But perhaps this is exactly an opportunity for us to work on security at our own level, something often neglected, due to over-confidence or lack of interest. The first task is to encrypt the information: for stored data as well as data in transit. Remember to take into account the CPU overhead encryption/decryption procedures. The second is to fully understand the various security mechanisms of Amazon’s services:
- Access Credentials: Access Keys, X.509 Certificates & Key Pairs
- Sign-In Credentials: E-mail Address, Password & AWS Multi-Factor Authentication Device
- Account Identifiers: AWS Account ID & Canonical User ID
Next you must select the people who’ll be authorized to access the different security keys.
The evolving duties of infrastructure management can be clearly seen in this first part: from handling physical resources by means of APIs, underlying mechanisms ensuring data durability, availability of services, etc. right up to server power supply and the physical security of datacenters, all of which are supported transparently. The end result: one should “only” see the API which dialogs with a distant server. That is the difference with physical infrastructures. The virtualization (which is one facet of AWS) that we know and have already been using for some time now, is used by Amazon: it’s not so much a technical revolution - even though I don’t deny the complexity of the implementation and support that goes into it – as the service offered with it, which provides the real added value. It is matched with a new ‘pay-as-you-use’ aaS (as a Service) economic model. This has enabled the emergence of some applications (such as the games found on social networks), which only a few years ago would otherwise have been compromised by the initial investment.
The facilities provided by Cloud Computing inevitably come with some measure of reduced control and visibility on certain parts of the infrastructure, especially on the network. That’s the price you pay, be it quite negligible, or truly problematic: it all depends on your requirements.
AWS should be viewed as a complete tool, but one which does not excuse you from performing all the best practices or obtaining all the standard components of an infrastructure: log server, monitoring, BCP, configuration manager, etc. All these elements are and will continue to be your responsibility. One mustn’t have too naïve an expectation: as AWS offers HaaS and IaaS, you will still need a competent systems administrator, and particularly one who fully understands the AWS architecture (otherwise you might be disappointed) – if you switch to GAE (Google App Engine), you will still need an architect who fully understands GAE architecture, etc. The business is constantly evolving.
As for AWS security, I am reasonably confident. It must be emphasized firstly that information and data are probably less secure within your own company than entrusted to Amazon (in most cases anyway – I shouldn’t generalize). AWS’ exposure on the Net and Amazon’s commitment to the business imply that Amazon take security very seriously. Furthermore, you are responsible for a large part of this security (management of the keys, etc.) and believe me, that is surely the most risky feature. When it comes to transferring and storing data, think “encryption”.
- Here are the original English versions of the article: Cloud AWS Infrastructure vs. Physical Infrastructure (1/2) and Cloud AWS Infrastructure vs. Physical Infrastructure (2/2) by Frédéric Faure. I edited them together to make one article.