Is Amazon’s EC2 right for you?
by Justin Leider on January 26, 2009
I've been asked this and similar questions quite a bit lately. Before I delve into the answer, I want to lay some groundwork by asking you a question. That one question should play a large part in your final decision about whether to go with EC2. The question you should ask yourself is:
How quickly do you actually need to scale either up or down?
The answer will likely determine the right solution to your problems. The list below shows how I classify levels of scalability. Each level comes with its own pros and cons, but generally, the quicker you need capacity, the more expensive it is going to be.
- Immediate - within minutes - EC2 or other cloud computing networks
- Fast - within days to a week - Managed Hosting, Rackspace, The Planet, etc
- Average - within weeks to a month - Owning your own hardware, Dell, HP, IBM, etc
- Corporate - within months/years - Good Luck
With this in mind: everyone hears the hype about EC2, with its scalability, fully managed hardware and virtualization, but there really aren't many people out there describing their actual experiences with it. When we decided to go with EC2, we did our research and due diligence before making the switch. There wasn't much to go on, but the few articles and blog posts we did read were all positive. I guess we got caught up in the hype as well.
Even after all our research, going with EC2 turned out to be one of the poorer IT decisions we have made. EC2 has been more expensive, more difficult to implement and slower than we had expected, even in our worst-case estimates. To top it all off, we never fully used EC2's main benefit: immediate scalability. Our traffic is relatively predictable, grows or shrinks by manageable percentages, and can be accommodated by scaling up within days instead of minutes. We never have massive spikes in traffic, up or down. Even if we did, we would be limited by our MySQL cluster.
While we had to rethink a lot of our architecture to create a horizontally scaled platform instead of a traditional vertically scaled one, MySQL was by far our biggest bottleneck. The root of the problem is Amazon's preset machine sizes. Amazon has done an adequate job of offering different instance types, with more memory in one line and more computational power in the other, but you are still limited to what they offer. Given the size of our database and the latency between the instances and their persistent storage, we were forced to keep as much of the database as possible cached in RAM. This shouldn't have been too big a deal: just get a machine with a ton of RAM. Unfortunately, Amazon's biggest instance offered a maximum of 15GB. Needless to say, this was not sufficient, and it forced us to adopt a cluster solution. That in itself is not ideal when you should be able to run off a single box with 32GB of RAM and fast local disks. In the end, it took us twelve (12) m1.xlarge instances to reach the level of performance and availability we wanted, and the network I/O latency between nodes and disk storage, and from node to node, added insult to injury.
While the speed and size of the cluster were not desirable, it worked. However, we had to give up any sort of scalability to get a working database. To my knowledge, there is no quick and easy way to boot up more MySQL instances to supplement a live cluster. To add capacity, we would have to perform a rolling reboot of every machine in the cluster. It's unfortunate that databases were not designed with EC2 in mind.
However, there are companies trying to address this pain point. We looked very closely at Continuent, which produces a MySQL cluster monitoring and management tool. Unfortunately, as of January 2009 the product was still in private beta and unavailable to us. It would have allowed us to add nodes to the cluster on the fly without taking it down. Even with this extra tool, which wasn't cheap, you still couldn't scale the cluster down without taking it offline. As far as I am concerned, if you are already using the largest instance available to you (an m1.xlarge or c1.xlarge), there is no way to scale a database vertically on EC2. Instead, you are forced into a less than ideal environment for hosting a horizontal architecture, which could have serious consequences for your code base and SQL queries.
To be honest, EC2 offers a lot of benefits that are hard to come by with other solutions. EC2 is great for companies doing lots of non-real-time work such as batch and queued processing. Companies with a small database that can be cached in RAM and replicated easily will also benefit from EC2: just boot up a bunch of instances and go to town. However, the bottom line is that if you have fairly consistent usage patterns and your applications are performance-sensitive, there are much faster and more cost-effective ways of abstracting your hardware requirements. We at CitySquares are in the process of moving off of EC2 and onto a managed hosting platform. We still enjoy the benefits of leased hardware, as we did with EC2, and the ability to add new hardware quickly. Granted, more servers aren't available at the drop of a hat, but a couple of days' lead time to get another box up and running is more than sufficient for us. On top of that, we have a whole team of IT people working with us to help carry the burden of supporting the entire hardware/software stack. We can now focus on what we do best, which is our application.
Keep in mind that there is no definitive answer as to whether EC2, or cloud computing in general, will work for you. You need to determine whether the capacity and latencies of the predetermined instance sizes will meet your growing infrastructure needs. For us, the bitter answer was a resounding no. We were able to spec out a solution in a fully managed hosting environment for about half the monthly cost of EC2 while significantly increasing the performance of our application.
So, is Amazon's EC2 right for you?
PHP on Rails – The Flash
by Jacques Fuentes on December 15, 2008
Last week I wrote an article about extending my framework (the Pho framework, courtesy of Kien La) to DRY up your code and make it more similar to Rails.
Today, I would like to present another class that will help extend your framework in a similar fashion.
The Flash
If you aren't familiar with the way RoR uses "flash" messaging, you can take a look at the RoR API, or you can read this explanation:
The flash provides a way to pass temporary objects between actions. Anything you place in the flash will be exposed to the very next action and then cleared out.
The flash allows you to place messages in a user's session that can be viewed when navigating (or redirecting) to different pages, or even on the current page. If you read further into the API, you will see that the RoR flash offers the following behavior:
- Set a regular message (viewable upon the next HTTP request)
- Create a "now" message (viewable now but not upon the next HTTP request)
- "Keep" a message or all messages (if a message is viewable now, it will also be viewable upon the next HTTP request)
- Discard/unset a message or all messages
If you're a good little coder, you would first attempt to steal this code from somewhere else. I found a few sites that offer some of the functionality of the RoR flash; however, I wanted the flash in its full capacity. Therefore, I created my own class.
The behavior of the flash is rather simple, and so is my class. If you think about the lifetime of the messages, you'll see that we essentially have two types of flash messages: now and later (the next HTTP request). Even though there are two types, a key still refers to one logical message: a flash message "error" could have both a now and a later value. The reason is that discard() and keep() should accept a key, so that discard() removes all traces of our "error" message (or all messages if no key is passed). My class requires PHP 5 or later. Keep in mind that you will only want one instance of this class, so you could make it a singleton if you'd like. The constructor does all of the work of removing old messages, so you only have to worry about setting messages. Okay, let's see some code.
```php
<?php
# Instantiate the flash, which removes old messages.
$flash = new UserSessionFlash;

$flash->error = 'This is an error message for the next page request.';
$flash->now('notice', 'Message will be available for the current page request.');

# Make the 'notice' available for the next page request as well.
# If you do not pass a key, then it will keep ALL messages.
$flash->keep('notice');

# Erase the $flash->error message set above.
# Just like keep, if you do not pass a key, then ALL messages will be erased.
$flash->discard('error');

# Delete all messages.
$flash->discard();
?>
```
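The post doesn't include the UserSessionFlash class itself. As a rough illustration of the behavior described above, here is a minimal sketch of a flash with two buckets: one for the current request and one stored in the session for the next request. The class name FlashSketch, the 'flash_sketch' session key, and passing the session array in by reference (so the class can be exercised without a web server) are all assumptions, not the author's implementation.

```php
<?php
// Minimal sketch of a Rails-style flash. NOT the author's UserSessionFlash;
// the class name and the session key are hypothetical.
class FlashSketch
{
    private $next;           // reference into session storage: shown on the NEXT request
    private $now = array();  // shown on the CURRENT request only

    // Pass $_SESSION (after session_start()) or any array standing in for it.
    public function __construct(array &$session, $key = 'flash_sketch')
    {
        // Whatever the previous request stored becomes visible now...
        $this->now = isset($session[$key]) ? $session[$key] : array();
        // ...and the session bucket is emptied, which removes old messages.
        $session[$key] = array();
        $this->next = &$session[$key];
    }

    // $flash->error = '...' queues a message for the next request.
    public function __set($name, $value)
    {
        $this->next[$name] = $value;
    }

    // $flash->error reads the message visible during this request (or null).
    public function __get($name)
    {
        return isset($this->now[$name]) ? $this->now[$name] : null;
    }

    // A message for the current request only; it is never written to the session.
    public function now($name, $value)
    {
        $this->now[$name] = $value;
    }

    // Carry a current message (or all of them) over to the next request.
    public function keep($name = null)
    {
        foreach ($this->now as $key => $value) {
            if ($name === null || $key === $name) {
                $this->next[$key] = $value;
            }
        }
    }

    // Remove every trace of a message (or all messages) from both buckets.
    public function discard($name = null)
    {
        if ($name === null) {
            $this->now = array();
            $this->next = array();
        } else {
            unset($this->now[$name], $this->next[$name]);
        }
    }
}
```

In a real request you would call session_start() and pass $_SESSION. Because the constructor promotes the previous request's messages to "now" and clears the session bucket, a message that is neither re-set nor kept disappears after one request, which matches the lifetime described above.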
As you can see, this is all fairly basic. Simply add the flash instance to your action controller and access it from your page controller. I included a function called flash() in my action controller that returns my instance of the flash class, so my page controller calls:
```php
$this->flash()->error = 'This is an error msg.';
$this->flash()->now('notice', 'Show me the money.');
```
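One way to guarantee a single flash instance, as suggested above, is to have the base controller lazily create the flash and return the same object on every call. In this sketch Controller, HomeController and FlashStub are hypothetical stand-ins (FlashStub only records what was set). This is not the framework's actual code.

```php
<?php
// Hypothetical stand-in for the real flash class: it just records messages.
class FlashStub
{
    public $messages = array();

    public function __set($name, $value)
    {
        $this->messages[$name] = $value;
    }
}

class Controller
{
    // Shared by every controller object for the lifetime of the request.
    private static $flash = null;

    // Create the flash on first use, then keep returning the same instance.
    public function flash()
    {
        if (self::$flash === null) {
            self::$flash = new FlashStub();
        }
        return self::$flash;
    }
}

class HomeController extends Controller
{
}
```

Because the instance lives in a static property of the base class, every page controller (and anything else holding a controller) sees the same messages.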
If you'd like, you can check out some other implementations below. Again, none of the other classes I've found offer the full Rails flash behavior.
http://poorbuthappy.com/ease/archives/2007/04/22/3589/rails-flash-in-php
http://www.phpclasses.org/browse/package/3668.html
http://shabadeehoob.com/2007/03/17/rails-like-flash-messages-in-cakephp/
http://api.phpontrax.com/__filesource/fsource_PHPonTrax__vendortraxsession.php.html
PHP on Rails
by Jacques Fuentes on December 8, 2008

Adding conventions to DRY our code
This article provides a few snippets of code that I have recently plugged into the custom PHP framework I use. However, those of you who use, and are more familiar with, popular frameworks may be able to use this as well. A few days ago I decided to alter this custom framework in the following ways:
- an "application controller" which has access to the base controller and the current page controller
- filter capabilities in page controllers
First, I will outline some basics about my framework so that you can understand my implementations.
The custom framework I use closely follows Rails' conventions. It is an MVC framework and has an almost 1:1 folder/file structure with Rails. One glaring difference is that it uses camelCase throughout instead of Rails' snake_case convention. Anyway, let me start with the app folder (since this is the center of our attention).

You can see above that my app folder is almost exactly like Rails' except for the camelCase and the fact that my views are .tpl files (go Smarty!). You'll also notice that I have 3 page controllers plus the application controller. Let's start with how I implemented the application controller.
The Application Controller
You should already know that my page controllers extend the "base" controller (which is simply Controller for me). Thus, my page controllers have access to all of the essentials such as redirection, session access, IoC, flash messaging, etc. Now we want another controller that lets those 3 page controllers (and any future page controller) access shared functions so that we can keep our code DRY. Keep in mind that this application controller should also have access to the parent controller.
The problem: how do we easily give our application controller two-way communication with the parent controller and the child controllers, while preventing it from being a page controller itself?
First let's take a look at my page controller:
```php
<?php
class HomeController extends Controller
{
    public function actionIndex()
    {
        $this->requireUser();
    }
}
?>
```
You may have guessed that this "action" will respond to http://somewhere/home/index/ and render app/view/home/index.tpl (and you are correct!). Inside the function we are trying to access the application controller's "requireUser" method. Here's what the application controller looks like:
```php
<?php
class ApplicationController extends AbstractApplicationController
{
    public function requireUser()
    {
        if (!$this->session()->hasRole(array('User'))) {
            $this->flash()->error = 'You must be logged in to do that.';
            $this->redirect(array('controller' => 'home', 'action' => 'index'));
        }
    }
}
?>
```
As you can see, my application controller doesn't extend "Controller", but I'm still calling its methods. By having it extend a different class, I get the easy benefit of preventing it from being a "page" controller, which means I cannot place action functions inside it and render pages. Instead, it is simply there to DRY up some code and let my page controllers access its methods. So how do we accomplish this, and what does AbstractApplicationController look like? We'll get to that in a second. First, let me show you the two parts of the parent controller we will need. Directly before the actionIndex method in HomeController is invoked, we create a new ApplicationController object in the parent controller so that communication can begin:
```php
$this->applicationController = new ApplicationController($this);
```
We pass $this to its constructor because AbstractApplicationController is basically a placeholder (or proxy) for the controller object.
```php
<?php
abstract class AbstractApplicationController
{
    private $controller;

    // PHP only treats __construct() (two underscores) as the constructor.
    // Objects are passed as handles, so no & reference is needed.
    public function __construct($controller)
    {
        if (is_subclass_of($controller, 'Controller'))
            $this->controller = $controller;
        else
            throw new Exception(get_class($controller) . ' must be a subclass of Controller');
    }

    public function getController()
    {
        return $this->controller;
    }

    // Forward any method not defined on the ApplicationController to the page controller.
    public function __call($meth, $args)
    {
        return call_user_func_array(array($this->controller, $meth), $args);
    }
}
?>
```
The beauty is in the magic method __call()
As you can see, any method not found in your ApplicationController falls through to the magic __call() method, which attempts to execute the method on the stored controller. This means that when we redirect or add messages to the flash, we can use the same syntax as in our page controllers. That gives us one-way communication. What about accessing the "requireUser" method from the page controller? Again, the beauty is in __call(). We place the same kind of method in the base controller so that any method invoked from our page controller that isn't part of the base/page controller is forwarded to the ApplicationController object. Let's take a look.
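```php
<?php
// In the base Controller: forward unknown methods, with their arguments,
// to the ApplicationController object.
public function __call($meth, $args)
{
    return call_user_func_array(array($this->applicationController, $meth), $args);
}
?>
```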
This allows us to call $this->requireUser() in HomeController's actionIndex() method, and the call goes through the base controller's __call() to the ApplicationController object. Now we have two-way communication between our page controllers and our application controller. Keep in mind that since ApplicationController is not a child of Controller, it can only call public methods of Controller and our page controllers.
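To see both directions of the proxy working together, here is a self-contained sketch that runs outside the framework. The base Controller is a hypothetical stub (redirect() just records its target and isLoggedIn() replaces the session role check). The method_exists() guard is an addition not in the original code: without it, calling a method that exists on neither side would make the two __call() methods forward it to each other until PHP runs out of stack.

```php
<?php
// Stripped-down, hypothetical base controller.
class Controller
{
    public $loggedIn = false;
    public $redirectedTo = null;
    protected $applicationController;

    public function __construct()
    {
        // The real framework does this just before the action is invoked.
        $this->applicationController = new ApplicationController($this);
    }

    public function isLoggedIn()
    {
        return $this->loggedIn;
    }

    public function redirect(array $target)
    {
        $this->redirectedTo = $target;
    }

    // Page controller -> ApplicationController, guarded against ping-pong.
    public function __call($meth, $args)
    {
        if (!method_exists($this->applicationController, $meth)) {
            throw new BadMethodCallException(get_class($this) . '::' . $meth . '() does not exist');
        }
        return call_user_func_array(array($this->applicationController, $meth), $args);
    }
}

abstract class AbstractApplicationController
{
    private $controller;

    public function __construct(Controller $controller)
    {
        $this->controller = $controller;
    }

    // ApplicationController -> page controller (public methods only).
    public function __call($meth, $args)
    {
        return call_user_func_array(array($this->controller, $meth), $args);
    }
}

class ApplicationController extends AbstractApplicationController
{
    public function requireUser()
    {
        // isLoggedIn() and redirect() are resolved on the page controller via __call().
        if (!$this->isLoggedIn()) {
            $this->redirect(array('controller' => 'home', 'action' => 'index'));
        }
    }
}

class HomeController extends Controller
{
    public function actionIndex()
    {
        $this->requireUser(); // not defined here, so Controller::__call() forwards it
    }
}
```

A guest hitting actionIndex() ends up redirected, a logged-in user does not, and a typo such as $this->requireUsr() fails fast with a BadMethodCallException instead of recursing.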
The Filter
Rails has a great option that lets the developer invoke certain methods before or after the current page's action is invoked. I wanted this bonus too.
```php
<?php
class GamesController extends Controller
{
    protected $BEFORE_FILTER = array(
        'requireUser' => array('except' => array('myExceptionAction'))
    );

    public function actionIndex()
    {
        // the before filter method will be invoked before this method is invoked
    }

    public function actionMyExceptionAction()
    {
        // the before filter method will not be invoked for this one
    }
}
?>
```
If you aren't familiar with Rails, don't fret; the logic is simple. I'm using the protected $BEFORE_FILTER to tell my parent controller to execute the requireUser() method before any of the action methods except actionMyExceptionAction(). You can change the "except" key to "only" to switch from a blacklist to a whitelist. Where is the requireUser() method? Well, if you haven't fallen asleep yet, you should know that requireUser() resides in our ApplicationController. How does this code work?
The beauty is in PHP's reflection class.
```php
<?php
// In the base Controller, just before the current page's action is invoked:
// instantiate our ApplicationController...
$this->applicationController = new ApplicationController($this);
// ...and run the before filter.
$this->filter('before');

// Also in the base Controller:
private function filter($temporality)
{
    $filterName = strtoupper($temporality . '_FILTER');
    $controllerName = $this->getControllerName() . 'Controller';
    $controller = new ReflectionClass($controllerName);

    foreach ($controller->getProperties() as $property) {
        if ($property->name !== $filterName) {
            continue;
        }
        $filters = $this->{$property->name};
        // Allow the shorthand BEFORE_FILTER = 'requireUser';
        if (!is_array($filters)) {
            $filters = array($filters => $filters);
        }
        $action = $this->currentAction();
        foreach ($filters as $method => $options) {
            if (!is_array($options) || (!isset($options['only']) && !isset($options['except'])))
                $this->$method();
            elseif (isset($options['only']) && in_array($action, $options['only']))
                $this->$method();
            elseif (isset($options['except']) && !in_array($action, $options['except']))
                $this->$method();
        }
        break;
    }
}
?>
```
Basically, we use reflection on our GamesController to find the property that matches our filter name (BEFORE_FILTER). We turn it into an array if it isn't one (which means we can write BEFORE_FILTER = 'requireUser';), and then invoke each method depending on whether the current action matches its options. If neither "only" nor "except" exists, the method is always invoked. Otherwise, it is skipped if the current page's action is not in the "only" array or is in the "except" array. Similarly, adding protected $AFTER_FILTER = 'someMethodHere'; gives you an after filter, provided the parent controller also calls $this->filter('after') once the action has run. There you have it!
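Here is a rough, self-contained sketch of where the before and after filters could sit in a dispatch cycle. The dispatch() method and the $log property are hypothetical, property_exists() stands in for the ReflectionClass lookup, and requireUser() is defined on GamesController itself only to keep the example runnable (in the framework it lives in ApplicationController).

```php
<?php
abstract class Controller
{
    public $log = array(); // records what ran, for demonstration only

    // Hypothetical dispatcher: before filter, then the action, then after filter.
    public function dispatch($action)
    {
        $this->filter('before', $action);
        $this->{'action' . ucfirst($action)}();
        $this->filter('after', $action);
    }

    private function filter($temporality, $action)
    {
        $name = strtoupper($temporality) . '_FILTER';
        if (!property_exists($this, $name)) {
            return;
        }
        $filters = $this->$name;
        if (!is_array($filters)) {
            $filters = array($filters => array()); // 'method' shorthand
        }
        foreach ($filters as $method => $options) {
            if (isset($options['only']) && !in_array($action, $options['only'], true)) {
                continue; // whitelist: action not listed
            }
            if (isset($options['except']) && in_array($action, $options['except'], true)) {
                continue; // blacklist: action listed
            }
            $this->$method();
        }
    }
}

class GamesController extends Controller
{
    protected $BEFORE_FILTER = array(
        'requireUser' => array('except' => array('myExceptionAction'))
    );
    protected $AFTER_FILTER = 'logDone';

    public function requireUser() { $this->log[] = 'requireUser'; }
    public function logDone() { $this->log[] = 'logDone'; }
    public function actionIndex() { $this->log[] = 'index'; }
    public function actionMyExceptionAction() { $this->log[] = 'myExceptionAction'; }
}
```

Dispatching 'index' runs requireUser, index and logDone in that order, while 'myExceptionAction' skips the before filter but still gets the after filter.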
This concludes our crazy talk of making your framework more like rails (or at least adding bits and pieces to make your life easier).
The Limitations of Scaling with EC2
by Justin Leider on October 8, 2008
As with any platform you choose, EC2 has its own limitations. These limitations are often different from, and harder to overcome than, those you might find when running your own hardware. Without proper planning and development, they can wind up being extremely detrimental to the well-being and scalability of your website or service.
There are quite a few blogs, articles and reviews out there that mention all the positive aspects of EC2, and I have written a few of them myself. However, I think users need to be informed of a platform's negative aspects as well as its positive ones. I will keep this post brief, as my next one will focus on designing an architecture around these limitations.
In my experience, the biggest limitations of Amazon's EC2 at the moment are the latency between instances, the latency between instances and storage (local and EBS), and the lack of powerful instances with more than 15GB of RAM and 4 virtual CPUs.
All of the latency issues can be traced back to the same root cause: a shared LAN with thousands of non-localized instances all competing for bandwidth. Normally one would think a LAN would be quick, and LANs generally are, especially when the servers sit right next to each other with a single switch between them. However, Amazon's network is much more extensive than most LANs, and chances are your packets hit multiple switches and routers on their way from one instance to another. Every extra hop between instances adds a little more to the packet's round-trip time. You can think of Amazon's LAN as a really small Internet: as on the Internet, there is no cohesiveness or localization of instances in relation to one another. So lots of data has to go from one end of the LAN to the other, traveling much farther than it needs to, and all the congestion problems found on the Internet can be found on Amazon's LAN.
For computationally intensive tasks this really isn't a big deal, but for those who rely on speedy database calls, every millisecond added per request really starts adding up if you have lots of requests per page. When the CitySquares site moved from our own local servers to EC2, we noticed a 4-10x increase in query times, which we attribute mainly to the high latency of the LAN. Since our servers are no longer within feet of each other, we have to contend with longer distances between instances and congestion on the LAN.
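A back-of-envelope model makes the "adding up" concrete. Assuming a page issues its queries one after another (no batching or parallelism), each query pays the network round trip on top of the database's own work. The function name and all numbers below are hypothetical, chosen for illustration rather than taken from CitySquares' measurements.

```php
<?php
// Rough model, assuming strictly sequential queries: time spent waiting on the
// database for one page = queries x (server time per query + network round trip).
function pageQueryTimeMs($queries, $serverMsPerQuery, $roundTripMs)
{
    return $queries * ($serverMsPerQuery + $roundTripMs);
}

// Hypothetical page with 40 queries, each taking 0.5 ms on the database server.
$nearby  = pageQueryTimeMs(40, 0.5, 0.25); // servers on one switch
$farAway = pageQueryTimeMs(40, 0.5, 2.5);  // congested, multi-hop LAN
printf("%.1f ms vs %.1f ms\n", $nearby, $farAway); // 30.0 ms vs 120.0 ms
```

Under these assumed numbers the database portion of the page gets 4x slower even though the database server does exactly the same work; the only thing that changed is the round trip.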
Another thing to take into consideration is the network latency of Amazon's EBS. For applications that move around a lot of data, EBS is probably a godsend, as it offers high bandwidth. However, in CitySquares' case, we do a lot of small file transfers to and from our NFS server as well as our EBS volumes. So while there is a lot of bandwidth available to us, we can't really take advantage of it, because we have to contend with the latency and overhead of transferring many small files. Small files aren't our only issue: we also run our MySQL database off an EBS volume. Swapping to disk has always been a critical issue for databases, but the added overhead of network traffic can wreak havoc on your database load far more than normal disk swapping. You can think of the difference between local disk access and disk access over a network as the difference between a book on your bookcase and a book somewhere down the hall in storage room B. Clearly the second takes far longer to find, and that's what you have to work with if you want the peace of mind of persistent storage.
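A simple model shows why high bandwidth doesn't help with many small files: each file pays a fixed per-file overhead (latency plus protocol round trips) regardless of its size, and only the payload benefits from bandwidth. The function name and the numbers are hypothetical.

```php
<?php
// Simple model: every file pays a fixed overhead; only the bytes scale with bandwidth.
function transferTimeMs($files, $bytesPerFile, $overheadMsPerFile, $bytesPerMs)
{
    return $files * $overheadMsPerFile + ($files * $bytesPerFile) / $bytesPerMs;
}

// Hypothetical link: 2 ms overhead per file, 102400 bytes/ms (about 100 MB/s).
$manySmall = transferTimeMs(1000, 10240, 2, 102400);  // 1000 files of 10 KB
$oneLarge  = transferTimeMs(1, 10240000, 2, 102400);  // the same 10 MB as one file
echo "$manySmall ms vs $oneLarge ms\n"; // 2100 ms vs 102 ms
```

Both cases move the same bytes and spend the same 100 ms on bandwidth; the extra two seconds in the first case is pure per-file overhead, which is why a faster pipe doesn't fix it.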
The last and most important limitation for us at CitySquares was the lack of an all-powerful machine. The largest instance Amazon offers has just 15GB of RAM and 4 virtual CPUs. In a day and age when you can easily find machines with 64GB of RAM and 16 CPUs, you are definitely limited by Amazon. In our case, it would be much easier to just throw hardware at our database to scale it up, but the only thing at our disposal is a paltry 15GB of RAM. How can this be the biggest machine they offer? Instead of dividing one of those machines into quarters, just give me the whole thing. It seems ludicrous to me that the largest machine they offer is not much more powerful than the computer I'm using right now.
Long story short, just because you start using Amazon's AWS doesn't mean you can scale. Make sure your architecture is tolerant of higher latencies and can scale out across lots of little machines, because that's all you have to work with.
Running your own hardware Vs EC2 and RightScale — Part 2
by Justin Leider on September 16, 2008
This week I've been reminded of a very important lesson... No matter how abstracted you are from your hardware, you still inherently rely on its smooth and consistent operation.
This past week CitySquares' NFS server went down for the count and was completely unresponsive to any type of communication. In fact, the EC2 instance was so FUBAR that we couldn't even terminate it from our RightScale dashboard; it took a post on Amazon's EC2 forum to get it terminated. It turns out the physical hardware our instance was running on had suffered a catastrophic failure of some sort. Otherwise, at least so I'm told, server images are usually migrated automatically off machines running in a degraded state.
Needless to say, the very reasons we decided against running our own hardware have come back to plague us. Granted, we weren't responsible for replacing the hardware, but we were still affected by the troublesome machine, and not just slightly. Since we run a heavily modified Drupal CMS, our web servers depend on having a writable files directory. When the NFS server died, Apache just spun waiting for a response from the file system, and our web services ground to a halt waiting on a machine that was never going to respond... ever. Talk about a single point of failure! A non-critical component, serving mainly images and photos, managed to take down our entire production deployment.
This event has prompted us to move forward with a rewrite of Drupal's core file handling. The rewrite will automatically direct file uploads to a separate domain name, like csimg.com or something similar; Yahoo goes into more detail in their performance best practices. However, editing the Drupal core is generally frowned upon and heavily discouraged, since it usually conflicts with the upgrade path and makes the core much harder to maintain. While we haven't stayed out of the Drupal core entirely, the changes we have made are minor and only for performance improvements. I believe it is possible to stay out of the core file handling by hooking into it with the nodeapi, but that seems like more trouble than it's worth.
The idea behind the file handling rewrite is to serve our images and photos directly from our co-location facility, while keeping a local files directory on each EC2 instance for things that aren't user-submitted, such as CSS and JS aggregation caching and other simple cache-related items coming from the Drupal core. This rewrite will let us run one less EC2 instance, saving us some money, and will remove our dependence on a catastrophic single point of failure.
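The split described above boils down to a routing decision for each file path. Here is a minimal sketch of that decision as a standalone helper. It is not a Drupal API: the function name fileUrl() is hypothetical, csimg.com is just the example domain from this post, and the css/ and js/ prefixes and the local /files/ path are assumptions about where the aggregated caches live.

```php
<?php
// Hypothetical helper: decide where a file under the files directory is served from.
// User uploads go to the separate static domain; aggregated CSS/JS caches
// (assumed to live under css/ and js/) stay in each instance's local files directory.
function fileUrl($path, $staticHost = 'http://csimg.com', $localPrefixes = array('css/', 'js/'))
{
    $path = ltrim($path, '/');
    foreach ($localPrefixes as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return '/files/' . $path;
        }
    }
    return rtrim($staticHost, '/') . '/' . $path;
}

echo fileUrl('pictures/picture-42.jpg'), "\n"; // http://csimg.com/pictures/picture-42.jpg
echo fileUrl('/css/aggregated.css'), "\n";      // /files/css/aggregated.css
```

Serving uploads from a different hostname also keeps the main site's cookies off those requests, which is one of the reasons Yahoo's guidelines give for a separate static domain.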
For the time being, we have set up another NFS server, this time based on Amazon's new EBS product, which I spoke about in a previous post. One of the issues we had when the last NFS server went down was the loss of user-generated content. Once the instance went down, all the storage associated with it went down too. There was no way to recover from the loss; it was just gone. This is just one of the many problems you can run into with the cloud. On the pro side, you don't have to worry about owning your own hardware; on the con side, you can't recover from failures the way you can with your own hardware. This is a very distinct difference and should be seriously considered before dumping your current architecture for the cloud.