top-image

OLDER ARTICLES

After working on the migration this week and moving CitySquares.com over to a new environment we ended the week with a load test from Soasta. Starting at several hundred concurrent users to a cap of 2500 simulated users on at once we were able to really pick apart the infrastructure. Couple this with our own internal testing and tweaking with ab over the past few days I've come to the conclusion that Drupal does not scale so well when you're talking about traffic in the tens of millions of page views a month. It's just not possible to get requests in and out fast enough without continually adding more boxes to the front end. I tried every setup from mod_php to php-fcgi, worker, prefork, eaccelerator, xcache, and every configuration therein.

I realized that I was over-thinking the problem so I did a little test to just see how much overhead Drupal is adding. I can execute ten times the number of queries natively through PHP and mysql_connect than if I were to execute those queries from within Drupal with just bootstrap included. Apache answered requests and processed these several million round trips to the database much faster. I understand that as a framework there is a lot being loaded, but a whole decimal point off when dealing with requests/sec is a huge overhead.

The better part of the last few days was spent gutting the inner workings of Drupal and removing as much of it as possible to lover overhead, reduce response time, and hopefully let us scale a little further on the hardware that we have. As it is, traffic is growing a steady 40-50% month over month for a while now and we are already dealing with several million unique visitors a month. That number is growing rapidly with no ceiling in the near future and it's unfortunate that at this point the only way to make this work is to break the upgrade path of Drupal.

Going through the code base, it's easy to see where so much of the bloat comes from. With comments like: "TODO: remove this when we require at least PHP 4.4.0" there is a lot of backwards compatibility that is required to support such a large community. I am seeing more and more Frameworks replace legacy CMS systems that provide a bare bone set of tools. The agility, performance, and maintenance of a ground up site using frameworks like Symfony or Rails leads to longer development cycles and increased costs but there is a point where you stop putting a round peg in a square hole and realize that the cost of maintenance and development down the road will far exceed the up front investment.

I'm glad Drupal has been able to take us this far and I'm very grateful for the community behind it. It's a pleasant surprise that everything is still working and hopefully we can get a few more miles out of it before we are ready to do the next iteration of the site.

I have been doing a lot of auditing and clean up of database performance the last few days. We are currently in the middle of a migration and with hardware infrastructure in place it is time to go back and see what changes we can do on the code and database side of things to help bring the site up to optimal performance and lower query times. I found this gem in the MySQL Forge site; which turned out to be a great resource for MySQL tidbits.

/*
SQL script to grab the worst performing indexes
in the whole server
*/
SELECT
t.TABLE_SCHEMA AS `db`
, t.TABLE_NAME AS `table`
, s.INDEX_NAME AS `inde name`
, s.COLUMN_NAME AS `field name`
, s.SEQ_IN_INDEX `seq in index`
, s2.max_columns AS `# cols`
, s.CARDINALITY AS `card`
, t.TABLE_ROWS AS `est rows`
, ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS `sel %`
FROM INFORMATION_SCHEMA.STATISTICS s
INNER JOIN INFORMATION_SCHEMA.TABLES t
ON s.TABLE_SCHEMA = t.TABLE_SCHEMA
AND s.TABLE_NAME = t.TABLE_NAME
INNER JOIN (
SELECT
TABLE_SCHEMA
, TABLE_NAME
, INDEX_NAME
, MAX(SEQ_IN_INDEX) AS max_columns
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA != 'mysql'
GROUP BY TABLE_SCHEMA, TABLE_NAME, INDEX_NAME
) AS s2
ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA
AND s.TABLE_NAME = s2.TABLE_NAME
AND s.INDEX_NAME = s2.INDEX_NAME
WHERE t.TABLE_SCHEMA != 'mysql'                         /* Filter out the mysql system DB */
AND t.TABLE_ROWS > 10                                   /* Only tables with some rows */
AND s.CARDINALITY IS NOT NULL                           /* Need at least one non-NULL value in the field */
AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00 /* Selectivity < 1.0 b/c unique indexes are perfect anyway */
ORDER BY `sel %`, s.TABLE_SCHEMA, s.TABLE_NAME          /* Switch to `sel %` DESC for best non-unique indexes */
LIMIT 10;

To audit just one database if you are running on a server with several different databases, just adjust the where clause to WHERE t.TABLE_SCHEMA = 'mytable'. This would have been very useful when working with the cluster to recover memory/space from indexes that aren't being used and to optimize queries to hit indexes that are meaningful.

There isn't too much out there for easy reliable mysql load balancing. I recently built a master/master mysqld setup and needed automatic roll over on the software side. After going through the options I ended up going with MySQL Proxy 0.60 for low overhead, ease of setup, and ease of maintenance. I installed the proxy on each of my web servers (already load balanced through hardware) and configured the code to just use the local proxy. A quick start for RHEL systems, you can get it up and running in just a few simple steps:

  1. Download the latest version at http://dev.mysql.com/downloads/mysql-proxy/.
  2. yum install glib2-devel.x86_64 ncurses-devel.x86_64 libevent-devel.x86_64 mysql-devel.x86_64
  3. ./configure -without-lua (for a straight proxy, you will not need LUA and this will save you having to install additional dependencies).
  4. make && make install

Easy enough to get it installed in the system, but I noticed that the default parameters and as with most software is rather wide open so be sure to lock down the ip addresses you are listening on.

mysql-proxy --proxy-backend-addresses=192.168.1.101:3306 --proxy-backend-addresses=192.168.1.102:3306 --admin-address=127.0.0.1:4041 --proxy-address=127.0.0.1:4040 --proxy-skip-profiling --daemon

The admin address and proxy address will be accessible by anyone hitting your server unless you lock it down to your local host. Now configure your web servers to use localhost:4040 to connect to MySQL and all should be working fine. You can find additional configuration parameters, admin commands, and LUA scripts at http://forge.mysql.com/wiki/MySQL_Proxy.

Limitations of EC2 as a web platform:

  • Price- An m1.xlarge instance will run you ~$600 with data transfer costs. Managed hosting solutions run cheaper especially if you plan on purchasing in bulk. The grid is designed for on-demand computation and not as a cost efficient web services.
  • Configuration- There are a limited number of options and you will not be able to tailor the hardware to your application. Databases over 10GB of size will have performance issues since that is the memory cap.
  • Network storage- The primary disks offers limited storage, additional volumes will need to be attached across the network and at an additional cost.
  • Software - No hardware based solutions for load balancing or custom application servers. The model is software driven so all needs must be met with a software solution. In a managed hosting or collocation solution you will at least have the option of adding additional hardware and having a private network. No dedicated switching, routers, firewall, or load balancers.



EC2 might be right for you if:

  • Distribution Awareness- Your application was designed to scale horizontally from the get-go and you can take advantage of grid computing.
  • Research and Development- EC2 & Rightscale will allow for you to bring up new servers, test configuration, and scale quickly. If you are not sure what your hardware demands will be or the scope of the project it will allow for some flexibility to get this right before committing to rather lengthy contracts with other hosting options.
  • Disaster Recovery- If you need an off-site mirror for your site that you can keep dormant and activate as needed.



Over the past several months I have been doing extensive development using Amazon's EC2 as my hardware infrastructure. I was tasked with taking CitySquares.com from a New England area hyper-local search and business directory to a national site in a few months. Due to the memory limitations of EC2 instances, m1.xlarge only providing 15GB the jump from a 15GB database to anything larger becomes very costly. When everything was able to be contained in two servers in a master/slave environment we were able to provide redundancy, performance, and easy management when working with the database. The final estimations of the national roll out would put our core data at 50GB. Far too large for any one EC2 instance. Going to disk was not an option as everything works off of EBS attached storage and that any disk writes means traveling over the network. An additional overhead which degrades performance even more when switching off RAM. Then, there was also the nature of the data, which means that at any page load, any piece of data could be requested.

With OS overhead, index storage, and ndb overhead, each x1.mlarge instance gave about 12GB of usable storage. Include replication, and it comes down to 6GB of storage per node. To store a ~50GB database that I had planned on requires 8 storage nodes, two management servers, and two mysqld api servers. This is where it became important to understand the advantages of vertical scaling versus horizontal scaling. EC2 provided fast horizontal scaling and configuration. Servers can be launched on demand and their configurations scripted. While I appreciated that aspect of cloud computing and being able to bring that many servers up and configure each one relatively quickly I really just needed two decent boxes with 64GB of RAM in each and a master/slave setup. The operations costs for the cluster was $6000/month, a hefty bill considering I could buy all the hardware needed to run the cluster in just a few months of paying for EC2.

Twelve servers later and a working cluster we were able to successfully roll out our MySQL Cluster with minimal performance loss. A lot of it was due to tweaking every query top to bottom. The site runs on a Drupal core, which meant a lot of the queries were not designed with distribution awareness both from code in house and the core. This was another added growing pain since the network overhead of running 12 servers on shared resources, with mediocre latency, and throughput limitations was amplifying flaws in the database design and every poorly designed query and join degraded performance significantly.

To give credit where credit is due, EC2 did allow us to scale the site up rather quickly. We were able to test server configurations, new applications, and have easy management. It would not have been possible for us to push out the data, handle influx of new traffic, and expand as fast as we had without it. Long term however, it made absolutely no sense that once we were finished scaling up, to stay on EC2. It is a great platform for start-ups to be able to configure and launch servers for their service or application and grow rapidly.

As a database platform, until EC2 offers more configuration options in it's hardware and the ability to increase memory, the cap of 15GB will make EC2 problematic for any database that plans on growing past that. It is important to understand your application and database needs before considering EC2. It is no surprise to me that even with the RightScale interface, and easy management of EC2 web sites are reluctant to switch off their own hardware or managed hosting.

Both Sun and Continuent are pursuing MySQL Clustering on cloud computing. As of this post, Continuent for EC2 was still in closed beta and testing and Sun is doing their own research into offering more support for database clusters on compute clouds. Maybe in the future this can be something to revisit, but it would require more ndb configuration options on the network layer to cope with shared bandwidth and additional hardware configurations by Amazon (someday).

The 6.4 release which is in beta now offers new features which would make MySQL Clustering more attractive on cloud computing architecture:

  • Ability to add nodes and node groups online. This will allow the database to scale up without taking the cluster down.
  • Data node multithreading support. The m1.xlarge instance comes with 4 cores.

If you are setting up a MySQL cluster, the following resources will help get you up and running quickly:

I've been asked this and similar questions quite a bit lately. But before I delve into the answer to this I want to lay the foundation and ask you a question. This one question should play a large part in your final assessment to go with EC2 or not. The question you should ask yourself is:

How quickly do you actually need to scale either up or down?

The answer to this will likely influence the correct solution to your problems. The following bullet point list is how I classify levels of scalability, each one comes with its own pros and cons but generally the quicker you need something the more expensive it is going to be.

  • Immediate - within minutes - EC2 or other cloud computing networks
  • Fast - within days to a week - Managed Hosting, Rackspace, The Planet, etc
  • Average - within weeks to a month - Own your own hardware, Dell, HP, IBM, etc
  • Corporate - within months/years - Good Luck

With this in mind, everyone hears the hype of EC2, with its scalability, fully managed hardware and virtualization but there really aren't that many people out there describing their experiences with it. When we made the decision to go with EC2 we did our research and due diligence before making the switch. There wasn't much to go on but the few articles and blog posts we did read were all positive. I guess we all got caught up in the hype here as well.

Even after all our research it turns out that going with EC2 was one of the poorer IT decisions we have made. EC2 has turned out to be more expensive, more difficult to implement and with poorer performance than we had ever expected even with our worst case estimations. To top it all off, we didn't fully utilize the benefits of going with EC2 which was immediate scalability. Our traffic is relatively predictable and grows or shrinks in manageable percentages and can be scaled up within days instead of minutes. We never have any massive spikes in our traffic either up or down. Even if we did have spikes we are limited by our MySQL cluster.

While we had to rethink a lot of our architecture to create a more horizontal platform instead of the traditional vertical scaling, MySQL was by far our biggest bottleneck. The source of the problem is rooted in Amazon's preset machine size. While they have done an adequate job of offering different types of instances with more memory in one line and more computational power in the other you are still limited to what they are offering. With the large database we have and the latencies between the instances and their permanent storage we were forced to keep as much of our database cached in RAM. Now this shouldn't have been too big a deal. Just get a machine with a ton of RAM. Well, unfortunately Amazon's biggest instance only offered us a maximum of 15GB. Needless to say this was not sufficient and forced us to adopt a cluster solution. This in and of itself is not ideal especially when you should be able to run off a single box with 32GB of RAM and access to fast local disks. However, it took us twelve (12) m1.xlarge instances to reach the level of performance and availability we desired. Not to mention the network IO latency between node and disk storage and node to node adding insult to injury.

While the speed and size of the cluster was not desirable, it worked. However, we had to completely forfeit any sort of scalability to achieve a working database. To my knowledge there is no way to quickly and easily boot up more instances of MySQL to supplement a live cluster. In order for us to add more capacity we would have to perform a rolling reboot of every machine in the cluster. Its unfortunate that databases were not designed with EC2 in mind.

However, there are companies who are trying to tap into this pain point. We were looking very intently at a company called Continuent who produces a MySQL cluster monitoring and management tool. Unfortunately, as of Jan 2009 the product was still in private beta and was unavailable to us. This tool would have allowed us to add nodes to the cluster on the fly without having to take it down in the process. Although, even then with this extra tool, which wasn't cheap, you still couldn't scale down the cluster without taking it off-line. As far as I am concerned, if you are already using the largest instance available to you (an m1.xlarge or c1.xlarge), there is no way to vertically scale up a database with EC2. Instead you are forced into a less than ideal environment for hosting a horizontal architecture which could have serious consequences for your code base and SQL queries.

To be honest, EC2 offers a lot of benefits that are hard to come by with other solutions. EC2 is great for companies doing lots of non-real-time activities such as batch and queued processing. Companies who have a small database that can be cached in RAM and replicated easily will also benefit from EC2, just boot up a bunch of instances and go to town. However, the bottom line is if you have fairly consistent usage patterns and your applications are performance sensitive then there are much faster and more cost effective ways of abstracting your hardware requirements. We at citysquares are in the process of moving off of EC2 and onto a managed hosting platform. We still enjoy the benefits of leased hardware like we had with EC2 and the ability to quickly add new hardware. Granted, more servers aren't available to us at the drop of a hat but a couple days lead time to get another box up and running is more than sufficient for us. Not only that but we also have a whole team of IT people working with us to help alleviate our burden of supporting the entire hardware/software stack. We can now focus on what we do best which is our application.

Keep in mind that there is no concrete answer as to whether EC2 or cloud computing in general will work for you or not. You need to determine if the capacity and latencies of the pre-determined instance sizes will meet your growing infrastructure needs. For us the bitter answer was a resounding no. We were able to spec out a solution in a fully managed hosting environment for about half the monthly cost of EC2 while increasing the performance of our application significantly.

So, is Amazon's EC2 right for you?

Page 4 of 7:« First« 1 2 3 4 5 6 7 »
bottom-img