<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Derivante &#187; Web Technology</title>
	<atom:link href="http://www.derivante.com/tag/web-technology/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.derivante.com</link>
	<description>to obtain or receive from a source</description>
	<lastBuildDate>Mon, 26 Apr 2010 18:44:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Running your own hardware Vs. EC2 and RightScale</title>
		<link>http://www.derivante.com/2008/08/20/running-your-own-hardware-vs-ec2-and-rightscale/</link>
		<comments>http://www.derivante.com/2008/08/20/running-your-own-hardware-vs-ec2-and-rightscale/#comments</comments>
		<pubDate>Wed, 20 Aug 2008 20:13:52 +0000</pubDate>
		<dc:creator>Justin Leider</dc:creator>
				<category><![CDATA[Web Architecture]]></category>
		<category><![CDATA[Web Technology]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[citysquares]]></category>
		<category><![CDATA[Cloud Computing]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[Flexibility]]></category>
		<category><![CDATA[Gentoo]]></category>
		<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[rightscale]]></category>
		<category><![CDATA[s3]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Server Hardware]]></category>
		<category><![CDATA[Servers]]></category>
		<category><![CDATA[Site Architecture]]></category>
		<category><![CDATA[Xen]]></category>

		<guid isPermaLink="false">http://justinleider.wordpress.com/?p=21</guid>
		<description><![CDATA[A couple of weeks ago I began working with EC2 and RightScale in preparation for our big IT infrastructure changeover. I'll start by giving a brief overview of our hardware infrastructure. Currently we're running the CitySquares website on our own (&#8230;)</p><p><a href="http://www.derivante.com/2008/08/20/running-your-own-hardware-vs-ec2-and-rightscale/">Read the rest of this entry &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p style="margin-bottom:0;">A couple of weeks ago I began working with <a title="Amazon's Elastic Compute Cloud" href="http://aws.amazon.com/ec2" target="_blank">EC2</a> and <a title="RightScale" href="http://rightscale.com" target="_blank">RightScale</a> in preparation for our big IT infrastructure changeover. I'll start by giving a brief overview of our hardware infrastructure. Currently we're running the <a title="CitySquares Online -- Hyper Local Neighborhood Search" href="http://citysquares.com" target="_blank">CitySquares</a> website on our own hardware in a <a title="Somerville Businesses" href="http://ma.citysquares.com/somerville" target="_blank">Somerville</a> co-location facility not too far from our headquarters in Boston's trendy <a title="Boston's trendy South End neighborhood businesses" href="http://ma.citysquares.com/boston/south-end" target="_blank">South End</a> neighborhood.</p>
<p style="margin-bottom:0;">From the very beginning our contract IT guy set us up with an extremely robust and flexible IT infrastructure. It consists of a few machines running <a title="Xen Hypervisor" href="http://www.xen.org/" target="_blank">Xen</a> hypervisors with <a title="Gentoo Linux" href="http://www.gentoo.org/" target="_blank">Gentoo</a> as the main host OS. Running Gentoo allows us to be as efficient as possible by compiling and optimizing only the things we need. While this is a good step, it is Xen that really makes the big difference. It lets us shift resources around as we see fit: more memory here, more virtual CPUs there, all on the fly. For a startup or any company with limited resources this is rather essential. You never know where you are going to need to allocate resources in the months to come.</p>
<p style="margin-bottom:0;">While this is all well and good, we are still limited when it comes to scaling with increasing traffic or adding resource-intensive features. We have a set amount of available hardware, and adding more is an expensive upfront capital investment. Not only that, but to really take advantage of Xen and use it to its full potential we were faced with an expensive first option: the purchase of a <a title="SAN Storage Area Network" href="http://en.wikipedia.org/wiki/Storage_area_network" target="_blank">SAN</a> and more servers. For those in the industry I don't think I need to mention that these get expensive in a hurry. This would have been a huge upfront cost for us, one we didn't want to budget for. The second option, which is the one we eventually went with, was to drop our current hardware solution and take the plunge into cloud computing with Amazon's EC2.</p>
<p style="margin-bottom:0;">Here I am now, a couple of weeks into the switch with a lot of lessons learned. There are definitely pros and cons for each platform, whether you go with EC2 or roll your own architecture. Before I get into the details I want to make clear that there are many factors involved in choosing a technology platform. I am only going to scratch the surface, touching upon the major pros and cons as I see them, with the best interests of CitySquares in mind.</p>
<p style="margin-bottom:0;">Let me begin with the pros for running your own hardware:</p>
<ul>
<li>
<p style="margin-bottom:0;">The biggest pro is most definitely persistence across reboots. I cannot stress the importance of this one enough. You really take for granted the ability to edit a file and expect it to be there the next time the machine is restarted.</p>
<ul>
<li>
<p style="margin-bottom:0;">You only need to configure the software once. Once it's running you don't really care what you did to make it work. It just works, every time you reboot.</p>
</li>
<li>UPDATE 8/21/08: <a title="Amazon releases the much anticipated Elastic Block Store" href="http://justinleider.com/2008/08/21/amazons-ebs-elastic-block-store/" target="_blank">Amazon releases persistent storage</a>.</li>
</ul>
</li>
<li>
<p style="margin-bottom:0;">Complete and utter control over everything that is running. This extends from the OS to the amount of RAM, CPU specs, hard drive specs, NICs, etc. Whether you have an economy or a performance server is entirely up to you.</p>
</li>
<li>
<p style="margin-bottom:0;">Rather stable and unchanging architecture. Server host keys stay the same, and the same number of servers is running today as was running yesterday and will be running tomorrow.</p>
</li>
<li>
<p style="margin-bottom:0;">Reboot times. For those times when 	something is just AFU you can hit the reset button and be back up 	and running in a few minutes.</p>
</li>
<li>
<p style="margin-bottom:0;">You can physically touch it... It's not just in the cloud somewhere.</p>
</li>
</ul>
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">Some cons for running your own hardware:</p>
<ul>
<li>
<p style="margin-bottom:0;">Companies with limited resources 	usually end up with architectures that exhibit single points of 	failure.</p>
<ul>
<li>
<p style="margin-bottom:0;">As an aside, you can be plagued 		by hardware failures at any time. This usually is accompanied by 		angry emails, texts and calls at 3am on Saturday morning.</p>
</li>
</ul>
</li>
<li>
<p style="margin-bottom:0;">Limited scalability options. For a rapidly growing website, the couple of weeks it takes to order and install new hardware can be detrimental to your potential traffic and revenue stream.</p>
</li>
<li>
<p style="margin-bottom:0;">Management of physical pieces of hardware. It's a royal pain to have to go to a co-location facility to upgrade or fix anything that might need maintenance, not to mention the potential downtime.</p>
<ul>
<li>
<p style="margin-bottom:0;">Also, there are many hidden costs 		associated with IT maintenance.</p>
</li>
</ul>
</li>
<li>
<p style="margin-bottom:0;">Up front capital expenditures can 	be quite costly. This is especially true from a cash flow 	perspective.</p>
</li>
<li>
<p style="margin-bottom:0;">Servers and other supporting 	hardware are rendered obsolete every few years requiring the 	purchase of new equipment.</p>
</li>
</ul>
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">These pros and cons for running your own hardware are pretty straightforward. Some people might mention managed hosting solutions, which would largely eliminate the cons related to server maintenance and hardware failures. However, this added service comes with an added price tag for the hosting. Whether it is right for you or your company is something to look into. We decided to skip this intermediate solution and go straight to the latest and greatest: cloud computing. To be specific, we sided with Amazon's EC2 (Elastic Compute Cloud), using RightScale as our management tool.</p>
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">Some of the pros for using EC2 in conjunction with the RightScale dashboard are as follows:</p>
<ul>
<li>
<p style="margin-bottom:0;">Near infinite resources (Server 	instances, Amazon's S3 Storage, etc) available nearly 	instantaneously. No more Slashdot DoS attacks if everything is 	properly configured and set to introduce more servers automatically. 	(RightScale Benefit)</p>
</li>
<li>
<p style="margin-bottom:0;">No upfront costs; everything is usage based. If in the middle of the night you are only utilizing one server, that's all you pay for. Likewise, if during peak hours you're running twenty servers, you pay for those twenty servers. (Amazon Benefit, RightScale is a monthly service)</p>
</li>
<li>
<p style="margin-bottom:0;">No hardware to think about. If fifty servers go down at Amazon we won't even know about it. No more angry calls at 3am. (Amazon Benefit)</p>
</li>
<li>
<p style="margin-bottom:0;">Multiple availability zones. This 	allows us to run our master database in one zone which is completely 	separate from our slave database. So if there is an actual fire or 	power outage in one zone the others will theoretically be 	unaffected. The single points of failure mentioned before are a 	thing of the past and this is just one example. (Amazon Benefit)</p>
</li>
<li>
<p style="margin-bottom:0;">Ability to clone whole deployments 	to create testing and development environments that exactly mirror 	the current production when you need them. (RightScale Benefit)</p>
</li>
<li>
<p style="margin-bottom:0;">Security updates are taken care of 	for the most part. RightScale provides base server images which are 	customized upon boot with the latest software updates. (RightScale 	Benefit)</p>
</li>
<li>
<p style="margin-bottom:0;">Monitoring and alerting tools are 	very good and highly customizable. (RightScale Benefit)</p>
</li>
</ul>
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">Some of the cons for using EC2 and RightScale:</p>
<ul>
<li>
<p style="margin-bottom:0;">No persistence after reboot. I 	can't stress this one enough! All local changes will be wiped and 	you'll start with a blank slate!</p>
<ul>
<li>
<p style="margin-bottom:0;">All user contributed changes must 		be backed up to a persistent storage medium or they will be lost! 		We back up incrementally every 15 minutes with a full backup every 		night.</p>
</li>
<li>UPDATE 8/21/08: <a title="Amazon releases the much anticipated Elastic Block Store" href="http://justinleider.com/2008/08/21/amazons-ebs-elastic-block-store/" target="_blank">Amazon releases persistent storage</a>.</li>
</ul>
</li>
<li>
<p style="margin-bottom:0;">Writing scripts to configure everything upon boot is a time-consuming and tedious process requiring a lot of trial and error.</p>
</li>
<li>
<p style="margin-bottom:0;">Every reboot takes approximately 10-20 minutes depending on the number and complexity of packages installed on boot, which makes the previous bullet point that much more painful.</p>
</li>
<li>
<p style="margin-bottom:0;">A few of the pre-configured scripts are written quite well. The one for MySQL is as good as they get: you upload a config file complete with special tags for easy, on-the-fly regular expression customization. The Apache scripts, on the other hand, are about as bad as they get. Everything must be configured after the fact.</p>
<ul>
<li>
<p style="margin-bottom:0;">With Apache, however, you'll be writing regular expressions to match other regular expressions. Needless to say, this is a royal pain, and you usually end up with unreadable gibberish.</p>
</li>
</ul>
</li>
</ul>
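<p style="margin-bottom:0;">To make the "config file with special tags" idea concrete, here is a minimal sketch in Python of boot-time tag substitution. The <code>@@NAME@@</code> tag syntax, the function name and the sample values are all assumptions made up for illustration; the actual tag format RightScale's MySQL scripts use is not shown in this post.</p>

```python
import re

# Hypothetical tag syntax: @@NAME@@. The real RightScale tag format is not
# given in this post, so treat this purely as an illustration.
TAG = re.compile(r"@@([A-Z_]+)@@")


def render_config(template, values):
    """Replace every @@NAME@@ tag with its value; fail loudly on unknown tags."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for tag {name}")
        return str(values[name])

    return TAG.sub(substitute, template)


# A made-up fragment of a my.cnf template with two tags.
template = (
    "[mysqld]\n"
    "server-id = @@SERVER_ID@@\n"
    "max_connections = @@MAX_CONNECTIONS@@\n"
)

rendered = render_config(template, {"SERVER_ID": 2, "MAX_CONNECTIONS": 500})
print(rendered)
```

<p style="margin-bottom:0;">Failing on an unknown tag, rather than silently leaving it in the file, is a deliberate choice here: when every boot takes 10-20 minutes, you want a bad template to surface immediately.</p>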
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">So there you have it; take it as you wish. For CitySquares, EC2 and RightScale were the best option. Together they allow us to scale nearly effortlessly once configured. The cloud is also a much cheaper option up front, whereas owning your own hardware is generally cheaper in the long run. We did trade away a lot of the pros of owning our own hardware to get the scalability and hardware abstraction of EC2. It was a tough decision to switch away from our current architecture, but in the end it will most likely be the best decision we've made. The flexibility and scalability of the EC2 and RightScale platform are by far the biggest advantages of switching, and in the end that's what <a title="CitySquares Online -- Hyper Local Neighborhood Search" href="http://citysquares.com" target="_blank">CitySquares</a> needs.</p>
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2008/08/20/running-your-own-hardware-vs-ec2-and-rightscale/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Part 2: An Architecture Overview &#8212; Apache, MySQL, Memcached, SQLite</title>
		<link>http://www.derivante.com/2008/07/24/an-architecture-overview-apache-mysql-memcached-sqlite/</link>
		<comments>http://www.derivante.com/2008/07/24/an-architecture-overview-apache-mysql-memcached-sqlite/#comments</comments>
		<pubDate>Thu, 24 Jul 2008 19:56:41 +0000</pubDate>
		<dc:creator>Justin Leider</dc:creator>
				<category><![CDATA[Web Architecture]]></category>
		<category><![CDATA[Web Technology]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[citysquares]]></category>
		<category><![CDATA[horizontal architecture]]></category>
		<category><![CDATA[horizontal database]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[sqlite]]></category>
		<category><![CDATA[xcache]]></category>

		<guid isPermaLink="false">http://justinleider.wordpress.com/?p=11</guid>
		<description><![CDATA[In my last post I mentioned the numerous technologies which were on tap for the upcoming version of CitySquares. This installment will continue to define an overview of the underlying architecture and begin to dig a little deeper into the (&#8230;)</p><p><a href="http://www.derivante.com/2008/07/24/an-architecture-overview-apache-mysql-memcached-sqlite/">Read the rest of this entry &#187;</a></p>]]></description>
			<content:encoded><![CDATA[
<p style="margin-bottom:0;">In my last post I mentioned the numerous technologies which were on tap for the upcoming version of <a title="CitySquares Online -- Hyper Local Neighborhood Search" href="http://citysquares.com" target="_blank">CitySquares</a>. This installment continues the overview of the underlying architecture and begins to dig a little deeper into the actual implementation of the technologies. The focus of this new architecture is creating a much more stable and scalable platform for us to work with. Before I get into the details, I've provided a graphic representation of how the architecture will be laid out.</p>
<p style="margin-bottom:0;">
<div id="attachment_12" class="wp-caption aligncenter" style="width: 430px"><a href="http://justinleider.files.wordpress.com/2008/07/architecture-overview.jpg"><img class="size-full wp-image-12" src="http://justinleider.files.wordpress.com/2008/07/architecture-overview.jpg" alt="A visual representation of a horizontal web architecture." width="420" height="300" /></a><p class="wp-caption-text">A visual representation of a horizontal web architecture.</p></div>
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">Bear with me as I explain the workflow behind this graphic, as it is not 100% clear from the visual representation. First off, I run <a title="Ubuntu Linux" href="http://www.ubuntu.com/" target="_blank">Ubuntu Linux</a>, which is great for just about everything I need except creating any sort of graphics, so I apologize in advance for the lackluster graphic. As you can see, there are a few different layers: users, <a title="HA Proxy -- Load Balancing " href="http://haproxy.1wt.eu/" target="_blank">HA Proxy</a>, Apache, <a title="High performance caching system" href="http://www.danga.com/memcached/" target="_blank">Memcached</a>, <a title="SQLite -- A small fast file based database" href="http://www.sqlite.org/" target="_blank">SQLite</a> and finally MySQL, labeled as databases.</p>
<p style="margin-bottom:0;">First and foremost are our beloved users, without whom we would have no need for a website. Starting from the beginning, a user requests a page from CitySquares, and that request passes through one of two HA Proxy servers. The sole purpose of these two machines is to load balance the incoming requests among all our Apache web servers and to serve as a failsafe for one another. Once the user's request has been accepted and forwarded along to Apache, we actually begin to process the request.</p>
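<p style="margin-bottom:0;">As a rough illustration of what "load balance the incoming requests among all our Apache web servers" means, here is a small Python sketch of round-robin selection that skips servers marked as down. This is not how HA Proxy is implemented or configured; the class, its method names and the backend names are hypothetical, and a real balancer adds health checks, weights and connection handling.</p>

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any that are marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call so we never loop forever.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


# Hypothetical Apache server names.
balancer = RoundRobinBalancer(["apache1", "apache2", "apache3"])
first_four = [balancer.next_backend() for _ in range(4)]

balancer.mark_down("apache2")  # simulate a failed web server
after_failure = [balancer.next_backend() for _ in range(2)]
print(first_four, after_failure)
```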
<p style="margin-bottom:0;">The Apache servers run PHP with the XCache module. The PHP part is fairly straightforward and outside the scope of this post, so I will skip it. XCache, however, works in conjunction with PHP as an enhancement: it is an opcode optimizer and cache. It removes the compilation time of PHP scripts by caching their compiled and optimized state directly in the shared memory of the Apache server. This compiled version can speed up page generation by up to 500%, improving overall response time and reducing server load.</p>
<p style="margin-bottom:0;">Just as with all dynamic websites, most if not all of the actual data is stored in databases. Gone are the days of flat files with near zero processing required. Databases are the new workhorses of the web world and as such usually become the bottleneck of the overall system. CitySquares is in a somewhat unique position: nearly all our page loads involve quite a bit of location- and distance-based processing, and nearly all of this is done in our MySQL database. So while our Apache servers sit idle waiting for responses to their queries, the DB is performing the brunt of the work, calculating distances between objects and the like.</p>
<p style="margin-bottom:0;">We can reduce this bottleneck in a couple of different ways, the first of which is object caching. We will use Memcached to cache objects returned from the database. Say, for example, we know the distance between two businesses. We know with a fair amount of certainty that those two businesses are going to be in the same place they were an hour ago, just as they were a week ago and as they will be a day from now. So we can cache this information with an expiration time of a couple of days, saving ourselves the expense of calculating the distance between them on every page load. Of course, if a user comes by and changes the location of one of these businesses, we can expire the object in cache and replace it with a newly calculated object straight from the database on the subsequent page load. These expensive queries require large table scans and mathematical calculations on every row. Caching their results frees up the database and allows it to do what it does best: store and retrieve data.</p>
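<p style="margin-bottom:0;">The caching pattern described above can be sketched in a few lines of Python. To keep the example self-contained, a tiny in-process <code>TTLCache</code> class stands in for Memcached, and <code>distance_from_db</code> is a placeholder for the expensive MySQL query; both names, the key format and the returned distance are assumptions for illustration only, not our production code.</p>

```python
import time


class TTLCache:
    """Tiny in-process stand-in for Memcached: get/set with expiry, plus delete."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: behave as a cache miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._store.pop(key, None)


TWO_DAYS = 2 * 24 * 60 * 60
db_calls = 0


def distance_from_db(business_a, business_b):
    """Placeholder for the expensive MySQL distance query (hypothetical)."""
    global db_calls
    db_calls += 1
    return 1.25  # made-up distance


def distance_key(business_a, business_b):
    # Sort the ids so (a, b) and (b, a) share one cache entry.
    low, high = sorted((business_a, business_b))
    return f"distance:{low}:{high}"


def cached_distance(cache, business_a, business_b):
    key = distance_key(business_a, business_b)
    value = cache.get(key)
    if value is None:  # miss or expired: ask the database, then cache the result
        value = distance_from_db(business_a, business_b)
        cache.set(key, value, TWO_DAYS)
    return value


def invalidate_pair(cache, business_a, business_b):
    """Call when one of the two businesses changes location."""
    cache.delete(distance_key(business_a, business_b))


cache = TTLCache()
cached_distance(cache, 17, 42)   # miss: hits the database
cached_distance(cache, 42, 17)   # hit: served from cache
invalidate_pair(cache, 17, 42)   # a user moved one of the businesses
cached_distance(cache, 17, 42)   # miss again: recalculated
print(db_calls)
```

<p style="margin-bottom:0;">In a real system, moving a business would need to expire every cached pair that involves it, not just one; the sketch only invalidates a single pair to keep the idea visible.</p>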
<p style="margin-bottom:0;">In the case where we can't find the data in Memcached, either because it doesn't yet exist or because it has expired, we turn to our databases. We must first query a SQLite instance, which is the gatekeeper between Apache and the numerous databases we have. By having a separate lookup table we can divide and parcel out our data sets on a table-by-table basis, even down to an entry-by-entry basis. Depending on the type of data we are requesting, SQLite will provide us with the location of the database to query for our data.</p>
<p style="margin-bottom:0;">One could argue that this just adds another layer of latency, and they would be correct. However, as scalability becomes an issue you will find that adding database replication generally results in diminishing returns. As new servers are brought online, the overhead of replicating writes across all the replicated servers starts to choke the system and creates its own bottleneck. On the other hand, with a lookup table and a horizontal database architecture we don't have to worry about database replication nearly as much. You can just as easily divide your data sets into different databases. How you go about this varies greatly depending on your data. For CitySquares the solution turns out to be rather simple. Everything we do is location specific, so it only makes sense that each data set is only as big as its parent city. Theoretically every city and all the data related to that city could reside in its own database. As you can probably guess, we are only performance limited by the biggest cities, <a title="Manhattan on CitySquares" href="http://ny.citysquares.com/manhattan" target="_blank">Manhattan</a>, <a title="Brooklyn on CitySquares" href="http://ny.citysquares.com/brooklyn" target="_blank">Brooklyn</a>, etc. In these few cases we can always fall back to bigger and better servers and/or replication if necessary.</p>
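<p style="margin-bottom:0;">Here is a minimal sketch of the lookup-table idea using Python's built-in <code>sqlite3</code> module: a small SQLite table maps each city to the database that holds its data. The table name, column names and host names are invented for illustration and are not our actual schema.</p>

```python
import sqlite3

# An in-memory SQLite database stands in for the lookup instance.
lookup = sqlite3.connect(":memory:")
lookup.execute(
    "CREATE TABLE city_shards (city TEXT PRIMARY KEY, db_host TEXT NOT NULL)"
)
# Hypothetical city-to-database assignments.
lookup.executemany(
    "INSERT INTO city_shards (city, db_host) VALUES (?, ?)",
    [
        ("manhattan", "db-ny-1.example.internal"),
        ("brooklyn", "db-ny-2.example.internal"),
        ("somerville", "db-ma-1.example.internal"),
    ],
)
lookup.commit()


def shard_for_city(conn, city):
    """Return the database host that stores all data for the given city."""
    row = conn.execute(
        "SELECT db_host FROM city_shards WHERE city = ?", (city.lower(),)
    ).fetchone()
    if row is None:
        raise LookupError(f"no shard registered for city {city!r}")
    return row[0]


print(shard_for_city(lookup, "Brooklyn"))
```

<p style="margin-bottom:0;">Moving a city to a bigger server then becomes a one-row update in the lookup table rather than a change to application code.</p>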
<p style="margin-bottom:0;">Just as our database has become a bottleneck in our current site, so has our search engine, just to a lesser extent. We can take the lessons learned from our horizontal database architecture and apply them to the search engine architecture as well. By dividing our data sets into logical partitions we can keep our data from getting too large and unwieldy, and with these smaller data sets we can reduce or remove altogether the overhead associated with replicating data over multiple machines.</p>
<p style="margin-bottom:0;">While this solution sounds great, it won't be worth the effort if a programmer has to check Memcached, then SQLite and then finally MySQL for every query. For this to be feasible from a programmer's standpoint, the programmer should never have to think about the underlying architecture. I will discuss this in greater detail in the upcoming installments. Stay tuned.</p>
<p style="margin-bottom:0;">
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2008/07/24/an-architecture-overview-apache-mysql-memcached-sqlite/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
