<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Derivante &#187; architecture</title>
	<atom:link href="http://www.derivante.com/tag/architecture/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.derivante.com</link>
	<description>to obtain or receive from a source</description>
	<lastBuildDate>Mon, 26 Apr 2010 18:44:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>Extensible PHP Caching Library</title>
		<link>http://www.derivante.com/2010/04/23/extensible-php-caching-library/</link>
		<comments>http://www.derivante.com/2010/04/23/extensible-php-caching-library/#comments</comments>
		<pubDate>Fri, 23 Apr 2010 17:31:43 +0000</pubDate>
		<dc:creator>Justin Leider</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[Cache]]></category>
		<category><![CDATA[Class]]></category>
		<category><![CDATA[Drupal]]></category>
		<category><![CDATA[Flexibility]]></category>
		<category><![CDATA[Library]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=765</guid>
		<description><![CDATA[Everyone has probably already seen every caching class there ever was and ever will be. However, when I was searching for a class that could easily be switched from one data store to the next I couldn't find a thing. Every caching class I seemed to come by was written specifically for a single back [...]]]></description>
			<content:encoded><![CDATA[<p>Everyone has probably already seen every caching class there ever was and ever will be. However, when I was searching for a class that could easily be switched from one data store to the next I couldn't find a thing. Every caching class I seemed to come by was written specifically for a single back end and with their predefined static keys/column names. Well, those silly restrictions have come to an end!</p>
<p>I present to you the <a title="PHP Caching Library" href="http://github.com/jleider/extensible-php-caching-library" target="_blank">Extensible PHP Caching Library</a> (Hosted at GitHub) - This collection of classes makes it easy to customize key or column names to your needs as well as switch from one data store to another. This extensibility is baked into the core of the library via an abstract class. This abstract Cache class takes an associative array of keys/columns and determines how to use those keys based on the type of cache back end you are using. For example: Suppose you are using a RDBMS such as MySQL. In this case the associative array will be parsed and the query built such that the key is the column name and the value is what you want to query on. However, if you chose to use a NoSQL key/value store such as Memcache then the associative array is sorted and imploded to create a single string as the key.</p>
<p>Since we always specify the key as an associative array, switching between different data stores is as simple as changing the Class name from say MCache to SQLCache in your code. Nothing else is required to change data stores. Keys, expiration dates, and data processing all stay the same and all functions are called with the same arguments. This functionality is attributed to the abstract base Cache class which guides and regulates the inheriting classes.</p>
<p>Lets get to some examples:</p>
<p>In the following example we create a new SQLCache object and pass it our associative array of keys and values. Check if there is cached data, if there isnt then do something to generate the data and then cache it. Notice how you dont have to pass the key in again when setting the cache. The key is stored within the object so you can get and set as many times as you need without having to set the key every time. Lastly we delete the cache.</p>
<pre class="php"><span style="color: #808080; font-style: italic;">// Get Cache</span>
<span style="color: #0000ff;">$cache</span> = <span style="color: #000000; font-weight: bold;">new</span> SQLCache<span style="color: #66cc66;">&#40;</span><a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'column1'</span> =&gt; <span style="color: #cc66cc;">123</span>, <span style="color: #ff0000;">'column2'</span> =&gt; <span style="color: #ff0000;">'blah'</span>, … <span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
<span style="color: #0000ff;">$output</span> = <span style="color: #0000ff;">$cache</span>-&gt;<span style="color: #006600;">getCache</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;
<span style="color: #b1b100;">if</span><span style="color: #66cc66;">&#40;</span>!<span style="color: #0000ff;">$output</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span>
  <span style="color: #808080; font-style: italic;">// Do something to generate data</span>
  <span style="color: #0000ff;">$output</span> = <span style="color: #ff0000;">'some datas'</span>;
  <span style="color: #808080; font-style: italic;">// Set Cache</span>
  <span style="color: #b1b100;">if</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$cacheNow</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#123;</span>
    <span style="color: #808080; font-style: italic;">// Force a write to cache now</span>
    <span style="color: #0000ff;">$cache</span>-&gt;<span style="color: #006600;">setCache</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$output</span>, <span style="color: #ff0000;">'+1 day'</span>, <span style="color: #000000; font-weight: bold;">true</span><span style="color: #66cc66;">&#41;</span>;
  <span style="color: #66cc66;">&#125;</span> <span style="color: #b1b100;">else</span> <span style="color: #66cc66;">&#123;</span>
    <span style="color: #808080; font-style: italic;">// Setting no time defaults to time() + 86400, one day from now.</span>
    <span style="color: #0000ff;">$cache</span>-&gt;<span style="color: #006600;">setCache</span><span style="color: #66cc66;">&#40;</span><span style="color: #0000ff;">$output</span><span style="color: #66cc66;">&#41;</span>;
  <span style="color: #66cc66;">&#125;</span>
<span style="color: #66cc66;">&#125;</span>
&nbsp;
<a style="text-decoration: none;" href="http://www.php.net/print"><span style="color: #000066;">print</span></a> <span style="color: #0000ff;">$output</span>;
&nbsp;
<span style="color: #808080; font-style: italic;">// Delete the cache</span>
<span style="color: #0000ff;">$cache</span>-&gt;<span style="color: #006600;">deleteCache</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>If ever you needed to update this caching strategy to include Memcached the only change would be to change SQLCache to Mcache:</p>
<pre class="php"><span style="color: #0000ff;">$cache</span> = <span style="color: #000000; font-weight: bold;">new</span> SQLCache<span style="color: #66cc66;">&#40;</span><a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'column1'</span> =&gt; <span style="color: #cc66cc;">123</span>, <span style="color: #ff0000;">'column2'</span> =&gt; <span style="color: #ff0000;">'blah'</span>, … <span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
to
<span style="color: #0000ff;">$cache</span> = <span style="color: #000000; font-weight: bold;">new</span> MCache<span style="color: #66cc66;">&#40;</span><a style="text-decoration: none;" href="http://www.php.net/array"><span style="color: #000066;">array</span></a><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'key1'</span> =&gt; <span style="color: #cc66cc;">123</span>, <span style="color: #ff0000;">'key2'</span> =&gt; <span style="color: #ff0000;">'blah'</span>, … <span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>The column and key names are arbitrary and may be set to anything you want to name it. For SQL caches make sure you create a <a title="Example Cache Table Schema" href="http://github.com/jleider/extensible-php-caching-library" target="_blank">cache table</a> that has the corresponding column names and that they are indexed optimally.</p>
<p>While I have only created classes for SQL (MySQL - since there is a LIMIT 1), Memcached and File based caching, the base class can be extended to include any key/value store or any database with columns (MongoDB, CouchDB, Tokyo, Postgress, Oracle, etc). Just update the back end calls and you are good to go.</p>
<p>For anyone who updates or adds functionality please let me know so I can give credit where credit is due. This library is available under the <a title="GNU Lesser General Public License" href="http://www.gnu.org/licenses/lgpl.html" target="_blank">LGPLv3</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2010/04/23/extensible-php-caching-library/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>100x Increase in SOLR Performance and Throughput</title>
		<link>http://www.derivante.com/2009/04/27/100x-increase-in-solr-performance-and-throughput/</link>
		<comments>http://www.derivante.com/2009/04/27/100x-increase-in-solr-performance-and-throughput/#comments</comments>
		<pubDate>Mon, 27 Apr 2009 20:28:27 +0000</pubDate>
		<dc:creator>Justin Leider</dc:creator>
				<category><![CDATA[SOLR]]></category>
		<category><![CDATA[Web Architecture]]></category>
		<category><![CDATA[Web Technology]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=341</guid>
		<description><![CDATA[Is your SOLR installation running slower than you think it should? Performance, throughput and scalability not what you are expecting or hoping? Do you constantly see that others have much higher SOLR query performance and scalability than you do? All it might take to fix your woes is a simple schema or query change. The [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft" style="margin: 5px;" title="SOLR" src="http://www.derivante.com/wp-content/uploads/2009/05/solr_fc.jpg" alt="" width="170" height="94" />Is your SOLR installation running slower than you think it should? Performance, throughput and scalability not what you are expecting or hoping? Do you constantly see that others have much higher SOLR query performance and scalability than you do? All it might take to fix your woes is a simple schema or query change.</p>
<p>The following scenario I am about to describe is proof positive that you should always take the time to understand the underlying functionality of whatever operating system, programming language or application you are using. Let my oversight and 'quick fix solution' be a lesson to you, it is almost always worth the upfront cost of doing something right the first time so you don't have to keep revisiting the same issue.<br><br />
<span id="more-341"></span><br />
Before I delve into the nuances of SOLR let me first give you some background on what took place over the last half year at CitySquares. Back in the fall of last year the CitySquares website began experiencing an exponential growth in traffic. This growth was due to an expansion of its IYP (Internet Yellow Page) services into the New England and Metro New York areas. Prior to and during the beginning of the first wave of traffic growth, every business listing was powered by very large MySQL queries including a couple joins. The queries themselves weren't all that complex but they were big and unwieldy with joins on very large tables and lots of columns in the result sets. In some of the larger cities covered at the time (Manhattan, Bronx, Queens, Boston, etc) there were up to 100,000 rows of data that needed to be sorted before returning a rather small subset (20-40 rows) for each business listing page load. While this wasn't a big deal when CitySquares was still a niche Boston centric destination, it quickly became a huge burden on the MySQL servers. Some of these queries were so big the servers would run out of memory trying to crunch through a 3GB temp table and start thrashing the disks to server a request for Manhattan. We needed a better solution and quick.</p>
<p>Luckily for us we had already implemented a SOLR search engine with all the necessary data indexed from our database initially with the sole intent that search result sets shouldn't have to query the database. This worked to our advantage since it was very easy for us to modify the code base to query SOLR instead of MySQL. Both result sets were formatted as an object with the same field names and all. It was a perfect drop in replacement.</p>
<p>The SOLR solution we implemented utilized SOLR's wild card q.alt=*:* field to select all documents while applying filter query (fq) on that set to get all documents related to our filter. It was a huge win for us at the time. Not only were the queries faster than the MySQL ones, but the SOLR servers could handle more of these queries without even coming close to exhausting the server's resources. This quick and dirty solution was satisfactory for the next few months until CitySquares' next round of expansion began, where again, the queries became a burden. The second time around we didn't have another seemingly quick fix. I spent a couple days trying to figure out a better way to implement the q.alt=*:* field but to no avail I gave up and moved onto other performance optimizations.</p>
<p>Unfortunately, I didn't take the time to understand the code behind the query and I didn't understand exactly how SOLR was implementing the query in its back end process. Since I didn't understand the basis of the problem I couldn't possibly know the query could be easily re-factored. After a few weeks of high loads, 20+ on our 8 core servers, I struck up a conversation with Michael, the developer who wrote the query. We discussed how the query worked and what it needed to do and after five minutes we had discovered a much better way to structure the query. It took me only about a minute or two to re-factor the original query to produce the exact same result set. This new query was incredibly fast! I benchmarked it to be about 100x faster than the previous query and on top of that it was a simple drop in replacement!</p>
<p>From what I've deduced the original query passed a blank query string with a filter query to SOLR which in turn defaulted to the q.alt catch all first and then applied the filter on the catch all query. This is exactly the opposite of what we were expecting SOLR to do. We believed that the filter was applied first and then the q.alt was applied. However, that was not the case. while this misunderstanding wasn't ideal it wasn't too slow either with only 1.4 million documents to parse over. However once CitySquares hit the 14.5 million mark this query became unmanageable. Basically SOLR parsed over every single document in the index before applying the query filter we were using. To rectify this and regain performance and through put on our servers I simply moved the filter query statement to the query statement and specified the query field to be the same as the original filter field.</p>
<p>i.e.</p>
<p>Original query passed a blank query string with a filter query:</p>
<ul>
<li>select?q=+&amp;fq=&lt;FIELD&gt;:&lt;ID&gt;</li>
</ul>
<p>The updated query now passes the id as the query string and specifies the former filter field:</p>
<ul>
<li>select?q=&lt;ID&gt;&amp;qf=&lt;FIELD&gt;</li>
</ul>
<p>Instead of taking advantages of SOLR's and every other search engines strength of O(1) search time we were at the mercy of its worst case scenario O(n) scan time. This simple misunderstanding of how SOLR processes queries in the back end caused massive performance and throughput bottlenecks. These bottlenecks affected our short and long term infrastructure plans, and was the root cause of many performance headaches for our users, customers and IT department.</p>
<p>If this isn't proof positive that you should always take the time to understand the underlying functionality of whatever operating system, programming language or application you are using I don't know what is.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/04/27/100x-increase-in-solr-performance-and-throughput/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Is Amazon&#8217;s EC2 right for you?</title>
		<link>http://www.derivante.com/2009/01/26/is-amazons-ec2-right-for-you/</link>
		<comments>http://www.derivante.com/2009/01/26/is-amazons-ec2-right-for-you/#comments</comments>
		<pubDate>Mon, 26 Jan 2009 20:50:11 +0000</pubDate>
		<dc:creator>Justin Leider</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[Hartware]]></category>
		<category><![CDATA[horizontal architecture]]></category>
		<category><![CDATA[horizontal database]]></category>
		<category><![CDATA[IT Infrastructure]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[Site Architecture]]></category>

		<guid isPermaLink="false">http://justinleider.com/?p=49</guid>
		<description><![CDATA[I've been asked this and similar questions quite a bit lately. But before I delve into the answer to this I want to lay the foundation and ask you a question. This one question should play a large part in your final assessment to go with EC2 or not. The question you should ask yourself [...]]]></description>
			<content:encoded><![CDATA[<p><!-- 		@page { size: 8.5in 11in; margin: 0.79in } 		P { margin-bottom: 0.08in } --></p>
<p style="margin-bottom:0;">I've been asked this and similar questions quite a bit lately. But before I delve into the answer to this I want to lay the foundation and ask you a question. This one question should play a large part in your final assessment to go with EC2 or not. The question you should ask yourself is:</p>
<p style="margin-bottom:0;"><strong>How quickly do you actually need to scale either up or down? </strong></p>
<p style="margin-bottom:0;">The answer to this will likely influence the correct solution to your problems. The following bullet point list is how I classify levels of scalability, each one comes with its own pros and cons but generally the quicker you need something the more expensive it is going to be.</p>
<ul>
<li><strong>Immediate</strong> - within minutes - EC2 or other cloud computing networks</li>
<li><strong>Fast</strong> - within days to a week - Managed Hosting, Rackspace, The Planet, etc</li>
<li><strong>Average</strong> - within weeks to a month - Own your own hardware, Dell, HP, IBM, etc</li>
<li><strong>Corporate</strong> - within months/years - Good Luck</li>
</ul>
<p style="margin-bottom:0;">With this in mind, everyone hears the hype of EC2, with its scalability, fully managed hardware and virtualization but there really aren't that many people out there describing their experiences with it. When we made the decision to go with EC2 we did our research and due diligence before making the switch. There wasn't much to go on but the few articles and blog posts we did read were all positive. I guess we all got caught up in the hype here as well.</p>
<p style="margin-bottom:0;">Even after all our research it turns out that going with EC2 was one of the poorer IT decisions we have made. EC2 has turned out to be more expensive, more difficult to implement and with poorer performance than we had ever expected even with our worst case estimations. To top it all off, we didn't fully utilize the benefits of going with EC2 which was immediate scalability. Our traffic is relatively predictable and grows or shrinks in manageable percentages and can be scaled up within days instead of minutes. We never have any massive spikes in our traffic either up or down. Even if we did have spikes we are limited by our MySQL cluster.</p>
<p style="margin-bottom:0;">While we had to rethink a lot of our architecture to create a more horizontal platform instead of the traditional vertical scaling, MySQL was by far our biggest bottleneck. The source of the problem is rooted in Amazon's preset machine size. While they have done an adequate job of offering different types of instances with more memory in one line and more computational power in the other you are still limited to what they are offering. With the large database we have and the latencies between the instances and their permanent storage we were forced to keep as much of our database cached in RAM. Now this shouldn't have been too big a deal. Just get a machine with a ton of RAM. Well, unfortunately Amazon's biggest instance only offered us a maximum of 15GB. Needless to say this was not sufficient and forced us to adopt a cluster solution. This in and of itself is not ideal especially when you should be able to run off a single box with 32GB of RAM and access to fast local disks. However, it took us twelve (12) m1.xlarge instances to reach the level of performance and availability we desired. Not to mention the network IO latency between node and disk storage and node to node adding insult to injury.</p>
<p style="margin-bottom:0;">While the speed and size of the cluster was not desirable, it worked. However, we had to completely forfeit any sort of scalability to achieve a working database. To my knowledge there is no way to quickly and easily boot up more instances of MySQL to supplement a live cluster. In order for us to add more capacity we would have to perform a rolling reboot of every machine in the cluster. Its unfortunate that databases were not designed with EC2 in mind.</p>
<p style="margin-bottom:0;">However, there are companies who are trying to tap into this pain point. We were looking very intently at a company called Continuent who produces a MySQL cluster monitoring and management tool. Unfortunately, as of Jan 2009 the product was still in private beta and was unavailable to us. This tool would have allowed us to add nodes to the cluster on the fly without having to take it down in the process. Although, even then with this extra tool, which wasn't cheap, you still couldn't scale down the cluster without taking it off-line. As far as I am concerned, if you are already using the largest instance available to you (an m1.xlarge or c1.xlarge), there is no way to vertically scale up a database with EC2. Instead you are forced into a less than ideal environment for hosting a horizontal architecture which could have serious consequences for your code base and SQL queries.</p>
<p style="margin-bottom:0;">To be honest, EC2 offers a lot of benefits that are hard to come by with other solutions. EC2 is great for companies doing lots of non-real-time activities such as batch and queued processing. Companies who have a small database that can be cached in RAM and replicated easily will also benefit from EC2, just boot up a bunch of instances and go to town. However, the bottom line is if you have fairly consistent usage patterns and your applications are performance sensitive then there are much faster and more cost effective ways of abstracting your hardware requirements. We at citysquares are in the process of moving off of EC2 and onto a managed hosting platform. We still enjoy the benefits of leased hardware like we had with EC2 and the ability to quickly add new hardware. Granted, more servers aren't available to us at the drop of a hat but a couple days lead time to get another box up and running is more than sufficient for us. Not only that but we also have a whole team of IT people working with us to help alleviate our burden of supporting the entire hardware/software stack. We can now focus on what we do best which is our application.</p>
<p style="margin-bottom:0;">Keep in mind that there is no concrete answer as to whether EC2 or cloud computing in general will work for you or not. You need to determine if the capacity and latencies of the pre-determined instance sizes will meet your growing infrastructure needs. For us the bitter answer was a resounding no. We were able to spec out a solution in a fully managed hosting environment for about half the monthly cost of EC2 while increasing the performance of our application significantly.</p>
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">So, is Amazon's EC2 right for you?</p>
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/01/26/is-amazons-ec2-right-for-you/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Part 2: An Architecture Overview &#8212; Apache, MySQL, Memcached, SQLite</title>
		<link>http://www.derivante.com/2008/07/24/an-architecture-overview-apache-mysql-memcached-sqlite/</link>
		<comments>http://www.derivante.com/2008/07/24/an-architecture-overview-apache-mysql-memcached-sqlite/#comments</comments>
		<pubDate>Thu, 24 Jul 2008 19:56:41 +0000</pubDate>
		<dc:creator>Justin Leider</dc:creator>
				<category><![CDATA[Web Architecture]]></category>
		<category><![CDATA[Web Technology]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[architecture]]></category>
		<category><![CDATA[citysquares]]></category>
		<category><![CDATA[horizontal architecture]]></category>
		<category><![CDATA[horizontal database]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[SOLR]]></category>
		<category><![CDATA[sqlite]]></category>
		<category><![CDATA[xcache]]></category>

		<guid isPermaLink="false">http://justinleider.wordpress.com/?p=11</guid>
		<description><![CDATA[In my last post I mentioned the numerous technologies which were on tap for the upcoming version of CitySquares. This installment will continue to define an overview of the underlying architecture and begin to dig a little deeper into the actual implementation of the technologies. The idea and focus of this new architecture is aimed [...]]]></description>
			<content:encoded><![CDATA[<p><!-- 		@page { size: 8.5in 11in; margin: 0.79in } 		P { margin-bottom: 0.08in } --></p>
<p style="margin-bottom:0;">In my last post I mentioned the numerous technologies which were on tap for the upcoming version of <a title="CitySquares Online -- Hyper Local Neighborhood Search" href="http://citysquares.com" target="_blank">CitySquares</a>. This installment will continue to define an overview of the underlying architecture and begin to dig a little deeper into the actual implementation of the technologies. The idea and focus of this new architecture is aimed at creating a much more stable and scalable platform for us to work with. Before I get into the details you'll see Ive provided a graphic representation of how the architecture will be laid out.</p>
<p style="margin-bottom:0;">
<div id="attachment_12" class="wp-caption aligncenter" style="width: 430px"><a href="http://justinleider.files.wordpress.com/2008/07/architecture-overview.jpg"><img class="size-full wp-image-12" src="http://justinleider.files.wordpress.com/2008/07/architecture-overview.jpg" alt="A visual representation of a horizontal web architecture." width="420" height="300" /></a><p class="wp-caption-text">A visual representation of a horizontal web architecture.</p></div>
<p style="margin-bottom:0;">
<p style="margin-bottom:0;">Bear with me as I explain the work flow behind this graphic as it is not 100% clear from the visual representation. First off, I run <a title="Ubuntu Linux" href="http://www.ubuntu.com/" target="_blank">Ubuntu Linux</a> which is great for just about everything I need, except for creating any sort of graphics, so I apologize in advance for the lackluster graphic. As you can see, there are a few different layers: users, <a title="HA Proxy -- Load Balancing " href="http://haproxy.1wt.eu/" target="_blank">HA Proxy</a>, Apache, <a title="High performance caching system" href="http://www.danga.com/memcached/" target="_blank">Memcached</a>, <a title="SQLite -- A small fast file based database" href="http://www.sqlite.org/" target="_blank">SQLite</a> and finally MySQL labeled as databases.</p>
<p style="margin-bottom:0;">First and foremost are our beloved users, which whom without we would have no need for a website. Starting from the beginning, the users request a page from CitySquares, from there their request is passed through one of two HA Proxy servers. The sole purpose of these two machines is to load balance the incoming requests among all our Apache web servers and serve as a failsafe for one another. Once the user's request has been accepted and forwarded along to Apache we actually begin to process the request.</p>
<p style="margin-bottom:0;">The Apache servers run PHP and XCache modules. The PHP part I feel is fairly straight forward and out of the scope of this post so I will skip that part of the architecture. XCache however, is used in conjunction with and is an enhancement to PHP. More specifically XCache is an opcode optimizer and cache. It works by removing the compilation time of PHP scripts by caching the compiled and optimized state of the PHP scripts directly in the shared memory of the Apache server. This compiled version can increase page generation times by up to 500%, speeding up overall response time and reducing server load.</p>
<p style="margin-bottom:0;">Just as with all dynamic websites most if not all the actual data is stored in databases. Gone are the days of flat files with near zero processing required. Databases are the new workhorses of the web world and as such usually become the bottle neck of the overall system. CitySquares is in a somewhat unique position, nearly all our page loads have quite a bit of location and distance based processing and nearly all of this is done in our MySQL database. So while our Apache servers are sitting idle waiting for responses from their queries, the DB is preforming the brunt of the work calculating distances between objects and the like.</p>
<p style="margin-bottom:0;">We can reduce this bottleneck in a couple of different ways, the first of which is object caching. We will use Memcached to cache objects returned from the database. Say for example, we know the distance between two businesses. We know with a fair amount of certainty that those two businesses are going to be in the same place they were an hour ago, just as they were a week ago and as they will be a day from now. So we can cache this information with an expiration time of a couple days, thus saving ourselves the expense of calculating the distance between them on every page load. Of course if a user comes by and changes the location of one of these businesses, we can expire the object in cache and replace it with a newly calculated object straight from the database on the subsequent page load. These expensive queries require large table scans and mathematical formulas calculations on every row. These query results can be cached to free up the database and allow it to do what it does best. Store and retrieve data.</p>
<p style="margin-bottom:0;">In the case where we cant find the data in Memcached, either because it doesn't yet exist or has expired we will turn to our databases. We must first query a SQLite instance which is the gate keeper between Apache and the numerous databases we have. By having a separate lookup table we can essentially divide and parcel out our data sets on a table by table basis even down to an entry by entry basis. Depending on the type of data we are requesting SQLite will provide us with the location of one database or another to query for our data.</p>
<p style="margin-bottom:0;">One could argue that this just adds another layer of latency and they would be correct. However, as scalability becomes an issue you will find that adding database replication generally results in diminishing returns.  As new servers are brought online the overhead associated with replicating writes across all the replicated servers becomes choking and creates its own bottleneck. On the other hand, with a lookup table and a horizontal database architecture we don't have to worry about database replication nearly as much. You can just as easily divide your data sets into different databases. Now how you go about this varies greatly depending on your data. For CitySquares the solution turns out to be rather simple. Everything we do is location specific so it only makes sense that each data set is only as big as its parent city. Theoretically every city and all the data related to said city could reside in its own database. As you can probably guess we are only performance limited by the biggest cities, <a title="Manhattan on CitySquares" href="http://ny.citysquares.com/manhattan" target="_blank">Manhattan</a>, <a title="Brooklyn on CitySquares" href="http://ny.citysquares.com/brooklyn" target="_blank">Brooklyn</a>, etc. In these few cases we can always fall back to bigger and better servers and or replication if necessary.</p>
<p style="margin-bottom:0;">Just as our database has become a bottleneck in our current site, our search engine is also one as well, just to a lesser extent. We can take the lessons learned from our horizontal database architecture and apply it to the search engine architecture as well. By dividing our data sets into logical partitions we can keep our data from getting too large and unwieldy;  And with these smaller data sets we can reduce or remove all together the overhead associated with replicating data over multiple machines.</p>
<p style="margin-bottom:0;">While this solution sounds great, it won't be worth the effort if every time a programmer wanted to access some data they would be required to check Memcached, then SQLite and then finally MySQL for every query. In order for this to be feasible from a programmers standpoint the programmer should never have to think about this underlying architecture. This of course I will discuss in greater detail in the upcoming installments. Stay Tuned.</p>
<p style="margin-bottom:0;">
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2008/07/24/an-architecture-overview-apache-mysql-memcached-sqlite/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
<!-- WP Super Cache is installed but broken. The path to wp-cache-phase1.php in wp-content/advanced-cache.php must be fixed! -->