<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Derivante &#187; Throughput</title>
	<atom:link href="http://www.derivante.com/tag/throughput/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.derivante.com</link>
	<description>to obtain or receive from a source</description>
	<lastBuildDate>Mon, 26 Apr 2010 18:44:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>SOLR Performance Benchmarks – Single vs. Multi-core Index Shards</title>
		<link>http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/</link>
		<comments>http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/#comments</comments>
		<pubDate>Tue, 05 May 2009 22:23:13 +0000</pubDate>
		<dc:creator>Justin Leider</dc:creator>
				<category><![CDATA[SOLR]]></category>
		<category><![CDATA[Web Technology]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Shards]]></category>
		<category><![CDATA[Throughput]]></category>

		<guid isPermaLink="false">http://www.derivante.com/?p=350</guid>
		<description><![CDATA[<p>Single vs. multi-core sharded index. Which one is the right one? There is not a whole lot of information out there, especially when it comes to hard numbers and comparisons. There are a couple of reasons for this. The first one (&#8230;)</p><p><a href="http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/">Read the rest of this entry &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-415" title="solr_fc" src="http://www.derivante.com/wp-content/uploads/2009/05/solr_fc.jpg" alt="solr_fc" width="170" height="94" />Single vs. <a title="SOLR multi-core indexing" href="http://wiki.apache.org/solr/CoreAdmin" target="_blank">multi-core sharded index</a>. Which one is the right one? There is not a whole lot of information out there, especially when it comes to hard numbers and comparisons. There are a couple of reasons for this. The first that comes to mind is that the multi-core functionality offered by <a title="SOLR Search Engine" href="http://lucene.apache.org/solr/" target="_blank">Apache SOLR</a> is very new: it was introduced in SOLR 1.3, the latest release, and hasn't had much time to be adopted by the SOLR community. Second, the results depend on your schema, index size, query types and user load, so performance varies from one deployment to the next. As the following benchmarks show, a multi-core SOLR index can either speed up your application or cut its maximum throughput, and with it scalability, to roughly the inverse of the number of cores.</p>
<p style="margin-bottom: 0in; padding-left: 30px;">i.e., for n cores, the maximum throughput is roughly 1/n that of a single index.</p>
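<p>To make the rule of thumb concrete, take the measured peak of a single index and divide it by n to get the approximate ceiling for an n-core sharded index. The sketch below does only that arithmetic. The function name <code>estimate_sharded_rps</code> is hypothetical, the 400 req/s input is a made-up figure, and the 1/n factor is an approximation drawn from these benchmarks, not a guarantee.</p>
<pre class="bash">
#!/bin/bash
# Hypothetical helper: rough throughput ceiling for an n-core sharded index,
# using the ~1/n rule of thumb from these benchmarks (an approximation only).
# Usage: estimate_sharded_rps SINGLE_INDEX_RPS NUM_CORES
estimate_sharded_rps() {
  awk -v rps="$1" -v n="$2" 'BEGIN { printf "%.1f\n", rps / n }'
}

# Example: estimated ceilings for several core counts, given a
# made-up single-index peak of 400 requests/second
for cores in 1 2 4 8; do
  echo "cores=$cores: $(estimate_sharded_rps 400 "$cores") req/s"
done
</pre>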
<p style="margin-bottom: 0in;">With multi-core sharded indexes, the underlying assumption is that search performance improves when you split your index into smaller chunks, because each smaller shard is faster and more efficient to search and index. However, nothing comes for free: the performance increase comes at the cost of higher CPU utilization. Searching any one shard is faster, but every query must be run against each core individually. Whereas a single index runs one slightly slower query, a multi-core sharded query runs n queries in parallel and then combines the results.</p>
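<p>In SOLR 1.3 this fan-out is driven by the <code>shards</code> request parameter. You send the query to one core and list every shard to search as comma-separated <code>host:port/path</code> entries, written without <code>http://</code>. The sketch below only builds that parameter. The helper name, the host <code>solr:8080</code> and the core names <code>core0</code>/<code>core1</code> are hypothetical.</p>
<pre class="bash">
#!/bin/bash
# Hypothetical helper: build a SOLR distributed-search "shards" parameter
# from a host:port and a list of core names (entries have no http:// prefix).
# Usage: build_shards_param HOST:PORT CORE [CORE...]
build_shards_param() {
  local host="$1"
  shift
  local list="" core
  for core in "$@"; do
    # Append "host/solr/core", adding a comma only after the first entry
    list="${list:+$list,}$host/solr/$core"
  done
  echo "shards=$list"
}

# Example query (not run here): the receiving core queries every listed
# shard in parallel and merges the results.
#   curl -G "http://solr:8080/solr/core0/select" \
#        --data-urlencode "q=pizza" \
#        --data-urlencode "$(build_shards_param solr:8080 core0 core1)"
build_shards_param solr:8080 core0 core1
</pre>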
<p><span id="more-350"></span></p>
<p style="margin-bottom: 0in;">One problem with the multi-core sharded index still needs to be worked out: there is no distributed IDF (inverse document frequency). In other words, if your documents are not spread evenly across all shards, you risk a result set that is improperly ordered wherever ordering depends on relevance score, such as score sorts and query boosts. This happens because each document is scored within its own core, using only that core's statistics, before the results are combined and the query returned.</p>
<p style="margin-bottom: 0in;">In short, a multi-core index is a good fit if you need faster individual queries and can afford to sacrifice some scalability and throughput to get them.</p>
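<p>A small worked example shows why uneven shards skew scores. Lucene's default similarity computes <code>idf = 1 + ln(numDocs / (docFreq + 1))</code>, and without distributed IDF each core plugs in its own counts. The document counts below are invented purely for illustration, and the helper name is hypothetical.</p>
<pre class="bash">
#!/bin/bash
# Hypothetical helper mirroring Lucene's DefaultSimilarity idf formula:
#   idf = 1 + ln(numDocs / (docFreq + 1))
# Usage: lucene_idf DOC_FREQ NUM_DOCS
lucene_idf() {
  awk -v df="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1 + log(n / (df + 1)) }'
}

# Invented example: a term appears in 99 of 1000 documents overall,
# but those documents are split unevenly across two 500-document shards.
echo "global idf:  $(lucene_idf 99 1000)"   # what a single index would use
echo "shard A idf: $(lucene_idf 9 500)"     # term is rare here, so matches score higher
echo "shard B idf: $(lucene_idf 90 500)"    # term is common here, so matches score lower
</pre>
<p>With these numbers, a match on shard A gets an idf of about 4.91 and a match on shard B about 2.70, while a single index would give both about 3.30. The merged result list can therefore rank shard A's documents above equally relevant ones from shard B.</p>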
<p style="margin-bottom: 0in;">Below are charts of benchmarks I ran against the CitySquares SOLR index. The machine and index specifications are as follows:</p>
<p style="margin-bottom: 0in;"><strong>Testing machine: Dell R900</strong></p>
<ul>
<li>4x quad-core Intel(R) Xeon(R) CPU E7340 @ 2.40GHz (16 physical cores)</li>
<li>24GB RAM</li>
<li>3x 15k RPM drives in RAID 0</li>
<li>Gigabit Ethernet on a local LAN</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Index Stats:</strong></p>
<ul>
<li>14.5 million documents</li>
<li>13 GB total size</li>
<li>56 fields (indexed and/or stored, with various amounts of processing)</li>
<li>Fully optimized index</li>
</ul>
<p style="margin-bottom: 0in;"><strong>Benchmarks:</strong></p>
<ul>
<li>Load was generated with Apache Bench (<code>ab</code>) from another machine on the same LAN over Gigabit Ethernet.</li>
</ul>
<pre class="bash">
<span style="color: #808080; font-style: italic;">#!/bin/bash</span>
<span style="color: #808080; font-style: italic;"># Start a fresh results log, overwriting any previous run</span>
<span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;&quot;</span> &gt; solr_results.log
<span style="color: #808080; font-style: italic;"># Step through increasing concurrency levels</span>
<span style="color: #000000; font-weight: bold;">for</span> C <span style="color: #000000; font-weight: bold;">in</span> <span style="color: #000000;">2</span> <span style="color: #000000;">4</span> <span style="color: #000000;">8</span> <span style="color: #000000;">16</span> <span style="color: #000000;">32</span> <span style="color: #000000;">64</span> <span style="color: #000000;">128</span> <span style="color: #000000;">256</span> <span style="color: #000000;">512</span>
<span style="color: #000000; font-weight: bold;">do</span>
<span style="color: #808080; font-style: italic;"># 1000 requests per concurrent client, so higher concurrency means longer runs</span>
<span style="color: #007800;">N=</span>$<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #007800;">$C</span>*<span style="color: #000000;">1000</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>
<span style="color: #808080; font-style: italic;"># Label the log with the ab invocation whose results follow</span>
<span style="color: #7a0874; font-weight: bold;">echo</span> <span style="color: #ff0000;">&quot;ab -n$N -c$C&quot;</span> &gt;&gt; solr_results.log
<span style="color: #808080; font-style: italic;"># &lt;ID&gt; and &lt;FIELD&gt; are placeholders for a real ID value and field name</span>
ab -n<span style="color: #007800;">$N</span> -c<span style="color: #007800;">$C</span> <span style="color: #ff0000;">'http://solr:8080/solr/select?q=&lt;ID&gt;&amp;qf=&lt;FIELD&gt;&amp;fq=&lt;FIELD&gt;:&lt;ID&gt;&amp;start=0&amp;rows=20'</span> &gt;&gt; solr_results.log
<span style="color: #000000; font-weight: bold;">done</span>
</pre>
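<p>The charts below plot requests per second and time per request, both of which <code>ab</code> reports for each run. A sketch like the following can pull those figures out of <code>solr_results.log</code> as CSV for a spreadsheet. It assumes the log layout produced by the script above, where an <code>ab -n… -c…</code> line precedes each run's output, and the function name is hypothetical.</p>
<pre class="bash">
#!/bin/bash
# Hypothetical helper: turn the ab output collected in solr_results.log
# into CSV rows of concurrency, requests/sec and mean time per request (ms).
# Runs that never reported results (e.g. a hung server) produce no row.
# Usage: summarize_ab_log LOGFILE
summarize_ab_log() {
  awk '
    BEGIN { print "concurrency,requests_per_sec,ms_per_request" }
    # The script echoes "ab -nN -cC" before each run; keep C
    /^ab -n/ { c = substr($3, 3); rps = "" }
    # e.g. "Requests per second:    150.25 [#/sec] (mean)"
    /^Requests per second:/ { rps = $4 }
    # Only the "(mean)" line, not "(mean, across all concurrent requests)"
    /^Time per request:.*[(]mean[)]/ { print c "," rps "," $4 }
  ' "$1"
}

# Example:
#   summarize_ab_log solr_results.log > solr_results.csv
</pre>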
<p style="margin-bottom: 0in;"><strong>For the trends in red, lower numbers are better.<br />
For the trends in blue, higher numbers are better.</strong></p>
<div class="mceTemp">
<dl id="attachment_356" class="wp-caption alignnone" style="width: 510px;">
<dt class="wp-caption-dt">Single index with no caching enabled <img class="size-full wp-image-356" title="single-index-no-cache" src="http://www.derivante.com/wp-content/uploads/2009/04/single-index-no-cache.jpg" alt="Single index with no caching enabled" width="500" height="400" /></dt>
</dl>
</div>
<div class="mceTemp">
<dl id="attachment_355" class="wp-caption alignnone" style="width: 510px;">
<dt class="wp-caption-dt">Single index with filterCache enabled<img class="size-full wp-image-355" title="single-index-cache" src="http://www.derivante.com/wp-content/uploads/2009/04/single-index-cache.jpg" alt="Single index with filterCache enabled" width="500" height="400" /></dt>
</dl>
</div>
<p>The graph above has no results for the 512-concurrency test because the Apache Tomcat server deadlocked. The maximum number of connections was set to 512, with an overflow of 100, and this is the cause of every case with no results for the 512 test. Ironically, the single index without the cache managed to finish, but the test with filterCache enabled failed.</p>
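<p>The post does not name the Tomcat settings involved. One likely reading is that "512 connections with an overflow of 100" maps to the <code>maxThreads</code> and <code>acceptCount</code> attributes of the HTTP <code>&lt;Connector&gt;</code> in <code>server.xml</code>. Under that assumption, a quick pre-run check like the one below can compare those limits with the benchmark's concurrency levels. Roughly speaking, Tomcat works on up to <code>maxThreads</code> requests at once and queues up to <code>acceptCount</code> more before new connections are refused. The helper name and file path are hypothetical.</p>
<pre class="bash">
#!/bin/bash
# Hypothetical helper: report Tomcat connector limits from a server.xml.
# Assumes maxThreads/acceptCount are set as Connector attributes; the first
# match of each is used. "unset" means Tomcat's default applies (and the
# capacity sum then counts that setting as 0).
# Usage: connector_capacity /path/to/server.xml
connector_capacity() {
  local conf="$1" threads queue
  threads=$(grep -o 'maxThreads="[0-9]*"' "$conf" | head -n 1 | tr -dc '0-9')
  queue=$(grep -o 'acceptCount="[0-9]*"' "$conf" | head -n 1 | tr -dc '0-9')
  # capacity = requests being processed + connections waiting in the queue
  echo "maxThreads=${threads:-unset} acceptCount=${queue:-unset}" \
       "capacity=$(( ${threads:-0} + ${queue:-0} ))"
}

# Example (path is hypothetical):
#   connector_capacity /usr/local/tomcat/conf/server.xml
</pre>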
<div class="mceTemp">
<dl id="attachment_353" class="wp-caption alignnone" style="width: 510px;">
<dt class="wp-caption-dt">Multicore Index (2 Cores) with no caching enabled<img class="size-full wp-image-353" title="multicore-no-cache" src="http://www.derivante.com/wp-content/uploads/2009/04/multicore-no-cache.jpg" alt="Multicore Index (2 Cores) with no caching enabled" width="500" height="400" /></dt>
</dl>
</div>
<div class="mceTemp">
<dl id="attachment_352" class="wp-caption alignnone" style="width: 510px;">
<dt class="wp-caption-dt">Multicore Index (2 Cores) with filterCaching enabled<img class="size-full wp-image-352" title="multicore-cache" src="http://www.derivante.com/wp-content/uploads/2009/04/multicore-cache.jpg" alt="Multicore Index (2 Cores) with filterCaching enabled" width="500" height="400" /></dt>
</dl>
</div>
<p><strong>The higher the better in the following chart.</strong></p>
<div class="mceTemp">
<dl id="attachment_354" class="wp-caption alignnone" style="width: 510px;">
<dt class="wp-caption-dt">Requests per second across all benchmarks<img class="size-full wp-image-354" title="requests-per-second" src="http://www.derivante.com/wp-content/uploads/2009/04/requests-per-second.jpg" alt="Requests per second across all benchmarks" width="500" height="400" /></dt>
</dl>
</div>
<p><strong>The lower the better in the following charts.</strong></p>
<div class="mceTemp">
<dl id="attachment_357" class="wp-caption alignnone" style="width: 510px;">
<dt class="wp-caption-dt">Time per request across all benchmarks<img class="size-full wp-image-357" title="time-per-request" src="http://www.derivante.com/wp-content/uploads/2009/04/time-per-request.jpg" alt="Time per request across all benchmarks" width="500" height="400" /></dt>
</dl>
</div>
<p>The graph above shows that the only test to finish successfully with 512 concurrent connections was the single index with caching disabled.</p>
<div class="mceTemp">
<dl id="attachment_362" class="wp-caption alignnone" style="width: 510px;">
<dt class="wp-caption-dt">Time per request across all benchmarks (truncated view)<img class="size-full wp-image-362" title="time-per-request-zoom" src="http://www.derivante.com/wp-content/uploads/2009/04/time-per-request-zoom.jpg" alt="Time per request across all benchmarks (truncated view)" width="500" height="400" /></dt>
</dl>
</div>
<p>This graph is the same as the previous one, without the last two concurrency levels, so you can see what's going on at the beginning of the benchmark. It's still hard to see, but the multi-core sharded indexes are a bit lower than the single indexes there. At the higher concurrencies, however, the single indexes beat the multi-core ones hands down.</p>
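<p>The steep climb in time per request at high concurrency follows from how <code>ab</code> reports it. Its mean time per request equals the concurrency level times 1000 ms, divided by requests per second. Once throughput stops rising, each doubling of concurrency roughly doubles the time per request. For the same reason, the gap between two setups with different throughput ceilings widens in proportion to concurrency. A minimal sketch of that arithmetic follows; the helper name is hypothetical and the 200 req/s figure is made up.</p>
<pre class="bash">
#!/bin/bash
# Hypothetical helper: ab's mean "Time per request" in ms, derived from
# the concurrency level and the measured requests per second:
#   time_per_request_ms = concurrency * 1000 / requests_per_second
# Usage: time_per_request_ms REQUESTS_PER_SEC CONCURRENCY
time_per_request_ms() {
  awk -v rps="$1" -v c="$2" 'BEGIN { printf "%.3f\n", c * 1000 / rps }'
}

# With throughput flat at a made-up 200 req/s, latency doubles with concurrency
for c in 64 128 256; do
  echo "c=$c: $(time_per_request_ms 200 "$c") ms"
done
</pre>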
<p>I've attached a <a title="SOLR Benchmarks" href="http://www.derivante.com/wp-content/uploads/2009/04/solr-blog-benchmarks.xls" target="_blank">spreadsheet</a> with the raw numbers from the benchmarks, since some of the charts are hard to read.</p>
<p>So there it is; take it as you will. There are definitely benefits to moving from a single index to a distributed multi-core sharded index, but whether it works for your dataset and application is an open question. After these benchmarks, we decided that the multi-core index that had served us well on <a title="Limitations of scaling with EC2" href="http://www.derivante.com/2008/10/08/the-limitations-of-scaling-with-ec2/" target="_blank">Amazon's EC2</a> no longer worked well for us on our new managed hosting. We are currently running a single index at <a title="CitySquares Online -- Hyper Local Neighborhood Search" href="http://citysquares.com" target="_blank">CitySquares</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
