100x Increase in SOLR Performance and Throughput

by Justin Leider on April 27, 2009

Is your SOLR installation running slower than you think it should? Are performance, throughput and scalability not what you expected or hoped for? Do you keep seeing others report much higher SOLR query performance and scalability than you get? All it might take to fix your woes is a simple schema or query change.

The scenario I am about to describe is proof positive that you should always take the time to understand the underlying functionality of whatever operating system, programming language or application you are using. Let my oversight and 'quick fix solution' be a lesson to you: it is almost always worth the upfront cost of doing something right the first time, so you don't have to keep revisiting the same issue.


Before I delve into the nuances of SOLR, let me first give you some background on what took place over the last half year at CitySquares. In the fall of last year, the CitySquares website began experiencing exponential growth in traffic. This growth was due to an expansion of its IYP (Internet Yellow Pages) services into New England and the Metro New York area. Before and during the first wave of traffic growth, every business listing was powered by very large MySQL queries that included a couple of joins. The queries themselves weren't all that complex, but they were big and unwieldy, with joins on very large tables and lots of columns in the result sets. In some of the larger cities covered at the time (Manhattan, the Bronx, Queens, Boston, etc.), up to 100,000 rows of data had to be sorted before a rather small subset (20-40 rows) could be returned for each business listing page load. While this wasn't a big deal when CitySquares was still a niche, Boston-centric destination, it quickly became a huge burden on the MySQL servers. Some of these queries were so big that the servers would run out of memory crunching through a 3GB temp table and start thrashing the disks just to serve a request for Manhattan. We needed a better solution, and quickly.

Luckily for us, we had already implemented a SOLR search engine with all the necessary data indexed from our database, originally with the sole intent that search result pages shouldn't have to query the database at all. This worked to our advantage, since it was very easy for us to modify the code base to query SOLR instead of MySQL. Both result sets were formatted as objects with the same field names, so it was a perfect drop-in replacement.

The SOLR solution we implemented used SOLR's wildcard q.alt=*:* parameter to select all documents, and applied a filter query (fq) to that set to keep only the documents matching our filter. It was a huge win for us at the time. Not only were the queries faster than the MySQL ones, but the SOLR servers could handle more of them without even coming close to exhausting their resources. This quick and dirty solution was satisfactory for the next few months, until CitySquares' next round of expansion began and, once again, the queries became a burden. The second time around we didn't have another seemingly quick fix. I spent a couple of days trying to find a better way to implement the q.alt=*:* query, but to no avail, so I gave up and moved on to other performance optimizations.

Unfortunately, I hadn't taken the time to understand the code behind the query, and I didn't understand exactly how SOLR executed it on the back end. Since I didn't understand the root of the problem, I couldn't have known that the query could easily be refactored. After a few weeks of high loads (20+ on our 8-core servers), I struck up a conversation with Michael, the developer who wrote the query. We discussed how the query worked and what it needed to do, and within five minutes we had found a much better way to structure it. It took me only a minute or two to refactor the original query so that it produced the exact same result set. The new query was incredibly fast! In my benchmarks it was about 100x faster than the previous query, and on top of that it was a simple drop-in replacement!

From what I've deduced, the original query passed a blank query string and a filter query to SOLR, which in turn fell back to the q.alt catch-all query first and then applied the filter to the catch-all results. This is exactly the opposite of what we expected SOLR to do: we believed the filter was applied first and the q.alt query second. That was not the case. While this misunderstanding wasn't ideal, it wasn't too slow either with only 1.4 million documents to scan. However, once CitySquares hit the 14.5 million document mark, this query became unmanageable. Essentially, SOLR scanned every single document in the index before applying our filter query. To rectify this and regain performance and throughput on our servers, I simply moved the filter value into the main query string (q) and set the query field (qf) to the original filter field.

For example:

Original query passed a blank query string with a filter query:

  • select?q=+&fq=<FIELD>:<ID>

The updated query now passes the id as the query string and specifies the former filter field:

  • select?q=<ID>&qf=<FIELD>
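The two request strings above can be sketched with Python's standard urllib.parse module. This is a minimal illustration, not CitySquares' actual code: `city_id` and `12345` are hypothetical stand-ins for <FIELD> and <ID>, and it assumes (as the use of q.alt and qf implies) that the request handler runs SOLR's DisMax query parser, since qf is a DisMax parameter that the standard Lucene parser does not use. It also shows why `q=+` counts as a blank query string: in a URL query string, `+` decodes to a space.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical stand-ins for <FIELD> and <ID> in the post.
FIELD = "city_id"
ID = "12345"

def old_query(field: str, value: str) -> str:
    """Blank main query; the real selection happens in the filter query (fq)."""
    # A single space is encoded as "+", which reproduces q=+ from the post.
    return "select?" + urlencode({"q": " ", "fq": f"{field}:{value}"})

def new_query(field: str, value: str) -> str:
    """The value becomes the main query; qf names the field DisMax searches."""
    return "select?" + urlencode({"q": value, "qf": field})

def params(url: str) -> dict:
    """Decode the query string back into its parameters."""
    # keep_blank_values=True so an empty "q=" would not be silently dropped.
    return parse_qs(urlsplit(url).query, keep_blank_values=True)

if __name__ == "__main__":
    print(old_query(FIELD, ID))          # select?q=+&fq=city_id%3A12345
    print(new_query(FIELD, ID))          # select?q=12345&qf=city_id
    print(params(old_query(FIELD, ID)))  # {'q': [' '], 'fq': ['city_id:12345']}
```

Building the parameters with urlencode rather than by string concatenation keeps characters such as the colon in the fq value properly escaped (%3A).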

Instead of taking advantage of the strength of SOLR and every other search engine, an inverted index that goes straight to the matching documents, we were at the mercy of its worst-case scenario: an O(n) scan of every document in the index. This simple misunderstanding of how SOLR processes queries on the back end caused massive performance and throughput bottlenecks. These bottlenecks affected our short- and long-term infrastructure plans, and were the root cause of many performance headaches for our users, customers and IT department.
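To make the scan-versus-lookup difference concrete, here is a toy Python model. It is an illustration under stated assumptions, not a description of SOLR or Lucene internals: the documents and the `city` field are hypothetical, the "index" is just a dict mapping (field, value) to a list of document ids, and "examined" counts how many documents each strategy touches. The match-all path touches every document, so its cost grows with the size of the index; the indexed path touches only the matching documents.

```python
from collections import defaultdict

# Toy model only: this is NOT how SOLR/Lucene works internally. It just
# contrasts the two access patterns described in the post.

def build_index(docs):
    """Map (field, value) -> ascending list of doc ids: a tiny inverted index."""
    index = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            index[(field, value)].append(doc_id)
    return dict(index)  # plain dict so looking up an unknown term adds nothing

def match_all_then_filter(docs, field, value):
    """Analogous to q.alt=*:* plus fq: visit every doc, keep the ones that match."""
    examined = 0
    hits = []
    for doc_id, doc in enumerate(docs):
        examined += 1
        if doc.get(field) == value:
            hits.append(doc_id)
    return hits, examined

def indexed_lookup(index, field, value):
    """Analogous to q=<ID>&qf=<FIELD>: jump to the term's list, touch only hits."""
    hits = index.get((field, value), [])
    return list(hits), len(hits)

if __name__ == "__main__":
    # 100,000 hypothetical listings spread evenly over 1,000 cities.
    docs = [{"city": f"c{i % 1000}"} for i in range(100_000)]
    index = build_index(docs)

    scan_hits, scan_examined = match_all_then_filter(docs, "city", "c7")
    lookup_hits, lookup_examined = indexed_lookup(index, "city", "c7")

    assert scan_hits == lookup_hits  # same result set either way
    print(scan_examined)    # 100000
    print(lookup_examined)  # 100
```

Both strategies return the same result set; only the amount of work differs, which mirrors the post's point that the refactored query was a drop-in replacement.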

If this isn't proof positive that you should always take the time to understand the underlying functionality of whatever operating system, programming language or application you are using, I don't know what is.

9 comments

Sorry for the wrong syntax:
select?q=l&qf=user&fq=testing

by rajat rastogi on June 9, 2011 at 9:39 am. Reply #

try select?q=l&qf=userand fq=testing

by rajat rastogi on June 9, 2011 at 9:37 am. Reply #

Interesting read.
How would you create a query with more than one element in this fashion?
e.g. you want to find every male person aged 40

with the old query:
select?q=+&fq=gender:male AND age:40

with the new query:
???

Thanks,
Marc

by Marc on June 1, 2010 at 11:09 am. Reply #

Marc, I would query by age first, since you know that is going to leave fewer rows to search than a binary m/f filter. Once you’ve narrowed the results down as much as you can, then apply the filters.

select?q=40&qf=age&fq=gender:male

SOLR 1.4 actually comes with a bunch of new filtering improvements, where filters are applied during the query process. In this simple case it might do that for you in the background, but I can’t be positive. Benchmark it both ways and see what happens.

by Justin Leider on June 10, 2010 at 4:34 pm. #

[...] couple months ago I wrote about the terrible performance and a work around for SOLR / Lucene search engine. I discovered that performance would drop off a cliff while using [...]

by Derivante » SOLR Filtering Performance Increase on June 23, 2009 at 2:26 pm. Reply #

[...] qf= vs fq=: 100 Times Performance Increase in Solr [...]

by qf= vs fq=: « Solrhack on May 13, 2009 at 3:26 pm. Reply #

Jim,

I have benchmarked the performance differences between the old and new queries, their use of the fieldCache, and the differences between single indexes and multi-core indexes.

http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/

by Justin Leider on May 11, 2009 at 12:18 pm. Reply #

Does your new query style take advantage of the filterCache as well as the old one did? I guess if overall perf is 100x better, who cares. :)

Nice. We do similar things with our queries. I’ll see if this applies in our case. +1 to understanding what’s going on before optimizing!

by Jim Murphy on May 10, 2009 at 9:40 am. Reply #

Hi,

Old query:

select?q=testing&fq=user:1

How do I convert this into the new query style?

I want to search for the term testing.

If I change the query as below, I believe it won’t search for the word testing:

select?q=1&qf=user

Is there any way to do it?

Thanks.

by rahul on March 16, 2011 at 6:12 am. #
