2

My site has more pages than Google Custom Search will index, so many of our pages never show up in our site search.

I've been reading about Lucene, Nutch, Solr, etc., and I'm wondering whether I could meet their requirements on a single server that also runs the site (on nginx) and our MySQL server. We have 2 GB of RAM.

I'd appreciate any suggestions for migrating to a new site search.

HopelessN00b
Ian

1 Answer

3

Just wondering: how many pages do you have, to have exceeded the Google Custom Search limit?

I recommend using Sphinx. Lucene was fine up to a few thousand indexed items, but beyond that it was unusable: searches were unbearably slow and rebuilding the index would take hours.

We have Sphinx running on a Rackspace Cloud Server with 1 GB of RAM alongside the rest of the services required to run the site (Apache, PHP, MySQL, Memcached, etc.), and it performs great.

The website we currently have Sphinx running on has >70,000 articles. Searches complete very quickly, and it can rebuild its entire index in ~11 seconds. I chose Sphinx based on recommendations from other developers and the knowledge that a few big sites rely on it for their search engines (Neowin being one of them).

Steve
  • Just looking at the CSE stats, it only appears to have about 9,850 pages indexed, but we have far more when looking at the public Google search. I've thought about Sphinx before, but a lot of our pages are not in our MySQL database. It's been suggested to have a crawler toss the pages & URLs into a db and then have Sphinx index that, but I haven't yet found a crawler that writes its results into MySQL. – Ian Feb 22 '10 at 01:03
  • I see. I think you'd definitely run into difficulty quickly with that quantity of data in Lucene. Sphinx may well be a good option, but I don't know how you could go about getting your non-MySQL content into the index. Perhaps you should ask another question about crawler->mysql? :) – Steve Feb 22 '10 at 12:22
  • What Lucene version/implementation were you using that had problems beyond a few thousand documents? Lucene and Solr can handle millions of documents easily, so it seems odd that you had problems. – Cristian Vat Jun 11 '10 at 05:58
  • I wasn't clear on this; it was the PHP implementation of Lucene. I think I can stop writing now :P – Steve Oct 02 '10 at 11:20
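
The crawler-to-database idea raised in the comments could be sketched roughly as follows. This is only a minimal illustration, not a production crawler: it uses Python's standard library, and it assumes SQLite as a stand-in for MySQL so the example is self-contained. The `pages` table, the `PageParser` class, and the `crawl` and `fetch` functions are all hypothetical names invented for this sketch. It ignores robots.txt, rate limiting, and non-HTML content.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects the <title>, visible text, and link targets of one page."""

    def __init__(self):
        super().__init__()
        self.links, self.text, self.title = [], [], ""
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.text.append(data.strip())


def fetch(url):
    """Download a page as text (hypothetical default; swap in your own)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, conn, fetch=fetch, max_pages=100):
    """Breadth-first crawl of one host, storing each page as a table row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT, body TEXT)"
    )
    host = urlparse(start_url).netloc
    queue, seen, stored = [start_url], {start_url}, 0
    while queue and stored < max_pages:
        url = queue.pop(0)
        try:
            html = fetch(url)
        except OSError:  # network errors (URLError is an OSError): skip page
            continue
        parser = PageParser()
        parser.feed(html)
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, title, body) VALUES (?, ?, ?)",
            (url, parser.title.strip(), " ".join(parser.text)),
        )
        stored += 1
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # absolute, no fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    conn.commit()
    return stored
```

Calling `crawl("http://example.com/", sqlite3.connect("pages.db"))` would fill the table with one row per page found on that host. With the same table kept in MySQL instead, a Sphinx SQL source could presumably index it with a `sql_query` along the lines of `SELECT id, url, title, body FROM pages`.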