Infovore » Cross-model searching in Rails with Ferret

If you’re building a Rails application with search, chances are you’ve run across acts_as_ferret. And if you haven’t, check it out – it allows any model to be searchable by Ferret, the Ruby port of Lucene. Ferret’s a pretty nifty search engine – reasonably fast, pretty accurate – and it’s nice to be able to use it so simply in your Rails app.

Of course, what makes acts_as_ferret really handy is that it’s neatly designed to perform multi-model searches. Or, rather, the interface to do so is there, you just have to glue the results together. After some rough stumbling about, here’s what I came up with.

Firstly: don’t use find_by_contents; it’s not much use here. It’s a wrapper around find_id_by_contents, which is the more useful method because it returns only three things for each match: a relative score, a model name, and a model id. That means you can merge the matches from every model into one big list, and then perform individual queries on the relevant models.

So how did I implement this?

My first stab was this (borrowing some real code from what I’m working on):


articles = Article.find_id_by_contents(query)
authors = Author.find_id_by_contents(query)
matches = (articles + authors).sort_by {|match| match[:score] }

This gives you an array of model/score/id hashes, which you can then do lookups on. The problem is that it’s not very DRY, and it requires editing every time you add a new model that’s Ferretable. Here’s my more final solution; this time, I’ll walk you through each step.

First, somewhere like environment.rb, define a constant array:

FERRETABLE_MODELS = %w[Article Author]

That means we can update things at a later date easily. Then, in your search controller action, start with this:

klasses = FERRETABLE_MODELS.collect {|klass| Kernel.const_get klass}

That should give you an array containing the Article and Author classes – or whatever ActiveRecord classes you’ve named in your constant array. Next, let’s get the array that we arrived at last time:
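If the const_get lookup looks a bit magic, here's a minimal sketch you can run outside Rails. The Article and Author classes are empty stand-ins I've made up for the demo, not real ActiveRecord models.

```ruby
# Hypothetical stand-ins for the real ActiveRecord models.
class Article; end
class Author; end

FERRETABLE_MODELS = %w[Article Author]

# Turn each class name (a String) into the class constant it names.
klasses = FERRETABLE_MODELS.collect { |name| Kernel.const_get(name) }

p klasses
```

Object.const_get(name) should do the same job for top-level classes, if you prefer the look of it.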

matches = klasses.inject([]) {|out, klass| out << klass.find_id_by_contents(query)}.flatten

Bit more complex, but also more succinct. All this does is iterate over each klass, using an inject method with an empty array passed in, and tells each class to call find_id_by_contents, passing in the query. It then flattens that lot, so that the array is only one-deep. We're now where we were before.
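Here's a rough sketch of that inject-and-flatten step on its own, with no Ferret involved. FakeModel is a made-up stand-in that returns canned score/model/id hashes, and the ids and scores are invented for the demo.

```ruby
# Hypothetical stand-in for a Ferret-enabled model: it returns canned
# score/model/id hashes instead of querying a real index.
class FakeModel
  def initialize(name, hits)
    @name = name
    @hits = hits
  end

  def find_id_by_contents(query)
    @hits.collect { |id, score| { :model => @name, :id => id, :score => score } }
  end
end

klasses = [
  FakeModel.new("Article", [[1, 0.9], [2, 0.4]]),
  FakeModel.new("Author",  [[7, 0.6]])
]
query = "ferret"

# Same shape as the line above: collect one array per model, then flatten.
matches = klasses.inject([]) { |out, klass| out << klass.find_id_by_contents(query) }.flatten

# flatten only unpacks the nested arrays; the hashes inside survive intact.
matches.each { |match| puts match.inspect }
```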

Finally: let's generate an array of the actual objects we referred to, sorted by ranking. I'm going to generate an array of hashes. Each hash has two keys: :object, the actual data object we want; and :score, the rank that ferret assigned it. We get that out like so:


results = matches.collect {|match|
  {
    :score => match[:score],
    :object => Kernel.const_get(match[:model]).find(match[:id])
  }
}.sort_by {|o| o[:score]}.reverse

Again, possibly a bit ugly. :score remains the same; we run find on the appropriate ActiveRecord model, passing in the appropriate id to obtain the :object. Finally, we sort the array by :score and flip it, so that results[0] is the highest-scoring match. Obviously, you can pass extra parameters into that "find" method.
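To see the lookup-and-sort step run end to end without a database, here's a small sketch. The Article and Author classes and the Record struct are hypothetical stand-ins whose find just echoes the id back, and the matches are made up.

```ruby
# Hypothetical stand-ins for ActiveRecord models: find(id) just returns a
# small struct rather than hitting a database.
Record = Struct.new(:kind, :id)

class Article
  def self.find(id)
    Record.new("Article", id)
  end
end

class Author
  def self.find(id)
    Record.new("Author", id)
  end
end

# Made-up matches in the score/model/id shape described above.
matches = [
  { :model => "Article", :id => 1, :score => 0.4 },
  { :model => "Author",  :id => 7, :score => 0.9 },
  { :model => "Article", :id => 2, :score => 0.6 }
]

# Look up each object by model name and id, then sort highest score first.
results = matches.collect { |match|
  {
    :score  => match[:score],
    :object => Kernel.const_get(match[:model]).find(match[:id])
  }
}.sort_by { |o| o[:score] }.reverse

results.each { |r| puts "#{r[:score]} #{r[:object].kind} id=#{r[:object].id}" }
```

Bear in mind this runs one find per match, so a long result list means a lot of queries.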

All that remains is to display that lot in your view, perhaps paginate it, and build a conditional to determine how to display each kind of object.
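For the conditional, something along these lines might do as a starting point. It's only a sketch: ArticleStub and AuthorStub are hypothetical stand-ins for the real models, and summary_for is a helper name I've invented, not anything Rails or acts_as_ferret provides.

```ruby
# Hypothetical record types standing in for the real models.
ArticleStub = Struct.new(:title)
AuthorStub  = Struct.new(:name)

# Pick a one-line summary based on the class of the object in each result.
def summary_for(result)
  object = result[:object]
  case object
  when ArticleStub then "Article: #{object.title}"
  when AuthorStub  then "Author: #{object.name}"
  else "Unknown: #{object.inspect}"
  end
end

results = [
  { :score => 0.9, :object => AuthorStub.new("Tom") },
  { :score => 0.6, :object => ArticleStub.new("Searching with Ferret") }
]

results.each { |result| puts summary_for(result) }
```

In a real view you'd more likely render a different partial per class, but the branching is the same idea.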

And that's it. I made a few changes to my dummy code when typing this up, so if something's broken, tell me and I'll fix it. I think that's a more maintainable way of searching across a range of models with ferret, and it takes advantage of some useful Ruby dynamics - finding objects through Kernel.const_get, in particular. That's my Ruby fun for today, then.

Update

My colleague Ben proposes this much tidier (but untested) solution:


results = []

FERRETABLE_MODELS.each  do |klass|
  k = Kernel.const_get klass
  k.find_id_by_contents(query).each do |m|
    results.push({
      :score => m[:score],
      :object => k.find(m[:id])
    })
  end
end

results = results.sort{ |a,b| b[:score] <=> a[:score] }

I like the nested loop much more - should have thought of that myself - and will admit to being lazy wrt the sort_by and .reverse trick.
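One thing to watch: a commenter below reports that find_id_by_contents returns a [total, results] pair rather than a flat array. I haven't checked which versions do which, so here's a defensive sketch that copes with either shape. extract_hits is a hypothetical helper of my own, not part of acts_as_ferret.

```ruby
# Hypothetical helper: accept either a flat array of match hashes, or the
# [total, matches] pair that a commenter reports seeing.
def extract_hits(raw)
  if raw.size == 2 && raw.first.is_a?(Integer) && raw.last.is_a?(Array)
    raw.last
  else
    raw
  end
end

# Made-up return values in each of the two shapes.
flat   = [{ :model => "Article", :id => 1, :score => 0.9 }]
paired = [1, [{ :model => "Article", :id => 1, :score => 0.9 }]]

p extract_hits(flat) == extract_hits(paired)
```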

6 comments on this entry.

Matt Biddulph | 23 Sep 2006

Looks good… I’ve been meaning to try out ferret and will do so soon.

Worth keeping in mind: the acts_as_ferret RDoc for find_id_by_contents says “Note that the scores retrieved this way aren’t normalized across indexes, so that the order of results after sorting by score will differ from the order you would get when running the same query on a single index” (source: http://projects.jkraemer.net/acts_as_ferret/rdoc/classes/FerretMixin/Acts/ARFerret/ClassMethods.html#M000015)

Simon Jones | 8 Apr 2008

(Website due up soon, that’s what I’m working on)

Thanks for the excellent tips, been really helpful!

Tried out the code suggested by your colleague and there is just one change that’s required. Because find_id_by_contents returns an array with the total results, followed by the actual results, like [5, [] ] you need to change the line:

k.find_id_by_contents(query).each do

to

k.find_id_by_contents(query).last.each do

Tom | 8 Apr 2008

Hey Simon, glad you found it useful. I’m not sure that I’d recommend Ferret for production use any more, based on other people’s experiences to date. If you’re interested in a fast, efficient, and powerful search engine for your Rails application, I can recommend Sphinx in combination with the excellent Ultrasphinx plugin. The performance boost is quite remarkable.

Nasir | 24 Apr 2008

I think Tom is right. It was OK in production when the data on SPhred wasn’t that huge, but as traffic (and thus data) increased, it started giving errors very often, more specifically like this one:

File Not Found Error occured at :117 in xpop_context Error occured in fs_store.c:329 – fs_open_input tried to open “/path_to/index/production/item/_br_1.del” but it doesn’t exist:

Our users think it is some coding issue though it is just ferret messing up.

Now I am looking for options but haven’t decided on anything yet.

Is Sphinx / Ultrasphinx reliable enough? Any idea on Solr?

Nasir | 29 Apr 2008

A follow up to my previous comment:

So I implemented Sphinx with Ultrasphinx on http://www.SPhred.com a couple of days back, and so far I haven’t received any error notification emails, as opposed to receiving over 10 every day due to indexing problems when I was using Ferret.

As of now, Sphinx looks better compared to Ferret :)

Tom Armitage | 29 Apr 2008

That’s great news, Nasir. Glad to know it’s an improvement.

24 Sep 2006

Trackback: Infovore : More on multi-model search with acts_as_ferret

22 September 2006
