Friday, September 19, 2008

Boolean Searches with Sphinx

Well, we now have a nice little search system for our entity data. But what if the user wants to search for "a" OR "b"? What we have so far doesn't handle that. So let's make the modifications we need to make that work.

All I have to say is it's a good thing that the RailsCast on Sphinx addressed this, because I couldn't find this in the thinking_sphinx online docs anywhere. Maybe I just missed it, but I've looked 20 times and just don't see it. (Don't think I'm knocking thinking_sphinx in any way. It's still awesome!)

Anyway, what we'll need to do is modify our search call in our home_controller.rb file. The magic word is :match_mode. So here's the new search line in the controller:

@entities = Entity.search params[:search],
  :page => params[:page],
  :per_page => 10,
  :order => "lastname ASC, firstname ASC",
  :match_mode => :boolean


Notice I've set the match mode to :boolean. This will allow us to do boolean searches. So what is a boolean search? Well, for one thing, it allows us to search for "a" OR "b". However, thinking_sphinx doesn't use the word OR, it uses a pipe (|) symbol. Hmm...I don't think my users will want to do this. I know I'm used to putting the word OR in most of the searches I do. So we'll have to do something about that later. Boolean searches also allow us to use AND, NOT and grouping operators. Those would be &, ! or -, and (). Again, not something most of my users will be able to/like to deal with. (Note that there are other match modes, but I'm not going to be using those. For further information on those, look here.)

So what do we do to allow our end user to use boolean syntax they are familiar with (assuming they're not programmers)? We have to do some string replacement in our code. For simplicity's sake right now, I'm just going to put this in my controller. I'll probably refactor it out later because I don't think that's really the appropriate place for it. Course, I'm a n00b, so what do I know?

Here's what I did. There may be better ways, and maybe I'll stumble on to those later, but for now, this should give us something to work with.

class HomeController < ApplicationController
  def index
    # params[:search] is nil when no search has been submitted, so guard the gsub! calls
    if params[:search]
      params[:search].gsub!(/ or /i, " | ")
      params[:search].gsub!(/ and /i, " & ")
      # non-greedy (.*?) so two quoted phrases aren't merged into one group
      params[:search].gsub!(/"(.*?)"/, '(\1)')
      params[:search].gsub!(/'(.*?)'/, '(\1)')
    end
    @entities = Entity.search params[:search],
      :page => params[:page],
      :per_page => 10,
      :order => "lastname ASC, firstname ASC",
      :match_mode => :boolean
  end
end


This makes the necessary replacements in the search string to allow the user to use OR, AND and single- or double-quoted strings in their search. Make sure you include the spaces around the "or" and "and" in your gsub calls! You don't want to be replacing "or" and "and" inside search terms! (Yes, I made that mistake.) The /i makes the replacement case-insensitive. This is important because you don't know if the user will type "or" or "OR". Also, notice that we're using gsub! instead of just gsub. This is so that params[:search] is modified directly. If we had used gsub instead, we'd have to assign the result of each line to a variable, because gsub returns a new string and doesn't modify the string it's called on.
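If you'd rather not do the replacements inline, here's a minimal sketch of the same idea as a standalone method. The name to_sphinx_boolean is made up for illustration (it isn't part of thinking_sphinx). It returns a new string instead of changing its argument, and it assumes a single quote should only count as grouping when it starts at a word boundary, so a name like O'Brien is left alone.

```ruby
# Hypothetical helper (NOT part of thinking_sphinx): turns a user-friendly
# query into Sphinx boolean syntax and returns a new string.
def to_sphinx_boolean(query)
  result = query.to_s                                      # nil becomes ""
  result = result.gsub(/ or /i, " | ")                     # a or b  -> a | b
  result = result.gsub(/ and /i, " & ")                    # a and b -> a & b
  result = result.gsub(/"(.*?)"/, '(\1)')                  # "a b"   -> (a b)
  # Assumption: only treat '...' as a group when the opening quote follows
  # whitespace (or starts the string) and the closing quote ends a word.
  result = result.gsub(/(^|\s)'(.*?)'(?=\s|$)/, '\1(\2)')  # 'a b'   -> (a b)
  result
end

puts to_sphinx_boolean('"john smith" or "jane doe"')  # (john smith) | (jane doe)
puts to_sphinx_boolean("O'Brien or O'Neil")           # O'Brien | O'Neil
```

The non-greedy (.*?) matters as soon as a query holds two quoted phrases: with a greedy .* the first and last quote would be paired up and everything in between would land in one group.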

So now I'm jazzed! The only thing that bugs me is that when I put a or b in my search box and click the "Search" button, when the search results come back, my search box shows a | b instead of the text I put in. While technically correct, I could see this confusing the tar out of end users. Not a big deal, but attention to detail is important! If I figure this one out, I'll let you know.
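One possible way around this, as a rough sketch only: leave params[:search] alone and build the Sphinx query in a separate variable with gsub (no bang). This assumes the search box is refilled from params[:search]; the plain hash below just stands in for the Rails params so the snippet runs on its own.

```ruby
# Stand-in for the Rails params hash, for illustration only.
params = { :search => "smith or jones" }

# gsub (no bang) returns a new string and leaves params[:search] untouched.
sphinx_query = params[:search].to_s.gsub(/ or /i, " | ").gsub(/ and /i, " & ")

puts params[:search]  # smith or jones
puts sphinx_query     # smith | jones
```

In the controller, sphinx_query would then be what gets passed to Entity.search, while the view keeps showing the text the user actually typed.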

Now I'm going to eat lunch before I fall over on my keyboard!

See ya next post!

Chris

Thursday, September 18, 2008

Sorting Sphinx

So we've gotten our results out and have them paginated nicely. That's spiffy, but I sure would like to see them sorted. So how do we do that?

The first thing we need to do is to tell Sphinx which fields are sortable. To do this, we'll add :sortable => true to the indexes in our entity model. So now the indexes look like this:

define_index do
  indexes firstname, :sortable => true
  indexes lastname, :sortable => true
end


Now we'll need to rebuild the indexes. Make sure that you shut down the Sphinx server if it's running or things won't work right (I think there's actually a way to rebuild the indexes on the fly without shutting the server down, but I'll have to investigate that later). To rebuild the indexes, we open the cmd window and type

rake ts:index

And then restart the server:

rake ts:start

Now, we'll just need to tell our controller to sort the results. In the home_controller.rb, the search now looks like this:

@entities = Entity.search params[:search],
  :page => params[:page],
  :per_page => 10,
  :order => "lastname ASC, firstname ASC"


You'll notice that in the :order, we've included the ASC for each field we want to sort by. This is not normally necessary in a standard Rails .find command. However, Sphinx is a little quirky and requires this. If it's not there, Sphinx won't return any results. You can't imagine how long it took me to figure this out. Of course, if I'd read the thinking_sphinx instructions, I would have seen this and saved myself a lot of frustration. Stupid me!
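To avoid forgetting the direction on one of the fields, the :order string could be built by a tiny helper. This is only a sketch; sphinx_order is a made-up name, not something thinking_sphinx provides.

```ruby
# Hypothetical helper: builds an :order string with an explicit
# direction on every field, since a bare field name gave no results.
def sphinx_order(fields, direction = "ASC")
  fields.map { |field| "#{field} #{direction}" }.join(", ")
end

puts sphinx_order(%w[lastname firstname])  # lastname ASC, firstname ASC
```

The result can be handed straight to the search call as :order => sphinx_order(%w[lastname firstname]).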

Anyway, if we now reload our home page, we should see sorted results. Woot!

Next time, I'll be trying to figure out how to do some advanced searching with our simple search box.

See ya next post!

Chris

Paginating Sphinx

In the last post, you'll remember that I mentioned the fact that Thinking_Sphinx automatically works with will_paginate. So this time, we're going to explore how to get pagination working with our spiffy new Sphinx searching.

The first thing we'll need to do is to get will_paginate. Will_paginate has been around for a while, but has recently (relatively speaking) been moved to GitHub. Brilliant move IMHO. However, its name also changed to mislav-will_paginate. If you don't already have the old will_paginate gem installed, that's not relevant to you. If you do, you'll want to remove the old will_paginate before installing this one (unless, of course, removing it will dork up your other projects, in which case you're on your own. Sorry).

Let's head over to the (very nicely done) will_paginate installation instructions page. You'll notice there are several options for installing it. I'm going to use "Installing the Gem manually". You go with whatever floats your boat.

So I open my cmd window and type:

gem sources -a http://gems.github.com

because I don't have github in my gem sources. I shouldn't ever have to do this for this machine again. And good thing, because that was just brutal...

Anyway, now I type:

gem install mislav-will_paginate

Now the gem is installed. Simple goodness...

Open the project\config\environment.rb file and add the following line AFTER the end of the initializer block (if you put it inside the block, it won't work!)
require "will_paginate"


I'm pretty sure at this point, we need to restart whatever rails server we're using (I'm using WEBrick for testing). If not, just call me stupid and move on!

Now we're all set up to use pagination. This should be pretty simple (even for me)...

The pagination stuff will be added in my views/home/index.html.erb. Just put the following line wherever you want the pagination links to show up (I'm putting mine at the bottom, so there):
<%= will_paginate @entities %>


Now, if we reload the page (and we have multiple pages of data in the result), we'll see the pagination links. If you don't see any, you probably don't have enough data. I'd recommend watching the RailsCast I mentioned in my first post to get some data in your tables. Or you can be masochistic and put it in by hand. Go ahead, we'll wait...

Ok, so now we see the pagination links, right? Just click on one of those links and...WTH???? I keep getting the same data back regardless of which page link I select! Well, that's because I haven't had my coffee today and forgot an important little change. We've got to tell the controller to give us the next page, duh. So in the controllers/home_controller.rb, in the index method, I'm going to make the search command look like this:
@entities = Entity.search params[:search],
  :page => params[:page]


Now if we click on the pagination links, they should work.

The one thing I'm not liking on this is that it's showing me 20 entities per page. I only want to see 10. So one more addition to the controllers/home_controller.rb:
@entities = Entity.search params[:search],
  :page => params[:page],
  :per_page => 10
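For anyone wondering what :page and :per_page actually select, here's the arithmetic as a small sketch. It's conceptual only: page_range is a made-up name, and this isn't a copy of how will_paginate or thinking_sphinx do it internally.

```ruby
# Conceptual only: which result positions (counting from 1) a page covers.
def page_range(page, per_page)
  page = [page.to_i, 1].max           # nil or "" (no page param yet) means page 1
  first = (page - 1) * per_page + 1   # first result shown on this page
  last = page * per_page              # last result shown on this page
  first..last
end

p page_range(nil, 10)  # 1..10
p page_range("3", 10)  # 21..30
```

Note that params[:page] arrives as a string (or nil on the first load), which is why the sketch converts it with to_i.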


Time for a celebratory beer...

See ya next post!

Chris

Working with Thinking_Sphinx

So now that Sphinx and Thinking_Sphinx are installed (right?), we can get on with actually using this stuff.

If you missed the post on installing Thinking_Sphinx, you can read it here. The post about installing Sphinx itself is here. And if you haven't yet watched the RailsCast about this, you need to do that!

Note that there are other Sphinx plugins for RoR, but based on all the stuff I've read on the web, I've chosen to use Thinking_Sphinx. We'll see how that decision pans out over the next few posts.

The first thing we need to do is generate the configuration file that Sphinx will use. To do that, I'll open the cmd window, navigate to the root of my project and type

rake thinking_sphinx:configure

This should put out a file under our project\config directory named development.sphinx.conf. If that file isn't there, we've messed something up. The cmd output for the rake task should point to the issue. At this point, I'll assume that we've gotten this file created successfully.

Next, we'll need to specify some indexes in our models. Let's assume that we have an Entity model with the fields "firstname" and "lastname" (which I do). In our entity.rb model file, we'll add the following code:


entity.rb

class Entity < ActiveRecord::Base
  define_index do
    indexes firstname
    indexes lastname
  end
end



This should allow Sphinx to index the firstname and lastname fields of the entities table.

Now we'll need to generate the indexes. So we open the cmd window and type:

rake ts:index

This could take a few minutes if the dataset is very large. When it's done, it should have created some files in the project\db\sphinx\development directory. Mine are named entity_core.s*. Note that it would be a good idea to actually have the tables indexed in MySQL as Sphinx uses those indexes to retrieve data. From the docs it doesn't appear necessary, but will make the searching faster.

It's time to start the Sphinx service. In the cmd window we type:

rake ts:start

The Sphinx server should now be started. Now we can set up a search page and see if it works.

For simplicity, because we're just testing concepts here, I've set up a Home controller with its associated parts and pieces. The Home index page is my root page. I've put the search form in this page.

Here's a pastie of the relevant code. (Yes, I'd like to show this code inline here, but I haven't figured that out yet. I'd appreciate any hints!)

You'll notice it's all very simple, basic stuff. I haven't even broken a sweat yet!

If we load the home page and type something into the search box and click the "Search" button, we're greeted with the search results. Rock on! This worked like a champ and wasn't very difficult at all. RoR kicks arse!

Next, we'll be dealing with pagination. Thinking_Sphinx automatically works with the will_paginate plugin. Here's the RailsCast on that one. Bone up and we'll muddle through that next time.

See ya next post!

Chris

Installing Sphinx on Windows

So now we come to the part of installing Sphinx on Windows. It turns out that it was not terribly difficult, but I had a hard time finding instructions on the web, so I'll post my steps here.

1. I downloaded Sphinx from the official Sphinx download site (I downloaded Win32 release binaries with MySQL support: sphinx-0.9.8-win32.zip).

2. I unzipped the file into a temp folder.

3. I copied all of the files from the \sphinx-0.9.8-win32\bin directory into C:\Dev\Sphinx (you can put them in the directory of your choice).

4. I added C:\Dev\Sphinx to my Windows PATH

And Sphinx is installed. Nice, simple, easy.
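A quick way to double-check step 4 is to look for the Sphinx binaries on the PATH from Ruby. This is just a sketch: missing_sphinx_binaries is a made-up name, and it assumes the two binaries that matter are indexer and searchd (with or without the .exe extension).

```ruby
# Hypothetical check: returns the names of the Sphinx binaries that are
# NOT found in any directory of the given PATH string.
def missing_sphinx_binaries(path = ENV["PATH"].to_s, names = %w[indexer searchd])
  dirs = path.split(File::PATH_SEPARATOR)   # ";" on Windows, ":" elsewhere
  names.reject do |name|
    dirs.any? do |dir|
      File.file?(File.join(dir, name)) || File.file?(File.join(dir, "#{name}.exe"))
    end
  end
end

missing = missing_sphinx_binaries
puts missing.empty? ? "Sphinx binaries found on PATH" : "Not on PATH: #{missing.join(', ')}"
```

If it reports anything missing, open a new cmd window first, since a window that was already open won't see the changed PATH.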

In the next post, I'll see if I can actually set up some indexes and search with Thinking_Sphinx.

See ya next post!

Chris

Wednesday, September 17, 2008

Getting Git and Thinking Sphinx

As I was looking into what I needed to do to use Sphinx in our project, I noticed that the Thinking Sphinx plugin uses Git. I don't have Git. So I'll now be getting Git. (yeah, yeah. it sounds funny. Big laugh. Stop it.)

This poses a particular problem for me because I am using a (**gasp**) Windows machine (quit laughing). Hey, right now it's buy a Mac or feed my kids. The kids win.

It turns out that there is a project called msysgit that is basically Git for Windows. So I'm downloading the installer from here. I'm hoping this is what I need to be able to install the Thinking Sphinx plugin.

Stand by...

Stand by some more...

Ok. I executed the .exe file and installed Git. I chose the option to run from the Windows cmd window (seemed logical to me) so that Git would be in my path. Installation was a snap! I've been officially forked.

Now I've opened my cmd window, navigated to the root directory of my project (this is important!) and typed in the command to install the Thinking Sphinx plugin:

ruby script/plugin install git://github.com/freelancing-god/thinking-sphinx.git

Here comes the big moment...trepidation...anxiety...the shakes..."Enter"...Woohoo! I now have the Thinking Sphinx plugin in my vendor/plugins directory under my project. That wasn't too bad at all. Next time, I'll be downloading and installing Sphinx itself. Call the comedy club, 'cause that should be a riot!

See ya next post!

Chris

Tuesday, September 16, 2008

Here we go!

So I just started a large RoR project and thought I would set up this blog to share my frustrations and triumphs in case anyone else starting out in the Ruby world might benefit.

To be honest, this isn't completely magnanimous. I needed a place to log stuff that I've done so I don't forget it. But I figured a blog would be better than a notebook because someone else might learn something, too.

The project is essentially a Contact Management System (pretty dry stuff I know). But it's a large enough project with enough features that I thought it would be a good foray into this world and give me lots of material for a blog series.

Our first challenge is making sure that the system's search features are very fast. We're talking about large amounts of data here (millions of records across multiple tables). The users will need to be able to search for terms across multiple fields in those tables and it will have to be fast!

So my first step is to generate large amounts of realistic test data. Not a very pleasant prospect to be sure. Generating realistic names, addresses, phone numbers, etc. is not simple. Or so I thought. After starting to monkey around with creating a ruby library to generate this stuff for me (without a lot of progress), I stumbled on this railscast about setting up realistic test data. Wow! Not only is this my favorite rails site, but this railscast was posted the day after I started trying to do this! How did he know? Ryan Bates does an awesome job explaining this stuff. I would highly recommend that all RoR n00bs watch every episode!

I installed the two gems and followed the simple instructions and voilà! My data is populated. It took me just a couple of hours to get a complex database set up with lots of realistic-looking data. And that included setting up all of my models! (I know, RoR masters probably could have done it faster, but I was jazzed!) Very, Very Cool!

Next, I'm going to be experimenting with the search engines. Thinking Sphinx looks very promising for what we're doing. I'll post about my experience with that in the next blog. In the meantime, I'd recommend watching this railscast about Thinking Sphinx (RailsCasts again).

See ya next post!

Chris