SphinxSearch and Django on Ubuntu

Some time ago I needed to add search to my project. I tested several search engines, such as Whoosh and Xapian. I didn't test Solr: the configuration is terrible and it's Java ;-) In the end I chose SphinxSearch.

Installation (Ubuntu 12.04).

sudo apt-get install sphinxsearch

A shortened version of the model (wiki.models) used in this example:

from django.contrib.auth.models import User
from django.db import models


class Page(models.Model):
    name = models.CharField(max_length=255, unique=True, blank=False,
                            null=False)
    content = models.TextField(blank=False, null=False)
    description = models.CharField(max_length=255, blank=False, null=False)
    created = models.DateTimeField(auto_now_add=True, editable=False,
                                   blank=False, null=False)
    edited = models.DateTimeField(auto_now=True, blank=False, null=False)
    editor = models.ForeignKey(User, on_delete=models.CASCADE,
                               blank=False, null=False)
    changes = models.CharField(max_length=255, blank=False, null=False)

We need to create the configuration file.

cd /etc/sphinxsearch/
sudo vim sphinx.conf

Here is my sample configuration.

source wiki_source
{
type = pgsql
sql_host = 127.0.0.1
sql_user = eshlox
sql_pass = PASSWORD
sql_db = helper
sql_port = 5432
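# sql_attr_timestamp expects a UNIX timestamp, so convert the PostgreSQL timestamp
sql_query = SELECT id, EXTRACT(EPOCH FROM edited)::int AS edited, name, content, description FROM wiki_page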
sql_attr_timestamp = edited
sql_query_info = SELECT * FROM wiki_page WHERE id=$id
}
index wiki_index
{
source = wiki_source
path = /home/eshlox/sphinxsearch/helper/data/wiki_index
docinfo = extern
charset_type = utf-8
html_strip = 1
enable_star = 1
min_prefix_len = 2
}
index wiki_rt
{
type = rt
rt_mem_limit = 32M
path = /home/eshlox/sphinxsearch/helper/data/wiki_rt
charset_type = utf-8
rt_field = description
rt_field = content
rt_attr_uint = gid
}
indexer
{
mem_limit = 32M
}
searchd
{
listen = 9312
log = /var/log/sphinxsearch/searchd.log
query_log = /var/log/sphinxsearch/query.log
read_timeout = 5
max_children = 30
pid_file = /var/run/sphinxsearch/searchd.pid
max_matches = 1000
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
workers = threads # for RT to work
binlog_path = /home/eshlox/sphinxsearch/helper/data
compat_sphinxql_magics = 0
}

Create the necessary directories.

mkdir -p ~/sphinxsearch/helper/data
sudo chown -R sphinxsearch:sphinxsearch ~/sphinxsearch/

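Start SphinxSearch. On Ubuntu the init script only starts searchd when START=yes is set in /etc/default/sphinxsearch, so change that first.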

sudo service sphinxsearch start

Run the indexer.

sudo indexer --all

Some tests. The indexer used only one CPU core.

eshlox@helper ~ $ cat /proc/cpuinfo | grep -i "model name"
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
eshlox@helper ~ $ cat /proc/meminfo | grep -i memtotal
MemTotal: 6115396 kB

Indexing everything (178 records in the database):

helper eshlox # /usr/bin/indexer --all --rotate
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinx/sphinx.conf'...
indexing index 'wiki_index'...
collected 179 docs, 0.3 MB
sorted 0.3 Mhits, 100.0% done
total 179 docs, 333768 bytes
total 0.145 sec, 2287288 bytes/sec, 1226.67 docs/sec
skipping non-plain index 'wiki_rt'...
total 3 reads, 0.001 sec, 312.5 kb/call avg, 0.5 msec/call avg
total 9 writes, 0.005 sec, 222.9 kb/call avg, 0.5 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=11809).

Second test, on another machine:

eshlox@eshlox:~/projects/helper.multimedia.pl/project$ cat /proc/cpuinfo | grep -i "model name"
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
eshlox@eshlox:~/projects/helper.multimedia.pl/project$ cat /proc/meminfo | grep -i memtotal
MemTotal: 8090528 kB

Indexing everything (100000 records in the database):

eshlox@eshlox:/etc/sphinxsearch$ sudo /usr/bin/indexer --all --rotate
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'wiki_index'...
collected 100000 docs, 157.4 MB
sorted 133.2 Mhits, 100.0% done
total 100000 docs, 157396234 bytes
total 64.419 sec, 2443309 bytes/sec, 1552.33 docs/sec
skipping non-plain index 'wiki_rt'...
total 865 reads, 0.113 sec, 573.9 kb/call avg, 0.1 msec/call avg
total 1027 writes, 0.744 sec, 994.6 kb/call avg, 0.7 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=11834).

Searching from the command line:

eshlox@eshlox:~$ search quo
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinxsearch/sphinx.conf'...
index 'wiki_index': query 'quo ': returned 1000 matches of 55443 total in 0.034 sec
displaying matches:
1. document=299, weight=3493, edited=Thu Jan 1 01:33:32 1970
2. document=506, weight=3493, edited=Thu Jan 1 01:33:32 1970
3. document=629, weight=3493, edited=Thu Jan 1 01:33:32 1970
4. document=679, weight=3493, edited=Thu Jan 1 01:33:32 1970
5. document=834, weight=3493, edited=Thu Jan 1 01:33:32 1970
6. document=1390, weight=3493, edited=Thu Jan 1 01:33:32 1970
7. document=1503, weight=3493, edited=Thu Jan 1 01:33:32 1970
8. document=1897, weight=3493, edited=Thu Jan 1 01:33:32 1970
9. document=2307, weight=3493, edited=Thu Jan 1 01:33:32 1970
10. document=2512, weight=3493, edited=Thu Jan 1 01:33:32 1970
11. document=2591, weight=3493, edited=Thu Jan 1 01:33:32 1970
12. document=2789, weight=3493, edited=Thu Jan 1 01:33:32 1970
13. document=2982, weight=3493, edited=Thu Jan 1 01:33:32 1970
14. document=3897, weight=3493, edited=Thu Jan 1 01:33:32 1970
15. document=3987, weight=3493, edited=Thu Jan 1 01:33:32 1970
16. document=4275, weight=3493, edited=Thu Jan 1 01:33:32 1970
17. document=4489, weight=3493, edited=Thu Jan 1 01:33:32 1970
18. document=4871, weight=3493, edited=Thu Jan 1 01:33:32 1970
19. document=5245, weight=3493, edited=Thu Jan 1 01:33:32 1970
20. document=5413, weight=3493, edited=Thu Jan 1 01:33:32 1970
words:
1. 'quo': 55443 documents, 82096 hits
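Every edited value above reads Thu Jan 1 01:33:32 1970. A plausible explanation (my assumption, but the numbers fit exactly): sql_query passes the raw PostgreSQL timestamp text, Sphinx keeps only its leading digits, i.e. the year 2012, and 2012 seconds after the epoch is 00:33:32 UTC, printed as 01:33:32 in UTC+1. Selecting EXTRACT(EPOCH FROM edited)::int AS edited in sql_query sends a real UNIX timestamp instead. The sketch below only reproduces the arithmetic; leading_uint is a hypothetical helper mimicking the assumed strtoul-style parsing (not a Sphinx function), the timestamp string is made up, and UTC+1 is an assumed local time zone.

```python
import re
from datetime import datetime, timedelta, timezone


def leading_uint(text):
    """Hypothetical helper: keep only the leading digits of a string, the way
    a C-style strtoul() conversion would (assumed Sphinx behaviour)."""
    match = re.match(r"\d+", text)
    return int(match.group()) if match else 0


raw = "2012-10-05 14:21:09"         # made-up PostgreSQL timestamp text
parsed = leading_uint(raw)          # only the year survives: 2012
utc_plus_1 = timezone(timedelta(hours=1))  # assumed local time zone
shown = datetime.fromtimestamp(parsed, utc_plus_1)

# What EXTRACT(EPOCH FROM edited)::int would send instead (raw read as UTC).
proper = int(datetime(2012, 10, 5, 14, 21, 9, tzinfo=timezone.utc).timestamp())

print(parsed)                                  # 2012
print(shown.strftime("%a %b %d %H:%M:%S %Y"))  # Thu Jan 01 01:33:32 1970
print(proper)
```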

To keep the index up to date, run the indexer periodically from cron or trigger it from a model signal (for example post_save).
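A minimal sketch of the signal approach, assuming the indexer binary is on the PATH of the Django process and that this process is allowed to write the index files and signal searchd (the commands above use sudo, so permissions may need adjusting). indexer_command and page_saved are hypothetical names; --config and --rotate are regular indexer options. Rebuilding the whole index on every save is cheap for the 179-document test above (0.145 sec) but took about 64 sec for 100000 documents, so for large tables cron is the safer choice.

```python
import subprocess

# Assumed config location (the Ubuntu package path used above).
SPHINX_CONFIG = "/etc/sphinxsearch/sphinx.conf"


def indexer_command(index="wiki_index", config=SPHINX_CONFIG):
    """Build the indexer call that rebuilds one plain index.

    --rotate builds the new index files next to the old ones and then sends
    SIGHUP to searchd, so searching keeps working while indexing runs.
    """
    return ["indexer", "--config", config, "--rotate", index]


def page_saved(sender, instance, **kwargs):
    """Hypothetical post_save receiver: start a reindex without waiting."""
    # Popen returns immediately, so saving a page does not block on indexing.
    subprocess.Popen(indexer_command())
```

In the wiki app it could be connected with post_save.connect(page_saved, sender=Page), using post_save from django.db.models.signals. For the cron variant, a crontab entry running the same command does the job.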

Sample usage in Django.

First, we need the SphinxSearch Python API: copy sphinxapi.py from the api/ directory of the SphinxSearch source package to a directory on your PYTHONPATH.
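SphinxClient.Query() returns a plain dict, or None when the query fails (the reason is then available from GetLastError()). The sketch below pulls out what the search view further down needs. summarize_result is a hypothetical helper, and the sample dict is hand-made, trimmed to the keys used here, with numbers mirroring the command-line search above. Note that 'total' is the number of matches actually retrieved (capped by max_matches = 1000), while 'total_found' is the overall number of matching documents.

```python
def summarize_result(result):
    """Return (document ids, retrieved count, found count) from a Query() result.

    Raises ValueError when Query() returned None (searchd reported an error).
    """
    if result is None:
        raise ValueError("Sphinx query failed")
    ids = [match["id"] for match in result["matches"]]
    return ids, result["total"], result["total_found"]


# Hand-made sample shaped like a Query() result, trimmed to the keys used above.
sample = {
    "matches": [
        {"id": 299, "weight": 3493, "attrs": {"edited": 2012}},
        {"id": 506, "weight": 3493, "attrs": {"edited": 2012}},
    ],
    "total": 1000,
    "total_found": 55443,
}

ids, retrieved, found = summarize_result(sample)
print(ids, retrieved, found)  # [299, 506] 1000 55443
```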

Forms.

# coding=utf-8
from django import forms


class SearchForm(forms.Form):
    q = forms.CharField(max_length=255)

Urls.

from django.conf.urls import patterns, url
from wiki.views import search

urlpatterns = patterns('',
    ...
    url(r'^search/$', search, name="wiki-search"),
)

Simple view.

# coding=utf-8
import logging
from sphinxapi import SphinxClient
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger
from django.shortcuts import render
from django.contrib import messages
from wiki.models import Page
from wiki.forms import SearchForm

logger = logging.getLogger('helper')


def search(request):
    if request.GET:
        form = SearchForm(request.GET)
        query = request.GET.get('q', '')
        s = SphinxClient()
        s.SetServer('localhost', 9312)
        s.SetLimits(0, 16777215)
        if s.Status():
            query_results = s.Query(query)
        else:
            query_results = None
        # Query() returns None when searchd reports an error
        if query_results is not None:
            total = query_results['total']
            pages_id = [page['id'] for page in query_results['matches']]
            if pages_id:
                results = Page.objects.filter(id__in=pages_id)
            else:
                results = None
            if results:
                paginator = Paginator(results, 25)
                page = request.GET.get('page')
                try:
                    results = paginator.page(page)
                except PageNotAnInteger:
                    results = paginator.page(1)
                except EmptyPage:
                    results = paginator.page(paginator.num_pages)
            return render(request, 'wiki/search.html',
                          {'results': results, 'total': total,
                           'query': query, 'form': form})
        else:
            logger.error('Sphinxsearch Error! %s', s.GetLastError())
            messages.add_message(request, messages.ERROR,
                                 'Search server is not responding. '
                                 'Administrator has been informed.')
            form = SearchForm()
            return render(request, 'wiki/search.html', {'form': form})
    else:
        form = SearchForm()
        return render(request, 'wiki/search.html', {'form': form})
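One thing the view above loses: Page.objects.filter(id__in=pages_id) returns pages in database order, not in Sphinx's relevance order. A small sketch of re-sorting them, assuming the objects expose an id attribute; order_like_sphinx is a hypothetical helper, and a namedtuple stands in for Page so the example runs without Django.

```python
from collections import namedtuple


def order_like_sphinx(objects, ids):
    """Return objects sorted by the position of their id in `ids`.

    `ids` holds the document ids in the order Sphinx returned them
    (best match first); objects whose id is not in `ids` are dropped.
    """
    position = {doc_id: rank for rank, doc_id in enumerate(ids)}
    kept = [obj for obj in objects if obj.id in position]
    return sorted(kept, key=lambda obj: position[obj.id])


# Stand-in for Page rows as the database might return them (ordered by id).
FakePage = namedtuple("FakePage", "id name")
rows = [FakePage(299, "a"), FakePage(506, "b"), FakePage(629, "c")]
sphinx_ids = [629, 299, 506]

print([p.id for p in order_like_sphinx(rows, sphinx_ids)])  # [629, 299, 506]
```

In the view this would become results = order_like_sphinx(Page.objects.filter(id__in=pages_id), pages_id). Paginator also accepts a plain list, so the rest of the view stays the same.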

SphinxSearch really does a great job. Read the documentation to learn how to speed it up and use additional options.

Edit: below is the template I used in the project (wiki/search.html), with most of the unnecessary parts removed. The add_class and attr filters come from django-widget-tweaks.

{% load widget_tweaks %}
<form method="get" class="well form-search form-inline center">
{% for f in form %}
{{ f|add_class:"input-xxlarge search-query"|attr:"placeholder:Enter a search term..." }}
{% endfor %}
<button class="btn" type="submit">Search</button>
</form>
{% if query %}
<p>Your search for "{{ query }}" returned {{ total }} results.</p>
<hr />
{% for result in results.object_list %}
<p class="top20">&bull; <a href="{{ result.get_absolute_url }}">{{ result.name|wikiname }}</a></p>
<p>{{ result.description }}</p>
{% empty %}
<div class="alert-message info">
<p>No results.</p>
</div>
{% endfor %}
{% if results %}
<div class="pager">
<ul>
{% if results.has_previous %}
<li><a href="?q={{ query|urlencode }}&amp;page={{ results.previous_page_number }}">&larr; previous page</a></li>
{% endif %}
<li>Page {{ results.number }} of {{ results.paginator.num_pages }}</li>
{% if results.has_next %}
<li><a href="?q={{ query|urlencode }}&amp;page={{ results.next_page_number }}">next page &rarr;</a></li>
{% endif %}
</ul>
</div>
{% endif %}
{% endif %}