SphinxSearch and Django on Ubuntu

Some time ago I needed to add search to my project. I tested several search engines, such as Whoosh and Xapian. I didn't test Solr: its configuration is terrible and it's Java ;-) In the end I chose SphinxSearch.

Installation (Ubuntu 12.04).

sudo apt-get install sphinxsearch

In short, this is the model (from wiki.models) used in this example.

from django.contrib.auth.models import User
from django.db import models


class Page(models.Model):
    name = models.CharField(max_length=255, unique=True, blank=False,
                            null=False)
    content = models.TextField(blank=False, null=False)
    description = models.CharField(max_length=255, blank=False, null=False)
    created = models.DateTimeField(auto_now_add=True, editable=False,
                                   blank=False, null=False)
    edited = models.DateTimeField(auto_now=True, blank=False, null=False)
    editor = models.ForeignKey(User, blank=False, null=False)
    changes = models.CharField(max_length=255, blank=False, null=False)

We need to create the configuration file.

cd /etc/sphinxsearch/
vim sphinx.conf

Here is my sample configuration.

source wiki_source
{
    type = pgsql
    sql_host = 127.0.0.1
    sql_user = eshlox
    sql_pass = PASSWORD
    sql_db = helper
    sql_port = 5432
    sql_query = SELECT id, edited, name, content, description FROM wiki_page
    sql_attr_timestamp = edited
    sql_query_info = SELECT * FROM wiki_page WHERE id=$id
}

index wiki_index
{
    source = wiki_source
    path = /home/eshlox/sphinxsearch/helper/data/wiki_index
    docinfo = extern
    charset_type = utf-8
    html_strip = 1
    enable_star = 1
    min_prefix_len = 2
}

index wiki_rt
{
    type = rt
    rt_mem_limit = 32M
    path = /home/eshlox/sphinxsearch/helper/data/wiki_rt
    charset_type = utf-8
    rt_field = description
    rt_field = content
    rt_attr_uint = gid
}

indexer
{
    mem_limit = 32M
}

searchd
{
    listen = 9312
    log = /var/log/sphinxsearch/searchd.log
    query_log = /var/log/sphinxsearch/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /var/run/sphinxsearch/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old = 1
    workers = threads # for RT to work
    binlog_path = /home/eshlox/sphinxsearch/helper/data
    compat_sphinxql_magics = 0
}

Create the necessary directories.

~$ mkdir sphinxsearch
~$ mkdir sphinxsearch/helper
~$ mkdir sphinxsearch/helper/data
~$ sudo chown -R sphinxsearch:sphinxsearch sphinxsearch/

Start SphinxSearch.

sudo service sphinxsearch start
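
After starting the service, it can be handy to confirm that searchd really accepts connections before going further. The snippet below is only a minimal sketch: it assumes searchd runs on the same machine and listens on port 9312, as in the `listen` directive above, and the helper name `port_is_open` is made up for this example. It checks the TCP port only, not that the process behind it is Sphinx.

```python
import socket


def port_is_open(host="127.0.0.1", port=9312, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (e.g. connection refused,
        # timeout) when nothing is accepting connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("searchd reachable:", port_is_open())
```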

Run the indexer.

sudo indexer --all

Some benchmarks. Only one core was used.

~ $ cat /proc/cpuinfo | grep -i "model name"
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
model name      : Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz

~ $ cat /proc/meminfo | grep -i memtotal
MemTotal:        6115396 kB

Indexing everything, with 178 records in the database.

helper eshlox # /usr/bin/indexer --all --rotate
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinx/sphinx.conf'...
indexing index 'wiki_index'...
collected 179 docs, 0.3 MB
sorted 0.3 Mhits, 100.0% done
total 179 docs, 333768 bytes
total 0.145 sec, 2287288 bytes/sec, 1226.67 docs/sec
skipping non-plain index 'wiki_rt'...
total 3 reads, 0.001 sec, 312.5 kb/call avg, 0.5 msec/call avg
total 9 writes, 0.005 sec, 222.9 kb/call avg, 0.5 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=11809).

Second test.

~/projects/helper.multimedia.pl/project$ cat /proc/cpuinfo | grep -i "model name"
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name      : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz

~/projects/helper.multimedia.pl/project$ cat /proc/meminfo | grep -i memtotal
MemTotal:        8090528 kB

Indexing everything, with 100000 records in the database.

/etc/sphinxsearch$ sudo /usr/bin/indexer --all --rotate
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'wiki_index'...
collected 100000 docs, 157.4 MB
sorted 133.2 Mhits, 100.0% done
total 100000 docs, 157396234 bytes
total 64.419 sec, 2443309 bytes/sec, 1552.33 docs/sec
skipping non-plain index 'wiki_rt'...
total 865 reads, 0.113 sec, 573.9 kb/call avg, 0.1 msec/call avg
total 1027 writes, 0.744 sec, 994.6 kb/call avg, 0.7 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=11834).

Search.

~$ search quo
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
index 'wiki_index': query 'quo ': returned 1000 matches of 55443 total in 0.034 sec

displaying matches:
1. document=299, weight=3493, edited=Thu Jan  1 01:33:32 1970
2. document=506, weight=3493, edited=Thu Jan  1 01:33:32 1970
3. document=629, weight=3493, edited=Thu Jan  1 01:33:32 1970
4. document=679, weight=3493, edited=Thu Jan  1 01:33:32 1970
5. document=834, weight=3493, edited=Thu Jan  1 01:33:32 1970
6. document=1390, weight=3493, edited=Thu Jan  1 01:33:32 1970
7. document=1503, weight=3493, edited=Thu Jan  1 01:33:32 1970
8. document=1897, weight=3493, edited=Thu Jan  1 01:33:32 1970
9. document=2307, weight=3493, edited=Thu Jan  1 01:33:32 1970
10. document=2512, weight=3493, edited=Thu Jan  1 01:33:32 1970
11. document=2591, weight=3493, edited=Thu Jan  1 01:33:32 1970
12. document=2789, weight=3493, edited=Thu Jan  1 01:33:32 1970
13. document=2982, weight=3493, edited=Thu Jan  1 01:33:32 1970
14. document=3897, weight=3493, edited=Thu Jan  1 01:33:32 1970
15. document=3987, weight=3493, edited=Thu Jan  1 01:33:32 1970
16. document=4275, weight=3493, edited=Thu Jan  1 01:33:32 1970
17. document=4489, weight=3493, edited=Thu Jan  1 01:33:32 1970
18. document=4871, weight=3493, edited=Thu Jan  1 01:33:32 1970
19. document=5245, weight=3493, edited=Thu Jan  1 01:33:32 1970
20. document=5413, weight=3493, edited=Thu Jan  1 01:33:32 1970

words:
1. 'quo': 55443 documents, 82096 hits
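
A note on the `edited` column in the output above: every match shows Jan 1 1970, 01:33:32. A plausible explanation, offered here as an assumption and not verified on this setup, is that `sql_attr_timestamp` expects an integer Unix timestamp, while the query hands it a PostgreSQL timestamp rendered as text, so only the leading digits (the year) end up stored. The sketch below just does the arithmetic; the sample value, the helper names and the UTC+1 offset are all made up for illustration.

```python
import re
from datetime import datetime, timedelta, timezone


def leading_integer(text):
    """Parse only the leading digits of a string (assumed behaviour)."""
    match = re.match(r"\s*(\d+)", text)
    return int(match.group(1)) if match else 0


def render_timestamp(seconds, utc_offset_hours=1):
    """Format a Unix timestamp in a fixed-offset time zone."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d %H:%M:%S")


raw = "2012-06-15 10:20:30"       # hypothetical value of wiki_page.edited
seconds = leading_integer(raw)    # only the year survives
shown = render_timestamp(seconds)

print(seconds, shown)
```

If that is indeed the cause, selecting `EXTRACT(EPOCH FROM edited)::integer AS edited` in `sql_query` instead of the raw column would be one way to give Sphinx a real Unix timestamp.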

The indexer can be run from cron or triggered from a model signal.
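
As a rough sketch of the signal approach, the helper below builds the indexer command line and runs it with `subprocess`. The function names, the default indexer path and the receiver in the comments are assumptions made for this example, not code from the project.

```python
import subprocess


def build_indexer_command(index_name="wiki_index", rotate=True,
                          indexer_path="/usr/bin/indexer"):
    """Return the argument list for reindexing a single index."""
    command = [indexer_path]
    if rotate:
        # --rotate asks the running searchd to pick up the new index files.
        command.append("--rotate")
    command.append(index_name)
    return command


def reindex(index_name="wiki_index"):
    """Run the indexer and return its exit code (0 means success)."""
    return subprocess.call(build_indexer_command(index_name))


# Hypothetical wiring in wiki/models.py (needs Django, so it is not run here):
#
#   from django.db.models.signals import post_save
#
#   def reindex_on_save(sender, **kwargs):
#       reindex()
#
#   post_save.connect(reindex_on_save, sender=Page)
```

Keep in mind that the process calling `reindex()` needs write access to the index directory, and that a full reindex on every save gets expensive as the table grows, so cron may be the better fit for large data sets.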

Sample usage in Django.

First, we need the SphinxSearch API. Copy sphinxapi.py from the SphinxSearch source package to a directory on your PYTHONPATH.

Forms.

# coding=utf-8
from django import forms


class SearchForm(forms.Form):
    q = forms.CharField(max_length=255)

Urls.

from django.conf.urls import patterns, url
from views import search


urlpatterns = patterns('',
    ...
    url(r'^search/', search, name="wiki-search")
)

Simple view.

```python
# coding=utf-8

import logging

from sphinxapi import SphinxClient
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger
from django.shortcuts import render
from django.contrib import messages

from wiki.models import Page
from wiki.forms import SearchForm


def search(request):
    if request.GET:
        form = SearchForm(request.GET)
        query = request.GET.get('q', '')
        s = SphinxClient()
        s.SetServer('localhost', 9312)
        s.SetLimits(0, 16777215)
        if s.Status():
            query_results = s.Query(query)
            total = query_results['total']
            pages_id = [page['id'] for page in query_results['matches']]
            if pages_id:
                results = Page.objects.filter(id__in=pages_id)
            else:
                results = None
            if results:
                paginator = Paginator(results, 25)
                page = request.GET.get('page')
                try:
                    results = paginator.page(page)
                except PageNotAnInteger:
                    results = paginator.page(1)
                except EmptyPage:
                    results = paginator.page(paginator.num_pages)
            return render(request, 'wiki/search.html',
                          {'results': results, 'total': total,
                           'query': query, 'form': form})
        else:
            logger = logging.getLogger('helper')
            logger.error('Sphinxsearch Error! %s' % s.GetLastError())
            messages.add_message(request, messages.ERROR,
                                 'Search server is '
                                 'not responding. Administrator '
                                 'has been informed.')
            form = SearchForm()
            return render(request, 'wiki/search.html', {'form': form})
    else:
        form = SearchForm()
        return render(request, 'wiki/search.html', {'form': form})
```


SphinxSearch really does a great job. Read the documentation to learn how to speed it up and enable additional options.
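
One detail worth knowing about the view: `Page.objects.filter(id__in=pages_id)` does not promise to return rows in the order of `pages_id`, so the ranking computed by Sphinx can get lost on the way to the template. A small helper like the one below could restore it. This is only a sketch; the name `order_by_ids` and the `Row` stand-in for the `Page` model are made up for this example.

```python
from collections import namedtuple


def order_by_ids(objects, ids, key=lambda obj: obj.id):
    """Return objects sorted to follow the order of ids.

    Ids with no matching object are skipped.
    """
    by_id = {key(obj): obj for obj in objects}
    return [by_id[i] for i in ids if i in by_id]


# Stand-in for Page instances, so the example runs without Django.
Row = namedtuple("Row", "id name")
rows = [Row(1, "a"), Row(2, "b"), Row(3, "c")]

# Pretend Sphinx ranked document 3 first, then 1, then 2.
ordered = order_by_ids(rows, [3, 1, 2])
print([row.name for row in ordered])
```

In the view this would be called as `order_by_ids(Page.objects.filter(id__in=pages_id), pages_id)`, and the resulting list can be passed to `Paginator` as before.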

Edit. Below is the template that I used in the project (wiki/search.html). I removed most of the unnecessary parts.

```django
<form method="get" class="well form-search form-inline center">
    {% for f in form %}
        {{ f|add_class:"input-xxlarge search-query"|attr:"placeholder:Enter a search term.." }}
        <button class="btn" type="submit">Search</button>
    {% endfor %}
</form>
{% if query %}
    <p>Your search for "{{query}}" returned results: {{total}}</p>
    <hr />
    {% for result in results.object_list %}
        <p class="top20">&bull; <a href="{{ result.get_absolute_url }}">{{ result.name|wikiname }}</a></p>
        <p>{{ result.description }}</p>
    {% empty %}
        <div class="alert-message info">
            <p>No results.</p>
        </div>
    {% endfor %}
    {% if results %}
            <div class="pager">
                <ul>
                    {% if results.has_previous %}
                        <li><a href="?q={{ query }}&amp;page={{ results.previous_page_number }}">&larr; previous page</a></li>
                    {% endif %}
                    <li>Page {{ results.number }} of {{ results.paginator.num_pages }}</li>
                    {% if results.has_next %}
                        <li><a href="?q={{ query }}&amp;page={{ results.next_page_number }}">next page &rarr;</a></li>
                    {% endif %}
                </ul>
            </div>
    {% endif %}
{% endif %}
```