Some time ago I needed to add search to my project. I tested several search engines, such as Whoosh and Xapian. I didn’t test Solr: the configuration is terrible and it’s Java ;-) I chose SphinxSearch.
Installation. (Ubuntu 12.04)
sudo apt-get install sphinxsearch
In short, this is the model (wiki.models) used for this example.
class Page(models.Model):
    name = models.CharField(max_length=255, unique=True, blank=False, null=False)
    content = models.TextField(blank=False, null=False)
    description = models.CharField(max_length=255, blank=False, null=False)
    created = models.DateTimeField(auto_now_add=True, editable=False, blank=False, null=False)
    edited = models.DateTimeField(auto_now=True, blank=False, null=False)
    editor = models.ForeignKey(User, blank=False, null=False)
    changes = models.CharField(max_length=255, blank=False, null=False)
We need to create the configuration file.
cd /etc/sphinxsearch/
vim sphinx.conf
Here is my sample configuration.
source wiki_source
{
    type = pgsql
    sql_host = 127.0.0.1
    sql_user = eshlox
    sql_pass = PASSWORD
    sql_db = helper
    sql_port = 5432
    sql_query = SELECT id, edited, name, content, description FROM wiki_page
    sql_attr_timestamp = edited
    sql_query_info = SELECT * FROM wiki_page WHERE id=$id
}

index wiki_index
{
    source = wiki_source
    path = /home/eshlox/sphinxsearch/helper/data/wiki_index
    docinfo = extern
    charset_type = utf-8
    html_strip = 1
    enable_star = 1
    min_prefix_len = 2
}

index wiki_rt
{
    type = rt
    rt_mem_limit = 32M
    path = /home/eshlox/sphinxsearch/helper/data/wiki_rt
    charset_type = utf-8
    rt_field = description
    rt_field = content
    rt_attr_uint = gid
}

indexer
{
    mem_limit = 32M
}

searchd
{
    listen = 9312
    log = /var/log/sphinxsearch/searchd.log
    query_log = /var/log/sphinxsearch/query.log
    read_timeout = 5
    max_children = 30
    pid_file = /var/run/sphinxsearch/searchd.pid
    max_matches = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old = 1
    workers = threads # for RT to work
    binlog_path = /home/eshlox/sphinxsearch/helper/data
    compat_sphinxql_magics = 0
}
Create the necessary directories.
eshlox@eshlox:~$ mkdir sphinxsearch
eshlox@eshlox:~$ mkdir sphinxsearch/helper
eshlox@eshlox:~$ mkdir sphinxsearch/helper/data
chown -R sphinxsearch:sphinxsearch sphinxsearch/
Start SphinxSearch.
sudo service sphinxsearch start
Run indexer.
sudo indexer --all
Some tests. Only one core was used.
eshlox@helper ~ $ cat /proc/cpuinfo | grep -i "model name"
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz

eshlox@helper ~ $ cat /proc/meminfo | grep -i memtotal
MemTotal: 6115396 kB
Index all - 178 records in database.
helper eshlox # /usr/bin/indexer --all --rotate
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinx/sphinx.conf'...
indexing index 'wiki_index'...
collected 179 docs, 0.3 MB
sorted 0.3 Mhits, 100.0% done
total 179 docs, 333768 bytes
total 0.145 sec, 2287288 bytes/sec, 1226.67 docs/sec
skipping non-plain index 'wiki_rt'...
total 3 reads, 0.001 sec, 312.5 kb/call avg, 0.5 msec/call avg
total 9 writes, 0.005 sec, 222.9 kb/call avg, 0.5 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=11809).
Second test.
eshlox@eshlox:~/projects/helper.multimedia.pl/project$ cat /proc/cpuinfo | grep -i "model name"
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz

eshlox@eshlox:~/projects/helper.multimedia.pl/project$ cat /proc/meminfo | grep -i memtotal
MemTotal: 8090528 kB
Index all - 100000 records in database.
eshlox@eshlox:/etc/sphinxsearch$ sudo /usr/bin/indexer --all --rotate
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'wiki_index'...
collected 100000 docs, 157.4 MB
sorted 133.2 Mhits, 100.0% done
total 100000 docs, 157396234 bytes
total 64.419 sec, 2443309 bytes/sec, 1552.33 docs/sec
skipping non-plain index 'wiki_rt'...
total 865 reads, 0.113 sec, 573.9 kb/call avg, 0.1 msec/call avg
total 1027 writes, 0.744 sec, 994.6 kb/call avg, 0.7 msec/call avg
rotating indices: succesfully sent SIGHUP to searchd (pid=11834).
Search.
eshlox@eshlox:~$ search quo
Sphinx 2.0.4-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinxsearch/sphinx.conf'...
index 'wiki_index': query 'quo ': returned 1000 matches of 55443 total in 0.034 sec

displaying matches:
1. document=299, weight=3493, edited=Thu Jan 1 01:33:32 1970
2. document=506, weight=3493, edited=Thu Jan 1 01:33:32 1970
3. document=629, weight=3493, edited=Thu Jan 1 01:33:32 1970
4. document=679, weight=3493, edited=Thu Jan 1 01:33:32 1970
5. document=834, weight=3493, edited=Thu Jan 1 01:33:32 1970
6. document=1390, weight=3493, edited=Thu Jan 1 01:33:32 1970
7. document=1503, weight=3493, edited=Thu Jan 1 01:33:32 1970
8. document=1897, weight=3493, edited=Thu Jan 1 01:33:32 1970
9. document=2307, weight=3493, edited=Thu Jan 1 01:33:32 1970
10. document=2512, weight=3493, edited=Thu Jan 1 01:33:32 1970
11. document=2591, weight=3493, edited=Thu Jan 1 01:33:32 1970
12. document=2789, weight=3493, edited=Thu Jan 1 01:33:32 1970
13. document=2982, weight=3493, edited=Thu Jan 1 01:33:32 1970
14. document=3897, weight=3493, edited=Thu Jan 1 01:33:32 1970
15. document=3987, weight=3493, edited=Thu Jan 1 01:33:32 1970
16. document=4275, weight=3493, edited=Thu Jan 1 01:33:32 1970
17. document=4489, weight=3493, edited=Thu Jan 1 01:33:32 1970
18. document=4871, weight=3493, edited=Thu Jan 1 01:33:32 1970
19. document=5245, weight=3493, edited=Thu Jan 1 01:33:32 1970
20. document=5413, weight=3493, edited=Thu Jan 1 01:33:32 1970

words:
1. 'quo': 55443 documents, 82096 hits
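A side note on the output above: every match reports edited as 01:33:32 on 1 January 1970, which is 2012 seconds after the epoch if the machine runs at UTC+1. That offset is my assumption, not something the output states. One plausible explanation, and it is only a guess, is that Sphinx received the PostgreSQL timestamp as text such as "2012-..." and kept just the leading number, because sql_attr_timestamp expects an integer UNIX timestamp. If that is the cause, converting in sql_query (for example with EXTRACT(EPOCH FROM edited)::integer AS edited) should give real dates. The small sketch below only checks the arithmetic; the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def clock_time_for(unix_seconds, utc_offset_hours=1):
    """Return (year, 'HH:MM:SS') for a UNIX timestamp at a fixed UTC offset.

    utc_offset_hours=1 is an assumption about the server's timezone.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    moment = datetime.fromtimestamp(unix_seconds, tz=tz)
    return moment.year, moment.strftime('%H:%M:%S')


# 2012 seconds after the epoch, seen from UTC+1
print(clock_time_for(2012))
```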
The indexer can be scheduled with cron or triggered from a model signal.
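Either way it comes down to running the same indexer command used in the tests above. Here is a minimal sketch of a helper that builds and runs it; the function names are hypothetical, and it assumes the indexer binary is on PATH and that the calling user is allowed to run it (I used sudo above). A cron entry could call the command directly, for example `*/15 * * * * /usr/bin/indexer --all --rotate`, where the 15-minute schedule is just an illustration.

```python
import subprocess


def build_indexer_command(index=None, rotate=True, binary='indexer'):
    """Build the argv list for the Sphinx indexer.

    index=None means "rebuild everything" (--all); otherwise only the
    named index is rebuilt. rotate=True adds --rotate so that a running
    searchd picks up the new index files.
    """
    cmd = [binary]
    if rotate:
        cmd.append('--rotate')
    cmd.append(index if index else '--all')
    return cmd


def run_indexer(index=None, rotate=True):
    """Run the indexer and return its exit code (0 means success)."""
    return subprocess.call(build_indexer_command(index, rotate))
```

From Django, a post_save handler for Page could call run_indexer('wiki_index'). Keep in mind that a full rebuild on every save gets expensive as the table grows (the 100000-record test above took about a minute), so cron is probably the safer default.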
Sample usage in Django.
First, we need the SphinxSearch API. Copy sphinxapi.py from the SphinxSearch source package to a directory on your PYTHONPATH.
Forms.
# coding=utf-8
from django import forms


class SearchForm(forms.Form):
    q = forms.CharField(max_length=255)
Urls.
from django.conf.urls import patterns, url
from views import search

urlpatterns = patterns('',
    ...
    url(r'^search/', search, name="wiki-search")
)
Simple view.
# coding=utf-8
import logging
from sphinxapi import SphinxClient
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger
from django.shortcuts import render
from django.contrib import messages
from wiki.models import Page
from wiki.forms import SearchForm


def search(request):
    if request.GET:
        form = SearchForm(request.GET)
        query = request.GET.get('q', '')
        # Connect to searchd on the port set in sphinx.conf (listen = 9312).
        s = SphinxClient()
        s.SetServer('localhost', 9312)
        s.SetLimits(0, 16777215)
        if s.Status():
            # searchd is reachable: run the query and collect the document ids.
            query_results = s.Query(query)
            total = query_results['total']
            pages_id = [page['id'] for page in query_results['matches']]
            if pages_id:
                results = Page.objects.filter(id__in=pages_id)
            else:
                results = None
            if results:
                # Paginate the matching pages, 25 per page.
                paginator = Paginator(results, 25)
                page = request.GET.get('page')
                try:
                    results = paginator.page(page)
                except PageNotAnInteger:
                    results = paginator.page(1)
                except EmptyPage:
                    results = paginator.page(paginator.num_pages)
            return render(request, 'wiki/search.html', {'results': results,
                                                        'total': total,
                                                        'query': query,
                                                        'form': form})
        else:
            # searchd did not answer: log the error and tell the user.
            logger = logging.getLogger('helper')
            logger.error('Sphinxsearch Error! %s' % s.GetLastError())
            messages.add_message(request, messages.ERROR, 'Search server is '
                                 'not responding. Administrator '
                                 'has been informed.')
            form = SearchForm()
            return render(request, 'wiki/search.html', {'form': form})
    else:
        form = SearchForm()
        return render(request, 'wiki/search.html', {'form': form})
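Two things in this view are worth tightening. First, Page.objects.filter(id__in=pages_id) does not promise to return rows in the order of pages_id, so the relevance ranking computed by Sphinx can get lost. Second, as far as I know SphinxClient.Query() returns None when the query fails, and query_results['total'] would then raise. Below is a minimal sketch of two helpers that deal with both cases; the helper names are mine and are not part of sphinxapi or Django.

```python
def extract_ids(query_results):
    """Return (ids, total) from a sphinxapi-style result dict.

    A failed query (None or an empty result) yields ([], 0) instead of
    raising, so the view can simply render "No results.".
    """
    if not query_results:
        return [], 0
    ids = [match['id'] for match in query_results.get('matches', [])]
    return ids, query_results.get('total', 0)


def order_by_ids(objects, ids, key=lambda obj: obj.id):
    """Reorder objects so they follow the order of ids.

    Ids with no matching object (for example rows deleted after the
    last indexer run) are skipped.
    """
    by_id = {key(obj): obj for obj in objects}
    return [by_id[i] for i in ids if i in by_id]
```

In the view this could be used as pages_id, total = extract_ids(s.Query(query)), followed by results = order_by_ids(Page.objects.filter(id__in=pages_id), pages_id). Paginator accepts a plain list, so the rest of the view stays the same.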
SphinxSearch really does a great job. Read the documentation to learn how to speed it up and enable additional options.
Edit. Below is the template that I used in the project (wiki/search.html). I removed most of the unnecessary things.
<form method="get" class="well form-search form-inline center">
  {% for f in form %}
    {{ f|add_class:"input-xxlarge search-query"|attr:"placeholder:Enter a search term.." }}
    <button class="btn" type="submit">Search</button>
  {% endfor %}
</form>
{% if query %}
  <p>Your search for "{{query}}" returned results: {{total}}</p>
  <hr />
  {% for result in results.object_list %}
    <p class="top20">• <a href="{{ result.get_absolute_url }}">{{ result.name|wikiname }}</a></p>
    <p>{{ result.description }}</p>
  {% empty %}
    <div class="alert-message info">
      <p>No results.</p>
    </div>
  {% endfor %}
  {% if results %}
    <div class="pager">
      <ul>
        {% if results.has_previous %}
          <li><a href="?q={{ query }}&page={{ results.previous_page_number }}">← previous page</a></li>
        {% endif %}
        <li>Page {{ results.number }} from {{ results.paginator.num_pages }}</li>
        {% if results.has_next %}
          <li><a href="?q={{ query }}&page={{ results.next_page_number }}">next page →</a></li>
        {% endif %}
      </ul>
    </div>
  {% endif %}
{% endif %}
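One caveat about the pager links: they put the query into the URL as-is, so a search term containing & or a space will produce a broken link. In the template, Django's built-in urlencode filter ({{ query|urlencode }}) should take care of that. The same idea in plain Python, as a small sketch with a hypothetical helper name:

```python
from urllib.parse import urlencode


def page_link(query, page):
    """Build the '?q=...&page=...' part of a pager link with safe escaping."""
    return '?' + urlencode([('q', query), ('page', page)])


print(page_link('a b&c', 2))
```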