After looking around for a search engine for Nightingale and comparing the features of the various candidates (from tsearch2 on Postgres to Sphinx to Solr to …), I’ve settled on Solr. Configuring and running Solr was much easier than I expected at the start. After about a day of hacking around, I had a nice tag browser running, with Ajax-ified searching through the tags.
A bit more hacking and I decided to write a mash-up of the existing Python Solr wrappers. SolPython and solr.py (the one that’s included with Solr) seemed very unpythonic and not very actively developed. PySolr, on the other hand, looked very nice, but there were some things in it I thought could be better. In particular, I wondered why the authors (two well-known Python/Django devs) parse Solr’s XML output rather than its JSON output. When you search in Solr, you can ask it to reply in one of several output formats. They chose XML; I rewrote the code to use the JSON output and made it possible to write and plug in more output parsers.
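To give an idea of what "use the JSON output with pluggable parsers" can look like, here is a minimal sketch. It is not the code from my wrapper: the names `PARSERS`, `build_select_url` and `decode_response`, and the base URL in the example, are made up for illustration. The only Solr facts it relies on are the `q` parameter and the `wt` ("writer type") parameter that selects the response format.

```python
import json
from urllib.parse import urlencode

# Hypothetical registry: maps a Solr "wt" value to a function that
# turns the raw response body into Python objects.
PARSERS = {
    "json": json.loads,
}


def build_select_url(base_url, query, wt="json"):
    """Build a /select URL that asks Solr for the given output format."""
    params = urlencode({"q": query, "wt": wt})
    return base_url.rstrip("/") + "/select?" + params


def decode_response(body, wt="json"):
    """Decode a response body with the parser registered for this format."""
    try:
        parser = PARSERS[wt]
    except KeyError:
        raise ValueError("no parser registered for wt=%r" % wt)
    return parser(body)


# Example with a hand-written body shaped like a Solr JSON response
# (no server is contacted here).
sample_body = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 1, "start": 0, "docs": [{"id": "42"}]}}'
)
url = build_select_url("http://localhost:8983/solr", "tag:piano")
result = decode_response(sample_body)
```

Supporting another format would then just be a matter of adding one more entry to the registry, for example a function that wraps an XML parser under the key `"xml"`.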
Neither solr.py nor PySolr has classes for wrapping the search parameters. After reading through the docs, I added a lightweight wrapper for a lot of the parameters.
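A parameter wrapper of this kind could look roughly like the sketch below. The class name `SolrQuery` and its methods are hypothetical and only illustrate the idea; the parameter names it emits (`q`, `start`, `rows`, `facet`, `facet.field`, `facet.limit`) are standard Solr request parameters.

```python
from urllib.parse import urlencode


class SolrQuery:
    """Hypothetical lightweight wrapper around common Solr search parameters."""

    def __init__(self, query="*:*", start=0, rows=10):
        self.query = query
        self.start = start
        self.rows = rows
        self.facet_fields = []
        self.facet_limit = None

    def facet_on(self, *fields, limit=None):
        """Request facet counts for the given fields; returns self for chaining."""
        self.facet_fields.extend(fields)
        if limit is not None:
            self.facet_limit = limit
        return self

    def as_params(self):
        """Return the query as a list of (name, value) pairs.

        A list is used instead of a dict because facet.field may repeat.
        """
        params = [("q", self.query), ("start", self.start), ("rows", self.rows)]
        if self.facet_fields:
            params.append(("facet", "true"))
            params.extend(("facet.field", name) for name in self.facet_fields)
            if self.facet_limit is not None:
                params.append(("facet.limit", self.facet_limit))
        return params

    def as_query_string(self):
        """URL-encode the parameters for use in a request."""
        return urlencode(self.as_params())


query = SolrQuery("piano", rows=5).facet_on("tag", "username", limit=10)
```

The point of such a wrapper is mostly to avoid typing raw parameter names everywhere and to keep the repeated `facet.field` entries in one place.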
We keep track of searches in Freesound so that we can “replay” them for testing purposes, and after a bit of testing I found out some interesting things. Using Solr with a relatively heavy set of output features (I want to see a lot of “faceting”), I tested a batch of 100K searches. It looks like I can run 50 queries per second on my MacBook Pro. As the set of documents in Freesound is relatively small (“only” 50K sounds), everything fits very nicely in a very small cache (only 128MB), including all the faceting data.
As before, this source code is open source, but since Xavier gave me the go-ahead, this one is BSD instead of GPL. I will continue to release “support code” under the BSD license.
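The replay idea boils down to looping over the logged queries and timing the whole batch. The sketch below shows that shape only; it is not the benchmark script linked further down. The `replay` function and the stand-in `fake_search` are hypothetical, and a real run would pass in a function that actually sends each query to Solr.

```python
import time


def replay(queries, search):
    """Run every logged query through `search` and report the throughput."""
    started = time.perf_counter()
    count = 0
    for query in queries:
        search(query)
        count += 1
    elapsed = time.perf_counter() - started
    # Guard against a zero-length interval on very small batches.
    qps = count / elapsed if elapsed > 0 else float("inf")
    return {"queries": count, "seconds": elapsed, "qps": qps}


def fake_search(query):
    """Stand-in for a real Solr request so the sketch runs without a server."""
    return {"q": query, "docs": []}


logged_queries = ["piano", "rain", "door slam"]
stats = replay(logged_queries, fake_search)
print("%(queries)d queries in %(seconds).4f s" % stats)
```

Timing the batch as a whole, rather than each query separately, keeps the measurement overhead small compared to the searches themselves.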
The code can be found here: http://iua-share.upf.edu/svn/nightingale/trunk/sandbox/solr/solr.py
The example code I used for benchmarking is here: http://iua-share.upf.edu/svn/nightingale/trunk/sandbox/solr/freesound_test.py
Python/Solr people, feel free to send me any feedback!