Freesound has been struggling to regain stability over the last few weeks, but we are happy to announce that we finally made it :) The problems started about a month ago, when we realized our DB configuration was completely wrong for our hardware and the kind of traffic we are serving. After fiddling around with it, we settled on a setup that seemed to work OK. Even so, the problems didn’t stop: the sounds page was 404ing at times, which resulted in the whole site 404ing. We’ve finally pinned this down to some horribly constructed queries our Django ORM was doing for us…
After we rewrote those queries, the load average of the DB server dropped dramatically, as you can see from this beautiful Munin graph, which shows the load average before (pre 18:00) and after (post 18:00) we deployed the version with the fixed queries 🙂
These changes made the site more stable, but our problems were far from over. The load average on our web server machine was through the roof, making it impossible for any sane sysadmin to sleep… While checking the web server logs we found repeated broken-connection errors… This time the culprit seemed to be the flup module that Django uses by default to serve FastCGI. All efforts to isolate the problem proved fruitless, so we finally decided to ditch flup and move to Gunicorn, which is a good idea in general considering how things are moving in the Python world. After a period of testing we deployed to production on the 3rd of November, and we are happy to announce that the results are really good! Have a look at the graph below. Load has dropped from peaking at 11! to peaking at 4!, and it stays at around 1 most of the time, which is expected and acceptable for the limited hardware we are running on.
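For readers curious what a move to Gunicorn involves, here is a minimal sketch of a Gunicorn configuration file. It is only an illustration: the address, worker count, timeout and recycling values are assumptions picked for the example, not the settings Freesound actually runs with.

```python
# gunicorn.conf.py: illustrative values only, not Freesound's real settings.
import multiprocessing

# Listen on a local port; a front-end web server is assumed to proxy to it.
bind = "127.0.0.1:8000"

# Common starting point suggested in the Gunicorn docs: (2 x cores) + 1.
workers = multiprocessing.cpu_count() * 2 + 1

# Restart a worker that stays silent for longer than this many seconds.
timeout = 30

# Recycle each worker after this many requests to contain slow memory leaks.
max_requests = 1000

# "-" sends the error log to stderr.
errorlog = "-"
```

With a current Gunicorn release, a file like this would typically be loaded with something like `gunicorn -c gunicorn.conf.py myproject.wsgi:application`, where `myproject` is a hypothetical Django project name.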
So, sorry for the long technical post, but we really felt we had to give an explanation for the poor service you’ve seen lately. With this stability we’re back to working on application bugs and, even better, new features 🙂