The Python Package Index (PyPI) uses the very cool SQLite engine to store the database of package information. SQLite is cool because it's so simple to use and self-contained. Unfortunately, it's not a multi-user database: it locks the whole database file rather than individual tables or rows, so one slow request can shut everyone else out. This caused PyPI some problems because ... well, PyPI is much more popular than I'd anticipated :)
After some brief analysis, I found:
- The RSS feed gets hit about every 30 seconds or so (on average)
- Some other PyPI page is hit at around the same frequency
- About one in three of those other hits is to the browse code, and the browse code was slow, taking up to 30 seconds to complete a request
Of course, this is all using averages, so during times of peak requests (i.e. lunchtime in the US ;) the rates are higher. The combination of many requests and slow code results in users seeing "sorry, the database is locked".
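For the curious, here's a minimal sketch of how that "database is locked" error comes about, using Python's standard sqlite3 module. This is not the PyPI code: the function name demo_locked, the packages table and the 0.1 second timeout are all made up for illustration. One connection holds an exclusive lock (standing in for a slow request), and a second connection gives up waiting.

```python
import os
import sqlite3
import tempfile


def demo_locked():
    """Return the error a second connection sees while the first holds the lock.

    Hypothetical demo, not PyPI's code. Returns None if no error occurred.
    """
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None means we manage transactions by hand.
    slow = sqlite3.connect(path, isolation_level=None)
    slow.execute("CREATE TABLE packages (name TEXT)")
    # Stand-in for a slow request: take the lock and sit on it.
    slow.execute("BEGIN EXCLUSIVE")
    slow.execute("INSERT INTO packages VALUES ('example')")

    # A second "request" that only waits 0.1 seconds for the lock (assumed value).
    other = sqlite3.connect(path, timeout=0.1)
    try:
        other.execute("SELECT name FROM packages").fetchall()
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        other.close()
        slow.execute("ROLLBACK")
        slow.close()


if __name__ == "__main__":
    print(demo_locked())
```

The longer the first connection holds its lock, the more likely it is that some other request runs out of patience, which is why a 30 second browse request hurts so much.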
To remedy this, I've:
- Cached the RSS feed, so it only rarely has to hit the database
- Significantly improved the speed (and accuracy while I was at it) of the browsing code
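The caching idea can be sketched roughly like this. It's only an illustration, not the actual PyPI change: the CachedFeed class, the build callback and the five minute max_age are assumptions I've picked for the example. The point is just that the database is only touched when the cached copy is too old.

```python
import time


class CachedFeed:
    """Keep the last generated feed and rebuild it only when it gets stale.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """

    def __init__(self, build, max_age=300.0, clock=time.monotonic):
        self._build = build        # callable that hits the database and returns the feed
        self._max_age = max_age    # seconds a cached copy stays fresh (assumed value)
        self._clock = clock        # injectable so the cache can be tested without sleeping
        self._body = None
        self._built_at = None

    def get(self):
        now = self._clock()
        stale = self._built_at is None or now - self._built_at >= self._max_age
        if stale:
            # Only this branch touches the database.
            self._body = self._build()
            self._built_at = now
        return self._body


if __name__ == "__main__":
    feed = CachedFeed(lambda: "<rss>...</rss>", max_age=300.0)
    print(feed.get())  # first call builds the feed
    print(feed.get())  # second call is served from the cache
```

With a feed that's requested every 30 seconds or so, even a short max_age turns most of those hits into cache reads that never go near the database.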
So hopefully things will run much smoother. Please, go kick the tyres and let me know if I've broken anything :)