requests-cache: OperationalError: database is locked with SQLite on PythonAnywhere

Recently I reduced the amount of embedded data my app requests from an API. This leads to smaller requests (and was necessary in some cases, as their endpoint would time out), but also to a lot of new small requests to fetch the extra information. These often end up being identical and can be triggered near-simultaneously.

In theory this wouldn’t be an issue, since all those extra calls would simply hit the cache (in fact, it should lead to fewer non-cached requests, which is a net gain for me). Instead, I suddenly find myself hitting a lot of sqlite3.OperationalError: database is locked, which I’m assuming is due to simultaneous write attempts.

I need my cache to work across web workers and store potentially very large responses (so no DynamoDB), and I don’t have access to Redis on my host. How should I go about resolving this? My session object:

from datetime import timedelta

from requests.adapters import HTTPAdapter
from requests_cache import CachedSession

session = CachedSession(
    expire_after=timedelta(days=1),
    allowable_codes=(200, 500),
    ignored_parameters=["status", "video-only", "embed"],
    backend=configs.cached_session_backend,  # None, so the default SQLite
    fast_save=True,
    use_temp=True)
session.mount("https://", HTTPAdapter(pool_maxsize=RATE_LIMIT + 1))  # 101

Relates to #514

Related question: I see there’s a new filesystem backend. How does it compare with SQLite in terms of performance (looking up hundreds if not thousands of cached requests) and size? The documentation doesn’t really explain why I’d want that over SQLite.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 27 (14 by maintainers)

Most upvoted comments

redislite was so easy to implement I feel dumb not trying it sooner… Props to you for making an interface that’s so easy to swap backends. And this way I can use a Redis backend on PythonAnywhere, and an SQLite one on my local machine https://github.com/Avasam/speedrun.com_global_scoreboard_webapp/commit/f8967a4baf7debada047a80897705ef5b4fd445a
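A minimal sketch of that swap, assuming redislite is installed (the file path below is illustrative, and passing a custom client via the Redis backend’s connection parameter is how requests-cache supports this):

```python
from datetime import timedelta

from redislite import Redis
from requests_cache import CachedSession
from requests_cache.backends import RedisCache

# redislite spins up an embedded Redis server backed by a local file,
# so no system Redis service is required (e.g. on PythonAnywhere).
backend = RedisCache(connection=Redis("/tmp/redis.db"))
session = CachedSession(expire_after=timedelta(days=1), backend=backend)
```

Keeping the backend choice behind a single variable (as in the linked commit) makes it trivial to use SQLite locally and redislite in production.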

It seems to be working great, and quickly, at first glance. I’ll go ahead and stress-test this.

This issue could be mitigated by using a nested directory structure based on the MD5 hash of the key

A fan-out file cache like you’re describing is definitely something I’d be willing to add (or accept a PR for), if there’s a need for it. That should be an additional backend, a subclass of the existing FileCache class rather than a replacement for it, since the current single-directory storage is a bit more convenient for the case where you want to manually view your response data.

The use case for that would basically be when you want to both:
  A) Use cached response data outside of requests-cache
  B) Efficiently store a large number of responses

If B is all you need, SQLite is still going to be faster.
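To illustrate the fan-out idea being discussed (the helper name and directory layout here are hypothetical, not part of requests-cache), the first bytes of the key’s MD5 hash can select nested subdirectories so no single directory holds too many files:

```python
import hashlib
from pathlib import Path


def fanout_path(cache_dir: str, key: str, levels: int = 2) -> Path:
    """Map a cache key to a nested path like <cache_dir>/ab/cd/<digest>,
    using successive hex byte pairs of the key's MD5 hash as subdirectories."""
    digest = hashlib.md5(key.encode()).hexdigest()
    subdirs = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return Path(cache_dir, *subdirs, digest)


path = fanout_path("http_cache", "GET https://example.com/api?page=1")
# Before writing, the parent directories would be created with:
# path.parent.mkdir(parents=True, exist_ok=True)
```

With two levels of 256 subdirectories each, a million responses average out to roughly 15 files per leaf directory.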

Edit: Made a separate issue for this, for reference: #604

0.8.1 is the last version that does not exhibit this behavior. It starts happening at 0.8.2dev0. I would like to reopen this issue as a bug now that I’m able to replicate it confidently. (Probably one of those changes in requests_cache/cache_keys.py 5a907bc…e9f425d)

Great!

redislite was so easy to implement I feel dumb not trying it sooner…

Well, we learned a bit more from doing things the hard way, eh? I added some notes to the SQLite backend docs based on this. Also added an example of using redislite, for others’ reference.

That feature was added very recently; see also https://github.com/reclosedev/requests-cache/issues/564#issuecomment-1090847370 for how to do it in older versions of requests-cache. WAL mode is persistent, so you could also enable it once with sqlite3 against the cache database and see if it helps.
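Since WAL mode is stored in the database file itself, enabling it once with the standard sqlite3 module is enough; the file name below is an assumption (requests-cache’s default is http_cache.sqlite, placed in the system temp directory when use_temp=True):

```python
import sqlite3

# Path is an assumption; point this at your actual cache database file
con = sqlite3.connect("http_cache.sqlite")
# The pragma returns the journal mode now in effect ("wal" on success)
mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
con.close()
```

All later connections to the same file, including those opened by requests-cache, will then use WAL journaling, which allows readers to proceed while a writer holds the database.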

Just released another patch today. It includes a few thread safety improvements that may or may not be related to your issue. Give v0.9.3 a try and let me know if you still get OperationalErrors.

In the meantime, when requests-cache works with requests-ratelimiter, does it take into account that a response may be served from the cache and never hit the API when applying the rate limit?

Exactly, it will only count requests sent over the network, not cache hits.

Alright, I’ll close this for now, but we can reopen it if it comes up again.