dj-stripe: Race conditions in Customer.get_or_create()
Since landing #440 I’ve been trying to figure out the race conditions in Customer.get_or_create().
The issue is the same as #429, except that in this case we don’t have a stripe ID to play with. Let’s look at the code:
@classmethod
def get_or_create(cls, subscriber, livemode=djstripe_settings.STRIPE_LIVE_MODE):
try:
return Customer.objects.get(subscriber=subscriber, livemode=livemode), False
except Customer.DoesNotExist:
return cls.create(subscriber), True
When this path is reached by two threads at the same time and the customer does not exist in the local db, both of them reach the last line, which creates the customer. The Stripe API, having no concept of our per-user unique, will happily create both of them. dj-stripe will crash on the second thread because the unique_together constraint fails… but the customer is still created upstream; not only are we in a desynced state now, but the stripe db is polluted with a useless and confusing object.
So, what to do? The correct acid-compliant fix is to create the Customer object in the db beforehand. But… we don’t have a stripe_id for it yet. We can save it with an empty stripe id, or a random one, but there is still a synchronicity problem: The second thread will be able to get the customer object, but the data on it is not current. That’s detectable and not a problem right? Well, no, because we dont have a stripe_id, we can’t sync it.
I’m a bit stumped. My first idea is we could create the Customer beforehand, attach the user id in the metadata and use that for synchronization? This is part of a larger overall issue of keeping customers in sync between downstream and upstream: If a customer is deleted locally, it’s impossible to retrieve the upstream object.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 47 (42 by maintainers)
Yeah, I considered select_for_update(). I ended up not using it simply because Stripe themselves advised using their Idempotency Key system for this scenario, which I figured we would want to introduce anyway.
Yep no problem at all.
There’s no way this is a performance issue even at extreme scales. We’re interpolating directly in Python, not at the DB level.
Yes, this is more of an issue. Right now though I’m not doing a delete, so it’s a single insert per cycle. Deletes can be batched every 24 hours (when the keys themselves expire); when they happen depends very much on the action.
I’m not sure the insert is significant even at scale. If it is though, that’s why there’s the possibility to switch to another idempotency key backend.
I have my concerns on this. It’s worth mentioning that because Idempotency Keys have a lifetime of 24 hours, they’re completely safe to just leave in the db. The only issue is when the table grows large enough that queries slow down (which takes a very long time for the type of model this is).
Also remember that currently, because it’s only used for customer creation, the idempotency key table will never grow larger than your user table (x2 if you’re doing live + test mode in the same db which you probably aren’t at that scale 😛). If you clean it up regularly, it’ll never grow past your daily new customer count (a subset of your daily new user count). This makes it, right now, a very small table even if you’re Ebay.
This will change of course when we get to using those keys for other actions, but we still haven’t gotten to a design for those 😃
I don’t believe I am misunderstanding. I realise that Stripe isn’t involved at all and I realise that they use the idempotency key to guarantee that the same request won’t happen twice, but only if they get the same key from us twice. What I’m saying is that the UUID that we generate will be lost, despite it being created first, despite the same input key of “customer:create:1234” being used, despite the request being sent to Stripe, if anything causes that transaction to rollback. The next time we try to do the exact same actions it will result in a brand new uuid. So the point is, we have no way to send them the same UUID again in the disaster scenario. Maybe using
Customer
is a bad example of this, or I’m just awful at explaining … or it isn’t a valid concern. Be worth seeing what the others think, but concurrency and consistency of anything is typically a nightmare. 😃@kavdev Tell me I’m an idiot and I’ll be quiet. 😉
@kavdev Good question on the key fields.
To be honest it looks like most fields should apply to the idempotency key. In fact if we did just start with all of the keys, that would still result in a stronger guarantee of idempotence than having no idempotency key at all. I’d be in two minds if we should make it a whitelist or a blacklist. If it’s a whitelist, we’ll explicitly list the keys that contribute. If it’s a blacklist, we’ll explicitly list the keys that don’t.
The intention isn’t really to open them up to derived classes as a primary use case, it’s more to enable easier implementation and maintenance. In saying that, there might be things that users would like to contribute to idempotence that is out of our control and understanding (e.g. a metadata value that the expects to contribute). I personally don’t have a use case for changing the fields from what would be defined as the standard set.