sequelize: Comparing timestamps can lead to data inconsistencies in soft deleted records
I’ve written about this before a few times but I will summarize my concerns again here. Apologies for eschewing the issue template and for the length of this post, but I wanted to use this as a place for summarizing all the issues around Sequelize’s practice of comparing timestamps for soft deletes.
Use of client time instead of DB time
Before #8485, Sequelize would soft delete records using the client’s time, but query for them using the database’s time (by “client’s time” I mean the clock of a database client, e.g. an app server, an ad hoc script, etc.). This led to inconsistencies that have been well documented; namely, if client and DB clocks aren’t in sync, you can potentially see a soft deleted record.
#8485 normalized this so that both delete and query operations use the client’s time. This is good in that you no longer have to worry about time zone differences or clock drift between the DB and the client.
But the safer choice would arguably be to use the DB’s time rather than the client’s. This is because a typical app will have a number of clients (load balanced app instances, ad hoc scripts running on dev machines, etc), all of which need to keep their clocks in sync with each other, compared to only a handful of DB instances (and often only 1).
This problem still exists when using DB time if you have multiple database instances in a cluster, but typically the number of DB instances will be far less than the number of clients. Additionally, DB instances tend to be pretty homogeneous and under tight control, so keeping their clocks in sync would tend to be much easier than keeping client clocks in sync.
That being said, I think it’s still dangerous to rely on DB clocks being in sync, especially when you have multiple DB instances and non-monotonic time sources.
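To make the distinction concrete, here is a minimal sketch of soft deleting with the database’s clock by writing `deletedAt` manually instead of calling `destroy()`. The model and helper are made up for illustration; this is not how Sequelize implements paranoid deletes.

```js
// Minimal sketch (not Sequelize's implementation) of soft deleting with the
// database's clock, assuming a paranoid model `User` with the default
// `deletedAt` column.
const Sequelize = require('sequelize');

async function softDeleteWithDbTime(User, id) {
  // NOW() is evaluated by the database server, so app-server clock drift
  // never ends up in the stored timestamp.
  await User.update(
    { deletedAt: Sequelize.fn('NOW') },
    { where: { id } }
  );
}
```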
Non-monotonic time source
JavaScript’s `Date` object is not monotonic, i.e. in the presence of NTP adjustments, leap seconds, users manually changing the system time, etc., it can tick backwards. When that happens, your previously deleted records are suddenly scheduled to be deleted in the future.
For single-node deployments, this can potentially be fixed by using something like `process.hrtime` as a time source. But clock drift between nodes is still a problem.
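For illustration, a minimal sketch of such a monotonic time source (the helper name is made up; this is not a Sequelize API). It only prevents backwards ticks within a single process and does nothing about drift between nodes:

```js
// The wall clock is read once at startup; afterwards only the elapsed
// process.hrtime delta is added, so the returned value never goes backwards
// even if NTP or a user adjusts the system clock.
const startWallClockMs = Date.now();
const startHrtime = process.hrtime();

function monotonicNow() {
  const [seconds, nanoseconds] = process.hrtime(startHrtime);
  return new Date(startWallClockMs + seconds * 1000 + nanoseconds / 1e6);
}
```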
Clock drift between nodes
Even if we use a monotonic time source, keeping multiple clocks in sync is really hard. Google resorts to atomic clocks to do it! The linked article mentions that NTP, which I can imagine is more typical than atomic clocks for the majority of Sequelize’s users, will typically give you an upper bound for clock offsets at “somewhere between 100ms and 250ms.”
Prior art
I haven’t researched this too much, but a cursory search reveals that alternatives like ActiveRecord’s ActsAsParanoid do not compare timestamps and instead compare to a sentinel value. I’d be curious to see if they’re consciously not supporting future deletes or have just never implemented it. Regardless, if Sequelize’s behavior in this space really is unique, that seems like more evidence that it’s not a safe practice. Especially when it’s for a feature (scheduled deletes) that few may use.
Conclusion
I think scheduled soft deletes weaken the consistency of Sequelize. It is very possible to see a soft deleted record 250ms or more after it was soft deleted. Sequelize should consider changing the default behavior to something safer.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 9
- Comments: 16 (9 by maintainers)
@papb In Sequelize v4, if deletedAt was set to a timestamp in the future, then a record would not be considered as deleted. This was useful to schedule records for deletion. But as pointed out in this issue, it can lead to consistency issues. Thus, it was removed in v5 in favor of considering a record as deleted if deletedAt is set at all.
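To make the difference concrete, here is a rough sketch of the two defaults expressed as Sequelize where-conditions (illustrative only, not the actual internal paranoid clause):

```js
const { Op } = require('sequelize');

// v4: a record counts as deleted only once its timestamp has passed,
// which is what made "scheduled" deletes possible.
const aliveInV4 = {
  [Op.or]: [{ deletedAt: null }, { deletedAt: { [Op.gt]: new Date() } }],
};

// v5: a record counts as deleted as soon as deletedAt is set at all.
const aliveInV5 = { deletedAt: null };
```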
We are trying to move to Sequelize 4 as well, and are running into this same issue. I similarly believe this was a mistake to force upon everyone by default. We also have indexes that use `deletedAt is not null`, and I would really prefer to keep those. We have no interest in soft deleting in the future, which I don’t think is a real use case for the paranoid system.
Could this not at least be made an option to enable/disable?
I agree with @crispkiwi; my team has also recently run into issues with the new default `deletedAt` time comparison. While it is nice to have this as a feature, we’d prefer to be able to disable it. We can use a hack like this one to disable the comparison. However, this breaks `Model.restore()`, `Instance.restore()`, and who knows what else. We are using MySQL, which AFAIK does not have `Infinity`. Also, it seems unnatural that all `deletedAt` fields of undeleted records have to be set to a certain not-null value just to get better query performance, which should IMHO be the default.
Not OP, but if they are asking how to emulate v4’s future deletion feature, then that’s a valid question.
Assuming that’s what you want to do, I’ve had success monkey patching `Model._paranoidClause`. Caveat: I did that to fix v4’s consistency issue; I don’t know if it’s feasible in v5 to emulate v4 behavior.
Thanks for suggesting the above, @gabegorelick. I’ve had a look at the `_paranoidClause` function on the `Model` class, and what you suggested would indeed work. Setting a default value on the `deletedAt` property removes the timestamp comparison from lookups completely, instead falling back to a straight `WHERE deletedAt = 'defaultValue'`.
This raises the question: why do the time comparison at all? Setting a default value should not have the unintended consequence of negating the scheduled deletion “feature”. It again leads me to believe that it was, at least in part, an afterthought added on to a fix for a “semantic problem”.
I’m not sure if I want to use this as a workaround for my scenario of keeping soft deleted records out of indexes, as the behaviour is confusing and therefore has the potential to change.
I hope you don’t mind me bringing up these points here; I don’t want to derail the discussion from all your initial points, which are spot on.
If you make `deletedAt` NOT NULL and add a default value like `Infinity` (which Sequelize recently added support for), then Sequelize will generate queries closer to what you want (`WHERE deletedAt != 'Infinity'`). Not using NULL also means any unique constraints that include `deletedAt` will work in SQL-compliant DBs like Postgres, where NULL <> NULL.
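For reference, a minimal sketch of that setup, assuming Postgres (which supports `Infinity` timestamps) and the default `deletedAt` column; the model and unique index are made up for illustration:

```js
const Sequelize = require('sequelize');
const sequelize = new Sequelize(process.env.DATABASE_URL, { dialect: 'postgres' });

const User = sequelize.define('user', {
  email: { type: Sequelize.STRING, allowNull: false },
  deletedAt: {
    type: Sequelize.DATE,
    allowNull: false,
    defaultValue: Infinity, // sentinel for "not deleted" instead of NULL
  },
}, {
  paranoid: true,
  timestamps: true,
  indexes: [
    // Unlike NULL, Infinity equals Infinity, so this constraint still
    // prevents two live rows with the same email.
    { unique: true, fields: ['email', 'deletedAt'] },
  ],
});
```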
Comparing timestamps for soft deleted records has made the Sequelize v4 migration a real pain. I believe it was a mistake to introduce this. I hope this can be changed back in v5; until then we may have to run a fork.
The main issue for us is that soft deleted records cannot be excluded from an index. Under v3 we have the condition `WHERE deletedAt IS NULL` on all our indexes. We obviously want to keep them slim, and including soft deleted records would add unnecessary bloat. The change to include a timestamp comparison in the lookup has a big implication: our existing indexes won’t be used, and updating them so lookups don’t have to do a full scan would mean including all soft deleted records in the index.
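For illustration, a minimal sketch of that kind of partial index expressed through Sequelize’s index options (assuming Postgres and its partial-index support; names are made up):

```js
const Sequelize = require('sequelize');
const sequelize = new Sequelize(process.env.DATABASE_URL, { dialect: 'postgres' });

const Account = sequelize.define('account', {
  email: Sequelize.STRING,
}, {
  paranoid: true,
  indexes: [
    {
      fields: ['email'],
      // Only live rows are indexed, so soft deleted rows never bloat it --
      // but a lookup that compares deletedAt against a timestamp cannot
      // use this index.
      where: { deletedAt: null },
    },
  ],
});
```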
We encountered this problem attempting to migrate to v4; my initial concerns were raised in https://github.com/sequelize/sequelize/issues/7884.
The original change https://github.com/sequelize/sequelize/issues/5880 was aimed at solving a semantic problem. I don’t think the consequences of this change were properly thought through. Supporting future deletions of records seems like an afterthought, functionality that in my opinion belongs in the application layer, not the ORM. `deletedAt` was providing useful metadata for auditing and acting as the boolean for the deleted state; semantically, I didn’t see it as a problem.
The issues we have had trying to migrate, and the problems outlined by @gabegorelick, outweigh the benefit to the few who actually want the feature of future deletions. I do hope this can be addressed.