realm-core: Table::set() should not record transaction log entries for setting to the current value

The end-user motivation for this is https://github.com/realm/realm-cocoa/issues/3489. The short summary is that upserting a json blob into a Realm currently triggers notifications even if none of the values actually changed. This has also emerged as a problem for sync, as the redundant sets are still recorded in the transaction log and so use bandwidth and CPU time on all clients.

This is something we should do for 3.0. It’s a breaking change, so it needs to happen now or not anytime soon, and I don’t think anyone particularly wants the current behavior.

Conceptually this can be as simple as making Table::set() be:

template<typename T>
void Table::set(size_t col, size_t row, T value)
{
    if (get<T>(col, row) == value)
      return;
    // do set
}

For some column types it may be worth pushing the check in deeper to avoid redundant validation/bptree traversals/etc. should benchmarking indicate that this actually has meaningful overhead.


There’s a few other ways we could handle this other than making changes in core, none of which I feel are viable.

  1. Filter out sets to the current value in the notification code.

    I took a stab at implementing this and it turned out to be very complicated and/or slow due to that the code processing the transaction log doesn’t have access to the Realm in the state that it would be at the point where each instruction is handled, so a lot of additional state needs to be tracked to either support lookups there or to compare the fields afterwards. Also doesn’t help the sync case at all.

  2. Special-case it in the createOrUpdate logic.

    This inherently imposes inconsistency in our API, which is something we should try to avoid.

  3. Make self-assignments no-ops at the ObjectStore layer.

    Currently not a practical option due to that not all property sets actually go through the object store, and mandating that bindings never call Table::set() directly would be a large project for the 3.0 timeframe.

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 36 (34 by maintainers)

Most upvoted comments

After thinking about it a bit more, I think the merge algorithm should be able to generate the desired instructions in all cases — the main drawback being that work is required inside the merge algorithm in the first place…

If the existing value is identical, Array::set returns early before CoW.

For sync, it does create a behavioral change, because of “last write wins”. This means that if a different client wrote to the field between the previous set instruction and the one you are throwing away, the other client will now “win” with this change.

By the way, we are implementing log compaction in Sync to address the same situation (redundant instructions), so the motivation to do this should be purely for Core performance. 😃

By the way, it would be fine from our perspective to skip writing to the database, but still emit the instruction (potentially reducing amount of CoW-memory), but also potentially not making any difference at all, since the commit log entry must still be written). I would love to see if that has any impact on benchmarks. 😃