redis-oplog: A fork of redis-oplog for infinite scalability
@evolross @SimonSimCity @maxnowack (pls feel free to tag more people)
- We faced major issues with redis-oplog in production on AWS Elastic Beanstalk: out-of-memory errors & disconnects from redis. After some research we found that redis-oplog duplicates data (2x for each observer) and re-duplicates it for each observer, even when it’s the same collection and the same data
- DB hits were also killing us: each update required multiple hits to apply the data (to avoid race conditions). This is another major negative – it does not scale
- Finally, we couldn’t read from MongoDB secondaries (given we read much more often than we write, secondary reads would give us much higher scalability)
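To illustrate the cost: as described above (and in the technical notes further down), each update under the current race-condition protection follows roughly the pattern in this sketch (TypeScript, hypothetical names – not redis-oplog’s literal code), i.e. three DB round-trips where one should do.

```ts
// Rough sketch of the update pattern described above (hypothetical names,
// not redis-oplog's literal code): a find before, the update itself, and a
// find after -- three DB hits per update.
import type { Collection, Document } from "mongodb";

async function currentUpdatePath(collection: Collection, id: string, modifier: Document) {
  const before = await collection.findOne({ _id: id }); // DB hit 1
  await collection.updateOne({ _id: id }, modifier);    // DB hit 2
  const after = await collection.findOne({ _id: id });  // DB hit 3
  return { before, after }; // observers are then updated from the fresh `after` doc
}
```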
Also, redis-oplog is slowly going into disinvestment
We created a fork (not public yet, considering my options) which does the following (more technical notes below):
- Uses a single timed cache, which is also where `findOne` / `find` reads are served from – so full data consistency (see the sketch after this list)
- Uses redis to transmit changes to the other instances’ caches – consistency again
- During updates, we mutate the cache and send only the changed fields to the DB – instead of the current `find`, `update`, then `find` again, which has 2 more DB hits than needed
- Same for `insert`: we build the doc and send it to the other instances
- We use secondary reads in our app – there are potential race conditions in extreme cases; we are working on using redis as a temp cache of changes
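To make the first point concrete, here is a minimal sketch of such a timed, collection-level cache (TypeScript, hypothetical names – not the fork’s actual API): `findOne` is served from it, and every access resets the entry’s eviction timer.

```ts
// Minimal sketch of a timed, collection-level document cache (hypothetical
// names). Reads are served from the cache when possible; every access resets
// the entry's timer, and untouched entries are evicted after `ttlMs`.
type Doc = { _id: string } & Record<string, unknown>;

class TimedCollectionCache {
  private docs = new Map<string, Doc>();
  private timers = new Map<string, ReturnType<typeof setTimeout>>();

  constructor(private ttlMs = 60_000) {}

  // findOne-style read: a cache hit avoids the DB entirely; a miss costs one
  // DB read (fetchFromDb stands in for a Mongo query).
  async findOne(id: string, fetchFromDb: (id: string) => Promise<Doc | null>): Promise<Doc | null> {
    const hit = this.docs.get(id);
    if (hit) {
      this.touch(id); // cache hit: reset the eviction timer
      return hit;
    }
    const doc = await fetchFromDb(id); // cache miss: single DB hit
    if (doc) this.set(doc);
    return doc;
  }

  // Store or extend a doc, e.g. after a local mutation or a redis change event.
  set(doc: Doc): void {
    const existing = this.docs.get(doc._id);
    this.docs.set(doc._id, { ...existing, ...doc });
    this.touch(doc._id);
  }

  private touch(id: string): void {
    clearTimeout(this.timers.get(id));
    this.timers.set(id, setTimeout(() => {
      this.docs.delete(id);
      this.timers.delete(id);
    }, this.ttlMs));
  }
}
```

The multiplexer would then apply each subscription’s `fields` projection against the full cached doc before sending data to the client, as described in the technical notes below.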
RESULTS:
- We reduced the number of Meteor instances by 3x
- Faster updates, as less data is sent to redis and far fewer DB hits are needed
- We substantially reduced the load on our DB instances – from 80% to 7% on primary
Here are the technical details:
- A single data cache at the collection level stores the full doc; the multiplexer sends to the client only the data fields allowed by the projector (i.e. the `fields: {}` option). `collection.findOne` fetches from that cache – this results in cache hit rates of 85-98% for our app
- That cache is timed; the timer resets whenever the data is accessed
- Within `Mutator` we mutate what is in the cache ourselves (if it’s not there, we pull it from the DB and mutate) – in other words, we don’t do an `update` followed by a `find`, so usually a single DB hit (the `update`). We also do a diff to only dispatch fields that have changed. Same thing with `insert`: we build the doc and dispatch it fully to redis (see the sketch after this list)
- We send to redis all the fields that have changed; the redis subscriber uses that data to extend what is stored in its cache (or pulls from the DB and then extends it with the data from the update event). Inserts are trusted and stored in the cache
- We now use secondary DB reads, which results in much higher scalability. This is why we have #3 and #4 above: we trust redis and the cache over DB reads to avoid race conditions. We do get race conditions every once in a while (e.g. new subs and reads), but we know where they would occur and catch them there. Otherwise, we always trust the cache over data read from the DB
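Putting the mutation path and the redis dispatch together, a minimal sketch could look like the following (TypeScript; the names and channel are hypothetical, not the fork’s actual API): the publish side diffs against the cache and writes once, and the subscriber side extends its own cache from the changed fields.

```ts
// Sketch of the mutate-the-cache-first path (hypothetical names): diff
// against the cached doc, write the changed fields to the DB once, and
// publish only those fields to redis so other instances can extend their caches.
import { createClient } from "redis";
import type { Collection, Document } from "mongodb";

const cache = new Map<string, Document>();   // collection-level doc cache
const pub = createClient();                  // assumes pub.connect() at startup
const CHANNEL = "myCollection::changes";     // hypothetical channel name

export async function updateDoc(collection: Collection, id: string, changes: Document) {
  // 1. Read from the cache; pull from the DB only on a miss.
  let doc = cache.get(id) ?? (await collection.findOne({ _id: id })) ?? undefined;
  if (!doc) return;

  // 2. Diff: keep only fields whose value actually changes.
  const changedFields: Document = {};
  for (const [key, value] of Object.entries(changes)) {
    if (doc[key] !== value) changedFields[key] = value;
  }
  if (Object.keys(changedFields).length === 0) return;

  // 3. Mutate the cache ourselves -- no follow-up `find` needed.
  doc = { ...doc, ...changedFields };
  cache.set(id, doc);

  // 4. Single DB hit: write only the changed fields.
  await collection.updateOne({ _id: id }, { $set: changedFields });

  // 5. Dispatch the changed fields to the other instances.
  await pub.publish(CHANNEL, JSON.stringify({ _id: id, fields: changedFields }));
}

// Subscriber side on every instance: extend the cached doc with the incoming
// fields (or pull it from the DB first if it is not cached yet).
export async function startSubscriber() {
  const sub = createClient();
  await sub.connect();
  await sub.subscribe(CHANNEL, (message) => {
    const { _id, fields } = JSON.parse(message);
    const cached = cache.get(_id);
    if (cached) cache.set(_id, { ...cached, ...fields });
    // else: fetch from the DB once, then extend with `fields`
  });
}
```

With instances trusting their caches and the redis events over what they read back from Mongo, reads can then be pointed at secondaries (e.g. via the standard `readPreference` option in the MongoDB connection string), which is where the scalability gain on the DB side comes from.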
QUESTION: Is this of interest? Should we have a community version of redis-oplog that we all maintain together?
New repo is online: https://github.com/ramezrafla/redis-oplog
Please don’t use in production until you are sure it is working for you in a staging environment with multiple servers.
@edemaine
But you nailed it: to avoid race conditions, the current redis-oplog is cumbersome and heavy on DB hits
@afrokick There are some serious departures from the current approach. Some developers are happy with the way it is and I don’t want to disturb them. Also, I don’t like the Swiss-army-knife approach: the code was very complex in order to please a lot of people. There were bugs, old stuff, etc.
Jack of all trades master of none 😃
@ramezrafla and everyone. I repeat and stress this: RedisOplog is not going into disinvestment
We are taking care of the big issues and we still want to ensure a good experience. The option you mention can easily be implemented as an iteration on a “protection against race conditions” kind of subscription; we need to merge it in. Your solution completely overrides the race-condition protection. Most of my clients, even private ones, can handle over 10k users with the current version of RedisOplog and the use of channel subscriptions; CPU is stable, and with race-condition proofing.