slate: Unbounded Cache causing some Memory Issues

Currently we memoize calls to select functions, but there is no limit to how much we store. Our caches use Immutable.js, which is a bit of a memory hog. Eventually the cache can grow so large that it causes an out-of-memory error in V8.


Doing some light memory profiling, the amount of memory created by normalizing is pretty heavy. At any rate, if we let the cache grow and grow, it will eventually hit a size that the browser can’t collect, and it will just crash.

A couple of questions:

  1. Can we be more selective on the functions we memoize?
  2. Can we be more selective of when we memoize those functions? Should normalizing cause a memoize?
  3. Can we prune the cache over time? (One possible approach is sketched below.)
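For question 3, one common approach would be to bound the cache with least-recently-used (LRU) eviction, so it can never grow past a fixed size. A minimal sketch, assuming a single per-function cache (hypothetical, not Slate’s actual memoize util):

```js
// Hypothetical LRU-bounded memoize sketch, not Slate's actual implementation.
// A JavaScript Map iterates in insertion order, so the first key is always
// the least recently used entry.
function memoizeLRU(fn, maxSize = 10000) {
  const cache = new Map();

  return function(...args) {
    const key = JSON.stringify(args); // naive key; real code would want something cheaper

    if (cache.has(key)) {
      // Re-insert the entry so it becomes the most recently used.
      const value = cache.get(key);
      cache.delete(key);
      cache.set(key, value);
      return value;
    }

    const value = fn.apply(this, args);
    cache.set(key, value);

    // Evict the least recently used entry once the cache exceeds its bound.
    if (cache.size > maxSize) {
      cache.delete(cache.keys().next().value);
    }

    return value;
  };
}
```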

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 5
  • Comments: 21 (14 by maintainers)

Most upvoted comments

Hey folks—I did some more research re: memoization effectiveness to look for some quick wins. I instrumented the memoize util to track 1) the number of cache insertions and cache hits for each memoized property in each class, and 2) the computational cost of computing the property (by running the underlying function 20x and timing it), and then just used the editor for a while.

(Source: https://github.com/bengotow/slate/blob/memoization-metrics/packages/slate/src/utils/memoize.js#L68)
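The instrumentation pattern is roughly: wrap each memoized function, count insertions vs. hits, and time the raw computation. A simplified sketch of the idea (the linked branch is the real source, and it averages the timing over 20 runs rather than timing a single call):

```js
// Simplified sketch of the instrumentation idea; see the linked branch for the
// real code. Each memoized property gets counters for saves, reads, and time.
const stats = {}; // e.g. stats['Node.getParent'] = { saves: 0, reads: 0, totalMs: 0 }

function instrument(name, fn) {
  stats[name] = { saves: 0, reads: 0, totalMs: 0 };
  const cache = new Map();

  return function(...args) {
    const key = JSON.stringify(args);
    const entry = stats[name];

    if (cache.has(key)) {
      entry.reads++; // cache hit
      return cache.get(key);
    }

    entry.saves++; // cache insertion
    const start = performance.now();
    const value = fn.apply(this, args);
    entry.totalMs += performance.now() - start;

    cache.set(key, value);
    return value;
  };
}
```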

I think these stats would vary based on the Slate plugins you’re using. The editor I’m testing has a dozen or so instances of slate-auto-replace, uses slate-soft-break and slate-edit-list, and has a custom spellcheck decorator and some other things. Hopefully I’m not doing anything particularly odd and it’s a good representation of the kitchen sink. 😄

Column Meanings:

  • Cache Hit %: Reads / (Reads + Saves) - overall “usefulness” of this value being cached.
  • Time to Calculate: The number of milliseconds, on average, it took Slate to run the underlying method.
  • Time Savings: The number of reads (cache hits) * the calculation time.
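As a worked example of the two derived columns, with made-up numbers (not taken from the actual results):

```js
// Hypothetical numbers to illustrate the derived columns; not real measurements.
const saves = 500;   // cache insertions
const reads = 9500;  // cache hits
const avgMs = 0.03;  // average Time to Calculate, in ms

const cacheHitRate = reads / (reads + saves); // 9500 / 10000 = 95%
const timeSavings = reads * avgMs;            // 9500 * 0.03 = 285 ms saved
```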
[Screenshots: the results tables, sorted by Time Savings]

Based on this data, I believe the following properties are very cheap to calculate and are not worth caching (caching them produces little time savings). These are at the bottom of the graph above, which is sorted by Time Savings:

Node.hasVoidParent
text.getMarksAsArray
Node.getLastText
text.validate
Node.getNextText
Node.getDescendantAtPath
Node.getBlocksAtRange
Node.hasNode
Node.getOffset
text.getMarks
Node.getInsertMarksAtRange
schema.getParentRules
Node.isLeafInline
stack.getPluginsWith
Node.getKeys

The following cached properties have a very low hit rate (<20%), meaning Slate is caching them a lot and using the cached value infrequently, so it’s a poor use of space in RAM:

Node.validate
Node.areDescendantsSorted
Node.getBlocksAtRange
Node.getPath
Node.getDescendantAtPath
Node.getInsertMarksAtRange
Node.getNextText
text.getMarks
Node.getKeys
Node.getOffset

(Node.getFragmentAtRange would have been in these sets, but it has a very high computation time so it might be an exception to the rule.)

On the other hand, it’s nice to see things like Node.getParent producing massive speed improvements by being cached—95% hit rate and 283s of computation time saved. I only typed for ~3min to build these results, so it sounds like memoizing that alone almost doubled performance.

@bengotow this is extremely awesome! Thank you for digging into that. These results are super helpful for thinking about this, and should absolutely help us make some quick savings on memoizing lots of these pieces. Before I get into dissecting, a few questions…

  • Is there any way to also factor into the results tables the overall size in memory that is taken up by the different caches? (Or even not inside the table, but just to see which cached methods are the ones taking up the most size over time.)

  • Is the “Time to Calculate (ms)” column there the average? (I’m assuming yes, which is great!)

  • @bengotow does your use case make heavy use of schemas to normalize the document? I know slate-edit-list has schema information, but is there any other schema use in your example?

(Also, I’m going to specifically call out PRs that I’d accept so far from this, just because I think it might make things clearer. I’ve got a decent amount of context, and with this kind of research it’s hard to know when to trust the data vs. familiarity. Hope that makes sense!)


One of the first things I notice is that we have paired methods where one version returns an immutable value and the other returns the equivalent mutable value, for example getKeys and getKeysAsArray. We created these, I believe, because working on immutable values often seemed slower, so internally we use the mutable arrays whenever possible.

But, this definitely adds duplication in the memoization. For example, getKeysAsArray is used a lot and has a high cache hit rate. But getKeys isn’t used (almost at all) in core, so we’re memoizing those potentially large sets of keys for no reason.

And even if it is read by external code, converting an array of keys to a set probably isn’t that expensive, and it will just hit the cached layer directly below in getKeysAsArray, saving all of the tree iteration, which is the big cost anyway. So for the first change…

👌 Change 1: I’d accept a PR that removed memoization on node.getKeys, and also changed these two lines in core to use getKeysAsArray instead of getKeys.
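The shape of that change would presumably look something like the sketch below (illustrative only, not the actual Slate source; forEachDescendant stands in for however the real traversal works):

```js
// Sketch of the pattern: the immutable variant becomes a thin, unmemoized
// wrapper over the memoized *AsArray variant, so the expensive tree walk
// is still cached exactly once. Not the actual Slate source.
import { Set } from 'immutable';

class Node {
  // Still memoized (declared elsewhere): this is the expensive part,
  // since it walks the entire document tree.
  getKeysAsArray() {
    const keys = [];
    this.forEachDescendant(desc => keys.push(desc.key));
    return keys;
  }

  // No longer memoized: converting the cached array to an Immutable.Set
  // is cheap compared to re-walking the tree.
  getKeys() {
    return Set(this.getKeysAsArray());
  }
}
```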


Okay, so continuing on that train of thought, here are the comparisons on time savings written up for the “arrays vs. immutables” caching:

node.getKeys
  - array: 98,654
  - immutable: 0

node.getTexts
  - array: 56,901
  - immutable: 8,751

node.getMarksAtRange
  - array: 24,363
  - immutable: 3,306

node.getCharactersAtRange
  - array: 24,255
  - immutable: 1,725

node.getInlinesByType
  - array: 11,399
  - immutable: 134

node.getActiveMarksAtRange
  - array: 3,447
  - immutable: 6,513

node.getTextsAtRange
  - array: 2,436
  - immutable: 2,079

node.getInlinesAtRange
  - array: 2,376
  - immutable: 372

node.getInsertMarksAtRange
  - array: 540
  - immutable: 0

node.getBlocksAtRange
  - array: 45
  - immutable: 1

text.getMarks
  - array: 18
  - immutable: 0

All but one of them show the array variant saving the most time. Again, this is probably because most of the core usage of these functions works on the array variant, not the immutable one, for speed reasons.

The only exception is getActiveMarksAtRange, where it seems like the immutable variant is actually saving us more time. And I think this is due to the immutable version being used in the Value model directly, so it’s seeing more opportunity to have cache hits.

What this tells us—which is already known but good to remember—is that this data is impacted by how much the code paths are used. Anything that is used in core for its main calculations is going to see high use, and thus high chance for cache hits to play a factor. But if there are any paths that are highly depended on in user land but not in core, we might not see them show up. (This is tempered by @bengotow’s use case, which is hopefully a kitchen-sink type use case.)

Okay, so with that, I think there are two more changes I’d accept:

👌 Change 2: I’d accept a PR that removed the memoization on the immutable variant of any method that also has an AsArray variant directly backing it. I believe that list would be:

node.getKeys
node.getTexts
node.getMarksAtRange
node.getCharactersAtRange
node.getInlinesByType
node.getActiveMarksAtRange
node.getTextsAtRange
node.getInlinesAtRange
node.getInsertMarksAtRange
node.getBlocksAtRange
text.getMarks

(I’ve included getActiveMarksAtRange in this list because, even though we’re seeing some decent cache hits from it, those hits will simply fall to the *AsArray layer below (the same wrapper pattern as the getKeys sketch above), so we should still maintain most of the gains while reducing some of the memory size.)

👌 Change 3: I’d accept a PR that looked through the Node, Value and Text models for any methods using the non-AsArray variant that could be changed to use the AsArray variant. This change would ensure that even after we’ve removed the memoization on the immutable variants, we’re still using the most-cached path. For example, this line is a contender because getCharactersAtRange returns an Immutable.List and we’re just iterating over it once, so it could easily be swapped for a regular array. (The only tricky part is if the immutable value is an Immutable.Set, in which case the regular array won’t be de-duped, so we can’t rely on it.)
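A hypothetical before/after for that kind of call site, with the Set caveat in a comment:

```js
// Hypothetical call-site swap; the real line in core is the one linked above.

// Before: materializes (and memoizes) an Immutable.List just to loop over it once.
node.getCharactersAtRange(range).forEach(char => {
  // ...
});

// After: the plain-array variant hits the better-used *AsArray cache layer directly.
node.getCharactersAtRangeAsArray(range).forEach(char => {
  // ...
});

// Caveat: only safe when the immutable value is a List. If it's an Immutable.Set,
// the array variant won't be de-duped, so the two calls aren't equivalent.
```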


So far, after those changes, merging the two lists of removal candidates that @bengotow compiled (duplicates dropped), we’re left with:

node.hasVoidParent
node.getLastText
text.validate
node.getNextText
node.getDescendantAtPath
node.hasNode
node.getOffset
schema.getParentRules
node.isLeafInline
stack.getPluginsWith
node.validate
node.areDescendantsSorted
node.getPath

For node.getLastText, I’m curious about that one. It seems like getFirstText is seeing massive savings, and I wonder whether getLastText’s low savings are purely because it’s used less, or whether these code paths just aren’t hitting it. I’m reticent to remove the memoization on one but not both, for consistency.

For node.validate and text.validate I’m also curious about these. I’d have assumed that they’d see lots of cache hits if there are recursive schema validations happening. I think we need more information before we can remove these.

Given that, I’ll wait for responses before digging more!

Thank you @bengotow!

Replied over there about controlling the undo stack size.

@YurkaninRyan with this bug fixed, could you let us know if the unbounded memory issues are fixed? Thanks!

Off the top of my head, some random ideas for things that could lead to the larger-than-ideal memory usage:

  • The memoize implementation being incorrect and leaky somehow…
  • Memoizing too many under-used methods that aren’t actually giving us enough performance benefits to be worth it. (We can absolutely get rid of any memoized methods that don’t actually give us real performance improvements for the 90% use cases, but research needs to show which they are.)
  • Memoizing methods that often get called with highly-variable arguments, resulting in poor cache hits and lots of cached values building up in memory. (Right now memoized methods are split into argument-less and argument-ed methods, so that would be a place to start looking; see the sketch after this list.)
  • The History stacks keeping around references to objects that themselves have larger caches, which prevents them from being garbage-collected. (This is probably a compounding factor, and combined with another factor could result in the large multiples being seen.)

That was just a quick list, so there are probably others.
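To make the third point concrete, here’s a rough sketch of the failure mode with argument-keyed memoization (hypothetical keying scheme, not Slate’s actual implementation):

```js
// Rough sketch: when a memoized method is keyed on highly-variable arguments,
// every new argument value inserts a cache entry that is rarely read again.
// Hypothetical keying scheme, not Slate's actual implementation.
const cache = new Map();

function memoizedGetBlocksAtRange(node, range) {
  // A selection range changes on nearly every keystroke, so this key is
  // almost always new...
  const key = `${node.key}:${range.anchorKey}:${range.anchorOffset}`;

  if (cache.has(key)) return cache.get(key);

  const value = node.getBlocksAtRange(range);
  cache.set(key, value); // ...which means lots of insertions, few hits,
  return value;          // and stale entries piling up in memory.
}
```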

For my own use, right now my order of priority for my focus is functionality > performance > memory, so this isn’t something I’m going to be investigating myself soon. But I am happy to advise and help merge investigations of others.

Would love to hear others’ thoughts!

Also, if anyone has good resources on how to approach memory-leak research that would be awesome. I’m pretty proficient in performance testing, but not on the memory side.