ebisu: Half-life does not meaningfully increase after reps in some conditions
Apologies in advance if this is a non-issue. My hunch is this is a failure on my part to understand the methods or documentation.
I’m coming from Anki and Memrise. After each rep, Anki predicts that your “half-life” increases significantly, maybe doubling or more. Memrise seems to follow a similar pattern.
When testing how ebisu might behave in similar conditions, it seems that review spacing increases, but only very gradually.
The short code below assumes I’m reviewing a card every time its predicted recall hits around 75%, and that I succeed every single time. It then prints the ratio between each review interval and the previous one.
```python
import ebisu

model = (3, 3, 1)
m2pd = ebisu.modelToPercentileDecay
ur = ebisu.updateRecall

new_model = model
last_test_time = 0
for i in range(30):
    # review whenever predicted recall decays to 75%, and always pass
    test_time = m2pd(new_model, 0.75)
    new_model = ur(new_model, 1, 1, test_time)
    if i > 2:
        # ratio of this review interval to the previous one
        print(round(test_time / last_test_time, 3))
    last_test_time = test_time
```
Based on Ebbinghaus’s work and my own performance in Anki, I’d expect those review periods to more than double every time, but I’m not seeing that. The ratios hover around 1.1, usually lower.
I take your point from another comment that you don’t like scheduling reviews, that ebisu’s strength is that it frees you from scheduling.
But this seems like it would still be an issue even with unscheduled reviews. It would predict that very strong memories are in the worst decile much more quickly than it should.
I’m probably just missing a core aspect of the algorithm, so sorry for the confusion. Maybe you manually double `t` after each review or something, or just use `t` as a coefficient to some other backoff function, I’m not sure.
Would appreciate a heads-up as to where I went wrong, or let me know if this behavior is just expected. Maybe the algorithm is purely backwards-looking and doesn’t try to account for a rep’s ability to strengthen a memory.
About this issue
- State: open
- Created 4 years ago
- Comments: 17 (8 by maintainers)
I apologize for being literally “that guy” when it comes to figuring out a solution to this issue (closely related to #43). I experimented with some heavyweight solutions before stepping back to see the big picture and coming up with https://github.com/fasiha/ebisu-likelihood-analysis/blob/main/demo.py
Before talking about that, here’s a recap of the problem. I’ve given some explanation in #43 of how I see the problem that @cyphar saw and raised, but here’s the fundamental issue as I understand it.
Suppose we learn a fact at midnight and model our memory with Ebisu model `[a, b, t]`, i.e., recall probability `t` hours after midnight is `Beta(a, b)`. Then one hour later we do a quiz, and call `ebisu.updateRecall` to get a new `[a2, b2, t2]` model. I didn’t realize this all these years until @cyphar patiently broke it down for me on Reddit a few months ago, but the new posterior model still only refers to recall `t2` hours after midnight. It doesn’t encode our belief as of now, after an hour has elapsed. Ebisu generates an increasingly accurate estimate of recall after midnight without ever moving to an hour after midnight, a day after, etc., which is why we saw the very gradual interval growth reported above. So we need some way to convert a posterior for quizzes after midnight to a posterior for quizzes after 1 am, and that’s what both @brownbat above and others have asked for.
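To make that concrete, here’s a minimal illustration using the same toy model as the code above (nothing here beyond ebisu’s public API):

```python
import ebisu

# Learned at midnight: recall probability 1 hour later is Beta(3, 3),
# so the halflife starts at 1 hour.
model = (3, 3, 1)
print(ebisu.modelToPercentileDecay(model))  # 1.0

# Pass a quiz one hour after midnight.
updated = ebisu.updateRecall(model, 1, 1, 1)

# The posterior's clock still starts at midnight: the halflife nudges up
# only modestly instead of jumping the way Anki would jump it.
print(ebisu.modelToPercentileDecay(updated))
```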
Forgive me for being so dense about this and so slow to think of a solution! My incompetence at mathematics is truly gargantuan.
I don’t yet have a great solution. But in https://github.com/fasiha/ebisu-likelihood-analysis/blob/main/demo.py I have a framework to help evaluate possible ways to translate our belief about recall from midnight to after midnight.
I picked a very Anki-like translation: after `ebisu.updateRecall`, just boost the resulting model’s halflife by a fixed factor. The code actually does something a bit fancier: see the lines in demo.py that scale the boost between `1.4 - (1.4 - 1)/2` and `1.4 + (1.4 - 1)/2`. The simplest version is sketched below.
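(`boostHalflife` here is a hypothetical helper, not ebisu API or demo.py’s actual code; scaling the model’s `t` scales its halflife by the same factor.)

```python
import ebisu

def boostHalflife(model, boost=1.4):
    # Hypothetical helper, not ebisu API: multiply the model's time horizon,
    # which multiplies its halflife by the same factor.
    a, b, t = model
    return (a, b, t * boost)

model = (2.0, 2.0, 10.0)                         # Beta(2, 2) prior on recall 10 hours out
updated = ebisu.updateRecall(model, 1, 1, 10.0)  # pass a quiz at 10 hours
boosted = boostHalflife(updated)                 # then apply the Anki-like fixed boost
print(ebisu.modelToPercentileDecay(boosted))
```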
Obviously this can be greatly improved. The goal of Ebisu is to not use magic numbers, to use statistical analysis to estimate these numbers, etc. But https://github.com/fasiha/ebisu-likelihood-analysis/blob/main/demo.py includes a bunch of machinery to evaluate this and other proposed ways to update Ebisu posteriors. If you can think of a better way to boost the models after quizzes, we can test it here.
This is done by testing the proposed changes on real data and computing probabilistic likelihoods. In a nutshell, what demo.py does is:

- load my Anki review history (from `collection.anki2`, a SQLite database);
- for a given initial model (`[initialAlphaBeta, initialAlphaBeta, initialHalflife]`) and some `baseBoost`, sum up the log-probabilities returned by `ebisu.predictRecall` for each quiz: this is the likelihood of that parameter set (`initialAlphaBeta`, `initialHalflife`, and `baseBoost`);
- then sweep over different values of these parameters, `initialAlphaBeta`, `initialHalflife`, and `baseBoost`, and plot the results.

One such plot shows, for a range of initial halflives (x axis) and a few different boosts (1.0 to 2.0, shown as different colors), the likelihood for a specific card I had with 27 quizzes (23 of them correct). (I fixed `initialAlphaBeta=2` because it doesn’t really matter.) https://github.com/fasiha/ebisu-likelihood-analysis/blob/main/demo.py will also generate bigger charts.
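As a rough illustration of that likelihood computation (a sketch, not demo.py’s actual code: the quiz history and the fixed-boost rule are stand-ins, and this glosses over the time-anchoring subtlety discussed above):

```python
import math
import ebisu

def logLikelihood(initialAlphaBeta, initialHalflife, baseBoost, quizzes):
    # `quizzes` is a list of (hoursSinceLastQuiz, passed) tuples, standing in
    # for the real review history demo.py loads from collection.anki2.
    model = (initialAlphaBeta, initialAlphaBeta, initialHalflife)
    loglik = 0.0
    for elapsed, passed in quizzes:
        logp = ebisu.predictRecall(model, elapsed)  # log-probability by default
        # add log P(observed result): log(p) on a pass, log(1 - p) on a failure
        loglik += logp if passed else math.log(-math.expm1(logp))
        model = ebisu.updateRecall(model, int(passed), 1, elapsed)
        a, b, t = model
        model = (a, b, t * baseBoost)  # crude fixed boost after every quiz
    return loglik

# Higher log-likelihood means these parameters explain this card's history better.
history = [(5.0, True), (30.0, True), (80.0, False), (50.0, True)]
print(logLikelihood(2.0, 10.0, 1.4, history))
```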
If you run it as is, demo.py will look for `collection.anki2`. That’s the file inside the APKG archives that Anki generates: APKG is just a zip file, so if you unzip it, you’ll get this `collection.anki2` SQLite database plus your images/sounds (see the snippet below). demo.py will load the reviews that correspond to actual undeleted cards, generate a training vs. testing set (important for accurately benchmarking competing algorithms), calculate the likelihoods for a bunch of different halflife-boost combinations, and make a few plots.

I’m planning on finding better methods to boost the models after Ebisu’s update, but the way I’ll know that they’re better is that they’ll achieve higher likelihoods on more flashcards than worse methods.
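Since APKG is just a zip, pulling the database out of an export takes a couple of lines (the filename `collection.apkg` is a placeholder for whatever Anki exported):

```python
import zipfile

# An APKG export is a plain zip archive containing the SQLite database.
with zipfile.ZipFile("collection.apkg") as z:  # placeholder filename
    z.extract("collection.anki2")
```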
Ideas right now:

- Rather than a fixed `baseBoost` applied after all quizzes, the boost needs to be dynamic and time-sensitive. I.e., if you review a mature card five times in five minutes, you shouldn’t be boosting the halflife by `1.4**5 = 5.4`.
- Keep using `updateRecall` to do a quick coarse local update of the model, and maybe once a day or once a week, run a `recalibrate` function that takes all quizzes for this flashcard (or all flashcards) and updates these magic numbers by finding which numbers maximize likelihood (a toy sketch of this follows below).

I know the script is pretty long, as is this comment, but I wanted to share some detailed thoughts and code about how I’m planning to evaluate proposed algorithms for boosting models after Ebisu’s posterior update, i.e., detailing how to use likelihood to evaluate parameters and algorithms.
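In its dumbest form, that `recalibrate` could be a grid search (a hypothetical sketch reusing the `logLikelihood` function from the earlier sketch; the grids and names are arbitrary stand-ins):

```python
import itertools

def recalibrate(quizzes):
    # Hypothetical: pick the (initialHalflife, baseBoost) pair that maximizes
    # the likelihood of this card's full quiz history (grids are arbitrary).
    halflives = [1.0, 3.0, 10.0, 30.0, 100.0]
    boosts = [1.0, 1.2, 1.4, 1.6, 2.0]
    return max(itertools.product(halflives, boosts),
               key=lambda hb: logLikelihood(2.0, hb[0], hb[1], quizzes))

print(recalibrate([(5.0, True), (30.0, True), (80.0, False), (50.0, True)]))
```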
I really like how Ebisu right now is an automatic algorithm that just works given a couple of input numbers, and I’d like to find a way to do this boosting that retains the minimal mathematical nature of Ebisu currently, but we shall see!
I put some setup and run instructions at the top of https://github.com/fasiha/ebisu-likelihood-analysis/blob/main/demo.py, if you have time, please check it out!
With the (still in beta) v3 Anki scheduler, you can implement custom scheduling plugins in JavaScript. This would allow you to use custom schedulers even with AnkiDroid and the iOS version of Anki.
I suspect once the work on ebisu is finished, it’d be fairly easy to port the code to the v3 scheduler (and update the existing ebisu add-on). If no one else is planning to do it, I’d be happy to.
Ah, ok, very helpful!
So the model is: the forgetting rate for a card is essentially constant*, and each rep mostly refines ebisu’s estimate of that rate.
And, sure, we all know that in reality the forgetting rate isn’t really constant at first. But ebisu’s predictions strengthen slowly over time, and the forgetting rate may eventually plateau and become constant, so the two converge.
The downside: very inaccurate predictions at first, maybe for the majority of the reps in a card’s life. The workaround: ignore that completely and focus on the sorting of cards, not the prediction numbers. Just do the most at-risk cards first, and don’t bother looking at the odds.
Is that right?
That seems perfectly reasonable for organizing reviews and workflows, and might be all most folks really need.
But it seems like you could pretty easily get much more accurate predictions throughout the entire lifecycle of a card, if you wanted that. It shouldn’t change your workflow much, so maybe it’s just cosmetic, but more accurate predictions seem useful for their own sake if you can get them cheaply.
To do that… suppose that instead of predicting a fixed half-life, you predict some coefficient that the half-life gets multiplied by after each review.
So here’s what that model might look like:
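(A minimal sketch, assuming plain ebisu underneath; `reviewWithGrowingHalflife` and its fixed `coefficient` are hypothetical, not ebisu API.)

```python
import ebisu

def reviewWithGrowingHalflife(model, successes, total, elapsed, coefficient=2.0):
    # Hypothetical wrapper: do the normal Bayesian update, then multiply the
    # halflife by a coefficient to credit the rep with strengthening the
    # memory. Scaling the model's `t` scales its halflife by the same factor.
    # Per the discussion below, `coefficient` should itself slide back toward
    # 1.0 as the card matures.
    a, b, t = ebisu.updateRecall(model, successes, total, elapsed)
    if successes == total:
        t *= coefficient
    return (a, b, t)

# Pass a quiz right at the current halflife; the halflife now jumps
# instead of creeping.
model = (3.0, 3.0, 1.0)
model = reviewWithGrowingHalflife(model, 1, 1, 1.0)
print(ebisu.modelToPercentileDecay(model))
```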
In the beginning, half-life predictions would be much more accurate, since early memories (per Ebbinghaus) tend to exponentially increase in duration.† In the long run, memories may revert to a fixed interval, like a year, and the coefficient would need to slowly slide back down to 1 at that point, giving a fixed interval.
The biggest improvement on your workflow would be that very new cards that you perform very well on will drop in priority more quickly, which should give you more useful work per review.
The biggest risk would be that the coefficient wouldn’t revert to 1 quickly enough as the memory matures, leading to very long intervals with no new data to catch that they’re not getting reviewed enough.
I am NOT/NOT recommending any change to ebisu, which is really well implemented and produces very consistent output for lots of its users now.
But… I might try to implement a wrapper targeting increasing half-lives for one of my own projects. Happy to let you know how it goes if it sounds intriguing at all.
* Well, ok, at least that the half-life period is a consistent interval, even if decay is a curve.
† The first assumption here is also very imperfect! But it should be accurate in more situations; many more, I think.