grammY: grammY conversations (Why Scenes Are Bad)

Admittedly, that title was clickbait. It seems like you are interested in scenes, wizards, and other programming patterns that allow you to define conversational interfaces as if they were a finite-state machine (FSM, wiki).

THIS IS A LOT OF TEXT. Please still read everything before you comment. Let’s try to keep the signal-to-noise ratio high on this one 😃

What Is This Issue

One of the most frequently requested features is scenes. This issue shall:

  1. Bring everyone onto the same page about what people mean when they say scenes.
  2. Explain why there is not going to be a traditional implementation of scenes for grammY, and how we’re trying to do better.
  3. Update you on the current progress.
  4. Introduce two novel competing concepts that could both turn out to be better than scenes.
  5. Serve as a forum to discuss where we want to take this library regarding grammY conversations/scenes.

1. What are scenes?

A chat is a conversational interface. This means that the chat between the user and the bot evolves over time. Old messages stay relevant when processing current ones, as they provide the context of the conversation that determines how to interpret messages.

< /start
>>> How old are you?
< 42
>>> Cool, how old is your mother?
< 70
>>> Alright, she was 28 when you were born!

Note how the user sends two messages, and both are numbers. We only know that those two numbers mean two different things because we can follow the flow of the conversation: the two numbers answer two different questions. Hence, in order to provide a natural conversational flow, we must store the history of the chat and take it into account when interpreting messages.

Note that Telegram does not store the chat history for bots, so you have to store it yourself. This is often done via sessions, but you can also use your own database.

In fact, we often don’t need to know the entire chat history. The few most recent messages are usually enough to remember, as we likely don’t have to care about what the user sent back in 2018. It is therefore common to construct state, i.e. a small bit of data that stores where in the conversation we are. In our example, we would only need to store whether the last question was about the age of the user or about the age of their mother.
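To make this concrete, here is a minimal sketch of hand-rolling that state with grammY sessions (the step names are made up for this example):

import { Bot, session } from 'grammy'

const bot = new Bot('secret-token')
bot.use(session({ initial: () => ({ step: 'idle' }) }))

bot.command('start', async ctx => {
  await ctx.reply('How old are you?')
  ctx.session.step = 'asked-user-age' // remember which question was asked last
})

bot.on('message:text', async ctx => {
  switch (ctx.session.step) {
    case 'asked-user-age':
      ctx.session.age = parseInt(ctx.msg.text, 10)
      await ctx.reply('Cool, how old is your mother?')
      ctx.session.step = 'asked-mother-age'
      break
    case 'asked-mother-age': {
      const motherAge = parseInt(ctx.msg.text, 10)
      await ctx.reply(`Alright, she was ${motherAge - ctx.session.age} when you were born!`)
      ctx.session.step = 'idle'
      break
    }
  }
})

bot.start()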

Scenes are a way to express this conversational style by allowing you to define a finite-state machine. (Please google what this is if you are unfamiliar with it; it is essential for the following discussion.) The state is usually stored in the session data. Scenes achieve this by isolating a part of the middleware into a block that can be entered and left.

Different bot frameworks have different syntax for this, but it typically works roughly like this (explanatory code, do not try to run):

// Define a separate part of the middleware handling.
const scene = new Scene('my-scene')
scene.command('start', ctx => ctx.reply('/start command from inside scene'))
scene.command('leave', ctx => ctx.scene.leave()) // leave scene

// Define regular bot.
const bot = new Bot('secret-token')
bot.use(session())
bot.use(scene)
bot.command('start', ctx => ctx.reply('/start command outside of scene'))
bot.command('enter', ctx => ctx.scene.enter('my-scene')) // enter scene

bot.start()

This could result in the following conversation.

< /start
>>> /start command outside of scene
< /enter
< /start
>>> /start command from inside scene
< /leave
< /start
>>> /start command outside of scene

In a way, every scene defines one step of the conversation. Since you can create arbitrarily many of these scenes, you can build a conversational interface by creating a new instance of Scene for every step, and defining the message handling for that step inside it.
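For instance, the age example from the top would need one scene per question (again explanatory code, do not try to run):

// One scene per step of the conversation.
const askAge = new Scene('ask-age')
askAge.on('message:text', async ctx => {
  ctx.session.age = parseInt(ctx.msg.text, 10)
  await ctx.reply('Cool, how old is your mother?')
  ctx.scene.enter('ask-mother-age') // transition to the next step
})

const askMotherAge = new Scene('ask-mother-age')
askMotherAge.on('message:text', async ctx => {
  const age = parseInt(ctx.msg.text, 10)
  await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
  ctx.scene.leave()
})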

Scenes are a good idea. They are a huge step forward from only defining dozens of handlers on the same middleware tree. Bots that do not use scenes (or a similar form of state management) effectively forget everything that happened in the chat as soon as they are done handling a message. (If they seem to remember their context, this is more or less a workaround which relies on the message being replied to, inline menus, or other information in order to avoid state management.)

2. Cool! So what is the problem?

Scenes effectively reduce the flow of a conversation to being in a state, and then transitioning into another state (ctx.scene.enter('goto')). This can be illustrated by translating scenes into routers:

const scene = new Router(ctx => ctx.session.scene)

// Define a separate part of the middleware handling.
const handler = new Composer()
scene.route('my-scene', handler)
handler.lazy(ctx => {
  const c = new Composer()
  c.command('start', ctx => ctx.reply('/start command from inside scene'))
  c.command('leave', ctx => ctx.session.scene = undefined) // leave scene
  return c
})

// Define regular bot.
const bot = new Bot('secret-token')
bot.use(session())
bot.use(scene)
bot.command('start', ctx => ctx.reply('/start command outside of scene'))
bot.command('enter', ctx => ctx.session.scene = 'my-scene') // enter scene

bot.start()

Instead of creating new Scene objects, we simply create new routes, and obtain the same behaviour with minimally more code.

This may work if you have two states. It may also work for three. However, the more often you instantiate Scene, the more states you add to your global pool of states, between which you jump around arbitrarily. This quickly becomes messy. It takes you back to the old days of writing one huge file of code without indentation, and then using GOTO to move around. That, too, works at a small scale, but considering GOTO harmful led to a paradigm shift that substantially advanced programming as a discipline.

In Telegraf, there are some ways to mitigate the problem. For example, one could add a way to group some scenes together into a namespace. Telegraf, too, calls the Scene from above a scene, and uses a Stage to group several scenes together. It also allows you to force certain scenes into a linear sequence, and calls this a wizard, in analogy to multi-step UI forms.
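For reference, the Telegraf equivalent looks roughly like this (quoted from memory of Telegraf v4, so treat the details as approximate):

const { Telegraf, Scenes, session } = require('telegraf')

// A wizard forces its steps into a linear order.
const wizard = new Scenes.WizardScene(
  'age-wizard',
  async ctx => { await ctx.reply('How old are you?'); return ctx.wizard.next() },
  async ctx => { await ctx.reply('Cool, how old is your mother?'); return ctx.wizard.next() },
)

// A stage groups several scenes/wizards together under one namespace.
const stage = new Scenes.Stage([wizard])

const bot = new Telegraf('secret-token')
bot.use(session())
bot.use(stage.middleware())
bot.command('age', ctx => ctx.scene.enter('age-wizard'))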

With grammY, we try to rethink the state of the art, and to come up with original solutions to long-standing problems. Admitting that Update objects are actually pretty complex led us to giving powerful tools to bot developers: filter queries and the middleware tree were born, and they are widely used in almost all bots. Admitting that sending requests is more than just a plain HTTP call (at least when you’re working with Telegram) led us to developing API transformer functions: a core primitive that drastically changes how we think about plugins and what they can do. Admitting that long polling at scale is quite hard led us to grammY runner: the fastest long polling implementation that exists, outperforming all other JS frameworks by far.

Regarding conversational interfaces, the best we could come up with so far is GOTO. That was an okay first step a few years ago. Now, it is time to admit that this is harmful, and that we can do better.

3. So what have we done about this so far?

Not too much, which is why this issue exists. So far, we’ve been recommending that people combine routers and sessions rather than using scenes, as that does not take much more code, and providing the same plain old scenes for grammY is not ambitious enough.

There is a branch in this repository that contains some experiments with future syntax that could be used; however, the feedback on it was mixed. It does improve the situation somewhat, as it provides structure between the different steps of the conversation. Unfortunately, the resulting code is not very readable, and it makes things that belong together end up in different places in the code. It is always nice if things that are semantically linked can be written close to each other.

As a consequence of this lack of progress, we need to have a proper discussion with everyone in the community in order to develop a more mature approach. The next section suggests two ideas, one of them being the aforementioned one. Your feedback and ideas will shape the next step in developing conversational interfaces. Please speak up.

4. Some suggestions

Approach A: “Conversation Nodes”

This suggestion is the one that we’ve mentioned above. Its main contribution is to introduce a more implicit way of defining scenes. Instead of creating a new instance of a class for every step, you can just call conversation.wait(). This will internally create the class for you. As a result, you can express the conversation in a more natural way. The wait calls make it clear where a message from the user is expected.

Here is the example from the top again. Handling invalid input is omitted intentionally for brevity.

const conversation = new Conversation('age-at-birth')

conversation.command('start', async ctx => {
  await ctx.reply('How old are you?')
  ctx.conversation.forward()
})

conversation.wait()
conversation.on('message:text', async ctx => {
  ctx.session.age = parseInt(ctx.msg.text, 10)
  await ctx.reply('Cool, how old is your mother?')
  ctx.conversation.forward()
})

conversation.wait()
conversation.on('message:text', async ctx => {
  const age = parseInt(ctx.msg.text, 10)
  await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
  ctx.conversation.leave()
})

This provides a simple linear flow that could be illustrated by

O
|
O
|
O

We can jump back and forth using ctx.conversation.forward(3) or ctx.conversation.backward(5). The wait calls optionally take string identifiers if you want to jump to a specific point, rather than giving a relative number of steps.
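For example (hypothetical syntax of the proposed API; the identifier name is made up):

conversation.wait('mother-age') // name this wait point
// ... handlers for this step ...

// later, jump back to it by name instead of counting steps:
conversation.command('back', ctx => ctx.conversation.backward('mother-age'))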

Next, let us see how we can branch out, and have an alternative way of continuing the conversation.

const conversation = new Conversation('age-at-birth')

conversation.command('start', async ctx => {
  await ctx.reply('How old are you?')
  ctx.conversation.forward()
})

conversation.wait()

// start a new sub-conversation
const invalidConversation = conversation.filter(ctx => isNaN(parseInt(ctx.msg?.text ?? '', 10))).diveIn()
invalidConversation.on('message', ctx => ctx.reply('That is not a number, so I will assume you sent me the name of your pet'))
invalidConversation.wait()
// TODO: continue conversation about pets here

// Go on with regular conversation about age:
conversation.on('message:text', async ctx => {
  ctx.session.age = parseInt(ctx.msg.text, 10)
  await ctx.reply('Cool, how old is your mother?')
  ctx.conversation.forward()
})

conversation.wait()
conversation.on('message:text', async ctx => {
  const age = parseInt(ctx.msg.text, 10)
  await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
  ctx.conversation.leave()
})

We have now defined a conversation that goes like this:

O
|
O
| \
O O

That way, we can define conversation flows.

There are a number of improvements that could be made to this. If you have any concrete suggestions, please leave them below.

Approach B: “Nested Handlers”

Newcomers commonly try out something like this.

bot.command('start', async ctx => {
  await ctx.reply('How old are you?')
  bot.on('message', ctx => { /* ... */ })
})

grammY has a protection against this because it would lead to a memory leak, and eventually OOM the server. Every received /start command would add a handler that is installed globally and persistently. All but the first are unreachable code, given that next isn’t called inside the nested handler.

It would be worth investigating if we can write a different middleware system that allows this.

const conversation = new Conversation()
conversation.command('start', async ctx => {
  await ctx.reply('How old are you?')
  conversation.on('message', ctx => { /* ... */ })
})

This would probably lead to deeply nested callback functions, i.e. bring us back to callback hell, something that could be called the GOTO statement of asynchronous programming.
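To see why, here is what the age example would look like after just a few more questions (explanatory sketch only):

conversation.command('start', async ctx => {
  await ctx.reply('How old are you?')
  conversation.on('message:text', async ctx => {
    await ctx.reply('Cool, how old is your mother?')
    conversation.on('message:text', async ctx => {
      // ... one level of nesting deeper per question
    })
  })
})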

What could we do to mitigate this?

Either way, this concept is still tempting. It is very intuitive to use. It obviously cannot be implemented with exactly the above syntax (because we are unable to reconstruct the current listeners on the next update, and we obviously cannot store the listeners in a database), but we could try to figure out whether small adjustments could make this possible. Internally, we would still have to convert this into something like an FSM, but maybe one that is generated on the fly. The dynamic ranges of the menu plugin could be used as inspiration here.
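One direction worth exploring is to rebuild the active handlers on every update from a plain step counter in the session, using grammY’s existing lazy middleware (a rough sketch; all names are illustrative):

import { Composer } from 'grammy'

const conversation = new Composer()
conversation.lazy(ctx => {
  // Re-create the currently active handlers from persisted state,
  // instead of keeping listeners in memory.
  const c = new Composer()
  switch (ctx.session.step) {
    case 0:
      c.command('start', async ctx => {
        await ctx.reply('How old are you?')
        ctx.session.step = 1 // the moral equivalent of registering the nested handler
      })
      break
    case 1:
      c.on('message', ctx => { /* ... */ })
      break
  }
  return c
})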

5. We need your feedback

Do you have a third idea? Can we combine the approaches A and B? How would you change them? Do you think the examples are completely missing the point? Any constructive feedback is welcome, and so are questions and concerns.

It would be amazing if we could find the right abstraction for this. It exists somewhere out there, we just have to find it.

Thank you!


Most upvoted comments

@IlyaSemenov sorry for the long delay. I did not have much time for this in the past few months, and I also started having more and more doubts about this. I have concluded that my idea above is bad. You are right. We have already talked about some of the disadvantages here, and I thought of several more, which I will skip now.

Thanks for your work on scenes, I appreciate it! This is great; it is probably the best implementation of scenes that I’ve seen so far. I found your observations especially valuable. It’s cool to see how well-suited composition and nesting based around a concept of waiting and resuming are. However, I did not change my mind about scenes in general. I still can’t read the code in a linear fashion (because every step has a different callback), and I have to know a lot about the plugin to make sense of what all the GOTO jumping does. This still interrupts the reading flow all the time.

This is why I created a new package with a completely different strategy from anything discussed before. It is very much not usable yet, but there is an example bot in the README which outlines the concept. It can do the wait/resume thing, and it supports nesting, loops, functions, branching, recursion, error handling with try-catch-finally, composition across modules, and so on. (We basically get all of this for free; it does not need to be supported explicitly.) It is trivial to implement the system you talked about which lets you resume based on external events (but I wanted to publish before adding that, so it’s not supported yet). It has type-safe local sessions, even across functions, and the code is more concise than any of the ideas discussed above. Also, it is very easy to understand the code even if you have never seen the package before.

Naturally, there are a few pitfalls. The main one is that side-effects outside of grammY must be wrapped in function calls. Basically, you can communicate with the user and do everything with the Bot API, but external things which perform side-effects or are non-deterministic must be wrapped in another function call. Some people may know this pattern from React, where it is called useEffect. It is not pretty, so it must be documented well, but it is a known thing and I think people will cope with it.
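For illustration, the wrapping pattern looks roughly like this (based on the builder style from the README; the exact method name and fetchRandomFact are assumptions here, not settled API):

async function example(conversation, ctx) {
  await ctx.reply('Hi! Send me a message.')
  ctx = await conversation.wait() // suspend until the next update arrives
  // External, non-deterministic or side-effecting work must be wrapped
  // so that it is not repeated when the conversation is replayed:
  const fact = await conversation.external(() => fetchRandomFact())
  await ctx.reply(fact)
}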

I published the (unfinished) code at https://github.com/grammyjs/conversations.

Thank you for brainstorming here! I will close this issue now. There will be more things to talk about, but they are going to be specific to conversations, so they can be discussed in new issues in the new repository.

I hope you give this thing a try, I’d be looking forward to your feedback and criticism!

Approach C: “Wait and Resume”

At first sight, this suggestion may look similar to Approach A. However, it actually behaves very differently, and it makes more sense to explain it from scratch.

Remember from https://grammy.dev/advanced/middleware that middleware is a tree in grammY. This tree will be traversed in depth-first order.

We regard a conversation as a hierarchy of different flows, too: always start with question Q0; if the user says YES, continue with Q1, else continue with Q2.

We suggest using the former hierarchy to express the latter.

Here is the example from the top again. Handling invalid input is omitted intentionally for brevity.

const conversation = new Conversation('age-at-birth')

conversation.command('start', async ctx => {
  await ctx.reply('How old are you?')
  ctx.conversation.resume()
})

conversation.wait()

conversation.on('message:text', async ctx => {
  ctx.session.age = parseInt(ctx.msg.text, 10)
  await ctx.reply('Cool, how old is your mother?')
  ctx.conversation.resume()
})

conversation.wait()

conversation.on('message:text', async ctx => {
  const age = parseInt(ctx.msg.text, 10)
  await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
  ctx.conversation.leave()
})

This provides a simple linear flow that could be illustrated by

O
|
O
|
O

We can interrupt the middleware execution by inserting a wait call. We can call resume to continue traversing the middleware tree. In a way, calling ctx.conversation.resume is like calling next in the traditional middleware tree. (Please leave a comment if you think the method should be renamed to ctx.conversation.next in order to reflect this similarity.)

The wait calls optionally take string identifiers if you want to jump to a specific point, rather than only resuming after the next wait call.

Next, let us see how we can branch out, and have an alternative way of continuing the conversation.

const conversation = new Conversation('age-at-birth')

conversation.command('start', async ctx => {
  await ctx.reply('How old are you?')
  ctx.conversation.resume()
})

conversation.wait()

// start a new sub-conversation
const invalidConversation = conversation.filter(ctx => isNaN(parseInt(ctx.msg?.text ?? '', 10)))
invalidConversation.on('message', ctx => ctx.reply('That is not a number, so I will assume you sent me the name of your pet. What animal is it?'))
invalidConversation.wait()
// TODO: continue conversation about pets here

// Go on with regular conversation about age:
conversation.on('message:text', async ctx => {
  ctx.session.age = parseInt(ctx.msg.text, 10)
  await ctx.reply('Cool, how old is your mother?')
  ctx.conversation.resume()
})
conversation.wait()

conversation.on('message:text', async ctx => {
  const age = parseInt(ctx.msg.text, 10)
  await ctx.reply(`Alright, she was ${age - ctx.session.age} when you were born!`)
  ctx.conversation.leave()
})

We have now defined a conversation that goes like this:

O
| \
O  O
|
O

That way, we can define conversation flows.

Note that the conversation will remain between two wait calls until you explicitly call ctx.conversation.resume. It may be desirable to let the conversation resume automatically instead, and to let people stay at the current step explicitly. Here is how that could work:

const conversation = new Conversation('word-of-the-year', { autoResume: true })

conversation.on('message', async ctx => {
  await ctx.reply('What is your favourite English word?')
})
conversation.wait()
conversation.on('message', async ctx => {
  ctx.session.word = ctx.msg.text
  await ctx.reply('Why do you want this word to become the word of the year?')
})
conversation.wait()
conversation.on('message', async ctx => {
  ctx.session.reason = ctx.msg.text
  await ctx.reply('Is there anything else you would like to say?', {
    reply_markup: new InlineKeyboard().text('Nope', 'no')
  })
})
conversation.wait()
conversation.on('message', async ctx => {
  ctx.session.comment = ctx.msg.text
  await ctx.reply('Thank you for your submission')
})
conversation.callbackQuery('no', ctx => ctx.reply('Skipped. Thanks for the submission!'))

Personally, I think this is so far the best option we have. I prefer it over the other two approaches. What do you think?

I kind of implemented the “nested handlers” approach in https://github.com/IlyaSemenov/grammy-scenes

I’ve extracted it into an npm package from one of my actual bot projects.

I am not 100% happy with the API and architecture I ended up with, but it allows you to write concise code while still being quite flexible (it extends the built-in Composer class, thus allowing you to fall back to generic scene.use(...) for advanced middleware use cases).

One of the things I’m not sure about is handling the scenes context. I’m not convinced whether scenes should have separate first-class contexts, or if we should simply direct users to use “normal” sessions for their context data. In my initial prototypes I went with the former, which (back then) resulted in very complex TypeScript generics, and it also wasn’t straightforward when to clear the context and when to keep it. So I ditched all that in favour of “normal” sessions with some simple typings (see the example in the README), but I’m not exactly happy about that. First-class scene contexts still seem a better fit to me; they just need to be carefully implemented.

How to conditionally skip a step?

Using the normal if statement:

main_scene.scene("step1", (scene) => {
	scene.on("message:text", async (ctx) => {
		if (ctx.message.text === "secret") {
			await ctx.reply(`Greetings, mylord`)
			ctx.scenes.move("step3")
		} else {
			await ctx.reply(`Hello`)
			ctx.scenes.move("step2")
		}
	})
})

How to repeat several steps?

You just don’t move off the current step:

const main_scene = new Scene()

main_scene.scene<
	MainBotContext &
		SessionFlavor<{
			main_step1?: {
				names: string[]
			}
		}>
>("step1", (scene) => {
	scene.enter(async (ctx) => {
		await ctx.reply(`Send me 3 names.`)
	})
	scene.on("message:text", async (ctx) => {
		if (!ctx.session.main_step1) {
			ctx.session.main_step1 = { names: [] }
		}
		ctx.session.main_step1.names.push(ctx.message.text)
		if (ctx.session.main_step1.names.length >= 3) {
			await ctx.scenes.move("step2")
		}
	})
})

How would something like using the captcha look?

const captcha_scene = new Scene<
	Context &
		SessionFlavor<{
			captcha?: {
				answer: string
				next_scene: string
			}
		}>
>()
captcha_scene.enter(async (ctx, next_scene) => {
	const { answer, image } = await generateCaptcha()
	ctx.session.captcha = { answer, next_scene }
	await ctx.replyWithPhoto(image)
	await ctx.reply(`Enter letter you see on image above`)
})
captcha_scene.on("message:text", async (ctx) => {
	if (ctx.message.text === ctx.session.captcha?.answer) {
		await ctx.scenes.move(ctx.session.captcha.next_scene)
	} else {
		await ctx.reply(`Please try again.`)
	}
})

const rocket_scene = new Scene()
rocket_scene.enter((ctx) => ctx.scenes.inner("captcha", "launch"))
rocket_scene.scene("captcha", captcha_scene)
rocket_scene.scene("launch", (scene) => {
	scene.enter(async (ctx) => {
		await ctx.reply(`Launching rocket!`)
	})
})

This indeed requires passing the ID of the next (sub)step, but is that really a big deal? The code is still concise and reusable.


If I read move('foo'), I have no idea where to keep on reading.

Most of the time you read the next (sub)scene block. Since that is the 95% use case, this could indeed be improved, with ctx.scene.next() or something along those lines. Respectfully, I agree that naming a scene (a conversation step) should not be necessary; names should be optional, for jumps and such.

  1. How does one actually enter (start) a conversation? Please come up with a sample top-level bot integration.

The only way I see is that one always uses a single top-level conversation with bot.use(topLevelConversation), which in turn handles /commands. Is that what you have in mind?
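In code, I imagine something like this (hypothetical, proposed API only):

const topLevelConversation = new Conversation('root')
topLevelConversation.command('launch_rocket', captcha, async ctx => {
  await ctx.reply('You are verified, launching rocket 🚀')
})

const bot = new Bot('secret-token')
bot.use(session())
bot.use(topLevelConversation) // single entry point for everything conversational
bot.start()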

  1. How do you see this:
mainConversation.command('launch_rocket', captcha, async ctx => {
  await ctx.reply('You are verified, launching rocket 🚀')
})

surviving a server restart? What if a user takes a 10 minute break before selecting a month in the captcha sub-conversation, and CI/CD restarts the server during that time?

The command handler is by its nature momentary, it either completes or fails… unless I’m missing something.

  1. This proposed API is not exactly obvious:
// First statement
conversation.on('message', ctx => { /* ... */ })
conversation.wait() // <--- what do we "wait" for here? a message (above), or text/photo (below)?
// Second statement
conversation.on('message:text', ctx => { /* ... */ })
conversation.on('message:photo', ctx => { /* ... */ })
conversation.on('message', ctx => { /* ... */ })
conversation.wait()
// Third statement
conversation.on('message', ctx => { /* ... */ })
// <--- why don't we have wait here?

I’d rather suggest something like:

conversation.step(step => {
  step.on('message', ctx => { /* ... */ })
})
conversation.step(step => {
  step.on('message:text', ctx => { /* ... */ })
  step.on('message:photo', ctx => { /* ... */ })
  step.on('message', ctx => { /* ... */ })
})
conversation.step(step => {
  step.on('message', ctx => { /* ... */ })
})

This makes it immediately obvious what the conversation steps are and how the handlers are grouped. There is also no discrepancy in the number of wait calls. (Essentially, that is what I did in grammy-scenes.)

How Approach C Resembles Imperative Code

It struck me that with Approach C we get the full flexibility of statements, branching, loops, functions, and recursion. In other words, we have the flexibility of code when designing how conversations work. This is a little bit revolutionary and maybe not too obvious at first sight. I would like to illustrate how it will work.

Statements

A statement in conversations is simply all middleware between two wait calls. Example:

// First statement
conversation.on('message', ctx => { /* ... */ })
conversation.wait()
// Second statement
conversation.on('message:text', ctx => { /* ... */ })
conversation.on('message:photo', ctx => { /* ... */ })
conversation.on('message', ctx => { /* ... */ })
conversation.wait()
// Third statement
conversation.on('message', ctx => { /* ... */ })

These individual blocks of handlers act like the individual instructions of a conversation.

Branching

This has already been described in the post above.

In short, you can use filter calls in the middleware (or branch, or on, or custom middleware, or whatever) to perform branching in the middleware tree. The branches of the middleware tree translate to the branches of the conversation hierarchy.
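A minimal sketch with the proposed API (grammY’s branch exists on Composer today; adultFlow and minorFlow are assumed sub-conversations defined elsewhere):

conversation.branch(
  ctx => (ctx.session.age ?? 0) >= 18, // the predicate decides which branch to take
  adultFlow, // conversational middleware for one branch
  minorFlow, // conversational middleware for the other
)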

Loops

Iteration is possible through recursion (preferred, see below), or by jumping back to a fixed step. Admittedly, this is more of a jump statement than an actual loop. (If you have a use case for a for/while/do-while loop and a syntax suggestion, we may add syntactic sugar that abstracts away the jumps.)

Example of jumping back:

conversation.on('message', ctx => { /* ... */ })
conversation.wait('begin-loop') // loop starts here
conversation.on('message', ctx => { /* ... */ }) // first loop statement
conversation.wait()
conversation.on('message', ctx => { /* ... */ }) // second loop statement
conversation.wait()
conversation.filter(condition, ctx => ctx.conversation.resume('begin-loop')) // loop condition
conversation.on('message', ctx => { /* ... */ }) // broke out of loop

Functions

You can define reusable parts of conversations. These act as sub-conversations: they can be included in different places of the main conversation, and nested as deeply as you like. This is the same concept as functions in code: reusable, named pieces of control flow.

In order to illustrate this, let’s define a small part of a conversation that verifies the identity of the user by asking for their birthday.

const captcha = new Conversation('captcha', { autoResume: true })
captcha.use(async ctx => {
  await ctx.reply('Please enter your birthday to proceed. Day of month?', {
    reply_markup: numberKeyboard(1, 31) // builds a keyboard with 31 number buttons, omitted here
  })
})
captcha.wait()
captcha.on('callback_query:data', async ctx => {
  ctx.session.captcha = { day: ctx.callbackQuery.data } // initialise the captcha data in the session
  await ctx.reply('Month?', { reply_markup: numberKeyboard(1, 12) })
})
captcha.wait()
captcha.on('callback_query:data', async ctx => {
  ctx.session.captcha.month = ctx.callbackQuery.data
  await ctx.reply('Year?', { reply_markup: numberKeyboard(1900, new Date().getFullYear()) })
})
captcha.wait()
captcha.on('callback_query:data', async ctx => {
  ctx.session.captcha.year = ctx.callbackQuery.data
  await verifyBirthday(ctx, ctx.session.captcha) // checks the birthday against a database, calls `ctx.conversation.leave` if incorrect
})

We have now defined a “function” for checking the birthday. We can “call” this function by passing it as conversational middleware to another conversation.

const mainConversation = new Conversation('rocket-launcher')

mainConversation.command('launch_rocket', captcha, async ctx => {
  await ctx.reply('You are verified, launching rocket 🚀')
})
mainConversation.command('purge', captcha, async ctx => {
  await ctx.reply('You are verified, purging all data')
})

Note how we can use the same captcha in different places. It behaves just like a middleware tree does, and it is closely integrated with grammY’s middleware tree.

Recursion

It should be possible to use the middleware recursively. This implies some sort of call stack, but it isn’t quite clear yet if we actually need to store it or not.

In order to use recursion, we could for example restart the captcha with another attempt if the verification fails. In the last step, we could do:

captcha.on('callback_query:data', async (ctx, next) => {
  ctx.session.captcha.year = ctx.callbackQuery.data
  await next()
})
// `verifyBirthday` checks the birthday in `ctx.session.captcha` against a database, and returns `true` if it is correct
captcha.filter(verifyBirthday, ctx => ctx.conversation.resume())
// else: restart captcha
captcha.use(async (ctx, next) => {
  await ctx.reply('Wrong birthday, try again!')
  await next()
})
captcha.use(captcha)

This will re-enter the captcha until it is correct.

In reality, you probably also want to allow the user to exit the captcha manually; this was omitted here for brevity.
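For example, such an escape hatch could be as simple as this (hypothetical API again):

// Let the user abort the captcha at any point:
captcha.command('cancel', ctx => ctx.conversation.leave())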