pulldown-cmark: The lifetimes on BrokenLinkCallback are wrong

Here is a simple callback which marks every link as working:

fn callback<'a>(link: BrokenLink<'a>) -> Option<(CowStr<'a>, CowStr<'a>)> {
    Some(("#".into(), link.reference.into()))
}

On its own, it typechecks fine. Unfortunately, this doesn’t work with Parser::with_broken_link_callback:

fn f(txt: &str) {
    for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
    }
}
error: implementation of `FnOnce` is not general enough
   --> src/lib.rs:8:80
    |
8   |       for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
    |                                                                                  ^^^^^^^^^^^^^ implementation of `FnOnce` is not general enough
    | 
   ::: /home/joshua/.local/lib/rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:219:1
    |
219 | / pub trait FnOnce<Args> {
220 | |     /// The returned type after the call operator is used.
221 | |     #[lang = "fn_once_output"]
222 | |     #[stable(feature = "fn_once_output", since = "1.12.0")]
...   |
227 | |     extern "rust-call" fn call_once(self, args: Args) -> Self::Output;
228 | | }
    | |_- trait `FnOnce` defined here
    |
    = note: `for<'a> fn(pulldown_cmark::BrokenLink<'a>) -> Option<(pulldown_cmark::CowStr<'a>, pulldown_cmark::CowStr<'a>)> {callback}` must implement `FnOnce<(pulldown_cmark::BrokenLink<'_>,)>`
    = note: ...but `FnOnce<(pulldown_cmark::BrokenLink<'_>,)>` is actually implemented for the type `for<'a> fn(pulldown_cmark::BrokenLink<'a>) -> Option<(pulldown_cmark::CowStr<'a>, pulldown_cmark::CowStr<'a>)> {callback}`

error: aborting due to previous error

The issue is that BrokenLinkCallback is typed as having the same lifetime as its outputs: https://github.com/raphlinus/pulldown-cmark/blob/e97974b8d76195c953f0d427e8725ef9ad1a0c17/src/parse.rs#L1270 That means that the callback can’t e.g. be passed to two different parsers, because the first call will fix a set lifetime:

 fn f(txt: &str) {
    let mut callback = |link: BrokenLink<'_>| -> Option<(CowStr<'_>, CowStr<'_>)> {
        Some(("#".into(), link.reference.to_owned().into()))
    };

    for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
    }

    for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
    }
}
error[E0499]: cannot borrow `callback` as mutable more than once at a time
  --> src/lib.rs:11:80
   |
8  |     for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
   |                                                                                ------------- first mutable borrow occurs here
...
11 |     for _ in Parser::new_with_broken_link_callback(txt, Options::empty(), Some(&mut callback)) {
   |                                                                                ^^^^^^^^^^^^^
   |                                                                                |
   |                                                                                second mutable borrow occurs here
   |                                                                                first borrow later used here

The fix I was thinking of was something like this:

diff --git a/src/parse.rs b/src/parse.rs
index d6388b1..bccd68c 100644
--- a/src/parse.rs
+++ b/src/parse.rs
@@ -129,12 +129,12 @@ pub struct BrokenLink<'a> {
 }
 
 /// Markdown event iterator.
-pub struct Parser<'a> {
-    text: &'a str,
+pub struct Parser<'input, 'callback: 'input> {
+    text: &'input str,
     options: Options,
     tree: Tree<Item>,
-    allocs: Allocations<'a>,
-    broken_link_callback: BrokenLinkCallback<'a>,
+    allocs: Allocations<'input>,
+    broken_link_callback: BrokenLinkCallback<'callback>,
     html_scan_guard: HtmlScanGuard,
 
     // used by inline passes. store them here for reuse
@@ -1266,8 +1267,8 @@ pub(crate) struct HtmlScanGuard {
     pub declaration: usize,
 }
 
-pub type BrokenLinkCallback<'a> =
-    Option<&'a mut dyn FnMut(BrokenLink) -> Option<(CowStr<'a>, CowStr<'a>)>>;
+pub type BrokenLinkCallback<'b> =
+    Option<&'b mut dyn for<'a> FnMut(BrokenLink<'a>) -> Option<(CowStr<'a>, CowStr<'a>)>>;
 
 /// Markdown event and source range iterator.
 ///

This does two things:

  1. Separates the lifetime of the link from the lifetime of the callback (by changing &'a FnMut() -> &'a str to &'b for<'a> FnMut() -> &'a str).
  2. Separates the lifetime of the link from the lifetime of the link (by adding a new 'callback lifetime).

Unfortunately, this uncovers that the change can’t work:

error[E0597]: `link_label` does not live long enough
   --> src/parse.rs:457:64
    |
145 |   impl<'a, 'b> Parser<'a, 'b> {
    |        -- lifetime `'a` defined here
...
450 |                                       .or_else(|| {
    |                                                -- value captured here
...
457 |                                                       reference: link_label.as_ref(),
    |                                                                  ^^^^^^^^^^ borrowed value does not live long enough
...
460 | /                                                 callback(broken_link).map(|(url, title)| {
461 | |                                                     (link_type.to_unknown(), url, title)
462 | |                                                 })
    | |__________________________________________________- returning this value requires that `link_label` is borrowed for `'a`
...
503 |                               }
    |                               - `link_label` dropped here while still borrowed

The issue is that link_label is only alive for the duration for the length of a single loop iteration, not the lifetime of the input. Even though it’s parameterized by a lifetime 'input, it has a Box variant, so if it gets dropped the compiler has to conservatively assume the entire link is invalid.

I don’t have a solution for this, I think it will require a redesign of the parser. But people are running into this in the real world: https://github.com/rust-lang/rust/pull/79781/files#r537769663

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 19 (8 by maintainers)

Most upvoted comments

Hey @Veykril,

Apologies for the delay in getting back to you. The master branch should be ready for use and it could be published as new point release. The reason that hasn’t happened yet is that there was still an option to be explored where the callback function would be a type parameter (especially around code bloat and embedability). I hope to sort that out soon.

Edit: to clarify: we could release now and do another point release later if we chose to parameterize the callback function, but that would require people tracking the latest releases to update their callback code twice. That’s the exact reason we haven’t published yet.

Interesting. That makes for a good argument to use generics. I’ll have to investigate how large the binary size cost is whenever different callbacks are used.

Okay, looking forward to that. I will try to understand more deeply in the meantime.

edit:

It should, but it does not, because BrokenLinkCallback ties the lifetime of the callback to the lifetime of the outputs. Since the outputs are dropped in the loop (for _), the callback can’t be used after the loop ends.

I understand this now. Thank you for this apt and concise explanation.