lychee: Lychee don't panic, invalid URL's are fine.

From the pleasant discussion with @mre here https://github.com/lycheeverse/lychee-action/issues/82, I intend to work on trying to parse the input URL before passing it to reqwest in order to prevent lychee from panicking, skip the invalid URL and move on to the next one.

Bonus:

  • If we can track these invalidated URL’s and show them to users at the end so that they are able to fix or ignore them for future runs

Looking for feedback and @vipulgupta2048 will work on this soon (The email will remind me to work on this)

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 25 (17 by maintainers)

Commits related to this issue

Most upvoted comments

@vipulgupta2048, no worries and thanks for looking into it.

Originally I thought this was possible after the discussion with @lebensterben above:

 let result = panic::catch_unwind(|| self.reqwest_client.execute(request));

let fut = match result {
    Ok(s) => s,
    Err(_) => return Status::Error(ErrorKind::Panic),
};

match fut.await {
    Ok(ref response) => Status::new(response, self.accepted.clone()),
    Err(e) => e.into(),
}

but I’m getting an error:

   --> lychee-lib/src/client.rs:482:22
    |
482 |         let result = panic::catch_unwind(|| self.reqwest_client.execute(request));
    |                      ^^^^^^^^^^^^^^^^^^^ `UnsafeCell<tokio::util::linked_list::Pointers<tokio::time::driver::entry::TimerShared>>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
    |

So I searched around a bit and this was recommended:

match self.reqwest_client.execute(request).catch_unwind().await? {
    Ok(ref response) => Status::new(response, self.accepted.clone()),
    Err(e) => e.into(),
}

It polls the future in the background. But I’m getting another error:

error[E0277]: the type `UnsafeCell<tokio::util::linked_list::Pointers<tokio::time::driver::entry::TimerShared>>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary
   --> lychee-lib/src/client.rs:482:22
    |
482 |         let result = panic::catch_unwind(|| self.reqwest_client.execute(request));
    |                      ^^^^^^^^^^^^^^^^^^^ `UnsafeCell<tokio::util::linked_list::Pointers<tokio::time::driver::entry::TimerShared>>` may contain interior mutability and a reference may not be safely transferrable across a catch_unwind boundary

There is comment in the original thread:

This works when may_panic, as shown in the example is a future created in this module. However it does not work if the code in may_panic waits on another future which was returned by another method where futures:FutureExt was not brought into scope. In that case, catch_unwind complains that the future is not transferrable across a unwind boundary because the future returned by the async function itself waits on that other future which is not transferrable. Any ideas what can be done to resolve this?

(emphasis mine)

No clue how to resolve this. 😕

@mre Exactly. We should stop ignoring any missing panic docs or error docs in lychee-lib before it gets to 1.0.

Well, the other alternative is to parse the URIs like reqwest would do and return an error before passing it to reqwest. That would be a waste of resources and be useless for most URLs, but at least it wouldn’t panic. That was our plan in the beginning.

At this moment we need to wait for the upstream to merge the fix or use a patched dependency.

I will give it a shot soon, haven’t checked it out yet.

Closing. Feel free to reopen if the problem still exists.

Here’s an example of how this is panicking with an Archive.org URL today:

thread 'tokio-runtime-worker' panicked at 'a parsed Url should always be a valid Uri: InvalidUri(InvalidUriChar)', /cargo/registry/src/github.com-1ecc6299db9ec823/reqwest-0.11.8/src/into_url.rs:70:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' panicked at 'Cannot send request: SendError(Ok(Request { uri: Uri { url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("web.archive.org")), port: None, path: "/web/20070312014327/https://developer.yahoo.com/weather/", query: None, fragment: None } }, source: FsPath("./dennisdel/content/blog/using-the-google-weather-api-pros-and-cons-so-far.md"), element: Some("a"), attribute: Some("href") }))', lychee-bin/src/commands/check.rs:85:14
+ exit_code=101

Yeah they look fine. Maybe the issue is with a different link that was sent before. It panics and then Tokio closes the channel and after that it tries to send the legit links above but the channel is already closed.

ACK - thank you so much, @mre and @lebensterben. There are hundreds of URLs that panic for me (mostly ones from Archive.org that are of the form https://web.archive.org/some_stub/https://my_archived_url), so I’ll wait for the upstream change to be merged for now. Thank you for all your hard work and help!

Is there a temporary workaround one can apply?

@dend, the only workaround we have at the moment is to exclude the panicking URLs. 😞