lychee: Lychee don't panic, invalid URL's are fine.
From the pleasant discussion with @mre here https://github.com/lycheeverse/lychee-action/issues/82, I intend to work on trying to parse the input URL before passing it to reqwest in order to prevent lychee from panicking, skip the invalid URL and move on to the next one.
Bonus:
- If we can track these invalidated URL’s and show them to users at the end so that they are able to fix or ignore them for future runs
Looking for feedback and @vipulgupta2048 will work on this soon (The email will remind me to work on this)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 25 (17 by maintainers)
Commits related to this issue
- Gracefully handle invalid URIs With the upgrade to `reqwest` 0.12, we can finally handle a long-standing issue, when Urls could not be parsed to Uris. Previously, we would panic, but we can now handl... — committed to lycheeverse/lychee by mre 2 months ago
- Gracefully handle invalid URIs (#1414) With the upgrade to `reqwest` 0.12, we can finally handle a long-standing issue, when Urls could not be parsed to Uris. Previously, we would panic, but we can... — committed to lycheeverse/lychee by mre 2 months ago
@vipulgupta2048, no worries and thanks for looking into it.
Originally I thought this was possible after the discussion with @lebensterben above:
but I’m getting an error:
So I searched around a bit and this was recommended:
It polls the future in the background. But I’m getting another error:
There is comment in the original thread:
(emphasis mine)
No clue how to resolve this. 😕
@mre Exactly. We should stop ignoring any missing panic docs or error docs in lychee-lib before it gets to 1.0.
Well, the other alternative is to parse the URIs like reqwest would do and return an error before passing it to reqwest. That would be a waste of resources and be useless for most URLs, but at least it wouldn’t panic. That was our plan in the beginning.
At this moment we need to wait for the upstream to merge the fix or use a patched dependency.
I will give it a shot soon, haven’t checked it out yet.
Closing. Feel free to reopen if the problem still exists.
Here’s an example of how this is panicking with an Archive.org URL today:
Yeah they look fine. Maybe the issue is with a different link that was sent before. It panics and then Tokio closes the channel and after that it tries to send the legit links above but the channel is already closed.
ACK - thank you so much, @mre and @lebensterben. There are hundreds of URLs that panic for me (mostly ones from Archive.org that are of the form
https://web.archive.org/some_stub/https://my_archived_url), so I’ll wait for the upstream change to be merged for now. Thank you for all your hard work and help!@dend, the only workaround we have at the moment is to exclude the panicking URLs. 😞