lighthouse: Counterintuitive `warp` error for `Unsupported endpoint version`

Description

When hitting the /eth/v2/validators/blocks/{slot} endpoint with an invalid randao_reveal (and verify_randao=true or empty) Lighthouse will return this error:

$ curl "http://localhost:5052/eth/v2/validator/blocks/1000000"
{"code":400,"message":"BAD_REQUEST: Unsupported endpoint version: v2","stacktraces":[]}

This makes zero sense, as the endpoint is cleared declared supporting v2 here:

https://github.com/sigp/lighthouse/blob/2983235650811437b44199f9c94e517e948a1e9b/beacon_node/http_api/src/lib.rs#L1928-L1935

Adding a panic to unsupported_endpoint_version to get a backtrace shows some funky warp stuff going on:

thread 'tokio-runtime-worker' panicked at 'unsupported_version_rejection: v2', beacon_node/http_api/src/version.rs:60:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/e092d0b6b43f2de967af0887873151bb1c0b18d3/library/core/src/panicking.rs:142:14
   2: http_api::version::unsupported_version_rejection
   3: <warp::filter::and_then::AndThenFuture<T,F> as core::future::future::Future>::poll
   4: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
   5: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
   6: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
   7: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
   8: <warp::filter::and_then::AndThenFuture<T,F> as core::future::future::Future>::poll
   9: <futures_util::future::try_future::into_future::IntoFuture<Fut> as core::future::future::Future>::poll
  10: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
  11: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
  12: <F as futures_core::future::TryFuture>::try_poll
  13: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
  14: <warp::filter::and::AndFuture<T,U> as core::future::future::Future>::poll
  15: <warp::filter::or::EitherFuture<T,U> as core::future::future::Future>::poll
  16: <warp::filter::recover::RecoverFuture<T,F> as core::future::future::Future>::poll
  17: <warp::filters::log::internal::WithLogFuture<FN,F> as core::future::future::Future>::poll
  18: <warp::filters::cors::internal::WrappedFuture<F> as core::future::future::Future>::poll
  19: scoped_tls::ScopedKey<T>::set
  20: <warp::filter::service::FilteredFuture<F> as core::future::future::Future>::poll
  21: hyper::proto::h1::dispatch::Dispatcher<D,Bs,I,T>::poll_catch
  22: <hyper::server::conn::upgrades::UpgradeableConnection<I,S,E> as core::future::future::Future>::poll
  23: <hyper::server::server::new_svc::NewSvcTask<I,N,S,E,W> as core::future::future::Future>::poll
  24: tokio::runtime::task::core::CoreStage<T>::poll
  25: tokio::runtime::task::harness::Harness<T,S>::poll
  26: std::thread::local::LocalKey<T>::with
  27: tokio::runtime::thread_pool::worker::Context::run_task
  28: tokio::runtime::thread_pool::worker::Context::run
  29: tokio::macros::scoped_tls::ScopedKey<T>::set
  30: tokio::runtime::thread_pool::worker::run
  31: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  32: tokio::runtime::task::harness::Harness<T,S>::poll
  33: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

i.e. it seems that warp is doing some counterintuitive backtracking through a few and_then/and/or until it hits the unsupported_version_endpoint for a different endpoint (one that is v1 only). This makes little sense to me, and would probably require someone to dig pretty deep into warp to understand why this happens. I think this is similar to another open issue (https://github.com/sigp/lighthouse/issues/3112) related to our apparent misuse of warp combinators.

Finally, if verify_randao=false is set or the randao reveal is valid then no error occurs:

$ curl "http://localhost:5052/eth/v2/validator/blocks/3574858?verify_randao=false"
{ .. }

Version

v2.5.0

Related issues

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15 (7 by maintainers)

Commits related to this issue

Most upvoted comments

Update: Adding recover() to each filter didn’t solve the backtracking issue. I’ll try the Mixin approach next.

We could maybe achieve the same thing as then_or_error by sprinkling recovers immediately after every and_then, is that what you had in mind? Wasn’t thinking of this approach in particular, but I think this would be more optimal than writing a custom Mixin.

new compiler releases would subtly change the recursion limit behaviour and cause compilation to fail. We haven't had any compilation issues since adopting the boxing. I’m following you, neat workaround for the compiler issues.

Assuming no concerns, I’ll update an endpoint to use recover() after the and_then; then we can see how it looks/behaves

Hey @muang0, the pattern we found that works is this one:

https://github.com/sigp/lighthouse/blob/dfcb3363c757671eb19d5f8e519b4b94ac74677a/beacon_node/http_api/src/lib.rs#L1248-L1274

i.e. change the and_then to a then so that there’s no possibility of returning a Rejection. This prevents warp from backtracking when a handler errors.

The complication is that, then expects a Response as the return value, not a Result<Response, Rejection> so we need to convert the Rejections that were previously returned into responses, which is what that warp_utils::reject::handle_rejection accomplishes. The abstraction I’m thinking of is something like:

then_or_error(|| {
   // this function `f` returns a `Result<T, Rejection>`, which `then_or_error` will convert to a
   // `Response`. We can hopefully do a find and replace of `and_then` for `then_or_error`,
   // with minimal other changes.
   f()
})

The implementation of then_or_error might look like this:

use warp::{filters::BoxedFilter, Filter, reply::Reply};

// Mixin trait (add methods to all warp types implementing Filter)
pub trait ThenOrError {
   fn then_or_error<G, R>(self, g: G) -> BoxedFilter<(Response,)> 
      where G: Future<Item = Result<R, Rejection>>,
            R: Reply,
   {
       self.then(|| {
           match f.await {
               Ok(result) => result.into_response(),
               Err(e) => match warp_utils::reject::handle_rejection(e).await { 
                 Ok(reply) => reply.into_response(), 
                 Err(_) => warp::reply::with_status( 
                     StatusCode::INTERNAL_SERVER_ERROR, 
                     eth2::StatusCode::INTERNAL_SERVER_ERROR, 
                 )
                 .into_response(), 
             },
           }
       })
       .boxed()
   }
}

impl<F> ThenOrError for F: Filter {}

This might require some tweaks to get working.

We are also working on a big overhaul of the HTTP API here: https://github.com/sigp/lighthouse/pull/4462, so it might make sense for that to stabilise before rolling out this change (or we can raise a PR based on #4462 and merge this afterwards). I’m away for the next two weeks, but if you feel like working on it in the meantime I hope this is enough info to make progress

@michaelsproul I’m very very sorry for the delayed response from my end, I’ve had a rough summer with a couple things that came up unexpectedly that needed my full attention for the past month. I’m just now back in my normal environment and getting my legs underneath me again so to speak. I noticed #4597 has someone working on it already. If there’s anything else lightweight and low-priority that could use help I’m happy to take a look; else I’ll keep an eye on the backlog for a good 1st issue to pop up.

It may be that we want to avoid and_then for our handlers and instead use then.

@jmcph4 and I got this working on https://github.com/sigp/lighthouse/pull/4316. Our approach avoids returning a warp::reject::Reject for errors in the handlers themselves. For now we’re converting the rejects into error responses using handle_rejection, like this:

https://github.com/sigp/lighthouse/blob/73b30a64472455fa555896b70559b5a55c335db3/beacon_node/http_api/src/lib.rs#L1232-L1264

Jack will fix just the new v2/blocks endpoints in that PR, and this issue can remain open until we roll out the fix across the codebase. We might want to write a small abstraction to minimise the amount of code duplication (something like then_or_reject, then_or_error).

I’ll look into this one and follow up, thx