eventing: Broker doesn't retry on "unexpected EOF" reading replies

Describe the bug I have a broker (mtbroker backed by InMemoryChannel or KafkaChannel) with retries configured

  delivery:
    backoffDelay: PT1S
    backoffPolicy: linear
    retry: 20

and triggers subscribing knative services.

One of the ksvcs, written in golang, returns new events as replies.

The application sporadically loses events when the reply stream is unexpectedly terminated because of a Go bug affecting Knative Serving

(see https://github.com/golang/go/issues/40747 , https://github.com/knative/serving/issues/6146 )

with the following error in mt-broker-filter logs:

{"level":"error","ts":"2021-01-28T17:13:15.221Z","logger":"mt_broker_filter","caller":"filter/filter_handler.go:220","msg":"failed to write response","error":"unexpected EOF","stacktrace":"knative.dev/eventing/pkg/mtbroker/filter.(*Handler).send\n\t/home/maschmid/go/src/knative.dev/eventing/pkg/mtbroker/filter/filter_handler.go:220\nknative.dev/eventing/pkg/mtbroker/filter.(*Handler).ServeHTTP\n\t/home/maschmid/go/src/knative.dev/eventing/pkg/mtbroker/filter/filter_handler.go:202\ngo.opencensus.io/plugin/ochttp.(*Handler).ServeHTTP\n\t/home/maschmid/go/src/knative.dev/eventing/vendor/go.opencensus.io/plugin/ochttp/server.go:92\nknative.dev/pkg/network/handlers.(*Drainer).ServeHTTP\n\t/home/maschmid/go/src/knative.dev/eventing/vendor/knative.dev/pkg/network/handlers/drain.go:88\nnet/http.serverHandler.ServeHTTP\n\t/usr/lib/golang/src/net/http/server.go:2843\nnet/http.(*conn).serve\n\t/usr/lib/golang/src/net/http/server.go:1925"}

The broker doesn’t seem to attempt a retry in this case, and the event the ksvc tried to send as a reply is effectively lost.

Expected behavior A broker with retries configured should retry delivery when an “unexpected EOF” error occurs while reading a reply.

To Reproduce Hard to reproduce; the EOF error is very sporadic. In this case larger events (~30kB) are sent by multiple producers. I believe this problem only occurs when using a ksvc that responds with events in an HTTP reply.

Knative release version eventing 0.19.2

Additional context

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 22 (22 by maintainers)

Most upvoted comments

Just making sure I understand what’s happening first 😃

We send an event, the function responds, but something goes wrong, we get an EOF, and the event is not retried? My knee-jerk reaction is that we should retry that: we’d retry on a protocol-level failure, so why not on a network-level one?