kubeclient: watching stops without any notification

I have seen several times that the watcher just stops, without crashing or notifying. I don't know how to reproduce it, but it happens regularly. It does not happen for kube-proxy, so either there is a bug in this library or kube-proxy's Go code has some smart disconnect handling.

At the moment I'm using the class below and calling .restart every x minutes:

class KubernetesWatcher
  def initialize(kuber_client, namespace)
    @kuber_client = kuber_client
    @namespace = namespace
  end

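  # Blocks forever: each time the watch ends (or is finished), a new one is opened.
  def watch(&block)
    loop do
      @watcher = @kuber_client.watch_endpoints(namespace: @namespace)
      @watcher.each(&block)
    end
  end

  # Called from another thread; ends the current watch so the loop above reconnects.
  def restart
    @watcher&.finish # no-op if no watch has been opened yet
  end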
end

I don't know how to fix or debug this further, but I wanted to raise awareness.
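
One way to avoid restarting blindly every x minutes is to restart only when the stream has gone quiet. The following is a minimal sketch, not kubeclient functionality: IdleRestartingWatcher, idle_timeout, clock and restart_if_stale are hypothetical names, and it assumes only that the watch object responds to each and finish, as in the class above. A quiet stream is not proof of a dead connection, so the timeout has to be longer than the normal gap between events.

```ruby
# Hypothetical wrapper (not part of kubeclient): reopens the watch in a loop
# and lets another thread finish it once no event arrived for idle_timeout seconds.
class IdleRestartingWatcher
  MONOTONIC_CLOCK = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }

  # The block must return a fresh watch object that responds to #each and #finish.
  def initialize(idle_timeout:, clock: MONOTONIC_CLOCK, &start_watch)
    @idle_timeout = idle_timeout
    @clock = clock
    @start_watch = start_watch
    @last_event_at = @clock.call
    @watcher = nil
  end

  # Blocks: opens a watch, yields every notice, and reopens when the watch ends.
  def watch
    loop do
      @watcher = @start_watch.call
      @last_event_at = @clock.call
      @watcher.each do |notice|
        @last_event_at = @clock.call
        yield notice
      end
    end
  end

  # True when nothing was received for longer than idle_timeout.
  def stale?
    @clock.call - @last_event_at > @idle_timeout
  end

  # Call periodically from another thread. Finishes the current watch only
  # when it looks stalled; returns whether a restart was triggered.
  def restart_if_stale
    return false unless @watcher && stale?

    @watcher.finish
    true
  end
end
```

With kubeclient, the block passed to new would return something like kuber_client.watch_endpoints(namespace: namespace), and restart_if_stale would be called from the same place .restart is called now.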

About this issue

  • State: open
  • Created 7 years ago
  • Comments: 22 (7 by maintainers)

Most upvoted comments

Confirmed that the block just stops, so having a reconnect by default or as an option would be nice. At the moment I'm just doing loop do ... watch ... end

Hey, so I just wanted to add my latest finding here, following a k8s upgrade.

The logs are below, but long story short: the latest resource version is 44924022, yet if you use that as your starting point, k8s returns 410 GONE (because these particular resources haven’t been updated in quite some time).

The only way to get a watcher started then is to use 0, which returns you ALL the objects.

You’ll then need to filter the returned objects, keeping only those with a version >= 44924022.

It’s quite shit really, as you’re potentially returning a lot of objects from the k8s API, especially when the connection times out so frequently (seemingly every 90 seconds or so, for CRDs in particular).

I, [2018-10-16T22:01:26.119618 #98517]  INFO -- : [App::ApiWatcher#watch_from] /apis/atcloud.io/v1/services watch will start from offset: 44924022
E, [2018-10-16T22:01:26.882520 #98517] ERROR -- : [App::ApiWatcher#block in watch_from] /apis/atcloud.io/v1/services getting 410 GONE responses for the latest offset 44924022, will restart from 0 which is the next known offset.  Some events may have been missed!
I, [2018-10-16T22:01:26.882659 #98517]  INFO -- : [App::ApiWatcher#watch_from] /apis/atcloud.io/v1/services watch will start from offset: 0
I, [2018-10-16T22:01:27.160224 #98517]  INFO -- : [Handlers::Slack#create_apis_atcloud_io_v1_services] shippr-simple
I, [2018-10-16T22:01:27.763382 #98517]  INFO -- : [SlackAPI#parse] 200 OK: ok
I, [2018-10-16T22:01:27.763739 #98517]  INFO -- : [Handlers::Slack#create_apis_atcloud_io_v1_services] platform-testing
I, [2018-10-16T22:01:28.021200 #98517]  INFO -- : [SlackAPI#parse] 200 OK: ok
W, [2018-10-16T22:03:00.473269 #98517]  WARN -- : [App::Runner#block in setup_watch_thread] /apis/atcloud.io/v1/services stopped, will restart from 44891278
I, [2018-10-16T22:03:00.473339 #98517]  INFO -- : [App::ApiWatcher#watch_from] /apis/atcloud.io/v1/services watch will start from offset: 44891278
E, [2018-10-16T22:03:00.916934 #98517] ERROR -- : [App::ApiWatcher#block in watch_from] /apis/atcloud.io/v1/services getting 410 GONE responses for the latest offset 44891278, will restart from 44924022 which is the next known offset.  Some events may have been missed!
I, [2018-10-16T22:03:00.917068 #98517]  INFO -- : [App::ApiWatcher#watch_from] /apis/atcloud.io/v1/services watch will start from offset: 44924022
E, [2018-10-16T22:03:01.411459 #98517] ERROR -- : [App::ApiWatcher#block in watch_from] /apis/atcloud.io/v1/services getting 410 GONE responses for the latest offset 44924022, will restart from 0 which is the next known offset.  Some events may have been missed!
I, [2018-10-16T22:03:01.411611 #98517]  INFO -- : [App::ApiWatcher#watch_from] /apis/atcloud.io/v1/services watch will start from offset: 0
I, [2018-10-16T22:03:01.641530 #98517]  INFO -- : [Handlers::Slack#create_apis_atcloud_io_v1_services] platform-testing
I, [2018-10-16T22:03:01.978240 #98517]  INFO -- : [SlackAPI#parse] 200 OK: ok
I, [2018-10-16T22:03:01.978561 #98517]  INFO -- : [Handlers::Slack#create_apis_atcloud_io_v1_services] shippr-simple
I, [2018-10-16T22:03:02.520127 #98517]  INFO -- : [SlackAPI#parse] 200 OK: ok
W, [2018-10-16T22:04:30.996193 #98517]  WARN -- : [App::Runner#block in setup_watch_thread] /apis/atcloud.io/v1/services stopped, will restart from 44924022
I, [2018-10-16T22:04:30.996243 #98517]  INFO -- : [App::ApiWatcher#watch_from] /apis/atcloud.io/v1/services watch will start from offset: 44924022
E, [2018-10-16T22:04:31.654225 #98517] ERROR -- : [App::ApiWatcher#block in watch_from] /apis/atcloud.io/v1/services getting 410 GONE responses for the latest offset 44924022, will restart from 0 which is the next known offset.  Some events may have been missed!
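
An alternative to restarting from 0 and filtering by version number is to ask the API for a fresh version whenever the stored one is rejected. What follows is a rough sketch under several assumptions, not the code that produced the logs above: ResumableWatch, list_version and open_watch are hypothetical names, notices are assumed to be hashes with symbol keys, and the 410 is assumed to arrive as a notice of type ERROR whose object carries code 410. Anything that changed between the rejected version and the fresh list is still missed, as the log messages above already warn.

```ruby
# Hypothetical helper (not part of kubeclient): resumes a watch from the last
# resourceVersion seen and fetches a fresh one after a 410 Gone.
class ResumableWatch
  GONE = 410

  attr_reader :version

  # list_version: callable returning the collection's current resourceVersion.
  # open_watch:   callable taking a resourceVersion, returning an object with #each.
  def initialize(list_version:, open_watch:)
    @list_version = list_version
    @open_watch = open_watch
    @version = nil
  end

  # Blocks: yields every non-error notice and reopens the watch when it ends.
  def run
    loop do
      @version ||= @list_version.call
      @open_watch.call(@version).each do |notice|
        if gone?(notice)
          @version = nil # stored version is too old; list again on the next pass
          break
        end
        # Remember where we are so a reconnect resumes after this notice.
        @version = notice.dig(:object, :metadata, :resourceVersion) || @version
        yield notice
      end
    end
  end

  private

  # Assumption: a 410 is delivered as an ERROR notice with a status object.
  def gone?(notice)
    notice[:type] == 'ERROR' && notice.dig(:object, :code) == GONE
  end
end
```

With kubeclient, list_version could wrap a list call such as get_endpoints and read its resourceVersion, and open_watch could wrap watch_endpoints with the version passed along; the exact option names depend on the kubeclient release, so check them against the version in use.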
  • Ignore the first events after a restart. Not perfect, but maybe better if missing an event is not that critical:
  loop do
    ignore_until = Time.now.to_f + 0.5 # re-processing happens in the first 0.4s
    kuber_client.watch_endpoints.each do |e|
      next if Time.now.to_f < ignore_until
      puts e
    end
  end
  • Store the timestamp of the last received event and reject everything before it on restart.
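
The second option can be approximated without timestamps. The sketch below swaps in a different technique: it remembers the last resourceVersion processed per object uid and drops exact replays after a reconnect. It is only an illustration: SeenFilter and fresh? are hypothetical names, and notices are assumed to be hashes with symbol keys that carry metadata.uid and metadata.resourceVersion.

```ruby
# Hypothetical filter (not part of kubeclient): drops notices for object
# versions that were already processed before the watch was restarted.
class SeenFilter
  def initialize
    @last_version = {} # uid => last resourceVersion handed to the caller
  end

  # Returns true the first time an object version is offered, false on a replay.
  def fresh?(notice)
    meta = notice.dig(:object, :metadata) || {}
    uid = meta[:uid]
    version = meta[:resourceVersion]
    return true if uid.nil? || version.nil? # cannot identify it, let it through

    if notice[:type] == 'DELETED'
      @last_version.delete(uid) # object is gone, stop tracking it
      return true
    end

    return false if @last_version[uid] == version

    @last_version[uid] = version
    true
  end
end
```

Inside the watch block this would read next unless filter.fresh?(e) in place of the time check. The state lives in memory, so it does not survive a restart of the process itself.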