manageiq: DRB error : connection timeout

Hi,

We are seeing an error where DRb doesn’t start with the appliance. I got the following logs:

[----] E, [2017-01-12T06:44:15.394235 #32307:73797c] ERROR -- : EMS [] as [AKIAJAK****] ID [150807] PID [32307] GUID [6e0e011e-d8bc-11e6-94b7-06dc150d810d] Error heartbeating to MiqServer because DRb::DRbConnError: Connection reset by peer Worker exiting.
[----] I, [2017-01-12T06:44:15.479441 #32244:73797c]  INFO -- : MIQ(ManageIQ::Providers::Amazon::CloudManager::RefreshWorker#log_status) [Refresh Worker for Cloud/Infrastructure Providers: AWS Singapore] Worker ID [150800], PID [32244], GUID [6dd2de40-d8bc-11e6-94b7-06dc150d810d], Last Heartbeat [2017-01-12 11:44:12 UTC], Process Info: Memory Usage [311087104], Memory Size [650801152], Proportional Set Size: [213718000], Memory % [2.03], CPU Time [137.0], CPU % [0.06], Priority [27]
[----] E, [2017-01-12T06:44:15.479840 #32244:73797c] ERROR -- : EMS [] as [AKIAJAK****] ID [150800] PID [32244] GUID [6dd2de40-d8bc-11e6-94b7-06dc150d810d] Error heartbeating to MiqServer because DRb::DRbConnError: Connection reset by peer Worker exiting.
[----] I, [2017-01-12T06:44:15.510840 #32253:73797c]  INFO -- : MIQ(ManageIQ::Providers::Amazon::CloudManager::RefreshWorker#log_status) [Refresh Worker for Cloud/Infrastructure Providers: AWS Sao Paulo] Worker ID [150801], PID [32253], GUID [6dd7f54c-d8bc-11e6-94b7-06dc150d810d], Last Heartbeat [2017-01-12 11:44:12 UTC], Process Info: Memory Usage [311148544], Memory Size [651853824], Proportional Set Size: [213737000], Memory % [2.03], CPU Time [136.0], CPU % [0.06], Priority [27]
[----] E, [2017-01-12T06:44:15.511207 #32253:73797c] ERROR -- : EMS [] as [AKIAJAK****] ID [150801] PID [32253] GUID [6dd7f54c-d8bc-11e6-94b7-06dc150d810d] Error heartbeating to MiqServer because DRb::DRbConnError: Connection reset by peer Worker exiting.

@jrafanie looked into it, and it looks like the server process was failing when trying to sync_workers for one of the worker classes, possibly for the cinder/swift providers. For some reason, calling authentications on the provider returns nil instead of an empty collection, which is what a Rails relation should return. It looks like a bug.

/var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:26:in `authentication_userid_passwords': private method `select' called for nil:NilClass (NoMethodError)
	from /var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:356:in `available_authentications'
	from /var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:189:in `authentication_type'
	from /var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:344:in `authentication_best_fit'
	from /var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:99:in `authentication_status_ok?'
	from /var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:21:in `select'
	from /var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:21:in `all_valid_ems_in_zone'
	from /var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:26:in `desired_queue_names'
	from /var/www/miq/vmdb/app/models/mixins/per_ems_worker_mixin.rb:32:in `sync_workers'
	from /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:52:in `block in sync_workers'
	from /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `each'
	from /var/www/miq/vmdb/app/models/miq_server/worker_management/monitor.rb:50:in `sync_workers'
	from /var/www/miq/vmdb/app/models/miq_server.rb:158:in `start'
	from /var/www/miq/vmdb/app/models/miq_server.rb:249:in `start'
	from /var/www/miq/vmdb/lib/workers/evm_server.rb:65:in `start'
	from /var/www/miq/vmdb/lib/workers/evm_server.rb:92:in `start'
	from /var/www/miq/vmdb/lib/workers/bin/evm_server.rb:4:in `<main>'
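
For anyone trying to picture the failure, here is a minimal plain-Ruby sketch of what the top of the trace seems to describe: `select` being called on `nil` where a collection was expected. `FakeProvider` and its method names are hypothetical stand-ins, not ManageIQ code, and the `Array()` guard is only one possible defensive approach, not necessarily how the bug was actually fixed.

```ruby
# Hypothetical stand-in for a provider record; NOT the real ManageIQ model.
class FakeProvider
  attr_reader :authentications

  def initialize(authentications)
    @authentications = authentications
  end

  # Mirrors the failing pattern: assumes authentications is always a collection.
  def userid_passwords_unsafe
    authentications.select { |auth| auth[:kind] == :userid_password }
  end

  # Defensive variant (an assumption, not the upstream fix):
  # Array(nil) returns [], so select always has a collection to work on.
  def userid_passwords_safe
    Array(authentications).select { |auth| auth[:kind] == :userid_password }
  end
end

broken = FakeProvider.new(nil)

begin
  broken.userid_passwords_unsafe
rescue NoMethodError => e
  # nil only has the private Kernel#select, hence the "private method" wording.
  puts "unsafe call failed with #{e.class}"
end

puts "safe call returned #{broken.userid_passwords_safe.inspect}"
```

With a guard like this the server would skip a provider that has no authentications instead of crashing during sync_workers, although the underlying question of why the association is nil would still need answering.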

You can see the full discussion and details here: http://talk.manageiq.org/t/drb-error-connection-timeout/2025

Thank you!

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 15 (13 by maintainers)

Most upvoted comments

Yes @fvillain, the next tag of euwe will contain this fix.

It was backported to euwe as part of #12878 here