apm-agent-ruby: Couldn't establish connection to APM Server: "#"
Describe the bug
The apm agent log says:
Couldn't establish connection to APM Server:
with the following two messages:
#<NoMethodError: undefined method 'flush' for nil:NilClass>
and sometimes I see
#<Errno::ESPIPE: Invalid seek>
With loglevel 0 I see
Closing request with reason timeout
and
Closing writer with reason timeout
I have had this problem on and off for some time. It works on our staging environment. It used to work in production as well but stopped a few days ago, just like that. In Kibana I can see the graph drop to zero and then silence.
Edit: I just noticed our staging machine’s APM data also took a nosedive yesterday. I’ll spare you the screenshots, unless you want them.
Environment
- OS: Alpine on Docker on Debian 9
- Ruby version: 2.5.8
- Rails: 4.2.11
- APM Server version: 7.6.1
- Agent version: 3.6.0 (tried with several others too)
- Http Gem version: 4.4.1
Additional context
- The resources on the server look just fine.
- I am able to reach the apm_server with curl from anywhere I try. I even tried sending a GET with HTTParty from a controller to apm_server.
- A strange thing is that if I change the service_name in elastic_apm.yml and start the app (in Docker), the new service name shows up in Kibana. So something comes through.
- I see three spikes in yesterday’s Requests per minute graph where a single transaction seems to have made it through. This could have happened after app restarts.
- Currently a few other Rails (5) apps and a Sinatra app report to the apm_server with no issues.
- I got the full queue message and remedied that by increasing the pool size. This did not fix the failing connection issue, however. Now I’m seeing the "Queue is full (256 items)" message again, though. Maybe that’s because it doesn’t connect to the server and empty the queue. I don’t know.
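The guess that the queue fills up because nothing drains it can be illustrated with a bounded queue. This is a minimal sketch using Ruby's standard SizedQueue, not the agent's actual queue implementation. The capacity of 256 is taken from the log message, and `enqueue_event` is a hypothetical helper name.

```ruby
# Illustration only: a bounded queue that no consumer is draining.
# SizedQueue stands in for the agent's internal queue (an assumption).
capacity = 256 # taken from the "Queue is full (256 items)" log message
queue = SizedQueue.new(capacity)

# Hypothetical helper: enqueue without blocking and report whether the
# event was accepted or dropped.
def enqueue_event(queue, event)
  queue.push(event, true) # non_block = true raises ThreadError when full
  :queued
rescue ThreadError
  :dropped
end

# Nothing ever pops from the queue, mimicking a transport that cannot
# deliver events to the server.
results = (1..300).map { |i| enqueue_event(queue, "event-#{i}") }

puts "queued:  #{results.count(:queued)}"
puts "dropped: #{results.count(:dropped)}"
```

If the consumer side is stalled, raising the capacity or the pool size only delays the point at which events start being dropped.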
I think it’s the flush call in lib/elastic_apm/transport/connection/http.rb, in the method request(method, url, body: nil, headers: nil), that throws the NoMethodError. On my local machine I added a try for the flush call. That stopped the error, but of course didn’t fix the issue.
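To make that workaround concrete, the sketch below reproduces the same kind of NoMethodError outside the agent and shows how a nil-safe call silences it without flushing anything. The `writer` variable is a hypothetical stand-in for whatever object the agent calls flush on, and the safe navigation operator `&.` is used in place of ActiveSupport's try so the example runs in plain Ruby.

```ruby
# Hypothetical stand-in: a writer that was never set up.
writer = nil

# Unsafe call: raises NoMethodError because nil has no #flush.
unsafe_error =
  begin
    writer.flush
    nil
  rescue NoMethodError => e
    e
  end

# Nil-safe call: returns nil instead of raising, but nothing is flushed.
safe_result = writer&.flush

puts "unsafe call raised: #{unsafe_error.class}"
puts "nil-safe call returned: #{safe_result.inspect}"
```

The nil-safe variant hides the symptom only. The open question remains why the object is nil at that point.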
I don’t know if this helps, but I wanted to be thorough:
39: def request(method, url, body: nil, headers: nil)
40: byebug
=> 41: @client.send(
42: method,
43: url,
44: body: body,
45: headers: (headers ? @headers.merge(headers) : @headers).to_h,
(byebug) method
:post
(byebug) url
"http://myserver:8200/intake/v2/events"
(byebug) body
#<IO:fd 10>
(byebug) headers
{:"User-Agent"=>"elastic-apm-ruby/3.6.0 http.rb/4.4.1 ruby/2.5.1", "Content-Type"=>"application/x-ndjson", "Transfer-Encoding"=>"chunked", "Content-Encoding"=>"gzip"}
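Since the request body in the session above is a bare IO object, one general way an Errno::ESPIPE can occur is worth illustrating: pipes are not seekable, so any attempt to rewind or seek one fails. Whether this is what actually happens inside the agent or http.rb is an assumption and is not established here. The sketch only shows the behaviour of a plain IO.pipe.

```ruby
# Illustration only: seeking on a pipe raises Errno::ESPIPE.
reader, writer = IO.pipe
writer.write("{}\n")
writer.close

seek_error =
  begin
    reader.rewind # a pipe has no file position to rewind to
    nil
  rescue Errno::ESPIPE => e
    e
  end

# Reading the pipe sequentially is still possible after the failed rewind.
data = reader.read
reader.close

puts "rewind raised: #{seek_error.class}"
puts "sequential read returned: #{data.inspect}"
```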
Agent config options
service_name: 'MyApp'
server_url: "http://myserver:8200"
breakdown_metrics: true
capture_body: 'all'
instrumented_rake_tasks: ['mytask:task1']
log_level: 1
log_path: 'log/elastic_apm.log'
logger: <%= Logger.new('log/elastic_apm.log') %>
pool_size: 4
About this issue
- State: closed
- Created 4 years ago
- Comments: 23 (14 by maintainers)
@gowthamgts That’s not an error. That’s the expected behaviour. However, you are not the first person to read it as an error, so we’ll change the wording of the message in a future version. Sorry for the confusion.
Hi @Ingstrup! Sounds to me like it could be an issue with Rails 4.2? I’ll investigate a bit and see if I can dig something up.
What server lib are you using? Puma? Passenger?