rbspy: bug: Early exits, duration parameter not respected in some cases
In some cases, we see that rbspy exits early, both with and without the --duration parameter.
Without the duration parameter, it can sometimes exit with an exit code of 0. The target process has not exited, and there is no apparent error in the output.
With the duration parameter, we can sometimes see it exit well before the duration has elapsed:
$ ./rbspy-x86_64-unknown-linux-gnu record -p 908389 -d 120
Time since start: 31s. Press Ctrl+C to stop.
Summary of profiling data so far:
% self % total name
81.04 81.14 select [c function] - (unknown)
9.38 9.38 pop [c function] - (unknown)
2.42 2.42 sleep [c function] - (unknown)
0.77 0.77 wait_readable [c function] - (unknown)
0.77 0.77 initialize [c function] - (unknown)
0.58 0.58 _query [c function] - (unknown)
0.45 2.00 new [c function] - (unknown)
0.19 0.52 initialize - /artifacts/bundle/ruby/2.7.0/gems/graphql-1.12.13/lib/grap
0.19 0.19 tcp_listener_stats [c function] - (unknown)
0.13 0.13 split [c function] - (unknown)
0.10 0.10 match? [c function] - (unknown)
0.06 5.58 public_send [c function] - (unknown)
0.06 3.93 block in run - /app/app/models/engine.rb
0.06 1.39 block in coerce_into_values - /artifacts/bundle/ruby/2.7.0/gems/graphql
0.06 0.68 synchronize [c function] - (unknown)
0.06 0.48 block in rules_from_db - /app/app/models/ordered_ruleset.rb
0.06 0.06 legacy_connection_handling - /artifacts/bundle/ruby/2.7.0/gems/activere
0.06 0.06 inheritance_column - /artifacts/bundle/ruby/2.7.0/gems/activerecord-6.1
0.06 0.06 eval [c function] - (unknown)
0.06 0.06 clock_gettime [c function] - (unknown)
1.8% (56/3109) of stack traces were sampled late because we couldn't sample at expected rate, results may be inaccurate. Current rate: 100. Try sampling at a lower rate with `--rate`.
Wrote raw data to /root/.cache/rbspy/records/rbspy-2021-07-22-1cem3j2wTP.raw.gz
Writing formatted output to /root/.cache/rbspy/records/rbspy-2021-07-22-yrFdFN7bCV.flamegraph.svg
In this example the target pid was still alive after rbspy exited.
I would expect that if no duration parameter is given, that rbspy waits until it is interrupted. If a duration parameter is given, it should wait until the duration has elapsed before exiting.
Any ideas @acj ?
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 27 (25 by maintainers)
Commits related to this issue
- Upgrade remoteprocess to 0.4.5 This version includes more tolerant error handling when thread locking fails, which has caused rbspy to exit early when profiling very busy applications (#334). — committed to rbspy/rbspy by acj 3 years ago
- Add ruby test program that produces a lot of thread churn Supports the investigation into #334. — committed to rbspy/rbspy by acj 3 years ago
- Wait for process to exit before collecting results This should fix an order of operations issue that sometimes causes "send on closed channel", zombie child process, and other flaky behavior. (#333, ... — committed to rbspy/rbspy by acj 3 years ago
- Wait for process to exit before collecting results This should fix an order of operations issue that sometimes causes "send on closed channel", zombie child process, and other flaky behavior. (#333, ... — committed to acj/rbspy by acj 3 years ago
The upstream fix landed, and I just released 0.8.1. Please give it a try when you can. If the EPERM issue still appears, I think we can simplify #337 to specifically handle that case.
I updated #337 and included the test program @acj posted above. It seems this also exposed another case of spurious exits, but from
EPERM, which it also appears is being swallowed in the existing subprocess test.I’m having a pretty similar issue, rbspy will detach thinking the process has ended. This is confirmed by setting
RUST_LOG=debug, which gives me the following:I’ll try to add some additional logging to a local build to get a better idea of what’s happening.