ingress-nginx: segfault due to opentracing
Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/.): no
What keywords did you search in NGINX Ingress controller issues before filing this one? (If you have found any duplicates, you should instead reply there.): nginx, opentracing, zipkin, segfault
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
NGINX Ingress controller version: 0.10.2
Kubernetes version (use kubectl version):
> k version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.6", GitCommit:"4bc5e7f9a6c25dc4c03d4d656f2cefd21540e28c", GitTreeState:"clean", BuildDate:"2017-09-15T22:14:38Z", GoVersion:"go1.9", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.12", GitCommit:"3bda299a6414b4866f179921610d6738206a18fe", GitTreeState:"clean", BuildDate:"2017-12-29T08:39:49Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: 8 CPU, 32GB RAM, VMWare vRealize
- OS (e.g. from /etc/os-release):
> cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.3 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.3"
- Kernel (e.g. uname -a):
> uname -a
Linux ... 3.10.0-693.11.1.el7.x86_64 #1 SMP Fri Oct 27 05:39:05 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
- Install tools: Docker, Kubernetes
- Others: nginx-ingress resource limits
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 1000m
memory: 2Gi
nginx-ingress configmap values
enable-opentracing: "true"
zipkin-collector-host: zipkin.tools
zipkin-collector-port: '80'
zipkin-service-name: ingress-controller
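For reference, a minimal sketch of how these configmap values would look as a full manifest (the ConfigMap name and namespace below are assumptions, not taken from this report; they have to match whatever the controller's --configmap flag points at):

```yaml
# Sketch only: metadata.name/namespace are assumptions; they must match the
# --configmap flag passed to the nginx ingress controller.
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-ingress-controller
  namespace: kube-system
data:
  enable-opentracing: "true"
  zipkin-collector-host: zipkin.tools
  zipkin-collector-port: "80"
  zipkin-service-name: ingress-controller
```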
What happened: worker process exiting due to segfault
What you expected to happen: no segfaults, no worker restarts
How to reproduce it (as minimally and precisely as possible): The segfault happens almost every 2 seconds; see the worker process exit log below:
{"log":"2018/03/20 18:34:54 [alert] 23#23: worker process 11075 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:54.142872722Z"}
{"log":"2018/03/20 18:34:54 [alert] 23#23: worker process 11109 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:54.517182591Z"}
{"log":"2018/03/20 18:34:57 [alert] 23#23: worker process 11143 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:57.923391335Z"}
{"log":"2018/03/20 18:34:58 [alert] 23#23: worker process 11178 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:58.194986234Z"}
{"log":"2018/03/20 18:34:58 [alert] 23#23: worker process 11212 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:58.557907914Z"}
{"log":"2018/03/20 18:34:58 [alert] 23#23: worker process 11246 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:58.816366263Z"}
{"log":"2018/03/20 18:34:59 [alert] 23#23: worker process 11280 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:34:59.19434845Z"}
{"log":"2018/03/20 18:35:02 [alert] 23#23: worker process 11314 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:02.943432758Z"}
{"log":"2018/03/20 18:35:03 [alert] 23#23: worker process 11349 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:03.206377447Z"}
{"log":"2018/03/20 18:35:03 [alert] 23#23: worker process 11383 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:03.56327926Z"}
{"log":"2018/03/20 18:35:03 [alert] 23#23: worker process 11417 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:03.81832701Z"}
{"log":"2018/03/20 18:35:08 [alert] 23#23: worker process 11451 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:08.92230984Z"}
{"log":"2018/03/20 18:35:08 [alert] 23#23: worker process 11486 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:08.979320448Z"}
{"log":"2018/03/20 18:35:13 [alert] 23#23: worker process 11520 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:13.231301619Z"}
{"log":"2018/03/20 18:35:13 [alert] 23#23: worker process 11555 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:13.791207004Z"}
{"log":"2018/03/20 18:35:14 [alert] 23#23: worker process 11589 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:14.156415018Z"}
{"log":"2018/03/20 18:35:14 [alert] 23#23: worker process 11623 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:14.5254932Z"}
{"log":"2018/03/20 18:35:18 [alert] 23#23: worker process 11657 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:18.272065288Z"}
{"log":"2018/03/20 18:35:18 [alert] 23#23: worker process 11692 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:18.53148589Z"}
{"log":"2018/03/20 18:35:19 [alert] 23#23: worker process 11726 exited on signal 11 (core dumped)\n","stream":"stderr","time":"2018-03-20T18:35:19.517368566Z"}
Anything else we need to know:
> tail -f nginx-ingress-controller-3228907275-p158b_kube-system_nginx-ingress-controller-45af43a95ce0ff5989dfe5783d5457214952edc83dc5de1e6745b436c7c5e23b.log | grep 'opentracing' -C3
2018/03/20 17:51:09 [debug] 23719#23719: *680808 http script set $service_name
2018/03/20 17:51:09 [debug] 23719#23719: *680808 post rewrite phase: 4
2018/03/20 17:51:09 [debug] 23719#23719: *680808 generic phase: 5
2018/03/20 17:51:09 [debug] 23719#23719: *680808 extraced opentracing span context from request 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 starting opentracing request span for 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 starting opentracing location span for \"/\"(00007FE651FDDD00) in request 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 injecting opentracing span context from request 00007FE65CCFD050
2018/03/20 17:51:09 [debug] 23719#23719: *680808 adding opentracing header \"x-b3-traceid:09e429b7d83c3090\" in request 00007FE65CCFD050
2018/03/20 17:51:09 [alert] 23#23: worker process 23719 exited on signal 11 (core dumped)
2018/03/20 17:51:09 [debug] 23753#23753: epoll add event: fd:13 op:1 ev:00002001
2018/03/20 17:51:09 [debug] 23753#23753: epoll add event: fd:14 op:1 ev:00002001
After analyzing the coredump, it is visible that the issue is in the opentracing insert_header block:
#0 0x0000561139f929ec in ngx_list_push ()
#1 0x00007f7a1adbb6ab in ngx_opentracing::inject_span_context(opentracing::v1::Tracer const&, ngx_http_request_s*, opentracing::v1::SpanContext const&) () from /etc/nginx/modules/ngx_http_opentracing_module.so
#2 0x00007f7a1adbcf81 in ngx_opentracing::OpenTracingRequestInstrumentor::OpenTracingRequestInstrumentor(ngx_http_request_s*, ngx_http_core_loc_conf_s*, ngx_opentracing::opentracing_loc_conf_t*) () from /etc/nginx/modules/ngx_http_opentracing_module.so
#3 0x00007f7a1adbdce6 in ngx_opentracing::on_enter_block(ngx_http_request_s*) () from /etc/nginx/modules/ngx_http_opentracing_module.so
#4 0x000056113a097a33 in ngx_http_core_generic_phase ()
#5 0x000056113a093645 in ngx_http_core_run_phases ()
#6 0x000056113a08f03e in ngx_http_process_request ()
#7 0x000056113a09068c in ngx_http_process_request_line ()
#8 0x000056113a0a6be1 in ngx_epoll_process_events.lto_priv ()
#9 0x000056113a0c45df in ngx_process_events_and_timers ()
#10 0x000056113a0a8405 in ngx_worker_process_cycle ()
#11 0x000056113a0be07f in ngx_spawn_process ()
#12 0x000056113a0a9715 in ngx_master_process_cycle ()
#13 0x0000561139f8a654 in main ()
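For anyone who wants to repeat this analysis, a rough sketch of how a backtrace like the one above can be pulled from the core dump inside the controller container (the nginx binary path, the core file location, and the availability of gdb are assumptions and depend on the image and the node's kernel.core_pattern setting; the pod name is the one from the log above):

```sh
# Sketch only: /usr/sbin/nginx and the core file name are assumptions.
# gdb may need to be installed first (apt-get or apk, depending on the image base).
kubectl -n kube-system exec -it nginx-ingress-controller-3228907275-p158b -- /bin/bash

# inside the container: load the core against the nginx binary and print the stack
gdb -q /usr/sbin/nginx /tmp/core.23719
(gdb) bt
```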
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (15 by maintainers)
Yep. Will take a look.
Hey @aledbf - I have plans to put in a fix based off of this answer I got on the nginx mailing list: http://mailman.nginx.org/pipermail/nginx-devel/2018-March/011008.html
On another cluster, I can’t reproduce it, and the only difference I noticed is the: