kubeflow: ambassador pods stuck in CrashLoopBackOff and never become ready
Hi,
I used Ansible to set up my Kubernetes cluster from binaries and then followed the user guide to set up Kubeflow, but the ambassador pods never become ready.
The ambassador logs are attached below. Thanks.
root@master:~# kubectl get pod -n=kubeflow
NAME                              READY     STATUS             RESTARTS   AGE
ambassador-7987df44b9-962wh       1/2       CrashLoopBackOff   16         1h
ambassador-7987df44b9-nnf2w       1/2       CrashLoopBackOff   16         1h
ambassador-7987df44b9-p2zp9       1/2       CrashLoopBackOff   16         1h
tf-hub-0                          1/1       Running            0          1h
tf-job-operator-78757955b-gkv52   1/1       Running            0          1h
root@master:~# kubectl -n=kubeflow logs ambassador-7987df44b9-962wh ambassador
./entrypoint.sh: set: line 63: can't access tty; job control turned off
/usr/lib/python3.6/site-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
2018-03-05 18:47:53 kubewatch 0.26.0 INFO: Merging config inputs from /etc/ambassador-config
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: update: including key k8s-dashboard-kubeflow.yaml
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: Scheduling restart
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: Changes detected, regenerating envoy config.
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: Wrote k8s-dashboard-kubeflow.yaml to /etc/ambassador-config-1/k8s-dashboard-kubeflow.yaml
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: generating config with gencount 1
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: PROCESS: k8s-dashboard-kubeflow.yaml.1 => k8s-dashboard-kubeflow.yaml
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: PROCESS: k8s-dashboard-kubeflow.yaml.1 => service k8s-dashboard, namespace kubeflow
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: CLUSTER cluster_127_0_0_1_8877: new from --internal--
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: CLUSTER cluster_127_0_0_1_8877: referenced by --internal--
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: CLUSTER cluster_127_0_0_1_8877: referenced by --internal--
2018-03-05 18:47:54 kubewatch 0.26.0 INFO: CLUSTER cluster_kubernetes_dashboard_kube_system_otls: new from k8s-dashboard-kubeflow.yaml.1
2018-03-05 18:47:59 kubewatch 0.26.0 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6135db0898>: Failed to establish a new connection: [Errno -3] Try again',))
2018-03-05 18:47:59 kubewatch 0.26.0 INFO: Scout reports {"latest_version": "0.26.0", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f6135db0898>: Failed to establish a new connection: [Errno -3] Try again',))", "cached": false, "timestamp": 1520275674.049535}
[2018-03-05 18:47:59.072][8][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-03-05 18:47:59.072][8][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-03-05 18:47:59.078][8][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-03-05 18:47:59.078][8][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
2018-03-05 18:47:59 kubewatch 0.26.0 INFO: Configuration /etc/ambassador-config-1-envoy.json valid
2018-03-05 18:47:59 kubewatch 0.26.0 INFO: Moved valid configuration /etc/ambassador-config-1-envoy.json to /etc/envoy-1.json
AMBASSADOR: starting diagd
AMBASSADOR: starting Envoy
AMBASSADOR: waiting
PIDS: 9:diagd 10:envoy 11:kubewatch
[2018-03-05 18:47:59.227][12][info][main] source/server/server.cc:184] initializing epoch 0 (hot restart version=9.200.16384.127.options=capacity=16384, num_slots=8209 hash=228984379728933363)
[2018-03-05 18:47:59.508][12][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-03-05 18:47:59.599][12][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-03-05 18:47:59.599][12][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
[2018-03-05 18:47:59.600][12][info][main] source/server/server.cc:359] starting main dispatch loop
/usr/lib/python3.6/site-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Merging config inputs from /etc/ambassador-config
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Merging config inputs from /etc/ambassador-config-1
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Loaded /etc/ambassador-config-1/k8s-dashboard-kubeflow.yaml
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Event: ADDED default/kubernetes
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Event: ADDED kubeflow/tf-hub-0
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Event: ADDED kubeflow/tf-hub-lb
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Event: ADDED kubeflow/ambassador
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Event: ADDED kubeflow/ambassador-admin
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Event: ADDED kubeflow/k8s-dashboard
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: update: including key k8s-dashboard-kubeflow.yaml
2018-03-05 18:48:00 kubewatch 0.26.0 INFO: Scheduling restart
/usr/lib/python3.6/site-packages/urllib3/connectionpool.py:858: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
2018-03-05 18:48:00 diagd 0.26.0 INFO: PROCESS: k8s-dashboard-kubeflow.yaml.1 => k8s-dashboard-kubeflow.yaml
2018-03-05 18:48:00 diagd 0.26.0 INFO: PROCESS: k8s-dashboard-kubeflow.yaml.1 => service k8s-dashboard, namespace kubeflow
2018-03-05 18:48:00 diagd 0.26.0 INFO: CLUSTER cluster_127_0_0_1_8877: new from --internal--
2018-03-05 18:48:00 diagd 0.26.0 INFO: CLUSTER cluster_127_0_0_1_8877: referenced by --internal--
2018-03-05 18:48:00 diagd 0.26.0 INFO: CLUSTER cluster_127_0_0_1_8877: referenced by --internal--
2018-03-05 18:48:00 diagd 0.26.0 INFO: CLUSTER cluster_kubernetes_dashboard_kube_system_otls: new from k8s-dashboard-kubeflow.yaml.1
2018-03-05 18:48:05 kubewatch 0.26.0 INFO: Processing 1 changes
2018-03-05 18:48:05 kubewatch 0.26.0 INFO: Wrote k8s-dashboard-kubeflow.yaml to /etc/ambassador-config-2/k8s-dashboard-kubeflow.yaml
2018-03-05 18:48:05 kubewatch 0.26.0 INFO: generating config with gencount 2
2018-03-05 18:48:05 kubewatch 0.26.0 INFO: PROCESS: k8s-dashboard-kubeflow.yaml.1 => k8s-dashboard-kubeflow.yaml
2018-03-05 18:48:05 kubewatch 0.26.0 INFO: PROCESS: k8s-dashboard-kubeflow.yaml.1 => service k8s-dashboard, namespace kubeflow
2018-03-05 18:48:05 kubewatch 0.26.0 INFO: CLUSTER cluster_127_0_0_1_8877: new from --internal--
2018-03-05 18:48:05 kubewatch 0.26.0 INFO: CLUSTER cluster_127_0_0_1_8877: referenced by --internal--
2018-03-05 18:48:05 kubewatch 0.26.0 INFO: CLUSTER cluster_127_0_0_1_8877: referenced by --internal--
2018-03-05 18:48:05 kubewatch 0.26.0 INFO: CLUSTER cluster_kubernetes_dashboard_kube_system_otls: new from k8s-dashboard-kubeflow.yaml.1
2018-03-05 18:48:05 diagd 0.26.0 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc173d6be48>: Failed to establish a new connection: [Errno -3] Try again',))
2018-03-05 18:48:05 diagd 0.26.0 INFO: Scout reports {"latest_version": "0.26.0", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc173d6be48>: Failed to establish a new connection: [Errno -3] Try again',))", "cached": false, "timestamp": 1520275680.312504}
2018-03-05 18:48:10 kubewatch 0.26.0 WARNING: Scout: could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff1e95810f0>: Failed to establish a new connection: [Errno -3] Try again',))
2018-03-05 18:48:10 kubewatch 0.26.0 INFO: Scout reports {"latest_version": "0.26.0", "exception": "could not post report: HTTPSConnectionPool(host='kubernaut.io', port=443): Max retries exceeded with url: /scout (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff1e95810f0>: Failed to establish a new connection: [Errno -3] Try again',))", "cached": false, "timestamp": 1520275685.161717}
[2018-03-05 18:48:10.183][24][info][upstream] source/common/upstream/cluster_manager_impl.cc:132] cm init: all clusters initialized
[2018-03-05 18:48:10.183][24][info][config] source/server/configuration_impl.cc:55] loading 1 listener(s)
[2018-03-05 18:48:10.189][24][info][config] source/server/configuration_impl.cc:95] loading tracing configuration
[2018-03-05 18:48:10.189][24][info][config] source/server/configuration_impl.cc:122] loading stats sink configuration
2018-03-05 18:48:10 kubewatch 0.26.0 INFO: Configuration /etc/ambassador-config-2-envoy.json valid
2018-03-05 18:48:10 kubewatch 0.26.0 INFO: Moved valid configuration /etc/ambassador-config-2-envoy.json to /etc/envoy-2.json
unable to initialize hot restart: previous envoy process is still initializing
starting hot-restarter with target: /application/start-envoy.sh
forking and execing new child process at epoch 0
forked new child process with PID=12
got SIGHUP
forking and execing new child process at epoch 1
forked new child process with PID=25
got SIGCHLD
PID=25 exited with code=1
Due to abnormal exit, force killing all child processes and exiting
force killing PID=12
exiting due to lack of child processes
AMBASSADOR: envoy exited with status 1
Here's the envoy.json we were trying to run with:
{
"listeners": [
{
"address": "tcp://0.0.0.0:80",
"filters": [
{
"type": "read",
"name": "http_connection_manager",
"config": {
"codec_type": "auto",
"stat_prefix": "ingress_http",
"access_log": [
{
"format": "ACCESS [%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% \"%REQ(X-FORWARDED-FOR)%\" \"%REQ(USER-AGENT)%\" \"%REQ(X-REQUEST-ID)%\" \"%REQ(:AUTHORITY)%\" \"%UPSTREAM_HOST%\"\n",
"path": "/dev/fd/1"
}
],
"route_config": {
"virtual_hosts": [
{
"name": "backend",
"domains": ["*"],"routes": [
{
"timeout_ms": 3000,"prefix": "/ambassador/v0/check_ready","prefix_rewrite": "/ambassador/v0/check_ready",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_127_0_0_1_8877", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 3000,"prefix": "/ambassador/v0/check_alive","prefix_rewrite": "/ambassador/v0/check_alive",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_127_0_0_1_8877", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 3000,"prefix": "/ambassador/v0/","prefix_rewrite": "/ambassador/v0/",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_127_0_0_1_8877", "weight": 100.0 }
]
}
}
,
{
"timeout_ms": 3000,"prefix": "/k8s/ui/","prefix_rewrite": "/",
"weighted_clusters": {
"clusters": [
{ "name": "cluster_kubernetes_dashboard_kube_system_otls", "weight": 100.0 }
]
}
}
]
}
]
},
"filters": [
{
"name": "cors",
"config": {}
},{"type": "decoder",
"name": "router",
"config": {}
}
]
}
}
]
}
],
"admin": {
"address": "tcp://127.0.0.1:8001",
"access_log_path": "/tmp/admin_access_log"
},
"cluster_manager": {
"clusters": [
{
"name": "cluster_127_0_0_1_8877",
"connect_timeout_ms": 3000,
"type": "strict_dns",
"lb_type": "round_robin",
"hosts": [
{
"url": "tcp://127.0.0.1:8877"
}
]},
{
"name": "cluster_kubernetes_dashboard_kube_system_otls",
"connect_timeout_ms": 3000,
"type": "strict_dns",
"lb_type": "round_robin",
"hosts": [
{
"url": "tcp://kubernetes-dashboard.kube-system:443"
}
],
"ssl_context": {
}}
]
},
"statsd_udp_ip_address": "127.0.0.1:8125",
"stats_flush_interval_ms": 1000
}
AMBASSADOR: shutting down
@gxfun @pineking @jiaanguo We hit the same issue. Make sure that DNS is working properly inside your cluster. In my case, pods could access the internet but could not resolve the domain 'kubernaut.io'. After configuring an upstream nameserver (such as 8.8.8.8) in the cluster DNS, everything worked fine. We use CoreDNS in place of kube-dns; you can find how to configure upstream nameservers here:
kube-dns: https://kubernetes.io/blog/2017/04/configuring-private-dns-zones-upstream-nameservers-kubernetes/
CoreDNS: https://coredns.io/plugins/kubernetes/
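Not part of the original comment, but a quick way to confirm the symptom is to try resolving an external name from inside the cluster; the pod name dns-test and the busybox image here are only illustrative:

root@master:~# kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup kubernaut.io

And a minimal sketch of the upstream-nameserver change for kube-dns, following the linked Kubernetes blog post and assuming the stock kube-dns ConfigMap in kube-system:

apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  # Forward external lookups (e.g. kubernaut.io) to a resolver that can reach the internet.
  upstreamNameservers: |
    ["8.8.8.8"]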
Thanks @cquptEthan for the solution.
For those who are just getting started with Kubernetes in general, here's what you need to do: add 8.8.8.8 in two places in the CoreDNS config, the upstream option and the proxy line (see the sketch below). You might also need to manually delete the CoreDNS pods for the changes to be registered.
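For the CoreDNS route described above, a sketch of the Corefile (kept in the coredns ConfigMap in kube-system), assuming a kubeadm-style default; only the upstream and proxy lines are changed to point at 8.8.8.8, and on newer CoreDNS releases proxy is replaced by forward:

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        upstream 8.8.8.8                 # resolver used for external names referenced by the kubernetes plugin
        fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . 8.8.8.8                      # send everything outside the cluster domain (e.g. kubernaut.io) to 8.8.8.8
    cache 30
}

To make sure the change is picked up, delete the CoreDNS pods so they restart with the new ConfigMap (the label below assumes a kubeadm-style deployment, where the CoreDNS pods keep the k8s-app=kube-dns label):

root@master:~# kubectl -n kube-system delete pod -l k8s-app=kube-dns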