alluxio: UnknownHostException when installing Alluxio through Helm
Alluxio Version: alluxio:2.3.0
Describe the bug: I followed the guide https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-Kubernetes.html to install Alluxio, and errors occurred. I had configured multi-master mode with 3 masters. The pod status is:
[root@k8s-master01 ~]# kubectl -n spark get pods
NAME READY STATUS RESTARTS AGE
alluxio-fuse-j8rrm 1/1 Running 0 10s
alluxio-fuse-rrcrn 1/1 Running 0 10s
alluxio-fuse-wfrkr 1/1 Running 0 10s
alluxio-master-0 0/2 Running 0 10s
alluxio-worker-4kmbq 0/2 CrashLoopBackOff 1 10s
alluxio-worker-5ng5f 0/2 CrashLoopBackOff 1 10s
alluxio-worker-8rvbz 0/2 CrashLoopBackOff 1 10s
Then I checked the endpoints:
[root@k8s-master01 ~]# kubectl -n spark describe endpoints alluxio-master-0
Name: alluxio-master-0
Namespace: spark
Labels: app=alluxio
chart=alluxio-0.6.5
heritage=Helm
release=alluxio-1597374253
role=alluxio-master
service.kubernetes.io/headless=
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2020-08-14T03:04:36Z
Subsets:
Addresses: 10.244.5.96
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
embedded 19200 TCP
rpc 19998 TCP
job-rpc 20001 TCP
web 19999 TCP
job-web 20002 TCP
job-embedded 20003 TCP
Events: <none>
Logs for master:
[root@k8s-master01 ~]# kubectl -n spark logs alluxio-master-0 -c alluxio-master
2020-08-14 03:04:20,037 INFO BlockMasterFactory - Creating alluxio.master.block.BlockMaster
2020-08-14 03:04:20,037 INFO FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster
2020-08-14 03:04:20,037 INFO MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster
2020-08-14 03:04:20,037 INFO MetaMasterFactory - Creating alluxio.master.meta.MetaMaster
2020-08-14 03:04:20,037 INFO TableMasterFactory - Creating alluxio.master.table.TableMaster
2020-08-14 03:04:20,113 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=alluxio-master-0, rack=null)
2020-08-14 03:04:20,114 INFO UnderDatabaseRegistry - Loading udb jars from /opt/alluxio-2.3.0/lib
2020-08-14 03:04:20,137 INFO UnderDatabaseRegistry - Registered UDBs: hive,glue
2020-08-14 03:04:20,139 INFO LayoutRegistry - Registered Table Layouts: hive
2020-08-14 03:04:20,146 INFO ExtensionFactoryRegistry - Loading core jars from /opt/alluxio-2.3.0/lib
2020-08-14 03:04:20,174 INFO ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio-2.3.0/extensions
2020-08-14 03:04:20,942 INFO ProcessUtils - Starting Alluxio master @alluxio-master-0:19998.
2020-08-14 03:04:20,943 INFO RaftJournalSystem - Initializing Raft Journal System
2020-08-14 03:04:20,951 INFO JournalStateMachine - Initialized new journal state machine
2020-08-14 03:04:21,062 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-08-14 03:04:21,063 INFO RaftJournalSystem - Starting Raft journal system. Cluster addresses: [alluxio-master-0:19200, alluxio-master-1:19200, alluxio-master-2:19200]. Local address: alluxio-master-0:19200
2020-08-14 03:04:21,288 INFO GrpcMessagingServer - Successfully started messaging server at: alluxio-master-0:19200
2020-08-14 03:04:21,299 INFO ServerContext - alluxio-master-0:19200 - Transitioning to FOLLOWER
2020-08-14 03:04:21,307 INFO AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-08-14 03:04:21,333 INFO NettyUtils - EPOLL_MODE is available
Aug 14, 2020 3:04:21 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<4>: (alluxio-master-2:19200)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-2, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-2: Name does not resolve
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-2: Name does not resolve
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
... 5 more
}
Aug 14, 2020 3:04:21 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<6>: (alluxio-master-1:19200)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-1, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-1: Name does not resolve
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-1: Name does not resolve
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
... 5 more
}
2020-08-14 03:04:31,458 INFO FollowerState - alluxio-master-0:19200 - Polling members [ServerMember[type=ACTIVE, status=AVAILABLE, serverAddress=alluxio-master-2:19200, clientAddress=null], ServerMember[type=ACTIVE, status=AVAILABLE, serverAddress=alluxio-master-1:19200, clientAddress=null]]
Aug 14, 2020 3:04:31 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<10>: (alluxio-master-1:19200)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-1, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-1
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-1
at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
... 5 more
}
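The master log above repeats a near-identical stack trace for every peer it cannot resolve, which makes it hard to see at a glance which names are failing. The following is a minimal sketch for condensing such a log into a list of distinct unresolved hostnames. It assumes the log text arrives on stdin and that `grep -o` is available (it is in GNU and BusyBox grep); `unresolved_hosts` is a hypothetical helper name, not part of Alluxio or kubectl.

```shell
#!/bin/sh
# Hypothetical helper: print each hostname named in an
# UnknownHostException message exactly once, sorted.
# Reads log text on stdin.
unresolved_hosts() {
  # Keep only the "UnknownHostException: <host>" fragment of each line,
  # strip the prefix, then de-duplicate.
  grep -o 'UnknownHostException: [A-Za-z0-9.-]*' \
    | sed 's/^UnknownHostException: //' \
    | sort -u
}
```

Usage, reusing the command from this report: `kubectl -n spark logs alluxio-master-0 -c alluxio-master | unresolved_hosts`. For the master log shown here, the names involved are alluxio-master-1 and alluxio-master-2.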
Logs for worker:
[root@k8s-master01 ~]# kubectl -n spark logs alluxio-worker-55n5h -c alluxio-worker
2020-08-14 05:18:17,159 INFO NettyUtils - EPOLL_MODE is available
Aug 14, 2020 5:18:22 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<1>: (alluxio-master-0:19998)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-0, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-0: Try again
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-0: Try again
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
... 5 more
}
2020-08-14 05:18:22,395 WARN RetryUtils - Failed to load cluster default configuration with master (attempt 1): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master-0:19998 to load cluster default configuration values: UNAVAILABLE: Unable to resolve host alluxio-master-0
Aug 14, 2020 5:18:22 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<3>: (alluxio-master-0:19998)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-0, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-0
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-0
at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
... 5 more
}
Logs for fuse:
[root@k8s-master01 ~]# kubectl -n spark logs alluxio-fuse-pmr58
umount: /mnt/alluxio-fuse: not mounted
Starting alluxio-fuse on local host.
2020-08-14 05:18:17,532 INFO TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=192.168.37.162, rack=null)
2020-08-14 05:19:13,220 INFO NettyUtils - EPOLL_MODE is available
Aug 14, 2020 5:19:33 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<1>: (alluxio-master-0:19998)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-0, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-0: Temporary failure in name resolution
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-0: Temporary failure in name resolution
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
... 5 more
}
Aug 14, 2020 5:19:33 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<3>: (alluxio-master-0:19998)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-0, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-0
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-0
at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
... 5 more
}
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (4 by maintainers)
Disabling flannel checksum offloading with
ethtool -K flannel.1 tx-checksum-ip-generic off
seems to solve this issue.
The flannel checksum is there for a reason. How is this a remedy, let alone a solution? At best it's a temporary hack, and it's been 2 years. Alluxio should have fixed this …
But the same issue still occurs in v2.8.1 today.
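Before and after trying the `ethtool -K` workaround quoted in the comments, it can help to confirm what the interface actually reports. This is a minimal sketch, assuming `ethtool -k flannel.1` (lowercase `-k`, which lists offload features) prints a line of the form `tx-checksum-ip-generic: on`; `tx_checksum_state` is a hypothetical helper name, not part of ethtool, flannel, or Alluxio.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <iface>` output on stdin and
# print the value reported for tx-checksum-ip-generic (e.g. "on" or "off").
tx_checksum_state() {
  # Fields are split on ": "; the feature name may be preceded by whitespace.
  awk -F': ' '$1 ~ /tx-checksum-ip-generic$/ { print $2 }'
}
```

Usage on a node that has the flannel.1 interface: `ethtool -k flannel.1 | tx_checksum_state`. The check has to be run on each node separately, since the setting is per interface.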