alluxio: UnknownHostException when installing Alluxio through Helm

Alluxio Version: alluxio:2.3.0

Describe the bug: I followed the guide https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-Kubernetes.html to install Alluxio, and errors occurred. I had configured multi-master mode with 3 masters. The pod status is:

[root@k8s-master01 ~]# kubectl -n spark get pods
NAME                                        READY   STATUS             RESTARTS   AGE
alluxio-fuse-j8rrm                          1/1     Running            0          10s
alluxio-fuse-rrcrn                          1/1     Running            0          10s
alluxio-fuse-wfrkr                          1/1     Running            0          10s
alluxio-master-0                            0/2     Running            0          10s
alluxio-worker-4kmbq                        0/2     CrashLoopBackOff   1          10s
alluxio-worker-5ng5f                        0/2     CrashLoopBackOff   1          10s
alluxio-worker-8rvbz                        0/2     CrashLoopBackOff   1          10s

Then I checked the endpoints:

[root@k8s-master01 ~]# kubectl -n spark describe  endpoints alluxio-master-0 
Name:         alluxio-master-0
Namespace:    spark
Labels:       app=alluxio
              chart=alluxio-0.6.5
              heritage=Helm
              release=alluxio-1597374253
              role=alluxio-master
              service.kubernetes.io/headless=
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2020-08-14T03:04:36Z
Subsets:
  Addresses:          10.244.5.96
  NotReadyAddresses:  <none>
  Ports:
    Name          Port   Protocol
    ----          ----   --------
    embedded      19200  TCP
    rpc           19998  TCP
    job-rpc       20001  TCP
    web           19999  TCP
    job-web       20002  TCP
    job-embedded  20003  TCP

Events:  <none>
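
The endpoints output above shows that the headless Service alluxio-master-0 in namespace spark has an address, so one thing worth checking is whether that name resolves from inside a pod. The sketch below only builds the fully qualified Service name. The function name service_fqdn and the cluster domain cluster.local (the usual default) are assumptions, not something stated in this issue.

```shell
# Build the fully qualified DNS name of a Kubernetes Service.
# cluster.local is the common default cluster domain; it is an assumption here.
service_fqdn() {
  svc="$1"
  ns="$2"
  domain="${3:-cluster.local}"
  printf '%s.%s.svc.%s\n' "$svc" "$ns" "$domain"
}

# Hypothetical usage (needs a live cluster and an image that ships nslookup):
#   kubectl -n spark exec alluxio-master-0 -c alluxio-master -- \
#     nslookup "$(service_fqdn alluxio-master-0 spark)"
```

If the fully qualified name resolves but the short name does not, the pod's DNS search path would be a suspect. If neither resolves, the problem is more likely in cluster DNS or pod networking.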

Logs for master:

[root@k8s-master01 ~]# kubectl -n spark logs alluxio-master-0 -c alluxio-master
2020-08-14 03:04:20,037 INFO  BlockMasterFactory - Creating alluxio.master.block.BlockMaster 
2020-08-14 03:04:20,037 INFO  FileSystemMasterFactory - Creating alluxio.master.file.FileSystemMaster 
2020-08-14 03:04:20,037 INFO  MetricsMasterFactory - Creating alluxio.master.metrics.MetricsMaster 
2020-08-14 03:04:20,037 INFO  MetaMasterFactory - Creating alluxio.master.meta.MetaMaster 
2020-08-14 03:04:20,037 INFO  TableMasterFactory - Creating alluxio.master.table.TableMaster 
2020-08-14 03:04:20,113 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=alluxio-master-0, rack=null)
2020-08-14 03:04:20,114 INFO  UnderDatabaseRegistry - Loading udb jars from /opt/alluxio-2.3.0/lib
2020-08-14 03:04:20,137 INFO  UnderDatabaseRegistry - Registered UDBs: hive,glue
2020-08-14 03:04:20,139 INFO  LayoutRegistry - Registered Table Layouts: hive
2020-08-14 03:04:20,146 INFO  ExtensionFactoryRegistry - Loading core jars from /opt/alluxio-2.3.0/lib
2020-08-14 03:04:20,174 INFO  ExtensionFactoryRegistry - Loading extension jars from /opt/alluxio-2.3.0/extensions
2020-08-14 03:04:20,942 INFO  ProcessUtils - Starting Alluxio master @alluxio-master-0:19998.
2020-08-14 03:04:20,943 INFO  RaftJournalSystem - Initializing Raft Journal System
2020-08-14 03:04:20,951 INFO  JournalStateMachine - Initialized new journal state machine
2020-08-14 03:04:21,062 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-08-14 03:04:21,063 INFO  RaftJournalSystem - Starting Raft journal system. Cluster addresses: [alluxio-master-0:19200, alluxio-master-1:19200, alluxio-master-2:19200]. Local address: alluxio-master-0:19200
2020-08-14 03:04:21,288 INFO  GrpcMessagingServer - Successfully started messaging server at: alluxio-master-0:19200
2020-08-14 03:04:21,299 INFO  ServerContext - alluxio-master-0:19200 - Transitioning to FOLLOWER
2020-08-14 03:04:21,307 INFO  AbstractPrimarySelector - Primary selector transitioning to SECONDARY
2020-08-14 03:04:21,333 INFO  NettyUtils - EPOLL_MODE is available
Aug 14, 2020 3:04:21 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<4>: (alluxio-master-2:19200)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-2, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-2: Name does not resolve
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
	at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
	at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-2: Name does not resolve
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
	... 5 more
}
Aug 14, 2020 3:04:21 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<6>: (alluxio-master-1:19200)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-1, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-1: Name does not resolve
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
	at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
	at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-1: Name does not resolve
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
	... 5 more
}
2020-08-14 03:04:31,458 INFO  FollowerState - alluxio-master-0:19200 - Polling members [ServerMember[type=ACTIVE, status=AVAILABLE, serverAddress=alluxio-master-2:19200, clientAddress=null], ServerMember[type=ACTIVE, status=AVAILABLE, serverAddress=alluxio-master-1:19200, clientAddress=null]]
Aug 14, 2020 3:04:31 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<10>: (alluxio-master-1:19200)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-1, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-1
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
	at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
	at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-1
	at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
	... 5 more
}
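
The stack traces above are long, but the useful information is which hostnames failed to resolve. A minimal sketch for pulling those out of a log follows. The helper name unresolved_hosts is hypothetical, and it relies only on the "Unable to resolve host <name>" phrase visible in the log lines above.

```shell
# Print each distinct hostname that gRPC reported as unresolvable.
# Reads log text on stdin and matches the "Unable to resolve host <name>" phrase.
unresolved_hosts() {
  grep -o 'Unable to resolve host [^,[:space:]]*' \
    | awk '{ print $5 }' \
    | sort -u
}

# Usage against the master pod from this report (needs a live cluster):
#   kubectl -n spark logs alluxio-master-0 -c alluxio-master 2>&1 | unresolved_hosts
```

For the master log shown above this would list alluxio-master-1 and alluxio-master-2, the two peers that do not appear in the pod listing.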

Logs for worker:

[root@k8s-master01 ~]# kubectl -n spark logs alluxio-worker-55n5h  -c alluxio-worker
2020-08-14 05:18:17,159 INFO  NettyUtils - EPOLL_MODE is available
Aug 14, 2020 5:18:22 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<1>: (alluxio-master-0:19998)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-0, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-0: Try again
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
	at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
	at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-0: Try again
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
	... 5 more
}
2020-08-14 05:18:22,395 WARN  RetryUtils - Failed to load cluster default configuration with master (attempt 1): alluxio.exception.status.UnavailableException: Failed to handshake with master alluxio-master-0:19998 to load cluster default configuration values: UNAVAILABLE: Unable to resolve host alluxio-master-0
Aug 14, 2020 5:18:22 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<3>: (alluxio-master-0:19998)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-0, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-0
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
	at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
	at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-0
	at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
	... 5 more
}

Logs for fuse:

[root@k8s-master01 ~]# kubectl -n spark logs alluxio-fuse-pmr58
umount: /mnt/alluxio-fuse: not mounted
Starting alluxio-fuse on local host.
2020-08-14 05:18:17,532 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=192.168.37.162, rack=null)
2020-08-14 05:19:13,220 INFO  NettyUtils - EPOLL_MODE is available
Aug 14, 2020 5:19:33 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<1>: (alluxio-master-0:19998)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-0, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-0: Temporary failure in name resolution
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
	at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
	at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-0: Temporary failure in name resolution
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
	... 5 more
}
Aug 14, 2020 5:19:33 AM io.grpc.internal.ManagedChannelImpl$NameResolverListener handleErrorInSyncContext
WARNING: [Channel<3>: (alluxio-master-0:19998)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host alluxio-master-0, cause=java.lang.RuntimeException: java.net.UnknownHostException: alluxio-master-0
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:436)
	at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:272)
	at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:228)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: alluxio-master-0
	at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:646)
	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:404)
	... 5 more
}

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (4 by maintainers)

Most upvoted comments

Disabling flannel TX checksum offload with ethtool -K flannel.1 tx-checksum-ip-generic off seems to solve this issue.
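
A minimal sketch of applying that workaround, assuming the flannel VXLAN interface is named flannel.1 as in the comment above and that the command is run as root on each Kubernetes node. The function name and the DRY_RUN switch are hypothetical conveniences, not part of flannel or ethtool.

```shell
# Sketch: disable TX checksum offload on the flannel VXLAN interface.
# Assumes the interface name flannel.1 unless another name is passed in.
disable_flannel_tx_checksum() {
  iface="${1:-flannel.1}"
  # DRY_RUN=1 (the default in this sketch) only prints the command.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "ethtool -K $iface tx-checksum-ip-generic off"
  else
    ethtool -K "$iface" tx-checksum-ip-generic off
  fi
}

# Print the command without changing anything:
#   disable_flannel_tx_checksum
# Apply it for real (as root, on every node):
#   DRY_RUN=0 disable_flannel_tx_checksum
```

Settings changed with ethtool -K are normally runtime-only, so the change would likely need to be reapplied after a reboot or after the interface is recreated.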

Flannel checksum is there for a reason. How is this a remedy, let alone a solution? At best it's a temporary hack, and it's been 2 years. Alluxio should have fixed this …

But the same issue still occurs in v2.8.1 today.