hudi: [BUG] Unable to execute HTTP request | connection timeout issues
Describe the problem you faced
We are using Hudi in a Spark streaming job. Jobs are failing due to an HTTP connection timeout.
Hudi version: 0.12, table type: COW, ingestion mode: INSERT
- The problem occurs with Hudi 0.12 when metadata is enabled (true).
- With metadata disabled (false), we do not see this connection limit issue.
- We are running multiple streaming queries/jobs in one Spark job. Setting the connection limit to 1000 sometimes helps and sometimes does not, and because of this the job gets killed: spark.hadoop.fs.s3a.connection.maximum: "1000"
- Disabling metadata helps, but:
  - The time spent on parallel listing of paths is comparatively high (it went from ~2s to ~1 min).
  - Question: will this listing time increase further as the data (size / number of files) grows?
- We also tried Hudi 0.13:
  - Hudi 0.13 does not produce any connection limit issues, but the LTS version is 0.12.2.
Thanks for reading the issue. We need help checking whether there are other solutions we can try, or which behavior is recommended for production use.
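For anyone comparing the two workarounds above, here is a minimal sketch of how the settings could be assembled. The helper names (`s3a_conf`, `hudi_metadata_option`, `to_submit_args`) are hypothetical and only for illustration; the keys are `spark.hadoop.fs.s3a.connection.maximum` from this report and Hudi's `hoodie.metadata.enable` write option. The value 1000 is just the one we tried, not a recommendation.

```python
def s3a_conf(max_connections: int) -> dict:
    """Spark-level setting for the S3A HTTP connection pool size."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return {"spark.hadoop.fs.s3a.connection.maximum": str(max_connections)}


def hudi_metadata_option(enabled: bool) -> dict:
    """Hudi write option that turns the metadata table on or off."""
    return {"hoodie.metadata.enable": str(enabled).lower()}


def to_submit_args(conf: dict) -> list:
    """Render spark.* entries as spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        if key.startswith("spark."):
            args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(to_submit_args(s3a_conf(1000)))
    print(hudi_metadata_option(False))
```

The Spark setting applies to the whole application, while the Hudi option is passed per writer, so with many streaming queries in one job every writer shares the same S3A pool.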
To Reproduce
Steps to reproduce the behavior:
- Start a Spark streaming job with COW table type and metadata enabled, running multiple streaming queries (50+).
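A rough sketch of that setup in PySpark, in case it helps with reproduction. This is not our actual job: `hudi_stream_options` and `start_queries` are hypothetical helpers, the column names `id` and `ts` and the `tbl_<i>` table naming are assumptions, and `stream_df` is assumed to be an existing streaming DataFrame in a session that has the Hudi bundle on the classpath.

```python
def hudi_stream_options(table_name: str, metadata_enabled: bool = True) -> dict:
    """Hudi write options for a COW insert-only table."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "insert",
        "hoodie.metadata.enable": str(metadata_enabled).lower(),
        # Assumed column names, replace with the real schema.
        "hoodie.datasource.write.recordkey.field": "id",
        "hoodie.datasource.write.precombine.field": "ts",
    }


def start_queries(stream_df, base_path: str, num_queries: int = 50) -> list:
    """Start num_queries Hudi streaming writes inside one Spark application."""
    queries = []
    for i in range(num_queries):
        name = f"tbl_{i}"
        query = (
            stream_df.writeStream.format("hudi")
            .options(**hudi_stream_options(name))
            .option("checkpointLocation", f"{base_path}/_checkpoints/{name}")
            .outputMode("append")
            .start(f"{base_path}/{name}")
        )
        queries.append(query)
    return queries
```

Calling `start_queries(stream_df, "s3a://<bucket>/<prefix>")` with the defaults starts 50 queries, all of which go through the same S3A client and connection pool.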
Expected behavior
There seems to be a connection leak issue in metadata with 0.12.2.
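To make the leak hypothesis concrete, here is a toy model of a fixed-size pool. It is only an illustration under the assumption that some readers are opened and never closed; `TinyPool` and `open_readers` are made-up names and do not reflect the AWS SDK's connection manager or Hudi's reader code.

```python
import threading


class TinyPool:
    """Toy fixed-size pool (not the AWS SDK's PoolingHttpClientConnectionManager)."""

    def __init__(self, size: int):
        self._slots = threading.Semaphore(size)

    def lease(self, timeout: float) -> None:
        # Block until a slot is free or the timeout expires.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("Timeout waiting for connection from pool")

    def release(self) -> None:
        self._slots.release()


def open_readers(pool: TinyPool, count: int, close_after_use: bool) -> int:
    """Lease `count` connections in sequence and return how many succeeded."""
    succeeded = 0
    for _ in range(count):
        try:
            pool.lease(timeout=0.05)
        except TimeoutError:
            break
        succeeded += 1
        if close_after_use:
            pool.release()
    return succeeded


if __name__ == "__main__":
    print(open_readers(TinyPool(4), 10, close_after_use=True))   # prints 10
    print(open_readers(TinyPool(4), 10, close_after_use=False))  # prints 4
```

If something like this is happening, a larger pool would only delay the timeout rather than prevent it, which would match the observation that raising the limit to 1000 sometimes helps and sometimes does not.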
Environment Description
- Hudi version : 0.12.2 / 0.13
- Spark version : 3.3.0
- Hive version :
- Hadoop version :
- Storage (HDFS/S3/GCS…) : S3
- Running on Docker? (yes/no) : no
Stacktrace
2023-03-14 07:04:14 WARN o.a.s.storag.BlockManager.logWarning (Logging.scala:73) [task 0.3 in stage 1115.0 (TID 2558)]: Putting block rdd_2569_0 failed due to exception org.apache.hudi.exception.HoodieException: Exception when reading log file .
2023-03-14 07:04:14 WARN o.a.s.storag.BlockManager.logWarning (Logging.scala:73) [task 0.3 in stage 1115.0 (TID 2558)]: Block rdd_2569_0 could not be removed as it was not found on disk or in memory
2023-03-14 07:04:14 ERROR o.a.s.execut.Executor.logError (Logging.scala:98) [task 0.3 in stage 1115.0 (TID 2558)]: Exception in task 0.3 in stage 1115.0 (TID 2558)
org.apache.hudi.exception.HoodieException: Exception when reading log file
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1216) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1162) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5453) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5400) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1372) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1289) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:285) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1286) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2203) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2142) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:715) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:195) ~[__app__.jar:?]
at org.apache.hudi.common.table.log.HoodieLogFileReader.getFSDataInputStream(HoodieLogFileReader.java:475) ~[__app__.jar:?]
at org.apache.hudi.common.table.log.HoodieLogFileReader.<init>(HoodieLogFileReader.java:114) ~[__app__.jar:?]
at org.apache.hudi.common.table.log.HoodieLogFormatReader.hasNext(HoodieLogFormatReader.java:110) ~[__app__.jar:?]
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:223) ~[__app__.jar:?]
... 29 more
Caused by: com.amazonaws.thirdparty.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:316) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.thirdparty.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:282) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at sun.reflect.GeneratedMethodAccessor257.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362]
at com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.conn.$Proxy51.get(Unknown Source) ~[?:?]
at com.amazonaws.thirdparty.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:190) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1343) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1154) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:811) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:779) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:753) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:713) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:695) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:559) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:539) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5453) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5400) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1372) ~[aws-java-sdk-bundle-1.12.170.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1289) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:285) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1286) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2223) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2203) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2142) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.open(S3AFileSystem.java:715) ~[hadoop-aws-3.2.1-amzn-8.jar:?]
at org.apache.hudi.common.fs.HoodieWrapperFileSystem.open(HoodieWrapperFileSystem.java:195) ~[__app__.jar:?]
at org.apache.hudi.common.table.log.HoodieLogFileReader.getFSDataInputStream(HoodieLogFileReader.java:475) ~[__app__.jar:?]
at org.apache.hudi.common.table.log.HoodieLogFileReader.<init>(HoodieLogFileReader.java:114) ~[__app__.jar:?]
at org.apache.hudi.common.table.log.HoodieLogFormatReader.hasNext(HoodieLogFormatReader.java:110) ~[__app__.jar:?]
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:223) ~[__app__.jar:?]
... 29 more
About this issue
- State: open
- Created a year ago
- Reactions: 2
- Comments: 15 (7 by maintainers)
Thanks, but in 0.14.0 we made many improvements to the MDT (metadata table); let's see whether the issue is solved there.