risingwave: Failed in get: Hummock error: ObjectStore failed with IO error
Describe the bug
Slack link:
https://risingwave-labs.slack.com/archives/C048NM5LNKX/p1671603133726859
Namespace: rwc-3-longevity-20221220-180642
Pod: risingwave-compute-2
2022-12-20T18:11:34.875003Z ERROR risingwave_storage::monitor::monitored_store: Failed in get: Hummock error: ObjectStore failed with IO error Internal error: read "rls-apse1-eks-a-rwc-3-longevity-20221220-180642/255/1.data" in block Some(BlockLocation { offset: 11008083, size: 37158 }) failed, error: timeout: error trying to connect: HTTP connect timeout occurred after 3.1s
backtrace of `ObjectError`:
0: <risingwave_object_store::object::error::ObjectError as core::convert::From<risingwave_object_store::object::error::ObjectErrorInner>>::from
at ./risingwave/src/object_store/src/object/error.rs:38:10
1: <T as core::convert::Into<U>>::into
at ./rustc/bdb07a8ec8e77aa10fb84fae1d4ff71c21180bb4/library/core/src/convert/mod.rs:726:9
2: <risingwave_object_store::object::error::ObjectError as core::convert::From<aws_smithy_http::result::SdkError<E>>>::from
at ./risingwave/src/object_store/src/object/error.rs:81:9
3: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
at ./rustc/bdb07a8ec8e77aa10fb84fae1d4ff71c21180bb4/library/core/src/result.rs:2108:27
4: <risingwave_object_store::object::s3::S3ObjectStore as risingwave_object_store::object::ObjectStore>::read::{{closure}}
at ./risingwave/src/object_store/src/object/s3.rs:351:20
5: <async_stack_trace::StackTraced<F,_> as core::future::future::Future>::poll
6: risingwave_object_store::object::MonitoredObjectStore<OS>::read::{{closure}}
at ./risingwave/src/object_store/src/object/mod.rs:643:13
7: risingwave_object_store::object::ObjectStoreImpl::read::{{closure}}
at ./risingwave/src/object_store/src/object/mod.rs:334:9
8: risingwave_storage::hummock::sstable_store::SstableStore::sstable::{{closure}}::{{closure}}::{{closure}}
at ./risingwave/src/storage/src/hummock/sstable_store.rs:346:25
9: risingwave_common::cache::LruCache<K,T>::lookup_with_request_dedup::{{closure}}::{{closure}}
at ./risingwave/src/common/src/cache.rs:818:58
10: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
at ./root/.cargo/registry/src/github.com-1ecc6299db9ec823/tracing-0.1.37/src/instrument.rs:272:9
To Reproduce
No response
Expected behavior
No response
Additional context
Or is this expected?
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 16 (16 by maintainers)
In terms of network bandwidth, neither of these two failed cases reaches the 10 Gbps limit:
- rwc-3-longevity-20230131-171156
- rwc-3-longevity-20230201-170952
Will look into the S3 client SDK's connection cache/pool impl first, if any.
No, the SDK uses hyper.
rwc-3-longevity-20230131-171156 does not seem to be caused by bandwidth this time.
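For context on the connection pool question: since the SDK drives HTTP through hyper, the relevant knobs live on hyper's client. The following is a minimal standalone hyper 0.14 sketch for illustration only, not RisingWave's or the SDK's actual wiring; the SDK builds and configures its own connector internally, and the timeout/pool values here are arbitrary.

```rust
use std::time::Duration;
use hyper::client::HttpConnector;

fn build_client() -> hyper::Client<HttpConnector> {
    // The connector owns the TCP connect timeout, analogous to the SDK's
    // ~3.1s default that shows up in the error message above.
    let mut connector = HttpConnector::new();
    connector.set_connect_timeout(Some(Duration::from_secs(10)));

    // The client keeps a per-host pool of idle connections; these knobs
    // bound how many connections are cached and for how long.
    hyper::Client::builder()
        .pool_max_idle_per_host(32)
        .pool_idle_timeout(Duration::from_secs(90))
        .build::<_, hyper::Body>(connector)
}
```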
In rwc-3-longevity-20230104-180851 we are using c5a.8xlarge (10 Gbps network capacity) for compute nodes. The rate of (node_network_transmit_bytes + node_network_receive_bytes) does reach 10 Gbps. We should apply further fixes rather than merely increasing the node's network capacity.
Testing with larger retry max attempts.
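For reference, a hedged sketch of what raising the retry max attempts could look like on the aws-config / aws-sdk-s3 side (builder names vary across SDK versions, and the value 8 is an arbitrary illustration, not the value used in the test):

```rust
use aws_config::retry::RetryConfig;

#[tokio::main]
async fn main() {
    // Allow more attempts per request so transient connect timeouts are
    // retried by the SDK instead of surfacing as ObjectStore IO errors.
    let retry = RetryConfig::standard().with_max_attempts(8);

    let sdk_config = aws_config::from_env()
        .retry_config(retry)
        .load()
        .await;

    let _s3 = aws_sdk_s3::Client::new(&sdk_config);
}
```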
Some info:
Let’s wait and see the results of another test run with higher network capacity.
DNS quotas
Update: DNS cache is already enabled in our cloud env, so we won’t hit DNS quotas even if there are bursts of S3 requests during MV creation.
Agreed, not a good idea.
As the kernel manages CPU resources by keeping parallelism no higher than the number of CPUs, and manages memory resources by having GlobalMemoryManager evict states from time to time, both in a proactive way, it feels somewhat strange to manage network resources in a reactive way, although being proactive seems a more difficult task indeed.
made-up cases:
Increasing the connect_timeout does work around this issue (3.1s by default; I used 60s, which is large enough but not a practical value). But I’m not sure it’s a good idea to increase it. Could this kind of IO error be more of an indicator that the current cluster size cannot handle the workload, and that we should consider reducing the load on each worker node?
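For reference, a minimal sketch of raising the connect timeout on the SDK side, assuming a recent aws-config release (the TimeoutConfig builder names have changed across versions, and the 60s mirrors the experiment above rather than a recommended setting):

```rust
use std::time::Duration;
use aws_config::timeout::TimeoutConfig;

#[tokio::main]
async fn main() {
    // The default connect timeout is ~3.1s, which is what the error reports.
    let timeouts = TimeoutConfig::builder()
        .connect_timeout(Duration::from_secs(60))
        .build();

    let sdk_config = aws_config::from_env()
        .timeout_config(timeouts)
        .load()
        .await;

    let _s3 = aws_sdk_s3::Client::new(&sdk_config);
}
```

As the comment above notes, this masks the symptom rather than reducing per-node load.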