"!!! longevity Result!!!"
Today's date:2024-01-22
Result FAIL
Pipeline Message Test v1.6.1-rc
TestBed kubebench/3264g-medium-3cn-all-affinity
RW Version v1.6.1-rc
Test Start time 2024-01-21 17:00:08
Test End time 2024-01-22 05:02:24
Namespace usrlngvty-20240121-165055
Queries nexmark_q0,nexmark_q1,nexmark_q2,nexmark_q3,nexmark_q4,nexmark_q5,nexmark_q7,nexmark_q8,nexmark_q9,nexmark_q10,nexmark_q12,nexmark_q14,nexmark_q15,nexmark_q16,nexmark_q17,nexmark_q18,nexmark_q19,nexmark_q20,nexmark_q21,nexmark_q22,nexmark_q101,nexmark_q102,nexmark_q103,nexmark_q104,nexmark_q105
Grafana Metric https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&var-namespace=usrlngvty-20240121-165055&from=1705856408000&to=1705899744000
Grafana Logs https://grafana.test.risingwave-cloud.xyz/d/liz0yRCZz1/log-search-dashboard?orgId=1&var-data_source=Logging:%20test-useast1-eks-a&var-namespace=usrlngvty-20240121-165055&from=1705856408000&to=1705899744000
Memory Dumps to S3 https://s3.console.aws.amazon.com/s3/buckets/test-useast1-mgmt-bucket-archiver?region=us-east-1&prefix=k8s/usrlngvty-20240121-165055/&showversions=false
Buildkite Job https://buildkite.com/risingwave-test/longevity-test/builds/954
Crash Container logs https://rw-qa-artifacts-public.s3.us-east-1.amazonaws.com/longevity/954_logs.txt
Report https://rw-qa-artifacts-public.s3.us-east-1.amazonaws.com/longevity/954_report.txt
================================================================================
Restarted/Crashed Containers Details
================================================================================
CONTAINER crashed/Restarted: benchmark-risingwave-compute-c-0 restart_count:1 phase:Running status:True
CONTAINER crashed/Restarted: benchmark-risingwave-compute-c-2 restart_count:1 phase:Running status:True
It could be because we disabled the memtable spill due to the inconsistency issue, which can cause backpressure to be insensitive.
0119 on release-1.6: https://github.com/risingwavelabs/risingwave/commit/06058f046de38e91078c1d54f880eccfc934f3de
0112 on main: https://github.com/risingwavelabs/risingwave/commit/89a8297ff9082737ccf01f853e2ea3458413de28
This may be due to an extra copy of the block during decode after #13558. The fix #14786 has just been merged. Let’s wait for today’s longevity test to see whether the situation improves.
No. We made a mistake. The ~1.5GB of memory was not taken by the Materialize executor itself, but by its child executors. Rust’s async is based on polling, so the child executors appear under Materialize in the stack tree.
In the graph below, the highlighted boxes contain the keyword “stream::executor::”, i.e. they are all stream executors.
benchmark-risingwave-compute-c-2_1705890487-2024-01-22-02-28-06.auto.heap.collapsed.zip
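To make the point about inclusive versus own memory concrete, here is a minimal sketch that reads folded-stack lines (`frame;frame;frame value`, root first) and attributes each sample both to every executor frame on the stack and to the innermost one only. It assumes the attached `.collapsed` file uses that folded format; the frame names and byte counts in `sample` are made up for illustration and are not taken from the dump.

```python
from collections import defaultdict

KEYWORD = "stream::executor::"  # same keyword as the highlighted boxes


def parse_collapsed(lines):
    """Yield (frames, value) from folded-stack lines like 'a;b;c 123'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the LAST space: frame names may themselves contain spaces.
        stack, _, value = line.rpartition(" ")
        yield stack.split(";"), int(value)


def attribute(lines, keyword=KEYWORD):
    """Return (inclusive, innermost) byte totals per executor frame.

    inclusive: bytes of every stack the frame appears on (parents included).
    innermost: bytes charged only to the deepest executor frame on the stack.
    """
    inclusive = defaultdict(int)
    innermost = defaultdict(int)
    for frames, value in parse_collapsed(lines):
        executors = [f for f in frames if keyword in f]
        for frame in set(executors):  # count a frame once per stack
            inclusive[frame] += value
        if executors:
            innermost[executors[-1]] += value
    return dict(inclusive), dict(innermost)


# Hypothetical data: a parent executor polling one child executor.
sample = [
    "main;stream::executor::Materialize;stream::executor::ChildA;alloc 1400",
    "main;stream::executor::Materialize;alloc 100",
]

if __name__ == "__main__":
    inclusive, innermost = attribute(sample)
    for name in sorted(inclusive):
        print(name, inclusive[name], innermost.get(name, 0))
```

On this made-up input the parent has an inclusive total of 1500 but only 100 of its own, which is the same pattern described above: the parent looks large only because its children are polled from inside it.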
The process was killed by the OOM killer. You can find the logs here.
CONSTRAINT_MEMCG indicates it was killed because it exceeded the memory limit. The total memory of the machine also seems to be sufficient.
node exporter: https://grafana.test.risingwave-cloud.xyz/d/LDL_mx9nz123/kubernetes-node-exporter?orgId=1&var-DS_PROMETHEUS=P2453400D1763B4D9&var-node=ip-10-0-43-102.ec2.internal&var-internal_ip=10.0.43.102&var-diskdevices=[a-z]%2B|nvme[0-9]%2Bn[0-9]%2B|mmcblk[0-9]%2B&from=1705893102767&to=1705983763575
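For reading the constraint field of an OOM killer log line, a small sketch along these lines may help. It assumes the kernel's `oom-kill:constraint=...` summary line is present in the logs; the sample line below is invented for illustration and is not copied from this run's logs.

```python
import re

# Constraint names from the Linux kernel's OOM killer; the descriptions
# are informal summaries, not kernel text.
CONSTRAINTS = {
    "CONSTRAINT_NONE": "system-wide out of memory",
    "CONSTRAINT_CPUSET": "restricted by cpuset",
    "CONSTRAINT_MEMORY_POLICY": "restricted by NUMA memory policy",
    "CONSTRAINT_MEMCG": "memory cgroup limit exceeded",
}


def oom_constraint(line):
    """Return the CONSTRAINT_* token found in a log line, or None."""
    match = re.search(r"constraint=(CONSTRAINT_[A-Z_]+)", line)
    return match.group(1) if match else None


def explain(line):
    """Return a short explanation for the constraint in the line, or None."""
    constraint = oom_constraint(line)
    if constraint is None:
        return None
    return CONSTRAINTS.get(constraint, "unknown constraint")


# Invented example line, for illustration only.
sample_line = "oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),task=example,pid=1234,uid=0"

if __name__ == "__main__":
    print(oom_constraint(sample_line), "->", explain(sample_line))
```

A `CONSTRAINT_MEMCG` result points at the container's own limit rather than at the node running out of memory, which is consistent with the node exporter graph above.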