beam: [Failing Test]: Various TPC-DS queries throw NPEs using SparkRunner
What happened?
Various TPC-DS queries started throwing NPEs with the SparkRunner a while back (see here):
java.lang.NullPointerException
at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:903)
at org.apache.beam.sdk.util.WindowedValue$TimestampedWindowedValue.<init>(WindowedValue.java:312)
at org.apache.beam.sdk.util.WindowedValue$TimestampedValueInGlobalWindow.<init>(WindowedValue.java:329)
at org.apache.beam.sdk.util.WindowedValue.of(WindowedValue.java:95)
Without looking further into the underlying root cause, this seems to be related to #27617.
Issue Failure
Failure: Test is flaky
Issue Priority
Priority: 2 (backlog / disabled test but we think the product is healthy)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner
About this issue
- State: closed
- Created 10 months ago
- Comments: 27 (27 by maintainers)
Commits related to this issue
- [runners-spark] Do not set accTimestamp to null in SparkCombineFn (#28256) — committed to je-ik/beam by je-ik 8 months ago
- [runners-spark] #28256 add test proving fix — committed to je-ik/beam by je-ik 8 months ago
- Merge pull request #29162: [runners-spark] Do not set accTimestamp to null in SparkCombineFn (#28256) — committed to apache/beam by je-ik 8 months ago
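To illustrate the failure mode suggested by the stack trace and the commit title, here is a minimal, self-contained sketch. It does not use Beam. `TimestampedValue`, `SumAccumulator`, `MIN_TIMESTAMP`, and `emitsAfterReset` are hypothetical names invented for this example, and the sentinel-based reset is only an assumption about one way to avoid a null timestamp, not the actual patch to SparkCombineFn.

```java
import java.util.Objects;

public class NullTimestampSketch {
    // Hypothetical sentinel used instead of null; not a Beam constant.
    static final long MIN_TIMESTAMP = Long.MIN_VALUE;

    /** Hypothetical stand-in for a timestamped value that rejects null timestamps. */
    static final class TimestampedValue {
        final long value;
        final Long timestamp;

        TimestampedValue(long value, Long timestamp) {
            this.value = value;
            // Plays the role of the checkNotNull frame at the top of the trace.
            this.timestamp = Objects.requireNonNull(timestamp, "timestamp");
        }
    }

    /** Hypothetical accumulator: a running sum plus the timestamp to emit with it. */
    static final class SumAccumulator {
        private final boolean nullOnReset;
        private long sum = 0;
        private Long accTimestamp = MIN_TIMESTAMP;

        SumAccumulator(boolean nullOnReset) {
            this.nullOnReset = nullOnReset;
        }

        void add(long value, long timestamp) {
            sum += value;
            if (accTimestamp == null) {
                accTimestamp = timestamp;
            } else {
                accTimestamp = Math.max(accTimestamp, timestamp);
            }
        }

        void reset() {
            sum = 0;
            if (nullOnReset) {
                accTimestamp = null;          // the problematic pattern
            } else {
                accTimestamp = MIN_TIMESTAMP; // assumed alternative: never null
            }
        }

        TimestampedValue toOutput() {
            // Throws NullPointerException if accTimestamp was reset to null.
            return new TimestampedValue(sum, accTimestamp);
        }
    }

    /** Returns true if an output can be built right after a reset. */
    static boolean emitsAfterReset(boolean nullOnReset) {
        SumAccumulator acc = new SumAccumulator(nullOnReset);
        acc.add(5, 100L);
        acc.reset();
        try {
            acc.toOutput();
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("null on reset emits: " + emitsAfterReset(true));
        System.out.println("sentinel on reset emits: " + emitsAfterReset(false));
    }
}
```

The point of the sketch is only that an accumulator which clears its timestamp to null can later feed that null into a constructor that forbids it, which would match the shape of the trace above.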
I created an issue for that: #29198.
Yes, I’ll fix this
Ah, I see, gs://beam-tpcds/datasets/parquet/nonpartitioned.
Yes, I’d only like to walk the code again to be sure exactly what might be the impact of the fix. Yes, it is strange it was not caught by VR tests. I’ll look into it.