iceberg: NullPointerException on containsNaN()

containsNaN is a primitive boolean in ManifestFileUtil:
https://github.com/apache/iceberg/blob/6c6096d44cd1315c40e1e718c3186ba8927c3219/core/src/main/java/org/apache/iceberg/util/ManifestFileUtil.java#L45

But it is a boxed Boolean in GenericPartitionFieldSummary:
https://github.com/apache/iceberg/blob/6c6096d44cd1315c40e1e718c3186ba8927c3219/core/src/main/java/org/apache/iceberg/GenericPartitionFieldSummary.java#L45

This is also a problem because the interface for partition field summary declares the return type as boxed and returns null by default:
https://github.com/apache/iceberg/blob/6c6096d44cd1315c40e1e718c3186ba8927c3219/api/src/main/java/org/apache/iceberg/ManifestFile.java#L209-L211

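This can lead to an NPE when the containsNaN value of the GenericPartitionFieldSummary (GPFS) is null, because unboxing a null Boolean into a primitive boolean throws: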

        Boolean x = null;
        boolean y = x; // NullPointerException

Stack trace:

Caused by: java.lang.NullPointerException
	at org.apache.iceberg.util.ManifestFileUtil$FieldSummary.<init>(ManifestFileUtil.java:54)
	at org.apache.iceberg.util.ManifestFileUtil.summaries(ManifestFileUtil.java:150)
	at org.apache.iceberg.util.ManifestFileUtil.canContainAny(ManifestFileUtil.java:131)
	at org.apache.iceberg.ManifestFilterManager.canContainDeletedFiles(ManifestFilterManager.java:329)
	at org.apache.iceberg.ManifestFilterManager.filterManifest(ManifestFilterManager.java:285)
	at org.apache.iceberg.ManifestFilterManager.lambda$filterManifests$0(ManifestFilterManager.java:182)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
	at org.apache.iceberg.util.Tasks$Builder.access$300(Tasks.java:70)
	at org.apache.iceberg.util.Tasks$Builder$1.run(Tasks.java:310)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	... 3 more
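
One way to avoid this kind of failure is to unbox with an explicit fallback rather than relying on auto-unboxing. The following is a minimal sketch, not the fix that was applied in Iceberg: the `Summary` interface and the `unboxOr` helper are hypothetical names used for illustration only, and treating an unknown value as "may contain NaN" is an assumption about what a conservative default would be.

```java
public class NullSafeUnboxing {
    // Hypothetical stand-in for a summary type; this is NOT the Iceberg interface.
    interface Summary {
        // Boxed return type: null means the value is unknown.
        default Boolean containsNaN() {
            return null;
        }
    }

    // Unbox with an explicit fallback; the null check guards the unboxing step.
    public static boolean unboxOr(Boolean value, boolean fallback) {
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        Summary summary = new Summary() {};
        // Assumption: an unknown value is treated as "may contain NaN".
        boolean mayContainNaN = unboxOr(summary.containsNaN(), true);
        System.out.println(mayContainNaN);
    }
}
```

An alternative would be to keep the field boxed on the consuming side and check for null wherever it is read, which preserves the distinction between "unknown" and "false".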

#1872

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

@yyanyy we can always patch it later if it causes an issue, but at least it won’t be a regression I’m introducing in this PR 😛