nessie: Nessie HTTP parse exception

I have been using this stack (Nessie + Iceberg + Spark) without ever changing any dependency versions, because it had been running smoothly. However, when another colleague who had no build cache built the same Dockerfile, the program suddenly started failing with the following error while inserting data into the table:

Dependencies:

  • Nessie: 0.67.0 (nessie-spark-extensions-base, nessie-spark-extensions-3.3, nessie-client, nessie-model)
  • Spark: 3.3.2
  • Scala: 2.13
  • Iceberg: 1.3.1 (see the build.sbt sketch below)
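
For context, the declarations roughly correspond to the following build.sbt excerpt. This is a sketch reconstructed from the list above, not the original build file; in particular, the group IDs for the Nessie artifacts changed across releases (org.projectnessie vs. org.projectnessie.nessie-integrations), so verify the exact coordinates for 0.67.0 on Maven Central.

    // Reconstructed sketch of the dependency set described above (not the original build file).
    // Group IDs for the Nessie spark-extensions artifacts may be "org.projectnessie.nessie-integrations"
    // depending on the Nessie release.
    libraryDependencies ++= Seq(
      "org.apache.iceberg" %% "iceberg-spark-runtime-3.3"    % "1.3.1",
      "org.projectnessie"  %% "nessie-spark-extensions-base" % "0.67.0",
      "org.projectnessie"  %% "nessie-spark-extensions-3.3"  % "0.67.0",
      "org.projectnessie"   % "nessie-client"                % "0.67.0",
      "org.projectnessie"   % "nessie-model"                 % "0.67.0"
    )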

Exception:

org.projectnessie.client.http.HttpClientException: Cannot parse response.
        at org.projectnessie.client.http.HttpResponse.readEntity(HttpResponse.java:56)
        at org.projectnessie.client.http.HttpResponse.readEntity(HttpResponse.java:78)
        at org.projectnessie.client.http.HttpTreeClient.getReferenceByName(HttpTreeClient.java:82)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.projectnessie.client.http.NessieHttpClient$ExceptionRewriter.invoke(NessieHttpClient.java:79)
        at jdk.proxy2/jdk.proxy2.$Proxy27.getReferenceByName(Unknown Source)
        at org.projectnessie.client.http.v1api.HttpGetReference.get(HttpGetReference.java:35)
        at org.apache.iceberg.nessie.NessieIcebergClient.loadReference(NessieIcebergClient.java:110)
        at org.apache.iceberg.nessie.NessieIcebergClient.lambda$new$0(NessieIcebergClient.java:81)
        at org.apache.iceberg.relocated.com.google.common.base.Suppliers$NonSerializableMemoizingSupplier.get(Suppliers.java:183)
        at org.apache.iceberg.nessie.NessieIcebergClient.getRef(NessieIcebergClient.java:89)
        at org.apache.iceberg.nessie.NessieIcebergClient.refresh(NessieIcebergClient.java:93)
        at org.apache.iceberg.nessie.NessieTableOperations.doRefresh(NessieTableOperations.java:115)
        at org.apache.iceberg.BaseMetastoreTableOperations.refresh(BaseMetastoreTableOperations.java:97)
        at org.apache.iceberg.BaseMetastoreTableOperations.current(BaseMetastoreTableOperations.java:80)
        at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:47)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2406)
        at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2404)
        at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2387)
        at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
        at com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
        at org.apache.iceberg.CachingCatalog.loadTable(CachingCatalog.java:166)
        at org.apache.iceberg.spark.SparkCatalog.load(SparkCatalog.java:642)
        at org.apache.iceberg.spark.SparkCatalog.loadTable(SparkCatalog.java:160)
        at org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1202)
        at scala.Option.orElse(Option.scala:477)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1201)
        at scala.Option.orElse(Option.scala:477)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1193)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1049)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1028)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
        at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
        at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1028)
        at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:987)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
        at scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:183)
        at scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:179)
        at scala.collection.immutable.List.foldLeft(List.scala:79)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:231)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:227)
        at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:173)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:227)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:188)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
        at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:212)
        at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
        at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:211)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
        at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:185)
        at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:510)
        at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:185)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
        at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:184)
        at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
        at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
        at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
        at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:116)
        at org.apache.spark.sql.DataFrameWriterV2.runCommand(DataFrameWriterV2.scala:195)
        at org.apache.spark.sql.DataFrameWriterV2.append(DataFrameWriterV2.scala:149)
        at ai.weride.datalake_service.write_center.Insert$.$anonfun$writeToIceberg$3(Insert.scala:49)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
        at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:890)
        at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:381)
        at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:504)
        at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:220)
        at zio.internal.FiberRuntime.run(FiberRuntime.scala:139)
        at zio.internal.ZScheduler$$anon$4.run(ZScheduler.scala:476)
Caused by: com.fasterxml.jackson.databind.exc.InvalidDefinitionException: Cannot construct instance of `org.projectnessie.model.Reference` (no Creators, like default constructor, exist): abstract types either need to be mapped to concrete types, have custom deserializer, or contain additional type information
 at [Source: (jdk.internal.net.http.ResponseSubscribers$HttpResponseInputStream); line: 1, column: 1]
        at com.fasterxml.jackson.databind.exc.InvalidDefinitionException.from(InvalidDefinitionException.java:67)
        at com.fasterxml.jackson.databind.DeserializationContext.reportBadDefinition(DeserializationContext.java:1904)
        at com.fasterxml.jackson.databind.DatabindContext.reportBadDefinition(DatabindContext.java:400)
        at com.fasterxml.jackson.databind.DeserializationContext.handleMissingInstantiator(DeserializationContext.java:1349)
        at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserialize(AbstractDeserializer.java:274)
        at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
        at com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2051)
        at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1427)
        at org.projectnessie.client.http.HttpResponse.readEntity(HttpResponse.java:53)
        ... 86 more

BTW: I had no issues running it locally (from the IDE or via sbt run). However, when I packaged it with sbt pack (which is also how my Dockerfile runs it), I could reproduce this issue consistently.
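
For anyone debugging a similar difference between the IDE run and the sbt pack / Docker run, it can help to print which jar the relevant classes are actually loaded from in each environment. The class names below come from the stack trace; the helper itself is only an illustrative sketch:

    // Illustrative helper: print the jar each class is resolved from, so the
    // "sbt run" classpath can be compared with the "sbt pack" / Docker classpath.
    object ClasspathCheck {
      private def locate(className: String): Unit = {
        val cls = Class.forName(className)
        val source = Option(cls.getProtectionDomain.getCodeSource)
          .map(_.getLocation.toString)
          .getOrElse("unknown (no code source)")
        println(s"$className -> $source")
      }

      def main(args: Array[String]): Unit = {
        locate("org.projectnessie.model.Reference")
        locate("com.fasterxml.jackson.databind.ObjectMapper")
      }
    }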

About this issue

  • State: closed
  • Created 7 months ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

In the scenario where only Nessie is used as an Iceberg catalog, is it sufficient to add only nessie-spark-extensions?

Yes, see https://projectnessie.org/tools/iceberg/spark/
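
For reference, a minimal Spark session configuration that goes through the Iceberg/Nessie integration only, following the linked docs. The URI and warehouse values are placeholders, and the catalog name "nessie" is arbitrary:

    // Minimal sketch: Nessie as the Iceberg catalog, configured purely via Spark settings.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("nessie-iceberg-example")
      .config("spark.sql.extensions",
        "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions," +
        "org.projectnessie.spark.extensions.NessieSparkSessionExtensions")
      .config("spark.sql.catalog.nessie", "org.apache.iceberg.spark.SparkCatalog")
      .config("spark.sql.catalog.nessie.catalog-impl", "org.apache.iceberg.nessie.NessieCatalog")
      .config("spark.sql.catalog.nessie.uri", "http://nessie:19120/api/v1")  // placeholder
      .config("spark.sql.catalog.nessie.ref", "main")
      .config("spark.sql.catalog.nessie.warehouse", "s3a://warehouse/")      // placeholder
      .getOrCreate()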

I’m closing this issue because it’s not a bug. Feel free to discuss further on https://project-nessie.zulipchat.com/

@dimas-b is correct. Having iceberg-nessie AND nessie-model/client on the classpath is not supported.

Mixing iceberg-spark-runtime-3.3_2.13-1.3.1.jar and nessie-model-0.67.0.jar is not supported.

Iceberg shades Jackson (including its serialization annotations), but nessie-model-0.67.0.jar references the Jackson classes directly. This creates a conflict-prone environment where class resolution is not deterministic, which is probably why the behaviour changes on restart.
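
In build.sbt terms, the fix sketched by the comments above would be to drop the direct nessie-client / nessie-model dependencies and rely on the Spark-facing artifacts only. Again, this is a sketch; the exact group IDs should be verified for your Nessie version, as in the earlier snippet:

    // Hypothetical cleanup: keep the Iceberg Spark runtime and the Nessie Spark
    // extensions, and remove the direct nessie-client / nessie-model dependencies
    // that clash with the Jackson classes shaded inside iceberg-spark-runtime.
    libraryDependencies ++= Seq(
      "org.apache.iceberg" %% "iceberg-spark-runtime-3.3"   % "1.3.1",
      "org.projectnessie"  %% "nessie-spark-extensions-3.3" % "0.67.0"
      // removed: "org.projectnessie" % "nessie-client" % "0.67.0"
      // removed: "org.projectnessie" % "nessie-model"  % "0.67.0"
    )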