crate: UPDATE query crashes node (java.lang.StackOverflowError: null)

CrateDB version: 2.1.6, 2.3.2

JVM version: openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)

OS version / environment description: Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-47-generic x86_64)

Problem description:

An UPDATE on a partitioned table with generated columns crashes the node with a java.lang.StackOverflowError whenever the query matches at least one row.

[2018-02-16T23:00:01,485][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [s1] fatal error in thread [elasticsearch[s1][bulk][T#2]], exiting
java.lang.StackOverflowError: null
	at java.util.HashMap.hash(HashMap.java:339) ~[?:1.8.0_151]
	at java.util.HashMap.get(HashMap.java:557) ~[?:1.8.0_151]
	at java.util.Collections$UnmodifiableMap.get(Collections.java:1454) ~[?:1.8.0_151]
	at io.crate.operation.scalar.DateTruncFunction.intervalAsUnit(DateTruncFunction.java:192) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.operation.scalar.DateTruncFunction.rounding(DateTruncFunction.java:170) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.operation.scalar.DateTruncFunction.compile(DateTruncFunction.java:116) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.operation.BaseImplementationSymbolVisitor.visitFunction(BaseImplementationSymbolVisitor.java:53) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.operation.BaseImplementationSymbolVisitor.visitFunction(BaseImplementationSymbolVisitor.java:39) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.analyze.symbol.Function.accept(Function.java:62) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.analyze.symbol.SymbolVisitor.process(SymbolVisitor.java:32) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.operation.InputFactory$Context.add(InputFactory.java:166) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.executor.transport.TransportShardUpsertAction.resolveSymbols(TransportShardUpsertAction.java:319) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.executor.transport.TransportShardUpsertAction.processGeneratedColumns(TransportShardUpsertAction.java:573) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.executor.transport.TransportShardUpsertAction.prepareUpdate(TransportShardUpsertAction.java:381) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.executor.transport.TransportShardUpsertAction.indexItem(TransportShardUpsertAction.java:240) ~[crate-app-2.1.6.jar:2.1.6]
	at io.crate.executor.transport.TransportShardUpsertAction.indexItem(TransportShardUpsertAction.java:261) ~[crate-app-2.1.6.jar:2.1.6]
        [repeated 100+ times]

Steps to reproduce:

SCHEMA:

CREATE TABLE IF NOT EXISTS "myschema"."notification" (
   "company_id" STRING GENERATED ALWAYS AS substr("group_id", 1, 6),
   "created_at" TIMESTAMP NOT NULL,
   "deleted_at" TIMESTAMP,
   "details" OBJECT (DYNAMIC) AS (
      "coordinates" GEO_POINT,
      "device_sn" STRING,
      "document_id" STRING,
      "geofence_id" STRING,
      "group_id" STRING,
      "limit" LONG,
      "seal_id" STRING,
      "speed" LONG,
      "user_id" STRING,
      "vehicle_id" STRING
   ),
   "group_id" STRING,
   "id" STRING NOT NULL,
   "month" TIMESTAMP GENERATED ALWAYS AS date_trunc('month', "created_at"),
   "persistent" BOOLEAN NOT NULL,
   "status" STRING NOT NULL,
   "timestamp" TIMESTAMP,
   "type" STRING NOT NULL,
   "updated_at" TIMESTAMP NOT NULL
)
CLUSTERED BY ("id") INTO 6 SHARDS
PARTITIONED BY ("month")
WITH (
   "allocation.max_retries" = 5,
   "blocks.metadata" = false,
   "blocks.read" = false,
   "blocks.read_only" = false,
   "blocks.write" = false,
   column_policy = 'dynamic',
   "mapping.total_fields.limit" = 1000,
   number_of_replicas = '1',
   "recovery.initial_shards" = 'quorum',
   refresh_interval = 1000,
   "routing.allocation.enable" = 'all',
   "routing.allocation.total_shards_per_node" = -1,
   "translog.durability" = 'REQUEST',
   "translog.flush_threshold_size" = 536870912,
   "translog.sync_interval" = 5000,
   "unassigned.node_left.delayed_timeout" = 60000,
   "warmer.enabled" = true,
   "write.wait_for_active_shards" = 'all'
)

QUERY:

update "myschema"."notification"
set "deleted_at" = CURRENT_TIMESTAMP
where "timestamp" < 1518904800000 and "persistent" = false and "deleted_at" is null;

NOTE: The query only fails if there are rows in the table that match the WHERE conditions.
NOTE 2: Transforming the UPDATE into an equivalent DELETE query does not crash the node - it works as expected (see the example below).
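For reference, a minimal sketch of the DELETE variant mentioned in NOTE 2, assuming the same WHERE clause as the UPDATE above:

delete from "myschema"."notification"
where "timestamp" < 1518904800000 and "persistent" = false and "deleted_at" is null;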


About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

@rps-v Great to hear that this solves your issue. Anyway, this should not happen, even when a node crashes with in-flight inserts. I’m closing this issue for now, but we will also try to investigate how documents could end up in this persistent version state. Thx for reporting.

@rps-v The document that causes the issue has _id: AWGOgCvq3g58eQ8H13w2. As a quick workaround, take a backup of this document by running

select * from "tracknamic"."notification" where _id='AWGOgCvq3g58eQ8H13w2';

and saving the output. Then delete it:

delete from "tracknamic"."notification" where _id='AWGOgCvq3g58eQ8H13w2';

and insert it again:

insert into "tracknamic"."notification" (....) values(....)

This document ended up with _version=-4, which causes the stackoverflow…
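For anyone hitting the same symptom, a small sketch of how the broken document version can be confirmed before applying the workaround above. It assumes the document reported in the comment is still present; CrateDB exposes the stored version via the hidden _version system column:

select _id, _version
from "tracknamic"."notification"
where _id = 'AWGOgCvq3g58eQ8H13w2';
-- a healthy document reports a positive _version; the document described above returned -4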