accumulo: lastCompactID is inconsistent with metadata Exception

Steps to reproduce:

  1. Start a Uno Cluster with 2 tservers (I used commit b3ac225d43bacf86bb280be48de750c409f45e66)
  2. Run the RW Bulk test from Accumulo Testing. I ran the following command 4 times, each with a different log file: ~/workspace/accumulo-testing$ ./bin/rwalk Bulk.xml > /tmp/log4 2>&1 &
  3. Let the tests finish running. The error below may appear, but it does not occur on every run:

java.lang.RuntimeException: Closed tablet 5<;r13c3b lastCompactID is inconsistent with metadata : 25 != 23
	at org.apache.accumulo.tserver.tablet.Tablet.lambda$closeConsistencyCheck$4(Tablet.java:1090)
	at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:163)
	at org.apache.accumulo.tserver.tablet.Tablet.closeConsistencyCheck(Tablet.java:1085)
	at org.apache.accumulo.tserver.tablet.Tablet.completeClose(Tablet.java:1003)
	at org.apache.accumulo.tserver.tablet.Tablet.split(Tablet.java:1464)
	at org.apache.accumulo.tserver.TabletServer.splitTablet(TabletServer.java:491)
	at org.apache.accumulo.tserver.TabletClientHandler.splitTablet(TabletClientHandler.java:1006)
	at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:206)
	at com.sun.proxy.$Proxy39.splitTablet(Unknown Source)
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$splitTablet.getResult(TabletClientService.java:2648)
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$splitTablet.getResult(TabletClientService.java:2627)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
	at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:138)
	at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54)
	at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
	at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129)
	at org.apache.thrift.server.Invocation.run(Invocation.java:18)
	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
	at java.base/java.lang.Thread.run(Thread.java:829)

From the tserver log:

2022-05-02T08:16:04,322 [tablet.Tablet] DEBUG: Tablet 5;r13c3b;r0f20c had no dir, creating hdfs://localhost:8020/accumulo/tables/5/t-00000fa
2022-05-02T08:16:04,333 [tablet.Tablet] DEBUG: Files for low split 5;r0d7fe;r0860c []
2022-05-02T08:16:04,333 [tablet.Tablet] DEBUG: Files for high split 5;r0f20c;r0d7fe []
2022-05-02T08:16:04,356 [tablet.Tablet] ERROR: Closed tablet 5<;r13c3b lastCompactID is inconsistent with metadata : 25 != 23
2022-05-02T08:16:04,357 [tablet.Tablet] ERROR: Failed to do close consistency check for tablet 5<;r13c3b
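
Since the error does not show up on every run, it can help to scan the tserver logs after each attempt. The following is a minimal sketch, not part of Accumulo or Uno: the function name find_compact_id_mismatch is made up for illustration, and the log path in the usage comment is a placeholder. It only relies on the ERROR line format shown in the excerpt above.

```shell
#!/usr/bin/env bash
# Sketch: list every "lastCompactID is inconsistent with metadata" ERROR line
# as "timestamp tablet leftId rightId". Reads the files given as arguments,
# or stdin when no file is given. Only timestamped log lines are matched, so
# the bare "java.lang.RuntimeException: ..." stack trace header is skipped.
find_compact_id_mismatch() {
  sed -nE 's/^([0-9]{4}-[0-9]{2}-[0-9]{2}T[^ ]+) .*Closed tablet ([^ ]+) lastCompactID is inconsistent with metadata : ([0-9]+) != ([0-9]+).*$/\1 \2 \3 \4/p' "$@"
}

# Usage (placeholder path, adjust to wherever your tserver logs live):
#   find_compact_id_mismatch /path/to/logs/tserver_*.log
```

For the ERROR line quoted above this prints the timestamp, the tablet extent 5<;r13c3b, and the two IDs 25 and 23. Which of the two numbers is the in-memory value and which is the metadata value is not stated in the log message itself.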

Manager log:

2022-05-02T08:15:59,338 [tableOps.Utils] INFO : table 5 (6043c9b3efe07636) locked for read operation: COMPACT
2022-05-02T08:15:59,887 [tableOps.Utils] INFO : table 5 (332cb19f408b595b) locked for write operation: MERGE
2022-05-02T08:16:00,991 [tableOps.Utils] INFO : table 5 (332cb19f408b595b) unlocked for write
2022-05-02T08:16:01,008 [tableOps.Utils] INFO : table 5 (5b528b34c36ceac6) locked for write operation: MERGE
2022-05-02T08:16:02,810 [tableOps.Utils] INFO : table 5 (5b528b34c36ceac6) unlocked for write
2022-05-02T08:16:02,927 [tableOps.Utils] INFO : table 5 (33fb4a18cefb5fa1) locked for read operation: COMPACT
2022-05-02T08:16:03,659 [tableOps.Utils] INFO : table 5 (56170bb68d50efd6) locked for write operation: MERGE
2022-05-02T08:16:03,909 [tableOps.Utils] INFO : table 5 (56170bb68d50efd6) unlocked for write
2022-05-02T08:16:03,910 [tableOps.Utils] INFO : table 5 (71a50c6c0d42b5b0) locked for read operation: COMPACT
2022-05-02T08:16:03,910 [tableOps.Utils] INFO : table 5 (112da7f68f32e75d) locked for read operation: COMPACT
2022-05-02T08:16:08,555 [tableOps.Utils] INFO : table 5 (4de2323e0a3772eb) locked for write operation: MERGE
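
To see how the COMPACT and MERGE operations interleave around the time of the error, the lock messages can be condensed into a timeline. This is a rough sketch that assumes the manager log is piped in on stdin and that the lines follow the exact format of the excerpt above. The function name lock_timeline is hypothetical, and the hex value in parentheses is printed as-is without assuming what it identifies.

```shell
#!/usr/bin/env bash
# Sketch: condense [tableOps.Utils] lock messages for one table into
# "timestamp id locked|unlocked read|write OPERATION" (or "-" when the line
# names no operation, as in the "unlocked for write" messages).
lock_timeline() {
  local table_id="$1"
  awk -v t="$table_id" '
    $2 == "[tableOps.Utils]" && $5 == "table" && $6 == t {
      id = $7
      gsub(/[()]/, "", id)                    # strip the surrounding parentheses
      op = ($NF ~ /^[A-Z]+$/) ? $NF : "-"     # COMPACT, MERGE, ... if present
      print $1, id, $8, $10, op
    }'
}

# Usage (placeholder path):
#   lock_timeline 5 < /path/to/logs/manager.log
```

Applied to the excerpt above, the first line becomes "2022-05-02T08:15:59,338 6043c9b3efe07636 locked read COMPACT", which makes it easier to line up the lock events against the 08:16:04 error in the tserver log.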

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

@keith-turner I also dropped the verify and END nodes from Bulk, just so the test would keep running.

I was running 4 Bulk Rwalk jobs again.

Any advice if I wanted to try to reproduce this? How many tservers were you running? You had 4 random walk processes each running the bulk graph? Was everything running on a single VM or multiple VMs?

I was able to reproduce this on a single EC2 Ubuntu instance running Uno. I installed the snapshot build and set export NUM_TSERVERS=2. Once Uno was running, I ran the following command to start 4 local RW processes:

for (( i=1; i<5; i++)); do ~/workspace/accumulo-testing/bin/rwalk Bulk.xml > /tmp/rwalk"$i".log 2>&1 & done

FYI, I found a bug in the Bulk RW test, so make sure you get the fix from accumulo-testing first.

I’ll take a look at it