accumulo: lastCompactID is inconsistent with metadata Exception
Steps to reproduce:
- Start an Uno cluster with 2 tservers (I used commit b3ac225d43bacf86bb280be48de750c409f45e66)
- Run RW Bulk test from Accumulo Testing. I ran the following command 4 times with different log files:
~/workspace/accumulo-testing$ ./bin/rwalk Bulk.xml > /tmp/log4 2>&1 &
- Let the tests finish running; the following error may appear:
java.lang.RuntimeException: Closed tablet 5<;r13c3b lastCompactID is inconsistent with metadata : 25 != 23
    at org.apache.accumulo.tserver.tablet.Tablet.lambda$closeConsistencyCheck$4(Tablet.java:1090)
    at java.base/java.util.OptionalLong.ifPresent(OptionalLong.java:163)
    at org.apache.accumulo.tserver.tablet.Tablet.closeConsistencyCheck(Tablet.java:1085)
    at org.apache.accumulo.tserver.tablet.Tablet.completeClose(Tablet.java:1003)
    at org.apache.accumulo.tserver.tablet.Tablet.split(Tablet.java:1464)
    at org.apache.accumulo.tserver.TabletServer.splitTablet(TabletServer.java:491)
    at org.apache.accumulo.tserver.TabletClientHandler.splitTablet(TabletClientHandler.java:1006)
    at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$0(TraceUtil.java:206)
    at com.sun.proxy.$Proxy39.splitTablet(Unknown Source)
    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$splitTablet.getResult(TabletClientService.java:2648)
    at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$splitTablet.getResult(TabletClientService.java:2627)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
    at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:138)
    at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:54)
    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
    at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:129)
    at org.apache.thrift.server.Invocation.run(Invocation.java:18)
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52)
    at java.base/java.lang.Thread.run(Thread.java:829)
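The trace shows the failing comparison running as a lambda passed to `OptionalLong.ifPresent` inside `Tablet.closeConsistencyCheck`, so the check only fires when the metadata entry carries a compaction ID. The sketch below reproduces that shape to make the message easier to read. It is not Accumulo's code: the class and method names are hypothetical, and printing the metadata value first is an assumption, since the report does not say which of 25 and 23 came from metadata.

```java
import java.util.OptionalLong;

// Hypothetical sketch, not Accumulo's implementation. It mirrors the shape seen in the
// stack trace: the comparison runs inside OptionalLong.ifPresent, so it is skipped when
// the metadata entry has no compaction ID.
final class CloseConsistencySketch {

  static void checkLastCompactId(String extent, long inMemoryCompactId,
      OptionalLong metadataCompactId) {
    metadataCompactId.ifPresent(persisted -> {
      if (persisted != inMemoryCompactId) {
        // Assumption: metadata value first, in-memory value second.
        throw new RuntimeException("Closed tablet " + extent
            + " lastCompactID is inconsistent with metadata : " + persisted + " != "
            + inMemoryCompactId);
      }
    });
  }

  public static void main(String[] args) {
    checkLastCompactId("5<;r13c3b", 25, OptionalLong.of(25)); // consistent: no exception
    checkLastCompactId("5<;r13c3b", 23, OptionalLong.empty()); // no metadata ID: check skipped
    try {
      checkLastCompactId("5<;r13c3b", 23, OptionalLong.of(25));
    } catch (RuntimeException e) {
      // prints: Closed tablet 5<;r13c3b lastCompactID is inconsistent with metadata : 25 != 23
      System.out.println(e.getMessage());
    }
  }
}
```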
From the tserver log:
2022-05-02T08:16:04,322 [tablet.Tablet] DEBUG: Tablet 5;r13c3b;r0f20c had no dir, creating hdfs://localhost:8020/accumulo/tables/5/t-00000fa
2022-05-02T08:16:04,333 [tablet.Tablet] DEBUG: Files for low split 5;r0d7fe;r0860c []
2022-05-02T08:16:04,333 [tablet.Tablet] DEBUG: Files for high split 5;r0f20c;r0d7fe []
2022-05-02T08:16:04,356 [tablet.Tablet] ERROR: Closed tablet 5<;r13c3b lastCompactID is inconsistent with metadata : 25 != 23
2022-05-02T08:16:04,357 [tablet.Tablet] ERROR: Failed to do close consistency check for tablet 5<;r13c3b
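The logs name tablets by extent strings such as `5<;r13c3b` and `5;r13c3b;r0f20c`. Assuming they follow Accumulo's `KeyExtent.toString()` convention (table ID, then `;endRow` or `<` for no end row, then `;prevEndRow` or `<` for no previous end row), a small hypothetical parser makes them easier to decode. Under that reading, `5<;r13c3b` is the last tablet of table 5, covering rows after `r13c3b`, while `5;r13c3b;r0f20c` covers rows after `r0f20c` up to and including `r13c3b`. The sketch does not handle escaped `;` characters inside rows.

```java
import java.util.Optional;

// Hypothetical helper, not part of Accumulo. Assumes the KeyExtent.toString() layout:
// tableId, then ";endRow" or "<", then ";prevEndRow" or "<". An empty Optional means the
// tablet is unbounded on that side. Escaped ';' inside rows is not handled.
final class ExtentParser {

  record Extent(String tableId, Optional<String> endRow, Optional<String> prevEndRow) {}

  static Extent parse(String s) {
    int tableEnd = indexOfDelimiter(s, 0);
    String tableId = s.substring(0, tableEnd);
    int[] cursor = {tableEnd};
    Optional<String> endRow = readRow(s, cursor);
    Optional<String> prevEndRow = readRow(s, cursor);
    if (cursor[0] != s.length()) {
      throw new IllegalArgumentException("Unexpected trailing text in extent: " + s);
    }
    return new Extent(tableId, endRow, prevEndRow);
  }

  // Reads either "<" (no row) or ";row" starting at cursor[0], then advances the cursor.
  private static Optional<String> readRow(String s, int[] cursor) {
    int p = cursor[0];
    if (p >= s.length()) {
      throw new IllegalArgumentException("Truncated extent: " + s);
    }
    if (s.charAt(p) == '<') {
      cursor[0] = p + 1;
      return Optional.empty();
    }
    if (s.charAt(p) != ';') {
      throw new IllegalArgumentException("Expected ';' or '<' in extent: " + s);
    }
    int end = indexOfDelimiter(s, p + 1);
    cursor[0] = end;
    return Optional.of(s.substring(p + 1, end));
  }

  // Position of the next ';' or '<' at or after 'from', or s.length() if there is none.
  private static int indexOfDelimiter(String s, int from) {
    int p = from;
    while (p < s.length() && s.charAt(p) != ';' && s.charAt(p) != '<') {
      p++;
    }
    return p;
  }
}
```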
Manager log:
2022-05-02T08:15:59,338 [tableOps.Utils] INFO : table 5 (6043c9b3efe07636) locked for read operation: COMPACT
2022-05-02T08:15:59,887 [tableOps.Utils] INFO : table 5 (332cb19f408b595b) locked for write operation: MERGE
2022-05-02T08:16:00,991 [tableOps.Utils] INFO : table 5 (332cb19f408b595b) unlocked for write
2022-05-02T08:16:01,008 [tableOps.Utils] INFO : table 5 (5b528b34c36ceac6) locked for write operation: MERGE
2022-05-02T08:16:02,810 [tableOps.Utils] INFO : table 5 (5b528b34c36ceac6) unlocked for write
2022-05-02T08:16:02,927 [tableOps.Utils] INFO : table 5 (33fb4a18cefb5fa1) locked for read operation: COMPACT
2022-05-02T08:16:03,659 [tableOps.Utils] INFO : table 5 (56170bb68d50efd6) locked for write operation: MERGE
2022-05-02T08:16:03,909 [tableOps.Utils] INFO : table 5 (56170bb68d50efd6) unlocked for write
2022-05-02T08:16:03,910 [tableOps.Utils] INFO : table 5 (71a50c6c0d42b5b0) locked for read operation: COMPACT
2022-05-02T08:16:03,910 [tableOps.Utils] INFO : table 5 (112da7f68f32e75d) locked for read operation: COMPACT
2022-05-02T08:16:08,555 [tableOps.Utils] INFO : table 5 (4de2323e0a3772eb) locked for write operation: MERGE
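The manager excerpt interleaves `COMPACT` read locks with `MERGE` write locks on table 5, and only write-lock releases appear in it. A hypothetical helper that pairs each write lock with its unlock, keyed by the hex ID in parentheses (which appears to be the FATE transaction ID), makes the merge windows easier to line up against the tserver error at 08:16:04. The regex assumes exactly the `tableOps.Utils` message format shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-analysis helper, not part of Accumulo. The regex assumes the exact
// tableOps.Utils lock/unlock message format from the manager log excerpt.
final class LockTimeline {

  /** One write-lock window; 'released' is null if no unlock line was seen. */
  record WriteWindow(String txId, String operation, String acquired, String released) {}

  private static final Pattern LINE = Pattern.compile(
      "^(\\S+) \\[tableOps\\.Utils\\] INFO : table (\\S+) \\(([0-9a-f]+)\\) "
          + "(locked|unlocked) for (read|write)(?: operation: (\\S+))?$");

  static List<WriteWindow> writeWindows(List<String> lines) {
    Map<String, WriteWindow> open = new LinkedHashMap<>(); // write locks not yet released
    List<WriteWindow> result = new ArrayList<>();
    for (String line : lines) {
      Matcher m = LINE.matcher(line.trim());
      if (!m.matches() || !m.group(5).equals("write")) {
        continue; // skip unrelated lines and read locks
      }
      String time = m.group(1);
      String txId = m.group(3);
      if (m.group(4).equals("locked")) {
        open.put(txId, new WriteWindow(txId, m.group(6), time, null));
      } else {
        WriteWindow w = open.remove(txId);
        if (w != null) {
          result.add(new WriteWindow(txId, w.operation(), w.acquired(), time));
        }
      }
    }
    result.addAll(open.values()); // locks still held at the end of the excerpt
    return result;
  }
}
```

On the full excerpt this would report three closed merge windows (08:15:59,887 to 08:16:00,991, 08:16:01,008 to 08:16:02,810, and 08:16:03,659 to 08:16:03,909) plus the merge locked at 08:16:08,555, which has no unlock line in the excerpt. None of the three closed windows covers 08:16:04,356.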
About this issue
- State: closed
- Created 2 years ago
- Comments: 16 (16 by maintainers)
Commits related to this issue
- Test to reproduce #2667 — committed to milleruntime/accumulo by milleruntime 2 years ago
- fixes user compaction stuck when producing no output This commit fixes a bug where : * user compactions take multiple compaction steps (because the tablet has many files) * the intermediate steps ... — committed to keith-turner/accumulo by keith-turner 2 years ago
- fixes user compaction stuck when producing no output (#3013) This commit fixes a bug where : * user compactions take multiple compaction steps (because the tablet has many files) * the intermed... — committed to apache/accumulo by keith-turner 2 years ago
- fixes #2667 wait for metadata write in tablet close — committed to keith-turner/accumulo by keith-turner 2 years ago
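The fix commit is titled "wait for metadata write in tablet close". One way to picture the race that title suggests is a compaction-ID update done in two steps, a metadata write followed by an in-memory update, with a close running between them. The model below is hypothetical: the two-step ordering, the class, and the latches that force the interleaving are assumptions for illustration, not Accumulo's code or the actual fix. Under that ordering, a close that does not wait sees metadata at 25 and memory at 23; waiting for the in-flight update removes the mismatch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, not Accumulo code. A background task records a new compaction ID
// in "metadata" and only afterwards updates the tablet's in-memory copy. A close that
// runs between those two steps sees the values disagree.
final class CloseWaitSketch {

  final AtomicLong metadataCompactId = new AtomicLong(23); // stand-in for the metadata table
  private volatile long lastCompactId = 23; // the tablet's in-memory copy
  private CompletableFuture<Void> pendingUpdate = CompletableFuture.completedFuture(null);

  /** Starts an async update; step 2 waits on 'gate' so a caller can hold it half-finished. */
  synchronized void finishCompaction(long newId, CountDownLatch metadataWritten,
      CountDownLatch gate, ExecutorService executor) {
    pendingUpdate = CompletableFuture.runAsync(() -> {
      metadataCompactId.set(newId); // step 1: persist to metadata
      metadataWritten.countDown();
      try {
        gate.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      lastCompactId = newId; // step 2: update the in-memory copy
    }, executor);
  }

  /** Close-time check; waitForUpdate=true models waiting for the metadata write. */
  void close(boolean waitForUpdate) {
    CompletableFuture<Void> pending;
    synchronized (this) {
      pending = pendingUpdate;
    }
    if (waitForUpdate) {
      pending.join(); // block until the in-flight update has fully finished
    }
    long persisted = metadataCompactId.get();
    long inMemory = lastCompactId;
    if (persisted != inMemory) {
      throw new RuntimeException(
          "lastCompactID is inconsistent with metadata : " + persisted + " != " + inMemory);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    CloseWaitSketch tablet = new CloseWaitSketch();
    CountDownLatch metadataWritten = new CountDownLatch(1);
    CountDownLatch gate = new CountDownLatch(1);
    tablet.finishCompaction(25, metadataWritten, gate, executor);
    metadataWritten.await(); // the update is now half done
    try {
      tablet.close(false);
    } catch (RuntimeException e) {
      // prints: without wait: lastCompactID is inconsistent with metadata : 25 != 23
      System.out.println("without wait: " + e.getMessage());
    }
    gate.countDown(); // let the in-memory update finish
    tablet.close(true); // waits for the update, so the check passes
    System.out.println("with wait: consistent");
    executor.shutdown();
  }
}
```

Latches are used instead of sleeps so the bad interleaving happens on every run rather than occasionally, as it did in the RW tests.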
@keith-turner I also dropped the verify and END nodes from Bulk, just so the test would keep running.
I was able to reproduce this on a single EC2 Ubuntu instance running Uno. I installed the snapshot and set
export NUM_TSERVERS=2
Then, once I had Uno running, I ran the following command to fire off 4 local RW processes:
FYI, I found a bug in the Bulk RW test, so make sure you get the fix from accumulo-testing first.
I’ll take a look at it