neo: "Operation is not valid due to the current state of the object" error when consensus node handle 512 txs

Describe the bug when handling 512 transactions to generating a new block, consensus service will stuck and throw “Operation is not valid due to the current state of the object” error.

To Reproduce Steps to reproduce the behavior: neo-cli: master 51cd29fbe21abb9e1f17f64e5c6d21bc7decbbb9 neo: master ab4830cae3242e80fd60e7df6c0d9143803b7be3 neo-vm: master be2ac36bf35a3033d828e0ba0630d390599c487d MillisecondsPerBlock is 15000.

  1. build latest neo-cli with neo/neo-vm project.
  2. deploy 4 consensus node.
  3. deploy 1 external node
  4. send 50 tx/sec and last about40secs to external node.
{
  "jsonrpc": "2.0",
  "method": "sendtoaddress",
  "params": ["0x8c23f196d8a1bfd103a9dcb1f9ccf0c611377d3b", "NNa1Fgc82Qoh65TJfViqC92W8Z8AWL9YKA",0.00000001],
  "id": 1
}

Expected behavior consensus nodes are expected to handle massive txs.

Screenshots

It can be seen from the screenshot that when number of txs is 216, consensus will continue, and when number of txs is 512, then error shows. fa03d6d4efcea75043d0c4781cb51f5

[18:17:18.052] initialize: height=35 view=0 index=3 role=Primary
[18:17:33.058] timeout: height=35 view=0
[18:17:33.059] send prepare request: height=35 view=0
[18:17:33.189] OnPrepareResponseReceived: height=35 view=0 index=0
[18:17:33.299] OnPrepareResponseReceived: height=35 view=0 index=1
[18:17:33.302] send commit
[18:17:33.880] OnCommitReceived: height=35 view=0 index=1 nc=1 nf=0
[18:17:34.118] OnPrepareResponseReceived: height=35 view=0 index=2
[18:17:34.124] OnCommitReceived: height=35 view=0 index=2 nc=2 nf=0
[18:17:34.137] relay block: height=35 hash=0xa5d4b10cfaf0c6b24f598c975235b15d6399176fb4b0fb1fd41224d43a250103 tx=216
[18:17:34.164] persist block: height=35 hash=0xa5d4b10cfaf0c6b24f598c975235b15d6399176fb4b0fb1fd41224d43a250103 tx=216
[18:17:34.165] initialize: height=36 view=0 index=3 role=Backup
[18:17:49.159] OnPrepareRequestReceived: height=36 view=0 index=0 tx=512
[18:17:49.164] send prepare response
[18:17:49.890] OnPrepareResponseReceived: height=36 view=0 index=1
[18:17:49.892] send commit
[18:17:49.910] OnCommitReceived: height=36 view=0 index=1 nc=1 nf=0
[18:17:50.145] OnCommitReceived: height=36 view=0 index=0 nc=2 nf=0
[18:17:50.150] relay block: height=36 hash=0xb3be2ffab0fd2ac9f46dd8d14e5727d9d6c3f7e739cf72cf2c034d2100cc40cb tx=512
[ERROR][1/14/2020 10:17:50 AM][Thread 0005][akka://NeoSystem/user/$a] Operation is not valid due to the current state of the object.
Cause: System.InvalidOperationException: Operation is not valid due to the current state of the object.
   at Neo.Ledger.Blockchain.Persist(Block block) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 503
   at Neo.Ledger.Blockchain.OnNewBlock(Block block) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 333
   at Neo.Ledger.Blockchain.OnReceive(Object message) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 477
   at Akka.Actor.UntypedActor.Receive(Object message)
   at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
   at Akka.Actor.ActorCell.ReceiveMessage(Object message)
   at Akka.Actor.ActorCell.Invoke(Envelope envelope)
[ERROR][1/14/2020 10:17:50 AM][Thread 0027][akka://NeoSystem/user/$a] Error while creating actor instance of type Neo.Ledger.Blockchain with 2 args: (Neo.NeoSystem,Neo.Plugins.Storage.Store)
Cause: [akka://NeoSystem/user/$a#456841214]: Akka.Actor.PostRestartException: Exception post restart (System.InvalidOperationException)
 ---> System.TypeLoadException: Error while creating actor instance of type Neo.Ledger.Blockchain with 2 args: (Neo.NeoSystem,Neo.Plugins.Storage.Store)
 ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.InvalidOperationException: Operation is not valid due to the current state of the object.
   at Neo.Ledger.Blockchain..ctor(NeoSystem system, IStore store) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 107
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture)
   at Akka.Actor.Props.ActivatorProducer.Produce()
   at Akka.Actor.Props.NewActor()
   --- End of inner exception stack trace ---
   at Akka.Actor.Props.NewActor()
   at Akka.Actor.ActorCell.CreateNewActorInstance()
   at Akka.Actor.ActorCell.<>c__DisplayClass109_0.<NewActor>b__0()
   at Akka.Actor.ActorCell.UseThreadContext(Action action)
   at Akka.Actor.ActorCell.NewActor()
   at Akka.Actor.ActorCell.FinishRecreate(Exception cause, ActorBase failedActor)
   --- End of inner exception stack trace ---

f3597892b8b49a5af8c53e45d6a41b4

Platform:

  • 4 consensus nodes: n1 - Ubuntu 18.04 n2 - CentOS 7.4 n3 - Winserver 2016 n4 - Ubuntu 18.04 external node - Ubuntu 18.04

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

According to the issue 1, I prefer to skip ReferenceCounter if the Trigger is System

protected override bool PostExecuteInstruction(Instruction instruction)
        {
            if (Trigger == TriggerType.System) return true;
            return base.PostExecuteInstruction(instruction);
        }

Maybe we should consider to rollback the snapshot, and FAULT all txs if in OnPersist in native contracts was FAULT, otherwise we open a door for Denial Of Service. @erikzhang

image

Agree, it’s better to close the door which we may not know.

Maybe we should consider to rollback the snapshot, and FAULT all txs if in OnPersist in native contracts was FAULT, otherwise we open a door for Denial Of Service. @erikzhang

image

@shargon @eryeer was right, the root cause is ReferenceCounter.CheckZeroReferred() <= MaxStackSize return false, and leads to VM execute state as FAULT.

https://github.com/neo-project/neo-vm/blob/be2ac36bf35a3033d828e0ba0630d390599c487d/src/neo-vm/ExecutionEngine.cs#L1237-L1241

9dbe4d48b0d58f89eb721f6b74cf96a

After I changed MaxStackSize as 2 * 2 * 1024, the CN service will recover. and no exception throws. Initial balance of the sender: 100,000 gas Here is the result of changing MaxStackSize to 2 * 2 * 1024.

362b313a3df1f1e37e07a910baf4fbd

22749c0607e44db7e2498d15b314816

@shargon I reproduced this problem, and found that the problem was on PostExecuteInstruction() in ExecutionEngine. When the number of tx reached 512, the result of ReferenceCounter.CheckZeroReferred () was 2053, which was greater than MaxStackSize: 2048, which caused the check to fail, and the VM returned Fault. After enlarging MaxStackSize to a larger number, such as 4096, this problem disappeared in my machine.

@shargon It happens on all consensus nodes

The error it’s located here

image

I will review it

@cloud8little the error appear with only one CN ?