neo: "Operation is not valid due to the current state of the object" error when consensus node handles 512 txs
Describe the bug
When handling 512 transactions while generating a new block, the consensus service gets stuck and throws an “Operation is not valid due to the current state of the object” error.
To Reproduce
Steps to reproduce the behavior:
neo-cli: master 51cd29fbe21abb9e1f17f64e5c6d21bc7decbbb9
neo: master ab4830cae3242e80fd60e7df6c0d9143803b7be3
neo-vm: master be2ac36bf35a3033d828e0ba0630d390599c487d
MillisecondsPerBlock is 15000.
- build the latest neo-cli with the neo/neo-vm projects.
- deploy 4 consensus nodes.
- deploy 1 external node.
- send 50 tx/sec for about 40 seconds to the external node, using the request below.
{
"jsonrpc": "2.0",
"method": "sendtoaddress",
"params": ["0x8c23f196d8a1bfd103a9dcb1f9ccf0c611377d3b", "NNa1Fgc82Qoh65TJfViqC92W8Z8AWL9YKA",0.00000001],
"id": 1
}
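The load step above could be scripted roughly as follows. This is a minimal sketch, not the script used for this report: the helper names, the injectable `post` and `sleep` parameters, and the endpoint in the usage comment are assumptions for illustration. Only the request body comes from the issue.

```python
import json
import time
import urllib.request

# Values copied from the request in this report.
ASSET = "0x8c23f196d8a1bfd103a9dcb1f9ccf0c611377d3b"
TO_ADDRESS = "NNa1Fgc82Qoh65TJfViqC92W8Z8AWL9YKA"
AMOUNT = "0.00000001"  # kept as text so the JSON number is written exactly as in the report


def build_request(request_id):
    """Return the sendtoaddress JSON-RPC body (as text) for one transfer."""
    return (
        '{"jsonrpc": "2.0", "method": "sendtoaddress", '
        f'"params": ["{ASSET}", "{TO_ADDRESS}", {AMOUNT}], "id": {request_id}}}'
    )


def http_post(url, body):
    """POST one JSON body with the standard library and return the raw reply."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def send_load(url, rate=50, seconds=40, post=http_post, sleep=time.sleep):
    """Send `rate` requests per second for `seconds` seconds; return the count sent."""
    sent = 0
    for _ in range(seconds):
        started = time.monotonic()
        for _ in range(rate):
            sent += 1
            post(url, build_request(sent))
        # Wait out whatever is left of the current second.
        sleep(max(0.0, 1.0 - (time.monotonic() - started)))
    return sent


# Usage (hypothetical endpoint, replace with the external node's RPC address):
#   send_load("http://127.0.0.1:10332")
```

With the defaults this issues 50 × 40 = 2000 requests in total. Passing `post` and `sleep` as parameters lets the pacing loop be exercised without a running node.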
Expected behavior
Consensus nodes are expected to handle a large number of txs.
Screenshots
The screenshot shows that when the number of txs is 216, consensus continues, and when the number of txs is 512, the error appears.
[18:17:18.052] initialize: height=35 view=0 index=3 role=Primary
[18:17:33.058] timeout: height=35 view=0
[18:17:33.059] send prepare request: height=35 view=0
[18:17:33.189] OnPrepareResponseReceived: height=35 view=0 index=0
[18:17:33.299] OnPrepareResponseReceived: height=35 view=0 index=1
[18:17:33.302] send commit
[18:17:33.880] OnCommitReceived: height=35 view=0 index=1 nc=1 nf=0
[18:17:34.118] OnPrepareResponseReceived: height=35 view=0 index=2
[18:17:34.124] OnCommitReceived: height=35 view=0 index=2 nc=2 nf=0
[18:17:34.137] relay block: height=35 hash=0xa5d4b10cfaf0c6b24f598c975235b15d6399176fb4b0fb1fd41224d43a250103 tx=216
[18:17:34.164] persist block: height=35 hash=0xa5d4b10cfaf0c6b24f598c975235b15d6399176fb4b0fb1fd41224d43a250103 tx=216
[18:17:34.165] initialize: height=36 view=0 index=3 role=Backup
[18:17:49.159] OnPrepareRequestReceived: height=36 view=0 index=0 tx=512
[18:17:49.164] send prepare response
[18:17:49.890] OnPrepareResponseReceived: height=36 view=0 index=1
[18:17:49.892] send commit
[18:17:49.910] OnCommitReceived: height=36 view=0 index=1 nc=1 nf=0
[18:17:50.145] OnCommitReceived: height=36 view=0 index=0 nc=2 nf=0
[18:17:50.150] relay block: height=36 hash=0xb3be2ffab0fd2ac9f46dd8d14e5727d9d6c3f7e739cf72cf2c034d2100cc40cb tx=512
[ERROR][1/14/2020 10:17:50 AM][Thread 0005][akka://NeoSystem/user/$a] Operation is not valid due to the current state of the object.
Cause: System.InvalidOperationException: Operation is not valid due to the current state of the object.
at Neo.Ledger.Blockchain.Persist(Block block) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 503
at Neo.Ledger.Blockchain.OnNewBlock(Block block) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 333
at Neo.Ledger.Blockchain.OnReceive(Object message) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 477
at Akka.Actor.UntypedActor.Receive(Object message)
at Akka.Actor.ActorBase.AroundReceive(Receive receive, Object message)
at Akka.Actor.ActorCell.ReceiveMessage(Object message)
at Akka.Actor.ActorCell.Invoke(Envelope envelope)
[ERROR][1/14/2020 10:17:50 AM][Thread 0027][akka://NeoSystem/user/$a] Error while creating actor instance of type Neo.Ledger.Blockchain with 2 args: (Neo.NeoSystem,Neo.Plugins.Storage.Store)
Cause: [akka://NeoSystem/user/$a#456841214]: Akka.Actor.PostRestartException: Exception post restart (System.InvalidOperationException)
---> System.TypeLoadException: Error while creating actor instance of type Neo.Ledger.Blockchain with 2 args: (Neo.NeoSystem,Neo.Plugins.Storage.Store)
---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
---> System.InvalidOperationException: Operation is not valid due to the current state of the object.
at Neo.Ledger.Blockchain..ctor(NeoSystem system, IStore store) in D:\NEO\Github\neo\src\neo\Ledger\Blockchain.cs:line 107
--- End of inner exception stack trace ---
at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture)
at Akka.Actor.Props.ActivatorProducer.Produce()
at Akka.Actor.Props.NewActor()
--- End of inner exception stack trace ---
at Akka.Actor.Props.NewActor()
at Akka.Actor.ActorCell.CreateNewActorInstance()
at Akka.Actor.ActorCell.<>c__DisplayClass109_0.<NewActor>b__0()
at Akka.Actor.ActorCell.UseThreadContext(Action action)
at Akka.Actor.ActorCell.NewActor()
at Akka.Actor.ActorCell.FinishRecreate(Exception cause, ActorBase failedActor)
--- End of inner exception stack trace ---
Platform:
- 4 consensus nodes:
  - n1: Ubuntu 18.04
  - n2: CentOS 7.4
  - n3: Winserver 2016
  - n4: Ubuntu 18.04
- external node: Ubuntu 18.04
About this issue
- State: closed
- Created 4 years ago
- Comments: 18 (18 by maintainers)
According to issue 1, I prefer to skip ReferenceCounter if the Trigger is System.
Agreed, it’s better to close a door that we may not know about.
Maybe we should consider rolling back the snapshot and FAULT all txs if OnPersist in native contracts was FAULT; otherwise we open a door for Denial of Service.
@erikzhang @shargon @eryeer was right: the root cause is that ReferenceCounter.CheckZeroReferred() <= MaxStackSize returns false, which sets the VM execution state to FAULT.
https://github.com/neo-project/neo-vm/blob/be2ac36bf35a3033d828e0ba0630d390599c487d/src/neo-vm/ExecutionEngine.cs#L1237-L1241
After I changed MaxStackSize to 2 * 2 * 1024, the CN service recovered and no exception was thrown.
Initial balance of the sender: 100,000 gas.
Here is the result of changing MaxStackSize to 2 * 2 * 1024.
@shargon I reproduced this problem and found that it was in PostExecuteInstruction() in ExecutionEngine. When the number of txs reached 512, the result of ReferenceCounter.CheckZeroReferred() was 2053, which was greater than MaxStackSize (2048). This caused the check to fail, and the VM returned FAULT. After enlarging MaxStackSize to a larger number, such as 4096, the problem disappeared on my machine.
@shargon It happens on all consensus nodes.
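The failing comparison described in this thread can be modelled in a few lines. This is a simplified Python sketch under stated assumptions, not the neo-vm code: the function name `within_stack_limit` and the constant name are made up for illustration, and only the numbers 2053, 2048 and 4096 (2 * 2 * 1024) come from the comments above.

```python
MAX_STACK_SIZE = 2 * 1024  # the limit quoted in this thread (2048)


def within_stack_limit(referred_items, max_stack_size=MAX_STACK_SIZE):
    """Return True when the referred item count does not exceed the limit."""
    # Same shape as the comparison quoted above: count <= MaxStackSize.
    return referred_items <= max_stack_size


# 2053 items were reported for the 512-tx block.
for limit in (MAX_STACK_SIZE, 2 * 2 * 1024):
    state = "check passes" if within_stack_limit(2053, limit) else "FAULT"
    print(f"limit={limit}: {state}")
```

The sketch only illustrates why raising the limit hides the symptom: 2053 exceeds 2048 but not 4096. It says nothing about why the count grows with the number of txs in the block.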
The error is located here
I will review it
@cloud8little does the error appear with only one CN?