nunit-console: ConsoleRunner spawns too many agents

@CharliePoole commented on Thu Jul 28 2016

@Trass3r commented on Thu Jul 28 2016

packages/NUnit.ConsoleRunner.3.2.0/tools/nunit3-console.exe" [lots of *UnitTest.dlls]  --work=... --out=TestOutput.log --result=TestResult.xml --labels=On --framework=net-4.5 --dispose-runners --agents=8

spawns 1 nunit-agent.exe per dll even though we specified --agents=8
those agents are created serially which takes a really long time and tests aren’t run until this creation phase finishes

@CharliePoole commented on Thu Jul 28 2016

This is working as designed, but it’s possible for us to change at least the internal design.

The --agents option as defined limits the number of agents that may run simultaneously, not the total number of agents created. So, if you were to have 20 assemblies, with a limit of --agents=8 the internal sequence of execution would flow as follows:

All 20 tests are loaded. This requires creation of 20 different agents.
The tests are executed. NUnit starts execution in the first 8 agents. As each agent completes, another one is told to run until all 20 agents have completed execution.

As you point out, this slows things down a bit at the start. I see three things we could do to improve the situation, listed in increasing order of difficulty.

We could postpone loading each test until execution begins. This would not reduce the overall amount of work but would de-serialize it significantly and allow some of the loads (12 of them in my example) to occur in parallel with the execution of other assemblies. We would still create 20 agents. Note that this approach only works for the console runner, since the gui requires all tests to be loaded in order to display them. Hence, it would need to be controlled by a package setting of some kind.
We could load all the tests first, but do it in parallel using the number of agents specified. This is somewhat harder than option 1 and doesn’t sound like it would give a lot of performance benefit, since the loading of tests will tend to be io-bound.
We could load each test only as needed (option 1) and additionally reuse agents where possible. In the case of my example, we might manage to only create 8 agents. OTOH, depending on the bitness and framework requirements of each assembly, we might need more, maybe even up to 20. However, in most cases, I suspect we could significantly reduce the number of agents. This approach would require us to record the relevant info about each agent created and maintain a list of available agents. We would need to either ignore --dispose-runners and possibly deprecate it for the future, since NUnit would need to be in charge of the agent lifecycle. On the plus side, agents already have code to determine whether they are reusable for a given package.

My own inclination would be to implement option 1 in this issue and to create another issue for the future that would expand it to use option 3.

@nunit/core-team Any thoughts on this?

@rprouse commented on Thu Jul 28 2016

I prefer option 1. I think the complexity of option 3 reduces its ROI to a point where we may not want to do it.

From my own experience, we have around 30 large test suites at work. Starting all the agents and loading the tests puts unnecessary memory pressure on the machine. If agents didn’t start until we are ready to run the tests and exited afterwards, it would greatly reduce the memory requirements and allow us to run build agents with less memory.

@Trass3r commented on Thu Jul 28 2016

I interpret 1) as agent-internal. So there would still be 20 agents running concurrently. Couldn’t you do something conceptually like

Parallel.Foreach(..., MaxDegreeOfParallelism = ...)
    StartAgent();

As @rprouse said memory consumption of all those loaded agents is also considerable.

@CharliePoole commented on Thu Jul 28 2016

Option 1 is not internal to the engine. Only option 3 requires changes to the engine.

In option 1, my example would work as follows:

Spin up eight agents, load and start running tests in each of them.
As each agent completes, create a new engine to lad and run tests for the next assembly.

If --dispose-runners were in use, then only 8 engines would be in existence at one time. Without --dispose-runners, the number would grow slowly from 8 to 20.

@rprouse Can you recall why we didn’t make --dispose-runners the default? I am suspecting it was because we hoped to be able to re-use agents at some future time.

@Trass3r commented on Thu Jul 28 2016

Ah ok. Yes that would be perfect.

@rprouse commented on Thu Jul 28 2016

@CharliePoole I think you are right about --dispose-runners. Should we make it the default? I don’t know why anyone wouldn’t want to dispose of them when they are done. I always add it to large test runs.

@CharliePoole commented on Thu Jul 28 2016

We added --dispose-runners as a result of issue #308. In your initial implementation, you didn’t have a command-line option, but you added it at my request. I think we could make it a default for the console runner but we should keep the option around to avoid breaking people’s scripts.

@CharliePoole commented on Sat Jul 30 2016

Well, there’s a little hitch in the plan. When we start a run, we are supposed to issue a start-run notification to any event listeners. That event requires a count of the number of tests to be run. It’s used by the Gui to initialize the progress bar.

For the console runner, it can probably be set to zero unless somebody is depending on it.

@nunit/core-team @nunit/contributors Can you think of any problem this would cause?

@rprouse commented on Tue Aug 02 2016

I don’t see an issue setting it to zero in the console runner, but we probably don’t want to make it the default.

@CharliePoole commented on Tue Aug 02 2016

What I’ve done in general is to use just-in-time loading on an assembly by assembly basis, unless the client actually calls ITestRunner.Load or ITestRunner.Explore first. That’s what any gui will have to do anyway. For the special case where ITestRunner.Run needs a count for it’s start event, I check whether the package has already been loaded. If not, I use the zero. This seems to take care of everything without the need for any extra setting being passed in.

@shift-evgeny commented on Thu Aug 18 2016

I tried the latest code on master and specifying --agents=3 still runs one agent.exe per assembly (8 in my case). This is a problem, because all those agent processes continue to use a lot of memory and they seriously slow down our CI server. It will only become more of a problem as we add more assemblies.

What is the point of an agent executable continuing to run when it won’t be used any more? There must be a way to limit the total number of agents running at a time (processes running, whether they’re doing anything or not) and based on a description of --agents (“NUMBER of agents that may be allowed to run simultaneously”) that’s exactly what I’d expect it to do.

@Trass3r commented on Thu Aug 18 2016

Maybe you forgot --dispose-runners? The fix worked fine for me last week.

@CharliePoole commented on Thu Aug 18 2016

We did used this as the change was in process. The option limits the number of agents active at any time, not the number of processed. I’m on the road and on my phone so I’ll come back with a bit more explanation tonight. Seven discuss workarounds and possible future changes.

@shift-evgeny commented on Thu Aug 18 2016

OK, thanks @CharliePoole.

@Trass3r --dispose-runners didn’t sound relevant to this, but I tried it anyway and it made no difference.

@rprouse commented on Thu Aug 18 2016

@CharliePoole I think I misunderstood your PR for this, or it has been broken by the subsequent change that backed out some of the changes (more likely?)

What I am seeing now with latest build of master;

Running 20 test assemblies on the command line with --agents=1 => 20 agents start up, one runs at a time, then all close at the end.

Running 20 test assemblies on the command line with --agents=1 --disposerunners => 20 agents start up, one runs at a time and closes right after it is run.

My understanding of the PR was that only X agents would start where X is specified --agents=X and --disposerunners was made the default, so I should see the same behaviour with both scenarios.

Did I misunderstand the PR, or does it look like it is broken again? Wait until you get home, no rush on this @CharliePoole 😄

@rprouse commented on Thu Aug 18 2016

My understanding was that Running 20 test assemblies on the command line with --agents=1 would start one agent, run tests, close agent, start next agent, etc. for each assembly.

@CharliePoole commented on Thu Aug 18 2016

@Trass3r Please say again which release is working OK for you.

@shift-evgeny Is the problem only with latest master or with some release?

@rprouse It was intended to work as you describe with the possible exception of making --dispose-runners the default. You and I decided that was a good idea, but I don’t remember if I implemented it. In any case, something is now broken and we need to find out if what’s broken was released.

Without yet having examined the code, here’s what I think. When I added the checks that do just-in-time loading to AbstractTestRunner, I should have overridden it in AggregatingTestRunner. We expressly do not want to make sure that the entire package is loaded for this runner. Rather, we want to delegate that decision to the subordinate runners that it aggregates.

I’ll take a look to see if that’s truly the problem.

Taking it a step further, however, if anyone is expecting for us to only create a single process and re-use it, that’s not happening. If that’s what’s wanted, we should discuss it in another issue because re-use of processes is a complicated topic and may not always be possible.

Charlie

@CharliePoole commented on Thu Aug 18 2016

I’m pretty sure it’s as I described, so I’m reopening this.

@rprouse We were going to drop this change in a 3.4.2 release, but we never did one. We could still do a release after this fix is in again, or those who need it could continue to use the myget builds.

@CharliePoole commented on Thu Aug 18 2016

Actually, I need to reopen it in the other repository!

About this issue

Original URL
State: closed
Created 8 years ago
Comments: 27 (13 by maintainers)

Most upvoted comments

https://github.com/nunit/nunit-console/milestone/2 😄

CharliePoole on Dec 6, 2016