pyvista: v=0.30.1 Multiblock creation/appending is dramatically faster than v=0.32.1 when adding meshes.
A minimally reproducible example can be used below for testing.
import pyvista as pv
import numpy as np
from time import perf_counter as pf
def create_spline():
line = np.random.randint(0, 100, (20, 3))
spline = pv.Spline(line)
tube = spline.tube(radius=1)
return tube
splines = [create_spline() for _ in range(1000)]
t = pf()
block = pv.MultiBlock(splines)
mesh = block.combine()
print (pf() - t)
mesh.plot()
The choice of a tube mesh can be replaced here with any other mesh and the problem still persists.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 15 (15 by maintainers)
First take: https://github.com/pyvista/pyvista/pull/1805. We should get this right so I don’t expect it to be merged too soon. We’ll have to make sure we’re not breaking something or missing important corner cases.
To be fair I haven’t proven it yet, but it’s a very very good guess 😃
Probably not, but the fact that you’ve flagged this is already huge help for us 😃 Thank you.
Looking at the implementation of MultiBlock, there are multiple things that jump out at me. When you initialise a MultiBlock from a sequence of meshes,
.append
is called with each mesh. In turn this doesself[index] = block
, which calls__setitem__
.However
__setitem__
ends with https://github.com/pyvista/pyvista/blob/828fa349134e036efac9f47852a1bed4bf51c929/pyvista/core/composite.py#L471-L472Note that
self.refs
is a list. So first we do anO(n)
sweep over the list to see if the new data is in there. Then if it’s not in there, we callappend
. Then at the end ofMultiBlock.append
we do that again… https://github.com/pyvista/pyvista/blob/828fa349134e036efac9f47852a1bed4bf51c929/pyvista/core/composite.py#L336-L337And there has to be one more leak here:
The first two items are the mesh itself, and the third item is the mesh in a one-element tuple…
So there’s a lot of cruft in the implementation, and now that membership comparison triggers the new heavy
__eq__
instead ofobject.__eq__
which was blasing-fast based onself is other
, this is becoming obvious. We’ll have to refactorMultiBlock
to fix this.Aaaaah I know what this is!
After https://github.com/pyvista/pyvista/pull/1593:
Preceding commit:
In between
__eq__
was implemented for datasets. This recurses into the attributes of meshes, which ends up being slow if you do it a lot of times. I bet MultiBlock creation triggers this for each submesh for whatever reason.This was broken by https://github.com/pyvista/pyvista/pull/1593/. I’m not very happy about this, because that was a major PR with 2800 lines lines touched across 57 files 😰
I’ll try to figure out what’s off here. The fact that scalars are missing is a good sign I think. And in any case MultiBlocks are more of less localised.
Hold on, something’s weird here. I recreated the environment and the timing script from scratch to answer your question… and now I get 0.1 second runtime with the old setup! And I see scalars, which weren’t there before. Weird. I’m starting to think I messed up something in my environment during my first attempt… sorry about that.
For the sake of exactness:
Old setup
Runtime 0.1 second.
Newer setup, only updating pyvista:
Runtime 240 seconds.
I checked pyvista 0.31.1 as a first step of bisecting, and it’s also fast (even slightly faster than the oldest version). So this indeed on our side, thanks for finding it. I’ll track down the change that broke this.
As an aside I wouldn’t be too surprised at different runtimes, because my laptop isn’t very strong either, and it doesn’t have a discrete GPU.