pyvista: Pickle Crashes for Large Data
Describe the bug, what’s wrong, and what you expected.
`pickle` crashes for large data.
Steps to reproduce the bug.
```python
import pickle

import numpy as np
import pyvista as pv
from vtkmodules.vtkIOLegacy import vtkDataSetWriter


def pickle_vtk(mesh, filename):
    writer = vtkDataSetWriter()
    writer.SetInputDataObject(mesh)
    writer.SetWriteToOutputString(True)
    writer.SetFileTypeToBinary()
    writer.Write()
    to_serialize = writer.GetOutputString()
    with open(filename, 'wb') as handle:
        pickle.dump(to_serialize, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return filename


if __name__ == '__main__':
    dims = (2154, 1500, 1167)
    volume = pv.UniformGrid(
        dims=dims,
        spacing=(1, 1, 1),
        origin=(0, 0, 0),
    )
    volume.point_data['scalars'] = np.zeros(
        shape=(dims[0] * dims[1] * dims[2],), dtype=np.uint8
    )
    fname = 'filename.vtkpickle'
    pickle_vtk(volume, fname)  # Crashes
```
However, I have no problem pickling the numpy array by itself to file (this is 3.7 GB).
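The array-only path that does work can be illustrated at a small scale. This sketch uses a 1000-element stand-in for the report's ~3.7 GB array (the filename and size here are my own choices), but follows the same pickling calls:

```python
import os
import pickle
import tempfile

import numpy as np

# A small stand-in for the ~3.7 GB scalars array from the report;
# same pickling path, just cheap to run.
scalars = np.zeros(shape=(1000,), dtype=np.uint8)

fname = os.path.join(tempfile.mkdtemp(), 'scalars.pickle')
with open(fname, 'wb') as handle:
    pickle.dump(scalars, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(fname, 'rb') as handle:
    restored = pickle.load(handle)

assert np.array_equal(scalars, restored)  # round-trips cleanly
```

Pickling the bare NumPy array sidesteps the VTK writer entirely, which is why this path is unaffected by whatever `writer.Write()` is doing.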
System Information

```
--------------------------------------------------------------------------------
  Date: Sun Sep 04 11:59:36 2022 Pacific Daylight Time

                OS : Windows
            CPU(s) : 96
           Machine : AMD64
      Architecture : 64bit
               RAM : 190.7 GiB
       Environment : Python
        GPU Vendor : NVIDIA Corporation
      GPU Renderer : Quadro RTX 8000/PCIe/SSE2
       GPU Version : 4.5.0 NVIDIA 516.40

  Python 3.8.10 (tags/v3.8.10:3d8993a, May 3 2021, 11:48:03) [MSC v.1928 64
  bit (AMD64)]

           pyvista : 0.37.dev0
               vtk : 9.1.0
             numpy : 1.23.2
           imageio : 2.21.2
           appdirs : 1.4.4
            scooby : 0.5.12
        matplotlib : 3.5.3
         pyvistaqt : 0.9.0
           IPython : 8.4.0
              tqdm : 4.64.0
            meshio : 5.3.4
--------------------------------------------------------------------------------
```
Screenshots
No response
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 19 (19 by maintainers)
To focus on the issue at hand (as in the pyvista issue), I'd like to recap: the crash happens inside `writer.Write()` (the call never returns successfully).

If we all agree, my recommendation would be to reduce the example by removing pyvista (using a native VTK grid) and removing pickling (which is a red herring). If the code still crashes for Adam and still doesn't crash for Phil, compare the two setups (mainly the OS and VTK version) and open an issue with VTK. They will be able to tell whether what we see is expected (due to some memory management quirk) or a bug.
This will also affect PyVista through our pickling mechanism, but as long as we rely on these VTK writers we’ll have to wait for such potential bugs to be fixed upstream.
`vtkDataSetWriter.Write()` is inherited from `vtkWriter` (see https://vtk.org/doc/nightly/html/classvtkDataSetWriter-members.html)

@whophil, do you not have the power to close issues? Because you ought to have that.
This is closed with #3286
Thanks @adam-grant-hendry. That snippet now fails on my machine after 2 minutes or so of trying. Somehow I was able to write a 3.5 GB file yesterday, but I don't know what exactly is different now.

If I tell the `vtkDataSetWriter` to write to file instead of string, the file it produces is 28 GB. This doesn't seem right.

Finally, if I use the appropriate XML writer, I am able to get past `writer.Write()`. I wonder whether the return of `GetOutputString()` will be useful, or if the VTK XML writers will need to be modified to implement `GetOutputStdString`.

@adeak I'm in agreement. I've modified the snippet to be just VTK 9.1 without pickling, and it still crashes for me (to reiterate, I'm on Windows 10 x64):

- Starting at 42.5 GB RAM, the program ramps up memory at `writer.Write()` to 58.5 GB,
- and then abruptly ends without warnings/errors and without printing 'Finished!' (it ends at `writer.Write()`).

The amount of memory it consumes is about 16 GB (58.5 - 42.5 = 16), but it just stops…
Having taken a closer look at this issue (with more information than the other comments I’ve been pinged on), let me amend my previous comment:
Thanks @MatthewFlamm for the pointer here.
Some information we were given in https://github.com/pyvista/pyvista/issues/1768#issuecomment-1236582728:
Considering also my experience, it's possible (likely?) that the "stops running" is the OOM killer making short work of a process with runaway memory needs.

I don't know what the writer tries to write, but a grid of this size has 28 GB worth of point coordinates (assuming doubles). So if there's an attempt to instantiate all the points at some point, this will inflate the memory footprint.
Is the script also killed if you leave out the call to `pickle.dump`? And how much memory does your machine have?
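As a rough sanity check on the sizes being thrown around in this thread (my arithmetic, not from the thread): the grid has about 3.77 billion points, so the uint8 scalars are ~3.5 GiB (the "3.7 GB" array in decimal units), one 8-byte double per point lands almost exactly on the 28 GB figure, and explicit xyz coordinates in doubles would be three times that:

```python
dims = (2154, 1500, 1167)
n_points = dims[0] * dims[1] * dims[2]  # 3,770,577,000 points

GiB = 1024 ** 3
uint8_scalars = n_points / GiB        # ~3.5 GiB: the pickled NumPy array
one_double_each = n_points * 8 / GiB  # ~28.1 GiB: one possible reading of "28 GB"
xyz_doubles = n_points * 3 * 8 / GiB  # ~84.3 GiB if all xyz points are instantiated
```

Either reading puts a fully instantiated point set well past the 16 GB of growth observed before the process dies, consistent with an OOM kill partway through.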