pyvista: Pickle Crashes for Large Data

Describe the bug, what’s wrong, and what you expected.

Pickling crashes for large data: the Python process terminates abruptly, with no traceback or error message.

Steps to reproduce the bug.

import pickle

import numpy as np
import pyvista as pv
from vtkmodules.vtkIOLegacy import vtkDataSetWriter


def pickle_vtk(mesh, filename):
    writer = vtkDataSetWriter()
    writer.SetInputDataObject(mesh)
    writer.SetWriteToOutputString(True)
    writer.SetFileTypeToBinary()
    writer.Write()
    to_serialize = writer.GetOutputString()

    with open(filename, 'wb') as handle:
        pickle.dump(to_serialize, handle, protocol=pickle.HIGHEST_PROTOCOL)

    return filename


if __name__ == '__main__':
    dims = (2154, 1500, 1167)

    volume = pv.UniformGrid(
        dims=dims,
        spacing=(1, 1, 1),
        origin=(0, 0, 0),
    )

    volume.point_data['scalars'] = np.zeros(
        shape=(dims[0] * dims[1] * dims[2],), dtype=np.uint8
    )

    fname = 'filename.vtkpickle'
    pickle_vtk(volume, fname)  # Crashes

However, I have no problem pickling the NumPy array by itself to file (the array is 3.7 GB).
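For reference, pickling a bare NumPy array to disk works even for multi-gigabyte arrays, since pickle protocol 4 (the HIGHEST_PROTOCOL on Python 3.8) supports objects larger than 4 GiB. A small-scale sketch of the round trip (array shrunk for illustration):

```python
import os
import pickle
import tempfile

import numpy as np

# Small stand-in for the 3.7 GB uint8 array from the report;
# protocol 4+ also handles buffers larger than 4 GiB.
arr = np.zeros(shape=(1000,), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'array.pickle')
    with open(fname, 'wb') as handle:
        pickle.dump(arr, handle, protocol=pickle.HIGHEST_PROTOCOL)
    with open(fname, 'rb') as handle:
        restored = pickle.load(handle)

assert np.array_equal(arr, restored)
print('round-trip OK')
```

This only exercises pickle itself; the crash above happens earlier, inside the VTK writer, before pickle.dump is ever reached.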

System Information

--------------------------------------------------------------------------------
  Date: Sun Sep 04 11:59:36 2022 Pacific Daylight Time

                OS : Windows
            CPU(s) : 96
           Machine : AMD64
      Architecture : 64bit
               RAM : 190.7 GiB
       Environment : Python
        GPU Vendor : NVIDIA Corporation
      GPU Renderer : Quadro RTX 8000/PCIe/SSE2
       GPU Version : 4.5.0 NVIDIA 516.40

  Python 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64
  bit (AMD64)]

           pyvista : 0.37.dev0
               vtk : 9.1.0
             numpy : 1.23.2
           imageio : 2.21.2
           appdirs : 1.4.4
            scooby : 0.5.12
        matplotlib : 3.5.3
         pyvistaqt : 0.9.0
           IPython : 8.4.0
              tqdm : 4.64.0
            meshio : 5.3.4
--------------------------------------------------------------------------------

Screenshots

No response

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 19 (19 by maintainers)

Most upvoted comments

To focus on the issue at hand (as in, the pyvista issue), I’d like to recap:

  1. the snippet in the original comment here crashes for @adam-grant-hendry on windows
  2. it runs fine for @whophil on a system with much less memory (“runs fine” meaning writer.Write() returns successfully).
  3. since all of this is VTK, this is a potential VTK issue, and independent from pickling.

If we all agree, my recommendation would be to reduce the example to remove pyvista (by using a native VTK grid) and remove pickling (which is a red herring). If the code still crashes for Adam and still doesn’t crash for Phil, compare the two setups (mainly OS and VTK version) and open an issue with VTK. They will be able to tell whether what we see is expected (due to some memory-management quirk) or a bug.

This will also affect PyVista through our pickling mechanism, but as long as we rely on these VTK writers we’ll have to wait for such potential bugs to be fixed upstream.

I’m struggling to find what’s going on in VTK, for what it’s worth. vtkDataSetWriter only seems to have a WriteData() method, but no Write().

vtkDataSetWriter.Write() is inherited from vtkWriter (see https://vtk.org/doc/nightly/html/classvtkDataSetWriter-members.html)

@whophil, do you not have the power to close issues? Because you ought to have that

This was closed by #3286.

Thanks @adam-grant-hendry. That snippet now fails on my machine after 2 minutes or so of trying. Somehow I was able to write a 3.5 GB file yesterday, but I don’t know what exactly is different now.

If I tell the vtkDataSetWriter to write to a file instead of a string, the file it produces is 28 GB. This doesn’t seem right.

Finally, if I use the appropriate XML writer, I am able to get past writer.Write():

    from vtkmodules.vtkIOXML import vtkXMLImageDataWriter
    writer = vtkXMLImageDataWriter()
    writer.SetInputDataObject(volume)
    writer.SetDataModeToBinary()
    writer.SetCompressionLevel(0)
    writer.SetWriteToOutputString(True)
    writer.Write()
    to_serialize = writer.GetOutputString()

I wonder whether the return value of GetOutputString() will be useful, or if the VTK XML writers will need to be modified to implement GetOutputStdString.

@adeak I’m in agreement. I’ve modified the snippet to use just VTK 9.1 without pickling, and it still crashes for me (to reiterate, I’m on 64-bit Windows 10):

from vtkmodules.vtkCommonCore import vtkIntArray
from vtkmodules.vtkCommonDataModel import vtkImageData
from vtkmodules.vtkIOLegacy import vtkDataSetWriter

if __name__ == '__main__':
    points = vtkIntArray()
    points.SetName('points')
    points.SetNumberOfComponents(1)

    # 4 bytes per int * 2154 * 1500 * 1167 ~= 15 GB
    points.SetNumberOfTuples(2154 * 1500 * 1167)

    volume = vtkImageData()
    volume.SetOrigin(0, 0, 0)
    volume.SetSpacing(1, 1, 1)
    volume.SetDimensions(2154, 1500, 1167)

    volume.GetPointData().SetScalars(points)

    writer = vtkDataSetWriter()
    writer.SetInputDataObject(volume)
    writer.SetWriteToOutputString(True)
    writer.SetFileTypeToBinary()
    writer.Write()
    to_serialize = writer.GetOutputString()

    print('Finished!')
    print(to_serialize)

Memory starts at 42.5 GB of RAM in use.

The program ramps memory up to 58.5 GB at writer.Write(), then abruptly ends with no warning or error and without printing ‘Finished!’ (it dies inside writer.Write()).


The amount of memory it consumes is about 16 GB (58.5 - 42.5 = 16), but it just stops…
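A back-of-envelope check of those numbers (my own arithmetic, not from the thread): the scalar array alone is about 14 GiB, so a single extra copy held by the writer, e.g. the in-memory output string, would roughly account for the observed ~16 GB ramp:

```python
n_points = 2154 * 1500 * 1167          # number of tuples in the vtkIntArray
array_bytes = n_points * 4             # 4 bytes per 32-bit int

gib = array_bytes / 2**30
print(f'{array_bytes:,} bytes ≈ {gib:.1f} GiB')

# The observed jump was 58.5 - 42.5 = 16 GB, i.e. roughly one copy of
# the array (plus overhead), consistent with the writer buffering the
# whole binary payload in memory while producing the output string.
```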

Having taken a closer look at this issue (with more information than the other comments I’ve been pinged on), let me amend my previous comment:

  1. I now see this is a Windows machine, so OOM killing doesn’t enter the picture. I’m not familiar with Windows, but my vague understanding is that processes can still be terminated if they request more memory than available.
  2. I now also see that you have 190 GB of RAM. Is that all available? Or is the free memory considerably smaller?

Thanks @MatthewFlamm for the pointer here.

Some information we were given in https://github.com/pyvista/pyvista/issues/1768#issuecomment-1236582728:

There is no segfault, exception, or traceback. It simply stops running.

Considering my own experience as well, it’s possible (likely?) that the “stops running” is the OOM killer making short work of a process with runaway memory needs.

I don’t know what the writer tries to write, but a grid of this size has 28 GB worth of point coordinates (assuming doubles). So if there’s an attempt to instantiate all the points at once, this will inflate the memory footprint.
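A quick check of that figure (my own arithmetic, not from the thread): a single float64 value per point is already ≈ 28 GiB, and if the writer instantiates explicit three-component xyz coordinates for the image data’s implicit points, that triples to ≈ 84 GiB:

```python
n_points = 2154 * 1500 * 1167

per_component = n_points * 8           # one float64 per point
xyz = per_component * 3                # explicit x, y, z coordinates

print(f'{per_component / 2**30:.1f} GiB per double component')
print(f'{xyz / 2**30:.1f} GiB for full xyz coordinates')
```

Either way, materializing coordinates for a grid this size would dwarf the machine’s 190 GB of RAM when combined with the scalar array and the writer’s output buffer.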

Is the script also killed if you leave out the call to pickle.dump? And how much memory does your machine have?