pydantic: Memory segfaults after V2 upgrade

Initial Checks

  • I confirm that I’m using Pydantic V2

Description

Thanks for the amazing project! We have been using pydantic for a couple of years, and it has become a standard building block of our codebase.

Everything seems to work, except that since we upgraded to V2 we have been seeing segfaults in our production environment.

We haven’t figured out a way to reproduce it locally, so we can’t provide more details than the logs from our production environment.

Our setup:

  • flask application (latest)
  • gunicorn (latest) with 4 workers using gevent
  • pydantic==2.2.1

These are the segfaults we are seeing in production:

segfault at 0 ip 000078b1d0ff6f1c sp 00007ffe971366c0 error 6 in libpython3.11.so.1.0[78b1d0eed000+1bb000]
segfault at 100 ip 00007ddfb64fe349 sp 00007ffc351395b0 error 4 in _pydantic_core.cpython-311-x86_64-linux-gnu.so
segfault at e4 ip 00000000000000e4 sp 00007ffc30d59a58 error 14
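
For anyone reading these kernel lines: the `error` field is the x86 page-fault error code, printed in hexadecimal. Below is a minimal sketch for decoding it. The helper name `decode_segfault_error` is made up for illustration and is not part of pydantic or any kernel tooling; the bit meanings are the standard x86 ones (bit 0 protection violation, bit 1 write, bit 2 user mode, bit 3 reserved bit, bit 4 instruction fetch).

```python
# Hypothetical helper (not part of pydantic or any kernel tooling): decodes the
# hexadecimal "error" field of a Linux x86 "segfault at ..." log line.

PAGE_FAULT_BITS = [
    # (mask, meaning when the bit is set, meaning when it is clear)
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_segfault_error(error_field: str) -> list[str]:
    """Return human-readable flags for the hex error code, e.g. "6" or "14"."""
    code = int(error_field, 16)  # the kernel prints this value in hex
    flags = []
    for mask, if_set, if_clear in PAGE_FAULT_BITS:
        if code & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


if __name__ == "__main__":
    # The three error values from the log lines above.
    for field in ("6", "4", "14"):
        print(field, "->", ", ".join(decode_segfault_error(field)))
```

Read this way, the three lines look like a user-mode write to address 0, a user-mode read of address 0x100, and an attempt to execute code at address 0xe4, each on an unmapped page. That is consistent with null or dangling pointers, but it does not by itself identify the cause.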

Example Code

No response

Python, Pydantic & OS Version

/usr/src/app# python -c "import pydantic.version; print(pydantic.version.version_info())"
             pydantic version: 2.2.1
        pydantic-core version: 2.6.1
          pydantic-core build: profile=release pgo=true
                 install path: /usr/local/lib/python3.11/site-packages/pydantic
               python version: 3.11.3 (main, May 23 2023, 13:34:03) [GCC 10.2.1 20210110]
                     platform: Linux-5.15.49-linuxkit-x86_64-with-glibc2.31
     optional deps. installed: ['email-validator', 'typing-extensions']

Selected Assignee: @dmontagu

About this issue

  • State: closed
  • Created 10 months ago
  • Reactions: 7
  • Comments: 30 (17 by maintainers)

Most upvoted comments

With PyO3 0.21 now released and pydantic-core updated, I can no longer reproduce the crash on pydantic main. I will close this issue; hopefully people experiencing problems here can also confirm it’s fixed with pydantic main. We will also release all of this soon as Pydantic 2.7!

Ok, some progress here: I can isolate the crash to just PyO3 + gevent, which I’ve documented in https://github.com/PyO3/pyo3/issues/3668

I will work to figure out next steps from here. We have at least one pathway to a solution (in the new PyO3 API) but maybe there are mitigations we can get across the ecosystem faster.

Yep, I’m looking into this at present and hope to have some progress within a few weeks. Will keep posted here.

We need to wait for the new PyO3 API/GIL pool. That’s getting pretty close; check the progress in the PyO3 repo.

I was able to reproduce this error with @rafales’s example from #8392. Thanks so much @rafales, that’s really helpful.

@davidhewitt and I will do some further digging, specifically:

I contacted @davidhewitt and gave him all the logs that I was able to collect from my project. So now the hope is that he will be able to figure it all out 🙏🏻

To follow up with the current state of things: in PyO3 we felt that mitigations are probably impractical from a performance standpoint so we are busy getting the new PyO3 API to a point where it can be used by projects to migrate. This might be a few weeks off still depending on review speed.

I was able to run valgrind on the pydantic-core test suite using a virtual environment on ubuntu with the following command:

valgrind --leak-check=full --track-origins=yes --log-file=valgrind-output.txt python -m pytest

The contents of valgrind-output.txt suggested a couple of memory leaks, which look like globally cached strings, so they are not a concern here. I’ll follow up on those separately another time. Hopefully, if you can repeat the same thing but replace python -m pytest with your command which produces the repro under gdb, we will identify a cause of your crash. You can share any results with me confidentially over LinkedIn.

If you’re getting a lot of messages, you might want to check whether you have /usr/lib/valgrind/python3.supp present; I understand this is needed due to Python’s internal memory allocator.
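
As a rough sketch of the suggestion above, the snippet below assembles the same valgrind invocation for an arbitrary command and adds the suppression file only if it exists. The helper name `build_valgrind_command` is hypothetical; `--suppressions=` is valgrind’s standard flag for loading a suppression file, and the path is the one mentioned in the comment, which may differ on other distributions.

```python
import os
import shlex

# Path taken from the comment above; it may differ per distribution.
PYTHON_SUPPRESSIONS = "/usr/lib/valgrind/python3.supp"


def build_valgrind_command(target, log_file="valgrind-output.txt",
                           suppressions=PYTHON_SUPPRESSIONS):
    """Build the valgrind argv for `target` (a list of command arguments)."""
    cmd = [
        "valgrind",
        "--leak-check=full",
        "--track-origins=yes",
        f"--log-file={log_file}",
    ]
    # Only pass the suppression file if it actually exists on this machine.
    if suppressions and os.path.exists(suppressions):
        cmd.append(f"--suppressions={suppressions}")
    return cmd + list(target)


if __name__ == "__main__":
    # "python -m pytest" is the command from the comment; swap in your own repro.
    print(shlex.join(build_valgrind_command(["python", "-m", "pytest"])))
```

The function only builds the argument list, so it can be printed and inspected before being handed to `subprocess.run`.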

@davidhewitt I haven’t been able to figure out yet how to get more detailed logs or the usual “core dumped” error (until now I believed that was the default behavior, at least in our docker environment). I already tried faulthandler.enable(), but it gives just a Python traceback, with no CPython or Rust code.

But I’ll probably try again a little later when I have more time to debug it.
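
One possible way to get a core file in addition to the faulthandler traceback is to raise the process’s core-size soft limit at startup. This is a minimal sketch, not something confirmed in this thread: the function name `enable_crash_diagnostics` and the log path are made up, and whether a core file is actually written still depends on the hard limit, the kernel’s `core_pattern` setting, and the container configuration.

```python
import faulthandler
import resource  # Unix only

_crash_log = None  # keep the file object alive; faulthandler writes to its fd


def enable_crash_diagnostics(log_path):
    """Enable faulthandler for all threads and raise the core-dump soft limit."""
    global _crash_log
    _crash_log = open(log_path, "a")
    faulthandler.enable(file=_crash_log, all_threads=True)

    # Raise the soft core-size limit to the hard limit so the kernel may write
    # a core file. If the hard limit is 0, this changes nothing.
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    # Hypothetical log location; call this as early as possible in the worker.
    print(enable_crash_diagnostics("/tmp/crash-traceback.log"))
```

If a core file does get written, loading it in gdb alongside the Python binary should show the native frames that the faulthandler traceback leaves out, provided debug symbols are available.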

If I had to guess, the stripping is done as a linker argument via

    LDFLAGS="$(dpkg-buildflags --get LDFLAGS)"; \

Hey @davidhewitt! Thanks for the comprehensive answer!

We are using the python Docker image. I don’t see where the Python build gets stripped, but this is what I see in the container:

/usr/local/bin/python3.12: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e1466a54058de9be791ef96f61c8e185388684eb, for GNU/Linux 3.2.0, stripped

Link to the Docker source: https://github.com/docker-library/python/blob/b7b91ef359a740a91caeabce414ce4ee70fd2b23/3.11/bookworm/Dockerfile#L44.

I might try to build a custom Python with your suggested flags.
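
To check a rebuilt interpreter the same way, the `file` output shown above can be inspected programmatically. A small sketch follows, assuming the `file` utility is installed in the container; both helper names are hypothetical.

```python
import subprocess


def is_stripped(file_output: str) -> bool:
    """Interpret the description printed by file(1) for an ELF binary."""
    # file(1) reports either "stripped" or "not stripped" at the end.
    if "not stripped" in file_output:
        return False
    return "stripped" in file_output


def describe(path: str) -> str:
    """Run file(1) on `path`; assumes the `file` utility is installed."""
    result = subprocess.run(
        ["file", path], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    import sys

    # Checks the interpreter running this script.
    output = describe(sys.executable)
    print(output)
    print("stripped:", is_stripped(output))
```

A binary reported as "not stripped" still has its symbol table, which is what gdb and valgrind need to show function names in native frames.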

In https://github.com/pydantic/pydantic-core/pull/922 I’ve gone through the unsafe code used in pydantic-core and either eliminated or justified each use.