pydantic: Memory segfaults after V2 upgrade

Initial Checks

I confirm that I’m using Pydantic V2

Description

Thanks for the amazing project! We have been using pydantic for a couple of years, and it has become a standard building block of our codebase.

Everything works, except that since we upgraded to V2 we have been seeing segfaults in our production environment.

We haven’t found a way to reproduce them locally, so we can’t provide more details than the logs from our production environment.

Our setup:

Flask application (latest)
gunicorn (latest) with 4 workers using gevent
pydantic==2.2.1

These are the segfaults we are seeing in production:

segfault at 0 ip 000078b1d0ff6f1c sp 00007ffe971366c0 error 6 in libpython3.11.so.1.0[78b1d0eed000+1bb000]

segfault at 100 ip 00007ddfb64fe349 sp 00007ffc351395b0 error 4 in _pydantic_core.cpython-311-x86_64-linux-gnu.so

segfault at e4 ip 00000000000000e4 sp 00007ffc30d59a58 error 14
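For readers trying to interpret lines like these, here is a minimal sketch of a decoder. It assumes the standard x86 Linux kernel message format, in which the `error` field is the page-fault error code printed in hexadecimal; the helper name `decode_segfault` and the returned dictionary layout are illustrative only and not part of any library.

```python
import re

# Assumed format of the kernel message (x86 Linux); all numbers are hex.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<module>[^\[\s]+))?"
)


def decode_segfault(line):
    """Decode one kernel segfault line (hypothetical helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line: %r" % line)
    err = int(m["err"], 16)
    # x86 page-fault error-code bits:
    # 0x01 protection violation, 0x02 write, 0x04 user mode, 0x10 instruction fetch.
    if err & 0x10:
        access = "instruction fetch"
    elif err & 0x02:
        access = "write"
    else:
        access = "read"
    return {
        "address": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "module": m["module"],  # None when the kernel names no mapping
        "access": access,
        "cause": "protection violation" if err & 0x01 else "page not present",
        "mode": "user" if err & 0x04 else "kernel",
    }


if __name__ == "__main__":
    print(decode_segfault(
        "segfault at e4 ip 00000000000000e4 sp 00007ffc30d59a58 error 14"
    ))
```

Read this way, the three lines above would be a write to address 0 inside libpython, a read of address 0x100 inside `_pydantic_core`, and an attempt to execute code at address 0xe4. All three are user-mode accesses to unmapped pages.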

Example Code

No response

Python, Pydantic & OS Version

/usr/src/app# python -c "import pydantic.version; print(pydantic.version.version_info())"
             pydantic version: 2.2.1
        pydantic-core version: 2.6.1
          pydantic-core build: profile=release pgo=true
                 install path: /usr/local/lib/python3.11/site-packages/pydantic
               python version: 3.11.3 (main, May 23 2023, 13:34:03) [GCC 10.2.1 20210110]
                     platform: Linux-5.15.49-linuxkit-x86_64-with-glibc2.31
     optional deps. installed: ['email-validator', 'typing-extensions']

Selected Assignee: @dmontagu

About this issue

State: closed
Created 10 months ago
Reactions: 7
Comments: 30 (17 by maintainers)

Most upvoted comments

With the release now done in PyO3 0.21, and pydantic-core updated, I can no longer reproduce the crash on pydantic main. I will close this issue, hopefully people experiencing problems here can also confirm it’s fixed with pydantic main. We will also release this all soon as Pydantic 2.7!

davidhewitt on Mar 26, 2024

Ok, some progress here: I can isolate the crash to just PyO3 + gevent, which I’ve documented in https://github.com/PyO3/pyo3/issues/3668

I will work to figure out next steps from here. We have at least one pathway to a solution (in the new PyO3 API) but maybe there are mitigations we can get across the ecosystem faster.

davidhewitt on Dec 19, 2023

Yep, I’m looking into this at present and hope to have some progress within a few weeks. Will keep posted here.

davidhewitt on Nov 7, 2023

We need to wait for the new pyo3 API/GIL pool. That’s getting pretty close; check the progress in the pyo3 repo.

samuelcolvin on Feb 19, 2024

I was able to reproduce this error with @rafales’s example from #8392. Thanks so much @rafales, that’s really helpful.

@davidhewitt and I will do some further digging, specifically:

see if we get the error with pure pyo3+gevent
see if we get the error with the new PyO3 API in https://github.com/pydantic/pydantic-core/pull/1085 + gevent

samuelcolvin on Dec 18, 2023

I contacted @davidhewitt and gave him all the logs that I was able to collect from my project. So now the hope is that he will be able to figure it all out 🙏🏻

bogdandm on Nov 7, 2023

To follow up with the current state of things: in PyO3 we felt that mitigations are probably impractical from a performance standpoint so we are busy getting the new PyO3 API to a point where it can be used by projects to migrate. This might be a few weeks off still depending on review speed.

davidhewitt on Jan 8, 2024

I was able to run valgrind on the pydantic-core test suite using a virtual environment on ubuntu with the following command:

valgrind --leak-check=full --track-origins=yes --log-file=valgrind-output.txt python -m pytest

The contents of valgrind-output.txt suggested a couple of memory leaks, which look like globally cached strings, so they are not a relevant concern here. I’ll follow up on those separately another time. Hopefully, if you can repeat the same thing but replace python -m pytest with your command which produces the repro under gdb, we will identify the cause of your crash. You can share any results with me confidentially over LinkedIn.

If you’re getting a lot of messages, you might want to check if you have /usr/lib/valgrind/python3.supp present, I understand this is needed due to Python’s internal memory allocator.

davidhewitt on Oct 27, 2023

@davidhewitt I haven’t yet been able to figure out how to get more detailed logs or the usual “core dumped” error (until now I believed that it was the default behavior, at least in our Docker environment). I already tried faulthandler.enable(), but it gives just the Python traceback, with no CPython or Rust code.

But I’ll probably try again a little later when I have more time to debug it.
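As a rough sketch of what such a setup could look like: the snippet below enables `faulthandler` with output to a file and raises the core-dump soft limit to the hard limit for the current process. The function name `enable_crash_diagnostics` and the log path are hypothetical. Whether a core file is actually written still depends on the hard limit and on the kernel's `core_pattern` setting, which in a container is typically inherited from the host, so this is a starting point rather than a guaranteed fix.

```python
import faulthandler
import resource


def enable_crash_diagnostics(log_path):
    """Hypothetical helper: call once per process, e.g. early in each worker."""
    # faulthandler needs a file that stays open for the life of the process,
    # so the handle is returned and must be kept alive by the caller.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)

    # Raise the soft core-size limit as far as the hard limit allows.
    # If the hard limit is 0, core dumps remain disabled.
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return log_file


if __name__ == "__main__":
    handle = enable_crash_diagnostics("/tmp/faulthandler.log")
    print("core limit:", resource.getrlimit(resource.RLIMIT_CORE))
```

With gunicorn, one place this might be called is a `post_fork` server hook, so that every worker gets its own handler. Note that `faulthandler` only reports Python-level frames, so native CPython or Rust frames would still have to come from a core dump or a debugger.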

bogdandm on Oct 24, 2023

If I had to guess, the stripping is done as a linker argument via

    LDFLAGS="$(dpkg-buildflags --get LDFLAGS)"; \

davidhewitt on Oct 10, 2023

Hey @davidhewitt! Thanks for the comprehensive answer!

We are using the python Docker image. I don’t see where the Python build gets stripped, but this is what I see in the container:

/usr/local/bin/python3.12: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e1466a54058de9be791ef96f61c8e185388684eb, for GNU/Linux 3.2.0, stripped

Link to the Docker source: https://github.com/docker-library/python/blob/b7b91ef359a740a91caeabce414ce4ee70fd2b23/3.11/bookworm/Dockerfile#L44

I might try to build a custom Python with your suggested flags.

StasEvseev on Oct 10, 2023

In https://github.com/pydantic/pydantic-core/pull/922 I’ve gone through the unsafe code used in pydantic-core and either eliminated or justified each use.

davidhewitt on Aug 24, 2023