pydantic: Memory segfaults after V2 upgrade
Initial Checks
- I confirm that I’m using Pydantic V2
Description
Thanks for an amazing project! We have been using pydantic for a couple of years and it has become a standard building block of our codebase.
Everything seemed to work, but since we upgraded to v2 we have had problems with segfaults in our production environment.
We haven't found a way to reproduce it locally, so we can't provide more details than the logs from our production environment.
Our setup:
- Flask application (latest)
- gunicorn (latest) with 4 workers using gevent
- pydantic==2.2.1
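For anyone trying to reproduce this, the gunicorn part of the setup above roughly corresponds to a config file like the one below. This is a minimal sketch: only the worker count and the gevent worker class come from the report; the bind address is an assumption.

```python
# gunicorn.conf.py: hypothetical config approximating the reported setup.
# gunicorn executes this file as Python and reads these module-level names.

bind = "0.0.0.0:8000"    # assumed address, not stated in the report
workers = 4              # "4 workers", as in the report
worker_class = "gevent"  # gevent workers (requires the gevent package)
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a placeholder for the Flask application object.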
These are the segfaults we are seeing in production:
segfault at 0 ip 000078b1d0ff6f1c sp 00007ffe971366c0 error 6 in libpython3.11.so.1.0[78b1d0eed000+1bb000]
segfault at 100 ip 00007ddfb64fe349 sp 00007ffc351395b0 error 4 in _pydantic_core.cpython-311-x86_64-linux-gnu.so
segfault at e4 ip 00000000000000e4 sp 00007ffc30d59a58 error 14
Example Code
No response
Python, Pydantic & OS Version
/usr/src/app# python -c "import pydantic.version; print(pydantic.version.version_info())"
pydantic version: 2.2.1
pydantic-core version: 2.6.1
pydantic-core build: profile=release pgo=true
install path: /usr/local/lib/python3.11/site-packages/pydantic
python version: 3.11.3 (main, May 23 2023, 13:34:03) [GCC 10.2.1 20210110]
platform: Linux-5.15.49-linuxkit-x86_64-with-glibc2.31
optional deps. installed: ['email-validator', 'typing-extensions']
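Because the crash was later isolated to PyO3 + gevent, a report from this kind of setup is more useful with the gevent and greenlet versions next to the pydantic ones. A minimal sketch using only the standard library (`importlib.metadata`); the package list and the `collect_versions` helper are illustrative choices, not part of pydantic.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages involved in the reported setup; gevent and greenlet are included
# because the crash was isolated to PyO3 + gevent.
PACKAGES = ["pydantic", "pydantic-core", "gevent", "greenlet", "gunicorn", "flask"]


def collect_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    import platform

    print("python:", platform.python_version())
    for name, ver in collect_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```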
Selected Assignee: @dmontagu
About this issue
- State: closed
- Created 10 months ago
- Reactions: 7
- Comments: 30 (17 by maintainers)
With the release of PyO3 0.21 now done and pydantic-core updated, I can no longer reproduce the crash on pydantic main. I will close this issue; hopefully people experiencing problems here can also confirm it's fixed with pydantic main. We will also release all of this soon as Pydantic 2.7!
Ok, some progress here: I can isolate the crash to just PyO3 + gevent, which I've documented in https://github.com/PyO3/pyo3/issues/3668. I will work to figure out next steps from here. We have at least one pathway to a solution (in the new PyO3 API), but maybe there are mitigations we can get across the ecosystem faster.
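Since the crash was isolated to PyO3 + gevent, and the closing comment says the fix would ship as Pydantic 2.7, a gevent deployment could warn at startup when it runs an older pydantic. A minimal sketch: the 2.7 threshold comes from that comment, while the helper names, the `sys.modules` check and the warning text are hypothetical choices.

```python
import re
import sys
import warnings
from importlib.metadata import PackageNotFoundError, version

# Threshold from the comment that the fix would be released as Pydantic 2.7.
FIXED_IN = (2, 7)


def pydantic_has_gevent_fix(ver):
    """Return True if a pydantic version string is >= FIXED_IN (major.minor only)."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return (int(match.group(1)), int(match.group(2))) >= FIXED_IN


def warn_if_affected():
    """Emit a RuntimeWarning when gevent is loaded with a pydantic below FIXED_IN."""
    if "gevent" not in sys.modules:
        return  # gevent not imported in this process; nothing to check
    try:
        installed = version("pydantic")
    except PackageNotFoundError:
        return  # pydantic not installed as a distribution
    if not pydantic_has_gevent_fix(installed):
        warnings.warn(
            f"pydantic {installed} is running under gevent and may hit the "
            "PyO3 + gevent segfault; the fix is expected in pydantic >= 2.7",
            RuntimeWarning,
        )
```

Comparing only major.minor keeps the parser tolerant of suffixes such as `2.7.0b1`, at the cost of treating pre-releases of 2.7 as fixed.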
Yep, I'm looking into this at present and hope to have some progress within a few weeks. Will keep you posted here.
We need to wait for the new PyO3 API / GIL pool changes. That's getting pretty close; you can check the progress in the PyO3 repo.
I was able to reproduce this error with @rafales’s example from #8392. Thanks so much @rafales, that’s really helpful.
@davidhewitt and I will do some further digging, specifically:
I contacted @davidhewitt and gave him all the logs I was able to collect from my project, so now all hope is that he will be able to figure it out 🙏🏻
To follow up on the current state of things: in PyO3 we felt that mitigations are probably impractical from a performance standpoint, so we are busy getting the new PyO3 API to a point where projects can migrate to it. This might still be a few weeks off, depending on review speed.
I was able to run valgrind on the pydantic-core test suite using a virtual environment on Ubuntu with the following command:
The contents of valgrind-output.txt suggested a couple of memory leaks, which look like globally cached strings, so they are not a relevant concern here. I'll follow up on those separately another time. Hopefully, if you can repeat the same thing but replace python -m pytest with your command which produces the repro under gdb, we will identify the cause of your crash. You can share any results with me confidentially over LinkedIn.
If you're getting a lot of messages, you might want to check whether you have /usr/lib/valgrind/python3.supp present; I understand this is needed due to Python's internal memory allocator.
@davidhewitt I haven't yet been able to figure out how to get more detailed logs or the usual "core dumped" error (until now I believed that was the default behavior, at least in our Docker environment). I already tried faulthandler.enable(), but it gives just the Python traceback, with no CPython or Rust frames. I'll probably try again a little later when I have more time to debug it.
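faulthandler only prints Python-level frames, but it can still show which Python code was running when a worker died, and writing to a dedicated file keeps that output from getting lost with stderr. Whether a "core dumped" file appears is a separate matter: it depends on the process's RLIMIT_CORE soft limit and, on Linux, on the kernel.core_pattern setting, which is host-wide under Docker. A minimal sketch, assuming the log path is writable; the function name and default path are hypothetical.

```python
import faulthandler
import resource


def enable_crash_diagnostics(log_path="/tmp/faulthandler.log"):
    """Dump Python tracebacks on fatal signals and allow core dumps.

    log_path is a hypothetical location. faulthandler writes to the file's
    descriptor, so the file must stay open; it is returned to the caller.
    """
    log_file = open(log_path, "a", encoding="utf-8")
    # all_threads=True dumps the Python stack of every thread, not only the current one
    faulthandler.enable(file=log_file, all_threads=True)

    # Raise the soft core-size limit to the hard limit (allowed without privileges).
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return log_file
```

Under gunicorn this could be called from a post_fork(server, worker) hook in the config file, so that each worker process sets it up for itself.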
If I had to guess, the stripping is done as a linker argument via
Hey @davidhewitt! Thanks for the comprehensive answer!
We are using the python Docker image. I don't see where the Python build got stripped, but this is what I see in the container:
Link to docker source https://github.com/docker-library/python/blob/b7b91ef359a740a91caeabce414ce4ee70fd2b23/3.11/bookworm/Dockerfile#L44.
I might try building a custom Python with your suggested flags.
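To check whether a given binary in the container was stripped without relying on `file` or `readelf` being installed: a fully stripped ELF file keeps `.dynsym` but has no `.symtab` section. A minimal sketch that reads the section-header table of a 64-bit ELF file; it is simplified (no extended section numbering), and the helper names are hypothetical.

```python
import struct
import sys

ELF_HEADER = "16sHHIQQQIHHHHHH"  # Elf64_Ehdr layout (64 bytes)
SECTION_HEADER = "IIQQQQIIQQ"    # Elf64_Shdr layout (64 bytes)


def elf_section_names(path):
    """Return the section names of a 64-bit ELF file.

    Simplified: ignores extended section numbering (SHN_XINDEX), which only
    matters for files with tens of thousands of sections.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2:  # EI_CLASS 2 == ELFCLASS64
        raise ValueError(f"{path} is not a 64-bit ELF file")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA 1 == little-endian
    header = struct.unpack_from(endian + ELF_HEADER, data, 0)
    shoff, shentsize, shnum, shstrndx = header[6], header[11], header[12], header[13]
    if shoff == 0 or shnum == 0:
        return []  # no section header table at all

    def section(index):
        return struct.unpack_from(endian + SECTION_HEADER, data, shoff + index * shentsize)

    strtab_offset = section(shstrndx)[4]  # sh_offset of the section-name string table
    names = []
    for i in range(shnum):
        start = strtab_offset + section(i)[0]  # sh_name is an offset into that table
        names.append(data[start:data.index(b"\0", start)].decode())
    return names


def is_stripped(path):
    """True if the file has no .symtab, i.e. its full symbol table was stripped."""
    return ".symtab" not in elf_section_names(path)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    print(target, "is stripped" if is_stripped(target) else "has .symtab")
```

Saved as a hypothetical check_stripped.py, it could be pointed at the shared library named in the segfault log, for example `python check_stripped.py /usr/local/lib/libpython3.11.so.1.0` (the /usr/local/lib location is an assumption based on the install path shown earlier).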
In https://github.com/pydantic/pydantic-core/pull/922 I've gone through the unsafe code used in pydantic-core and either eliminated or justified each use.