cysimdjson: Memory leak
I am observing a memory leak.
Part of the code:
metadata_parser, gamedata_parser = cysimdjson.JSONParser(), cysimdjson.JSONParser()
with lz4.frame.open(filepath) as file:
    for line in file:
        idx, metadata, gamedata = line.rstrip(b'\n').split(chr(31).encode())
        metadata, gamedata = metadata_parser.parse(metadata), gamedata_parser.parse(gamedata)
        for key, value in gamedata.at_pointer('/0/common').items():
            if key not in test_data['common']:
                test_data['common'][key] = []
            value_type = str(type(value))
            if value_type not in test_data['common'][key]:
                test_data['common'][key].append(value_type)
        for _, player in gamedata.at_pointer('/1').items():
            for key, value in player.items():
                if key not in test_data['player']:
                    test_data['player'][key] = []
                value_type = str(type(value))
                if value_type not in test_data['player'][key]:
                    test_data['player'][key].append(value_type)
        for _, vehicles in gamedata.at_pointer('/0/vehicles').items():
            vehicles = [vehicles] if isinstance(vehicles, dict) else vehicles
            for vehicle in vehicles:
                if isinstance(vehicle, str):
                    continue
                for key, value in vehicle.items():
                    if key not in test_data['vehicle']:
                        test_data['vehicle'][key] = []
                    value_type = str(type(value))
                    if value_type not in test_data['vehicle'][key]:
                        test_data['vehicle'][key].append(value_type)
I am analyzing a large data dump (over 100 GB), and the memory leak prevents the process from completing successfully. The leak appears to be somewhere on the C side of the extension, since profiling the Python part didn’t show anything. I followed the first manual and ran valgrind:
valgrind log
I can provide more information, just tell me what and how ))
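To confirm that memory really grows with the number of processed lines, something like the following sketch could wrap the line iterator and report the peak resident set size periodically. The helper names `peak_rss_mib` and `watch_memory` are made up for illustration and are not part of cysimdjson or lz4; the `resource` module is only available on Unix-like systems.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def watch_memory(iterable, every=100_000, report=print):
    """Yield items unchanged, calling `report` with the peak RSS every `every` items."""
    for count, item in enumerate(iterable, start=1):
        if count % every == 0:
            report(f"{count} items, peak RSS {peak_rss_mib():.1f} MiB")
        yield item
```

In the loop above this would be used as `for line in watch_memory(file):`. A peak that keeps climbing roughly linearly with the item count, while Python-level profiling stays flat, is consistent with a leak inside the extension.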
About this issue
- State: closed
- Created a year ago
- Comments: 18 (15 by maintainers)
To explain what’s happening here: `object` and `PyObject*` ultimately represent the same thing, but are handled in different ways. `object` is ref counted, `PyObject*` is not. Casting to `<object>` is generally wrong, and needs close attention. `yield <object> string_view_to_python_string(sv)` looks like an immediate red flag to me, so let’s expand it.

Casting the returned value `v` (a `PyObject*`) to `object` creates a new reference to it and increments the ref count. When Python is done with the object returned from `keys()`, it decrements the count by 1 and… nothing happens, because the ref count is still 1.

Ultimately, this is just because the signature of `string_view_to_python_string` is wrong: it returns an owned object, not a borrowed one. Declaring the return type as `PyObject*` tells Cython that this method returns a borrowed reference, while declaring it as `object` tells Cython that it returns an “owned” reference.
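The effect of one unmatched increment can be reproduced in plain Python. This is only an illustrative sketch for CPython, not the cysimdjson code: `Payload`, `make_owned`, `leaky_use` and `correct_use` are hypothetical names, and `ctypes.pythonapi.Py_IncRef` stands in for the extra reference that the `<object>` cast takes on a pointer that already carries ownership.

```python
import ctypes

# Py_IncRef / Py_DecRef are the function forms of Py_INCREF / Py_DECREF (CPython only).
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


class Payload:
    """Stand-in for the Python string created on the C side."""
    alive = 0  # number of instances that have not been deallocated yet

    def __init__(self):
        Payload.alive += 1

    def __del__(self):
        Payload.alive -= 1


def make_owned():
    """Hypothetical stand-in for a C function that returns a new (owned) reference."""
    return Payload()


def leaky_use():
    obj = make_owned()  # ref count is 1
    _incref(obj)        # extra reference, as with the <object> cast: ref count is 2
    # obj goes out of scope: ref count drops to 1, so the object is never freed


def correct_use():
    obj = make_owned()  # ref count is 1
    _incref(obj)        # ref count is 2
    _decref(obj)        # balanced again: ref count is 1
    # obj goes out of scope: ref count drops to 0 and the object is freed
```

After calling `leaky_use()`, `Payload.alive` stays one higher than before, while `correct_use()` leaves it unchanged. In the parser the same thing happens once per yielded key, which adds up quickly on a large input.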
I will issue a PR unless you get to it first.
The fix to this should be trivial. Try this @lemire: