htmldate: Memory leak
See issue https://github.com/adbar/trafilatura/issues/216.
Extracting the date from the same web page multiple times shows that the module is leaking memory. This doesn't appear to be related to extensive_search:
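```python
import os

import psutil
from htmldate import find_date

# Read the same test page once
with open('test.html', 'rb') as inputf:
    html = inputf.read()

process = psutil.Process(os.getpid())
for i in range(10):
    # Repeated extraction on identical input should not keep growing memory
    result = find_date(html, extensive_search=False)
    # Resident set size of the current process, in MiB
    print(i, ":", process.memory_info().rss / 1024 ** 2)
```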
tracemalloc doesn't give any clue about where the memory goes.
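A possible reason tracemalloc stays silent: it only traces memory allocated through Python's own allocators. Memory that a C extension requests directly from malloc (for instance libxml2, the library behind lxml, which htmldate uses for parsing) never shows up in its snapshots. When a leak does involve Python objects, comparing two snapshots taken around repeated calls can point at the line responsible. The sketch below is a minimal illustration under assumptions: `workload`, `_leaky_store` and `top_growth` are hypothetical names standing in for find_date and whatever state survives between calls, not htmldate code.

```python
import tracemalloc

# Hypothetical: simulates state that survives between calls
_leaky_store = []


def workload(data: bytes) -> int:
    # Hypothetical stand-in for find_date(): keeps a copy of its input alive on every call
    _leaky_store.append(bytearray(data))
    return len(data)


def top_growth(func, data, repeats=10, limit=3):
    """Return the source lines whose traced allocations grew most across repeated calls."""
    tracemalloc.start()
    func(data)  # warm-up so one-time setup is not counted as growth
    before = tracemalloc.take_snapshot()
    for _ in range(repeats):
        func(data)
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Differences grouped by file and line number, largest growth first
    stats = after.compare_to(before, "lineno")
    return stats[:limit]


for stat in top_growth(workload, b"x" * 50_000):
    print(stat)
```

If the real culprit lives in C-level memory, this kind of diff may still come back nearly empty, which is consistent with what the reporter saw but does not by itself prove where the leak is.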
About this issue
- State: closed
- Created 2 years ago
- Comments: 21 (8 by maintainers)
Commits related to this issue
- fix: memory leak in lru_cache (#56) — committed to adbar/htmldate by adbar 2 years ago
- fix: remove additional lru_cache (#56) — committed to adbar/htmldate by adbar 2 years ago
Using master, the memory increase is acceptable now, good job. Can't wait for the new release.
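The fix commits mention lru_cache. One general way such a cache can leak is sketched here, without claiming it is exactly what happened in htmldate: functools.lru_cache keeps a strong reference to every argument it stores, and objects that hash by identity (as most objects without a custom __eq__ do) never produce a cache hit when a fresh one is built on each call. With maxsize=None, every call then adds an entry that keeps its input alive. `Tree`, `unbounded_lookup` and `bounded_lookup` are hypothetical names used for illustration.

```python
from functools import lru_cache


class Tree:
    """Hypothetical stand-in for a parsed document; hashes by identity."""

    def __init__(self, payload: bytes) -> None:
        # Copy the input, as a parser builds new memory on every call
        self.payload = bytearray(payload)


@lru_cache(maxsize=None)  # unbounded: every distinct key stays alive
def unbounded_lookup(tree: Tree) -> int:
    return len(tree.payload)


@lru_cache(maxsize=8)  # bounded: least recently used entries are evicted
def bounded_lookup(tree: Tree) -> int:
    return len(tree.payload)


html = b"<html>" + b"x" * 100_000 + b"</html>"
for _ in range(100):
    tree = Tree(html)  # same page, but a new object each time, so always a miss
    unbounded_lookup(tree)
    bounded_lookup(tree)

print(unbounded_lookup.cache_info().currsize)  # 100 trees kept alive
print(bounded_lookup.cache_info().currsize)  # 8
```

Calling unbounded_lookup.cache_clear() releases the stored entries; bounding maxsize, or not caching on per-call objects at all, avoids the growth in the first place.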