htmldate: Memory leak

See issue https://github.com/adbar/trafilatura/issues/216.

Extracting the date from the same web page multiple times shows that the module is leaking memory. This doesn’t appear to be related to extensive_search:

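import os

import psutil
from htmldate import find_date

# Load the page once so every iteration parses the same input.
with open('test.html', 'rb') as inputf:
    html = inputf.read()

# Handle to the current process, created once and reused in the loop.
process = psutil.Process(os.getpid())

for i in range(10):
    result = find_date(html, extensive_search=False)
    # Resident set size in MiB after each call.
    print(i, ":", process.memory_info().rss / 1024 ** 2)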

tracemalloc doesn’t give any clue.
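
For reference, tracemalloc only traces memory blocks allocated through Python's own allocators, so memory held directly by a C library would not show up in its snapshots. Whether that is what happens here is an assumption, not something the report confirms. A minimal sketch of the snapshot comparison follows; the helper name python_heap_growth and the leaky stand-in function are hypothetical and only there to make the example self-contained:

```python
import tracemalloc


def python_heap_growth(func, repeats=10, top=5):
    """Call func() repeatedly and report growth of Python-tracked memory.

    Returns (net_bytes, biggest_diffs). Only allocations made through
    Python's allocators are visible to tracemalloc.
    """
    # Hide tracemalloc's own bookkeeping from the comparison.
    own = [tracemalloc.Filter(False, tracemalloc.__file__)]
    tracemalloc.start()
    try:
        func()  # warm-up call so one-time caches are not counted as growth
        before = tracemalloc.take_snapshot().filter_traces(own)
        for _ in range(repeats):
            func()
        after = tracemalloc.take_snapshot().filter_traces(own)
    finally:
        tracemalloc.stop()
    stats = after.compare_to(before, "lineno")
    return sum(stat.size_diff for stat in stats), stats[:top]


# Hypothetical stand-in that leaks about 100 kB of Python memory per call.
_leak = []


def leaky():
    _leak.append(bytearray(100_000))


growth, biggest = python_heap_growth(leaky, repeats=5)
print(f"Python-tracked growth: {growth / 1024:.0f} KiB")
for stat in biggest:
    print(stat)
```

With the reproduction above, the call would be python_heap_growth(lambda: find_date(html, extensive_search=False)). If the reported Python-level growth stays small while the RSS printed by psutil keeps rising, the extra memory is probably being held outside the Python heap.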

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (8 by maintainers)

Most upvoted comments

Using master, the increase is acceptable now. Good job, can’t wait for the new release.