nodemcu-firmware: TLS Module Memory Leak?

Expected behavior

When NodeMCU modules are used correctly with LFS and the overlay table method used by @nwf to load flash components, the components do not “stick” in RAM, even when they are called again over time. As a result, RAM is consistently released while the Lua application runs over an extended period.

Actual behavior

RAM is released as expected for every NodeMCU module except the tls module. After every use of the tls module, the available RAM decreases by about 2 KB.
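One way to put a number on the per-use loss is to sample the free heap after each TLS request and average the drop. The sketch below is illustrative only: `record`, `avg_drop` and `sample_now` are hypothetical helper names that are not part of the original test code, and it assumes the heap is sampled right after a forced `collectgarbage()` so that collectable garbage does not distort the trend.

```lua
-- Hypothetical helpers, not part of the original test code.
local samples = {}

-- Append one free-heap reading. The value is passed in so that the
-- arithmetic below can also be exercised off-device.
local function record(heap)
  samples[#samples + 1] = heap
end

-- Average number of bytes lost between consecutive samples.
local function avg_drop(list)
  if #list < 2 then return 0 end
  return (list[1] - list[#list]) / (#list - 1)
end

-- On the device, call this after each TLS request has completed.
local function sample_now()
  collectgarbage()      -- run a full collection first
  record(node.heap())   -- node.heap() reports the free heap in bytes
end
```

After several requests, `print(avg_drop(samples))` gives the average loss per request, which can be compared against the roughly 2 KB reported here.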

Test code

The nodemcu-firmware/lua_examples/lfs/HTTP_OTA.lua from @TerryE has been modified by replacing local con = net.createConnection(net.TCP,0) with local con = tls.createConnection().
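To compare the two variants side by side, the one-line change can be wrapped in a small switch. This is only a sketch of the substitution described above: `make_connection` and its `use_tls` flag are hypothetical names, and the rest of HTTP_OTA.lua is assumed to be unchanged.

```lua
-- Hypothetical wrapper around the single line that was changed.
-- use_tls = true  -> the modified example (tls module)
-- use_tls = false -> the original example (net module)
local function make_connection(use_tls)
  if use_tls then
    return tls.createConnection()
  end
  return net.createConnection(net.TCP, 0)
end
```

Running the same request loop with `make_connection(false)` and then `make_connection(true)` keeps everything else identical, so any difference in heap behavior can be attributed to the connection type.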

The example does not appear to show this problem when it is used without the tls functionality. This strengthens the suspicion that it might be related to a memory leak in the tls module.

NodeMCU startup banner

NodeMCU 3.0.0.0 built with Docker provided by frightanic.com
    branch: dev
    commit: c116d9d25f5d15e2b1d0d52ff4c8ebdfd18f75c3
    release: 2.0.0-master_20170202 +477
    release DTS: 202003062124
    SSL: true
    build type: float
    LFS: 0x20000 bytes total capacity
    modules: adxl345,bit,cron,crypto,file,gpio,http,i2c,net,node,ow,rtcmem,rtctime,sjson,sntp,softuart,tls,tmr,uart,wifi
build 2020-03-09 11:52 powered by Lua 5.1.4 on SDK 3.0.1-dev(fce080e)

Hardware

Standard ESP-12 is used.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Thanks for your detailed feedback and explanation.

My code “grew” a bit from the nodemcu-firmware/lua_examples/lfs/HTTP_OTA.lua example that I used as a starting point. Hence the unnecessary additions, which accumulated during my attempts to isolate the “leak” since I first mentioned this problem about a year ago. I will clean up and get rid of all the “stuff” that is not doing anything.

I understand and get it 👍. As explained in #3062, the fundamental problem, brought to the surface by the lfs example code, exposes a quirk of our APIs related to C objects: when callbacks are unregistered, the underlying C object can be left pinned to the registry, out of reach of the gc. I also need to take another look at upvalues for a better understanding.
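To make the callback and registry relationship concrete, here is a minimal sketch of dropping every callback from a socket once it is no longer needed. It is an illustration of the idea, not the fix discussed in #3062: `release` and `EVENTS` are hypothetical names, and the sketch assumes that passing `nil` to `on` removes a callback for tls sockets in the same way the net module documents it.

```lua
-- Event names as used by the net and tls socket APIs.
local EVENTS = { "connection", "reconnection", "disconnection", "receive", "sent" }

-- Hypothetical helper: clear every callback so that the closures (and
-- their upvalues) are no longer referenced through the socket.
-- Assumption: sock:on(event, nil) unregisters the callback.
local function release(sock)
  for _, ev in ipairs(EVENTS) do
    sock:on(ev, nil)
  end
end
```

A caller would typically invoke `release(con)` once the transfer is finished and then set its own reference to `nil`, so that neither Lua code nor a registered callback keeps the socket alive.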

I didn’t see any overt reference leaks, but I did see some dodgy corner-case behavior which I’ve now taken a stab at fixing. Could you try again with e38ce43dcfa5c3ad3f5a87e7db31e2edb03d6cfb from my for-upstream branch (which should now be part of #3060)?

While I’m flattered, I cannot take credit for the overlay table technique. As with so many things LFS, full credit is due to @TerryE.

Memory leaking, especially at that rate, doesn’t sound stellar… 2K is an awkward size to explain, too, since it’s too small to be the full TLS connection state object (with its associated buffers of much fame) but too large to be a LwIP structure or anything of that sort, I think. Would you mind repeating your experiment on top of #3060? I don’t expect that it will fix the problem, but there was some minor re-jiggling of tls (de)allocation associated with that.

Since this seems easy for you to reproduce, can you tell me if the Lua registry is growing as memory seeps away? Dumping something like for k,v in ipairs(debug.getregistry()) do print(k,v) end when you sample node.heap() is likely informative. (It’s not as simple as #debug.getregistry() because the registry has a free list plumbed through it, and so peaks of registered entities might obfuscate what’s going on.)
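The registry dump suggested above can be condensed into a per-type count, which is easier to compare between samples than a full listing. The following is a sketch under stated assumptions: `summarize` and `report` are hypothetical names, and the interpretation relies on stock Lua 5.1 behavior, where a slot freed by `luaL_unref` holds a number (the next free index) rather than `nil`.

```lua
-- Count the numerically keyed registry slots, grouped by value type.
-- Assumption (stock Lua 5.1): freed reference slots hold a number, so a
-- growing count of "function" or "userdata" values is more telling than
-- the total, which never shrinks after a peak.
local function summarize(reg)
  local total, by_type = 0, {}
  for k, v in pairs(reg) do
    if type(k) == "number" then
      local t = type(v)
      by_type[t] = (by_type[t] or 0) + 1
      total = total + 1
    end
  end
  return total, by_type
end

-- Print one sample: free heap plus the per-type slot counts.
local function report(reg, heap)
  local total, by_type = summarize(reg)
  print("heap", heap, "numeric registry slots", total)
  for t, n in pairs(by_type) do
    print("", t, n)
  end
end

-- On the device: report(debug.getregistry(), node.heap())
```

If the function or userdata counts climb in step with the heap loss, something is holding references in the registry; if they stay flat while the heap shrinks, the loss is more likely on the C side.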