nodemcu-firmware: mysterious crash "already freed"

Expected behavior

The Lua program does not mysteriously crash because a table “has been freed”.

Actual behavior

Lua program reports “3fff5168 already freed” (the address is arbitrary, of course) and crashes right afterwards when code tries to use a value of a table that has been freed already.

Test code

Provide a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) which will reproduce the problem. I know what you need but I cannot provide it, unfortunately. My program started crashing only recently, when it grew to a certain size. Even though it’s modularized, I am afraid that the crash is related to the total size.

Some examples that do crash:

local names = {}
dofile("names.lc").load(names)
local name = names[1]

The names table is somehow freed right after the dofile() call, so the subsequent names[1] access crashes. However, if I try to print the address of names (to see if it matches the address reported before the crash), it no longer crashes and the code continues to run perfectly well!

local names = {}
dofile("names.lc").load(names)
print(names)
local name = names[1]

Yes, one single print() added to the “right” place makes the crash go away.
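For contrast, here is a minimal sketch of what stock Lua semantics promise: a table that is still referenced by a live local must survive garbage collection. It uses a weak-valued table as an observer. The names probe and alive_while_referenced are hypothetical and only illustrate the expectation; the sketch does not reproduce the firmware bug.

```lua
-- Weak-valued table: it can observe an object without keeping it alive.
local probe = setmetatable({}, { __mode = "v" })

-- Hypothetical check: the table is referenced by a live local,
-- so a full GC cycle must not collect it.
local function alive_while_referenced()
  local names = {}
  probe.names = names
  collectgarbage()
  return probe.names == names
end

print(alive_while_referenced())  -- true on stock Lua

-- Once the function has returned, nothing strong references the table,
-- so a full cycle is allowed to clear the weak entry.
collectgarbage()
print(probe.names)  -- nil on stock Lua
```

If the first check ever failed on a device, that would point at the collector or the firmware rather than at the Lua program.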

Another example of the same issue:

local ok, rules = pcall(sjson.decode, data)
local x = rules[1]

Again it crashes because rules has been freed. But if I insert any print() there, it starts working correctly:

local ok, rules = pcall(sjson.decode, data)
print("Please do not crash!")
local x = rules[1]
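The snippets above ignore the ok flag returned by pcall, so a failed decode and a wrongly freed table could be confused with each other. A minimal sketch of a wrapper that makes decode failures explicit follows. safe_decode is a hypothetical helper, and the decoder is passed in as a parameter so the sketch runs without NodeMCU; on the device it would presumably be sjson.decode. It does not address the “already freed” crash itself.

```lua
-- Hypothetical helper: run any decoder under pcall and report failures
-- explicitly instead of indexing a possibly invalid result.
local function safe_decode(decode, data)
  local ok, result = pcall(decode, data)
  if not ok then
    -- On failure, pcall's second value is the error object.
    return nil, "decode failed: " .. tostring(result)
  end
  if type(result) ~= "table" then
    return nil, "decoder returned " .. type(result)
  end
  return result
end

-- Usage with a stand-in decoder (assumption: replace with sjson.decode on NodeMCU).
local rules, err = safe_decode(function(s) return { s } end, "rule-1")
if rules then
  print(rules[1])
else
  print(err)
end
```

With the decode outcome checked, a crash on rules[1] can no longer be blamed on bad input data.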

I must say this makes developing with NodeMCU kinda nightmare-ish.

BTW, if I replace the print("Please do not crash") with print(node.heap()) it also “fixes” the issue and shows that there’s about 23 kB free on the heap.

Any idea why a Lua table gets freed at unexpected places for no apparent reason? And why inserting a random print() fixes it?

FWIW, replacing the “right” print() with collectgarbage() makes the crash disappear as well.
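The collectgarbage() workaround can be packaged as a small helper while the real cause is investigated. This is only a sketch: load_with_gc is a hypothetical name, the loader is passed in so the code runs on stock Lua, and forcing a collection sidesteps the symptom described here without explaining it.

```lua
-- Hypothetical helper: run a loader, then force a full GC cycle,
-- mirroring the collectgarbage() workaround described in this report.
local function load_with_gc(loader, target)
  local before = collectgarbage("count")  -- Lua heap in use, in KB
  loader(target)
  collectgarbage()                         -- the workaround itself
  local after = collectgarbage("count")
  return target, before, after
end

-- On the device the loader would be something like:
--   function(t) dofile("names.lc").load(t) end
local names = load_with_gc(function(t) t[1] = "example" end, {})
print(names[1])
```

The before and after figures are returned so that heap usage around the load can be logged next to any crash report.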

NodeMCU version

Current master.

Hardware

Stock ESP8266-07.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 30 (16 by maintainers)

Most upvoted comments

Re the FAQ, I wrote the first version about 3 years ago and it’s been through a few update cycles as I continue to learn the subtleties of Lua over the SDK. It is really too monolithic, and I feel that the strict review lifecycle of our release process doesn’t suit an FAQ. But LFS is going to represent a step change, as does the 5.3 platform, so I will synchronise any updates with the releases.

Getting there slowly. What I have done is to document my debug session (less all of the embarrassing blind alleys): Commented debug log.

I have just bashed the comments in so there are loads of typos in the commentary that need fixing. I also want to include the final reveal of the actual bug, so this needs adding to. But the important point here is that we just couldn’t do this sort of debugging 4 months ago, and having this sort of tool is pretty much essential for hunting down complex or subtle bugs.

Petr, to be honest, we’ve only just got to the point where we can sensibly debug this type of issue. I had to sort out a few gremlins that made the remote debugger unusable in practice in order to sort out the GC issues in LFS, but the upside is that this capability is now available for other issues.