nodemcu-firmware: HTTP module and SSL don't play nice on dev

When I happened to see another (or a new) HTTP & SSL issue the other day I went ā€œGrrr, why me again?ā€. Some of you may remember šŸ˜‰

TL;DR

Many HTTPS requests from the http module fail while connecting to the same resources with the net/TLS module usually succeeds.

Test code

function test(host, path)
  local url = "https://" .. host .. path;
  http.get(url, nil, function(code, data)
    if (code < 0) then
      print("HTTP request to " .. url .. " failed")
    else
      print("HTTP request to " .. url .. " succeeded")
    end
  end)
  local srv = tls.createConnection(net.TCP, 0)
  srv:on("receive", function(sck, c) print("net/TLS to " .. url .. " succeeded") end)
  srv:on("connection", function(sck, c)
    sck:send("GET " .. path .. " HTTP/1.1\r\nHost: " .. host .. "\r\nConnection: keep-alive\r\nAccept: */*\r\n\r\n")
  end)
  srv:connect(443, host)
end
test("raw.githubusercontent.com", "/espressif/esptool/master/MANIFEST.in")

Test result

net/TLS to https://raw.githubusercontent.com/espressif/esptool/master/MANIFEST.in succeeded
HTTP client: Connection timeout

Then I checked heap: 25312. That was suspiciously low, I started with ~44k, so I ran test("raw.githubusercontent.com", "/espressif/esptool/master/MANIFEST.in") again and got

E:M 528
E:M 272
HTTP client: Disconnected with error: 46
HTTP client: Connection timeout
HTTP client: Connection timeout

-> no successful feedback from net/TLS code anymore.

Does ā€œHTTP client: Disconnected with error: 46ā€ indicate that the client was still maintaining the previous (failed) connection which it tried to kill first? I have my doubts because I sometimes also see this when the test runs after a clean reboot.

I tested a few more URLs, each after a clean reboot with both http and net modules.

URL http net/TLS
https://raw.githubusercontent.com/espressif/esptool/master/MANIFEST.in āŒ āœ…
https://httpbin.org/ip āŒ āŒ no output at all, not even error
https://nodemcu-build.com āŒ āœ…
https://clients5.google.com/pagead/drt/dn/ āŒ āœ…

NodeMCU version

NodeMCU custom build by frightanic.com
	branch: dev
	commit: 5425adefff62f9ea2094e3e4581a79f1424e4433
	SSL: true
	modules: file,gpio,http,net,node,tmr,uart,wifi,tls
 build 	built on: 2017-01-06 19:39
 powered by Lua 5.1.4 on SDK 2.0.0(656edbf)

Hardware

NodeMCU devkit v2

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 1
  • Comments: 127 (64 by maintainers)

Most upvoted comments

I will put up a $500 bounty to get this fixed. This is really important to us. I have to believe that it is a serious issue for anyone who needs to send secure data.

I would contribute $$ to a bounty to get this fixed. Iā€™m also stuck on the 1.5.4.1 branch for my project due to this issue.

Alright, I finally found some time to throw at this. Using @djphoenixā€™s experimental tlswrap branch and a few tweaks on top (see https://github.com/djphoenix/nodemcu-firmware/tree/tlswrap and https://github.com/nwf/nodemcu-firmware/tree/dev-ssl), I was able to make progress!

I was testing HTTP using function hg(url) http.get(url, nil, function(...) print(...) end) end and TLS with

function sg(...)
  srv = tls.createConnection()
  srv:on("receive", function(sck, c) print(c) end)
  srv:on("disconnection", function() print("DISCON") end)
  srv:on("connection", function(sck, c) print("CONN") sck:send("GET / HTTP/1.1\r\nConnection: keep-alive\r\nAccept: */*\r\n\r\n") end)
  srv:connect(...)
  return srv
end

I observed the following results:

URL HTTP tls
https://blog.cloudflare.com HTTP response too long OK
https://pskreporter.info SSL bad message length OK
https://pskreporter.info/gen404 OK n/a
https://google.com SSL bad message length error on hangup
https://nodemcu-build.com SSL bad message length, crash OK

Getting ā€œhttps://raw.githubusercontent.com/espressif/esptool/master/MANIFEST.inā€ also works fine.

The error on hangup for google.com is really quite distressing:

> srv:close()
> HANDSHAKE ERROR: 7100
3fff8618 already freed
3fff2b48 already freed

It looks like TLS is behaving better, but not perfectly so. Despite my efforts to limit the maximum SSL segment size, it looks like that is not being respected by the HTTP module (via espconnā€™s wrapper of mbedtls). There are obviously some other problems lingering, too; it looks like tlswrapā€™s memory management is a little, uh, iffy.

@dtran123 is https working in 1.5.4.1?

//offtopic Would be ok if we create a Telegram group to share experiences with nodemcu, stay tuned with the last news, solve some problems, get interesting info about this projectā€¦ Is anyone interested in? Just like this post and I will edit with some link.

Telegram NodeMcu Group: @NodeMcu

Just to let everyone know that unfortunately the upgrade to SDK 2.1.0 did not fix this secure tcp socket issue. So I am still stuck with 1.5.4.1 branch till this issue is resolved as secure tcp is important for me.

Hey guys, I will also contribute to the bounty with a 50$ USD for anyone who can solve this issueā€¦though I am hoping that maybe the next SDK will address it. Refer to issue #1810.

@kirkryan If you need sunrise/sunset times why donā€™t you let the ESP calculate it itself? I have a module for this https://github.com/vsky279/nodemcu-firmware/blob/sun/app/modules/sun.c I did not try to pull it into dev but if there was interest I could prepare a PR.

I have encountered the same problem - http requests go through just fine, but https do not. Sorry itā€™s not helpful, I just want to bump this thread and to let everyone know itā€™s still a problem. I use the 2.1.0 build.

Wow @Jonathan411 thatā€™s quite a generous bounty! I canā€™t afford that much right now, but Iā€™d be willing to chip in 0.035 BTC (~$40 USD) to the developer that gets a PR merged that fixes this issue.

@djphoenix Have you found something interesting?

  • NodeMCU 2.1.0 built with Docker provided by frightanic.com
  • branch: dev
  • SSL: true

If the browser accesses

https://api.telegram.org/bot446398269:AAF7jqBWZhj9oJlWT2f8mCS8_X5_LxJGLRA/sendMessage?chat_id=211698604&text=Hello!

then the response comes quickly.

If the request from NodeMCU

http.get(text_req, nil, function(code, data)
                if (code < 0) then 
                    print("telegram bot failed")
                    print(code, data) 
                else 
                    print(code, data) 
                end
            end) 

then

HTTP client: hostname=api.telegram.org HTTP client: port=443 HTTP client: method=GET HTTP client: path=/bot446398269:AAF7jqBWZhj9oJlWT2f8mCS8_X5_LxJGLRA/sendMessage?chat_id=211698604&text=Hello! HTTP client: DNS request HTTP client: DNS pending HTTP client: DNS found api.telegram.org 149.154.167.220 HTTP client: Connection timeout HTTP client: Calling disconnect HTTP client: manually Calling disconnect callback due to error -12 HTTP client: Disconnected

How to search for this error?

Hello everyone. Thanks for the tests of my dev-ssl branch; Iā€™m glad to see things are better.

Iā€™m currently away from my ESP8266s, but real soon now Iā€™m going to push another version that bumps to mbedtls 2.6.1 and so should fix some CVEs in our TLS layer. Iā€™ll carve out PRs for

  • the mbedtls updates and some improvements to mbedtls debugging
  • the bump of MBEDTLS_SSL_MAX_CONTENT_LEN to 5120
  • general tidying around the tree

Those should be non-controversial and, as @dtran123 indicates, are likely to improve matters sufficiently for most peopleā€™s use cases. That leaves open

  • the possibility of dynamically setting MBEDTLS_SSL_MAX_CONTENT_LEN
  • the requisite work of integrating rtctime/sntp support into mbedtls to better support certificate verification.

and, of course, a set of scripts people can run to test TLS functionality in isolation. Itā€™d be nice to have a list of services people are likely to care about (several of which are mentioned in this issue) and tests for each in various configurations (esp. varying values of MBEDTLS_SSL_MAX_CONTENT_LEN).

These last three are probably non-blocking.

  • People can do their own testing as it stands.
  • It should be OK to disable timestamp verification, as most people probably know what certs, or at least, root certs, they are expecting to see, if they turn on verification. (We should probably scream if they leave it off, thoā€™, as thereā€™s minimal utility in such an unverified TLS connection.)
  • 5120 is probably a more generally tolerable default than we have now and I donā€™t think Iā€™d want it to be much bigger anyway.

Happy to report that setting MBEDTLS_SSL_MAX_CONTENT_LEN = 5120 did the trick !!! I now can connect to IFTTT as well as mnubo.com API. Awesome !

I feel like I will have more fun now with the ESP8266. I will definitely keep an eye on this type of error when secured connection fail. I agree with @nwf that there should be a log when we have busted the limit.

In my opinion, if there was a way to change this constant on the fly (or taking effect after a reboot), it will allow people to allocate enough buffer size as per their specific needs. A note in the documentation would indicate the inherent tradeoff by doing that with reduced memory for the lua app.

@kirkryan If you are equipped to build your own firmware images, you can try my experimental branch as mentioned in #2049 and peer into the TLS state machine. It is possible that the bumped version of mbedtls will solve the problem for you, but do note that since @djphoenix did his import of the later version, CVEs have been found in mbedtls which, I think, impact nodemcuā€™s usage of it. Nobody, so far as I know, has attempted further updates.

I have not had much time, but I spent a(n) (un)pleasant evening digging through debug messages and TLS packets. If anyone wants to follow along, my tree is up as commit nwf/nodemcu-firmware@d7f83d3305225675c5543c9a4055b165362b61d6 and the several proceeding. Some of those may be worth carving out as separate PRs, especially the ā€œ-Wall reductionā€ one. (It is merely a reduction; the tree is not -Wall clean afterwards.) Please expect this tree to be ephemeral and rewritten as I find time.

ETA: The reason for all the munging about with clocks is that I am now able to verify the timestamp on my test ECDSA self-signed cert. Thatā€™s a kind of progress, I guess. Itā€™s still fragile, thoā€™.

@nwf Were you able to make progress this past week ? Also, would it be possible to send us a link to the latest custom build so we can try it out. thx.

@djphoenix Hi! Yeah, I see that. I think there is utility in keeping TLSā€™s utilization as small as possible, but perhaps the docs should note that the implementation in nodemcu is not fully spec conforming (by default, unless one raises the MBEDTLS_SSL_MAX_CONTENT_LEN) and should, therefore, not be assumed to be able to communicate with arbitrary servers. Fortunately, it doesnā€™t seem like nodemcu is often asked to fetch from servers unknown to the author, so perhaps this is an acceptable compromise. Perhaps a maximum fragment length overrun is also cause to scream a little louder on the console?

In any case, I will throw some more time at this next week; ideally testing will become automated and will grow to include MQTT (at least, against my own broker) and cert verification. šŸ˜ƒ

Given the memory handling errors in tlswrap, I do not think it should be merged as is. I will attempt to understand and fix it, unless @djphoenix beats me to it. šŸ˜ƒ

ETA: That said, the motion to the newer mbedtls might be sufficiently orthogonal that it could be merged independently. When I get a minute Iā€™ll test with that configuration too.

I have had a reply from Espressif - the bug I hit is known, and they recommend using the mbedTLS library instead of SSL. Apparently itā€™s a drop in replacement for the ESP functions - remove libssl from your list and add mbedtls

http://www.espressif.com/en/support/download/sdks-demos

In all of that letā€™s not forget that actually verifying the server certificate is optional in establishing a connection to the server. Itā€™s an important security aspect but optional. So, I suggest we stick with the old ā€œmake it run, then make it secure, make it fast, and make it prettyā€ mantra šŸ˜‰ As for SNI, yes axTLS doesnā€™t support SNI but since Espressif switched to mbed TLS one should expect this to work fine.

So I have something working now (for tls module only). And I finally resolved why httpbin.org cannot be loaded - it doesnā€™t work at all without SNI. A new module have fix for this.

For early pre-alpha source see my tlswrap branch

@jmattsson there are debugging flags in user_mbedtls.h so you can enable each log level that you want. Canā€™t explain it carefully right now. So I suspect there are fatal architecture issues in http module because itā€™s ā€œdonā€™t play niceā€ for plain requests too. I can debug it carefully some later (maybe in this weekend).