cbor2: unexpected exceptions raised while parsing untrusted inputs using cbor2.loads
Things to check first
- I have searched the existing issues and didn’t find my bug already reported there
- I have checked that my bug is still present in the latest release
cbor2 version
5.5.1
Python version
3.10.12
What happened?
I have a script that parses untrusted data with the cbor2.loads method. The script tries to verify whether the provided data is CBOR-encoded. The implementation was as follows:
import cbor2

try:
    cbor2.loads(b'\x959;{{{{{{{{{{{{{')
except cbor2.CBORDecodeError:
    print('no cbor encoded')
For some inputs, I noticed that MemoryError is raised instead of CBORDecodeError.
To better understand the problem, and to check whether this is an isolated case when parsing untrusted data, I ran a fuzzer against the cbor2.loads method.
It seems that cbor2.loads does not handle untrusted data robustly: in the worst case cbor2 tries to allocate all available memory (see the MemoryError case in the code above).
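One plausible reading of the MemoryError input, based on the CBOR head encoding in RFC 8949 rather than on a confirmed root cause in cbor2, is that the payload declares a text string whose 8-byte length field is far larger than the data that follows. The sketch below decodes the first three heads by hand; `read_head` is a hypothetical helper written for this illustration, not part of cbor2.

```python
def read_head(data, offset=0):
    """Return (major_type, argument, next_offset) for the CBOR head at offset."""
    if offset >= len(data):
        raise ValueError("no data left for a head")
    initial = data[offset]
    major, info = initial >> 5, initial & 0x1F
    offset += 1
    if info < 24:
        # Small arguments are stored directly in the initial byte.
        return major, info, offset
    if info > 27:
        raise ValueError(f"reserved or indefinite additional info {info}")
    width = 1 << (info - 24)  # info 24, 25, 26, 27 -> 1, 2, 4, 8 bytes
    end = offset + width
    if end > len(data):
        raise ValueError("truncated head")
    return major, int.from_bytes(data[offset:end], "big"), end


data = b'\x959;{{{{{{{{{{{{{'
array_major, count, pos = read_head(data)        # major type 4: array header
int_major, arg, pos = read_head(data, pos)       # major type 1: negative integer
str_major, length, pos = read_head(data, pos)    # major type 3: text string header
remaining = len(data) - pos

print(f"array of {count} items, text string declares {length} bytes, "
      f"{remaining} bytes actually remain")
```

A decoder that allocates a buffer of the declared length before checking how many bytes remain would attempt a multi-exabyte allocation here, which would be consistent with the MemoryError reported above.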
I was able to find the following exceptions raised by cbor2.loads (all reproduced using cbor2 5.5.1 / Python 3.10.12 / Ubuntu 20.4):
# OverflowError: timestamp out of range for platform time_t
cbor2.loads(b'\xc1\x1b\x9b\x9b\x9b\x00\x00\x00\x00\x00')
# OSError: [Errno 75] Value too large for defined data type
cbor2.loads(b'\xc1\x1b\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16')
# MemoryError:
cbor2.loads(b'\x959;{{{{{{{{{{{{{')
# TypeError: object of type 'int' has no len()
cbor2.loads(b'\xd8%\x00\x10`\x00\x00\x00`\x10\x00\x00\x00\x00\x00\x00')
# SystemError: <built-in function loads> returned NULL without setting an error
cbor2.loads(b'\xd8\x1e\x84\xff\xff\xff\xff')
# re.error: unbalanced parenthesis at position 0
cbor2.loads(b'\xd8#A)')
# UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
cbor2.loads(b'b\n\xff')
I tried to analyze how this could be improved, but it is not an easy task for somebody who does not maintain this code. Is it possible to improve it somehow?
The expected and ideal solution would be for CBORDecodeError to be raised for any invalid CBOR input.
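Until a decoder guarantees a single exception type, a caller can approximate that behaviour with a thin wrapper. The following is a minimal sketch under stated assumptions: `UntrustedInputError`, `loads_untrusted` and the `max_size` limit are hypothetical names introduced here, and the decoder is passed in as a callable so the wrapper does not depend on any particular library.

```python
class UntrustedInputError(ValueError):
    """Hypothetical application-level error: the payload could not be decoded."""


def loads_untrusted(loads, data, max_size=1 << 20):
    """Decode data with the given loads callable, normalizing every failure.

    max_size is an assumed application limit on the raw payload size; it does
    not bound what the decoder itself may try to allocate.
    """
    if len(data) > max_size:
        raise UntrustedInputError(f"payload of {len(data)} bytes exceeds limit")
    try:
        return loads(data)
    except Exception as exc:
        # Covers MemoryError, OverflowError, OSError, TypeError, SystemError,
        # re.error and UnicodeDecodeError, all of which subclass Exception.
        raise UntrustedInputError(f"{type(exc).__name__}: {exc}") from exc
```

With cbor2 installed, a call would look like `loads_untrusted(cbor2.loads, payload)`. This only normalizes Python-level exceptions: it cannot prevent a large allocation from being attempted, and it offers no protection against crashes inside a C extension.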
How can we reproduce the bug?
Code to reproduce the exceptions mentioned above (each call raises, so run them one at a time):
import cbor2
cbor2.loads(b'\xc1\x1b\x9b\x9b\x9b\x00\x00\x00\x00\x00')
cbor2.loads(b'\xc1\x1b\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16')
cbor2.loads(b'\x959;{{{{{{{{{{{{{')
cbor2.loads(b'\xd8%\x00\x10`\x00\x00\x00`\x10\x00\x00\x00\x00\x00\x00')
cbor2.loads(b'\xd8\x1e\x84\xff\xff\xff\xff')
cbor2.loads(b'\xd8#A)')
cbor2.loads(b'b\n\xff')
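Because each of those calls raises, a script containing all seven stops at the first one. A small replay loop, sketched here with a hypothetical `replay` helper that takes the decoder as a callable, records the exception type for every input instead.

```python
# The seven inputs from the report, copied verbatim.
REPRODUCERS = [
    b'\xc1\x1b\x9b\x9b\x9b\x00\x00\x00\x00\x00',
    b'\xc1\x1b\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16\x16',
    b'\x959;{{{{{{{{{{{{{',
    b'\xd8%\x00\x10`\x00\x00\x00`\x10\x00\x00\x00\x00\x00\x00',
    b'\xd8\x1e\x84\xff\xff\xff\xff',
    b'\xd8#A)',
    b'b\n\xff',
]


def replay(decode, inputs):
    """Run decode on each input and return (data, exception_name, message) tuples.

    exception_name is None when decoding succeeded.
    """
    results = []
    for data in inputs:
        try:
            decode(data)
        except Exception as exc:
            results.append((data, type(exc).__name__, str(exc)))
        else:
            results.append((data, None, ""))
    return results


def report(decode, inputs=REPRODUCERS):
    """Print one line per input, showing which exception (if any) escaped."""
    for data, name, message in replay(decode, inputs):
        print(f"{data.hex()}: {name or 'ok'} {message}")
```

With cbor2 installed, `report(cbor2.loads)` would print one line per input; the exact exception types depend on the installed version.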
cbor2 was tested using the Atheris fuzzer and the following code:
import sys
import atheris
import pprint

with atheris.instrument_imports():
    import cbor2

EXCEPTIONS = {}
pp = pprint.PrettyPrinter(indent=4)


def fuzz_cbor2(data):
    try:
        cbor2.loads(data)
    except cbor2.CBORError:
        # CBORError is expected for some data
        pass
    except Exception as e:
        if type(e) not in EXCEPTIONS:
            EXCEPTIONS[type(e)] = data.hex()
            print(f"Found new exception {e}")
            print("************** status *************")
            pp.pprint(EXCEPTIONS)


if __name__ == "__main__":
    atheris.Setup(sys.argv, fuzz_cbor2)
    atheris.Fuzz()
About this issue
- State: closed
- Created 6 months ago
- Comments: 27 (27 by maintainers)
I’ve fixed the problems originally reported here. I believe that, to fix all the problems thoroughly, a rewrite would be needed, but I don’t have the bandwidth for that, and I have to draw the line somewhere in order to move on to other projects. I’ve released v5.6.0 which contains these fixes.
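To check whether an installed copy predates the v5.6.0 release mentioned above, a script can compare the installed version against that number. This is a minimal sketch: `parse_release`, `is_pre_fix` and `installed_cbor2_is_pre_fix` are hypothetical helpers, and the parsing only handles plain dotted release numbers with an optional suffix such as `rc1`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text, width=3):
    """Turn '5.5.1' into (5, 5, 1), ignoring any non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '5.6' compares equal to '5.6.0'.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def is_pre_fix(text, fixed=(5, 6, 0)):
    """Return True if the version string is older than the fixed release."""
    return parse_release(text) < fixed


def installed_cbor2_is_pre_fix():
    """Return True/False for the installed cbor2, or None if it is not installed."""
    try:
        return is_pre_fix(version("cbor2"))
    except PackageNotFoundError:
        return None
```

A pre-release suffix is ignored by this comparison, so `5.6.0rc1` is treated as `5.6.0`; a proper version parser would be the safer choice in production tooling.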
Good question, I would suspect that downstream consumers of this library would want to know if they’re running a potentially insecure version (i.e. < v5.6.0). The easiest way to accomplish that would probably be issuing a security advisory, which would then automatically be picked up by tools like Dependabot and Snyk 👍
Great minds think alike! I was actually fuzzing cbor2 with Atheris very recently too. However, I focused on the C implementation and looked for memory corruption bugs. I did manage to find at least one, and there may be more. I would recommend setting up regular fuzz testing for this project. Here’s what I came up with.
My Dockerfile for reproduction (note some paths may have to change, like aarch64):
And my fuzz harness:
Build, then run the Docker image:
This then produces a crash like:
Which we can confirm like so:
This appears at the following location:
Which seems to be this code:
https://github.com/agronholm/cbor2/blob/850545ca33c1541de397ef2e6c6e1af221d4a0f8/source/decoder.c#L653
I’m not sure about exploitability here. Memory corruption in C code has more potential for exploitation than Python exceptions. I also did notice this big warning in the Py_DECREF docs. I’m not sure if that’s applicable in this situation, but again, it’s cause for concern.
The second largest concern is that MemoryError, as it has the potential for a DoS attack.
The most concerning error is that SystemError that says it returned NULL without setting an error. This looks like a bug in the C decoder implementation.