onnxruntime: Importing onnxruntime on AWS Lambdas with ARM64 processor causes crash
Describe the bug
I’m currently migrating a service deployed as a serverless function on AWS Lambda to the new ARM64 Graviton2 processor. Importing onnxruntime throws a cpuinfo error and crashes the process with the following messages:
Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what(): /onnxruntime_src/onnxruntime/core/common/cpuid_info.cc:62 onnxruntime::CPUIDInfo::CPUIDInfo() Failed to initialize CPU info.
The files /sys/devices/system/cpu/possible and /sys/devices/system/cpu/present don’t exist, and apparently this causes the crash. Is this expected behaviour? I’m not sure how to proceed. Is onnxruntime currently not supported on Graviton2 processors? The contents of /proc/cpuinfo are as follows:
processor : 0
BogoMIPS : 243.75
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part : 0xd0c
CPU revision : 1
processor : 1
BogoMIPS : 243.75
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part : 0xd0c
CPU revision : 1
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux (AWS Lambda python runtime)
- ONNX Runtime installed from (source or binary): binary (with pip)
- ONNX Runtime version: 1.10.0
- Python version: 3.8.5
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 4
- Comments: 44 (11 by maintainers)
Commits related to this issue
- python3Packages.invisible-watermark: add tests After the 0.2.0 version, there are even more possible cases to consider. I found it too annoying to do all the testing manually. Add some tests to make ... — committed to Luflosi/nixpkgs by Luflosi 9 months ago
satyajit-bagchi, to summarise: ARM Lambda is missing a lot of features required by onnxruntime that are standard across other Linux machines. As such, supporting ARM Lambda is unlikely and probably not worth the effort.
On a positive note, onnxruntime does work on x86_64 Lambda. If you have an ARM machine, one deployment solution is to configure a deployment pipeline on an x86_64 machine, e.g. GitHub Actions.
Hope this saves weeks of debugging.
@neo AWS Arm64 does not provide the files required for CPU feature detection. As noted by the maintainers, Arm64 Lambda deviates from the norm by not providing these typical Linux files:
- /sys/devices/system/cpu/possible
- /sys/devices/system/cpu/present
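On a typical Linux host these files do exist and simply hold CPU index ranges. For the two-vCPU machine shown in /proc/cpuinfo above, they would read roughly as follows (illustrative output, not captured from Lambda):

```
$ cat /sys/devices/system/cpu/possible
0-1
$ cat /sys/devices/system/cpu/present
0-1
```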
So the conclusion is that this issue should ideally be fixed by AWS; it is outside the scope of ONNX Runtime and pytorch/cpuinfo.
A possible workaround could be replacing any references to /sys/devices in the code and adding your own files (no idea how that would work in practice). If time is a priority, then getting ONNX Runtime working on Arm64 Lambda is probably a waste of time.
I’m also experiencing this issue with a similar setup (see “System information” below). The error message is below as well (the same as the OP). I can add more details if needed/helpful.
System information
I’m able to run inference on an arm64 Lambda by building without cpuinfo via the CMake option onnxruntime_ENABLE_CPUINFO, i.e. by turning it off at build time, as sketched below.
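A sketch of such a build, assuming onnxruntime’s standard build.sh driver (only the cpuinfo define is essential; the other flags are illustrative):

```bash
# Build the Python wheel with CPU detection disabled;
# onnxruntime_ENABLE_CPUINFO=OFF skips the cpuid_info code path
# that aborts on arm64 Lambda.
./build.sh --config Release --build_wheel --parallel \
  --cmake_extra_defines onnxruntime_ENABLE_CPUINFO=OFF
```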
Would just like to add on that this is definitely a real issue for us… Migrating back to x86 introduces a whole separate set of problems that we’re not really prepared to do right now.
Thank you for the investigation!
I am not sure this is on cpuinfo’s shoulders anymore. CPU feature detection is already a complex matter, and the cpuinfo library already has lots of code handling many different platforms. Why is AWS Lambda missing two files that are present on most other Linux platforms? This makes cross-platform programming unnecessarily complex. Why?
These two system files provide CPU information such as the instruction set, the number of big/little cores, cache sizes, etc. This information is vital for onnxruntime to deliver the necessary performance on ARM64 systems. Without it, onnxruntime could run an order of magnitude slower, at which point I doubt it is useful.
I also ran into this error today on lambda ARM64 architecture, with onnxruntime==1.16.1.
Is there a recommended version of onnxruntime which works on AWS Arm64 devices?
So interestingly the Intel Lambda also doesn’t mount /sys, and as a result cpuinfo has to implement a bunch of workarounds to detect the features. It will still emit a warning in this case. So it looks like this will be fixed as soon as cpuinfo is also fixed for arm64.
But the fact that it emits an error sounds like it might be a case of it not realising it needs to use those workarounds:
https://github.com/pytorch/cpuinfo/issues/14
It’s because ARM Lambda uses the custom Amazon Graviton processor and, in order to get the best support, also runs Amazon Linux 2. I’m not sure if this is the exact reason they don’t have /sys/devices/system/cpu/possible, but I have found that Amazon Linux, and in particular the build used for AWS Lambda, is very heavily restricted, with many things disabled.
For example, you can’t use multiprocessing from Python, because /dev/shm, which almost every other OS provides, isn’t available on ARM Lambda’s Amazon Linux; a quick check is sketched below.
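A minimal sketch of that failure mode, assuming a stock Python runtime (the handler name is illustrative):

```python
# Creating a multiprocessing primitive allocates a POSIX semaphore
# backed by /dev/shm; on AWS Lambda this raises OSError.
import multiprocessing

def handler(event, context):
    try:
        multiprocessing.Queue()
        return {"multiprocessing": "available"}
    except OSError as exc:
        return {"multiprocessing": f"unavailable: {exc}"}
```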
Regardless, the main issue is just that the latest onnxruntime crashes; even running very, very slowly would be an upgrade.
@MengLinMaker thanks!
As a side note, I would not recommend deploying Hugging Face models on AWS Lambda, as downloading the models takes a long time. Furthermore, even when connecting EFS to Lambda to cache the model, the read/write speeds are not fast enough to load LLMs quickly. Leaving this here to help anyone who wants to build an AI API microservice.
@johnsonchau-bulb It’s likely that AWS Lambda ARM does not populate CPU info into the “/sys” folder, so essentially onnxruntime is trying to read a nonexistent file and directory.
The following test confirms this:
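A minimal sketch of that kind of check, listing what the runtime can see under /sys (the original snippet was not captured; the handler name is illustrative):

```python
# Report the contents of /sys from inside the Lambda handler.
import os

def handler(event, context):
    return {"sys_contents": os.listdir("/sys")}
```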
Result - “/sys” has no content:
Hello my dude! Using the x86 architecture (a.k.a. AMD64), plus maybe a few other tweaks, including increasing the function’s memory to at least a few GB, I think solved it!
Will be getting back into this stuff later this week, so will likely have more concrete answers then, but for now I’m pretty sure that using x86 and increasing memory gets you 97% of the way there. Good luck!
@tianleiwu
Still the same error when I run it in the cloud: the function works totally fine locally, but fails when I invoke it in AWS.
NOTE: I am building it locally on an M1 Mac and then pushing it to the ECR registry.
Local build command, run in the same directory as the other files:
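The actual command was not captured; a plausible equivalent for building an arm64 Lambda image on an M1 Mac, with a placeholder image tag, would be:

```bash
# Build for the arm64 Lambda runtime ("my-onnx-lambda" is a placeholder tag).
docker buildx build --platform linux/arm64 -t my-onnx-lambda .
```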
Here is my dockerfile:
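The Dockerfile itself was not captured either; a minimal sketch of an arm64 Python Lambda image (all names illustrative) might look like:

```dockerfile
# Hypothetical reconstruction; the original Dockerfile was not captured.
FROM public.ecr.aws/lambda/python:3.8

# Install onnxruntime and any other dependencies.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the handler and point Lambda at it.
COPY app.py ${LAMBDA_TASK_ROOT}
CMD ["app.handler"]
```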
app.py
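Likewise not captured; any app.py that imports onnxruntime at module level reproduces the crash, e.g. (illustrative):

```python
# Merely importing onnxruntime at module load triggers the
# cpuinfo crash on arm64 Lambda.
import onnxruntime

def handler(event, context):
    return {"onnxruntime_version": onnxruntime.__version__}
```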
Response when run in AWS
Log output
TL;DR: I don’t see any solution to using ONNX Runtime in AWS Lambda. The Docker image builds and runs fine locally on my M1 Mac, but in the cloud this happens:
Pls Help… Really need to run inference in AWS Lambda 🥲
You need to include both #10199 and #10334.