tangram-es: Memory allocation failure in MapController init

TO REPRODUCE THE ISSUE, FOLLOW THESE STEPS:

  • Use an Android phone with GrapheneOS
  • Run an app that uses the Tangram library, version 0.13.0

RESULT:

https://github.com/tangrams/tangram-es/blob/c4b024539f07dff1addc2e6424e7a535a1f0c191/platforms/android/tangram/src/main/java/com/mapzen/tangram/NativeMap.java#L10-L15

12-02 10:04:08.935 26927 26927 E AndroidRuntime: FATAL EXCEPTION: main
12-02 10:04:08.935 26927 26927 E AndroidRuntime: Process: de.westnordost.streetcomplete, PID: 26927
12-02 10:04:08.935 26927 26927 E AndroidRuntime: java.lang.RuntimeException: Unable to create a native Map object! There may be insufficient memory available.
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.mapzen.tangram.MapController.<init>(MapController.java:163)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.mapzen.tangram.MapView.getMapInstance(MapView.java:276)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.mapzen.tangram.MapView.initMapController(MapView.java:173)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.mapzen.tangram.MapView$1.onLibraryReady(MapView.java:120)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.mapzen.tangram.MapView$1InitTask.onPostExecute(MapView.java:236)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.mapzen.tangram.MapView$1InitTask.onPostExecute(MapView.java:224)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at android.os.AsyncTask.finish(AsyncTask.java:771)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at android.os.AsyncTask.access$900(AsyncTask.java:199)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at android.os.AsyncTask$InternalHandler.handleMessage(AsyncTask.java:788)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at android.os.Handler.dispatchMessage(Handler.java:106)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at android.os.Looper.loop(Looper.java:223)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at android.app.ActivityThread.main(ActivityThread.java:7656)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at java.lang.reflect.Method.invoke(Native Method)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.android.internal.os.ExecInit.main(ExecInit.java:43)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.android.internal.os.RuntimeInit.nativeFinishInit(Native Method)
12-02 10:04:08.935 26927 26927 E AndroidRuntime:        at com.android.internal.os.RuntimeInit.main(RuntimeInit.java:399)

EXPECTED RESULT:

  • The NativeMap object does not throw an error.

ENVIRONMENT:

  • What operating system and device did you produce this issue on? Pixel 4a, Android 11 (GrapheneOS w/ hardened_malloc RP1A.201105.002.2020.11.27.15)
  • If you used a released version, what is the version number? v0.13.0

OTHER:

Although this works fine in a regular Android install, failing to allocate with hardened_malloc could indicate underlying issues with memory management. Something unsafe is happening for this code path to be triggered.

Thanks 👍

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 20 (13 by maintainers)

Most upvoted comments

Ok I think I know what’s going on now, and it’s way dumber than I first thought.

I had been looking for ways that Tangram could be mis-managing memory and triggering the new pointer checks in Android 11. However, the crash reports do not indicate that the application was killed by the system. The application was killed by an uncaught exception in our own code! So then I looked closer at the condition that leads to that exception.

The native init code allocates native memory and returns the pointer to it as a Java long. There’s a check on this returned value to make sure the allocation succeeded. The Java code throws an exception if nativePtr <= 0. Clearly 0 is an error - we can’t dereference NULL later on. But what about < 0? This would only happen when the highest bit in the pointer is set. On a normal system this would never happen, but this is exactly what happens when pointer tagging is enabled!! In that case, those pointers aren’t actually invalid, but our code thinks they are because of the way pointer tagging is implemented.

The bottom line is: I am pretty sure this crash will be resolved by changing nativePtr <= 0 to nativePtr == 0. I’ll make this change and upload a build for testing tonight.
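To make the sign problem concrete, here is a minimal Java sketch. It is not Tangram's actual code: the class and method names are hypothetical, the address is made up, and the tag byte 0xB4 is only an illustrative value whose highest bit is set. It shows that a tagged 64-bit pointer stored in a Java long reads as negative, so a "<= 0" check rejects it while an "== 0" check accepts it.

```java
// Hypothetical illustration, not taken from the tangram-es sources.
class TaggedPointerCheck {

    // Old validity test: treats zero AND any negative value as a failed allocation.
    static boolean passesOldCheck(long nativePtr) {
        return !(nativePtr <= 0);
    }

    // Proposed validity test: only a null (zero) pointer is a failed allocation.
    static boolean passesFixedCheck(long nativePtr) {
        return nativePtr != 0;
    }

    // Puts the given tag into the top byte (bits 56..63) of a 64-bit pointer value.
    static long withTopByte(long ptr, int tag) {
        return (ptr & 0x00FFFFFFFFFFFFFFL) | ((long) (tag & 0xFF) << 56);
    }

    public static void main(String[] args) {
        long untagged = 0x0000007A12345678L;       // made-up heap address
        long tagged = withTopByte(untagged, 0xB4); // assumed tag with the high bit set

        System.out.println("tagged is negative:       " + (tagged < 0));
        System.out.println("old check, untagged ptr:  " + passesOldCheck(untagged));
        System.out.println("old check, tagged ptr:    " + passesOldCheck(tagged));
        System.out.println("fixed check, tagged ptr:  " + passesFixedCheck(tagged));
        System.out.println("fixed check, null ptr:    " + passesFixedCheck(0L));
    }
}
```

Java has no unsigned 64-bit primitive, so any pointer with bit 63 set compares as less than zero. Comparing against zero only keeps the null check without making assumptions about the upper bits.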

Original issue resolved by https://github.com/tangrams/tangram-es/pull/2217 - the segfault has been moved to another issue https://github.com/tangrams/tangram-es/issues/2218

The offset is a separate issue that has been fixed on tangram-es master for a while but hasn’t been released yet. The offset looks more wrong with the fix because SC intentionally sets a wrong offset to counteract the now-fixed offset bug that existed until 0.13.

On 9 December 2020 11:42:37 CET, Tim notifications@github.com wrote:

I have compiled the StreetComplete app with the non-optimized .aar library, but it still crashes with the same exception. When the TBI check is turned off, the only thing that changed with the new version is that the quest pin (or selection ring) is now offset.


Wow, this is really useful info! (and some interesting tech in Android!)

The first step for fixing this is a reliable reproduction case. I don’t have an Android 11 device, but I can try to run it in an ARMv8 emulator.

If I can’t manage to reproduce the Android crash, I can also try looking for memory management issues with tools like Valgrind and hope that the problem is detectable in the Linux desktop build.

I have some suspects in my head now for what part of the code might be causing this. There is at least one 3rd-party library we use that implements an optimization by storing some data in the unused bytes of pointers - it’s easy to disable this behavior if it turns out to be the problem. Another possibility is that we are triggering these pointer checks when we pass pointer values from memory allocated in C++ up to Java as a long and then back to C++. However, I believe this practice is very common and is suggested in the Android docs themselves, so it seems unlikely.

Okay, @tapetis found something that will help find the culprit.

The crash only happens on Android 11 and when the app does not run in any kind of backward compatibility mode (i.e., compiled for and targeting Android 11).

Starting in Android 11, heap allocations have a tag set in the top byte of the pointer on devices with kernel support for ARM Top-byte Ignore (TBI). The feature is described in more detail here: https://source.android.com/devices/tech/debug/tagged-pointers

The linked article also includes some helpful information on what may be the problem if one experiences crashes in the application:

If your app crashed and you were prompted with this link, it could mean one of the following:

  • The application tried to free a pointer that wasn’t allocated by the system’s heap allocator.
  • Something in your app modified the top byte of a pointer. The top byte of the pointer can’t be modified and your code needs to be changed to fix this issue.

Examples of the top byte pointer being incorrectly used or modified.

  • Pointers to a particular type have application specific metadata stored in the top 16 address bits.
  • A pointer cast to double and then back, losing the lower address bits.
  • Code computing the difference between the addresses of local variables from different stack frames as a way to measure recursion depth.
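The second example in that list (a pointer cast to double and back) can be sketched in Java, since a long round-tripped through double behaves the same way. This is only an illustration under assumptions: the class name is hypothetical, the address is made up, and 0xB4 is an assumed tag byte. A double carries 53 bits of mantissa, so a small untagged address survives the round trip, while a value that also uses the top byte needs more significant bits than a double can hold.

```java
// Hypothetical illustration of precision loss; not related to any Tangram code path.
class PointerDoubleRoundTrip {

    // Converts a 64-bit pointer value to double and back to long.
    static long roundTrip(long ptr) {
        double asDouble = (double) ptr; // at most 53 significant bits survive here
        return (long) asDouble;
    }

    public static void main(String[] args) {
        long untagged = 0x0000007A12345678L;      // made-up address, fits in 53 bits
        long tagged = untagged | (0xB4L << 56);   // same address with an assumed tag byte

        System.out.println("untagged survives round trip: " + (roundTrip(untagged) == untagged));
        System.out.println("tagged survives round trip:   " + (roundTrip(tagged) == tagged));
    }
}
```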

The SIGSEGV crashes occur both with and without pointer tagging enabled. However, they only appear with the PtrChkFix2 version and not with the 0.13.0 release. I have attached the entire tombstones of multiple SEGV_ACCERR and SEGV_MAPERR errors.

SEGV_ACCERR-1.txt SEGV_ACCERR-2.txt SEGV_MAPERR-1.txt SEGV_MAPERR-2.txt SEGV_MAPERR-3.txt