realm-java: Fatal signal 11 (SIGSEGV) from Java_io_realm_internal_UncheckedRow_nativeGetString

Goal

No crashes

Expected Results

No crashes

Actual Results

Crashing consistently for one affected user w/a seemingly corrupted DB state

A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
    Build fingerprint: 'samsung/dreamqltesq/dreamqltesq:8.0.0/R16NW/G950USQU5CRG3:user/release-keys'
    Revision: '12'
    ABI: 'arm64'
A/DEBUG: pid: 27327, tid: 27372, name: RxComputationTh  >>> com.preveil.preveil <<<
    signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x78713fe000
        x0   00000078704e4380  x1   00000078704e437f  x2   0000007871254317  x3   00000078704e43d8
        x4   00000078704e4380  x5   000000786f5a9d7d  x6   0000000000000000  x7   0000000000000000
        x8   0000000000000000  x9   0000000000000000  x10  0000000000000001  x11  0000000000000000
        x12  000000786fc05210  x13  0000000001000000  x14  0000000000000000  x15  0000000000000000
        x16  000000787380e570  x17  000000788e181970  x18  0000000000000020  x19  00000078704e4380
        x20  000000787376a000  x21  00000078704e4540  x22  00000078713fe000  x23  00000078704e4380
        x24  00000078704e437f  x25  0000000000000001  x26  00000078704e4568  x27  00000078704e45d0
        x28  00000078704e4570  x29  00000078704e42f0  x30  00000078735424e0
        sp   00000078704e42f0  pc   00000078735424cc  pstate 0000000020000000
09-04 11:23:56.963 27385-27385/? A/DEBUG: backtrace:
        #00 pc 000000000003b4cc  /data/app/com.preveil.preveil-FG01oMB2aWtfSFb4Aipq1w==/lib/arm64/librealm-jni.so
        #01 pc 00000000000be5d8  /data/app/com.preveil.preveil-FG01oMB2aWtfSFb4Aipq1w==/lib/arm64/librealm-jni.so
        #02 pc 00000000000b6f28  /data/app/com.preveil.preveil-FG01oMB2aWtfSFb4Aipq1w==/lib/arm64/librealm-jni.so (Java_io_realm_internal_UncheckedRow_nativeGetString+92)
        #03 pc 0000000000510d00  /system/lib64/libart.so (art_quick_generic_jni_trampoline+144)
        #04 pc 000000000000f8bc  /dev/ashmem/dalvik-jit-code-cache_27327_27327 (deleted)

Steps & Code to Reproduce

So far, only one known user has encountered this issue. This user encounters the crash every time they launch the app. Fortunately, I have access to the user’s device and have hooked it up to the debugger. There seem to be 3 RealmObjects (all of the same ChildObject type described below) out of hundreds which have somehow become corrupted, and trying to access any of these 3 objects will seg fault. I’ve tried accessing these objects w/in a DynamicRealm, but that seg faults as well.

Although the stacktrace above happens on a RxComputation thread, when I run everything on the main thread, the crash persists.

Code Sample

Unfortunately I can’t share specific code or realm files, but I’ll describe the relevant schema structure and access which is causing the crash.

open class ParentObject : RealmObject() {
    @PrimaryKey
    var identifier = UUID.randomUUID().toString()
    var children = RealmList<ChildObject>()
    // other properties
}

open class ChildObject : RealmObject() {
    @PrimaryKey
    var identifier = UUID.randomUUID().toString()
    // other properties
}

// Elsewhere, on app launch
val parents = Realm.getDefaultInstance().where(ParentObject::class.java).findAll()
parents.forEach { parentObject ->
    parentObject.children.forEach { childObject ->
        // For hundreds of ChildObjects, this is totally fine
        // But for 3 seemingly corrupted objects, this seg faults
        val property = childObject.property
    }
}

Version of Realm and tooling

Realm version(s): 5.3.1 w/encryption enabled

Realm sync feature enabled: no

Android Studio version: 3.1.3

Which Android version and device: Samsung Galaxy S8 running Android 8

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 80 (28 by maintainers)

Most upvoted comments

Hutlihut (sorry non-danes: https://www.youtube.com/watch?v=QdFK6VbuIC0)

I managed to reproduce in an isolated unit test on an x86 emulator:

    // Attempts to reproduce https://github.com/realm/realm-java/issues/6152
    @Test
    @RunTestInLooperThread
    public void encryption_stressTest() {
        final int WRITER_THREADS = 20;
        final int TEST_OBJECTS = 100_000;
        final int MAX_STRING_LENGTH = 1000;
        final AtomicInteger id = new AtomicInteger(0);
        final CountDownLatch writersDone = new CountDownLatch(WRITER_THREADS);
        final CountDownLatch mainReaderDone = new CountDownLatch(1);
        long seed = System.nanoTime();
        RealmLog.info("Starting test with seed: " + seed);
        Random random = new Random(seed);

        final RealmConfiguration config = new RealmConfiguration.Builder() //.configFactory.createConfigurationBuilder()
                .name("stress-test.realm")
                .encryptionKey(TestHelper.getRandomKey(seed))
                .build();
        Realm.getInstance(config).close();

        for (int i = 0; i < WRITER_THREADS; i++) {
            new Thread(new Runnable() {
                @Override
                public void run() {
                    Realm realm = Realm.getInstance(config);
                    realm.executeTransaction(r -> {
                        for (int j = 0; j < (TEST_OBJECTS / WRITER_THREADS); j++) {
                            AllJavaTypes obj = new AllJavaTypes(id.incrementAndGet());
                            obj.setFieldString(TestHelper.getRandomString(random.nextInt(MAX_STRING_LENGTH)));
                            r.insert(obj);
                        }
                    });
                    realm.close();
                    writersDone.countDown();
                }
            }).start();
        }

        Realm realm = Realm.getInstance(config);
        looperThread.closeAfterTest(realm);
        RealmResults<AllJavaTypes> results = realm.where(AllJavaTypes.class).findAllAsync();
        looperThread.keepStrongReference(results);
        results.addChangeListener(new OrderedRealmCollectionChangeListener<RealmResults<AllJavaTypes>>() {
            @Override
            public void onChange(RealmResults<AllJavaTypes> results, OrderedCollectionChangeSet changeSet) {
                for (AllJavaTypes obj : results) {
                    String s = obj.getFieldString();
                }

                if (results.size() == TEST_OBJECTS) {
                    realm.close();
                    mainReaderDone.countDown();
                }
            }
        });

        Thread t = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    writersDone.await();
                    mainReaderDone.await();
                } catch (InterruptedException e) {
                    fail(e.toString());
                }
                looperThread.testComplete();
            }
        });
        looperThread.keepStrongReference(t);
        t.start();
    }

Stacktrace:

2019-10-08 19:16:57.717 14445-14474/io.realm.test A/libc: Fatal signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xd4cc0000 in tid 14474 (RunTestInLooper), pid 14445 (io.realm.test)
2019-10-08 19:16:57.759 14522-14522/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
2019-10-08 19:16:57.759 14522-14522/? A/DEBUG: Build fingerprint: 'google/sdk_gphone_x86/generic_x86:9/PSR1.180720.093/5456446:userdebug/dev-keys'
2019-10-08 19:16:57.759 14522-14522/? A/DEBUG: Revision: '0'
2019-10-08 19:16:57.759 14522-14522/? A/DEBUG: ABI: 'x86'
2019-10-08 19:16:57.759 14522-14522/? A/DEBUG: pid: 14445, tid: 14474, name: RunTestInLooper  >>> io.realm.test <<<
2019-10-08 19:16:57.759 14522-14522/? A/DEBUG: signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xd4cc0000
2019-10-08 19:16:57.759 14522-14522/? A/DEBUG:     eax d473902f  ebx d6d1dd64  ecx d653bc74  edx d6d12acc
2019-10-08 19:16:57.759 14522-14522/? A/DEBUG:     edi d653ba8c  esi d4cc0000
2019-10-08 19:16:57.759 14522-14522/? A/DEBUG:     ebp d653bb58  esp d653ba70  eip d66aa5f2
2019-10-08 19:16:57.876 14522-14522/? A/DEBUG: backtrace:
2019-10-08 19:16:57.876 14522-14522/? A/DEBUG:     #00 pc 0016a5f2  /data/app/io.realm.test-oAB4AdGBdcp3-4nV76wAjA==/lib/x86/librealm-jni.so (string_to_hex(std::string const&, realm::StringData&, char const*, char const*, unsigned short*, unsigned short*, unsigned int, unsigned int)+264)
2019-10-08 19:16:57.876 14522-14522/? A/DEBUG:     #01 pc 0016b16f  /data/app/io.realm.test-oAB4AdGBdcp3-4nV76wAjA==/lib/x86/librealm-jni.so (to_jstring(_JNIEnv*, realm::StringData)+1471)
2019-10-08 19:16:57.876 14522-14522/? A/DEBUG:     #02 pc 0015c0ae  /data/app/io.realm.test-oAB4AdGBdcp3-4nV76wAjA==/lib/x86/librealm-jni.so (Java_io_realm_internal_UncheckedRow_nativeGetString+574)
2019-10-08 19:16:57.876 14522-14522/? A/DEBUG:     #03 pc 000151a0  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.internal.UncheckedRow.nativeGetString+144)
2019-10-08 19:16:57.876 14522-14522/? A/DEBUG:     #04 pc 00012725  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.internal.UncheckedRow.getString+69)
2019-10-08 19:16:57.876 14522-14522/? A/DEBUG:     #05 pc 00012671  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.io_realm_entities_AllJavaTypesRealmProxy.realmGet$fieldString+161)
2019-10-08 19:16:57.876 14522-14522/? A/DEBUG:     #06 pc 00013c49  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.entities.AllJavaTypes.getFieldString+41)
2019-10-08 19:16:57.876 14522-14522/? A/DEBUG:     #07 pc 00013e46  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.RealmTests$67.onChange+182)

Also seeing this assertion failure:

/home/jenkins/workspace/realm_realm-core_release_5.23.5@2/src/realm/alloc_slab.cpp:1233: [realm-core-5.23.5] Assertion failed: matches_section_boundary(file_size)

This unit test does indeed create a large number of both writer and reader threads.

Further observations:

  • With the change listener removed from the main thread, the test does not crash, i.e. the reader threads work.
  • Saving all data in one write transaction prevents the crash.
  • 100,000 objects with string length 1,000 crash. 1,000,000 objects with string length 100 do not. 1,000 objects with string length 1,000 do not crash. 10,000,000 objects with string length 100 do not crash.
  • Multiple write transactions on the same Realm instance on the same thread crashes with a slightly different error:
2019-10-08 23:25:07.538 20231-20231/? A/DEBUG: *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
2019-10-08 23:25:07.538 20231-20231/? A/DEBUG: Build fingerprint: 'google/sdk_gphone_x86/generic_x86:9/PSR1.180720.093/5456446:userdebug/dev-keys'
2019-10-08 23:25:07.538 20231-20231/? A/DEBUG: Revision: '0'
2019-10-08 23:25:07.538 20231-20231/? A/DEBUG: ABI: 'x86'
2019-10-08 23:25:07.538 20231-20231/? A/DEBUG: pid: 20192, tid: 20220, name: RunTestInLooper  >>> io.realm.test <<<
2019-10-08 23:25:07.538 20231-20231/? A/DEBUG: signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
2019-10-08 23:25:07.538 20231-20231/? A/DEBUG:     eax 00000000  ebx 00004ee0  ecx 00004efc  edx 00000006
2019-10-08 23:25:07.538 20231-20231/? A/DEBUG:     edi 00004ee0  esi f33b62a8
2019-10-08 23:25:07.538 20231-20231/? A/DEBUG:     ebp d653b6d8  esp d653b608  eip f7310b39
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG: backtrace:
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #00 pc 00000b39  [vdso:f7310000] (__kernel_vsyscall+9)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #01 pc 0001fdf8  /system/lib/libc.so (syscall+40)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #02 pc 00022ed3  /system/lib/libc.so (abort+115)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #03 pc 005de414  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (__gnu_cxx::__verbose_terminate_handler()+452)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #04 pc 005a6fa7  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (__cxxabiv1::__terminate(void (*)())+23)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #05 pc 005de085  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (__cxa_call_terminate+69)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #06 pc 005a6701  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (__gxx_personality_v0+321)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #07 pc 005f14d8  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (_Unwind_RaiseException_Phase2+140)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #08 pc 005f1926  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (_Unwind_Resume+92)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #09 pc 001e88ee  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (realm::util::do_encryption_read_barrier(void const*, unsigned int, unsigned int (*)(char const*), realm::util::EncryptedFileMapping*)+110)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #10 pc 001e8855  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (realm::util::encryption_read_barrier(void const*, unsigned int, realm::util::EncryptedFileMapping*, unsigned int (*)(char const*))+44)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #11 pc 001ec03a  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (realm::SlabAlloc::do_translate(unsigned int) const+1020)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #12 pc 000cf474  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (realm::ArrayBigBlobs::get(char const*, unsigned int, realm::Allocator&)+136)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #13 pc 0050e4b1  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (realm::ArrayBigBlobs::get_string(char const*, unsigned int, realm::Allocator&, bool)+57)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #14 pc 0050f498  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (realm::StringColumn::get(unsigned int) const+632)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #15 pc 0048d031  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (realm::StringData realm::Table::get<realm::StringData>(unsigned int, unsigned int) const+421)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #16 pc 0015c092  /data/app/io.realm.test-jOcLv55p07ug14CfJ3a4vw==/lib/x86/librealm-jni.so (Java_io_realm_internal_UncheckedRow_nativeGetString+546)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #17 pc 00014d20  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.internal.UncheckedRow.nativeGetString+144)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #18 pc 000123c5  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.internal.UncheckedRow.getString+69)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #19 pc 00012321  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.io_realm_entities_AllJavaTypesRealmProxy.realmGet$fieldString+161)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #20 pc 000134a9  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.entities.AllJavaTypes.getFieldString+41)
2019-10-08 23:25:07.636 20231-20231/? A/DEBUG:     #21 pc 00013881  /dev/ashmem/dalvik-jit-code-cache (deleted) (io.realm.RealmTests$66.onChange+193)

This matches the different kinds of crashes we have seen so far.

We believe we have identified the root cause: https://github.com/realm/realm-core/issues/3427

Our team has active paid support. We have provided 2 or 3 corrupted databases with their encryption keys. Unfortunately, there has been no update, and we are now considering a painful database switch because of this and similar native crashes. That’s sad…

  1. Reproduced on 5.15.2. Tried to load data from our server: 3 crashes in 3 attempts.
  2. 100%, 5 devices from different vendors - Honor, Xiaomi, Lenovo.
  3. The crash happened when our application received data from the server. It uses a custom sync mechanism that writes data in portions (~50 rows per transaction). In parallel, different data is requested by views by means of RxJava subscriptions. The first crash happened on a read thread, but the database is corrupted when it is written: if we disable the read subscriptions and wait for the sync to complete, any read from the big table reproduces the crash. The table contains about 25 columns and ~1200 rows (about 350 at the moment of the crash). Realm 5.3.1 works with a crash probability of 10%. After 5.9.1 there is no chance of syncing our typical data set (100% crash).
  4. No, Realm Sync is not used.
  5. No, this issue is not reproducible with encryption disabled. But that is not an option for us. Realm was actually chosen for its embedded encryption, so no encryption - no Realm.

I have no knowledge about the inner workings of Realm-Core, but I want to understand why realm files cannot be validated internally to confirm they contain no corrupt or erroneous data, and auto-recover to a clean slate if they do. The fact that realm files can go “corrupt” and require a full uninstall/reinstall of the application is very bad for such a widely used database.

Also, has anyone tried to build their own encryption layer/interface and disable the encryption done by realm? We would have tried something like this if it wasn’t for our encryption key rotating every so often.

Logcat output attached. Do you need something else?

crush_report_realm_honor.log crush_report_realm_xiaomi.log

That’s good. But our app makes writes in one thread and opens a few subscriptions (change listeners) depending on the active screen. My observations:

  1. Writes may complete successfully. The crash happens on any read of the written data (even after a restart), so in our case there is actually no correlation with an active change listener.
  2. The crash depends heavily on the “width” of a row: an entity with 15+ String fields (5-40 chars each) is enough to reproduce it.

No, 6.0 only breaks some Sync API’s: https://github.com/realm/realm-java/blob/master/CHANGELOG.md#6002019-10-01

7.0 will contain the Core 6 upgrade which will change the file format.

We will start logging every realm transaction to Firebase NDK Crashlytics so we can see exactly what steps lead to these crashes. I hope other people will do the same so we can discover patterns.
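
One way to keep such a trail is a small bounded log of recent transactions that is also forwarded to the crash reporter. The following is only a minimal sketch in plain Java: `TransactionBreadcrumbs`, `record`, and `snapshot` are hypothetical names, and the `Consumer<String>` sink is a stand-in for whatever logging call the crash-reporting SDK provides.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Realm or of any crash-reporting SDK.
final class TransactionBreadcrumbs {
    private final int capacity;
    private final Consumer<String> sink; // assumed stand-in for the crash reporter's log call
    private final Deque<String> recent = new ArrayDeque<>();

    TransactionBreadcrumbs(int capacity, Consumer<String> sink) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.sink = sink;
    }

    // Call once per write transaction, e.g. right after it commits.
    synchronized void record(String threadName, String label, int rowsWritten) {
        String line = threadName + " " + label + " rows=" + rowsWritten;
        if (recent.size() == capacity) {
            recent.removeFirst(); // drop the oldest entry to stay bounded
        }
        recent.addLast(line);
        sink.accept(line);
    }

    // Copy of the most recent entries, oldest first.
    synchronized List<String> snapshot() {
        return new ArrayList<>(recent);
    }
}
```

Recording the thread name and the number of rows per transaction is a guess at what would matter here, based on the reports above that corruption seems tied to concurrent writers and to row size.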

@yohanan

(1) how do you catch these corruption exceptions? since most if not all are coming from the JNI, they are difficult to catch and respond.

You pretty much can’t.

We put a number of validating realm transactions and the realm init code around this wrapper:

@SuppressLint("ApplySharedPref")
fun <T> safeExecute(block: () -> T): T {
	sharedPreferences.edit().putInt(REALM_VALIDATION, FLAG_START).commit()
	val result = block()
	sharedPreferences.edit().putInt(REALM_VALIDATION, FLAG_END).commit()
	return result
}

Then on startup we delete the realm files if FLAG_END was not hit. It’s not pretty, but it works for native crashes triggered during startup. At least sometimes our users can continue rather than uninstall the app. But the corruption sometimes happens on a single field, or during background syncs, and then you’re pretty much screwed (unless you use a timer approach).
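
The start-up half of this approach is not shown above; it could look roughly like the following. This is a minimal sketch in plain Java, assuming a marker file in place of SharedPreferences so that it runs off-device. `CrashSentinel`, `previousRunCrashed`, and `recoverIfNeeded` are hypothetical names, not part of Realm or of the wrapper above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical helper illustrating the "start flag / end flag" idea with a marker file.
final class CrashSentinel {
    private final Path marker;

    CrashSentinel(Path marker) {
        this.marker = marker;
    }

    // True if a guarded block started in an earlier run but never reached its end.
    boolean previousRunCrashed() {
        return Files.exists(marker);
    }

    // Equivalent of FLAG_START / FLAG_END: the marker exists only while the block runs.
    // If the process dies (or the block throws), the marker is left behind on purpose.
    <T> T safeExecute(Supplier<T> block) throws IOException {
        Files.write(marker, new byte[0]);
        T result = block.get();
        Files.deleteIfExists(marker);
        return result;
    }

    // Call on startup before opening the database. Deletes the database file
    // if the last guarded block did not finish. Returns true if a reset happened.
    boolean recoverIfNeeded(Path databaseFile) throws IOException {
        if (!previousRunCrashed()) {
            return false;
        }
        Files.deleteIfExists(databaseFile);
        Files.deleteIfExists(marker);
        return true;
    }
}
```

On startup the app would call `recoverIfNeeded` first and then run its validating reads inside `safeExecute`. As noted above, this only helps when the crash happens inside a guarded block, and it throws away local data.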

Also, cmelchior mentioned that reading the corrupt string field on javascript does not cause a crash but silently fails, so I’m wondering how javascript is able to do that.

Alternately you can catch the exception and delete the Realm file if it fails to open, but that’s clearly not something you want to do for something that stores important data that you only have locally.

Well, that’s not really a solution, except that you get rid of the crashes. Not to mention that catching native crashes is problematic.

Personally I would expect Realm to be able to detect that a transaction would cause corruption, instead of validating only on start-up after it is too late to do any recovery (as the transaction history is also in the Realm file).

That could actually be very helpful, and maybe the “quickest” solution to the problem, but I still don’t understand why these transaction errors occur only on encrypted DBs.

Corrupted DB emailed to help@realm.io