autoortho: X-Plane crashes in `TEX_obj::load_texture_data()` with `fuse.threading=True` (Arch/Manjaro)

First of all, thank you for this project.

Describe the bug

As soon as I use fuse.threading=True (even with maxwait=1.0), I get texture artifacts like the attached image, even for areas that have been fully cached, for example right after loading into a flight there.

When flying, AutoOrtho loads tiles, but after a relatively short time (usually < 20 NM) X-Plane 12 chokes on some texture data and crashes.

The X-Plane backtrace shows:

#0  __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:486
#1  0x0000000000f45685 in TEX_obj::load_texture_data(TEX_obj_load_data const&, std::__1::function<void (TEX_obj_load_result const&)>&&) ()
#2  0x0000000000f42d90 in TEX_obj::do_load(unsigned int, bool, UTL_continuable*) ()
#3  0x0000000000f5604e in lambda_wrapper<TEX_obj::queue_load_async(bool, UTL_continuable*)::$_1>::resume(resume_base*) ()
#4  0x0000000000cf08a0 in UTL_threadpool::thread_func(void*) ()
#5  0x00000000005398b7 in THREAD_launch(void*) ()

(These crashes have actually been my problem for months; I only now discovered that they happen exclusively with fuse.threading=True.)

To Reproduce

Steps to reproduce the behavior:

  1. Set up 0.5.0-2 on Manjaro Linux
  2. Leave settings at default (fuse.threading=True)
  3. Run via the released bin, start X-Plane 12.05r1
  4. Move the plane. The faster you go, the quicker it crashes for me: flying in a straight line at 3000 ft and 300 kts, X-Plane crashes after 1-4 minutes.

Expected behavior

I would expect some tiles not to load, some blurriness, or even green fallback tiles, but instead I get a crash.

Environment

  • OS: Linux (Arch/Manjaro, Kernel 6.3/6.1/5.15)
  • XPlane Version: 12.05r1 (no Zink)
  • AutoOrtho Version: main, 0.5.0-2, 0.4.3 (run via released bin, via self-built bin, via Python)
  • Filesystem type: ext4

Additional context

The .jpg files in the cache are all valid and all are 256 × 256 px (checked with jpeginfo --check).
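As a side note, such a dimension check can also be scripted without external tools. This is a minimal stdlib-only sketch (not part of autoortho; it parses the JPEG SOF marker directly, and the cache path is left to the caller):

```python
import struct

# SOF markers that carry image dimensions (C0-CF, excluding C4/C8/CC)
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}

def jpeg_size(data: bytes):
    """Return (width, height) of a JPEG byte stream, or None if not parseable."""
    if data[:2] != b"\xff\xd8":          # must start with the SOI marker
        return None
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return None
        marker = data[i + 1]
        if marker in SOF_MARKERS:
            # SOF segment: length(2) precision(1) height(2) width(2) ...
            h, w = struct.unpack(">HH", data[i + 5:i + 9])
            return (w, h)
        # every other segment here carries a big-endian length; skip over it
        seg_len = struct.unpack(">H", data[i + 2:i + 4])[0]
        i += 2 + seg_len
    return None
```

Looping this over the cache directory and asserting `jpeg_size(...) == (256, 256)` reproduces the jpeginfo result.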

I suspected an architecture problem with the pre-built STB/ISPC texture utilities, so as a test I built them myself from source (https://github.com/bensnell/stb_dxt, https://github.com/GameTechDev/ISPCTextureCompressor; branch: https://github.com/kubilus1/autoortho/compare/main...jonaseberle:autoortho:dev/self-compiled-shared-objects):

  • no change

I tried pydds.compressor=ISPC with pydds.format=BC1 and BC3, as well as pydds.compressor=STB:

  • no change for the crashes (but STB has a general problem? See below)

To narrow it down, I changed the cache code so that the same cache file was always delivered for a given zoom level (https://github.com/kubilus1/autoortho/compare/main...jonaseberle:autoortho:dev/dds-tests):

  • no change

I suspected unflushed write()s due to files not being closed (https://github.com/kubilus1/autoortho/compare/main...jonaseberle:autoortho:dev/close-files):

  • no change

I ran the tests (cd autoortho && pytest .); after a small change (https://github.com/kubilus1/autoortho/compare/main...jonaseberle:autoortho:dev/use-refuse), all 45 complete successfully.

Configuration: .autoortho (attached). Please tell me if logs from a particular combination of settings would be interesting.

Side note: STB always produces these kinds of textures for me:

(screenshot attached)

Actually I am happy to have it stable again with threading=False, but I am posting my findings anyway. With maxwait=1.0, X-Plane hangs for around 20-30 s every 5-15 min at 400 kts/FL410 while AutoOrtho is heavily downloading slippy tiles, and that is not fit for VATSIM, so I drastically lowered maxwait to 0.1. That is still not perfect, but it leads to fewer and shorter hangs, although one still happened during approach, where even a 10 s stand-still is too much. It would be great if we could prioritize performance over any beauty on demand. Let me know if I can help to make multithreading more stable.
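For reference, these are the settings I am currently running with. The section layout below is my guess from the dotted option names used in this thread (fuse.threading, pydds.compressor); treat it as a sketch, not the canonical .autoortho layout:

```ini
[fuse]
# True triggers the crashes described above; False is stable for me
threading = False

[autoortho]
# seconds to wait for tile data before serving a fallback;
# 1.0 caused 20-30 s hangs under heavy downloads, 0.1 helps
maxwait = 0.1

[pydds]
# I tried ISPC with BC1 and BC3, and STB (see side note above)
compressor = ISPC
format = BC1
```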

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 15 (13 by maintainers)

Most upvoted comments

This should be resolved by #409. Well, the duplicate tile stuff, anyhow.

Face palm 😃 I am not very literate with Python.

lol. No worries. It’s a somewhat vague construct.

So another thing to perhaps experiment with would be to play around with thread locking. For the getortho.py module I have a decorator set up (another weird Python feature!). This is basically syntax for a function wrapper.

Try adding @locked right above some key methods. https://github.com/kubilus1/autoortho/blob/main/autoortho/getortho.py#L632 would be a good candidate to start with.

    @locked
    def read_dds_bytes(self, offset, length):
        log.debug(f"READ DDS BYTES: {offset} {length}")
       
        if offset > 0 and offset < self.lowest_offset:

That method is the ingress point for reads from the FUSE side of things.

Another one that could be at play is this: https://github.com/kubilus1/autoortho/blob/main/autoortho/getortho.py#L803

Which basically handles the decision of finding the best possible tile in the case of a chunk fetch taking too long. Since you report issues during high load times, that’s suspicious.
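Conceptually (the names here are hypothetical, not autoortho's actual API), that decision is a deadline-bounded wait that then falls back to the best-resolution mipmap already available:

```python
import time

def best_available_mipmap(ready_mips, wanted_mip, maxwait, poll=0.01):
    """Wait up to `maxwait` seconds for `wanted_mip` to become ready;
    otherwise return the closest ready mipmap at or below the wanted
    resolution (higher index = lower resolution). `ready_mips` is a set
    that fetch threads may add to concurrently."""
    deadline = time.monotonic() + maxwait
    while time.monotonic() < deadline:
        if wanted_mip in ready_mips:
            return wanted_mip
        time.sleep(poll)
    # Deadline hit: best-resolution fallback that is already in memory
    for mip in range(wanted_mip, 5):   # sketch assumes mipmaps 0..4
        if mip in ready_mips:
            return mip
    return None                        # nothing ready: caller serves a fallback tile
```

With this shape, a small maxwait trades image quality for responsiveness, which matches the behavior described later in the thread.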

Thanks again for the detailed feedback.

Yeah, if a read starts from 0 you can assume a header read is occurring (the header is always 128 bytes), but the read request will be longer. So effectively at that point I have to assume the rest of the data will be needed, and I process that much data and return it with the read.

At the point of that read request, it’s entirely possible there will be another read request for the next block. Or not. Impossible to tell.
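A sketch of that logic (hypothetical names; the real ingress point is read_dds_bytes in getortho.py): a read at offset 0 serves the 128-byte DDS header plus however much extra the OS asked for, so the data backing that extra span has to be produced before the read can return:

```python
DDS_HEADER_SIZE = 128

def handle_read(offset, length, header, materialize):
    """Serve a read request against a virtual DDS file.

    `header` is the prebuilt 128-byte DDS header; `materialize(start, end)`
    is a hypothetical callback that fetches/compresses the texture-data
    byte range [start, end) and returns it."""
    if offset == 0:
        # Header read: starts with the 128-byte header, but the OS usually
        # asks for more, so the trailing span must be real texture data too.
        extra = length - DDS_HEADER_SIZE
        body = materialize(0, extra) if extra > 0 else b""
        return (header + body)[:length]
    # Non-header read: a plain body range, shifted past the header
    start = offset - DDS_HEADER_SIZE
    return materialize(start, start + length)
```

The awkward part the comment describes is exactly the `extra > 0` branch: there is no way to know whether another read for the next block will follow.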

As far as how much data is read: it's pretty consistent on Linux (though it can be system dependent), but not so much on Windows. Some of the read patterns there are just … odd. I did try directio at one point but had some issues. Possibly worth a retry.

You can play around with fuse options here if you want: https://github.com/kubilus1/autoortho/blob/main/autoortho/autoortho_fuse.py#L452

Some trickier areas of code in the project involve the 'partial' compression and the hand-off to the aoimage module, along with areas where reads happen at boundaries between mipmaps.

For instance, when reads occur, first the header is read. The header is 128 bytes, but the block that is read will be larger. Modern OSes don't consistently read 32k or whatever; this can vary. We don't want to just assume that a header+extra read means we want to pull in a full image (that would be the largest mipmap, to boot!), but we also can't assume that this is just a header read.

After all, that extra info after the 128-byte header will likely be cached deep in the OS somewhere. So now we really have to go ahead and pull in the data for that random extra amount, and also convert it to DDS. Oh, and DDS conversion has to happen on certain boundaries as well.

It would be much, much simpler to just pull down the entire mipmap0 and convert the whole thing as soon as we see an access, but from the log lines, the ratio of very-far low-res tiles to mipmap0 full-res tiles is around 1000:1.
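The boundary arithmetic itself is standard block-compression math. A generic sketch (assuming BC1 textures; autoortho's actual dimensions and formats may differ): each 4x4 texel block is 8 bytes, and mipmap offsets accumulate after the 128-byte header, which is what a read offset has to be located against.

```python
def bc1_mip_sizes(width, height, mip_count):
    """Byte size of each BC1 mipmap level: 8 bytes per 4x4 block."""
    sizes = []
    for level in range(mip_count):
        w = max(1, width >> level)
        h = max(1, height >> level)
        blocks = ((w + 3) // 4) * ((h + 3) // 4)
        sizes.append(blocks * 8)
    return sizes

def mip_for_offset(offset, width, height, mip_count, header_size=128):
    """Which mipmap level a file offset falls into (None if in the header)."""
    if offset < header_size:
        return None
    pos = header_size
    for level, size in enumerate(bc1_mip_sizes(width, height, mip_count)):
        if offset < pos + size:
            return level
        pos += size
    return mip_count - 1   # past the end: clamp to the smallest mip
```

For a 4096x4096 BC1 texture, mipmap0 alone is 8 MiB, which illustrates why eagerly converting it for every access would be expensive at a 1000:1 low-res-to-full-res ratio.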

Anyhow, so if there is some kind of uncleared data, or maybe two threads contending here, I’d suspect that area to be a prime target.