pyodide: ModuleNotFoundError after writing multiple files with FS.writeFile

🐛 Bug

This might be a duplicate of #737, but I’m opening a new issue because I’m seeing something very weird. I’m using Pyodide 0.21.3.

To Reproduce

Consider the following snippet (I run it in node, but I can reproduce the same behavior also in the browser):

const { loadPyodide } = require("pyodide");

function busyWait(ms) {
    let start = new Date().valueOf();
    while( (new Date()).valueOf() - start < ms)
        ;
}

async function main() {
    let pyodide = await loadPyodide({
        indexURL: "./node_modules/pyodide",
    });

    console.log("Pyodide version: ", pyodide.version);

    let success = 0;
    let errors = 0;
    for(let i=0; i<100; i++) {
        // uncommenting EITHER of the following lines, it works reliably
        //pyodide.runPython('import importlib; importlib.invalidate_caches()');
        //busyWait(1000);

        pyodide.FS.writeFile(`bar${i}.py`, `x = ${i}`);
        try {
            pyodide.runPython(`import bar${i} as bar; print('OK ', bar.x)`);
            success++;
        }
        catch(e) {
            //console.log(e);
            console.log('ERR', i);
            errors++;
        }
    }
    console.log("success:", success);
    console.log("errors :", errors);
    console.log("error rate: ", (errors/(success+errors))*100, "%");
}

main();

The expected result is obviously to see all OK and 0% error rate, but what happens is that very often I get ERR. I observed the following:

  • if I uncomment the call to invalidate_caches(), it passes reliably
  • if I uncomment busyWait(1000), (i.e., 1 second) it passes reliably

The interesting part is what happens with lower values of busyWait:

  • with busyWait(500), it fails ~50% of the times
  • with busyWait(100) it fails ~90% of the times
  • with busyWait(10) it fails ~99% of the times

The following is just a guess, but I think that considering the above it’s a reasonable explanation:

  • somewhere there is a timestamp which is stored with a resolution of 1s
  • if the “new” files are written “too early”, importlib think that it is of the same epoch as the “old files/directories” and thus it thinks the cache is still valid
  • if you wait long enough, the 1s timestamp gets a new value and importlib correctly notices that there is a “new” file

I am not sure whether this is a bug/“feature” of emscripten. I thought it’s a good idea to open it here so that we can maybe discuss possible pyodide-specific mitigations and/or workarounds

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 25 (25 by maintainers)

Commits related to this issue

Most upvoted comments

so… HAVE_STAT_TV_NSEC being 1 sounds promising…

Great! Thanks sound indeed promising.

BTW, even if we make st_mtime_ns available in Pyodide, I guess we have to persuade CPython devs to use st_mtime_ns instead of st_mtime here?

I think the point is that this <1s behavior is not observed in native Python

I was able to observe the similar result in native Python (tested in Linux):

import os
import time
import importlib

success = 0
error = 0
for i in range(1000):
    os.system(f"echo x=1 >> bar{i}.py")
    # importlib.invalidate_caches()
    try:
        __import__(f"bar{i}")
        success += 1
    except:
        error += 1

print(f"Success: {success}")
print(f"Error: {error}")

os.system("rm -rf __pycache__")
os.system("rm -rf bar*.py")
Success: 254
Error: 746