playwright: [BUG] Request Interception Breaks Worker importScripts with Remote Resource

Context:

  • Playwright Version: 1.11.0-next (but effects at least the last couple of versions)
  • Operating System: macOS
  • Node.js version: v14.16.0
  • Browser: Chromium
  • Extra: -

Code Snippet

it('should not break remote worker importScripts',  (test, { browserName }) => {
  test.fail(browserName === "chromium");
}, async ({ page, server, context }) => {
  context.route('**', async request => {
    await request.continue();
  });
  await page.goto(server.PREFIX + '/worker/worker-http-import.html');
  await page.waitForSelector("#status:has-text('finished')");
});

/worker/worker-http-import.html:

<!DOCTYPE html>
<html>
<head>
    <title>Remote importScripts Test</title>
</head>
<html>
    <div id="status">-</div>
    <script>
    new Worker('worker-http-import.js').addEventListener("message", ({ data }) => {
        document.getElementById("status").innerText = data;
    });
    </script>
</html>
</html>

/worker/worker-http-import.js:

console.log("hello from worker-http-import.js");
importScripts("./import-me.js")
console.log("successfully imported");
self.postMessage("finished");

/worker/import-me.js:

console.log("hello from import-me.js");

Describe the bug

Request interception causes importScripts("./import-me.js") to hang. If you disable request interception (i.e. comment out the context.route(…) bits), the importScripts succeeds, hello from import-me.js is logged, and the waitForSelector succeeds and matches the finished message that’s bubbled up.


Additional Debugging Notes

  • With DEBUG=pw:protocol we see a Fetch.requestPaused for http://localhost:4747/import-me.js, but no protocol messages about that domain after when we would eventually expect a requestWillBeSent and a resume.
  • For http://localhost:4747/import-me.js, requestWillBeSent is undefined here in crNetworkManager.js: https://github.com/microsoft/playwright/blob/495085cbb2092bac71e36120e718e5cd92ad7374/src/server/chromium/crNetworkManager.ts#L182
  • The test passes for Firefox and Webkit.
  • Not all network requests originating from the worker hang. For example, in the below demo (fetch-worker.js) the fetch request succeeds.
  • Not all importScripts hang. For example, in the below demos (inline-import-worker.js and blob-import-worker.js) a Data URL and Object URL are successfully imported.

Here’s a real world example that’s broken:

const playwright = require("playwright");
const http = require("http");

const server = http.createServer((req, res) => {
  switch (req.url) {
    case "/main":
      res.writeHead(200);
      res.end(`
                <!DOCTYPE html>
                <html>
                    <head>
                        <title>Demo</title>
                    </head>
                    <body>
                        <h1>Demo</h1>
                        <iframe width="100%" height="300" src="//jsfiddle.net/pdfjs/wagvs9Lf/embedded/result/" allowfullscreen="allowfullscreen" allowpaymentrequest frameborder="0"></iframe>
                    </body>
                </html>
            `);
      return;
    default:
      res.writeHead(404);
      res.end("404 Not Found");
      return;
  }
});

server.listen(4747);

(async () => {
  const browser = await playwright.chromium.launch({ headless: false });
  const ctx = await browser.newContext();
  const page = await ctx.newPage();
  ctx.route("**", async (route) => {
    await route.continue();
  });
  await page.goto("http://localhost:4747/main");
  await page.frame({
    url: 'http://fiddle.jshell.net/pdfjs/wagvs9Lf/show/light/'
  }).click('text=14');
  await browser.close();
  server.close();
})()

If you comment out ctx.route(…) and re-run, it will succeed.

And below are various demos/experiments I used to narrow down why a website wasn’t working. We started with a 3rd-party PDF reader (https://mozilla.github.io/pdf.js/) appearing subtly broken and hanging only when connected with Playwright, and were not sure where the problem was, but have hopefully started to narrow it down! Unfortunately, we encounter a handful of libraries that use Web Workers with remote importScripts that are broken during testing because of this bug.

const pw = require('playwright');
const http = require('http');

const server = http.createServer((req, res) => {
  switch (req.url) {
    case "/":
      res.writeHead(200);
      res.end(`
                <!DOCTYPE html>
                <html>
                    <head>
                        <title>Broken Web Worker Demo</title>
                    </head>
                    <body>
                        <table>
                            <thead>
                                <tr>
                                    <th>Worker</th>
                                    <th>Status</th>
                                </tr>
                            </thead>
                            <tbody>
                                <tr>
                                    <td>basic-worker</td>
                                    <td id="basic-worker">-</td>
                                </tr>
                                <tr>
                                  <td>fetch-worker</td>
                                  <td id="fetch-worker">-</td>
                                </tr>
                                <tr>
                                    <td>inline-import-worker</td>
                                    <td id="inline-import-worker">-</td>
                                </tr>
                                <tr>
                                    <td>http-import-worker</td>
                                    <td id="http-import-worker">-</td>
                                </tr>
                                <tr>
                                    <td>blob-import-worker</td>
                                    <td id="blob-import-worker">-</td>
                                </tr>
                            </tbody>
                        </table>
                        <script>
                            const newWorker = (id) => {
                                new Worker("/" + id + ".js").addEventListener("message", ({ data }) => {
                                    document.getElementById(id).innerText = data;
                                });
                            };

                            newWorker("basic-worker");
                            newWorker("fetch-worker");
                            newWorker("inline-import-worker");
                            newWorker("http-import-worker");
                            newWorker("blob-import-worker");
                        </script>
                    </body>
                </html>
            `);
      return;
    case `/basic-worker.js`:
      res.setHeader("Content-Type", "text/javascript");
      res.writeHead(200);
      res.end(`
                self.postMessage("running");
                console.log("basic-worker.js fully ran");
                self.postMessage("ran");
      `);
      return;
    case `/fetch-worker.js`:
      res.setHeader("Content-Type", "text/javascript");
      res.writeHead(200);
      res.end(`
                self.postMessage("running");
                fetch("/test.json").then(resp => resp.json()).then(({ data }) => {
                  if (data === "hello") {
                    console.log("fetch-worker.js fully ran");
                    self.postMessage("ran");
                  }
                });
      `);
      return;
    case `/inline-import-worker.js`:
      res.setHeader("Content-Type", "text/javascript");
      res.writeHead(200);
      res.end(`
                self.postMessage("running");
                importScripts('data:text/javascript,console.log("inline-import-worker.js fully ran")');
                self.postMessage("ran");
            `);
      return;
    case `/http-import-worker.js`:
      res.setHeader("Content-Type", "text/javascript");
      res.writeHead(200);
      res.end(`
                self.postMessage("running");
                importScripts("http://localhost:4747/import-me.js");
                console.log("http-import-worker.js fully ran");
                self.postMessage("ran");
            `);
      return;
    case `/import-me.js`:
      res.setHeader("Content-Type", "text/javascript");
      res.writeHead(200);
      res.end(`console.log("import-me.js fully ran");`);
      return;
    case `/blob-import-worker.js`:
      res.setHeader("Content-Type", "text/javascript");
      res.writeHead(200);
      res.end(`
                self.postMessage("running");
                const oURL = URL.createObjectURL(new Blob(['console.log("blob-import-worker.js fully ran")'], { type: "application/javascript"}));
                importScripts(oURL);
                self.postMessage("ran");
            `);
      return;
    case `/test.json`:
      res.setHeader("Content-Type", "application/json");
      res.writeHead(200);
      res.end(JSON.stringify({ data: "hello" }));
      return;
    default:
      res.writeHead(404)
      res.end(`${req.url} not found`);
      return;
  }
});
server.listen(4747);

(async () => {
  const browser = await pw.chromium.launch({ headless: false });
  const ctx = await browser.newContext();
  const page = await ctx.newPage();
  ctx.route("**", async (route) => {
    await route.continue();
  });
  await page.goto("http://localhost:4747");
  await page.waitForSelector("#basic-worker:has-text('ran')");
  await page.waitForSelector("#fetch-worker:has-text('ran')");
  await page.waitForSelector("#inline-import-worker:has-text('ran')");
  await page.waitForSelector("#http-import-worker:has-text('ran')");
  await page.waitForSelector("#blob-import-worker:has-text('ran')");
  await browser.close();
  server.close();
})()

If you run the above, you’ll notice all entries in the table flip to ran except http-import-worker. If you comment out the ctx.route(…) bits, it succeeds.

Thank you for all your work and let me know if you have some additional tips for debugging or need more information! I would love help getting to the bottom of this! (I will contribute an MR with the test case shortly.)

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (14 by maintainers)

Commits related to this issue

Most upvoted comments

@pavelfeldman I had a chance to setup my machine with a Chromium debug build this evening, and think I’ve found some useful additional bits. (Wow, does compilation of Chromium for the first time on a new machine take a good deal of time!)

I’m operating under the assumption that Chrome SHOULD send us Network.requestWillBeSent before Fetch.requestPaused for import-me.js like it does for the other requests and as the Playwright code assumes it does to avoid deadlock. Let me know if that’s a wrong assumption.

If you run the failing test case (https://github.com/microsoft/playwright/pull/5953) with Playwright pointed at a local Chromium build with the following logging in inspector_network_agent.cc/InspectorNetworkAgent::WillSendRequestInternal:

diff --git a/third_party/blink/renderer/core/inspector/inspector_network_agent.cc b/third_party/blink/renderer/core/inspector/inspector_network_agent.cc
index 6370da0d58..8cc24776cc 100644
--- a/third_party/blink/renderer/core/inspector/inspector_network_agent.cc
+++ b/third_party/blink/renderer/core/inspector/inspector_network_agent.cc
@@ -36,6 +36,7 @@
 #include "base/containers/span.h"
 #include "base/macros.h"
 #include "base/memory/scoped_refptr.h"
+#include "base/logging.h"
 #include "build/build_config.h"
 #include "net/base/ip_address.h"
 #include "net/base/ip_endpoint.h"
@@ -1122,6 +1123,7 @@ void InspectorNetworkAgent::WillSendRequestInternal(
   Maybe<String> maybe_frame_id;
   if (!frame_id.IsEmpty())
     maybe_frame_id = frame_id;
+  VLOG(1) << "WillSendRequestInternal:: URL:" << request_info->getUrl();
   GetFrontend()->requestWillBeSent(
       request_id, loader_id, documentURL, std::move(request_info),
       base::TimeTicks::Now().since_origin().InSecondsF(),

you will see that, in fact, the requestWillBeSent gets constructed and logged for import-me.js, but the Frontend doesn’t flush it out and we never see it sent to the CDP channel (before the Fetch.requestPaused):

[…] WillSendRequestInternal:: URL:"http://localhost:4747/http-import-worker.js"
[…] WillSendRequestInternal:: URL:"http://localhost:4747/favicon.ico"
[…] WillSendRequestInternal:: URL:"http://localhost:4747/import-me.js"

However, if we experiment and remove the guard to the deliberate flush (and therefore immediately flush):

diff --git a/third_party/blink/renderer/core/inspector/inspector_network_agent.cc b/third_party/blink/renderer/core/inspector/inspector_network_agent.cc
index 6370da0d58..651a6eec83 100644
--- a/third_party/blink/renderer/core/inspector/inspector_network_agent.cc
+++ b/third_party/blink/renderer/core/inspector/inspector_network_agent.cc
@@ -36,6 +36,7 @@
 #include "base/containers/span.h"
 #include "base/macros.h"
 #include "base/memory/scoped_refptr.h"
+#include "base/logging.h"
 #include "build/build_config.h"
 #include "net/base/ip_address.h"
 #include "net/base/ip_endpoint.h"
@@ -1122,14 +1123,15 @@ void InspectorNetworkAgent::WillSendRequestInternal(
   Maybe<String> maybe_frame_id;
   if (!frame_id.IsEmpty())
     maybe_frame_id = frame_id;
+  VLOG(1) << "WillSendRequestInternal:: URL:" << request_info->getUrl();
   GetFrontend()->requestWillBeSent(
       request_id, loader_id, documentURL, std::move(request_info),
       base::TimeTicks::Now().since_origin().InSecondsF(),
       base::Time::Now().ToDoubleT(), std::move(initiator_object),
       BuildObjectForResourceResponse(redirect_response), resource_type,
       std::move(maybe_frame_id), request.HasUserGesture());
-  if (is_handling_sync_xhr_)
-    GetFrontend()->flush();
+  // if (is_handling_sync_xhr_)
+  GetFrontend()->flush();
 
   if (pending_xhr_replay_data_) {
     resources_data_->SetXHRReplayData(request_id,

the test PASSES since we now see the the Network.requestWillBeSent before the Fetch.requestPaused for import-me.js! ✅ 🎉

If you think I’m on the right track (and Chrome is misbehaving here), please let me know and I will contribute a test to Chromium and then work with you on figuring out the appropriate patch. (I’m assuming always flushing won’t fly 😄 .)

I’ve also noticed that Fetch and Network are separate domains. Do they queue independently and are therefore going to cause us to see unexpected orderings of combined events depending on when each of them flushes? (If they queue in the same queue, the order would be preserved regardless of when we eventually flush.)

/cc @aslushnikov since it looks like he fixed a similar timing/flushing bug for sync xhrs.