chromium: [BUG] Headless 'new' mode uses a headless user agent

Environment

  • chromium Version: 112.0.2
  • puppeteer / puppeteer-core Version: 19.9.1
  • Node.js Version: 18
  • Lambda / GCF Runtime: 18

Expected Behavior

Using headless: 'new' should produce a non-headless user agent (for example, Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36).

Current Behavior

I’m using these launch options:

launchOptions: {
    args: [
      '--allow-pre-commit-input',
      '--disable-background-networking',
      '--disable-background-timer-throttling',
      '--disable-backgrounding-occluded-windows',
      '--disable-breakpad',
      '--disable-client-side-phishing-detection',
      '--disable-component-extensions-with-background-pages',
      '--disable-component-update',
      '--disable-default-apps',
      '--disable-dev-shm-usage',
      '--disable-extensions',
      '--disable-hang-monitor',
      '--disable-ipc-flooding-protection',
      '--disable-popup-blocking',
      '--disable-prompt-on-repost',
      '--disable-renderer-backgrounding',
      '--disable-sync',
      '--enable-automation',
      '--enable-blink-features=IdleDetection',
      '--export-tagged-pdf',
      '--force-color-profile=srgb',
      '--metrics-recording-only',
      '--no-first-run',
      '--password-store=basic',
      '--use-mock-keychain',
      '--disable-domain-reliability',
      '--disable-print-preview',
      '--disable-speech-api',
      '--disk-cache-size=33554432',
      '--mute-audio',
      '--no-default-browser-check',
      '--no-pings',
      '--single-process',
      '--disable-features=Translate,BackForwardCache,AcceptCHFrame,MediaRouter,OptimizationHints,AudioServiceOutOfProcess,IsolateOrigins,site-per-process',
      '--enable-features=NetworkServiceInProcess2,SharedArrayBuffer',
      '--hide-scrollbars',
      '--ignore-gpu-blocklist',
      '--in-process-gpu',
      '--window-size=1920,1080',
      '--use-gl=angle',
      '--allow-running-insecure-content',
      '--disable-setuid-sandbox',
      '--disable-site-isolation-trials',
      '--disable-web-security',
      '--no-sandbox',
      '--no-zygote',
      '--headless=new',
    ],
    executablePath: '/tmp/chromium',
    headless: 'new'
  }

When I read the user agent, it has the value Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/112.0.5614.0 Safari/537.36. I tried overriding the user agent, which works, but I’m still being detected by an antibot, so I suspect the headless ‘new’ mode is not working properly. (In a local environment, using a local Chromium, it works fine.)
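
For reference, the user agent override mentioned above could look roughly like the sketch below. The helper name stripHeadlessToken is hypothetical and not part of puppeteer. The commented usage assumes an existing puppeteer browser and page. As noted in the report, changing the header alone did not stop the detection.

```javascript
// Hypothetical helper: swap the "HeadlessChrome" product token for "Chrome".
// Everything else in the user agent string is left untouched.
function stripHeadlessToken(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

// The user agent reported in this issue.
const reportedUserAgent =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/112.0.5614.0 Safari/537.36';

console.log(stripHeadlessToken(reportedUserAgent));

// With puppeteer, assuming `browser` and `page` already exist:
//   await page.setUserAgent(stripHeadlessToken(await browser.userAgent()));
```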

About this issue

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 19 (5 by maintainers)

Most upvoted comments

Looked into this last week, there are a few different things going on here:

1. Headless arguments

The new headless mode is pretty particular about what CLI arguments it expects. It only launches with --headless=new, whereas the current flag generation logic uses headless='new'.
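
A minimal sketch of how an argument list could be normalized so that the browser always receives exactly one --headless=new flag. The helper name withNewHeadlessFlag is hypothetical. It is not the package's actual flag generation logic and only illustrates the idea.

```javascript
// Hypothetical helper (not part of puppeteer or this package):
// remove any existing --headless / --headless=<value> switch,
// then append a single --headless=new.
function withNewHeadlessFlag(args) {
  const withoutHeadless = args.filter(
    (arg) => arg !== '--headless' && !arg.startsWith('--headless=')
  );
  return [...withoutHeadless, '--headless=new'];
}

const normalizedArgs = withNewHeadlessFlag(['--no-sandbox', '--headless', '--mute-audio']);
console.log(normalizedArgs); // --no-sandbox, --mute-audio, --headless=new
```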

2. User Agent

As @jacobi973 pointed out, the user agent does reveal whether we’re using headless v1 or v2. V1 will have:

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/116.0.5845.82 Safari/537.36

While V2 has:

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36

If you’re testing the user agent remotely in Lambda, you can either add some logging or ping a user agent detection website that echoes back the content of this header.
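
As a rough illustration, the check can be reduced to looking for the HeadlessChrome product token, which the V1 example above contains and the V2 example does not. The function name hasHeadlessToken is an assumption made for this sketch. The commented logging line assumes a launched puppeteer browser.

```javascript
// Returns true when the user agent carries the "HeadlessChrome" product token.
function hasHeadlessToken(userAgent) {
  return /\bHeadlessChrome\//.test(userAgent);
}

// The two user agents quoted in this comment.
const v1UserAgent =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/116.0.5845.82 Safari/537.36';
const v2UserAgent =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36';

console.log(hasHeadlessToken(v1UserAgent)); // true
console.log(hasHeadlessToken(v2UserAgent)); // false

// Inside a Lambda handler, assuming `browser` is a launched puppeteer Browser:
//   console.log('user agent:', await browser.userAgent());
```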

3. Build target

The combination of headless.gn / the headless_shell target will only build the old headless codebase. I have a fork going where I’m trying to build the full chromium payload, which should support V1 and V2. Will keep this thread posted on progress there. /cc @Sparticuz

I’m experiencing the same issue as well.

@cernadasjuan From some research into the new headless mode, it’s not meant to evade bot detection. In fact, I’ve seen comments saying it specifically marks itself as headless. It’s meant to close the gap between how headless and headful modes render content. I’m leaning towards this not being a bug; however, I’d like to know how you are determining that it’s not working. Is there a page that will tell you whether you are using the old or the new headless mode?

Another thing that might be affecting this is the args. I’ve seen some flags, especially --single-process, affect ‘bot detection’.
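
One way to test that suspicion is to launch with and without a given flag and compare the results. The sketch below only builds the two argument lists. The helper name withoutFlags is hypothetical, and whether Chromium still starts in your environment without a particular flag needs to be verified separately.

```javascript
// Hypothetical helper: return a copy of `args` without the named switches.
// Matches both bare switches (--foo) and switches with a value (--foo=bar).
function withoutFlags(args, names) {
  return args.filter((arg) => !names.includes(arg.split('=')[0]));
}

// A shortened baseline taken from the launch options in the report.
const baselineArgs = ['--no-sandbox', '--single-process', '--window-size=1920,1080'];
const candidateArgs = withoutFlags(baselineArgs, ['--single-process']);

console.log(candidateArgs); // --no-sandbox, --window-size=1920,1080
```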

Thanks @Sparticuz! As a workaround, I created a custom Docker image with the latest Chromium build installed, and with Lambda container images it’s working! This is the Dockerfile:

FROM public.ecr.aws/lambda/nodejs:18

RUN yum install -y unzip && \
  curl -Lo "/tmp/chrome-linux.zip" "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F1129993%2Fchrome-linux.zip?alt=media" && \
  unzip /tmp/chrome-linux.zip -d /opt/

RUN yum install atk cups-libs gtk3 libXcomposite alsa-lib \
    libXcursor libXdamage libXext libXi libXrandr libXScrnSaver \
    libXtst pango at-spi2-atk libXt xorg-x11-server-Xvfb \
    xorg-x11-xauth dbus-glib dbus-glib-devel -y

RUN mv /opt/chrome-linux /opt/chrome
 
# Copy handler function and dependencies
COPY dist/ ./
COPY node_modules ./node_modules

# Set the CMD to your handler
CMD [ "/var/task/app.handler" ]

Then, in the puppeteer project, use /opt/chrome/chrome as the executablePath.
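
A sketch of what the launch options might look like with that image. The executable path follows from the Dockerfile above. The function name containerLaunchOptions and the short argument list are assumptions for illustration, trimmed from the options in the original report.

```javascript
// Sketch only: builds puppeteer launch options for the container image above.
// The default args are an assumption; extend them via `extraArgs` as needed.
function containerLaunchOptions(extraArgs = []) {
  return {
    executablePath: '/opt/chrome/chrome', // /opt/chrome-linux was moved to /opt/chrome
    headless: 'new',
    args: ['--no-sandbox', '--disable-dev-shm-usage', ...extraArgs],
  };
}

console.log(containerLaunchOptions(['--window-size=1920,1080']));

// With puppeteer-core installed (not required here, so the snippet has no dependencies):
//   const puppeteer = require('puppeteer-core');
//   const browser = await puppeteer.launch(containerLaunchOptions());
```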