TagUI: snap not working on some pages
Hi,
I'm trying to automate downloading a newspaper to my archive, because each issue is only available online for a limited time (7 days). I decided to take some screenshots along the way to check that things work the way I expect. Unfortunately, in headless mode `snap` hangs, and both PhantomJS and Chrome sit at 100% CPU. I also enabled debug mode, but it didn't give me much information. I'm using headless mode because in normal mode the later pages don't seem to render (the snap is just empty; I'd say those pages are relatively complex).
```
tagui@51e7de6e5311:/s$ /tagui/src/tagui sz headless debug
START - automation started - Tue May 22 2018 15:24:56 GMT+0000 (UTC)
[info] [phantom] Starting...
[info] [phantom] Running suite: 18 steps
[debug] [phantom] opening url: about:blank, HTTP GET
[debug] [phantom] url changed to ""
[debug] [phantom] Successfully injected Casper client-side utilities
https://epaper.sueddeutsche.de/login - SZID - Login
[info] [phantom] Step anonymous 2/18: done in 1953ms.
wait 10 seconds
[info] [phantom] Step anonymous 3/18: done in 1958ms.
[info] [phantom] Step _step 4/18: done in 1977ms.
[info] [phantom] wait() finished waiting for 10000ms.
type id_login as username
[info] [phantom] Step _step 5/19: done in 11994ms.
[info] [phantom] waitFor() finished in 229ms.
[info] [phantom] Step then 6/20: done in 13154ms.
type id_password as password
[info] [phantom] Step anonymous 7/20: done in 13156ms.
[info] [phantom] Step _step 8/21: done in 13176ms.
[info] [phantom] waitFor() finished in 224ms.
[info] [phantom] Step then 9/22: done in 14330ms.
click authentication-button
[info] [phantom] Step anonymous 10/22: done in 14331ms.
[info] [phantom] Step _step 11/23: done in 14351ms.
[info] [phantom] waitFor() finished in 224ms.
[info] [phantom] Step then 12/24: done in 16600ms.
wait 10 seconds
[info] [phantom] Step anonymous 13/24: done in 16605ms.
[info] [phantom] Step _step 14/24: done in 16625ms.
[info] [phantom] wait() finished waiting for 10000ms.
snap page to page1.pdf
```
Here it hangs, with both processes using 100% CPU.
The flow looks like this. I added the long wait times to see whether they improve anything.
```
https://epaper.sueddeutsche.de/login
wait 10 seconds
type id_login as username
type id_password as password
click authentication-button
wait 10 seconds
snap page to page1.pdf
https://epaper.sueddeutsche.de/Stadtausgabe/2018-05-22
wait 10 seconds
snap page to page4.pdf
click issue__cover
wait 10 seconds
snap page to page2.pdf
click sz-daily-download-thumb-tray-control
click //a[text()="Ganze Ausgabe speichern"]
wait 10 seconds
snap page to page3.pdf
```
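Since the snaps are meant as a check that each step worked, a quick way to see which of them were actually written is to test each output file after a run. A minimal sketch, assuming the flow is run from `/s` so that `page1.pdf` to `page4.pdf` end up in that directory (adjust the directory argument if TagUI writes them elsewhere); `check_snaps` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: report which snap outputs are missing or zero bytes.
# Assumption: snaps are written into the directory passed as $1.
check_snaps() {
  dir=$1
  shift
  status=0
  for f in "$@"; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"        # snap step never produced the file
      status=1
    elif [ ! -s "$dir/$f" ]; then
      echo "empty: $f"          # file exists but has size 0
      status=1
    else
      echo "ok: $f"
    fi
  done
  return "$status"
}
```

Usage: `check_snaps /s page1.pdf page2.pdf page3.pdf page4.pdf`. Note that this only catches missing or zero-byte files; a blank but otherwise valid rendering still counts as `ok`.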
I'm using https://raw.githubusercontent.com/tebelorg/Tump/master/TagUI_Linux.zip, downloaded yesterday.
I run TagUI in Docker, which lets me push it to my GitLab and schedule a regular job without caring about the system environment; it can run on any available runner. The Dockerfile looks like this:
```dockerfile
FROM debian:latest
# The first dpkg -i is expected to fail on missing dependencies; only that
# command is allowed to fail, then apt-get install -f pulls the dependencies in.
RUN apt-get update \
    && apt-get -y install \
        php-cli \
        python \
        unzip \
        wget \
        curl \
        procps \
    && wget https://raw.githubusercontent.com/tebelorg/Tump/master/TagUI_Linux.zip \
    && unzip TagUI_Linux.zip \
    && rm TagUI_Linux.zip \
    && wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
    && (dpkg -i google-chrome-stable_current_amd64.deb || true) \
    && apt-get install -f -y \
    && dpkg -i google-chrome-stable_current_amd64.deb \
    && rm google-chrome-stable_current_amd64.deb \
    && apt-get -y remove unzip \
    && rm -rf /var/lib/apt/lists/*
RUN adduser tagui && chmod +r -R /tagui && chown tagui -R /tagui
USER tagui
WORKDIR /s
#ENTRYPOINT ["/tagui/src/tagui"]
```
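The Chrome install in the `RUN` chain relies on `dpkg -i ... || true` to tolerate the first, expected dependency failure. In POSIX sh, `&&` and `||` have equal precedence and group left to right, so an ungrouped `|| true` also swallows a failure of any earlier command in the chain, and the build carries on. A minimal sketch of the difference (the function names are made up for the demo):

```shell
#!/bin/sh
# "a && b || true && c" parses as "((a && b) || true) && c".

# Ungrouped: the failing first command is masked by "|| true",
# so the chain still reaches the final echo and exits 0.
unguarded() {
  false && echo "step 2" || true && echo "reached end"
}

# Grouped: "|| true" only covers the command inside ( ... ),
# so the earlier failure stops the chain with a non-zero status.
guarded() {
  false && (echo "step 2" || true) && echo "reached end"
}
```

Wrapping only the command that may fail in a subshell (or `{ ...; }`) keeps every other step in the chain able to fail the build.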
I start the container with `docker run -it --rm --privileged --shm-size 256m -v "$PWD/s":/s tagui bash` and then invoke TagUI inside it with `/tagui/src/tagui sz headless debug`.
Edit: I tried a minimal flow that snaps Google, and that works fine:
```
tagui@f2de344576a0:/s$ cat google
https://www.google.com
snap page to google.pdf
~/D/d/d/tagui docker run -it --rm --privileged --shm-size 256m -v "$PWD/s":/s tagui bash
tagui@f2de344576a0:/s$ /tagui/src/tagui google headless debug
START - automation started - Tue May 22 2018 15:56:50 GMT+0000 (UTC)
[info] [phantom] Starting...
[info] [phantom] Running suite: 4 steps
[debug] [phantom] opening url: about:blank, HTTP GET
[debug] [phantom] url changed to ""
[debug] [phantom] Successfully injected Casper client-side utilities
https://www.google.com - Google
[info] [phantom] Step anonymous 2/4: done in 1653ms.
snap page to google.pdf
[info] [phantom] Step anonymous 3/4: done in 1763ms.
https://www.google.com/ - Google
FINISH - automation finished - 2.0s
[info] [phantom] Step anonymous 4/4: done in 1967ms.
[info] [phantom] Done 4 steps in 1967ms
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"
```
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (8 by maintainers)
OK, I'll look at it again another day and see whether I can find anything new or come up with another idea.