yt-dlp: [prosiebensat1] Unable to extract clip id
DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
- I understand that I will be blocked if I intentionally remove or skip any mandatory* field
Checklist
- I’m reporting that yt-dlp is broken on a supported site
- I’ve verified that I’m running yt-dlp version 2023.03.04 (update instructions) or later (specify commit)
- I’ve checked that all provided URLs are playable in a browser with the same IP and same login details
- I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
- I’ve searched known issues and the bugtracker for similar issues including closed ones. DO NOT post duplicates
- I’ve read the guidelines for opening an issue
- I’ve read about sharing account credentials and I’m willing to share it if required
Region
Accessible in Germany/Austria/Switzerland. Possibly worldwide.
Provide a description that is worded well enough to be understood
steps to reproduce this issue
- disable Widevine as instructed to confirm video is not DRM-protected
- open URL in Firefox
- click on video
- site redirects to an authentication page
- login (signup is free and fast as only email/firstname/bday are required, but can share login credentials if needed)
- site redirects back to URL from step 2
- video is now playable in Firefox
- run
yt-dlp --no-config -f- -v --cookies-from-browser firefox "URL"
expected result
yt-dlp download should start
actual result
ERROR: [prosiebensat1] tv/videos/der-sat-1-bio-check-aldi-rewe-denns-co-ganze-folge: Unable to extract clip id; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template.
possibly related issues
Provide verbose output that clearly demonstrates the problem
- Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
- If using API, add 'verbose': True to YoutubeDL params instead
- Copy the WHOLE output (starting with [debug] Command-line config) and insert it below
Complete Verbose Output
yt-dlp --no-config -f- -v --cookies-from-browser firefox "https://www.sat1.at/tv/videos/der-sat-1-bio-check-aldi-rewe-denns-co-ganze-folge"
[debug] Command-line config: ['--no-config', '-f-', '-v', '--cookies-from-browser', 'firefox', 'https://www.sat1.at/tv/videos/der-sat-1-bio-check-aldi-rewe-denns-co-ganze-folge']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version nightly@2023.06.05.155301 [59d9fe083] (win_exe)
[debug] Python 3.8.10 (CPython AMD64 64bit) - Windows-10-10.0.19045-SP0 (OpenSSL 1.1.1k 25 Mar 2021)
[debug] exe versions: none
[debug] Optional libraries: Cryptodome-3.18.0, brotli-1.0.9, certifi-2023.05.07, mutagen-1.46.0, sqlite3-2.6.0, websockets-11.0.3
[Cookies] Extracting cookies from firefox
[debug] Extracting cookies from: "C:\Users\User\AppData\Roaming\Mozilla\Firefox\Profiles\4ad8do09.monika\cookies.sqlite"
[Cookies] Extracted 719 cookies from firefox
[debug] Proxy map: {}
[debug] Loaded 1840 extractors
[prosiebensat1] Extracting URL: https://www.sat1.at/tv/videos/der-sat-1-bio-check-aldi-rewe-denns-co-ganze-folge
[prosiebensat1] tv/videos/der-sat-1-bio-check-aldi-rewe-denns-co-ganze-folge: Downloading webpage
ERROR: [prosiebensat1] tv/videos/der-sat-1-bio-check-aldi-rewe-denns-co-ganze-folge: Unable to extract clip id; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
File "yt_dlp\extractor\common.py", line 703, in extract
File "yt_dlp\extractor\prosiebensat1.py", line 491, in _real_extract
File "yt_dlp\extractor\prosiebensat1.py", line 431, in _extract_clip
File "yt_dlp\extractor\common.py", line 1287, in _html_search_regex
File "yt_dlp\extractor\common.py", line 1251, in _search_regex
About this issue
- State: open
- Created a year ago
- Comments: 15 (6 by maintainers)
I haven’t looked at the atv.at pages or (recently) any of the sites supported in the existing ProSiebenSat1 extractor. However, you can get metadata from the page under test in (at least) two ways.

The method InfoExtractor._search_ld_json() returns an info_dict.

The method InfoExtractor._search_nextjs_data() returns the page hydration JSON, where the .props.pageProps.info member is full of metadata.

The atv.at extractor logic looks for hydration JSON in the webpage containing video IDs and then makes the vas-v4.p7s1video.net request using a JWT with those IDs (content_ids). However, for prosieben.de, the videos object was empty ({}) when the web client made the equivalent request, and the video ID that was the key can be extracted from the end of the URL.

The vas-v4.p7s1video.net getsources API is geo-restricted. As you saw, this API host is already used in the existing yt-dlp ProSiebenSat1 extractor, but with different endpoints (eg geturls) and relying on other API hosts as removed in PR #5593. Presumably it’s geturls vs getsources that causes the old code to return the “eingestellt” (discontinued) video.

The problem mentioned above is that ProSiebenSat1BaseIE is the base class of Puls4IE, which apparently still works. Therefore a separate or intermediate base class is needed for ProSiebenSat1IE, ideally one that could be used as a base class to simplify ATVAtIE. If any of the other sites supported by the existing ProSiebenSat1IE extractor might still be handled correctly, it will have to be cloned so that the module has ProSiebenSat1IE for those sites and ProSiebenSat1v2IE for prosieben.de and any other broken sites (eg, sat1.de from OP).

Thanks for the hint with the HAR files!
Here are more findings…
The JWT key is not in the HAR file. It is calculated using JavaScript in https://oasis-player-prod.p7s1.io/web/15.18.0/bootstrap/bootstrap.js
The JWT token is signed using algorithm “HS256”. The secret can be found using the Chrome Developer Tools (“Sources” tab). I could set a breakpoint at the sign() method in “webpack:///node_modules/.pnpm/jwt-encode@1.0.1/node_modules/jwt-encode/src/index.js” to see the secrets.
The secrets are:
(It should be tested if the encryption key is individual for each video or not)
Unfortunately, I don’t have experience with Python or the yt-dlp code, so I have no idea how to implement it…