youtube-dl: SBS on demand. One of six episodes fails to download.
Checklist
- I’m reporting a broken site support issue
- I’ve verified that I’m running youtube-dl version 2021.12.17
- I’ve checked that all provided URLs are alive and playable in a browser
- I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
- I’ve searched the bugtracker for similar bug reports including closed ones
- I’ve read bugs section in FAQ
Verbose log
$ youtube-dl -v https://www.sbs.com.au/ondemand/watch/2175290435999
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.sbs.com.au/ondemand/watch/2175290435999']
WARNING: Assuming --restrict-filenames since file system encoding cannot encode all characters. Set the LC_ALL environment variable to fix this.
[debug] Encodings: locale ANSI_X3.4-1968, fs ANSI_X3.4-1968, out ANSI_X3.4-1968, pref ANSI_X3.4-1968
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.17 (CPython) - Linux-5.4.2-x86_64-with-glibc2.2.5
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[SBS] 2175290435999: Downloading JSON metadata
[ThePlatform] 8Eexyds5RzGA: Downloading SMIL data
[ThePlatform] 8Eexyds5RzGA: Downloading MPD manifest
WARNING: Failed to download MPD manifest: HTTP Error 403: Forbidden
ERROR: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 815, in wrapper
return func(self, *args, **kwargs)
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 836, in __extract_info
ie_result = ie.extract(url)
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
ie_result = self._real_extract(url)
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/theplatform.py", line 309, in _real_extract
self._sort_formats(formats)
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 1374, in _sort_formats
raise ExtractorError('No video formats found')
ExtractorError: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Description
This particular episode (https://www.sbs.com.au/ondemand/watch/2175290435999) fails to download. Other episodes (2175290435997, 2175290435996 etc) download correctly. The episode is viewable in the browser directly (https://www.sbs.com.au/ondemand/tv-series/cobra/season-2/cobra-s2-ep6/2175290435999)
As a side note, attempting this browser viewing URL with youtube-dl results in an ‘Unsupported URL’ message, so the correct URL to use, which results in other successful downloads, is the URL originally used.
$ youtube-dl -v https://www.sbs.com.au/ondemand/tv-series/cobra/season-2/cobra-s2-ep6/2175290435999
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'https://www.sbs.com.au/ondemand/tv-series/cobra/season-2/cobra-s2-ep6/2175290435999']
WARNING: Assuming --restrict-filenames since file system encoding cannot encode all characters. Set the LC_ALL environment variable to fix this.
[debug] Encodings: locale ANSI_X3.4-1968, fs ANSI_X3.4-1968, out ANSI_X3.4-1968, pref ANSI_X3.4-1968
[debug] youtube-dl version 2021.12.17
[debug] Python version 2.7.17 (CPython) - Linux-5.4.2-x86_64-with-glibc2.2.5
[debug] exe versions: ffmpeg 4.4, ffprobe 4.4
[debug] Proxy map: {}
[generic] 2175290435999: Requesting header
WARNING: Falling back on generic information extractor.
[generic] 2175290435999: Downloading webpage
[generic] 2175290435999: Extracting information
ERROR: Unsupported URL: https://www.sbs.com.au/ondemand/tv-series/cobra/season-2/cobra-s2-ep6/2175290435999
Traceback (most recent call last):
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 2477, in _real_extract
doc = compat_etree_fromstring(webpage.encode('utf-8'))
File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2571, in compat_etree_fromstring
doc = _XML(text, parser=etree.XMLParser(target=_TreeBuilder(element_factory=_element_factory)))
File "/usr/local/bin/youtube-dl/youtube_dl/compat.py", line 2560, in _XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1659, in feed
self._raiseerror(v)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1523, in _raiseerror
raise err
ParseError: syntax error: line 1, column 0
Traceback (most recent call last):
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 815, in wrapper
return func(self, *args, **kwargs)
File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 836, in __extract_info
ie_result = ie.extract(url)
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 534, in extract
ie_result = self._real_extract(url)
File "/usr/local/bin/youtube-dl/youtube_dl/extractor/generic.py", line 3489, in _real_extract
raise UnsupportedError(url)
UnsupportedError: Unsupported URL: https://www.sbs.com.au/ondemand/tv-series/cobra/season-2/cobra-s2-ep6/2175290435999
About this issue
- State: open
- Created a year ago
- Comments: 47 (13 by maintainers)
Now we know (how easy is this?):
https://www.sbs.com.au/api/v3/video_smil?id={video_id}
BTW, I use the nightly build of yt-dlp, not youtube-dl, which hasn’t worked in ages.
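For anyone wanting to poke at that endpoint offline, here is a minimal sketch of filling in the video_smil URL pattern quoted above and pulling the stream links out of a SMIL document. The helper names and the sample SMIL are hypothetical; the real SBS response may be structured differently, so treat this as an illustration under that assumption rather than as what any extractor actually does.

```python
import xml.etree.ElementTree as ET

# URL pattern as quoted in the thread.
SMIL_URL_TEMPLATE = 'https://www.sbs.com.au/api/v3/video_smil?id={video_id}'


def smil_url(video_id):
    """Fill the video ID into the video_smil URL pattern."""
    return SMIL_URL_TEMPLATE.format(video_id=video_id)


def video_sources(smil_text):
    """Return the src attribute of every <video> element, ignoring namespaces."""
    root = ET.fromstring(smil_text)
    sources = []
    for element in root.iter():
        # Tags look like '{namespace}video' when the document declares a namespace.
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'video' and element.get('src'):
            sources.append(element.get('src'))
    return sources


if __name__ == '__main__':
    # Hypothetical SMIL, not a real SBS response.
    sample = (
        '<smil xmlns="http://www.w3.org/2005/SMIL21/Language"><body><seq>'
        '<video src="https://example.invalid/master.m3u8"/>'
        '</seq></body></smil>'
    )
    print(smil_url('2175290435999'))
    print(video_sources(sample))
```

The downloaded text from the endpoint would be passed to `video_sources` in place of the sample string.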
gswan, your trouble appears to be either a transient issue or some problem at your end, since it is working fine for me with both the show you tried to download (https://www.sbs.com.au/ondemand/watch/2208035907943) and another randomly chosen show.
I would recommend that if you have an issue with SBS first check here: (https://forums.whirlpool.net.au/thread/2699206?p=-1&#bottom) to confirm whether the issue is with yt-dlp or something else (eg your end or the SBS servers) before posting on yt-dlp. Just a suggestion. The discussion on that forum is these days pretty much limited to using yt-dlp (and frontends) since other methods no longer work. Response is often within an hour.
And arrives: #31880. Local testing needed.
Comparing the odwebsite and tv JSON blocks, the former identifies as a StreamProvider ld+json object (non-standard, in accordance with the 404 non-standard http://www.sbs.com.au/schemas/) while the latter is a (superset of the standard) VideoObject.
The StreamProvider has the SMIL expanded and a Chromecast block. For our purposes the tv JSON is equivalent and easier to get.
In the context of steps 1-4 above, this adds actual thumbnail URLs which aren’t available from the catalogue.pr.sbsod.com endpoints (only IDs that, so far at least, we don’t know how to resolve: https://www.sbs.com.au/api/v3/video_image/getimage disappointingly needs an actual URL passed in its query parameters).
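A minimal sketch of telling the two blocks apart programmatically: JSON-LD objects carry their type in the `@type` key, and schema.org's VideoObject defines a `contentUrl` property. Whether the SBS responses use exactly these keys at the top level is an assumption, and the sample objects in the usage example are hypothetical.

```python
import json


def describe_ld_json(text):
    """Return (@type, contentUrl) from a JSON-LD object; either may be None."""
    data = json.loads(text)
    return data.get('@type'), data.get('contentUrl')


if __name__ == '__main__':
    # Hypothetical payloads, not real SBS responses.
    tv_block = json.dumps({
        '@type': 'VideoObject',
        'contentUrl': 'https://example.invalid/master.m3u8',
    })
    odwebsite_block = json.dumps({'@type': 'StreamProvider'})
    print(describe_ld_json(tv_block))
    print(describe_ld_json(odwebsite_block))
```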
With all this, my draft extractor is passing its test, albeit with a different MD5 for the download fragment, but just returning the detected geo-restriction error for the problem video here. PR beckons.
How interesting:
I don’t know how to read a smil file, but what I did notice was it has a bunch of links to https://sbs-vod-prod-01.akamaized.net/Content/HLS_AES_TSO/VOD/geo/12986/2483/a3330a06-4a56-45fd-911e-110540bf8c7a/9439a831-9b13-6280-ed4f-cb66148195cb/master.m3u8, the exact url that context=oddesktop returned as its contenturl. I assume that means that context=tv ultimately links to the same file as context=oddesktop, but without requiring authentication.
Whoops, I posted the wrong json:
context=odwebsite
I just visually compared the manifest files, and the main difference is that if context isn’t provided then it contains a bunch of mentions of https://securepubads.g.doubleclick.net
GET-ing “https://www.sbs.com.au/api/v3/video_smil?id=2175290435999” (no proxy, no authentication required) and inspecting the downloaded SMIL, the URI to the master HLSe manifest is:
Then, with a whitelisted AU HTTPS proxy:
I just omitted all the not obviously required query parameters and found that I got a bigger SMIL manifest than when a context was provided. Let’s see if it’s a good link…
The data from the /v3/video_stream endpoint looks like well-formed ld+json.
I just noticed that at the exact same time 🤣 I’ll just quickly check to see if that is the same url that the authenticated context=odwebsite api provides, since I found it on the unauthenticated context=tv api, and I’m guessing you did too (correct me if I’m wrong)
Haha, context=tv lets me completely bypass the login screen :trollface:
EDIT: I’ll verify later if it gives me the same result as logging in
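To make the with/without-context comparison repeatable, here is a small sketch that builds the request URL with an optional context query parameter. The thread only gives the full URL for video_smil with an id parameter; passing context as a plain query parameter, and the exact path of the video_stream endpoint, are assumptions here, and `api_url` is a hypothetical helper name.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

API_BASE = 'https://www.sbs.com.au/api/v3'


def api_url(endpoint, video_id, context=None):
    """Build an API URL; context (e.g. 'tv' or 'odwebsite') is added only if given."""
    query = [('id', video_id)]
    if context is not None:
        # Assumption: context is sent as an ordinary query parameter.
        query.append(('context', context))
    return '{0}/{1}?{2}'.format(API_BASE, endpoint, urlencode(query))


if __name__ == '__main__':
    print(api_url('video_smil', '2175290435999'))
    print(api_url('video_stream', '2175290435999', context='tv'))
```

Fetching both variants and diffing the responses would show whether context=tv really yields the same stream as the authenticated context.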
By the way, it took me ages to figure out, but odwebsite means on demand website (the website is called sbs on demand)
yeah, that’s the same url I got last night, but I also got stuck on the authorisation bit. I have an account, but I don’t know much about session ids and headers. I was planning on working on that tonight, but I’m happy to send you my login details instead and you can work on it
PS: the old api didn’t require authentication, so it sucks if this one does. This workaround (shameless self-promotion) doesn’t require one, so it should probably be added as a permanent fallback
The URL wouldn’t work anyway.
This is what we’re after:
Are either of these found in a previous response?
So, it’s a new set of APIs.
- video_id = '2175290435999'
- details from https://catalogue.pr.sbsod.com/mpx-media/{video_id}, including: series_slug = details['seriesSlug']
- https://catalogue.pr.sbsod.com/tv-series/{series_slug}
There must be another response with the media link.
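The steps above can be sketched as two URL builders. This only constructs the catalogue URLs named in the steps; the function names are hypothetical, and the 'cobra' slug in the usage example is an assumed value for illustration, not something read from a real response.

```python
CATALOGUE_BASE = 'https://catalogue.pr.sbsod.com'


def media_details_url(video_id):
    """URL of the per-video details document."""
    return '{0}/mpx-media/{1}'.format(CATALOGUE_BASE, video_id)


def series_url(details):
    """URL of the series document, using the seriesSlug field from the details."""
    return '{0}/tv-series/{1}'.format(CATALOGUE_BASE, details['seriesSlug'])


if __name__ == '__main__':
    video_id = '2175290435999'
    print(media_details_url(video_id))
    # Assumed slug; in practice this comes from the parsed details JSON.
    print(series_url({'seriesSlug': 'cobra'}))
```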
Here's the last few request/responses before it starts serving up the content chunks.
Thanks for the suggestion. I switched off DRM in FF (unchecked “Play DRM-controlled content”) and it played OK still. The URL in FF appeared as: https://www.sbs.com.au/ondemand/watch/2175290435999 I can use tcpdump to capture the packet interchange if you like, but I’m not sure if that will show anything interesting.
Can you play the show in FF with DRM (EME) disabled?
If not, SBS is now encrypting its media and yt-dl is no help with such pages. If DRM affects all site media, the site could be marked as not working; if only some, the extractor should report the problem instead of “No video formats found”. Or perhaps there is some new access protocol for the failing MPD manifest that would reveal playable formats.
BTW you can format your logs by putting triple backquotes (```) above and below, and append console to the top triple for extra formatting credit.