yt-dlp: yt-dlp fails to parse MPD manifest: KeyError('sourceURL')
DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
- I understand that I will be blocked if I intentionally remove or skip any mandatory* field
Checklist
- I’m reporting a bug unrelated to a specific site
- I’ve verified that I’m running yt-dlp version 2023.09.24 (update instructions) or later (specify commit)
- I’ve checked that all provided URLs are playable in a browser with the same IP and same login details
- I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
- I’ve searched known issues and the bugtracker for similar issues including closed ones. DO NOT post duplicates
- I’ve read the guidelines for opening an issue
Provide a description that is worded well enough to be understood
Disclaimer: I checked all the boxes to advance in the process.
Dear developers and maintainers,
I have no idea whether the MPD file conforms to the standard. Downloading it with ffmpeg also fails, though that may be due to missing header attributes. Please decide for yourselves whether the MPD parsing needs to be changed, or tell me if this particular format is too anomalous.
Best regards Marcel
Provide verbose output that clearly demonstrates the problem
- Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
- If using API, add 'verbose': True to YoutubeDL params instead
- Copy the WHOLE output (starting with [debug] Command-line config) and insert it below
Complete Verbose Output
[debug] Command-line config: ['https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd', '--no-config', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8 (No ANSI), error utf-8 (No ANSI), screen utf-8 (No ANSI)
[debug] yt-dlp version stable@2023.09.24 [088add956] (pip)
[debug] Python 3.10.6 (CPython x86_64 64bit) - Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31 (OpenSSL 1.1.1f 31 Mar 2020, glibc 2.31)
[debug] exe versions: ffmpeg 4.2.7, ffprobe 4.2.7
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2023.07.22, mutagen-1.47.0, sqlite3-3.31.1, websockets-11.0.3
[debug] Proxy map: {}
[debug] Extractor Plugins: Auf1IE, Auf1RadioIE, BrighteonIE, BrighteonRadioIE, BrighteonTvIE, PmWissenIE, PmWissenSearchIE, ServusSearchIE, ServusTVIE
[debug] Plugin directories: ['python3.10/site-packages/yt_dlp_plugins']
[debug] Loaded 1895 extractors
[generic] Extracting URL: https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Extracting information
ERROR: An extractor error has occurred. (caused by KeyError('sourceURL')); please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U
File "python3.10/site-packages/yt_dlp/extractor/common.py", line 715, in extract
ie_result = self._real_extract(url)
File "python3.10/site-packages/yt_dlp/extractor/generic.py", line 2535, in _real_extract
info_dict['formats'], info_dict['subtitles'] = self._parse_mpd_formats_and_subtitles(
File "python3.10/site-packages/yt_dlp/extractor/common.py", line 2734, in _parse_mpd_formats_and_subtitles
representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
File "python3.10/site-packages/yt_dlp/extractor/common.py", line 2618, in extract_multisegment_info
extract_Initialization(segment_list)
File "python3.10/site-packages/yt_dlp/extractor/common.py", line 2613, in extract_Initialization
ms_info['initialization_url'] = initialization.attrib['sourceURL']
KeyError: 'sourceURL'
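The failing lookup in the last traceback frame can be reproduced in isolation. The following is a minimal sketch, not yt-dlp code: the XML fragment is hypothetical and namespace-free (the real manifest differs), and only the attrib['sourceURL'] lookup is taken from the traceback.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: an Initialization element that carries only a byte
# range and no sourceURL attribute (assumed shape, not the real manifest).
FRAGMENT = '<SegmentList><Initialization range="0-862"/></SegmentList>'

segment_list = ET.fromstring(FRAGMENT)
initialization = segment_list.find('Initialization')

error = None
try:
    # Same style of lookup as in extract_Initialization in the traceback.
    url = initialization.attrib['sourceURL']
except KeyError as exc:
    error = repr(exc)
    print(error)

# attrib.get() returns None instead of raising when the attribute is absent.
missing = initialization.attrib.get('sourceURL')
```

With a plain dictionary-style lookup the missing attribute raises; with attrib.get() the caller can decide what default to use.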
About this issue
- Original URL
- State: open
- Created 9 months ago
- Comments: 16 (11 by maintainers)
I just described which formats should be expected when the MPD is finally parsed. My plugin does not try to solve the issue, as MPD-parsing is a yt-dlp core functionality.
@dirkf: If you use the generic extractor then you also get all formats. Just that the mp3 is an additional playlist item, but that’s how the extractor works, I suppose…
In https://github.com/ytdl-org/youtube-dl/issues/32595#issuecomment-1761209532, I back-ported yt-dlp’s _parse_mpd_formats_and_subtitles() and modified it to address this issue.
The old code instantiated a BaseURL at the representation level by merging BaseURLs up the XML hierarchy and finally adding default URL components from the mpd_base_url, but didn’t use any default for media URL attributes.
My approach was to pull out the BaseURL processing so that, as the hierarchy is descended, whatever BaseURL has been constructed so far can be passed, if it isn’t a partial path, with key base_url in the parent info, and then used as a default for any missing media URLs.
There may be better ways. This sort of DASH format may even be invalid. But this is what happens with OP’s link:
This MPD is using an Initialization element that does not include a sourceURL attribute. It only includes a range attribute that refers to a higher-level BaseURL. yt-dlp is assuming that sourceURL is always present.
BTW, dash-mpd-cli downloads this content fine.
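To illustrate the idea discussed in the comments above (using the resolved BaseURL as the default when Initialization has no sourceURL), here is a simplified sketch. It is not the actual yt-dlp or youtube-dl patch: the function name initialization_info, the example.com URL, and the namespace-free MPD fragment are all assumptions made for illustration, and only a BaseURL directly under Representation is considered rather than the full hierarchy.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, namespace-free MPD fragment; the real manifest differs.
MPD = '''
<MPD>
  <Period>
    <AdaptationSet>
      <Representation id="video-1">
        <BaseURL>video/1080p.mp4</BaseURL>
        <SegmentList>
          <Initialization range="0-862"/>
        </SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
'''


def initialization_info(representation, mpd_base_url):
    """Return the initialization URL and byte range for a Representation.

    Hypothetical helper: falls back to the resolved BaseURL when the
    Initialization element has no sourceURL attribute.
    """
    base_el = representation.find('BaseURL')
    if base_el is not None and base_el.text:
        base_url = urljoin(mpd_base_url, base_el.text.strip())
    else:
        base_url = mpd_base_url

    init = representation.find('SegmentList/Initialization')
    if init is None:
        return None

    source = init.attrib.get('sourceURL')  # None when the attribute is absent
    return {
        'url': urljoin(base_url, source) if source else base_url,
        'range': init.attrib.get('range'),
    }


root = ET.fromstring(MPD)
rep = root.find('.//Representation')
info = initialization_info(rep, 'https://example.com/dash/manifest.mpd')
print(info)
```

In this sketch the byte range is returned alongside the URL so that a downloader could request just the initialization bytes from the BaseURL resource; how the real code should carry that range is left open here.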