yt-dlp: yt-dlp fails to parse MPD manifest: KeyError('sourceURL')

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I intentionally remove or skip any mandatory* field

Checklist

  • I’m reporting a bug unrelated to a specific site
  • I’ve verified that I’m running yt-dlp version 2023.09.24 (update instructions) or later (specify commit)
  • I’ve checked that all provided URLs are playable in a browser with the same IP and same login details
  • I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
  • I’ve searched known issues and the bugtracker for similar issues including closed ones. DO NOT post duplicates
  • I’ve read the guidelines for opening an issue

Provide a description that is worded well enough to be understood

Disclaimer: I checked all the boxes to advance in the process.

Dear developers and maintainers,

I don't know whether the MPD file conforms to the standard. Downloading it with ffmpeg also fails, though that may be due to missing header attributes. Please decide whether the MPD parsing needs to be changed, or tell me if this particular format is too anomalous to support.

Best regards Marcel

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • If using API, add 'verbose': True to YoutubeDL params instead
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd', '--no-config', '-v']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8 (No ANSI), error utf-8 (No ANSI), screen utf-8 (No ANSI)
[debug] yt-dlp version stable@2023.09.24 [088add956] (pip)
[debug] Python 3.10.6 (CPython x86_64 64bit) - Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.31 (OpenSSL 1.1.1f  31 Mar 2020, glibc 2.31)
[debug] exe versions: ffmpeg 4.2.7, ffprobe 4.2.7
[debug] Optional libraries: Cryptodome-3.19.0, brotli-1.1.0, certifi-2023.07.22, mutagen-1.47.0, sqlite3-3.31.1, websockets-11.0.3
[debug] Proxy map: {}
[debug] Extractor Plugins: Auf1IE, Auf1RadioIE, BrighteonIE, BrighteonRadioIE, BrighteonTvIE, PmWissenIE, PmWissenSearchIE, ServusSearchIE, ServusTVIE
[debug] Plugin directories: ['python3.10/site-packages/yt_dlp_plugins']
[debug] Loaded 1895 extractors
[generic] Extracting URL: https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Downloading webpage
WARNING: [generic] Falling back on generic information extractor
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Extracting information
ERROR: An extractor error has occurred. (caused by KeyError('sourceURL')); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "python3.10/site-packages/yt_dlp/extractor/common.py", line 715, in extract
    ie_result = self._real_extract(url)
  File "python3.10/site-packages/yt_dlp/extractor/generic.py", line 2535, in _real_extract
    info_dict['formats'], info_dict['subtitles'] = self._parse_mpd_formats_and_subtitles(
  File "python3.10/site-packages/yt_dlp/extractor/common.py", line 2734, in _parse_mpd_formats_and_subtitles
    representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
  File "python3.10/site-packages/yt_dlp/extractor/common.py", line 2618, in extract_multisegment_info
    extract_Initialization(segment_list)
  File "python3.10/site-packages/yt_dlp/extractor/common.py", line 2613, in extract_Initialization
    ms_info['initialization_url'] = initialization.attrib['sourceURL']
KeyError: 'sourceURL'

About this issue

  • State: open
  • Created 9 months ago
  • Comments: 16 (11 by maintainers)

Most upvoted comments

I don’t think that plugin did anything special about this issue, it just skips problematic MPDs.

As for the DASH formats it does return, maybe it just extracted them from other, non-problematic MPDs?

Edit: wait, that’s your plugin! Then I have no idea what you meant.

I just described which formats should be expected when the MPD is finally parsed. My plugin does not try to solve the issue, as MPD-parsing is a yt-dlp core functionality.

@dirkf: If you use the generic extractor then you also get all formats. Just that the mp3 is an additional playlist item, but that’s how the extractor works, I suppose…

In https://github.com/ytdl-org/youtube-dl/issues/32595#issuecomment-1761209532, I back-ported yt-dlp’s _parse_mpd_formats_and_subtitles() and modified it to address this issue.

The old code instantiated a BaseURL at the representation level by merging BaseURLs up the XML hierarchy and finally adding default URL components from the mpd_base_url, but didn’t use any default for media URL attributes.

My approach was to pull out the BaseURL processing: as the hierarchy is descended, whatever BaseURL has been constructed so far is passed down under the key base_url in the parent info (provided it isn’t a partial path) and is then used as the default for any missing media URLs.
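
To illustrate the idea of carrying a base_url down the hierarchy, here is a minimal sketch. It is not the back-ported code: the manifest, the example.com URL and the helper names child_info and init_urls are made up for illustration, the XML has no namespace (real MPDs usually declare one), and each BaseURL is resolved against its parent straight away with urljoin rather than being checked for partial paths.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical manifest, not the OP's: one Representation relies on BaseURL
# plus a byte range, the other names its init segment with sourceURL.
MPD = """
<MPD>
  <BaseURL>media/</BaseURL>
  <Period>
    <AdaptationSet>
      <Representation id="0">
        <BaseURL>video_300k.mp4</BaseURL>
        <SegmentList><Initialization range="0-862"/></SegmentList>
      </Representation>
      <Representation id="1">
        <SegmentList><Initialization sourceURL="init_1.mp4"/></SegmentList>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
"""


def child_info(element, parent_info):
    """Copy the parent info, extending base_url with this level's BaseURL."""
    info = dict(parent_info)
    base = element.findtext('BaseURL')  # direct child only
    if base:
        info['base_url'] = urljoin(parent_info['base_url'], base.strip())
    return info


def init_urls(mpd_doc, mpd_url):
    """Map Representation id -> initialization URL, defaulting to base_url."""
    result = {}
    mpd_info = child_info(mpd_doc, {'base_url': mpd_url})
    for period in mpd_doc.findall('Period'):
        period_info = child_info(period, mpd_info)
        for adaptation_set in period.findall('AdaptationSet'):
            set_info = child_info(adaptation_set, period_info)
            for representation in adaptation_set.findall('Representation'):
                rep_info = child_info(representation, set_info)
                init = representation.find('SegmentList/Initialization')
                if init is None:
                    continue
                source_url = init.get('sourceURL')  # None when absent
                result[representation.get('id')] = (
                    urljoin(rep_info['base_url'], source_url)
                    if source_url else rep_info['base_url'])
    return result


urls = init_urls(ET.fromstring(MPD.strip()), 'https://example.com/dash/manifest.mpd')
```

In this sketch, Representation 0 falls back to its accumulated base_url because its Initialization has no sourceURL, while Representation 1 resolves its sourceURL against the base inherited from the MPD level.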

There may be better ways. This sort of DASH format may even be invalid. But this is what happens with OP’s link:

$ python -m youtube_dl -v -F 'https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://video.brighteon.com/file/BTBucket-Prod/dash/b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a.mpd']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 66ab0814c
[debug] Python 2.7.18 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial - OpenSSL 1.1.1w  11 Sep 2023 - glibc 2.15
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Requesting header
WARNING: Falling back on generic information extractor.
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Downloading webpage
[generic] b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a: Extracting information
[info] Available formats for b00477ec-6e1b-4ab8-a42b-43b6cdf18c0a:
format code  extension  resolution note
1            m4a        audio only [eng] DASH audio    0k , m4a_dash container, mp4a.40.2 (44100Hz)
0            mp4        480x270    [eng] DASH video  300k , mp4_dash container, avc1.640015, video only
2            mp4        960x540    [eng] DASH video  600k , mp4_dash container, avc1.64001f, video only (best)
$ 

This MPD uses an Initialization element that does not include a sourceURL attribute. It includes only a range attribute, a byte range within the resource named by a higher-level BaseURL. yt-dlp assumes that sourceURL is always present, hence the KeyError.
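
The failure and one possible tolerant lookup can be reproduced in isolation. This is only a sketch under assumptions: the XML fragment, the example.com URL and the function initialization_info are hypothetical, the fragment omits the usual MPD namespace, and this is not necessarily how yt-dlp should fix it.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical fragment: Initialization has a byte range but no sourceURL.
FRAGMENT = """
<Representation id="0" bandwidth="300000">
  <BaseURL>video_300k.mp4</BaseURL>
  <SegmentList>
    <Initialization range="0-862"/>
  </SegmentList>
</Representation>
"""


def initialization_info(representation, parent_base_url):
    """Return (url, byte_range) for the init segment, or None if absent."""
    initialization = representation.find('SegmentList/Initialization')
    if initialization is None:
        return None
    base_url = urljoin(parent_base_url, representation.findtext('BaseURL') or '')
    # .get() yields None for a missing attribute instead of raising KeyError
    source_url = initialization.get('sourceURL')
    url = urljoin(base_url, source_url) if source_url else base_url
    return url, initialization.get('range')


representation = ET.fromstring(FRAGMENT.strip())

# Indexing attrib directly, as in the traceback, fails on this fragment.
try:
    representation.find('SegmentList/Initialization').attrib['sourceURL']
    raised = False
except KeyError:
    raised = True

info = initialization_info(representation, 'https://example.com/dash/')
```

Here info pairs the BaseURL-derived address with the byte range, so a downloader would still need to honour that range (for example with an HTTP Range request) to fetch only the initialization data.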

BTW, dash-mpd-cli downloads this content fine.