yt-dlc: [Broken] Youtube sometimes fails with "Unable to extract video data" for the same video.

Checklist

  • I’m reporting a broken site support
  • I’ve verified that I’m running youtube-dlc version 2020.10.26
  • I’ve checked that all provided URLs are alive and playable in a browser
  • I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
  • I’ve searched the bugtracker for similar issues including closed ones

Verbose log

[debug] System config: [] 
[debug] User config: []
[debug] Custom config: [] 
[debug] Command-line args: [u'--no-warnings', u'--dump-json', u'--verbose', u'--id', u'--', u'6-8E4Nirh9s'] 
[debug] Loading archive file None 
[debug] Encodings: locale UTF-8, fs UTF-8, out None, pref UTF-8 
[debug] youtube-dlc version 2020.10.25
[debug] Python version 2.7.16 (CPython) - Linux-4.19.0-10-amd64-x86_64-with-debian-10.6
[debug] exe versions: ffmpeg 4.1.6-1, ffprobe 4.1.6-1
[debug] Proxy map: {}
ERROR: 6-8E4Nirh9s: YouTube said: Unable to extract video data
Traceback (most recent call last):
  File "./youtube-dlc/youtube_dlc/YoutubeDL.py", line 830, in extract_info
    ie_result = ie.extract(url)
  File "./youtube-dlc/youtube_dlc/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "./youtube-dlc/youtube_dlc/extractor/youtube.py", line 1866, in _real_extract
    'YouTube said: %s' % unavailable_message, expected=True, video_id=video_id)
ExtractorError: 6-8E4Nirh9s: YouTube said: Unable to extract video data

Description

Currently, about 3% of video extractions for YouTube fail with “Unable to extract video data”, even when the same URL is used every time. The problem is that YouTube sometimes delivers a player page that does not contain the expected “args”: “player_response” object (see _real_extract() in youtube.py).

I did not have time to investigate further today, so I am opening this issue for others to look into. I will update it if I find out more; if someone else finds something, please add it to this issue 😃

I attached a working and a non-working response to this issue for further investigation: responses.zip

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 15 (7 by maintainers)

Most upvoted comments

I took a closer look at this issue. YouTube is serving a different page response depending on your account (and, it appears, at random on some videos); they are probably testing a new page.

youtube._get_ytplayer_config searches the webpage for the ytconfig_section using the following regex patterns:

            r';ytplayer\.config\s*=\s*({.+?});ytplayer',
            r';ytplayer\.config\s*=\s*({.+?});',

Neither of these two will find anything in the new response.

Using r'ytInitialPlayerResponse\s*=\s*({.+?});var meta', I was able to extract something resembling player_response from the old pages.
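
A minimal sketch of how the two page variants could be matched by trying each pattern in turn. Only the three patterns come from this comment; the helper name find_player_json and the shortened sample pages are hypothetical, and this is not the actual yt-dlc code.

```python
import re

# Patterns quoted in this comment: the first two target the old page,
# the last one targets the new page.
PATTERNS = (
    r';ytplayer\.config\s*=\s*({.+?});ytplayer',
    r';ytplayer\.config\s*=\s*({.+?});',
    r'ytInitialPlayerResponse\s*=\s*({.+?});var meta',
)


def find_player_json(webpage):
    """Return (pattern_index, captured_text) for the first matching pattern, else None.

    Hypothetical helper, for illustration only.
    """
    for index, pattern in enumerate(PATTERNS):
        match = re.search(pattern, webpage)
        if match:
            return index, match.group(1)
    return None


# Made-up, heavily shortened stand-ins for the two page variants.
old_page = ('<script >var ytplayer = ytplayer || {};ytplayer.config = '
            '{"args":{"player_response":"{}"}};ytplayer.web_player_context_config')
new_page = ('<script nonce="x">var ytInitialPlayerResponse = '
            '{"videoDetails":{"videoId":"abc"}};var meta = document.createElement')

print(find_player_json(old_page))
print(find_player_json(new_page))
```

The returned index tells the caller which variant was served, which matters because the captured text has a different shape in each case.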

Old Page

<script >var ytplayer = ytplayer || {};ytplayer.config = {"args":{"player_response":"{\"responseContext\":{\"serviceTrackingParams\":...{\"serializedSlotAdServingDataEntry\":\"\"}}}]}"}};ytplayer.web_player_context_config

New Page

<script nonce="yQgGLM0ZubpmF+JbvjNyLQ">var ytInitialData = {"responseContext"..."adSlotLoggingData": {..."serializedSlotAdServingDataEntry": ""}}}]};var meta = document.createElemen

As for using the new format, I’m not sure where to get the info that was previously associated with ytplayer_config['args']. However, ignoring that section and using the matched string returned from _get_ytplayer_config as player_response successfully downloaded a video while using cookies to log into an account that was being served the new webpage.
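
The idea of using the matched string directly as player_response can be sketched roughly as follows. This assumes, as the page samples in this thread suggest, that the old page stores player_response as a JSON-encoded string under "args", while the new page exposes the object itself. The helper to_player_response and the sample strings are hypothetical.

```python
import json


def to_player_response(matched_text):
    """Turn the text captured from either page variant into a player_response dict.

    Hypothetical helper, for illustration only.
    """
    config = json.loads(matched_text)
    args = config.get('args')
    if isinstance(args, dict) and 'player_response' in args:
        # Old page: player_response is a JSON document stored as a string.
        return json.loads(args['player_response'])
    # New page: the captured object is used as the player response as-is.
    return config


# Made-up, heavily shortened captures for the two page variants.
old_match = '{"args":{"player_response":"{\\"videoDetails\\":{\\"videoId\\":\\"abc\\"}}"}}'
new_match = '{"videoDetails":{"videoId":"abc"}}'

print(to_player_response(old_match))
print(to_player_response(new_match))
```

Anything else that used to live under "args" in the old config is simply not available on the new-page path of this sketch.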

Temporary workaround I’m using (it does not cover all usages of the old ytconfig): https://github.com/abayochocoball/yt-dlc/commit/c4ac92bbc8ffcb8626543a097efbbbfada65d6c6

@abayochocoball Wow, perfect, thank you 😃 I compared the complete output of the new ytInitialPlayerResponse with the old player_response (using a diff tool), and it does not merely resemble the old one, it’s (almost) exactly the same!

The only other usage of _get_ytplayer_config() is in _get_automatic_captions().

I applied your part of the fix, then continued to fix the automatic caption extraction. It works 👍

./check_extract_bug.sh
Success rate: 100.00% (320/320), Captions: 32/288/0, Video: 32/288/0

The 3 numbers are “old player”/“new player”/“some other method”. It seems YouTube has picked up the pace since yesterday; now it’s 10% of the calls using the new player.
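
check_extract_bug.sh itself is not shown in this thread. As a rough sketch of how a summary line of this shape could be tallied, assuming each run records which method produced the video info and the captions (or None on failure), and assuming the success rate is based on the video results; the function summarize and the method labels are hypothetical.

```python
from collections import Counter

# Hypothetical labels, in the order the three numbers are printed.
METHODS = ('old', 'new', 'other')


def summarize(video_results, caption_results):
    """Build a summary line from per-run results.

    Each list holds one entry per run: a label from METHODS, or None for a failure.
    """
    total = len(video_results)
    succeeded = sum(1 for result in video_results if result is not None)
    rate = 100.0 * succeeded / total if total else 0.0

    def split(results):
        # Count successful runs per method and join them as "old/new/other".
        counts = Counter(result for result in results if result is not None)
        return '/'.join(str(counts[method]) for method in METHODS)

    return 'Success rate: %.2f%% (%d/%d), Captions: %s, Video: %s' % (
        rate, succeeded, total, split(caption_results), split(video_results))


# Made-up input: 320 runs, none of them failing.
runs = ['old'] * 32 + ['new'] * 288
print(summarize(runs, runs))
```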

I opened a pull request with the fix, feel free to comment: #68

I literally just installed it (via pip) and it failed on a video with the same error on the first try. Second try worked fine though.

Well, it should be fixed with the next release. At least for me, the newest version is working. I will leave this open for a few days.

@peet1993 Just FYI, you may want to use python3 (not specifically because of this issue). I have an alias for that in .bash_aliases (it lives in your home folder; if it does not exist, create it): alias youtube-dlc='python3 /usr/local/bin/youtube-dlc'

Thanks for the hint. I usually use python3; it seems I had my IDE configured incorrectly 😅