yt-dlp: [Twitch] Unable to download chat replay (--sub-langs rechat)

DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE

  • I understand that I will be blocked if I remove or skip any mandatory* field

Checklist

  • I’m reporting a broken site
  • I’ve verified that I’m running yt-dlp version 2022.11.11 (update instructions) or later (specify commit)
  • I’ve checked that all provided URLs are playable in a browser with the same IP and same login details
  • I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
  • I’ve searched the bugtracker for similar issues including closed ones. DO NOT post duplicates
  • I’ve read the guidelines for opening an issue
  • I’ve read about sharing account credentials and I’m willing to share it if required

Region

Luxembourg/Europe

Provide a description that is worded well enough to be understood

Cannot download live chat replay (comments) from any recorded live stream on Twitch.

Example video:

$ yt-dlp -vU --skip-download --write-subs --sub-langs rechat https://www.twitch.tv/videos/1670416229

We get:

yt_dlp.utils.DownloadError: Unable to download video subtitles for 'rechat': HTTP Error 410: Gone

It is probably a problem with the client_id no longer being accepted by the Twitch API, because in the output we see:

[debug] Invoking http downloader on "https://api.twitch.tv/v5/videos/1670416229/comments?client_id=kimne78kx3ncx6brgo4mv6wki5h1ko"

When we open the URL in a normal web browser, we get this message:

This api.twitch.tv page can’t be found
It may have been moved or deleted.
HTTP ERROR 410
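
The browser check above can also be scripted. This is only a minimal sketch using Python's standard library: the helper names probe_status and is_permanently_gone are made up for illustration and are not part of yt-dlp. In HTTP, status 410 normally means the resource was removed on purpose, so a 410 here may point to the endpoint itself being retired rather than only to a rejected client_id.

```python
import urllib.error
import urllib.request


def probe_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code for url, including error statuses.

    opener is injectable so the function can be exercised without network access.
    """
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is still available.
        return err.code


def is_permanently_gone(status):
    """True for 410 Gone, which signals that retrying is unlikely to help."""
    return status == 410
```

Calling probe_status with the comments URL from the debug line should reproduce the status the browser reports.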

I am using rechat, because it is on the list of available subtitles:

$ yt-dlp -vU --skip-download --list-subs https://www.twitch.tv/videos/1670416229
[debug] Command-line config: ['-vU', '--skip-download', '--list-subs', 'https://www.twitch.tv/videos/1670416229']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.11.11 [8b64402] (linux_exe)
[debug] Python 3.10.6 (CPython x86_64 64bit) - Linux-5.15.0-56-generic-x86_64-with-glibc2.35 (OpenSSL 3.0.7 1 Nov 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.09.24, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1723 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.11.11, Current version: 2022.11.11
yt-dlp is up to date (2022.11.11)
[debug] [twitch:vod] Extracting URL: https://www.twitch.tv/videos/1670416229
[twitch:vod] 1670416229: Downloading stream metadata GraphQL
[twitch:vod] 1670416229: Downloading video access token GraphQL
[twitch:vod] 1670416229: Downloading m3u8 information
[twitch:vod] 1670416229: Downloading storyboard metadata JSON
WARNING: [twitch:vod] Unable to download JSON metadata: HTTP Error 403: Forbidden
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available subtitles for v1670416229:
Language Formats
rechat   json

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

[debug] Command-line config: ['-vU', '--skip-download', '--write-subs', '--sub-langs', 'rechat', 'https://www.twitch.tv/videos/1670416229']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.11.11 [8b64402] (linux_exe)
[debug] Python 3.10.6 (CPython x86_64 64bit) - Linux-5.15.0-56-generic-x86_64-with-glibc2.35 (OpenSSL 3.0.7 1 Nov 2022, glibc 2.35)
[debug] exe versions: ffmpeg 4.4.2 (setts), ffprobe 4.4.2, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.09.24, mutagen-1.46.0, sqlite3-2.6.0, websockets-10.4
[debug] Proxy map: {}
[debug] Loaded 1723 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2022.11.11, Current version: 2022.11.11
yt-dlp is up to date (2022.11.11)
[debug] [twitch:vod] Extracting URL: https://www.twitch.tv/videos/1670416229
[twitch:vod] 1670416229: Downloading stream metadata GraphQL
[twitch:vod] 1670416229: Downloading video access token GraphQL
[twitch:vod] 1670416229: Downloading m3u8 information
[twitch:vod] 1670416229: Downloading storyboard metadata JSON
WARNING: [twitch:vod] Unable to download JSON metadata: HTTP Error 403: Forbidden
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] v1670416229: Downloading subtitles: rechat
[debug] Default format spec: bestvideo*+bestaudio/best
[info] v1670416229: Downloading 1 format(s): 1080p60
[info] Writing video subtitles to: GlobiHorror - Une soirée d'horreur pour se faire peur avant Noël ! #Horreur #Peur #Halloween [v1670416229].rechat.json
[debug] Invoking http downloader on "https://api.twitch.tv/v5/videos/1670416229/comments?client_id=kimne78kx3ncx6brgo4mv6wki5h1ko"
ERROR: Unable to download video subtitles for 'rechat': HTTP Error 410: Gone
Traceback (most recent call last):
  File "yt_dlp/YoutubeDL.py", line 3950, in _write_subtitles
  File "yt_dlp/YoutubeDL.py", line 2924, in dl
  File "yt_dlp/downloader/common.py", line 446, in download
  File "yt_dlp/downloader/http.py", line 371, in real_download
  File "yt_dlp/downloader/http.py", line 129, in establish_connection
  File "yt_dlp/YoutubeDL.py", line 3692, in urlopen
  File "urllib/request.py", line 525, in open
  File "urllib/request.py", line 634, in http_response
  File "urllib/request.py", line 563, in error
  File "urllib/request.py", line 496, in _call_chain
  File "urllib/request.py", line 643, in http_error_default
urllib.error.HTTPError: HTTP Error 410: Gone

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "yt_dlp/YoutubeDL.py", line 1485, in wrapper
  File "yt_dlp/YoutubeDL.py", line 1582, in __extract_info
  File "yt_dlp/YoutubeDL.py", line 1641, in process_ie_result
  File "yt_dlp/YoutubeDL.py", line 2737, in process_video_result
  File "yt_dlp/YoutubeDL.py", line 2983, in process_info
  File "yt_dlp/YoutubeDL.py", line 3958, in _write_subtitles
yt_dlp.utils.DownloadError: Unable to download video subtitles for 'rechat': HTTP Error 410: Gone

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 21 (13 by maintainers)

Most upvoted comments

Capture data

Open a new tab in your web browser, open the developer tools and their network tab, and then load Twitch. The order is important: this way the devtools won’t miss traffic from the beginning.

Extract identifiers

You will need 2 or 3 things from here:

  • Device ID
  • Client-Integrity

Device ID: this is stored in the unique_id cookie and is set in the first request (which is the document itself). Although yt-dlp can extract this from your browser if you pass it --cookies-from-browser firefox, (for now) it won’t know how it should be used. It would attach it to every request as a cookie, but it needs to be sent as the X-Device-Id header instead.

Last I checked, this expires after a year, counted from when it was generated rather than from when the page was loaded.
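
If you copy the whole Cookie request header from the network tab instead of picking the value out by hand, the unique_id cookie can be extracted with a few lines of Python. This is a sketch only: the function name is hypothetical, and the cookie string in the usage note is a made-up example, not real Twitch data.

```python
from http.cookies import SimpleCookie


def device_id_from_cookie_header(cookie_header):
    """Return the value of the unique_id cookie, or None if it is absent."""
    jar = SimpleCookie()
    jar.load(cookie_header)  # parses "name=value; name2=value2" pairs
    morsel = jar.get("unique_id")
    return morsel.value if morsel is not None else None
```

For example, device_id_from_cookie_header("foo=bar; unique_id=abc123") returns "abc123".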

Client-Integrity: you can find this either in the token field in the JSON response of the latest https://gql.twitch.tv/integrity request, or in requests which use this token in the Client-Integrity header.

Last I checked, this lasts a day, so you may have to re-obtain it regularly.

(Screenshots: when the token is obtained, and when it is used.)
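
If you save the JSON response body of the integrity request from the network tab, pulling out the token can be sketched as below. The only thing taken from the description above is that the response has a token field; the function name and the sample body in the usage note are illustrative assumptions.

```python
import json


def integrity_token_from_response(body):
    """Extract the token field from the JSON body of the integrity response."""
    data = json.loads(body)
    token = data.get("token")
    if not token:
        # Fail loudly rather than sending an empty Client-Integrity header.
        raise ValueError("no token field in integrity response")
    return token
```

For example, integrity_token_from_response('{"token": "example-token"}') returns "example-token".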

Use identifiers

Open the yt_dlp/extractor/twitch.py file from the repository, and find the yt_dlp.extractor.twitch.TwitchBaseIE._download_base_gql function.

headers is a dictionary/map of the HTTP headers that yt-dlp should send when making certain GQL requests. Insert the Device ID you obtained under the X-Device-Id key, and the Client-Integrity token under the Client-Integrity key. It should look like this:

    def _download_base_gql(self, video_id, ops, note, fatal=True):
        headers = {
            'Content-Type': 'text/plain;charset=UTF-8',
            'Client-ID': self._CLIENT_ID,
        }
        gql_auth = self._get_cookies('https://gql.twitch.tv').get('auth-token')
        if gql_auth:
            headers['Authorization'] = 'OAuth ' + gql_auth.value
        # else:
            # headers['Authorization'] = 'undefined'

        # Added lines: identifiers copied from the browser's developer tools.
        headers['X-Device-Id'] = 'your Device ID goes here'
        headers['Client-Integrity'] = 'your Client-Integrity token goes here'

        return self._download_json(
            'https://gql.twitch.tv/gql', video_id, note,
            data=json.dumps(ops).encode(),
            headers=headers, fatal=fatal)

No other changes to this function should be needed. Please don’t copy and paste the above snippet; I don’t know whether this function has changed since I last pulled, so take it only as an illustration.
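
To check the header names outside of yt-dlp first, the same dictionary can be built in a standalone function. This is a sketch under assumptions: build_gql_headers and its parameters are hypothetical names, and only the header keys and the 'OAuth ' prefix are taken from the snippet above.

```python
def build_gql_headers(client_id, device_id, client_integrity, auth_token=None):
    """Build the header dictionary described above for a GQL request."""
    headers = {
        "Content-Type": "text/plain;charset=UTF-8",
        "Client-ID": client_id,
        # The two identifiers copied from the browser's developer tools.
        "X-Device-Id": device_id,
        "Client-Integrity": client_integrity,
    }
    if auth_token:
        # Only sent when an auth-token cookie is available, as in the snippet above.
        headers["Authorization"] = "OAuth " + auth_token
    return headers
```

Passing the values in as arguments keeps the tokens out of the source file, which may be convenient because the Client-Integrity token has to be refreshed regularly.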

When you are done, try running yt-dlp on a VOD for which you previously couldn’t obtain all messages. Be sure to actually run yt-dlp from the source code you just edited, not the installed version, and run it with a temporary output directory so as not to contaminate your usual one with bad files, or even overwrite good ones. If the result looks good, you should be OK to run it with the usual output directory too. If you use archive files (--download-archive), don’t forget to remove the IDs of the failed VODs from them, so that yt-dlp does not skip them when retrying.
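
Removing the failed IDs from an archive file can be sketched as follows. It assumes each archive line has the form "<extractor> <id>"; the function name and the "twitchvod" prefix in the usage note are illustrative, so check your own archive file for the exact entries before relying on this.

```python
def drop_failed_ids(archive_lines, failed_ids):
    """Return the archive lines whose video ID is not in failed_ids."""
    failed = set(failed_ids)
    kept = []
    for line in archive_lines:
        parts = line.split()
        # Blank or unexpected lines are kept untouched; only the ID field is compared.
        if len(parts) == 2 and parts[1] in failed:
            continue
        kept.append(line)
    return kept
```

For example, drop_failed_ids(["twitchvod v111\n", "twitchvod v222\n"], ["v222"]) keeps only the first line. Read the lines with readlines(), write the result to a new file, and keep a backup of the original archive.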

@donnaken15 did you enable downloading the live chat? You need to add live_chat to the list of downloaded subtitle languages, even if you use all, like this: --sub-langs all,live_chat

the only “subs” that come up is “rechat”

just tried this:

    yt-dlp --sub-langs all,live_chat --write-sub --skip-download https://www.twitch.tv/videos/... --verbose

https://gist.github.com/donnaken15/31af64223eccfed160c287eddf13c782

patched code:

	def _download_base_gql(self, video_id, ops, note, fatal=True):
		headers = {
			'Content-Type': 'text/plain;charset=UTF-8',
			'Client-ID': self._CLIENT_ID,
		}
		gql_auth = self._get_cookies('https://gql.twitch.tv').get('auth-token')
		if gql_auth:
			headers['Authorization'] = 'OAuth ' + gql_auth.value
		headers["Device-ID"] = "obtained from browser"
		headers["Client-Integrity"] = "obtained from browser"
		return self._download_json(
			'https://gql.twitch.tv/gql', video_id, note,
			data=json.dumps(ops).encode(),
			headers=headers, fatal=fatal)

realized I accidentally didn’t put “X-” in front of Device-ID, but it still fails

extractor options with the cases being preserved

_configuration_arg() has a casesense param