yt-dlp: [facebook] Cannot parse data

Checklist

  • I’m reporting a broken site
  • I’ve verified that I’m running yt-dlp version 2022.06.29 (update instructions) or later (specify commit)
  • I’ve checked that all provided URLs are playable in a browser with the same IP and same login details
  • I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
  • I’ve searched the bugtracker for similar issues including closed ones. DO NOT post duplicates
  • I’ve read the guidelines for opening an issue
  • I’ve read about sharing account credentials and I’m willing to share it if required

Region

Bangladesh

Provide a description that is worded well enough to be understood

Downloading worked perfectly fine just a few days ago, but now any Facebook video link in the “facebook.com/. . ./videos/. . .” format throws this error. I have checked a similar issue, #4289, but that one works fine on my side. The aforementioned link format, however, shows this error.

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

❯ yt-dlp.exe -vU "https://www.facebook.com/100015388953240/videos/424804832900683/"
[debug] Command-line config: ['-vU', 'https://www.facebook.com/100015388953240/videos/424804832900683/']
[debug] Encodings: locale cp1252, fs utf-8, pref cp1252, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2022.06.29 [9d339c4] (win32_exe)
[debug] Python 3.8.10 (CPython 64bit) - Windows-10-10.0.19044-SP0
[debug] Checking exe version: ffmpeg -bsfs
[debug] Checking exe version: ffprobe -bsfs
[debug] exe versions: ffmpeg 5.0-full_build-www.gyan.dev (setts), ffprobe 5.0-full_build-www.gyan.dev
[debug] Optional libraries: Cryptodome-3.15.0, brotli-1.0.9, certifi-2022.06.15, mutagen-1.45.1, sqlite3-2.6.0, websockets-10.3
[debug] Proxy map: {}
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
[debug] Downloading _update_spec from https://github.com/yt-dlp/yt-dlp/releases/download/2022.06.29/_update_spec
Latest version: 2022.06.29, Current version: 2022.06.29
yt-dlp is up to date (2022.06.29)
[debug] [facebook] Extracting URL: https://www.facebook.com/100015388953240/videos/424804832900683/
[facebook] 424804832900683: Downloading webpage
[facebook] 424804832900683: Downloading webpage
ERROR: [facebook] 424804832900683: Cannot parse data; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "yt_dlp\extractor\common.py", line 640, in extract
  File "yt_dlp\extractor\facebook.py", line 713, in _real_extract
  File "yt_dlp\extractor\facebook.py", line 659, in _extract_from_url

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 14
  • Comments: 96 (31 by maintainers)

Most upvoted comments

I am pretty sure it’s an account-related problem, excessive use of yt-dlp can cause some kind of “invisible ban”.

Attempting to use the cookies of an account with this kind of “ban” (even with public videos that download normally without cookies on the same machine) will result in the error mentioned in this issue; another attempt within 0-10 minutes will lock the account with a “someone may have tried to access your account” message.

In the past (until about a few months ago) you could easily know you currently have this kind of “ban” as almost every non-public video you normally have access to would give “content isn’t available at the moment”, and you could easily know when it ends. Currently, everything works as expected on browser but using yt-dlp with cookies causes the aforementioned error, and there is no way to know if the “ban” has ended other than trial and error.

I am 95% sure about this, but the only way to prove this theory is to use a clean account on a clean IP.

This theory is only partially correct in my case. I tried with this public video in the following scenarios:

  • yt-dlp 2021.12.01 + with cookies -> success
  • yt-dlp 2021.12.01 + no cookies -> success
  • yt-dlp 2021.12.25 (or above) + with cookies -> failed
  • yt-dlp 2021.12.25 (or above) + no cookies -> success
  • youtube-dl 2021.12.17 + with cookies -> success
  • youtube-dl 2021.12.17 + no cookies -> success

I’m pretty sure this is a regression in yt-dlp 2021.12.25
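
The six scenarios above can be written down as data to make the pattern explicit. This is only a minimal sketch: the tuple layout and the helper name `failing_cases` are illustrative and not part of either tool, and "2021.12.25" stands for "2021.12.25 (or above)".

```python
# Results as reported in the comment above: (tool, version, used_cookies, succeeded).
RESULTS = [
    ("yt-dlp", "2021.12.01", True, True),
    ("yt-dlp", "2021.12.01", False, True),
    ("yt-dlp", "2021.12.25", True, False),   # "2021.12.25 (or above)"
    ("yt-dlp", "2021.12.25", False, True),   # "2021.12.25 (or above)"
    ("youtube-dl", "2021.12.17", True, True),
    ("youtube-dl", "2021.12.17", False, True),
]


def failing_cases(results):
    """Return (tool, version, used_cookies) for every run that failed."""
    return [(tool, version, cookies)
            for tool, version, cookies, ok in results if not ok]


print(failing_cases(RESULTS))
```

Only the combination of a newer yt-dlp and cookies shows up in the result, which is what the regression claim rests on.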

Actually, true. Something has changed in Facebook (again) and now not even youtube-dl is able to download it. Moreover, Facebook has been actively locking accounts if you use --cookies. Better to be safe than sorry. 🙂

This does not work with yt-dlp. As a test, though, I downloaded the same private (group) video with youtube-dl (manually running youtube-dl -v --cookies "$HOME\Documents\ytdl-cookies\facebook_cookies.txt" somePrivateVideoUrl) over ten times in less than 5 minutes with no “shadow ban”, so this is definitely a bug in yt-dlp.

I am so confident there is no “shadow ban” for using cookies that I will even go as far as running a script that repeats this over a couple of hours, to prove this is a yt-dlp bug. As for the other reports saying otherwise, I will say that Facebook is held together with sticks and glue these days, so random errors are to be expected, unless there are some niche cases I’m not catching.

Per #7839, this error now occurs with nearly every FB URL

I can confirm youtube-dl works while yt-dlp does not. I have downloaded a few random videos with youtube-dl without any issue, while yt-dlp keeps saying Cannot parse data.

Then it was almost certainly https://github.com/yt-dlp/yt-dlp/commit/d76d15a6699dc41eea26a96d054a1b7bcb12c69b that broke the FB “Tahoe” fallback

Indeed it was! 👍 ; to sum it up:

The Tahoe API endpoint “fallback” in recent yt-dlp versions is broken because of:

  1. Update our chrome versions used for User-Agents; the cut-off Chrome version in this case is 77 (>=78 gets the door shut on its face 😞 ); “us” engaged in retrocomputing have a saying: “Newer isn’t always better”, which, I think, fits perfectly here 😜 …
  2. [utils] Add Sec-Fetch-Mode to std_headers; this is worked-around by --add-header "Sec-Fetch-Mode:" (thanks @dirkf 😄 ).

Putting 1+2 into practice 😉, with latest yt-dlp-nightly:

yt-dlp -vF --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" --add-header "Sec-Fetch-Mode:" "https://www.facebook.com/gazzetta.gr/videos/1282182232423557/" => 

[debug] Command-line config: ['--ffmpeg-location', '..\\FFmpeg', '--downloader-args', 'ffmpeg:-v 8 -stats', '-vF', '--user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36', '--add-header', 'Sec-Fetch-Mode:', 'https://www.facebook.com/gazzetta.gr/videos/1282182232423557/']
[debug] Encodings: locale cp1253, fs utf-8, pref cp1253, out utf-8 (No VT), error utf-8 (No VT), screen utf-8 (No VT)
[debug] yt-dlp version nightly@2023.09.04.183545 [69dbfe01c] (win_x86_exe)
[debug] Python 3.7.9 (CPython x86 32bit) - Windows-Vista-6.0.6003-SP2 (OpenSSL 1.1.1g  21 Apr 2020)
[debug] exe versions: ffmpeg n6.1-dev-1945-N-111829-g3c9dc0 (setts), ffprobe n6.1-dev-1945-N-111829-g3c9dc0, phantomjs 2.1.1, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.18.0, brotli-1.0.9, certifi-2023.07.22, mutagen-1.47.0, sqlite3-2.6.0, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1864 extractors
[facebook] Extracting URL: https://www.facebook.com/gazzetta.gr/videos/1282182232423557/
[facebook] 1282182232423557: Downloading webpage
[facebook] 1282182232423557: Downloading webpage
[debug] Sort order given by extractor: res, quality
[debug] Formats sorted by: hasvid, ie_pref, res, quality, lang, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 1282182232423557:
ID                       EXT RESOLUTION |   TBR PROTO | VCODEC          VBR ACODEC     ABR ASR MORE INFO
-------------------------------------------------------------------------------------------------------------------
309810111544885a         m4a audio only |   65k https | audio only          mp4a.40.5  65k 44k DASH audio, m4a_dash
dash_sd_src              mp4 unknown    |       https | unknown             unknown
dash_sd_src_no_ratelimit mp4 unknown    |       https | unknown             unknown
1345607079376096v        mp4 216x384    |  105k https | vp09.00.20.08  105k video only         DASH video, mp4_dash
1285943572038730v        mp4 360x640    |  182k https | av01.0.01M.08  182k video only         DASH video, mp4_dash
2905904322879782v        mp4 360x640    |  217k https | av01.0.01M.08  217k video only         DASH video, mp4_dash
332073845822799v         mp4 360x640    |  161k https | vp09.00.21.08  161k video only         DASH video, mp4_dash
276657805118621v         mp4 360x640    |  228k https | vp09.00.21.08  228k video only         DASH video, mp4_dash
1357980768469980v        mp4 540x960    |  333k https | av01.0.04M.08  333k video only         DASH video, mp4_dash
300477575999225v         mp4 540x960    |  367k https | vp09.00.30.08  367k video only         DASH video, mp4_dash
264187603137569v         mp4 720x1280   |  575k https | av01.0.05M.08  575k video only         DASH video, mp4_dash
828632058729309v         mp4 720x1280   |  553k https | vp09.00.31.08  553k video only         DASH video, mp4_dash
834065534700307v         mp4 720x1280   |  955k https | vp09.00.31.08  955k video only         DASH video, mp4_dash
dash_hd_src              mp4 720p       |       https | unknown             unknown
701716058436450v         mp4 900x1600   |  922k https | av01.0.08M.08  922k video only         DASH video, mp4_dash
279905904762370v         mp4 1080x1920  | 1556k https | av01.0.08M.08 1556k video only         DASH video, mp4_dash
1529796337425201v        mp4 1080x1920  | 1630k https | vp09.00.40.08 1630k video only         DASH video, mp4_dash

😃
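
For anyone scripting this, the workaround above (Chrome 77 User-Agent plus an emptied Sec-Fetch-Mode header) could be assembled roughly as follows. This is a sketch, not an official recipe: `workaround_args` is a hypothetical helper, it only builds the argument list without running anything, and other commenters in this thread report that the same flags did not help them.

```python
import shlex

# User-Agent quoted in the comment above (Chrome 77, the reported cut-off).
CHROME_77_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36"
)


def workaround_args(url, extra=("-vF",)):
    """Build the argument list for the workaround command shown above.

    Hypothetical helper: it only assembles arguments, it does not run yt-dlp.
    """
    return [
        "yt-dlp", *extra,
        "--user-agent", CHROME_77_UA,
        "--add-header", "Sec-Fetch-Mode:",  # empty value, as in the comment
        url,
    ]


args = workaround_args("https://www.facebook.com/gazzetta.gr/videos/1282182232423557/")
print(shlex.join(args))  # POSIX-shell quoting of the same command
```

The list can be handed to `subprocess.run(args)` as-is, which avoids shell quoting problems with the parentheses and semicolons in the User-Agent string.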

Hi guys, I’m facing the same “Cannot parse data” issue. Is there any solution? We are stuck in our product development because of it.

I have the same problem

this Tahoe issue is about to become impossible to debug with master/nightly (you’ll need to use an old nightly build)

… For anyone still interested 😉 , the very last yt-dlp-nightly release without #7890 merged is 2023.09.05.200110

I was hoping to figure this out as well, but I’ve tried everything and can’t reproduce the successful response on my end

Last night I did continue my tests, this time using some experimental yt-dlp_x86.exe builds, compiled with modded CPython 3.8/3.9 (so as to launch even on NT 5.1 😉 ) and modded OpenSSL-3.1.0 lib, and I still found that adding:

--user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" --add-header "Sec-Fetch-Mode:"

to my commands did make the Tahoe API “happy” 😜 ; as a test, I used the TEST 'url' inside #7890,

https://www.facebook.com/radiokicksfm/videos/3676516585958356/

CPython 3.8.13+OpenSSL-3.1.0-dev
yt-dlp_x86 -vF "https://www.facebook.com/radiokicksfm/videos/3676516585958356/" => 

[debug] Command-line config: ['-vF', 'https://www.facebook.com/radiokicksfm/videos/3676516585958356/']
[debug] Encodings: locale cp1253, fs utf-8, pref cp1253, out utf-8 (No VT), error utf-8 (No VT), screen utf-8 (No VT)
[debug] yt-dlp version nightly@2023.08.31 [7237c8dca] (win_x86_exe)
[debug] Python 3.8.13+ (CPython x86 32bit) - Windows-Vista-6.0.6003-SP2 (OpenSSL 3.1.0-dev )
[debug] exe versions: none
[debug] Optional libraries: Cryptodome-3.18.0, brotli-1.0.9, certifi-2023.07.22, mutagen-1.46.0, sqlite3-2.6.0, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1864 extractors
[facebook] Extracting URL: https://www.facebook.com/radiokicksfm/videos/3676516585958356/
[facebook] 3676516585958356: Downloading webpage
[facebook] 3676516585958356: Downloading webpage
ERROR: [facebook] 3676516585958356: Cannot parse data; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "yt_dlp\extractor\common.py", line 715, in extract
  File "yt_dlp\extractor\facebook.py", line 733, in _real_extract
  File "yt_dlp\extractor\facebook.py", line 680, in _extract_from_url

but

yt-dlp_x86 -vF --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" --add-header "Sec-Fetch-Mode:" "https://www.facebook.com/radiokicksfm/videos/3676516585958356/" => 

[debug] Command-line config: ['-vF', '--user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36', '--add-header', 'Sec-Fetch-Mode:', 'https://www.facebook.com/radiokicksfm/videos/3676516585958356/']
[debug] Encodings: locale cp1253, fs utf-8, pref cp1253, out utf-8 (No VT), error utf-8 (No VT), screen utf-8 (No VT)
[debug] yt-dlp version nightly@2023.08.31 [7237c8dca] (win_x86_exe)
[debug] Python 3.8.13+ (CPython x86 32bit) - Windows-Vista-6.0.6003-SP2 (OpenSSL 3.1.0-dev )
[debug] exe versions: none
[debug] Optional libraries: Cryptodome-3.18.0, brotli-1.0.9, certifi-2023.07.22, mutagen-1.46.0, sqlite3-2.6.0, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1864 extractors
[facebook] Extracting URL: https://www.facebook.com/radiokicksfm/videos/3676516585958356/
[facebook] 3676516585958356: Downloading webpage
[facebook] 3676516585958356: Downloading webpage
[debug] Sort order given by extractor: res, quality
[debug] Formats sorted by: hasvid, ie_pref, res, quality, lang, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 3676516585958356:
ID                EXT RESOLUTION |  TBR PROTO | VCODEC       VBR ACODEC     ABR ASR MORE INFO
--------------------------------------------------------------------------------------------------------
998172518084032a  m4a audio only |  65k https | audio only       mp4a.40.5  65k 48k DASH audio, m4a_dash
dash_sd_src       mp4 unknown    |      https | unknown          unknown
673954957623594v  mp4 276x144    |  45k https | avc1.4D400C  45k video only    DASH video, mp4_dash
329723412734958v  mp4 460x240    |  96k https | avc1.4D4015  96k video only    DASH video, mp4_dash
1334094473848775v mp4 690x360    | 200k https | avc1.4D401E 200k video only    DASH video, mp4_dash
266105879550722v  mp4 920x480    | 333k https | avc1.4D401F 333k video only    DASH video, mp4_dash
682373210155474v  mp4 1380x720   | 606k https | avc1.4D4020 606k video only    DASH video, mp4_dash

CPython 3.9.13+OpenSSL-3.1.0-dev
yt-dlp_x86 -vF "https://www.facebook.com/radiokicksfm/videos/3676516585958356/" => 

[debug] Command-line config: ['-vF', 'https://www.facebook.com/radiokicksfm/videos/3676516585958356/']
[debug] Encodings: locale cp1253, fs utf-8, pref cp1253, out utf-8 (No VT), error utf-8 (No VT), screen utf-8 (No VT)
[debug] yt-dlp version nightly@2023.08.31 [7237c8dca] (win_x86_exe)
[debug] Python 3.9.13 (CPython x86 32bit) - Windows-Vista-6.0.6003-SP2 (OpenSSL 3.1.0-dev )
[debug] exe versions: none
[debug] Optional libraries: Cryptodome-3.18.0, brotli-1.0.9, certifi-2023.07.22, mutagen-1.46.0, sqlite3-2.6.0, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1864 extractors
[facebook] Extracting URL: https://www.facebook.com/radiokicksfm/videos/3676516585958356/
[facebook] 3676516585958356: Downloading webpage
[facebook] 3676516585958356: Downloading webpage
ERROR: [facebook] 3676516585958356: Cannot parse data; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
  File "yt_dlp\extractor\common.py", line 715, in extract
  File "yt_dlp\extractor\facebook.py", line 733, in _real_extract
  File "yt_dlp\extractor\facebook.py", line 680, in _extract_from_url

but

yt-dlp_x86 -vF --user-agent "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36" --add-header "Sec-Fetch-Mode:" "https://www.facebook.com/radiokicksfm/videos/3676516585958356/" => 

[debug] Command-line config: ['-vF', '--user-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36', '--add-header', 'Sec-Fetch-Mode:', 'https://www.facebook.com/radiokicksfm/videos/3676516585958356/']
[debug] Encodings: locale cp1253, fs utf-8, pref cp1253, out utf-8 (No VT), error utf-8 (No VT), screen utf-8 (No VT)
[debug] yt-dlp version nightly@2023.08.31 [7237c8dca] (win_x86_exe)
[debug] Python 3.9.13 (CPython x86 32bit) - Windows-Vista-6.0.6003-SP2 (OpenSSL 3.1.0-dev )
[debug] exe versions: none
[debug] Optional libraries: Cryptodome-3.18.0, brotli-1.0.9, certifi-2023.07.22, mutagen-1.46.0, sqlite3-2.6.0, websockets-11.0.3
[debug] Proxy map: {}
[debug] Loaded 1864 extractors
[facebook] Extracting URL: https://www.facebook.com/radiokicksfm/videos/3676516585958356/
[facebook] 3676516585958356: Downloading webpage
[facebook] 3676516585958356: Downloading webpage
[debug] Sort order given by extractor: res, quality
[debug] Formats sorted by: hasvid, ie_pref, res, quality, lang, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 3676516585958356:
ID                EXT RESOLUTION |  TBR PROTO | VCODEC       VBR ACODEC     ABR ASR MORE INFO
--------------------------------------------------------------------------------------------------------
998172518084032a  m4a audio only |  65k https | audio only       mp4a.40.5  65k 48k DASH audio, m4a_dash
dash_sd_src       mp4 unknown    |      https | unknown          unknown
673954957623594v  mp4 276x144    |  45k https | avc1.4D400C  45k video only    DASH video, mp4_dash
329723412734958v  mp4 460x240    |  96k https | avc1.4D4015  96k video only    DASH video, mp4_dash
1334094473848775v mp4 690x360    | 200k https | avc1.4D401E 200k video only    DASH video, mp4_dash
266105879550722v  mp4 920x480    | 333k https | avc1.4D401F 333k video only    DASH video, mp4_dash
682373210155474v  mp4 1380x720   | 606k https | avc1.4D4020 606k video only    DASH video, mp4_dash

that maybe FB serves different html to different regions.

Sadly 😢 , I can’t test your theory about FB serving different page sources in different parts of the globe, as any attempt to use a VPN node/HTTPS Proxy (to “move out” of my region 😜 ) has resulted in a Tahoe API block 😠 …

The results above are repeatable with yt-dl master and Python 3.9 and also with the nightly release. Note that yt-dl currently sends UA versions < FF 90 by default.

However:

  • yt-dl --add-header 'Sec-Fetch-Mode: same-origin' ... breaks
  • ditto 'Sec-Fetch-Mode: navigate'
  • ditto 'Sec-Fetch-Dest: document'
  • yt-dlp stable@2023.06.22 fails with UA set to FF 89 and FF 90
  • yt-dlp --add-header 'Sec-Fetch-Mode:' ... succeeds, with UA set to FF 89, with these formats:
[info] Available formats for 1282182232423557:
ID                       EXT RESOLUTION │   TBR PROTO │ VCODEC        VBR ACODEC     ABR ASR MORE INFO
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────
309810111544885a         m4a audio only │   65k https │ audio only        mp4a.40.5  65k 44k DASH audio, m4a_dash
dash_sd_src              mp4 unknown    │       https │ unknown           unknown
dash_sd_src_no_ratelimit mp4 unknown    │       https │ unknown           unknown
135152312980525v         mp4 360x640    │   90k https │ avc1.4d401e   90k video only         DASH video, mp4_dash
1159988861624345v        mp4 360x640    │  179k https │ avc1.4d401e  179k video only         DASH video, mp4_dash
288169693854753v         mp4 540x960    │  366k https │ avc1.4d401f  366k video only         DASH video, mp4_dash
1703613763384686v        mp4 720x1280   │  770k https │ avc1.4d401f  770k video only         DASH video, mp4_dash
1760573717746257v        mp4 720x1280   │ 1575k https │ avc1.4d401f 1575k video only         DASH video, mp4_dash
dash_hd_src              mp4 720p       │       https │ unknown           unknown
312425081166210v         mp4 1080x1920  │ 3465k https │ avc1.640028 3465k video only         DASH video, mp4_dash
  • ditto with UA Mozilla/5.0, but only these formats:
[info] Available formats for 1282182232423557:
ID                              EXT RESOLUTION │ PROTO │ VCODEC  ACODEC
────────────────────────────────────────────────────────────────────────
progressive_sd_src              mp4 unknown    │ https │ unknown unknown
progressive_sd_src_no_ratelimit mp4 unknown    │ https │ unknown unknown
progressive_hd_src              mp4 720p       │ https │ unknown unknown

Far-sighted.

Python 3.10’s SSL ciphers were universally backported regardless of the actual Python version used

And we don’t know exactly what aspect of the fingerprint is triggering a block from FB. It could be any number of things

[facebook] 1282182232423557: Downloading webpage
[facebook] 1282182232423557: Downloading webpage

There was very little difference between the yt-dlp FB extractor code and the youtube-dl FB extractor code prior to the reels patch (which also does not have a meaningful impact on this issue). I’d go so far as to say there was no meaningful difference at all.

As you see in the log, there are 2 requests being made – the first is for the actual FB webpage, and extraction silently fails. The 2nd request is to FB’s “Tahoe” video API endpoint, unhelpfully also labeled as Downloading webpage – yt-dlp does this as well. But the difference is in the response, since this API endpoint seems to employ some sort of TLS fingerprinting protection that blocks yt-dlp’s Py>=3.10 fingerprint and allows youtube-dl’s Py<=3.9 fingerprint. This is what yt-dlp gets back:

{
  "__ar": 1,
  "error": 1357005,
  "errorSummary": "Your Request Couldn't be Processed",
  "errorDescription": "There was a problem with this request. We're working on geting it fixed as soon as we can.",
  "payload": null,
  "hsrp": {
    "hblp": {
      "consistency": {
        "rev": 1008421808
      }
    }
  },
  "lid": "7274291556808519235"
}
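
A response like the one above can be recognised programmatically by its non-zero `error` code and null `payload`. The following is a minimal sketch: `describe_api_error` is a hypothetical helper, the field names are taken from the quoted response, and it assumes the body is plain JSON exactly as displayed (a real response may need preprocessing first).

```python
import json

# The response body quoted above, abbreviated to the fields the check uses.
BLOCKED_RESPONSE = '''
{
  "__ar": 1,
  "error": 1357005,
  "errorSummary": "Your Request Couldn't be Processed",
  "payload": null
}
'''


def describe_api_error(body):
    """Return a short error description, or None if the body carries a payload.

    Hypothetical helper; the field names come from the response quoted above.
    """
    data = json.loads(body)
    if data.get("error") and data.get("payload") is None:
        return f"{data['error']}: {data.get('errorSummary', 'unknown error')}"
    return None


print(describe_api_error(BLOCKED_RESPONSE))
```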

The facebook extractor is designed rather poorly IMO, and everything fails silently until there are no more fallbacks, at which point an identical Cannot parse data extractor error is raised. So that’s why #7901 is tracking the newly manifested problem where webpage extraction always fails with all links, while this longstanding issue (#4311) should be tracking the Tahoe API request failure (and whatever else was happening before FB changed their webpage structure around)
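
To illustrate the point about silent failures: one possible alternative is a fallback chain that remembers why each step failed and reports all the reasons at the end. This is not how the yt-dlp extractor is written; every name here (`extract_with_fallbacks`, `from_webpage`, `from_tahoe`, `ExtractionError`) is made up for the sketch.

```python
class ExtractionError(Exception):
    pass


def extract_with_fallbacks(steps):
    """Try each (name, callable) in order and return the first non-empty result.

    If every step fails, the raised error lists why each one failed.
    """
    reasons = []
    for name, step in steps:
        try:
            result = step()
        except Exception as exc:  # record the failure instead of hiding it
            reasons.append(f"{name}: {exc}")
            continue
        if result:
            return result
        reasons.append(f"{name}: returned no data")
    raise ExtractionError("Cannot parse data (" + "; ".join(reasons) + ")")


def from_webpage():
    return None  # stand-in for webpage extraction finding nothing


def from_tahoe():
    raise ValueError("API error 1357005")  # stand-in for the blocked request


try:
    extract_with_fallbacks([("webpage", from_webpage), ("tahoe", from_tahoe)])
except ExtractionError as err:
    print(err)
```

With per-step reasons in the message, a webpage-structure change and a blocked API request would no longer look identical in bug reports.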

FWIW, is the genericIE still involved here, as suggested?

no

Hello, is there any update about the bugfix for this “cannot parse data” error which causes the problem of not being able to download Facebook videos? Sorry for bothering with the question and thanks in advance for your support

Hi!

I have been looking into this, and Facebook would consistently return “RelayPrefetchedStreamCache\u00406fa52080e4dc537033301a17dbec99b5” instead of “RelayPrefetchedStreamCache”. I fixed it by modifying the facebook function a little:

def extract_relay_prefetched_data(_filter):
    return traverse_obj(extract_relay_data(_filter), (
        'require', (None, (..., ..., ..., '__bbox', 'require')),
        lambda _, v: any(key.startswith('RelayPrefetchedStreamCache') for key in v if isinstance(key, str)),
        ..., ..., '__bbox', 'result', 'data', {dict}), get_all=False) or {}
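
The effect of that change can be reproduced with the standard library alone. In this sketch the `entry` list is a toy stand-in for one item of the page's 'require' data (the real structure is more deeply nested), and `exact_match` is an assumption about how a check for the bare key would behave, not a quote of the extractor.

```python
import json

# Key as reported above; the JSON escape \u0040 decodes to "@".
raw_key = json.loads('"RelayPrefetchedStreamCache\\u00406fa52080e4dc537033301a17dbec99b5"')

# Toy stand-in for one 'require' entry; the real structure is more deeply nested.
entry = [raw_key, "next", [], [{"__bbox": {"result": {"data": {"video": {}}}}}]]


def exact_match(v):
    """Membership test that only matches the bare key."""
    return "RelayPrefetchedStreamCache" in v


def prefix_match(v):
    """The check from the modified function: accept any string key with the prefix."""
    return any(key.startswith("RelayPrefetchedStreamCache")
               for key in v if isinstance(key, str))


print(raw_key)
print(exact_match(entry), prefix_match(entry))
```

The membership test misses the suffixed key while the prefix test finds it, and the `isinstance` guard keeps `startswith` from being called on the non-string items of the entry.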

Nightly build suddenly works again!

$ ~/yt-dlp_linux --version
2023.09.05.203540

~/yt-dlp_linux --extract-audio --audio-format mp3 -o xx1.mp3 https://web.facebook.com/watch/?v=2139244499701931
[facebook] Extracting URL: https://web.facebook.com/watch/?v=2139244499701931
[facebook] 2139244499701931: Downloading webpage
[info] 2139244499701931: Downloading 1 format(s): sd
[download] Destination: xx1.mp4
[download] 100% of 4.59MiB in 00:00:00 at 8.64MiB/s
[ExtractAudio] Destination: xx1.mp3
Deleting original file xx1.mp4 (pass -k to keep)

I think there is more to the fingerprinting than just the headers, since I still get the error with the same command line on yt-dlp master. Or it could be a regional thing, maybe

but both these commits (in Jan 2022) were authored after v2021.12.25, which is now also broken in FB,

Then it was almost certainly d76d15a6699dc41eea26a96d054a1b7bcb12c69b that broke the FB “Tahoe” fallback

Yet:

$ yt-dlp -v -F 'https://web.facebook.com/watch/?v=2139244499701931' --user-agent 'Mozilla/5.0 (Windows NT 6.0; rv:89.0) Gecko/20100101 Firefox/89.0' --add-header 'Sec-Fetch-Mode:'
[debug] Command-line config: ['-v', '-F', 'https://web.facebook.com/watch/?v=2139244499701931', '--user-agent', 'Mozilla/5.0 (Windows NT 6.0; rv:89.0) Gecko/20100101 Firefox/89.0', '--add-header', 'Sec-Fetch-Mode:']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.06.22 [812cdfa06] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: de4cf77ec
[debug] Python 3.9.16 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-glibc2.23 (OpenSSL 1.1.1v  1 Aug 2023, glibc 2.23)
[debug] exe versions: ffmpeg 4.3, ffprobe 4.3
[debug] Optional libraries: Cryptodome-3.11.0, certifi-2019.11.28, secretstorage-3.2.0, sqlite3-2.6.0
[debug] Proxy map: {}
[debug] Loaded 1851 extractors
[facebook] Extracting URL: https://web.facebook.com/watch/?v=2139244499701931
[facebook] 2139244499701931: Downloading webpage
[facebook] 2139244499701931: Downloading webpage
[debug] Sort order given by extractor: res, quality
[debug] Formats sorted by: hasvid, ie_pref, res, quality, lang, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 2139244499701931:
ID                              EXT RESOLUTION │ PROTO │ VCODEC  ACODEC
────────────────────────────────────────────────────────────────────────
progressive_sd_src              mp4 unknown    │ https │ unknown unknown
progressive_sd_src_no_ratelimit mp4 unknown    │ https │ unknown unknown
$

Apparently FB is looking at the final version number, not the rv.

With this page yt-dl nightly gets:

[facebook] 2139244499701931: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found (caused by HTTPError()) ....

But that’s another problem.

yt-dlp --user-agent "Mozilla/5.0 (Windows NT 6.0; rv:89.0) Gecko/20100101 Firefox/89.0" --add-header 'Sec-Fetch-Mode:' --extract-audio --audio-format mp3 https://web.facebook.com/watch/?v=2139244499701931
[facebook] Extracting URL: https://web.facebook.com/watch/?v=2139244499701931
[facebook] 2139244499701931: Downloading webpage
[facebook] 2139244499701931: Downloading webpage
ERROR: [facebook] 2139244499701931: Cannot parse data; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U

yt-dlp --add-header 'Sec-Fetch-Mode:' --extract-audio --audio-format mp3 https://web.facebook.com/watch/?v=2139244499701931
[facebook] Extracting URL: https://web.facebook.com/watch/?v=2139244499701931
[facebook] 2139244499701931: Downloading webpage
[facebook] 2139244499701931: Downloading webpage
ERROR: [facebook] 2139244499701931: Cannot parse data; please report this issue on https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using yt-dlp -U

yt-dlp is up to date (stable@2023.07.06)

FB is being broken now on every yt-dlp version after v2021.12.01; so, 11 months already prior to https://github.com/yt-dlp/yt-dlp/commit/5b9f253fa0aee996cf1ed30185d4b502e00609c4, something inside 2021.12.01…2021.12.25 (https://github.com/yt-dlp/yt-dlp/commit/5f549d4959025eef8bb49c870be5a8c35866e301 ?) now breaks FB in yt-dlp

@Vangelis66 I think it’s more likely that b1156c1e59646d450836ce8e61c34641070e8ccb / d14cbdd92d8bbb9deedc77da80085b0280ae52bb was the culprit, since FB’s fingerprinting seems to be very http header-focused

@bashonly

Many thanks for your erudite reply; highly appreciated 😄 👍 … BTW:

prior to the reels patch

… the link has to be changed to bb5d84c9d2f1e978c3eddfb5ccbe138036682a36 😉 …

I had to do a “refresh” on TLS fingerprinting, still I kindly ask for a further clarification on:

since this API endpoint seems to employ some sort of TLS fingerprinting protection that blocks yt-dlp’s Py>=3.10 fingerprint and allows youtube-dl’s Py<=3.9 fingerprint.

From the article I read, it follows that both apps’ TLS fingerprints depend on the combination of CPython+OpenSSL lib in use when the scripts are invoked; or is there something additional involved in the case of yt-dlp?

I find that even yt-dlp_x86.exe, currently built on CPython 3.7, still fails on FB 😿 , so it isn’t only on Py>=3.10 that the Tahoe API endpoint blocks yt-dlp (unless I misunderstood what you wrote previously) …

The successful yt-dl logs I posted previously were on CPython 3.4+OpenSSL-1.0.2k and CPython 2.7+OpenSSL-1.0.2t, so both using the 1.0.2 OpenSSL branch; AFAIAA, 1.0.2 doesn’t even support TLSv1.3, so why older (less secure) CPython and older (less secure) OpenSSL are being whitelisted by that API is beyond me…
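
To compare what two CPython+OpenSSL combinations would offer in a TLS handshake, the interpreter's defaults can be dumped with the standard `ssl` module. This is only a rough sketch: the cipher list is just one ingredient of a TLS fingerprint, and an application may override these defaults (as noted elsewhere in this thread for yt-dlp), so the output describes the interpreter, not necessarily the tool.

```python
import ssl
import sys

# Default client-side TLS context, as created by the standard library.
ctx = ssl.create_default_context()

# Names of the cipher suites enabled on this context.
cipher_names = [c["name"] for c in ctx.get_ciphers()]

print("Python :", sys.version.split()[0])
print("OpenSSL:", ssl.OPENSSL_VERSION)
print("TLS 1.3 available:", ssl.HAS_TLSv1_3)
print(len(cipher_names), "ciphers, first five:", cipher_names[:5])
```

Running it under each interpreter in question and diffing the output gives a first hint of where the handshakes differ.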

Finally, I dug up my archives and found out that the last yt-dlp version that emulates the current yt-dl behaviour in regards to FB is v2021.12.01 (!):

yt-dlp_x86 -vF "https://www.facebook.com/gazzetta.gr/videos/1282182232423557/" => 

[debug] Command-line config: ['-vF', 'https://www.facebook.com/gazzetta.gr/videos/1282182232423557/']
[debug] Encodings: locale cp1253, fs utf-8, out utf-8 (No ANSI), err utf-8 (No ANSI), pref cp1253
[debug] yt-dlp version 2021.12.01 [91f071a] (win_exe)
[debug] Python version 3.7.9 (CPython 32bit) - Windows-Vista-6.0.6003-SP2
[debug] exe versions: none
[debug] Optional libraries: Cryptodome, mutagen, sqlite, websockets
[debug] Proxy map: {}
[debug] [facebook] Extracting URL: https://www.facebook.com/gazzetta.gr/videos/1282182232423557/
[facebook] 1282182232423557: Downloading webpage
[facebook] 1282182232423557: Downloading webpage
[debug] Sort order given by extractor: res, quality
[debug] Formats sorted by: hasvid, ie_pref, res, quality, lang, fps, hdr:12(7), vcodec:vp9.2(10), acodec, filesize, fs_approx, tbr, vbr, abr, asr, proto, vext, aext, hasaud, source, id
[info] Available formats for 1282182232423557:
ID                       EXT RESOLUTION |   TBR PROTO | VCODEC          VBR ACODEC     ABR     ASR MORE INFO
-----------------------------------------------------------------------------------------------------------------------
309810111544885a         m4a            |   65k https | audio only          mp4a.40.5  65k 44100Hz DASH audio, m4a_dash
dash_sd_src              mp4 unknown    |       https | unknown             unknown
dash_sd_src_no_ratelimit mp4 unknown    |       https | unknown             unknown
1345607079376096v        mp4 216x384    |  104k https | vp09.00.20.08  104k video only             DASH video, mp4_dash
1285943572038730v        mp4 360x640    |  181k https | av01.0.01M.08  181k video only             DASH video, mp4_dash
2905904322879782v        mp4 360x640    |  217k https | av01.0.01M.08  217k video only             DASH video, mp4_dash
332073845822799v         mp4 360x640    |  161k https | vp09.00.21.08  161k video only             DASH video, mp4_dash
276657805118621v         mp4 360x640    |  227k https | vp09.00.21.08  227k video only             DASH video, mp4_dash
1357980768469980v        mp4 540x960    |  332k https | av01.0.04M.08  332k video only             DASH video, mp4_dash
300477575999225v         mp4 540x960    |  366k https | vp09.00.30.08  366k video only             DASH video, mp4_dash
264187603137569v         mp4 720x1280   |  575k https | av01.0.05M.08  575k video only             DASH video, mp4_dash
828632058729309v         mp4 720x1280   |  552k https | vp09.00.31.08  552k video only             DASH video, mp4_dash
834065534700307v         mp4 720x1280   |  955k https | vp09.00.31.08  955k video only             DASH video, mp4_dash
dash_hd_src              mp4 720p       |       https | unknown             unknown
701716058436450v         mp4 900x1600   |  922k https | av01.0.08M.08  922k video only             DASH video, mp4_dash
279905904762370v         mp4 1080x1920  | 1555k https | av01.0.08M.08 1555k video only             DASH video, mp4_dash
1529796337425201v        mp4 1080x1920  | 1629k https | vp09.00.40.08 1629k video only             DASH video, mp4_dash

As the more observant among you might have noticed, avc1 streams are missing, but I guess this is due to this

Kind regards 😄

The FB extractor in yt-dl hasn’t been updated since release (2021.12.17) and should be expected to be broken. A yt-dl version using a version of the extractor from a PR might work. Also, it might depend on which Python version is used since one failure mode (IIRC, and why no PR was merged) is blockage by FB’s bouncers based on client fingerprinting.

The video above that can be downloaded with yt-dl doesn’t match the FB extractor directly; instead, the generic extractor finds its embedded videos (which do match the FB extractor), and those can be downloaded.

The FB extractor in yt-dl hasn’t been updated since release (2021.12.17) and should be expected to be broken. A yt-dl version using a version of the extractor from a PR might work. Also, it might depend on which Python version is used since one failure mode (IIRC, and why the extractor wasn’t updated) is blockage by FB’s bouncers based on client fingerprinting.

yt-dlp has made some networking improvements that might help to fix such blockage.

@UtopianElectronics can you share an example? I can’t find any URLs that work with youtube-dl anymore

Experiencing the same issue. Surprisingly, the latest version of youtube-dl can download Facebook videos, but yt-dlp cannot.

Actually, I’m using this from Python, and youtube_dl fails too.

This makes me think the difference could be in the Python version / SSL ciphers used rather than the extractor code.
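One way to test that guess is to dump the cipher suites each interpreter offers by default and diff the results. The sketch below assumes Python 3.6+ (for `SSLContext.get_ciphers()`) and only inspects the standard library's default context; the helper name `default_cipher_names` is made up, and yt-dlp or youtube-dl may configure their own contexts differently, so treat the output as an approximation.

```python
import ssl


def default_cipher_names():
    """Return the cipher suite names enabled in the stdlib default context."""
    ctx = ssl.create_default_context()
    return [cipher["name"] for cipher in ctx.get_ciphers()]


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)
    # One name per line so two runs can be compared with a diff tool.
    for name in default_cipher_names():
        print(name)
```

The cipher list is only one input to a TLS fingerprint (extensions and their order matter too), so identical lists would not rule out fingerprint-based blocking.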

@omarmohamedmoustafa Facebook does enforce a rate limit on how many posts you are able to view when not logged in. I can download that link without issue. Does that video load for you in your browser when not logged in? (Use a private/incognito tab.)

Going to add here that using yt-dlp to try to download a reel has locked me out of my Facebook account. I am jumping through the hoops to get it back, but be cautious: a few-minute video is very much not worth jumping through Facebook’s security hoops.

I also do not understand why it is doing this when I have 2FA enabled… smh

As mentioned in https://github.com/yt-dlp/yt-dlp/issues/4477#issuecomment-1200612649 (I’ve also tested using the same cookies.txt file), youtube-dl works, but not yt-dlp. So it’s caused by a regression, not Facebook “banning” you.