youtube-transcript-api: Could not retrieve a transcript for the video

Could not retrieve a transcript for the video https://www.youtube.com/watch?v=98TQv5IAtY8! This is most likely caused by: The video is no longer available

It works on my local computer (Windows 10), but when I try to use it on Ubuntu 20.04 (DigitalOcean Droplet) I get this error! I assume the error is caused by the sender's IP address.

I had a similar problem using youtube-dl on my droplet, and when I tried using “--force-ipv4” with youtube-dl it worked. Is there a similar solution for this?

Code:

YouTubeTranscriptApi.get_transcript("98TQv5IAtY8", languages=['en'])

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 37 (17 by maintainers)

Most upvoted comments

@iercetin @sdtblck any news on this?

In v0.4.0 the exception TooManyRequests has been added, which is raised when running into rate limits. This could be used to further investigate the issue. My guess with the IPv4 vs. IPv6 thing @iercetin mentioned is that the IPv6 address has been blocked due to extensive usage, while IPv4 still works as it hasn’t been blocked yet.
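
One way the new exception could be used is to wait and retry when the rate limit is hit. The following is only a minimal sketch, not something the module provides: `fetch_with_backoff` and its parameters are hypothetical names, and the import shown in the usage comment assumes a library version that still exports `TooManyRequests` and `get_transcript` at the top level.

```python
import time


def fetch_with_backoff(fetch, retry_on, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); if it raises retry_on, wait and retry with exponential backoff.

    fetch    -- zero-argument callable that performs the request
    retry_on -- exception class (or tuple of classes) that signals a rate limit
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller see the rate-limit error
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)


# Intended usage (assumption: these names exist in the installed version):
#
# from youtube_transcript_api import YouTubeTranscriptApi, TooManyRequests
# transcript = fetch_with_backoff(
#     lambda: YouTubeTranscriptApi.get_transcript("98TQv5IAtY8", languages=["en"]),
#     retry_on=TooManyRequests,
# )
```

Backoff only helps if the block is temporary; if the IP stays blocked, the final attempt re-raises the exception so it can still be reported.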

As this issue is kinda all over the place now, with different things being reported (most of them most likely due to rate limits, which now have a more descriptive error message), I will close this for now. If individual issues arise again, feel free to open a new issue with a title more specific to that issue.

Perfect, that’s exactly what I was looking for @cramdoulfa! Thank you very much! 👍 I will add a custom error for this, suggesting that the user wait for the rate limit to reset or use a VPN/change IP. I am quite busy right now, as I am in the last weeks of writing my master’s thesis, so I probably won’t be doing any coding on this module for a few weeks, but I’ll try to get that done as soon as I can.

The other thing which still remains interesting is the IPv4 vs. IPv6 thing suggested above. It would be great if you could try executing an IPv4 and an IPv6 request the next time you run into the rate limit and upload the results here. The responses which have been uploaded so far have been contradicting each other a bit, and people have unfortunately stopped replying.

curl -L -6 "http://youtube.com/watch?v=0vAfIcmpqzQ" returns curl: (7) Couldn't connect to server for me
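
For anyone who wants to run the same IPv4 vs. IPv6 comparison from Python instead of curl, here is a rough sketch using only the standard library. `can_connect` is a hypothetical helper, not part of the module, and it only checks whether a TCP connection can be opened over each address family; it does not tell you whether YouTube is rate limiting the request.

```python
import socket


def can_connect(host, port, family, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds over the given family."""
    try:
        candidates = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the host has no address of this family
    for fam, socktype, proto, _canonname, sockaddr in candidates:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # try the next resolved address
    return False


# Intended usage (needs network access):
#
# for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
#     print(label, can_connect("www.youtube.com", 443, family))
```

A False result for IPv6 here would correspond to the "Couldn't connect to server" output from curl above, meaning the machine has no working IPv6 route at all rather than being blocked.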

Good luck with the thesis!

@cramdoulfa no worries, thanks for putting in the time trying to resolve this! 😊👍

OK, I think this is the right one this time. The page actually says ‘We have been receiving large amounts of requests from your network.’

curl_result_video_unavailable.txt

I had this issue also myself. It’s due to youtube blocking your IP. I switched on a VPN and everything worked as expected.

Slight sidetrack but I’m curious @jacksonw765 do you use a commercial VPN or did you configure one yourself with openVPN? I could not find a nice VPN client for Linux AMI

@jdepoix here is a sample HTML page for a video with available transcripts when the API seems to be blocked: curl_result_blocked_transcript_API.txt

@jacksonw765 yeah, that’s what I was guessing. It would be great though, if I could see what HTML YouTube returns after they blocked you, so that I can add a proper error message to this module.

Thanks for the additional information @cramdoulfa!

This seems a bit odd though, as the information which is required for this module is actually being returned by your request. Are you sure that the module was still failing, while trying to retrieve this video, as you did the requests? Maybe there were some rate limits which did reset. Did you check this, before executing the curl request?

Hmm, very good point, the package is actually working again now! I will start a batch of queries and update if it starts blocking again. It’s probably a matter of quota or rate limit.

Hey @jdepoix, apologies if I was being vague. I think my issue might be related to https://github.com/jdepoix/youtube-transcript-api/issues/60 instead of this.

I’m not actually sure if it was serving via IPv6 before, since I didn’t have any global LB set up as it was a light project. I think forcing the function to go through a reserved static IP stopped YouTube from limiting the shared machine my Cloud Function was running on. Apologies for spinning your wheels.

So if I am understanding this correctly, you were probably doing IPv6 requests before setting up the VPC, while now you’re doing IPv4 requests. Which would further support the assumption that this module can fail when sending IPv6 requests to YouTube. Thank you for sharing @adongu!

I guess my best bet would be to implement something which forces this module to use IPv4. I’ll look into that when I have some time at hand.
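
Until something like that exists in the module, a common workaround is to force IPv4 from the calling code by patching name resolution. This is only a sketch under stated assumptions: `force_ipv4` is a hypothetical helper, it relies on the HTTP stack resolving hosts through `socket.getaddrinfo`, and it changes behavior for the whole process (so it is not thread-safe).

```python
import contextlib
import socket


@contextlib.contextmanager
def force_ipv4():
    """Temporarily make socket.getaddrinfo return only IPv4 results (process-wide)."""
    original = socket.getaddrinfo

    def ipv4_only(host, port, family=0, *args, **kwargs):
        # Ignore whatever family the caller asked for and always request IPv4.
        return original(host, port, socket.AF_INET, *args, **kwargs)

    socket.getaddrinfo = ipv4_only
    try:
        yield
    finally:
        socket.getaddrinfo = original  # restore even if the body raises


# Intended usage (assumes youtube_transcript_api is installed):
#
# from youtube_transcript_api import YouTubeTranscriptApi
# with force_ipv4():
#     transcript = YouTubeTranscriptApi.get_transcript("98TQv5IAtY8", languages=["en"])
```

This is roughly the Python counterpart of youtube-dl's --force-ipv4 mentioned at the top of the thread; whether it avoids the block still depends on the IPv4 address not being rate limited itself.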

Hey @jdepoix, I did some digging and found this tidbit in the documentation for Google VPC networks: https://cloud.google.com/vpc/docs/vpc#specifications

“VPC networks only support IPv4 unicast traffic. They do not support broadcast, multicast, or IPv6 traffic within the network; VMs in the VPC network can only send to IPv4 destinations and only receive traffic from IPv4 sources. However, it is possible to create an IPv6 address for a global load balancer.”

It looks like all VPC traffic is IPv4, unless I create an IPv6 address on a global LB and route all service traffic through the LB first. The guide I followed didn’t create any LB as far as I know, and the VPC network routing mode is regional.