youtube-dl: Multi-threading slowdown for YouTube
Checklist
- I’m reporting a broken site support
- I’ve verified that I’m running youtube-dl version 2021.12.17
- I’ve checked that all provided URLs are alive and playable in a browser
- I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
- I’ve searched the bugtracker for similar issues including closed ones
Verbose log
PASTE VERBOSE LOG HERE
Description
Using youtube_dl with multiple threads to get information about multiple videos is a lot slower since the last breakage.
The code below takes ~70 sec for 42 videos.
With yt-dlp, the same videos take ~30 sec.
Before the last code change (I’m using the current git code), youtube_dl was faster than yt-dlp.
Changing THREAD_NR didn’t change the difference.
The processing load is also much higher than before. I use a similar strategy in an app of mine, and on my system (with an older i7) it got noticeably worse than before. On an older laptop, the app became totally unusable.
The IDs are just some random videos; you can use whatever you like.
# coding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals
from queue import Queue
from threading import Thread, Event
import youtube_dl
# import yt_dlp as youtube_dl

ydl_opts = {"quiet": True, "no_warnings": True}
link_ids = ["4jduuQh-Uho", "9GNpv7QDvMY", "MbEOR2Flc-4", "ZKUzNF21n9w", "y-JqH1M4Ya8",
            "pUqfaiUb3l4", "bL5eqSOXMtE", "HyMm4rJemtI", "BU4kGkrrJEw", "wA1v207xlOw",
            "pFS4zYWxzNA", "aF6hDcAbSoE", "G1ckKDRc69w", "o9_jzBtdMZ4", "AGoQZx8Mn0g",
            "6W-pHCD6Tow", "kszLwBaC4Sw", "mwTd_PzGY-c", "iqLTYD_nhsU", "X335gdcPE7A",
            "z_54vDk8lWw", "8a82arE0JSQ", "tJmzQHWl9kc", "8jPQjjsBbIc", "ENJUB5thpB4",
            "dEhUMvjFuQY", "D6XyJh1tsGI", "tFCfb-Qqdz0", "UkafA6r1caQ", "OO8HtAXnRqQ",
            "--da0m2K4I4", "EOlI0UtLDk4", "r7tQbxTImKw", "s_YLPcW4Tu8", "9wIbhES2UkA",
            "YkX9X4td7j8", "14cHz4ebonY", "saVUUZE50Co", "N1K4NYHqMx4", "iCBL33NKvPA",
            "QPTNS3llm2E", "pFS4zYWxzNA"]
THREAD_NR = 8
infos = []


class Base(object):
    def __init__(self, **kwargs):
        super(Base, self).__init__(**kwargs)
        self.feed_q = Queue()
        self.threads = []
        # Clear the youtube_dl cache so every run starts from the same state.
        with youtube_dl.YoutubeDL({}) as ydl:
            ydl.cache.remove()
        # Start the worker threads; they block on the queue until nodes arrive.
        for i in range(THREAD_NR):
            thread = Worker(self.feed_q)
            thread.daemon = True
            self.threads.append(thread)
            thread.start()


class Node(object):
    pass


class Worker(Thread):
    def __init__(self, feed_q):
        super(Worker, self).__init__()
        self.node = None
        self.feed_q = feed_q
        self.stop = False

    def run(self):
        while not self.stop:
            self.node = self.feed_q.get()
            url = "https://www.youtube.com/watch?v=" + self.node.id
            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                try:
                    ydl_info = ydl.extract_info(url, download=False)
                except Exception as e:
                    error_text = "A {0} occurred for {2}. Arguments:\n{1!r}"
                    print(error_text.format(type(e).__name__, e.args, url))
                    # Mark the item as done even on failure so join() can return.
                    self.feed_q.task_done()
                    continue
            infos.append(ydl_info)
            print("Got info from {}".format(url))
            self.node.updated.set()
            # The event was set just above, so this wait returns immediately.
            while True:
                if self.node.updated.wait(timeout=2):
                    break
            self.feed_q.task_done()


if __name__ == "__main__":
    base = Base()
    print("Getting info for {} YouTube videos".format(len(link_ids)))
    for tube_id in link_ids:
        node = Node()
        node.id = tube_id
        node.updated = Event()
        base.feed_q.put(node)
    from timeit import default_timer as timer
    start = timer()
    # Block until every queued node has been processed.
    base.feed_q.join()
    print("Finished in {} seconds".format(round(timer() - start)))
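For anyone who wants to reproduce the comparison with less scaffolding, here is a minimal sketch of the same measurement using a thread pool. It is not the code from the report: `measure` and `fake_extract` are hypothetical names, it assumes Python 3 (`concurrent.futures`, `time.perf_counter`, `time.process_time`), and the stand-in extractor only sleeps so the sketch runs offline. It also reports process CPU time next to wall-clock time, which may help tell whether a run is limited by processing or by waiting on the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure(extract, video_ids, workers=8):
    """Run extract(video_id) for every ID on a thread pool.

    Returns (results, wall_seconds, cpu_seconds). cpu_seconds is the CPU
    time of the whole process (all threads), so a value close to
    wall_seconds suggests the run is CPU-bound rather than network-bound.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every task to finish before the timers are read.
        # Note: an exception raised by extract() propagates from here.
        results = list(pool.map(extract, video_ids))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return results, wall, cpu


def fake_extract(video_id):
    # Hypothetical stand-in so the sketch runs offline. For a real run,
    # replace it with something like:
    #   def extract(video_id):
    #       url = "https://www.youtube.com/watch?v=" + video_id
    #       with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    #           return ydl.extract_info(url, download=False)
    time.sleep(0.01)
    return {"id": video_id}


if __name__ == "__main__":
    ids = ["4jduuQh-Uho", "9GNpv7QDvMY", "MbEOR2Flc-4"]
    results, wall, cpu = measure(fake_extract, ids, workers=3)
    print("{} results, wall {:.2f}s, cpu {:.2f}s".format(len(results), wall, cpu))
```

Running it once with `youtube_dl` and once with `yt_dlp` behind the same `extract` function should give numbers comparable to the ones quoted above, under the assumption that both libraries are installed and the same IDs are used.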
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 35 (18 by maintainers)
Commits related to this issue
- [jsinterp] Some optimizations and refactoring Motivated by: https://github.com/ytdl-org/youtube-dl/issues/30641#issuecomment-1041904912 Authored by: dirkf, pukkandan — committed to yt-dlp/yt-dlp by pukkandan 2 years ago
- Update to ytdl-commit-d1c6c5 [YouTube] [core] Improve platform debug log, based on yt-dlp https://github.com/ytdl-org/youtube-dl/commit/d1c6c5c4d618fa950813c0c71aede34a5ac851e9 Except: * 6ed3433... — committed to yt-dlp/yt-dlp by pukkandan a year ago
- Update to ytdl-commit-d1c6c5 [YouTube] [core] Improve platform debug log, based on yt-dlp https://github.com/ytdl-org/youtube-dl/commit/d1c6c5c4d618fa950813c0c71aede34a5ac851e9 Except: * 6ed3433... — committed to stanoarn/yt-dlp by pukkandan a year ago
So, on a dump T7700 laptop:
- 2.7: 146s (from Queue import Queue)
- 3.5: 110s
- 3.9: 70s
Might the difference between yt-dlp and yt-dl be related to the Python version?
Also, https://lwn.net/Articles/872869/.
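Since the timings above vary a lot between interpreters, it may help to tag every result with the Python version that produced it. A small sketch, where `describe_run` is a hypothetical helper and the label and number are only example inputs:

```python
import platform


def describe_run(label, seconds):
    """Format one timing line tagged with the interpreter that produced it."""
    return "{} on {} {}: {:.0f}s".format(
        label,
        platform.python_implementation(),  # e.g. CPython or PyPy
        platform.python_version(),
        seconds,
    )


if __name__ == "__main__":
    # Example inputs only; pass the measured time from your own run.
    print(describe_run("youtube_dl", 70))
```

Posting lines in this form makes it easier to see whether a gap between two runs comes from the library or from the interpreter.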