youtube-dl: Multi-threading slowdown for YouTube

Checklist

  • I’m reporting a broken site support
  • I’ve verified that I’m running youtube-dl version 2021.12.17
  • I’ve checked that all provided URLs are alive and playable in a browser
  • I’ve checked that all URLs and arguments with special characters are properly quoted or escaped
  • I’ve searched the bugtracker for similar issues including closed ones

Verbose log

PASTE VERBOSE LOG HERE

Description

Using youtube_dl with multiple threads to get information about multiple videos is a lot slower since the last breakage. The code below takes ~70 sec for 42 videos; with yt-dlp, the same videos take ~30 sec. Before the last code change (I’m using the current git code), youtube_dl was faster than yt-dlp. Changing THREAD_NR didn’t change the difference…

The processing load is also much higher than before. Because I use a similar strategy in an app of mine, on my system (with an older i7) it got noticeably worse than before. Trying it on an older laptop made the app totally unusable…

The ids are some random links, you can use whatever you like.

# coding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals
from queue import Queue
from threading import Thread, Event
import youtube_dl
# import yt_dlp as youtube_dl

ydl_opts = {"quiet": True, "no_warnings": True}
link_ids = ["4jduuQh-Uho", "9GNpv7QDvMY", "MbEOR2Flc-4", "ZKUzNF21n9w", "y-JqH1M4Ya8",
            "pUqfaiUb3l4", "bL5eqSOXMtE", "HyMm4rJemtI", "BU4kGkrrJEw", "wA1v207xlOw",
            "pFS4zYWxzNA", "aF6hDcAbSoE", "G1ckKDRc69w", "o9_jzBtdMZ4", "AGoQZx8Mn0g",
            "6W-pHCD6Tow", "kszLwBaC4Sw", "mwTd_PzGY-c", "iqLTYD_nhsU", "X335gdcPE7A",
            "z_54vDk8lWw", "8a82arE0JSQ", "tJmzQHWl9kc", "8jPQjjsBbIc", "ENJUB5thpB4",
            "dEhUMvjFuQY", "D6XyJh1tsGI", "tFCfb-Qqdz0", "UkafA6r1caQ", "OO8HtAXnRqQ",
            "--da0m2K4I4", "EOlI0UtLDk4", "r7tQbxTImKw", "s_YLPcW4Tu8", "9wIbhES2UkA",
            "YkX9X4td7j8", "14cHz4ebonY", "saVUUZE50Co", "N1K4NYHqMx4", "iCBL33NKvPA",
            "QPTNS3llm2E", "pFS4zYWxzNA"]
THREAD_NR = 8
infos = []


class Base(object):
    def __init__(self, **kwargs):
        super(Base, self).__init__(**kwargs)
        self.feed_q = Queue()
        self.threads = []
        with youtube_dl.YoutubeDL({}) as ydl:
            ydl.cache.remove()
        for i in range(THREAD_NR):
            thread = Worker(self.feed_q)
            thread.daemon = True
            self.threads.append(thread)
            thread.start()


class Node(object):
    pass


class Worker(Thread):
    def __init__(self, feed_q):
        super(Worker, self).__init__()
        self.node = None
        self.feed_q = feed_q
        self.stop = False

    def run(self):
        while not self.stop:
            self.node = self.feed_q.get()
            url = "https://www.youtube.com/watch?v=" + self.node.id
            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                try:
                    ydl_info = ydl.extract_info(url, download=False)
                except Exception as e:
                    error_text = "A {0} occurred for {2}. Arguments:\n{1!r}"
                    print(error_text.format(type(e).__name__, e.args, url))
                    self.feed_q.task_done()
                    continue
                infos.append(ydl_info)
                print("Got info from {}".format(url))
                self.node.updated.set()
            while True:
                if self.node.updated.wait(timeout=2):
                    break
            self.feed_q.task_done()


if __name__ == "__main__":
    from timeit import default_timer as timer

    base = Base()
    print("Getting info for {} YouTube videos".format(len(link_ids)))
    # Start timing before enqueueing: the workers are already running
    # and begin extracting as soon as the first node is put on the queue.
    start = timer()
    for tube_id in link_ids:
        node = Node()
        node.id = tube_id
        node.updated = Event()
        base.feed_q.put(node)

    base.feed_q.join()
    print("Finished in {} seconds".format(round(timer() - start)))
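To separate time spent waiting on the network from time spent on CPU (the higher processing load mentioned above), one option is to compare wall-clock time with process CPU time around the same threaded workload. The following is only a minimal sketch, assuming Python 3.3+ for time.process_time. The names measure and fake_extract are hypothetical and not part of youtube-dl; fake_extract is a stand-in that just sleeps, and would be replaced by a function that calls ydl.extract_info(url, download=False) as in the reproducer above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure(func, items, workers=8):
    """Run func over items in a thread pool.

    Returns (results, wall_seconds, cpu_seconds). cpu_seconds is the
    user + system CPU time of the whole process (all threads) and does
    not include time spent sleeping or waiting on I/O.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, items))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return results, wall, cpu


def fake_extract(video_id):
    # Hypothetical stand-in for a real extraction call; it only waits,
    # so it behaves like purely I/O-bound work.
    time.sleep(0.05)
    return {"id": video_id}


if __name__ == "__main__":
    results, wall, cpu = measure(fake_extract, ["a", "b", "c", "d"], workers=4)
    print("wall: {:.2f}s, cpu: {:.2f}s".format(wall, cpu))
```

If the CPU time comes out close to the wall time on a real run, the threads are probably competing for the interpreter rather than waiting on the network, which would also explain why changing the thread count makes little difference.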

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 35 (18 by maintainers)

Most upvoted comments

So, on a dump T7700 laptop:

  • 2.7: 146s (from Queue import Queue)
  • 3.5: 110s
  • 3.9: 70s

Might the difference between yt-dlp and yt-dl be related to the Python version?

Also, https://lwn.net/Articles/872869/.
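
When comparing runs across interpreters like this, it can help to tag each timing with the Python version that produced it. The following is a rough sketch only: best_time, report and sample_workload are hypothetical names, and sample_workload is a small CPU-bound stand-in that would be replaced by the real reproducer. Network-bound runs will vary between attempts, so the fastest of a few runs is only a rough indicator.

```python
from __future__ import print_function

import platform
import timeit


def best_time(func, repeat=3):
    """Return the fastest of `repeat` single runs of func(), in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=1))


def report(label, seconds):
    """Format one result line tagged with the running interpreter."""
    return "{0} {1} | {2}: {3:.2f}s".format(
        platform.python_implementation(),
        platform.python_version(),
        label,
        seconds,
    )


def sample_workload():
    # Hypothetical CPU-bound stand-in; replace with the real test run.
    return sum(i * i for i in range(200000))


if __name__ == "__main__":
    print(report("sample_workload", best_time(sample_workload)))
```

Running the same script under each interpreter gives one line per version that can be pasted side by side.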