yt-dlp: How to handle missing/incomplete fragments more elegantly

Checklist

  • I’m asking a question and not reporting a bug or requesting a feature
  • I’ve looked through the README
  • I’ve verified that I’m running yt-dlp version 2023.01.06 (update instructions) or later (specify commit)
  • I’ve searched the bugtracker for similar questions including closed ones. DO NOT post duplicates
  • I’ve read the guidelines for opening an issue

Please make sure the question is worded well enough to be understood

First some observations regarding fragments:

  • --fragment-retries infinite bangs its head against the same wall forever
  • --abort-on-unavailable-fragments bails on the first sign of trouble
  • --keep-fragments floods the directory with hundreds of fragments
  • --no-keep-fragments iteratively merges fragments while downloading (but if a fragment is missing it will be left as a separate file, the merged video will lack this fragment, and video/audio/subtitles will be out of sync)

So, is there a way to tell yt-dlp:

  1. If some fragments can’t be downloaded, skip them
  2. Focus on getting all the fragments that are available
  3. Iteratively merge all sequential fragments that are complete
  4. When finished, if some fragments are missing/incomplete, don’t merge or delete anything
  5. Let the user re-run the download at a later time to try to quickly get the (probably few) remaining fragments
  6. Merge and delete once all fragments are available

Pseudo:

foreach video in batch:
    failures = 0
    foreach fragment in video:
        if fragment missing or incomplete:
            get fragment (up to 10 tries)
        if fragment complete:
            merge fragment into the preceding run of complete fragments
        else:
            print "fragment N missing; download continues, but final merge will be skipped"
            failures++
    if failures == 0:
        merge video
        delete fragments

Example of a condensed way to keep/merge fragments:

.part-Frag001-020
.part-Frag021.part
.part-Frag022-049
.part-Frag050.part
.part-Frag051-099
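
The condensed naming above could be driven by something like the following sketch. This is purely hypothetical helper code (not a yt-dlp feature); `condense` and `names` are made-up names, and fragments are assumed to be numbered from 1 upward:

```python
def condense(complete):
    """Collapse a list of complete fragment numbers into (start, end)
    runs, mirroring the .part-FragNNN-MMM naming above."""
    runs = []
    for n in sorted(complete):
        if runs and n == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], n)   # extend the current run
        else:
            runs.append((n, n))           # start a new run
    return runs

def names(runs, missing):
    """Render complete runs and missing fragments in the condensed scheme."""
    out = [f".part-Frag{a:03d}-{b:03d}" if a != b else f".part-Frag{a:03d}"
           for a, b in runs]
    out += [f".part-Frag{m:03d}.part" for m in missing]
    return sorted(out)
```

So five files on disk can represent hundreds of downloaded fragments, with a `.part` suffix only on the gaps still to be fetched.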

PS! For a download that was merged by a standard yt-dlp <video> run but has at least one leftover .part-Frag###.part (in which case video/audio/subtitles will be out of sync), is there currently any way to re-run the download to get the missing fragment and merge it into the already merged video? (I assume not, and though possibly technically feasible to implement, this would be better handled pre-emptively as described above.)

Provide verbose output that clearly demonstrates the problem

  • Run your yt-dlp command with -vU flag added (yt-dlp -vU <your command line>)
  • Copy the WHOLE output (starting with [debug] Command-line config) and insert it below

Complete Verbose Output

$ yt-dlp -vU
[debug] Command-line config: ['-vU']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.01.06 [6becd2508] (source)
[debug] Git HEAD: 88d8928
[debug] Python 3.10.9 (CPython x86_64 64bit) - macOS-12.6.2-x86_64-i386-64bit (OpenSSL 1.1.1s  1 Nov 2022)
[debug] exe versions: ffmpeg git-2022-06-01-c6364b71 (setts), ffprobe git-2022-06-01-c6364b71, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.15.0, certifi-2022.06.15, sqlite3-2.6.0, xattr-0.9.9
[debug] Proxy map: {}
[debug] Loaded 1735 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2023.01.06, Current version: 2023.01.06
yt-dlp is up to date (2023.01.06)

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 6
  • Comments: 19 (4 by maintainers)

Most upvoted comments

Thanks for reminding me about the importance of explaining core motivation when asking questions 😃

The core motivation is to not be stuck with a merged/post-processed file that has video/audio/subtitles out of sync, because then you have to re-download the entire file, which may take hours under certain circumstances.

So, one can use keep-fragments to prevent this, but that keeps fragments even when download/merge/PP is 100% successful, and if you download by batch, there will be thousands of fragments in the directory that you don’t really need.

(Also, I forgot to mention that I had -i on by default, which causes generation of the final file and removal of fragments even with missing fragments. I doubt anyone ever wants this, as it causes video/audio/subtitles to be out of sync.)

So my thought was:

  • Detect whether missing fragments cause video/audio/subs to go out of sync, and then either refuse to delete fragments, or merge the file by “least common denominator”, e.g. if the video is missing 30:00-30:10, cut that span from the audio and subtitles too.
  • Keep directory flooding of fragment files down by merging consecutive fragments (as illustrated in the OP).
  • Only keep fragments as long as they are actually needed. (If all fragments are downloaded and the final file is successfully merged, they are not needed anymore.)

Options for this could be something along the lines of:

--merge-final-file=[always | only-if-have-all-fragments | yes-but-prevent-out-of-sync-issues]
--merge-fragments-while-downloading=[always | consecutive-only | never]
--keep-fragments-after-merging-final-file=[always | only-if-missing-fragments | never]

I hope this is more understandable and contributes to bringing the discussion forward.

  1. Continuously merge all consecutive fragments (see OP) to keep the dir as clean as possible.

Keeping the directory “clean” is not an objective of yt-dlp. Set -P temp: instead to keep the main directory clean.

This is actually a pretty decent workaround. Thanks for pointing this out.

I will not change the defaults as explained here. However, a new option (say, --continue-on-unavailable-fragments) can be added to get this behavior #6757 (comment). As I understood OP, this is really the only thing you want anyway. Idk why you are complicating it with so many other “ideas”.

  1. When merging media with separate video/audio/subs, if eg. an audio fragment is missing, merging should cut out the corresponding pieces in the other media (ie. video/subs) to always prevent out-of-sync (no-one ever wants that).

Not possible to implement. Audio and video fragments may not have same durations/count. Or one of them may not even be fragmented. We can’t even detect, much less cut out the correct portions.

Important clarification. Out-of-sync media was my initial concern. If my suggestions got unnecessarily complicated, it was me trying to be creative while not knowing every detail of the application or related issues from other users.

Aren’t HLS segments usually 10 seconds for both video and audio? And, disregarding weird edge cases where they aren’t, isn’t it quite easy to do a logical “and” between the available video and audio pieces? (Subtitles require a bit of code, but not that much, or one could simply choose to forsake subtitles in edge cases like this.)
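
With fixed segment durations, the “logical and” is just an intersection of fragment indices; with variable durations it becomes an interval intersection over the time spans each stream actually covers. A minimal sketch of the interval version (hypothetical helper, not yt-dlp code; intervals are (start, end) in seconds):

```python
def intersect(a, b):
    """Intersect two sorted lists of (start, end) time intervals,
    e.g. the spans covered by available video vs. audio fragments."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))      # the overlapping portion survives
        # advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out
```

The maintainer’s objection still applies, though: this presumes you can map fragments to reliable timestamps in the first place, which yt-dlp apparently cannot do in general.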

If this is completely undoable (or you can’t be bothered), at least --continue-on-unavailable-fragments will prevent out-of-sync situations, but at the risk and cost of the following:

  1. You start a download that will take hours before you leave home or go to sleep.
  2. An early fragment is missing. The application bails.
  3. You return, expecting the download to be complete, and get upset because you have to restart it, with the further risk of the same thing happening again.

With this in mind, pointing the temp directory at /tmp and using --keep-fragments for sites known to take hours and drop fragments absolutely seems like the best solution. The download dir stays clean, downloading is not aborted by mishaps, and the (usually few) fragments missing at the end can quite quickly be fetched with a second manual run. This can even be semi-automated with an until shell loop, assuming the application gives proper exit codes depending on whether the download was complete and successful.
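
That semi-automated re-run loop could look like the sketch below. It is generic on purpose: `run` would be a callable that invokes yt-dlp (e.g. via subprocess) and returns its exit code, and the assumption that exit code 0 means “fully successful, no missing fragments” should be verified against your yt-dlp version before relying on it:

```python
import time

def retry_until_complete(run, max_rounds=10, delay=60):
    """Re-run `run` (e.g. a function that invokes `yt-dlp <video>` and
    returns its exit code) until it reports success or we give up.
    Assumes exit code 0 means the download completed with nothing
    missing -- an assumption, not a documented yt-dlp guarantee."""
    for _ in range(max_rounds):
        if run() == 0:
            return True               # complete; fragments can be removed
        time.sleep(delay)             # wait before fetching the stragglers
    return False
```

In plain shell the same idea is roughly `until yt-dlp <args>; do sleep 60; done`.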

PS! Is there a way with the current version to delete fragments only if the download was 100% successful? Off the top of my head, a calling script could check the exit code (assuming it gives an unambiguous answer), but how would it find out which fragments belong to the download, other than doing a fuzzy string match against the merged filename? Could this be added as an option? As mentioned in the OP: --keep-fragments=[always | missing | never]
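
The filename-match part of that workaround could be as simple as a glob on siblings of the merged file. This sketch assumes the `<name>.<ext>.part-FragNNN` naming shown in the OP; the actual pattern varies by downloader and yt-dlp version, so it would need adjusting:

```python
from pathlib import Path

def fragment_files(merged):
    """Return leftover fragment files belonging to `merged`.
    Assumes the `<name>.<ext>.part-FragNNN*` naming from the OP;
    adjust the glob to whatever your yt-dlp version produces."""
    merged = Path(merged)
    return sorted(merged.parent.glob(merged.name + ".part-Frag*"))
```

A calling script could then delete these only when the exit code signalled full success.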

So --continue-on-unavailable-fragments means --merge-final-file-despite-missing-fragments? If so, I’m totally on board. I would set this to “NO” then, and TBH would expect it to be the default.

What about the part about deleting fragments when the file is 100%? (ie. only keeping fragments as a safety measure as long as they might be needed to ultimately get a full and complete merged file)

@chrizilla: Thanks for the support. Yes, it’s about being smarter about exceptions. Usually you just want to download some media into a single file, and usually that goes well. But sometimes one of the below happens for a shorter or longer period of time, and you want to handle that smartly and gracefully, but also non-intrusively, and as efficiently as possible in the long run. (E.g. if a fragment is missing, is it best to try again immediately, in 1 minute, 1 hour or 1 day?)

  • Missing fragments
  • Missing connections
  • Missing disk space
  • Interrupted program
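
For the “immediately, 1 minute, 1 hour or 1 day?” question, a common answer is capped exponential backoff between retry rounds. All numbers here are illustrative defaults I picked, not anything yt-dlp does:

```python
def backoff_schedule(base=60, factor=4, cap=86400, rounds=6):
    """Delays in seconds between retry rounds: start at one minute,
    multiply by `factor` each round, never exceed one day.
    Values are illustrative, not yt-dlp defaults."""
    delay, out = base, []
    for _ in range(rounds):
        out.append(delay)
        delay = min(delay * factor, cap)
    return out
```

This covers both transient hiccups (early, short delays) and fragments that only reappear hours later (late, long delays) without hammering the server in between.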

My initial problem was files being merged when there were missing fragments that cause out-of-sync video/audio/subs, and I don’t want to keep fragments as a default, since that’s meaningless for the majority of files that finish 100%.

General questions during code design:

  • What information is available on which to base decisions?
  • What confident choices can you make on behalf of the user?
  • When choices are not evident, when do you introduce program options and when do you prompt user input?

Maybe borrow some ideas from Torrent protocols. They are built specifically with exception resilience in mind.