yt-dlp: How to handle missing/incomplete fragments more elegantly
DO NOT REMOVE OR SKIP THE ISSUE TEMPLATE
- I understand that I will be blocked if I remove or skip any mandatory* field
Checklist
- I’m asking a question and not reporting a bug or requesting a feature
- I’ve looked through the README
- I’ve verified that I’m running yt-dlp version 2023.01.06 (update instructions) or later (specify commit)
- I’ve searched the bugtracker for similar questions including closed ones. DO NOT post duplicates
- I’ve read the guidelines for opening an issue
Please make sure the question is worded well enough to be understood
First some observations regarding fragments:
- `--fragment-retries infinite` bangs its head against the same wall forever
- `--abort-on-unavailable-fragments` bails on the first sign of trouble
- `--keep-fragments` floods the directory with hundreds of fragments
- `--no-keep-fragments` iteratively merges fragments while downloading (but if a fragment is missing it will be left as a separate file, the merged video will lack this fragment, and video/audio/subtitles will be out of sync)
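For context, combining today's options as described in the list above might look roughly like the sketch below; the URL is a placeholder and the exact flag combination is just one plausible choice, not a recommendation from the thread:

    # Retry each fragment up to 10 times, keep going past fragments that
    # remain unavailable, and keep every downloaded fragment on disk
    # instead of deleting the pieces after merging.
    yt-dlp --fragment-retries 10 \
           --no-abort-on-unavailable-fragments \
           --keep-fragments \
           "https://example.com/video"   # placeholder URL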
So, is there a way to tell yt-dlp:
- If some fragments can’t be downloaded, skip them
- Focus on getting all the fragments that are available
- Iteratively merge all sequential fragments that are complete
- When finished, if some fragments are missing/incomplete, don’t merge or delete anything
- Let the user re-run the download at a later time to try to quickly get the (probably few) remaining fragments
- Merge and delete once all fragments are available
Pseudo:
    foreach video in batch
        failures = 0
        foreach fragment in video
            if fragment missing or incomplete
                get fragment, try=10
                if fragment complete
                    merge fragment with preceding complete fragments
                else
                    print "fragment # missing. download will continue, but merging will be skipped."
                    failures++
        if failures == 0
            merge video
            delete fragments
Example of a condensed way to keep/merge fragments:
.part-Frag001-020
.part-Frag021.part
.part-Frag022-049
.part-Frag050.part
.part-Frag051-099
PS! For a video that was merged but has at least one `.part-Frag###.part` left over (in which case video/audio/subtitles will be out of sync), is there currently any way to re-run a standard `yt-dlp <video>` to fetch the missing fragment and merge it into the already merged file? (I assume not, and though this is probably technically possible to implement, it would be better handled pre-emptively as described above.)
Provide verbose output that clearly demonstrates the problem
- Run your yt-dlp command with the -vU flag added (yt-dlp -vU <your command line>)
- Copy the WHOLE output (starting with [debug] Command-line config) and insert it below
Complete Verbose Output
$ yt-dlp -vU
[debug] Command-line config: ['-vU']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version 2023.01.06 [6becd2508] (source)
[debug] Git HEAD: 88d8928
[debug] Python 3.10.9 (CPython x86_64 64bit) - macOS-12.6.2-x86_64-i386-64bit (OpenSSL 1.1.1s 1 Nov 2022)
[debug] exe versions: ffmpeg git-2022-06-01-c6364b71 (setts), ffprobe git-2022-06-01-c6364b71, rtmpdump 2.4
[debug] Optional libraries: Cryptodome-3.15.0, certifi-2022.06.15, sqlite3-2.6.0, xattr-0.9.9
[debug] Proxy map: {}
[debug] Loaded 1735 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: 2023.01.06, Current version: 2023.01.06
yt-dlp is up to date (2023.01.06)
About this issue
- State: open
- Created a year ago
- Reactions: 6
- Comments: 19 (4 by maintainers)
Thanks for reminding me about the importance of explaining core motivation when asking questions 😃
The core motivation is to not be stuck with a merged/post-processed file that has video/audio/subtitles out of sync, because then you have to re-download the entire file, which may take hours under certain circumstances.
So one can use `--keep-fragments` to prevent this, but it keeps fragments even if the download/merge/post-processing is 100% successful, and if you download by batch, there will be thousands of fragments in the directory that you don't really need.

(Also, I forgot to mention that I had `-i` on by default, which causes generation of the final file and removal of fragments even with missing fragments. I doubt anyone ever wants to do this, as it causes video/audio/subtitles to be out of sync.)

So my thought was:
Options for this could be something along the lines of:
I hope this is more understandable and contributes to bringing the discussion forward.
This is actually a pretty decent workaround. Thanks for pointing this out.
Important clarification: out-of-sync media was my initial concern. If my suggestions got unnecessarily complicated, it was me trying to be creative without knowing every detail of the application or of related issues from other users.
Aren’t HLS segments usually 10 seconds for both video and audio, and disregarding weird edge cases where they aren’t, isn’t it quite easy to do a logical “and” between available video and audio pieces? (Subtitles require a bit of code, but not that much, or one could simply choose to forsake subtitles for edge cases like this.)
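Purely as a toy illustration of that "logical and" (not something yt-dlp does, just a sketch with hypothetical file names), the usable overlap of two lists of downloaded segment numbers could be computed like this:

    # video_segments.txt and audio_segments.txt are hypothetical files,
    # one segment number per line, listing what was actually downloaded.
    # comm -12 prints only the numbers present in both lists.
    comm -12 <(sort video_segments.txt) <(sort audio_segments.txt) > usable_segments.txt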
If this is completely undoable (or you can't be bothered), at least `--continue-on-unavailable-fragments` will prevent out-of-sync situations, but at the risk and cost of the following:

With this in mind, setting the temp directory to /tmp and using `--keep-fragments` for sites known to take hours and to have missing fragments absolutely seems like the best solution. The download dir is kept clean, downloading is not aborted by mishaps, and the (usually few) fragments missing at the end can quite quickly be fetched with a second manual run. This can even be semi-automated with an `until` shell loop (a rough sketch is included at the end of this comment), assuming the application gives proper exit codes depending on whether the download was complete and successful.

PS! Is there a way with the current version to delete fragments if the download was 100% successful? Off the top of my head, a calling script could check the exit code, assuming it gives an unambiguous answer, but how would it find out which fragments belong to the download, other than doing a fuzzy string match against the merged filename? Could this be added as an option? As mentioned in the OP:

--keep-fragments=[always | missing | never]
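The kind of `until` loop mentioned above could look roughly like this; it assumes yt-dlp exits with a non-zero status while the download is still incomplete (which depends on options such as `-i`/`--ignore-errors`), and the URL and retry delay are placeholders:

    # Keep re-running the same command until it exits 0. Per the
    # discussion above, fragments are kept under the /tmp temp path so
    # the download directory stays clean and a later run can quickly
    # fetch whatever is still missing.
    URL="https://example.com/video"   # placeholder
    until yt-dlp --keep-fragments -P "temp:/tmp" "$URL"; do
        echo "Download incomplete; retrying in 10 minutes..."
        sleep 600
    done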
So `--continue-on-unavailable-fragments` means `--merge-final-file-despite-missing-fragments`? If so, I'm totally on board. I would set this to "NO" then, and TBH would expect that to be the default.

What about the part about deleting fragments when the file is 100% complete? (i.e. only keeping fragments as a safety measure for as long as they might be needed to ultimately get a full and complete merged file)
@chrizilla: Thanks for the support. Yes, it's about being smarter about exceptions. Usually you just want to download some media into a single file, and usually that goes well. But sometimes one of these problems occurs for a shorter or longer period of time, and you want to handle it smartly and gracefully, but also non-intrusively, and as efficiently as possible in the long run. (E.g. if a fragment is missing, is it best to try again immediately, in 1 minute, in 1 hour, or in 1 day?)
My initial problem was files being merged when there were missing fragments, which leaves video/audio/subs out of sync, and I don't want to keep fragments by default, since that's pointless for the majority of files that finish 100%.
General questions during code design:
Maybe borrow some ideas from Torrent protocols. They are built specifically with exception resilience in mind.