decord: Extremely slow accurate seek
Hi, I’ve hit an unexpected regression with accurate seeks: they seem to be at least an order of magnitude too slow. This is with a CUDA-enabled manual install from current git HEAD:
python3 decord/tests/benchmark/bench_decord.py --file test.mp4
15000 frames, elapsed time for sequential read: 9.465791463851929
300 frames, elapsed time for random access(not accurate): 2.511744737625122
300 frames, elapsed time for random access(accurate): 416.69994831085205
The machine (16 physical cores, NVMe SSD) is neither CPU- nor IO-limited during the benchmark. CPU load is ~8 (out of 16), so half the cores are idle. The file is a 50MB H.264 video encoded by ffmpeg.
Results are slightly faster for random but even worse for accurate seeks when running on GPU:
root@428b7218513b:/orb# python3 decord/tests/benchmark/bench_decord.py --file test.mp4 --gpu 0
15000 frames, elapsed time for sequential read: 5.069800615310669
300 frames, elapsed time for random access(not accurate): 0.4220912456512451
300 frames, elapsed time for random access(accurate): 535.492870092392
This is on an RTX 2070 Super.
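To put the two runs above in per-seek terms, here is a small sketch that only does arithmetic on the timings quoted in the benchmark output. The dictionary layout and the helper name `per_seek_summary` are made up for illustration; nothing here calls decord.

```python
# Timings copied from the benchmark output above: seconds for 300 random seeks.
NUM_SEEKS = 300
timings = {
    "cpu": {"not_accurate": 2.511744737625122, "accurate": 416.69994831085205},
    "gpu": {"not_accurate": 0.4220912456512451, "accurate": 535.492870092392},
}


def per_seek_summary(timings, num_seeks):
    """Return per-seek cost (seconds) and the accurate/not-accurate slowdown per device."""
    summary = {}
    for device, row in timings.items():
        summary[device] = {
            "accurate_s_per_seek": row["accurate"] / num_seeks,
            "not_accurate_s_per_seek": row["not_accurate"] / num_seeks,
            "slowdown": row["accurate"] / row["not_accurate"],
        }
    return summary


summary = per_seek_summary(timings, NUM_SEEKS)
for device, row in summary.items():
    print(
        f"{device}: {row['accurate_s_per_seek']:.3f} s per accurate seek, "
        f"{row['not_accurate_s_per_seek'] * 1000:.1f} ms per non-accurate seek, "
        f"slowdown x{row['slowdown']:.0f}"
    )
```

On these numbers an accurate seek costs more than a second each on both CPU and GPU, while a non-accurate seek takes milliseconds.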
Other specs:
# python3 -V
Python 3.6.9
# pip3 list | grep decord
decord 0.4.2
Comparison benchmarks (don’t have pyav, so for opencv only):
# python3 decord/tests/benchmark/bench_
bench_decord.py bench_opencv.py bench_pyav.py
root@428b7218513b:/orb# python3 decord/tests/benchmark/bench_opencv.py --file test.mp4
15000 frames. Elapsed time for sequential read: 5.646612882614136
300 frames, elapsed time for random access(not accurate): 13.583254337310791
If there’s anything else I can do to help debugging, let me know!
About this issue
- State: open
- Created 4 years ago
- Comments: 25 (15 by maintainers)
I had a similar problem in simply doing something like this on cpu on ubuntu:
for i in range(10): frame = vr[i*100].asnumpy()
Each iteration got slower as the loop progressed. The problem went away when I reverted from 0.4.2 back to 0.4.0.
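For anyone who wants to check whether they see the same per-iteration slowdown, a minimal timing sketch follows. The helper `time_each_access` is a hypothetical name, not part of decord; it just wraps whatever frame-fetching callable you give it. The decord usage in the comment assumes a `VideoReader` opened on your own file.

```python
import time


def time_each_access(fetch, indices):
    """Call fetch(index) for each index and return a list of (index, seconds) pairs."""
    results = []
    for idx in indices:
        start = time.perf_counter()
        fetch(idx)
        results.append((idx, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # With decord installed, the loop from the comment above would be timed as:
    #   from decord import VideoReader
    #   vr = VideoReader("test.mp4")
    #   results = time_each_access(lambda i: vr[i].asnumpy(), range(0, 1000, 100))
    # A trivial stand-in is used here so the sketch runs without a video file.
    results = time_each_access(lambda i: sum(range(i)), range(0, 1000, 100))
    for idx, seconds in results:
        print(f"frame {idx}: {seconds * 1000:.3f} ms")
```

If the time per access grows steadily with the frame index on 0.4.2 but stays flat on 0.4.0, that would match the behaviour described here.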
Can you print the keyframes of the test video you have? You can call vr.get_key_indices(). Another potential cause is the new RTX 2000 series with tensor cores. I haven’t really tried decoding with the newer cards, and I’m not sure what has changed in cuviddec, since I am using header definitions from an older CUDA driver and the drivers have since been updated.
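To make the keyframe question concrete: the sketch below assumes that an accurate seek has to decode forward from the nearest preceding keyframe, which is how keyframe-based codecs such as H.264 generally work. Whether decord does exactly this internally is an assumption here. The helper `frames_to_decode` is hypothetical, and the keyframe list in the example is invented; in practice you would pass in the list returned by vr.get_key_indices().

```python
from bisect import bisect_right


def frames_to_decode(key_indices, target):
    """Frames decoded to reach `target`, starting at the last keyframe <= target.

    `key_indices` must be sorted in ascending order.
    """
    pos = bisect_right(key_indices, target)
    if pos == 0:
        raise ValueError(f"no keyframe at or before frame {target}")
    return target - key_indices[pos - 1] + 1


if __name__ == "__main__":
    # Invented example: a keyframe every 250 frames.
    key_indices = [0, 250, 500, 750]
    for target in (0, 249, 250, 600):
        print(target, frames_to_decode(key_indices, target))
```

If the real keyframe list turns out to be very sparse (for example, only one keyframe at the start of a 15000-frame file), each accurate seek would have to decode thousands of frames under this assumption, which would go a long way toward explaining the timings.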