decord: Extremely slow accurate seek

Hi, I’ve hit an unexpected regression with accurate seeks: they seem about one order of magnitude too slow. This is with a CUDA-enabled manual install from current git HEAD:

python3 decord/tests/benchmark/bench_decord.py --file test.mp4
15000  frames, elapsed time for sequential read:  9.465791463851929
300  frames, elapsed time for random access(not accurate):  2.511744737625122
300  frames, elapsed time for random access(accurate):  416.69994831085205

The machine (16 physical cores, NVMe SSD) is neither CPU- nor IO-limited during the benchmark. CPU load is ~8 (out of 16), so half the cores are idle. The file is a 50 MB H.264 video encoded by ffmpeg.
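To put the numbers above on a per-frame basis, here is a small sketch that only divides the elapsed times from the CPU run by the frame counts. The dictionary keys are labels chosen for illustration; nothing here calls decord.

```python
# Timings copied from the bench_decord.py CPU run quoted above.
runs = {
    "sequential": (15000, 9.465791463851929),
    "random (not accurate)": (300, 2.511744737625122),
    "random (accurate)": (300, 416.69994831085205),
}

# Seconds spent per frame in each mode.
per_frame = {name: elapsed / frames for name, (frames, elapsed) in runs.items()}

for name, seconds in per_frame.items():
    print(f"{name}: {seconds * 1000:.2f} ms per frame")

# How much slower an accurate random access is than a non-accurate one.
slowdown = per_frame["random (accurate)"] / per_frame["random (not accurate)"]
print(f"accurate vs. not accurate: {slowdown:.0f}x slower")
```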

Results are slightly faster for random but even worse for accurate seeks when running on GPU:

root@428b7218513b:/orb# python3 decord/tests/benchmark/bench_decord.py --file test.mp4 --gpu 0
15000  frames, elapsed time for sequential read:  5.069800615310669
300  frames, elapsed time for random access(not accurate):  0.4220912456512451
300  frames, elapsed time for random access(accurate):  535.492870092392

This is on an RTX 2070 Super.

Other specs:

# python3 -V
Python 3.6.9
# pip3 list | grep decord
decord                 0.4.2

Comparison benchmarks (I don’t have PyAV installed, so OpenCV only):

root@428b7218513b:/orb# python3 decord/tests/benchmark/bench_opencv.py --file test.mp4
15000  frames. Elapsed time for sequential read:  5.646612882614136
300  frames, elapsed time for random access(not accurate):  13.583254337310791

If there’s anything else I can do to help debugging, let me know!

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 25 (15 by maintainers)

Most upvoted comments

I had a similar problem simply doing something like this on CPU on Ubuntu:

for i in range(10): frame = vr[i*100].asnumpy()

Each iteration got slower as the loop progressed. The problem went away when I reverted from 0.4.2 to 0.4.0.
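To make the per-iteration slowdown visible, the loop can be timed one read at a time. This is a minimal sketch, not part of the original report: `time_reads` and `time_decord_reads` are hypothetical helper names, and the decord part assumes the usual `VideoReader(path, ctx=cpu(0))` constructor together with the `vr[i].asnumpy()` indexing used above.

```python
import time


def time_reads(fetch, indices):
    """Call fetch(index) for each index and return (index, seconds) pairs."""
    timings = []
    for idx in indices:
        start = time.perf_counter()
        fetch(idx)
        timings.append((idx, time.perf_counter() - start))
    return timings


def time_decord_reads(path, indices):
    """Time vr[i].asnumpy() for each index of a decord VideoReader."""
    # Imported lazily so time_reads stays usable without decord installed.
    from decord import VideoReader, cpu

    vr = VideoReader(path, ctx=cpu(0))
    return time_reads(lambda i: vr[i].asnumpy(), indices)
```

Calling `time_decord_reads("test.mp4", [i * 100 for i in range(10)])` reproduces the loop above; running it under both 0.4.0 and 0.4.2 should show whether the later reads really take longer.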

Can you print the keyframes of the testing video you have? You can call vr.get_key_indices(). Another potential cause is the new RTX 2000 series with tensor cores. I haven’t really tried decoding with the newer cards, and I’m not sure exactly what differs in cuviddec, since I am using header definitions from an older CUDA driver and the drivers have since been updated.
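For reference, a rough sketch of how the keyframe list could be inspected. It assumes the common decoder behaviour that an accurate seek starts at the nearest preceding keyframe and decodes forward to the target; whether decord does exactly this internally is not confirmed here. `frames_to_decode`, `max_keyframe_gap` and `report` are hypothetical helper names; only `VideoReader`, `get_key_indices()` and `len(vr)` come from decord.

```python
from bisect import bisect_right


def frames_to_decode(target, key_indices):
    """Frames decoded to reach `target` when starting at the previous keyframe."""
    keys = sorted(int(k) for k in key_indices)
    pos = bisect_right(keys, target) - 1
    if pos < 0:
        raise ValueError("no keyframe at or before the target frame")
    return target - keys[pos] + 1


def max_keyframe_gap(key_indices, num_frames):
    """Longest distance between a keyframe and the next one (or the video end)."""
    keys = sorted(int(k) for k in key_indices)
    if not keys:
        raise ValueError("no keyframes given")
    bounds = keys + [num_frames]
    return max(b - a for a, b in zip(bounds, bounds[1:]))


def report(path):
    """Print the keyframe indices and the longest keyframe gap of a video."""
    # Imported lazily so the helpers above work without decord installed.
    from decord import VideoReader, cpu

    vr = VideoReader(path, ctx=cpu(0))
    keys = vr.get_key_indices()
    print("keyframes:", list(keys))
    print("longest gap:", max_keyframe_gap(keys, len(vr)))
```

A video with very few keyframes would make every accurate seek decode a long run of frames, which is one thing `report("test.mp4")` would reveal.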