vector: route transform performance regression in 0.20.0

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We have a route transform with 50 different routes. We run this in Vector with 8 threads (-t8). Prior to 0.20.0 we could fully saturate the CPU (800%), but after 0.20.0 we cannot get above ~170%. I suspect this is related to the changes to route that turned it into a single transform internally. We could obviously run with 1 thread to get similar per-core performance, but our deployment (Kubernetes) has per-replica overhead that is better amortized across fewer, multithreaded instances. Is this transform expected to scale gracefully across many routes and many cores?
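
The reporter did not attach their configuration, so the following is only a minimal sketch of what such a setup might look like; the source, route names, VRL conditions, and sink are illustrative assumptions (the real config has ~50 routes and real inputs/outputs):

  # Hypothetical sketch of a single route transform fanning out to many routes.
  # Run with: vector --config routes.toml --threads 8

  [sources.input]
  type = "demo_logs"            # stand-in for the real source
  format = "json"

  [transforms.splitter]
  type = "route"
  inputs = ["input"]

  # Each named route is a VRL condition (placeholders here); the real setup
  # defines ~50 of these under the same transform.
  [transforms.splitter.route]
  svc_a = '.service == "a"'
  svc_b = '.service == "b"'
  svc_c = '.service == "c"'

  # Each route's output is consumed as "<transform_id>.<route_name>".
  [sinks.sink_a]
  type = "blackhole"            # stand-in for the real per-route sinks
  inputs = ["splitter.svc_a"]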

Configuration

No response

Version

vector 0.20.0 (x86_64-unknown-linux-gnu 2a706a3 2022-02-11)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 21 (11 by maintainers)

Most upvoted comments

I am very pleased to report that 0.22.0 has resolved our scalability issues. We have also seen a significant (~80%) performance increase. Great work!

Didn't know about hyperfine. Just to confirm, is this its repository? Looks like a really nice tool for any kind of benchmarking. Appreciate the tip!

That's the one!

Thanks so much for providing the reproduction case, @hhromic!

I do see a substantial regression between 0.19.3 and 0.20.1 for this case. 0.21.1 seems to make up a significant portion of the losses, though: it's 15% slower rather than more than 100% slower. Capping demo_logs at 10000 events (count) and running with hyperfine, I see:

  • 0.19.3: 703 ms ± 6.9 ms
  • 0.20.1: 1524 ms ± 13 ms
  • 0.21.1: 803 ms ± 10 ms

I'm going to dig in some more to see where the 0.19.3 -> 0.20.1 regression came from, since we might be able to make more of that back, but maybe 0.21.1's performance will be good enough for you @debugmiller?
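
For reference, a version comparison like the one above can be driven with hyperfine roughly as follows; the binary names, config path, thread count, and warmup count are illustrative assumptions, not the exact commands used in the thread:

  # Assumed layout: one Vector binary per version and a reproduction config
  # whose demo_logs source is capped with count = 10000 so each run terminates.
  hyperfine --warmup 3 \
    './vector-0.19.3 --config repro.toml --threads 8' \
    './vector-0.20.1 --config repro.toml --threads 8' \
    './vector-0.21.1 --config repro.toml --threads 8'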