pravega: System test: txn commit slowdown

Problem description

We are observing a significant slowdown of txn commits in system test runs. Here is a sample observed by @shiveshr in a failed run of readtxnwritescalewithfailovertest:

Time | # of commits
8:48 | 54
8:49 | 31
8:50 | 34
8:51 | 37
8:52 | 38
8:53 | 11
8:54 | 13
// scale started
8:55 | 12
8:56 | 6
8:57 | 5
8:58 | 4
8:59 | 5
9:00 | 4
9:01 | 3
9:02 | 3
9:03 | 1
9:04 | 1
9:05 | 3
9:06 | 3
9:07 | 3
9:08 | 3
9:09 | 2
9:10 | 2
9:11 | 2
9:12 | 2
9:13 | 1
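
To put a number on the slowdown, the per-minute counts above can be averaged on each side of the "scale started" marker. This is only a rough sketch: the counts are copied from the table, while the split point, the variable names, and the `summarize` helper are assumptions made for illustration and are not part of the system test.

```python
# Commits per minute, copied from the table above.
# Splitting at the "scale started" marker is an assumption for illustration.
before_scale = [54, 31, 34, 37, 38, 11, 13]          # 8:48 to 8:54
after_scale = [12, 6, 5, 4, 5, 4, 3, 3, 1, 1,
               3, 3, 3, 3, 2, 2, 2, 2, 1]            # 8:55 to 9:13


def summarize(counts):
    """Return (minutes sampled, total commits, mean commits per minute)."""
    return len(counts), sum(counts), sum(counts) / len(counts)


before = summarize(before_scale)
after = summarize(after_scale)

# Ratio of the mean commit rates: how many times slower commits became.
slowdown = before[2] / after[2]

print(f"before scale: {before[1]} commits in {before[0]} min, {before[2]:.1f}/min")
print(f"after scale:  {after[1]} commits in {after[0]} min, {after[2]:.1f}/min")
print(f"slowdown:     about {slowdown:.1f}x")
```

With this split, the mean rate falls from roughly 31 commits per minute to roughly 3.4, a drop of about 9x. The 8:53 and 8:54 samples are already low (11 and 13) even though they precede the marker, so the exact ratio depends on where the boundary is drawn.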

Further investigation suggests that the segment store is slow to commit transactions after failover. It is not yet clear, however, what exactly is causing the slowdown.

Problem location

Transaction commits, controller, segment store

Suggestions for an improvement

Investigate the bug and determine the cause of the slowdown.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 30 (30 by maintainers)

Most upvoted comments

@shiveshr It is not a good idea to draw performance conclusions from these tests, because they run on pretty scarce resources and we are not provisioning IO properly in these clusters. In a regular deployment, BK latency is much shorter than what we see here; there is enough evidence of that even in our own testing.

In any case, I just wanted to illustrate that the writes themselves were not taking seconds, so it is not BK that is inducing the multi-second latency we are observing.