waggle-dance: Performance degradation on tunneled connection

I’ve experienced a performance degradation after upgrading from 2.2.2 to 2.3.7. See the measurements in the attachment, which were made by a Spark application calling spark.catalog.listTables(). The newer WD is 3 times slower, and the ssh-tunneled connections (see highlighted rows) are affected the most.

image

How much of this slowdown can be eliminated?
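
For anyone who wants to reproduce this kind of measurement, a minimal timing sketch follows. It is not the script used for the attached numbers. The helper name `time_runs` is hypothetical, and the commented usage line assumes an existing SparkSession called `spark` that points at the Waggle Dance endpoint.

```python
import time
from typing import Callable, List


def time_runs(action: Callable[[], object], runs: int = 2) -> List[float]:
    """Call `action` `runs` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


# Assumed usage with a live SparkSession named `spark` (not created here):
#   durations = time_runs(lambda: spark.catalog.listTables(), runs=2)
```

Running the same wrapper against each WD version, once directly and once through the tunnel, gives comparable per-version durations.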

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 17 (16 by maintainers)

Most upvoted comments

@bAndie91, @patduin: I did some investigation and found this:

Test Case 1 - listTables on non-tunneled connection

wd version   run-1   run-2
2.3.0        20 s    19 s
2.3.1        20 s    23 s
2.3.2        25 s    24 s
2.3.3        36 s    32 s
2.3.4        28 s    25 s
2.3.5        44 s    56 s
2.3.6        48 s    46 s

image

Test Case 2 - listTables on tunneled connection

wd version   run-1        run-2
2.3.0        5 m 37 s     4 m 52 s
2.3.1        4 m 59 s     5 m 04 s
2.3.2        5 m 15 s     5 m 11 s
2.3.3        7 m 27 s     7 m 12 s
2.3.4        5 m 14 s     5 m 15 s
2.3.5        13 m 02 s    13 m 27 s
2.3.6        12 m 33 s    13 m 07 s

image

Summary: I ran each test case twice. The durations are fairly consistent between runs, and the main regression appears between the 2.3.4 and 2.3.5 releases: durations increase by ~150% on the tunneled connection and by close to 90% on the non-tunneled one.

IMPORTANT: The performance degradation does not seem to be specific to the tunneled connection; the same trend can be observed in both cases.
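
The size of the 2.3.4 to 2.3.5 jump can be checked directly from the tables above. The sketch below is only an illustration of that arithmetic: the helper names `to_seconds` and `increase_percent` are made up for this example, and it averages the two runs per release, which is an assumption about how to summarise them.

```python
import re
from statistics import mean


def to_seconds(text: str) -> int:
    """Convert a duration such as '5 m 37 s' or '20 s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)\s*m\s*)?(\d+)\s*s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)


# (run-1, run-2) per release, copied from the two tables above.
NON_TUNNELED = {"2.3.4": ("28 s", "25 s"), "2.3.5": ("44 s", "56 s")}
TUNNELED = {"2.3.4": ("5 m 14 s", "5 m 15 s"), "2.3.5": ("13 m 02 s", "13 m 27 s")}


def increase_percent(results, before="2.3.4", after="2.3.5"):
    """Percentage increase of the mean duration from `before` to `after`."""
    old = mean(to_seconds(d) for d in results[before])
    new = mean(to_seconds(d) for d in results[after])
    return (new - old) / old * 100.0


for name, results in (("non-tunneled", NON_TUNNELED), ("tunneled", TUNNELED)):
    print(f"{name}: +{increase_percent(results):.0f}%")
```

Both connection types show a large jump at the same release, which supports the point that the regression is not tunnel-specific; the tunnel mainly amplifies it in absolute terms.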

@bAndie91, @patduin

I re-ran the test cases on a version built from the issue-115 branch and added the results to the charts.

Test Case 1 - listTables on non-tunneled connection

image

Test Case 2 - listTables on tunneled connection

image

I can confirm that the fix resolves the performance degradation issue.

Also, the warning "RetryingMetaStoreClient:184 - MetaStoreClient lost connection. Attempting to reconnect." has disappeared from the logs.
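
A quick way to compare builds on this point is to count how often that warning appears in a log. The snippet below is a minimal sketch: the function name `count_reconnect_warnings` and the log path in the usage comment are hypothetical, and it matches only the message text quoted above rather than assuming any particular log line layout.

```python
from typing import Iterable

# Message text as quoted in the comment above.
RECONNECT_MARKER = "MetaStoreClient lost connection. Attempting to reconnect."


def count_reconnect_warnings(lines: Iterable[str]) -> int:
    """Count the log lines that contain the reconnect warning."""
    return sum(1 for line in lines if RECONNECT_MARKER in line)


# Assumed usage; the file name is a placeholder, not a real Waggle Dance path:
#   with open("waggle-dance.log", encoding="utf-8") as log:
#       print(count_reconnect_warnings(log))
```

A count of zero on the fixed build, against a non-zero count on 2.3.5 or 2.3.6, would match the observation above.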

@patduin All the metastore connections were AVAILABLE during the test runs.

@patduin Sure, will check that branch and get back to you with the results.