flank: Test count and smart sharding count dont match

Describe the bug

A clear and concise description of what the bug is.

Test result shows app module has 349 tests but smart sharding show and calculate with 232.

To Reproduce

Steps to reproduce the behavior:

It happens with out project. i dont have a repo to share.

Expected behavior

A clear and concise description of what you expected to happen.

Test result and smart sharding should have same count.

Details (please complete the following information):

Have you tested on the latest Flank snapshot?

Post the output of flank --version.

Flank 20.08.1 In the repo, there are Ignored, suppressed and parameterized tests. With no orchectrator

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

tl;dr

Flank does not support parameterized tests sharding. Every class with parameterized is considered as one test during shard calculation.


Flank is using DEX parser to decompile apks and gather info about all the tests inside. As for now, Flank is unable to determine how many times a test in parameterized class is invoked. Due to this fact scans apks for any class with an annotation that contains JUnitParamsRunner or Parameterized:

@RunWith(JUnitParamsRunner::class)
...
@RunWith(Parameterized::class)

and consider this class as a single 'test`, which means - ‘put all test from this class into this shard’

Example:
@RunWith(JUnitRunner::class)
class A {
    fun test1() {...}
    fun test2() {...}
    fun test3() {...}
}

@RunWith(Parameterized::class)
class B(param1, param2, ...) {
    fun test1() {...}

    companion object {
        @JvmStatic
        @Parameterized.Parameters
        fun data() = listOf(
                arrayOf(...),
                arrayOf(...),
                arrayOf(...)
        )
    }
}

@RunWith(JUnitRunner::class)
class C {
    fun test1() {...}
    fun test2() {...}
    fun test3() {...}
}

@RunWith(JUnitParamsRunner::class)
class D {
    @Parameters(value = [a,b,c,d]
    fun test1() {...}
}

For apk with classes like above Flank will have a list of tests* to shard: (*of course ‘test’ from flank’s perspective)

  • class A => 3 tests (foo.A#test1, foo.A#test2, foo.A#test3)
  • class B (parameterized) => 1 test (foo.B)
  • class C => 3 test (foo.C#test1, foo.C#test2, foo.C#test3)
  • class D (parameterized) => 1 test (foo.D)

So it gives 3 + 1 + 3 + 1 = 8 and let’s say we want to have 2 shards so flank will calculate shards like

  1. shard_1 = [foo.A#test1, foo.A#test2, foo.A#test3, foo.B]
  2. shard_2 = [foo.C#test1, foo.C#test2, foo.C#test3, foo.D]

and will print (same numbers are used to print cache hit)

8 tests / 2 shards

To get outcome Flank uses FTL API which provides results for all the tests, regardless if it’s parameterized or not. So flank creates outcome for all the tests (A - 3 tests, B - 3 tests, C - 3 tests, D - 4 tests and that all sums up to 13)

┌─────────┬──────────────────────┬──────────────────────┐
│ OUTCOME │      MATRIX ID       │     TEST DETAILS     │
├─────────┼──────────────────────┼──────────────────────┤
│ success │ matrix-funnyid       │ 13 test cases passed │
└─────────┴──────────────────────┴──────────────────────┘

Conclusions:

  1. add sharding support for parameterized tests (It’s a dragon! We could create an issue to make a research at least) And I think it’s impossible to do (but I might be wrong)
  2. flank could explicitly inform a user that there are parameterized test detected and test per shard number may not be valid
  3. inform differently?

@bootstraponline @jan-gogo @Sloox @piotradamczyk5 @adamfilipow92 I’d like to get your opinion

default time for class can be 2x test time = 240s

default time for classes makes sense to me, as I expect this might be different compared to individual test methods.

0 tests + 12 parameterized classes / 9 shards 50 tests + 0 parameterized classes / 9 shards

I think if there’s a 0 we should skip it.

12 parameterized classes / 9 shards 50 tests / 9 shards

Flank knows exactly how many tests and classes are being sent to firebase test lab, maybe we could do something like this:

50 tests + 12 parameterized classes / 9 shards

I like it

  • add sharding support for parameterized tests (it’s a dragon! we could create an issue to make a research at least)

I think we already researched this and determined that parameterized tests can only be run by class?

  • flank could explicitly inform a user that there are parameterized test detected and test per shard number may not be valid

It might be more accurate to say classes per shard when detecting parameterized tests?

If there are no parameterized tests then it’s true to say we are doing per method sharding. If there are parameterized tests, then we do a hybrid of class sharding + per test sharding.

Here is obfuscated android_shards.json and flank.yml

flank.zip

We have a parameter --default-test-time so in a case where we have set this param I see three potential solutions

We probably want a --default-class-time, which by default can equal the default test time. Then users can independently set default times for parameterized classes vs individual test methods.

Screenshot 2020-08-27 at 06 51 04 Screenshot 2020-08-27 at 07 20 21

How about the case when either class count or test count equals 0? Do we want to print it that way:

0 tests + 12 parameterized classes / 9 shards
50 tests + 0 parameterized classes / 9 shards

or should we skip it? Or skip it only for the class count?

Flank knows exactly how many tests and classes are being sent to firebase test lab, maybe we could do something like this:

50 tests + 12 parameterized classes / 9 shards

And what is actually bizarre is we should have 542 tests 😄