serving: Address istio mesh e2e failures

/area test-and-release /area networking /kind cleanup /kind good-first-issue

Expected Behavior

e2e tests pass when run with istio, the local gateway disabled (mesh only), and mTLS enabled (STRICT). At a minimum, we should skip the tests that we know will fail when strict mTLS is enabled.

This also affects DomainMapping but that’s being tracked by https://github.com/knative-sandbox/net-istio/issues/562.

Actual Behavior

The following tests will always fail since we have global mTLS turned on (STRICT):

test/e2e: TestSvcToSvcViaActivator/both-disabled
test/e2e: TestSvcToSvcViaActivator/a-disabled
test/e2e: TestCallToPublicService/local_address 
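
One way to skip the cases listed above is a small guard that each affected test consults before running. The following is a minimal sketch, not the suite's actual mechanism: `meshConfig`, `knownStrictMTLSFailures`, and `shouldSkip` are hypothetical names, and how the suite detects mesh-only mode or the mTLS mode is left as an assumption.

```go
package main

import "fmt"

// meshConfig is a hypothetical description of the environment the e2e
// suite runs in; the real suite may expose this differently.
type meshConfig struct {
	MeshOnly   bool // local gateway disabled, traffic goes through sidecars only
	StrictMTLS bool // global PeerAuthentication mode is STRICT
}

// knownStrictMTLSFailures holds the test names reported in this issue as
// always failing when global mTLS is STRICT.
var knownStrictMTLSFailures = []string{
	"TestSvcToSvcViaActivator/both-disabled",
	"TestSvcToSvcViaActivator/a-disabled",
	"TestCallToPublicService/local_address",
}

// shouldSkip reports whether testName is a known failure under the given
// configuration. It only returns true for mesh-only runs with strict mTLS.
func shouldSkip(cfg meshConfig, testName string) bool {
	if !cfg.MeshOnly || !cfg.StrictMTLS {
		return false
	}
	for _, name := range knownStrictMTLSFailures {
		if name == testName {
			return true
		}
	}
	return false
}

func main() {
	cfg := meshConfig{MeshOnly: true, StrictMTLS: true}
	for _, name := range []string{
		"TestSvcToSvcViaActivator/a-disabled",
		"TestSvcToSvcViaActivator/both-enabled", // hypothetical name, not from the issue
	} {
		fmt.Printf("%s skip=%t\n", name, shouldSkip(cfg, name))
	}
}
```

Inside a real test, the guard would be used as `if shouldSkip(cfg, t.Name()) { t.Skip("known failure with strict mTLS in mesh mode") }`, since `t.Name()` returns the `Parent/sub` form used in the list.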

Steps to Reproduce the Problem

  1. Ensure your net-istio latest/stable mesh has the right PeerAuthentication so that global mTLS is on, i.e. https://github.com/knative-sandbox/net-istio/pull/617.
  2. Once https://github.com/knative/serving/pull/11175 merges, open a PR with a no-op change and run:
/test pull-knative-serving-istio-latest-mesh
/test pull-knative-serving-istio-stable-mesh

Additional Info

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 25 (22 by maintainers)

Most upvoted comments

So we will proceed with this issue in the following steps:

  1. Do not enable local-gateway.mesh in e2e (remove it from net-istio).
  2. Consider deprecating the local-gateway.mesh option (open an issue or send mail to knative-user@). The deprecation was mentioned in https://github.com/knative-sandbox/net-istio/issues/562#issuecomment-809934621, and we should deprecate it since we no longer test it.
  3. Add a new test to verify that svc-to-svc traffic in mesh mode does not use the local gateway.

I will send the PRs, so please help with the reviews 🙏

Now the mesh tests fail only on scale-200. The HA test and TestHPAAutoscaleUpDownUp also fail sometimes, but that is caused by the scale-200 test as well 😓 (it can be fixed by tweaking the test order, as in https://github.com/knative/serving/pull/11552).

Should we close this issue? Or keep it open until the scale-200 test is fixed?

Thanks all, sorry for dropping the ball on the previous PR (got busy with other things and then went on vacation for 2 weeks). Sounds like we’re going with a different solution, so I’ll assign this issue to kenjiro.

/assign @nak3