fission: Executor unable to respond to router under massive traffic
In PR https://github.com/fission/fission/commit/f3fe3123b3130afdc661cdde648d04e5efae70df , router no longer checks its own cache before talking to executor, which becomes a problem if router is under heavy workload.
It’s dangerous for router to ask executor for the function service URL every time it receives a request from a client. Router should have a mechanism that reduces the number of requests it sends to executor.
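One possible shape for such a mechanism is sketched below: check a local cache first, and when several requests miss for the same function at the same time, let only one of them call the executor while the rest wait for its result. This is a minimal sketch, not Fission's actual router code. `ServiceCache`, `lookupFunc`, `Get`, `Invalidate`, and the example URL are all hypothetical names introduced for illustration, and the `lookup` callback only stands in for the real executor client call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// lookupFunc stands in for the router's call to the executor (hypothetical).
type lookupFunc func(fn string) (string, error)

// call tracks one in-flight executor lookup that other callers can wait on.
type call struct {
	done chan struct{}
	url  string
	err  error
}

// ServiceCache caches function service URLs and coalesces concurrent misses
// so that a burst of requests for one function causes a single lookup.
type ServiceCache struct {
	mu       sync.Mutex
	urls     map[string]string
	inflight map[string]*call
	lookup   lookupFunc
}

func NewServiceCache(lookup lookupFunc) *ServiceCache {
	return &ServiceCache{
		urls:     map[string]string{},
		inflight: map[string]*call{},
		lookup:   lookup,
	}
}

// Get returns the cached URL, joins a lookup already in flight,
// or performs the lookup itself if it is the first caller to miss.
func (c *ServiceCache) Get(fn string) (string, error) {
	c.mu.Lock()
	if u, ok := c.urls[fn]; ok {
		c.mu.Unlock()
		return u, nil
	}
	if cl, ok := c.inflight[fn]; ok {
		c.mu.Unlock()
		<-cl.done // wait for the caller that is already asking the executor
		return cl.url, cl.err
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[fn] = cl
	c.mu.Unlock()

	cl.url, cl.err = c.lookup(fn) // only this caller reaches the executor

	c.mu.Lock()
	delete(c.inflight, fn)
	if cl.err == nil {
		c.urls[fn] = cl.url
	}
	c.mu.Unlock()
	close(cl.done)
	return cl.url, cl.err
}

// Invalidate drops a cached entry, e.g. after a request to the cached URL fails.
func (c *ServiceCache) Invalidate(fn string) {
	c.mu.Lock()
	delete(c.urls, fn)
	c.mu.Unlock()
}

func main() {
	var executorCalls int64
	cache := NewServiceCache(func(fn string) (string, error) {
		atomic.AddInt64(&executorCalls, 1)
		time.Sleep(10 * time.Millisecond) // simulated executor latency
		return "http://" + fn + ".svc.example:8888", nil
	})

	var wg sync.WaitGroup
	for i := 0; i < 200; i++ { // 200 concurrent clients, as with ab -c 200
		wg.Add(1)
		go func() {
			defer wg.Done()
			cache.Get("hw2")
		}()
	}
	wg.Wait()
	fmt.Println("executor calls:", atomic.LoadInt64(&executorCalls))
}
```

The design choice here is that errors are not cached, so a failed lookup is retried by the next request, and `Invalidate` gives the caller a way to force a fresh lookup when a cached address turns out to be stale.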
How to reproduce
- Create two hello-world functions, hw1 & hw2, with executor type newdeploy.
- Call hw2 with ab -n 500000 -c 200 http://<router-ip>/hw2
- Before ab is finished, call hw1
- The call to hw1 fails and router prints the following logs
router-598c9f685d-2s6bg router 2020-11-20T01:13:33.887Z ERROR triggerset.http_trigger_set.tr
router/functionHandler.go:535 error from GetServiceForFunction {"error": "error posting to getting
service for function: context deadline exceeded", "errorVerbose": "context deadline exceeded\nerror
posting to getting service for function\ngithub.com/fission/fission/pkg/executor/client.
(*Client).GetServiceForFunction\n\t/go/src/pkg/executor/client/client.go:78\ngithub.com/fission/fission
/pkg/router.functionHandler.getServiceEntryFromExecutor\n\t/go/src/pkg/router/functionHandler.go:532\ng
ithub.com/fission/fission/pkg/router.
(*RetryingRoundTripper).RoundTrip\n\t/go/src/pkg/router/functionHandler.go:187\nnet/http/httputil.
(*ReverseProxy).ServeHTTP\n\t/usr/local/go/src/net/http/httputil/reverseproxy.go:259\ngithub.com/fissio
n/fission/pkg/router.functionHandler.handler\n\t/go/src/pkg/router/functionHandler.go:424\nnet/http.Han
dlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:1995\ngithub.com/gorilla/mux.
(*Router).ServeHTTP\n\t/go/pkg/mod/github.com/gorilla/mux@v1.7.0/mux.go:212\ngithub.com/fission/fission
/pkg/router.
(*mutableRouter).ServeHTTP\n\t/go/src/pkg/router/mutablemux.go:52\ngo.opencensus.io/plugin/ochttp.
(*Handler).ServeHTTP\n\t/go/pkg/mod/go.opencensus.io@v0.22.0/plugin/ochttp/server.go:86\nnet/http.serve
rHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2774\nnet/http.
(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1878\nruntime.goexit\n\t/usr/local/go/src/runtime
/asm_amd64.s:1337", "error_message": "error posting to getting service for function: context deadline
exceeded", "function": {"kind":"Function","apiVersion":"fission.io/v1","metadata":
{"name":"tracker2","namespace":"default","selfLink":"/apis/fission.io/v1/namespaces/default/functions/t
racker2","uid":"8e7f614b-896c-4696-b872-
88518fb955b8","resourceVersion":"1001184","generation":1,"creationTimestamp":"2020-11-
20T00:48:52Z","managedFields":[{"manager":"fission-
bundle","operation":"Update","apiVersion":"fission.io/v1","time":"2020-11-20T00:48:52Z"}]},"spec":
{"environment":{"namespace":"default","name":"nodebeta4"},"package":{"packageref":
{"namespace":"default","name":"tracker2-713cdf6f-4fa6-425a-a6b4-
e53d4dc0144a","resourceversion":"1001179"}},"secrets":null,"configmaps":
[{"namespace":"default","name":"fissionfunction-tracker2"}],"resources":{"limits":
{"cpu":"100m"},"requests":{"cpu":"1m"}},"InvokeStrategy":{"ExecutionStrategy":
{"ExecutorType":"newdeploy","MinScale":3,"MaxScale":12,"TargetCPUPercent":50,"SpecializationTimeout":20
0},"StrategyType":"execution"},"functionTimeout":5,"idletimeout":120,"concurrency":200}},
"status_code": 500}
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (8 by maintainers)
I tried the scenario mentioned in the issue description with the latest code (upcoming 1.14 release).
Setup
Create a kind cluster locally and two fission functions with newdeploy.
Running load
In the meantime I also made requests to hw1 concurrently, and they were processed successfully. I ran small workloads as,
So we can assume the bottleneck for newdeploy is now a lot lower.
This analysis may not apply to pool-manager, because for pool-manager we still reach out to the executor for each request and don’t utilize the router cache.