fission: Executor unable to respond to router under massive traffic

In PR https://github.com/fission/fission/commit/f3fe3123b3130afdc661cdde648d04e5efae70df, the router no longer checks its own cache before talking to the executor, which becomes a problem when the router is under a heavy workload.

It is dangerous for the router to ask the executor for a function's service URL every time it receives a request from a client. The router should have a mechanism to reduce the number of requests it sends to the executor, as sketched below.
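For illustration only, here is a minimal sketch of what such a mechanism could look like (this is not the actual fission router code; the executorClient interface, the getServiceFromExecutor method, and the plain map cache without TTL or eviction are hypothetical). The idea is to check a router-local cache first and coalesce concurrent cache misses for the same function into a single executor call, e.g. with golang.org/x/sync/singleflight, so a traffic burst costs at most one in-flight executor round trip per function:

package routercache

import (
	"context"
	"sync"

	"golang.org/x/sync/singleflight"
)

// executorClient stands in for the real executor client; hypothetical here.
type executorClient interface {
	getServiceFromExecutor(ctx context.Context, fnKey string) (string, error)
}

type serviceResolver struct {
	executor executorClient
	group    singleflight.Group // collapses duplicate in-flight lookups
	mu       sync.RWMutex
	cache    map[string]string // fnKey -> service URL (TTL/eviction omitted)
}

func newServiceResolver(ec executorClient) *serviceResolver {
	return &serviceResolver{executor: ec, cache: map[string]string{}}
}

// resolve serves repeat requests from the local cache and lets only one
// goroutine per function key talk to the executor on a cache miss.
func (r *serviceResolver) resolve(ctx context.Context, fnKey string) (string, error) {
	r.mu.RLock()
	url, ok := r.cache[fnKey]
	r.mu.RUnlock()
	if ok {
		return url, nil
	}

	v, err, _ := r.group.Do(fnKey, func() (interface{}, error) {
		u, err := r.executor.getServiceFromExecutor(ctx, fnKey)
		if err != nil {
			return nil, err
		}
		r.mu.Lock()
		r.cache[fnKey] = u
		r.mu.Unlock()
		return u, nil
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}

A real mechanism would also need cache invalidation when the backing service goes away (for example, when the function's pods are recycled), otherwise the router keeps routing to dead addresses.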

How to reproduce

  1. Create two hello-world functions, hw1 and hw2, with executor type newdeploy
  2. Call hw2 with ab -n 500000 -c 200 http://<router-ip>/hw2
  3. Before ab finishes, call hw1
  4. The call to hw1 fails and the router prints the following log:
router-598c9f685d-2s6bg router 2020-11-20T01:13:33.887Z	ERROR	triggerset.http_trigger_set.tr	router/functionHandler.go:535	error from GetServiceForFunction
{"error": "error posting to getting service for function: context deadline exceeded",
 "errorVerbose": "context deadline exceeded\n
  error posting to getting service for function\n
  github.com/fission/fission/pkg/executor/client.(*Client).GetServiceForFunction\n\t/go/src/pkg/executor/client/client.go:78\n
  github.com/fission/fission/pkg/router.functionHandler.getServiceEntryFromExecutor\n\t/go/src/pkg/router/functionHandler.go:532\n
  github.com/fission/fission/pkg/router.(*RetryingRoundTripper).RoundTrip\n\t/go/src/pkg/router/functionHandler.go:187\n
  net/http/httputil.(*ReverseProxy).ServeHTTP\n\t/usr/local/go/src/net/http/httputil/reverseproxy.go:259\n
  github.com/fission/fission/pkg/router.functionHandler.handler\n\t/go/src/pkg/router/functionHandler.go:424\n
  net/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:1995\n
  github.com/gorilla/mux.(*Router).ServeHTTP\n\t/go/pkg/mod/github.com/gorilla/mux@v1.7.0/mux.go:212\n
  github.com/fission/fission/pkg/router.(*mutableRouter).ServeHTTP\n\t/go/src/pkg/router/mutablemux.go:52\n
  go.opencensus.io/plugin/ochttp.(*Handler).ServeHTTP\n\t/go/pkg/mod/go.opencensus.io@v0.22.0/plugin/ochttp/server.go:86\n
  net/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2774\n
  net/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1878\n
  runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337",
 "error_message": "error posting to getting service for function: context deadline exceeded",
 "function": {"kind":"Function","apiVersion":"fission.io/v1",
  "metadata":{"name":"tracker2","namespace":"default","selfLink":"/apis/fission.io/v1/namespaces/default/functions/tracker2","uid":"8e7f614b-896c-4696-b872-88518fb955b8","resourceVersion":"1001184","generation":1,"creationTimestamp":"2020-11-20T00:48:52Z","managedFields":[{"manager":"fission-bundle","operation":"Update","apiVersion":"fission.io/v1","time":"2020-11-20T00:48:52Z"}]},
  "spec":{"environment":{"namespace":"default","name":"nodebeta4"},"package":{"packageref":{"namespace":"default","name":"tracker2-713cdf6f-4fa6-425a-a6b4-e53d4dc0144a","resourceversion":"1001179"}},"secrets":null,"configmaps":[{"namespace":"default","name":"fissionfunction-tracker2"}],"resources":{"limits":{"cpu":"100m"},"requests":{"cpu":"1m"}},"InvokeStrategy":{"ExecutionStrategy":{"ExecutorType":"newdeploy","MinScale":3,"MaxScale":12,"TargetCPUPercent":50,"SpecializationTimeout":200},"StrategyType":"execution"},"functionTimeout":5,"idletimeout":120,"concurrency":200}},
 "status_code": 500}

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I tried the scenario mentioned in the issue description with the latest code (upcoming 1.14 release).

Setup

Create a kind cluster locally and two Fission functions with the newdeploy executor.

fission env create --name nenv1 --image fission/node-env --poolsize 1
fission env update --name nenv1 --poolsize 0
fission env create --name nenv2 --image fission/node-env --poolsize 1
fission env update --name nenv2 --poolsize 0
fission fn create --name hw1 --env nenv1 --code examples/nodejs/hello.js --executortype newdeploy
fission fn create --name hw2 --env nenv2 --code examples/nodejs/hello.js --executortype newdeploy
fission route create --name hw1 --function hw1 --url /hw1
fission route create --name hw2 --function hw2 --url /hw2

Running load

$ ab -n 500000 -c 200 http://localhost:8888/hw2
Finished 500000 requests
Concurrency Level:      200
Time taken for tests:   2667.757 seconds
Complete requests:      500000
Failed requests:        0
Total transferred:      97000000 bytes
HTML transferred:       7000000 bytes
Requests per second:    187.42 [#/sec] (mean)
Time per request:       1067.103 [ms] (mean)
Time per request:       5.336 [ms] (mean, across all concurrent requests)
Transfer rate:          35.51 [Kbytes/sec] received

In the meantime I also made concurrent requests to hw1, and those requests were processed successfully. I ran a smaller workload:

$ ab -n 100  -c 10  http://localhost:8888/hw1
Concurrency Level:      10
Time taken for tests:   10.739 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      19400 bytes
HTML transferred:       1400 bytes
Requests per second:    9.31 [#/sec] (mean)
Time per request:       1073.850 [ms] (mean)
Time per request:       107.385 [ms] (mean, across all concurrent requests)
Transfer rate:          1.76 [Kbytes/sec] received

So we can assume the bottleneck for newdeploy is now a lot lower.

This analysis may not apply to poolmanager, since with poolmanager we still reach out to the executor for each request and do not utilize the router cache.
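To make that difference concrete, here is a simplified extension of the serviceResolver sketch from earlier (hypothetical names, not the actual fission router code; it assumes an extra "fmt" import in that sketch): newdeploy lookups can be answered from the router-local cache, while poolmgr lookups still hit the executor on every request, so for that executor type the executor remains the bottleneck under a traffic burst:

// serviceURL picks the lookup path per executor type.
func (r *serviceResolver) serviceURL(ctx context.Context, executorType, fnKey string) (string, error) {
	switch executorType {
	case "newdeploy":
		// Repeat requests are served from the router cache via resolve().
		return r.resolve(ctx, fnKey)
	case "poolmgr":
		// No router-side caching: every request is an executor round trip.
		return r.executor.getServiceFromExecutor(ctx, fnKey)
	default:
		return "", fmt.Errorf("unknown executor type %q", executorType)
	}
}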