prometheus-msteams: Chunk alerts when the request body is too large.
I found a bug wherein 413 “too large” errors from MS Teams are not being caught properly.
Looking at the logs will show that everything was okay, and that the HTTP response code was 200:
time="2018-12-07T01:33:32Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
But the card doesn’t show up in MS Teams.
I took the request body from the logs and sent it with curl. This is what happens (verbose mode):
> Content-Type: application/json
> Content-Length: 19609
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* We are completely uploaded and fine
< HTTP/2 200
< cache-control: no-cache
< pragma: no-cache
< content-length: 185
< content-type: text/plain; charset=utf-8
< expires: -1
< request-id: 9243cf0d-6de1-4a2e-9ebe-2f0647b3ceb6
< x-calculatedfetarget: ME2PR01CU007.internal.outlook.com
< x-backendhttpstatus: 200
< x-feproxyinfo: ME2PR01CA0157.AUSPRD01.PROD.OUTLOOK.COM
< x-calculatedfetarget: SY2PR01CU001.internal.outlook.com
< x-backendhttpstatus: 200
< x-feproxyinfo: SY2PR01CA0011.AUSPRD01.PROD.OUTLOOK.COM
< x-calculatedbetarget: SYAPR01MB2863.ausprd01.prod.outlook.com
< x-backendhttpstatus: 200
< x-aspnet-version: 4.0.30319
< x-cafeserver: SY2PR01CA0011.AUSPRD01.PROD.OUTLOOK.COM
< x-beserver: SYAPR01MB2863
< x-rum-validated: 1
< x-feserver: SY2PR01CA0011
< x-feserver: ME2PR01CA0157
< x-powered-by: ASP.NET
< x-feserver: HK0PR03CA0043
< x-msedge-ref: Ref A: 8B559015A2E546309D7219160649965E Ref B: HK2EDGE1006 Ref C: 2018-12-07T01:34:28Z
< date: Fri, 07 Dec 2018 01:34:27 GMT
<
* Connection #0 to host outlook.office.com left intact
Webhook message delivery failed with error: Microsoft Teams endpoint returned HTTP error 413 with ContextId tcid=8241112485782354073,server=SG2PEPF00000467,cv=oRcpKL9MdU6iz4isFYTR6A.0..
So, infuriatingly, the HTTP response code is 200, but the real result is a 413: too many alerts were sent at once, and the error is reported only in the response body.
(It’s because the endpoint is a proxy for the actual MS Teams endpoint behind it, and the inner endpoint is the one giving the 413 HTTP Code, but that’s not important right now.)
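A minimal sketch of how a sender could catch this case, assuming a successful call answers with the body `1` (as the "response text: 1" log lines later in this thread suggest) and that failures embed the real status as "HTTP error NNN". The helper names `embeddedHTTPStatus` and `checkTeamsResponse` are hypothetical, not taken from the project:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// embeddedStatus matches the status code inside a body such as
// "... Microsoft Teams endpoint returned HTTP error 413 with ContextId ...".
var embeddedStatus = regexp.MustCompile(`HTTP error (\d{3})`)

// embeddedHTTPStatus (hypothetical helper) extracts the status code that the
// inner endpoint reported in the response body, if there is one.
func embeddedHTTPStatus(body string) (int, bool) {
	m := embeddedStatus.FindStringSubmatch(body)
	if m == nil {
		return 0, false
	}
	code, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return code, true
}

// checkTeamsResponse (hypothetical helper) returns an error unless the outer
// status is 200 AND the body looks like a success reply.
// Assumption: a successful webhook call answers with the body "1".
func checkTeamsResponse(statusCode int, body string) error {
	if statusCode != 200 {
		return fmt.Errorf("teams webhook returned HTTP %d", statusCode)
	}
	trimmed := strings.TrimSpace(body)
	if trimmed == "1" {
		return nil
	}
	if code, ok := embeddedHTTPStatus(trimmed); ok {
		return fmt.Errorf("teams webhook returned HTTP 200, but the body reports HTTP error %d", code)
	}
	return fmt.Errorf("unexpected teams webhook response body: %q", trimmed)
}

func main() {
	failure := "Webhook message delivery failed with error: " +
		"Microsoft Teams endpoint returned HTTP error 413 with ContextId ..."
	fmt.Println(checkTeamsResponse(200, "1"))
	fmt.Println(checkTeamsResponse(200, failure))
}
```

Treating any body other than `1` as a failure is deliberately strict: it would also flag error bodies that do not follow the "HTTP error NNN" wording.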
We need a way to set a maximum size for each call to the webhook, and to send successive calls when the alerts don’t all fit in a single call.
About this issue
- State: closed
- Created 6 years ago
- Comments: 25
Hi folks, this is now available in the v1.0.3 release. Thanks for the help @LudwigTirazona @shabeermm and @Knappek for the code enhancement!
Cool, when I find some time, I can implement it and send a PR.
I was checking whether Slack has similar limits, and indeed it does: https://api.slack.com/changelog/2018-04-truncating-really-long-messages. The difference from MS Teams is that Slack handles long messages automatically, which is what this enhancement proposes by splitting messages.
So I think this is a good idea, but I would not do it the way @shabeermm suggested. I would rather check the size of the message in the code and, if it exceeds the size limit (25 KB according to the official documentation), split the message and send the parts separately.
What do you think?
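The size-based split described above could look roughly like this. It is a sketch under stated assumptions, not the code that was merged: the `Alert` type and `splitBySize` are hypothetical, and the size is measured on a plain JSON array of alerts rather than on the final Teams card.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Alert is a hypothetical, trimmed-down alert; a real payload has more fields.
type Alert struct {
	Name    string `json:"name"`
	Summary string `json:"summary"`
}

// splitBySize (hypothetical helper) groups alerts into batches whose JSON
// encoding stays within maxBytes. An alert that is too large on its own still
// gets a batch of its own, so nothing is dropped silently.
func splitBySize(alerts []Alert, maxBytes int) ([][]Alert, error) {
	var batches [][]Alert
	var current []Alert
	for _, a := range alerts {
		candidate := append(current, a)
		encoded, err := json.Marshal(candidate)
		if err != nil {
			return nil, err
		}
		if len(encoded) > maxBytes && len(current) > 0 {
			// Adding this alert would exceed the limit: close the batch
			// and start a new one with the current alert.
			batches = append(batches, current)
			current = []Alert{a}
			continue
		}
		current = candidate
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches, nil
}

func main() {
	alerts := make([]Alert, 5)
	for i := range alerts {
		alerts[i] = Alert{Name: "a", Summary: "x"}
	}
	// Tiny limit for the demo; the thread cites 25 KB as the real limit.
	const maxBytes = 60
	batches, err := splitBySize(alerts, maxBytes)
	if err != nil {
		panic(err)
	}
	for i, b := range batches {
		fmt.Printf("batch %d: %d alerts\n", i+1, len(b))
	}
}
```

Re-encoding the candidate batch on every step is quadratic in the worst case, which is acceptable for a sketch; tracking a running byte count would be cheaper.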
Thank you very much, knappek & bzon, for enhancing the code to fix this. We have tested it in our dev environment and it is working fine. Since I was on vacation, I could not test it earlier. Thanks to Ludwig and others for your contributions as well.
Logs follow:
…
time="2019-02-08T05:06:40Z" level=debug msg="Size of message is 153969 Bytes (~150 KB)"
time="2019-02-08T05:06:40Z" level=info msg="Sending out 12 messages …"
time="2019-02-08T05:06:40Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:40Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:41Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:41Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:41Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:41Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:42Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:42Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:42Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:42Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:42Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:42Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:43Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:43Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:43Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:43Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:43Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:43Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:44Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:44Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:44Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:44Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
time="2019-02-08T05:06:44Z" level=info msg="Microsoft Teams response text: 1"
time="2019-02-08T05:06:44Z" level=info msg="A card was successfully sent to Microsoft Teams Channel. Got http status: 200 OK"
…
Please find the attached log file from my dev system, which has the request and response along with the error returned by MS Teams (I have renamed/masked the environment-specific details for security). Also find attached the corresponding large JSON data, along with the curl command that reproduces the error.
I think the data can be split based on the number of alerts in a group, which should be configurable. You could provide an external variable that users set either in a config file or as an argument (e.g. ALERTS_COUNT_IN_A_CHUNK=10). Users could increase or decrease this count based on the size of alerts in their environments. You could also generate and append a chunk ID when chunking, which would help users search for chunked alerts. Just a suggestion; you and others may have even better ideas.
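For illustration, the count-based variant with a chunk ID might be sketched as follows. The `Chunk` type, `chunkByCount`, `alertsPerChunk`, and the "n/total" ID format are all assumptions made for this example; only the variable name ALERTS_COUNT_IN_A_CHUNK comes from the suggestion above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Chunk is a hypothetical container; ID is a label such as "2/3" that could
// be appended to each card so chunked alerts are easy to search for.
type Chunk struct {
	ID     string
	Alerts []string
}

// chunkByCount (hypothetical helper) splits alerts into chunks of at most
// perChunk entries and labels each chunk with its position.
func chunkByCount(alerts []string, perChunk int) []Chunk {
	if perChunk < 1 {
		perChunk = 1
	}
	total := (len(alerts) + perChunk - 1) / perChunk
	chunks := make([]Chunk, 0, total)
	for i := 0; i < len(alerts); i += perChunk {
		end := i + perChunk
		if end > len(alerts) {
			end = len(alerts)
		}
		chunks = append(chunks, Chunk{
			ID:     fmt.Sprintf("%d/%d", len(chunks)+1, total),
			Alerts: alerts[i:end],
		})
	}
	return chunks
}

// alertsPerChunk reads ALERTS_COUNT_IN_A_CHUNK from the environment and
// falls back to def when the variable is unset or invalid.
func alertsPerChunk(def int) int {
	n, err := strconv.Atoi(os.Getenv("ALERTS_COUNT_IN_A_CHUNK"))
	if err != nil || n < 1 {
		return def
	}
	return n
}

func main() {
	alerts := []string{"a1", "a2", "a3", "a4", "a5"}
	for _, c := range chunkByCount(alerts, alertsPerChunk(10)) {
		fmt.Println(c.ID, len(c.Alerts))
	}
}
```

A fixed count is simpler than measuring bytes, but it only keeps requests under the limit if alerts are roughly uniform in size, which is why the count needs to be tunable per environment.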
json-data–to-reproduce-error-14Jan2019-001.txt error-log-with-json-data-14Jan2019-001.txt
If you’re reopening this, then I’d love to help with generating the data needed.
I can provide it in a day or two.