tfjs: model inference causes unresponsiveness and even system crash when using `webgl` backend

model inference causes unresponsiveness and even a system crash when using the webgl backend,
while running perfectly fine using tfjs-node

using webgl in browser

gpu memory usage rises to >4GB although the model is not that heavy at all
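
a minimal way to check whether tensors are leaking between frames - `tf.memory()` counters should stay flat; the model and input here are placeholders, not taken from the reproduction repo:

```ts
import * as tf from '@tensorflow/tfjs';

// runFrame and its arguments are illustrative, not from the reproduction repo
function runFrame(model: tf.GraphModel, input: tf.Tensor4D): tf.Tensor {
  // tf.tidy disposes every intermediate tensor created inside the callback,
  // so only the returned output survives the call
  const output = tf.tidy(() => model.execute(input) as tf.Tensor);
  // numTensors and numBytes should stay constant frame to frame;
  // steady growth here means something is not being disposed
  const { numTensors, numBytes } = tf.memory();
  console.log({ numTensors, numBytes });
  return output;
}
```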

measured inference time is below 40 ms, but actual wall time between frames is closer to 3,000 ms!
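
worth noting that on the webgl backend `execute()` only enqueues gpu work, so timing around the call measures scheduling rather than completion; a sketch of measuring both (assuming a graph model without control-flow ops, so the synchronous `execute()` applies):

```ts
import * as tf from '@tensorflow/tfjs';

async function timeFrame(model: tf.GraphModel, input: tf.Tensor): Promise<void> {
  const t0 = performance.now();
  const output = model.execute(input) as tf.Tensor;
  const t1 = performance.now(); // on webgl this only measures kernel scheduling

  await output.data(); // resolves only once the gpu has actually finished
  const t2 = performance.now();

  console.log(`scheduled: ${(t1 - t0).toFixed(1)} ms, full frame: ${(t2 - t0).toFixed(1)} ms`);
  output.dispose();
}
```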

during that time the browser is completely unresponsive (not just the active tab)
and overall system responsiveness is reduced

and after just a few frames it results in either a browser crash or a webgl error logged in the console

it even resulted in a system crash - a BSOD with stop code VIDEO_SCHEDULER_INTERNAL_ERROR!

yes, it's client-side code that can result in a system crash - it doesn't get much worse than that

it's almost as if some webgl call is causing something really bad to happen between the browser and the gpu drivers

using tfjs-node-gpu in node

works without any problems
low memory usage and inference time below 30ms

even a round-trip workflow (browser client talking via websockets to a server that does the processing) results in a pretty good frame rate and no overall issues
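
for reference, a rough shape of that round-trip server, assuming `ws` for the websocket layer and a `file://` model path (both illustrative; any normalization the model expects is omitted, and frames are assumed to arrive already at 720x720):

```ts
import * as tf from '@tensorflow/tfjs-node-gpu';
import { WebSocketServer } from 'ws';

// model path, port, and wire format are placeholders
const model = await tf.loadGraphModel('file://model/model.json');

new WebSocketServer({ port: 8080 }).on('connection', (ws) => {
  ws.on('message', async (frame: Buffer) => {
    // decode an encoded frame into a [1, 720, 720, 3] float input
    const output = tf.tidy(() => {
      const input = tf.node.decodeImage(frame, 3).toFloat().expandDims(0);
      return model.execute(input) as tf.Tensor;
    });
    ws.send(await output.data()); // raw output pixels back to the browser
    output.dispose();
  });
});
```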

model

the model itself is a simple tfjs graph model with 8.8MB of weights
it takes a 720x720 image as input and produces a 720x720 image as output
converted from a tf saved model, with the original at https://systemerrorwang.github.io/White-box-Cartoonization/
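
the browser-side usage is roughly this shape (the model url and the [0, 1] output range are assumptions, not confirmed from the repo):

```ts
import * as tf from '@tensorflow/tfjs';

// placeholder url for the converted model.json
const model = await tf.loadGraphModel('/model/model.json');

async function cartoonize(video: HTMLVideoElement, canvas: HTMLCanvasElement) {
  const output = tf.tidy(() => {
    // grab a frame and force it to the model's 720x720 input size
    const input = tf.image
      .resizeBilinear(tf.browser.fromPixels(video), [720, 720])
      .toFloat()
      .expandDims(0);
    // drop the batch dim and clip to [0, 1] so toPixels can render it
    return (model.execute(input) as tf.Tensor).squeeze().clipByValue(0, 1) as tf.Tensor3D;
  });
  await tf.browser.toPixels(output, canvas);
  output.dispose();
}
```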

reproduction

full reproduction is available at https://github.com/vladmandic/anime

environment

  • tfjs 3.19.0
  • chrome 103.0.1264.71
  • windows 11 build 22621.436
  • nvidia drivers 516.59

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

@pyu10055 confirmed!

actually, it's not 10x, it's closer to 20x on my system, plus no crashes
great job with #6639

setInterval would cause a crash due to overlapping inference requests piling up, but why would setTimeout cause any problems?
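
the key difference: setInterval keeps firing on schedule whether or not the previous frame has finished, while a setTimeout re-armed only after completion can never have more than one inference in flight (`runFrame` below is a stand-in for the actual inference call):

```ts
declare function runFrame(): Promise<void>; // stand-in for the actual inference call

// setInterval fires every N ms no matter what; if a frame takes longer than N,
// async inference calls pile up and each one holds gpu memory until it resolves
setInterval(() => { void runFrame(); }, 33); // requests can overlap

// re-arming setTimeout only after the previous frame fully completes
// guarantees at most one inference in flight at any time
async function loop(): Promise<void> {
  await runFrame();
  setTimeout(() => void loop(), 0);
}
void loop();
```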