shaka-player: [ViacomCBS] Samsung fatal stall on preroll to first content period.

Have you read the FAQ and checked for duplicate open issues? Yes completely.

What version of Shaka Player are you using? Tested in Shaka 2.5.12, 2.5.11, 2.5.5, and 2.5.1. Same results on all.

Can you reproduce the issue with our latest release version? Yes

Can you reproduce the issue with the latest code from master? Yes

Are you using the demo app or your own custom app? Demo app, raw simple setup. Page URLs are included in the support email, along with the content and LA (license) URLs.

If custom app, can you reproduce the issue using our demo app? Yes

What browser and OS are you using? Our team has only seen and reproduced this issue on SOME Samsung Tizen-based smart TVs, mainly 2017 and 2018 models. 2019 and 2020 models seem unaffected. Some 2017 and 2018 models also work as expected!

For embedded devices (smart TVs, etc.), what model and firmware version are you using? Model numbers, and whether each fails or passes the test:

  • 2017 UN50MU6070 Fails 100%
  • 2018 UN55NU710D Fails 100%
  • 2018 UN32N5300AF PASSES
  • 2018 UN49NU8000 PASSES

What are the manifest and license server URIs? Sent over on shaka-player-issues@google.com

What did you do?

  • Play from start time 0 and let the preroll finish. On the first transition to a content period, the player stalls.
  • When a Samsung model year has the issue, it happens 100% of the time.
  • It does not happen with the exact same encode of the stream when it is not passed through DAI.
  • It does not happen with the exact same encode passed through DAI but without DRM.
  • In the logs, the DRM key system appears to initialize fine even when playback fails. I tested both Widevine (WV) and PlayReady (PR) here.
  • Tests with a non-zero start time are interesting.
    • If you skip the preroll and start past that point in the stream, all mid-roll DAI period transitions thereafter work as expected.
    • In that case, the stream is fine on all TVs we tested.
  • Dash.js does play the same DAI/DRM stream without issue on all the tested devices.
    • Shaka offers a much better overall quality of experience, so it is preferred on Samsung.

I am also including logs in the support email with the content. These logs are from both failing and passing captures in a few of the Shaka versions mentioned above.

What did you expect to happen? The first content period initializes and plays out.

What actually happened? The content period fails to init and play properly.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 53 (51 by maintainers)

Most upvoted comments

The fix is out in the master branch. Please test in your applications!

These fixes can be backported to v3.0.x and potentially to v2.5.x as well.

Notes from the day’s investigations. TL;DR: Good news, everyone! I may have a workaround!


I changed the test as follows:

  1. Play 30 seconds of unencrypted, single-period DASH content with no ads
  2. Seek forward to 15 seconds past the end of the buffered range (rather than close to the end, to keep the numbers simple) and round up to a segment boundary (since the platform seems to jump backward to a keyframe anyway)
  3. Don’t clear the buffer on an unbuffered seek, and don’t reset StreamingEngine internal state
  4. Make the stall detector sit around for 8 seconds before trying to seek (2 segment durations)
  5. Disable audio (to simplify buffering/streaming/decoding)

Playback still stalls out after the seek.

We buffer up to time 44 while playing to time 31. Then we seek to time 60.
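The seek target of 60 follows from the test setup: 15 seconds past the buffered end (44), rounded up to a segment boundary. A minimal sketch of that arithmetic, assuming 4-second segments (inferred from the 8-second stall delay being described as two segment durations); the function name is hypothetical:

```javascript
// Hypothetical helper: pick a seek target past the buffered range,
// rounded up to the next segment boundary.
function seekTargetPastBuffer(bufferedEnd, offsetSeconds, segmentDuration) {
  const raw = bufferedEnd + offsetSeconds;
  return Math.ceil(raw / segmentDuration) * segmentDuration;
}

// Buffered to 44 s, seek 15 s past that, 4 s segments (assumed).
console.log(seekTargetPastBuffer(44, 15, 4)); // 60
```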

Since we didn’t clear the buffer or reset StreamingEngine state, it kept streaming linearly. Effectively, the playhead seeked, but not StreamingEngine. This means streaming and buffering continued without interruption from 0 to 72, while the stall occurred at time 60. It remained stalled out for 20 seconds before the test timed out.

Looking for possible recovery steps that could shed light on the hang, I added one more step to the test. After the stall, I seeked back to time 24, a segment boundary and a time that not only was buffered, but had already been played. It remained stalled out at 24 for 20 seconds before the test timed out.

I noticed that the timing of the “appended media segment” messages in the logs speeds up significantly after the seek. Not sure what that means. Maybe they aren’t being decoded because they are behind the playhead?

Next I tried recovering by calling pause and play, with 1 second delays before each of them. This worked!

Next I tried making the test call pause & play immediately after seeking, rather than waiting for the stall/timeout, still with 1-second delays. This worked, too!

Next I tried calling play only, not pause, 1 second after seeking. This doesn’t work, which makes some kind of sense, given that the state of the video element was paused: false at that time.

Sequence                           Result
seek, delay, pause, delay, play    OK
seek, delay, play                  FAIL
seek, pause, delay, play           OK
seek, pause, play                  FAIL

So the delay seems to be important. It’s not yet clear how large a delay is necessary between pause and play. (There may be some state we have to poll before calling play.) I also have lots of other hacks in place, which now need to be unwound one by one to determine whether any of them are helping, or whether pause/play is doing the job all on its own. Finally, if this turns out to be the big winner, it’s not clear when this workaround should be applied. (On seek, on stall, just on Tizen, on all TVs, on all platforms, etc.)

Calling video.pause(), then polling state on video.paused to call video.play()… sadly doesn’t work. After calling video.pause(), video.paused is immediately true, which follows from the HTML video spec. But some hidden state deeper within Tizen does not seem to update immediately, so some hard-coded delay will be necessary. More work will be required to determine what that is.

Delay    Result
1s       OK
0s       FAIL
500ms    FAIL
800ms    OK
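For reference, the "pause, delay, play" sequence from these experiments can be sketched roughly as follows. This is an illustration only, not the code used in the test: the function name and the injectable `schedule` parameter are hypothetical, and `video` is assumed to be anything exposing `pause()` and `play()`, such as an HTMLVideoElement.

```javascript
// Hypothetical sketch of the "pause, delay, play" workaround.
// `schedule` defaults to a setTimeout wrapper; it is a parameter only so
// the sequence can be exercised without real timers.
function pauseDelayPlay(video, delayMs,
    schedule = (fn, ms) => setTimeout(fn, ms)) {
  video.pause();
  // video.paused becomes true immediately after pause(), so polling it
  // does not tell us when the platform is ready. Use a fixed delay.
  schedule(() => {
    video.play();
  }, delayMs);
}
```

A call such as `pauseDelayPlay(video, 1000)` corresponds to the 1s row. The delay values are the ones tried here, not a recommended constant.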

After this, I noticed that one difference between the successful 800ms and the failing 500ms was the timing of appending the segments. At 500ms, the content at the target time was not yet buffered. At 800ms, it was buffered. So the workaround really only seems to help after content is appended to the position of the playhead. Perhaps instead of applying this workaround on seek, it should be applied by the stall detector, which only fires when the playhead is buffered. It could be a more effective stall recovery mechanism than seeking, particularly on TV platforms (which seem to need stall detection more than desktops anyway).
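The "is the playhead buffered" condition can be illustrated with a small check over buffered ranges. This is a sketch under assumptions, not Shaka Player's implementation: the function names are hypothetical, the ranges object only mimics the `length` / `start(i)` / `end(i)` shape of the HTML `TimeRanges` interface, and the sample times (44, 60, 72) are borrowed from the earlier test run purely for illustration.

```javascript
// Returns true if `time` falls inside any buffered range.
// `buffered` follows the TimeRanges shape: length, start(i), end(i).
function isTimeBuffered(buffered, time) {
  for (let i = 0; i < buffered.length; i++) {
    if (time >= buffered.start(i) && time < buffered.end(i)) {
      return true;
    }
  }
  return false;
}

// Hypothetical stand-in for video.buffered, built from [start, end] pairs.
function makeRanges(pairs) {
  return {
    length: pairs.length,
    start: (i) => pairs[i][0],
    end: (i) => pairs[i][1],
  };
}

const beforeAppend = makeRanges([[0, 44]]);
const afterAppend = makeRanges([[0, 72]]);
console.log(isTimeBuffered(beforeAppend, 60)); // false
console.log(isTimeBuffered(afterAppend, 60)); // true
```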


I reverted all of my other hacks, and I patched the stall detector so that when streaming.stallSkip is configured to 0, we call pause & play (with no delay) instead of seeking. Then I changed the default config to 0. The simple, unencrypted DASH content now passes the automated test!
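The patched behavior described above amounts to a branch on the configured skip value. A rough sketch, not the actual patch: the function name is hypothetical, and `video` is assumed to expose `currentTime`, `pause()`, and `play()` like an HTMLVideoElement.

```javascript
// Hypothetical stall handler. With stallSkip set to 0, nudge the element
// with pause() + play() (no delay); otherwise skip the playhead forward
// by stallSkip seconds, as the seek-based recovery does.
function recoverFromStall(video, stallSkip) {
  if (stallSkip > 0) {
    video.currentTime += stallSkip;
  } else {
    video.pause();
    video.play();
  }
}
```

In an application, the setting itself would presumably be applied through the player configuration, for example `player.configure({streaming: {stallSkip: 0}})`, assuming a build that includes this fix.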

Next steps:

  1. Test encrypted demo content on Tizen with the fix.
  2. Test content from @dsparacio and @piyush2dev on Tizen with the fix.
  3. Test all demo content on Tizen with and without the improved stall detector to prove that it is truly an improvement in all cases.
  4. Get the fix into code review.
  5. Ask @dsparacio and @piyush2dev to test the fix in their own applications.

But please let us know if v2.5.14 works on your 2017 Tizen device. It was playing in our tests of our own 2017 device.

Because most of the comments on this issue are focused on Tizen, and because we may not be able to work on Xbox at this time, I think opening a separate ticket for Xbox would make sense. It may help us keep track of things that way.

@dsparacio, sounds good to me. I’m cleaning up additional Tizen-related issues in our tests (test coverage for this platform was down until recently) to make sure my fix for the stall doesn’t have any unintended side-effects, and then I should have everything in code review soon. So far, it looks like the right fix.

@piyush2dev, please follow along for updates as we work on this issue. I’m not using your content specifically at the moment, but you may be experiencing the same issue as @dsparacio. As soon as we have a fix or tentative workaround for some part of the problem, we will retest everything and include your content in that.

I’ve been focused until now on the content from @dsparacio, but I now have reproduced the Tizen stalling behavior on clear, single-period content created by us for the demo. This eliminates IMA and encryption from the picture, which means the problem space just got simpler. The exact behavior I’m looking at (stall on seek in our automated asset test) may or may not turn out to be the same as what’s reported by @dsparacio and @piyush2dev, but I’m hopeful that a fix for this will be relevant for both of you.

The content I’m using now is https://storage.googleapis.com/shaka-demo-assets/sintel/dash.mpd

I’ll resume this on Monday.

We now have access to our Tizen device again, at least as far as automated tests in Karma/Jasmine are concerned. I’ll try to focus on reproducing this situation in an automated test. After that, we may be able to use that as a platform to debug and make progress on a workaround.

Thank you for the extra info. We also have an Xbox One in the lab, but we’ve never had any automation or remote access for that platform, and we’re all still working from home.

The Tizen device in our lab, however, will be accessible again soon. I’ve been working on repairing the infrastructure that will allow us to access it remotely in automated tests. It won’t be as good as interactive debugging, but hopefully, with that and some concentrated effort on reproducing the issue through automation, it might be enough for us to make progress.

We apologize for the inconvenience and the time it has taken us to help with this, and we thank you again for your patience.

I’m not sure how we could tear that down cleanly, or how we would make a special case for it. It would be a big change from what we do now.

I’ll be trying out your content shortly in Shaka Player v3.0, in which the way we parse and represent multi-period DASH has completely changed. It’s possible that this will make a difference.