livekit: Lots and lots of race conditions
Describe the bug
building and running the server with -race reveals a lot of race conditions going on.
Server
- Version: 0.15.6
- Environment: local dev
Client
- SDK: flutter
- Version: 0.5.6
To Reproduce Steps to reproduce the behavior:
- build the server with -race
- two clients are connected to room (one can be the go server sdk)
- See error
Expected behavior No race conditions should occur
Screenshots Lots. E.g.
==================
WARNING: DATA RACE
Write at 0x00c005dda910 by goroutine 114:
github.com/livekit/livekit-server/pkg/rtc.(*MediaTrack).ToProto()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/mediatrack.go:103 +0xa6
github.com/livekit/livekit-server/pkg/rtc.(*UpTrackManager).ToProto()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/uptrackmanager.go:76 +0x1c1
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).ToProto()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/participant.go:360 +0x698
github.com/livekit/livekit-server/pkg/rtc.ToProtoParticipants()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/utils.go:51 +0xdd
github.com/livekit/livekit-server/pkg/rtc.(*Room).broadcastParticipantState()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/room.go:705 +0x93
github.com/livekit/livekit-server/pkg/rtc.(*Room).onTrackPublished()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/room.go:592 +0x84
github.com/livekit/livekit-server/pkg/rtc.(*Room).onTrackPublished-fm()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/room.go:590 +0x6d
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).handleTrackPublished()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/participant.go:1405 +0xa3
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).mediaTrackReceived()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/participant.go:1393 +0x484
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).onMediaTrack()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/participant.go:920 +0xce
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).onMediaTrack-fm()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/participant.go:910 +0x4d
github.com/pion/webrtc/v3.(*PeerConnection).onTrack·dwrap·70()
/home/matthew/go/pkg/mod/github.com/pion/webrtc/v3@v3.1.27/peerconnection.go:459 +0x58
Previous read at 0x00c005dda910 by goroutine 111:
reflect.Value.Bool()
/nix/store/j8zd71jnc6r7lhh45jwk9ywygr4w68c9-go-1.17.8/share/go/src/reflect/value.go:285 +0x51
google.golang.org/protobuf/internal/impl.fieldInfoForScalar.func1()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/message_reflect_field.go:286 +0x28f
google.golang.org/protobuf/internal/impl.(*messageState).Range()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/message_reflect_gen.go:48 +0x21e
google.golang.org/protobuf/internal/order.RangeFields()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/order/range.go:50 +0x21a
google.golang.org/protobuf/encoding/protojson.encoder.marshalMessage()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/encoding/protojson/encode.go:223 +0x452
google.golang.org/protobuf/encoding/protojson.encoder.marshalSingular()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/encoding/protojson/encode.go:304 +0x6c8
google.golang.org/protobuf/encoding/protojson.encoder.marshalValue()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/encoding/protojson/encode.go:248 +0x18f
google.golang.org/protobuf/encoding/protojson.encoder.marshalMessage.func1()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/encoding/protojson/encode.go:232 +0x213
google.golang.org/protobuf/internal/order.RangeFields()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/order/range.go:60 +0x3d9
google.golang.org/protobuf/encoding/protojson.encoder.marshalMessage()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/encoding/protojson/encode.go:223 +0x452
google.golang.org/protobuf/encoding/protojson.MarshalOptions.marshal()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/encoding/protojson/encode.go:136 +0x1cb
google.golang.org/protobuf/encoding/protojson.MarshalOptions.Marshal()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/encoding/protojson/encode.go:110 +0xa4
google.golang.org/protobuf/encoding/protojson.Marshal()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/encoding/protojson/encode.go:39 +0xa5
github.com/livekit/protocol/webhook.(*notifier).Notify()
/home/matthew/go/pkg/mod/github.com/livekit/protocol@v0.12.0/webhook/notifier.go:50 +0x73
github.com/livekit/livekit-server/pkg/telemetry.(*telemetryServiceInternal).notifyEvent.func1()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/telemetry/telemetryserviceinternalevents.go:278 +0x8f
github.com/gammazero/workerpool.worker()
/home/matthew/go/pkg/mod/github.com/gammazero/workerpool@v1.1.2/workerpool.go:243 +0x34
github.com/gammazero/workerpool.startWorker·dwrap·6()
/home/matthew/go/pkg/mod/github.com/gammazero/workerpool@v1.1.2/workerpool.go:234 +0x39
In general, there seem to be a lot involving how protobuf msgs are constructed.
==================
WARNING: DATA RACE
Write at 0x00c005dda928 by goroutine 114:
github.com/livekit/livekit-server/pkg/rtc.(*MediaTrack).ToProto()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/mediatrack.go:111 +0x217
github.com/livekit/livekit-server/pkg/rtc.(*MediaTrackSubscriptions).AddSubscriber()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/mediatracksubscriptions.go:270 +0x153c
github.com/livekit/livekit-server/pkg/rtc.(*MediaTrackReceiver).AddSubscriber()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/mediatrackreceiver.go:198 +0x3d4
github.com/livekit/livekit-server/pkg/rtc.(*MediaTrack).AddSubscriber()
<autogenerated>:1 +0x77
github.com/livekit/livekit-server/pkg/rtc.(*UpTrackManager).AddSubscriber()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/uptrackmanager.go:127 +0x67b
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).AddSubscriber()
<autogenerated>:1 +0xb9
github.com/livekit/livekit-server/pkg/rtc.(*Room).onTrackPublished()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/room.go:615 +0x745
github.com/livekit/livekit-server/pkg/rtc.(*Room).onTrackPublished-fm()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/room.go:590 +0x6d
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).handleTrackPublished()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/participant.go:1405 +0xa3
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).mediaTrackReceived()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/participant.go:1393 +0x484
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).onMediaTrack()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/participant.go:920 +0xce
github.com/livekit/livekit-server/pkg/rtc.(*ParticipantImpl).onMediaTrack-fm()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/rtc/participant.go:910 +0x4d
github.com/pion/webrtc/v3.(*PeerConnection).onTrack·dwrap·70()
/home/matthew/go/pkg/mod/github.com/pion/webrtc/v3@v3.1.27/peerconnection.go:459 +0x58
Previous read at 0x00c005dda928 by goroutine 129:
google.golang.org/protobuf/internal/impl.pointer.Elem()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/pointer_unsafe.go:119 +0x3f7
google.golang.org/protobuf/internal/impl.(*MessageInfo).marshalAppendPointer()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/encode.go:136 +0x3a9
google.golang.org/protobuf/internal/impl.appendMessageSliceInfo()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/codec_field.go:485 +0x20e
google.golang.org/protobuf/internal/impl.(*MessageInfo).marshalAppendPointer()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/encode.go:139 +0x482
google.golang.org/protobuf/internal/impl.appendMessageSliceInfo()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/codec_field.go:485 +0x20e
google.golang.org/protobuf/internal/impl.(*MessageInfo).marshalAppendPointer()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/encode.go:139 +0x482
google.golang.org/protobuf/internal/impl.appendMessageInfo()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/codec_field.go:238 +0x190
google.golang.org/protobuf/internal/impl.(*MessageInfo).initOneofFieldCoders.func4()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/codec_field.go:96 +0x105
google.golang.org/protobuf/internal/impl.(*MessageInfo).marshalAppendPointer()
google.golang.org/protobuf/internal/impl.(*MessageInfo).marshal()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/encode.go:107 +0xd0
google.golang.org/protobuf/internal/impl.(*MessageInfo).marshal-fm()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/internal/impl/encode.go:100 +0xd4
google.golang.org/protobuf/proto.MarshalOptions.marshal()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/proto/encode.go:163 +0x3b9
google.golang.org/protobuf/proto.Marshal()
/home/matthew/go/pkg/mod/google.golang.org/protobuf@v1.28.0/proto/encode.go:79 +0x59
github.com/livekit/livekit-server/pkg/service.(*WSSignalConnection).WriteResponse()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/service/wsprotocol.go:84 +0x105
github.com/livekit/livekit-server/pkg/service.(*RTCService).ServeHTTP.func2()
/home/matthew/go/pkg/mod/github.com/livekit/livekit-server@v0.15.6/pkg/service/rtcservice.go:230 +0x4d4
There’s not a shortage of them.
Data races can lead to unexpected behaviour and very hard to debug problems. These should all be solved, most likely by judiciously adding mutexes, or using atomics.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 15 (8 by maintainers)
Commits related to this issue
- Fix a couple of more races Use a shadow copy of down tracks in DownTrackSpreader. Read always uses the shadow. On Add/Delete of down track, make a new copy. Copying is done only on add/delete. If som... — committed to livekit/livekit by boks1971 2 years ago
- Fix a couple of more races (#717) * Use grants clone * Fix a couple of more races Use a shadow copy of down tracks in DownTrackSpreader. Read always uses the shadow. On Add/Delete of down tr... — committed to livekit/livekit by boks1971 2 years ago
Thank you for confirming @msackman . Yeah, not all of it is addressed. Going through and trying to catch more.
Thanks for confirming @msackman ! I’m going to close this issue and open one for Go SDK.
ok, I’ve just done another test with:
I’m not seeing any more races in livekit-server. The only races I’m seeing are in the server sdk (and in my own code too!).
Thank you so much for all your work, time, and effort on this project - without which I just wouldn’t be able to do what I’m currently doing.
@msackman we have yet to go through a pass with the Go SDK on data race. However, we did fix an issue that would result in connection failures when used against the latest unreleased version of the server. The summary is that we’ve made a bunch of improvements in server performance that exposes a timing issue in Go SDK. Try the latest commit in server-sdk-go.
Callbacks shouldn’t really have a race condition as they should be set prior to spinning up any Goroutines.
Sorry for the delay in testing this. I’ve just had a go but I’ve got problems with timeouts when creating rooms via the go sdk - which I’ve also just tried updating to tip of main branch. Given the way I integrate livekit into my server, it’s most likely my own fault - something’s probably changed in wiring which I need to fix up - I don’t run livekit as a standalone server. So it’ll take me a few more days to dig into this. Sorry.