swift-protobuf: EXC_BAD_ACCESS in decodeMessage(decoder:) due to stack overflow

Note: This is likely a duplicate of https://github.com/apple/swift-protobuf/issues/1014 but I didn’t want to cram all of this into a comment in a long chain. Feel free to close as duplicate if you want.

Summary/TLDR

The change to move computed properties backed by _storage to properties stored directly on the message structs has caused massive size increases for messages at the top of a deep hierarchy. The switch code generated for oneof parsing is sensitive to these size increases, leading to massive stack frames. In debug mode without optimization these stack frames can easily exhaust the 512KB stack of a background dispatch thread and cause a crash.

Details

We are experiencing a crash in debug builds after upgrading SwiftProtobuf and protoc-gen-swift from 1.7 to 1.10.2. Our crash happens in the decodeMessage(decoder:) method of a message containing a oneof with lots of cases. From what I can tell the stack frame generated for the withExtendedLifetime closure in that method is huge, about 800KB. Since the decoding is happening on a background thread with a 512KB stack, the app crashes as soon as this closure is invoked.

The message is of the form:

message RenderDataDTO {
  SomeMessage1 m1 = 1;
  SomeMessage2 m2 = 2;
  SomeMessage3 m3 = 3;
  oneof render {
    SomeCaseMessage1 case1 = 100;
    SomeCaseMessage2 case2 = 101;
    SomeCaseMessage3 case3 = 102;
    // 20 cases in total
    SomeCaseMessage20 case20 = 120;
  }

The code that gets generated for decodeMessage(decoder:) has the form:

  public mutating func decodeMessage<D: SwiftProtobuf.Decoder>(decoder: inout D) throws {
    _ = _uniqueStorage()    // LINE A
    try withExtendedLifetime(_storage) { (_storage: _StorageClass) in    // LINE B
      while let fieldNumber = try decoder.nextFieldNumber() {
        switch fieldNumber {
        case 1: try decoder.decodeSingularMessageField(value: &_storage._m1)
        case 2: try decoder.decodeSingularMessageField(value: &_storage._m2)
        case 3: try decoder.decodeSingularStringField(value: &_storage._m3)
        case 100:
          var v: SomeCaseMessage1?
          if let current = _storage._render {
            try decoder.handleConflictingOneOf()
            if case .case1(let m) = current {v = m}
          }
          try decoder.decodeSingularMessageField(value: &v)
          if let v = v {_storage._case1 = .case1(v)}
        case 101:
          var v: SomeCaseMessage2?
          if let current = _storage._render {
            try decoder.handleConflictingOneOf()
            if case .case2(let m) = current {v = m}
          }
          try decoder.decodeSingularMessageField(value: &v)
          if let v = v {_storage._case2 = .case2(v)}
       // etc...

If I set a breakpoint at the line marked LINE A and check the stack pointer, then set a breakpoint at LINE B and do the same I find that the difference is about 800KB.

Here’s one example:

RenderDataDTO.decodeMessage(decoder:)
       rsp = 0x00007000087cdb60

In closure:
       rsp = 0x0000700008704780   <-- 800KB jump!

I have a couple of observations to explain this:

Observation 1:

In the earlier version of swift-protobuf almost all fields of Message structs were stored in _storage, a reference-typed container, and exposed as computed vars. In the current version they seem to much more often (but not always?) be implemented as properties on the message itself. Thus the size of a leaf message struct with no child messages became O(n) instead of O(1), where n is the number of fields in the message.

The story gets a lot worse when you have child messages. Since messages are structs, children get embedded directly in their parents, so the size of a non-leaf message grows like the sum of its entire message sub-tree. So the size scales more like O(n^m), where n is the average number of children per level and m is the average number of levels in the message hierarchy. Since Swift always likes to allocate structs on the stack, this can lead to massive stack frames.

This agrees with what I saw when I measured the size of the SomeCaseMessageN messages using MemoryLayout<SomeCaseMessageN>.size. In the older code, they were all 24 bytes, with a few exceptions. The largest was 80 bytes. In the new code the sizes tended to increase by 10X-20X. 224 and 448 bytes were typical sizes, with several messages growing above 1000B. One grew to 5256 bytes, while the biggest was 6080 bytes.

Observation 2:

If the compiler is smart, it will know that all the cases in the switch statement inside that closure are mutually independent. Only one of them will ever execute per invocation, so their local variables can share space on the stack, and the size of the stack frame will be the size of the biggest case in the switch. Since we’re in debug mode it might not be making that optimization, and instead allocating separate space for every case in the switch. Given the size impact of the changes I discussed above, this could add up to a lot of space.

In fact, when I change the optimization mode from none to size or speed, the message sizes don’t change but the size of the stack frame drops to about 26KB. That’s still an oversized stack frame eating 5% of the stack in release mode, but at least it’s not bigger than the entire stack!

For reference, with the code generated from 1.7, the stack frame is about 16KB with optimization mode none. With optimization mode speed the stack frame drops to just 400B.

Wrapping up

There seem to be problems with putting so many stored properties on messages. Perhaps this could be dialed back to where it was before?

Environment: OS: MacOS Catalina 10.15.6 Xcode: Version 11.6 Swift 5

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 28 (6 by maintainers)

Commits related to this issue

Most upvoted comments

I just did some testing on 6806de79a49cb2b81155b8b431af6bf9be1d4af9 and it looks a lot better. The sizes of the message structs I mentioned before have dropped significantly:

Size: 6080 -> 24
Size: 5256 -> 160
Size: 1944 -> 24
Size: 1400 -> 24
Size: 1096 -> 24
Size: 704 -> 144
Size: 456 -> 144
Size: 448 -> 72
Size: 440 -> 24
Size: 432 -> 24
Size: 256 -> 256
Size: 224 -> 224
Size: 224 -> 224
Size: 208 -> 208
Size: 136 -> 136
Size: 128 -> 128
Size: 88 -> 88
Size: 80 -> 80
Size: 80 -> 80
Size: 48 -> 48
Size: 48 -> 48

I still have misgivings about letting structs grow to 256 bytes but the worst case seems to be under control. Thanks!