flatbuffers: Reading and writing binary blobs is incredibly slow with the default API!

This is related to https://github.com/google/flatbuffers/issues/4090, but it's really more general than that. It is not unreasonable to want to store a binary blob in a flatbuffer; large multidimensional arrays are one use case. In fact, the advantages of using something like FB over Protobuf are even more pronounced in such a use case (reduced serialization time, no maximum table size).

This makes it really awkward that the only API-provided (and documentation-recommended) way to store a non-ASCII-encoded blob is a [byte] or [ubyte] vector. However, because there is no way to grab an entire vector at once, you have to iterate through it and index into it one element at a time. Even if your binary format could be parsed without copying, too bad: you have to copy it anyway (there goes a whole bunch of the speed gain I got from using flatbuffers in the first place!).
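For concreteness, this is roughly what the slow path looks like through the generated Python accessors. The schema, the `Blob` class, and the `Data` field below are hypothetical stand-ins, but the per-element calls are the shape of what the generated code gives you for a `[ubyte]` field:

```python
import flatbuffers
# Hypothetical generated module for a schema like:
#   table Blob { data:[ubyte]; }
from MyGame.Blob import Blob  # assumed generated class name

buf = open("blob.bin", "rb").read()
blob = Blob.GetRootAsBlob(buf, 0)

# The documented way to get the payload back out is one byte at a time,
# which funnels every byte through a Python-level accessor call:
payload = bytearray(blob.DataLength())
for i in range(blob.DataLength()):
    payload[i] = blob.Data(i)   # generated per-element accessor
```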

Please provide either: a primitive blob type in the schema that functions like string but can hold arbitrary bytes (note that this is supported by Cap'n Proto; I might be using that, but congrats on having far superior language support!), OR an API call that can grab an entire [ubyte] vector at once. Right now you can hack your way around it, but you have to dig under the API to get the raw buffer and look into the generated code to find the right offsets. It's much less safe than it could be.
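The workaround alluded to looks roughly like this in Python: pull the raw buffer out of the table and compute the vector's extent yourself with the runtime's Table helpers. This is a sketch, not an endorsed API; the vtable slot (4 below) is an assumption you have to confirm by reading the generated code for your own table:

```python
import flatbuffers
from flatbuffers import number_types as N

def blob_view(blob):
    """Zero-copy view of a [ubyte] field, bypassing the per-byte accessors.

    Assumes the field lives in vtable slot 4 (check your generated code).
    """
    o = N.UOffsetTFlags.py_type(blob._tab.Offset(4))
    if o == 0:
        return memoryview(b"")
    start = blob._tab.Vector(o)        # position of the first element
    length = blob._tab.VectorLen(o)    # number of elements
    return memoryview(blob._tab.Bytes)[start:start + length]
```

It works, but it depends on undocumented internals and on offsets hand-copied out of generated code, which is exactly the safety problem described above.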

I could imagine the argument that whatever code you're using should just always use the FB API, so you don't need your own custom binary format… but FB doesn't support things like complex slicing over vectors, and multidimensional arrays are a clear use case where this falls down. Using other binary formats for legacy compatibility is another: for example, if a bunch of things are already saved as BSON, that has to be deserialized anyway, so you won't save THAT much time, but there is still an unnecessary copy in there!

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 9
  • Comments: 19

Most upvoted comments

@wrigby: cool that you made it work, but this type of low-level work should really be done by the API itself.

If you only generate such functions for byte arrays, there should be no endianness issues, but I guess people may want to use it for other arrays as well.

I actually ended up hacking the CreateString / table._String code to create a faster byte-array implementation. The slowness just comes from having to iterate through each byte in a vector one by one. On top of that, the retrieve/prepend byte methods do bounds checking on every value, and the prepend method grows the backing byte buffer by only a small amount each time.
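The bulk write described here amounts to reserving space with StartVector and then copying the whole payload into the builder's backing buffer in one slice assignment, instead of calling PrependByte in a loop. A minimal sketch against the Python Builder's internals (head, Bytes, StartVector/EndVector); the EndVector signature varies between runtime versions (older ones take the element count), and newer runtimes ship an equivalent as Builder.CreateByteVector:

```python
import flatbuffers
from flatbuffers.number_types import UOffsetTFlags

def create_byte_vector(builder, data):
    """Write a [ubyte] vector with one bulk copy instead of per-byte prepends."""
    builder.StartVector(1, len(data), 1)          # reserve space for the elements
    builder.head = UOffsetTFlags.py_type(builder.Head() - len(data))
    builder.Bytes[builder.Head():builder.Head() + len(data)] = data
    return builder.EndVector(len(data))           # writes the length prefix
```

The returned offset is then used like any other vector offset when assembling the table.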

My changes will get wiped out the next time I compile my flatbuffer schema, but when I free up on this project in a few weeks, I can look into implementing it properly in the flatbuffer compiler.