bevy: entity_ref `get_component_with_type` is slow
What problem does this solve or what need does it fill?
I’m trying to get component data from entity IDs (`Vec<Entity>`) with `world.get(entity)`… I’m experiencing an “unexpected” performance drop (compared to `Vec<Rc>`).
Looking at the code of `World::get<T: Component>(&self, entity: Entity)`… there is so much indirection: getting the component index from a hashmap, then chasing three indirections to get the component “Column”, plus a branch on the storage type.
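Schematically, the chain being described is something like the following (a hypothetical, heavily simplified model with illustrative names — not Bevy’s actual internals):

```rust
use std::collections::HashMap;

// Hypothetical, simplified model of the lookup chain described above.
struct Column {
    data: Vec<f32>, // component values, one per table row
}

struct Table {
    columns: Vec<Column>,
}

struct SimpleWorld {
    component_ids: HashMap<&'static str, usize>, // component name -> column index
    tables: Vec<Table>,
    entity_locations: Vec<(usize, usize)>, // entity id -> (table index, table row)
}

impl SimpleWorld {
    fn get(&self, entity: usize, component: &'static str) -> Option<f32> {
        let column_index = *self.component_ids.get(component)?; // 1. hashmap probe
        let &(table, row) = *&self.entity_locations.get(entity)?; // 2. entity location
        let column = self.tables.get(table)?.columns.get(column_index)?; // 3-4. table + column hops
        column.data.get(row).copied() // 5. final indexed read
    }
}
```

Each step is a dependent load, so they cannot overlap in the pipeline — which is why the chain feels expensive compared to a single `Rc` dereference.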
What solution would you like?
Is it possible to somehow … prefetch/cache that archetype table_row/column?
That’s just too much… That’s waaaay slower than just cache misses with `Rc`.
Additional context
I use `Vec<Entity>` not that often… But sometimes you just have to use it …
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 28 (12 by maintainers)
Just took a look. Let’s start with:
`entity_iter`
First: there is a methodology issue. This is an apples-to-oranges comparison because you’re running the entire scheduling infrastructure for the Query benchmark and comparing that to a single raw vec iterator. The scheduling infrastructure is cheap to run, but it will still register in a small way for benchmarks like this.
Additionally, even after accounting for that, this is still apples to oranges because the Vec benchmark doesn’t do change detection. Functionally, change detection performs the same as iterating and writing to another component (because it needs to fetch the tracking state and write to it when `Mut<Point>` is mutated).
After disabling change detection (currently a manual process of commenting out the relevant code, but it will ultimately be configurable per-component once we switch to `#[derive(Component)]`) I get:
Then, after removing the Schedule/Stage/System in favor of iterating a raw Query, I get:
After righting the scales, we are actually faster than iterating a raw Vec (by an extremely small margin). Despite the fact that we provide way more functionality than a simple Vec, we still come out on top because the extra work required to accomplish that functionality amortizes to zero.
The test:
Additionally it seems like there is a general misunderstanding about how relevant numbers like 2x are in this context. Thanks to the cache friendly nature of both implementations, we are talking about extremely small timescales here. The second you start doing anything meaningful inside these iterators, that work will dwarf the iteration costs. We get a ~30% increase in cost, just by throwing in some more ops:
Doing any amount of work on these scales registers in a big way.
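For illustration (hypothetical loop bodies, not the actual benchmark code), the difference between the bare iteration and “a few more ops” might look like:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
struct Point {
    x: f32,
    y: f32,
}

// Bare iteration: just copy each element, the shape of the original benchmark.
fn copy_points(src: &[Point], dst: &mut [Point]) {
    for (s, d) in src.iter().zip(dst.iter_mut()) {
        *d = *s;
    }
}

// "A few more ops": even light arithmetic per element measurably
// outweighs the iteration machinery itself at these timescales.
fn transform_points(src: &[Point], dst: &mut [Point]) {
    for (s, d) in src.iter().zip(dst.iter_mut()) {
        d.x = s.x * 2.0 + s.y;
        d.y = s.y * 2.0 - s.x;
    }
}
```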
From the perspective of a game engine, the benefits of change detection far outweigh the 2x cost (by enabling us to optimize / skip expensive operations each frame and opening up extremely useful coding patterns). The cost won’t meaningfully register for anything but the most resource constrained benchmarks and in the very near future you will be able to easily opt out for those scenarios.
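To make the change-detection cost and benefit concrete, here is a sketch of the idea: a `Mut`-style wrapper that flips a “changed” flag only when the value is mutably dereferenced. Names are illustrative; this is not Bevy’s actual `Mut` implementation.

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical change-detection wrapper: mutating through it marks the
// component as changed, which is the extra write a plain Vec never pays.
struct TrackedMut<'a, T> {
    value: &'a mut T,
    changed: &'a mut bool,
}

impl<'a, T> Deref for TrackedMut<'a, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.value
    }
}

impl<'a, T> DerefMut for TrackedMut<'a, T> {
    fn deref_mut(&mut self) -> &mut T {
        // The tracking write described above.
        *self.changed = true;
        self.value
    }
}

// Systems can later skip work for entities whose components never
// passed through a mutable deref.
fn bump_x(value: &mut (f32, f32), changed: &mut bool) {
    let mut point = TrackedMut { value, changed };
    point.0 += 1.0; // mutable deref -> tracking flag set
}
```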
Let’s segue into the next benchmark:
`entity_get`
After accounting for change detection, here are the results:
`box_unique` is ~4.4x faster than `entity`. First, even if this were an apples-to-apples comparison, these numbers would be acceptable because, as we’ve previously covered, we’re working on very small timescales here. We are also quite competitive when compared to other ECS implementations. So why isn’t `box_unique` a fair comparison to ECS `get(Entity)` operations? Let’s compare the inner loops of the two benchmarks:
This amounts to the totals:
And that doesn’t even account for the extra branching to ensure Entity and sparse set lookups are valid. In total, the 4.4x number makes perfect sense!
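The two inner loops compare roughly like this (hypothetical code, not the benchmark itself):

```rust
// Box lookup: one pointer dereference per access, nothing to validate.
fn sum_boxes(points: &[Box<f32>]) -> f32 {
    points.iter().map(|b| **b).sum()
}

// ECS-style lookup, simplified: validate the entity index, branch on
// whether it has the component, then index into the dense storage --
// several dependent loads and branches per access, which is where
// multipliers like 4.4x come from.
fn sum_lookup(entities: &[usize], sparse: &[Option<usize>], dense: &[f32]) -> f32 {
    entities
        .iter()
        .filter_map(|&entity| {
            let dense_index = (*sparse.get(entity)?)?; // bounds check + "has component" branch
            dense.get(dense_index).copied() // second indexed load
        })
        .sum()
}
```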
If you don’t think too hard about it, it sure seems like we’re doing a lot of unnecessary work. We should just use Boxes! What were we thinking?
Let’s do some small (zero cost) refactors so we can start building our Box ECS:
`Components<T> { components: [Box<T>; BOXES_COUNT] }` (in practice this would actually be a `Vec<T>` for faster iteration)
Now let’s pretend we want Entity(0) to have components [A, B] and Entity(1) to have [B]. How will we record this?
Here is the simplest path forward:
Use an option (or null pointer) to indicate whether a specific item in the array exists (ex: `Vec<Option<T>>` or `[MaybeUninit<Box<T>>; BOXES_COUNT]`). This is known as a “Sparse Array”. But now we’ve created a couple of problems relative to PointBoxes: `Component<T>` arrays must be sized to accommodate the entire entity space. They are no longer “densely packed”. If Component A has 10 entities and Component B has 2 entities, both arrays need to be length 10.
This can be mitigated by packing the components densely in a `Vec<T>`, then having another `Vec<Option<usize>>` array to determine whether or not the entity has component T and where it is located in the dense Vec. This is called a “Sparse Set” and is a popular ECS variant.
Ex: The specs ECS “vec storage” is basically a sparse array. I had benchmarks handy that compare its entity lookups to Bevy’s
Funnily enough, specs is about 4x faster than bevy_system, which makes sense given what we just talked about / the performance of your box benchmark!
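The sparse set described above can be sketched as follows (illustrative; not specs’ or Bevy’s actual implementation — removal, which would use a swap-remove to stay dense, is omitted for brevity):

```rust
// A minimal sparse set: a sparse entity-indexed array pointing into a
// densely packed component array.
struct SparseSet<T> {
    sparse: Vec<Option<usize>>, // indexed by entity id, points into `dense`
    dense: Vec<T>,              // densely packed component values
}

impl<T> SparseSet<T> {
    fn new(entity_capacity: usize) -> Self {
        SparseSet {
            sparse: vec![None; entity_capacity],
            dense: Vec::new(),
        }
    }

    fn insert(&mut self, entity: usize, value: T) {
        self.sparse[entity] = Some(self.dense.len());
        self.dense.push(value);
    }

    // Two dependent lookups: the sparse array tells us *whether* the entity
    // has the component and *where* it lives in the dense array.
    fn get(&self, entity: usize) -> Option<&T> {
        let dense_index = (*self.sparse.get(entity)?)?;
        self.dense.get(dense_index)
    }
}
```

Note that only `sparse` has to span the whole entity space; `dense` stays as small as the number of entities that actually have the component.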
However sparse sets have significant downsides:
The solution to this is “archetypal ecs”.
Entities belong to a specific archetype, with per-archetype storages that are densely packed and perfectly aligned.
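That layout can be sketched like this (illustrative names, concrete component types instead of generics, to keep the shape visible):

```rust
use std::collections::HashMap;

// Sketch of an archetypal layout: one densely packed column per component,
// all columns sharing row order, so row N of every column belongs to the
// same entity.
struct Archetype {
    positions: Vec<(f32, f32)>,
    velocities: Vec<(f32, f32)>,
}

struct ArchetypalWorld {
    archetypes: Vec<Archetype>,
    // entity id -> (archetype index, row): the one extra indirection
    locations: HashMap<usize, (usize, usize)>,
}

impl ArchetypalWorld {
    // After the single archetype/row lookup, every component of the entity
    // is one direct index away -- no per-component sparse lookup.
    fn get(&self, entity: usize) -> Option<((f32, f32), (f32, f32))> {
        let &(archetype, row) = self.locations.get(&entity)?;
        let a = self.archetypes.get(archetype)?;
        Some((*a.positions.get(row)?, *a.velocities.get(row)?))
    }
}
```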
This adds the indirection of finding the archetype, but it means:
The more components you access in `query.get(entity)`, the faster it gets relative to sparse sets.
Archetypal and Sparse Set ECS are the two most popular ECS implementations. They are the result of the best minds in the business putting their heads together and making informed compromises. They each have their tradeoffs, but they provide functionality that a simple `Vec<T>` cannot. It is literally a limitation of the laws of physics: you cannot have the flexibility of ECS without more branching and indirection. If you want the performance of an array, you are stuck with the rigidity of an array. If you think you can do better … awesome! Just be aware that this is an extremely competitive space and that we are extremely competitive in this space (https://github.com/rust-gamedev/ecs_bench_suite). Don’t expect to find anything but small / marginal wins (or wins that come at the cost of other benchmarks / features).
@cart is this write up in a dev blog somewhere? I learned so much just reading it.
Not currently. I might ultimately consolidate this and other things like the Bevy ECS V2 description (which is basically a blog post by itself). But that takes time and I’ve got renderer code to write right now 😄
For a given world borrow and set of entity lookups using that borrow, we can assume it is stable because nothing can add/remove entities or components during that time. But we can’t store that in QueryState, which exists outside of a given World borrow.
We might be able to cache the Fetch and Filter instances inside of System Queries though (because components cannot be added / removed in a way that invalidates these pointers when the system is running). I can’t assert that is valid off the top of my head though.