Umbraco-CMS: Examine index breaks when reaching too many fields
Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)
11.4.2
Bug summary
Working on a website where editors had just filled in a bunch of new content. The site relies heavily on blockgrid, and I was trying to add some custom fields to the external index via the TransformingIndexValues event.
After adding several fields (it ended up being a lot, as each field was added per culture on the multilingual site) I started experiencing strange errors. This happened on this site once we passed 1100 fields, but I'm not sure if it was the field count or another size issue.
The errors:
- First of all, no errors or warnings were logged anywhere. Additionally, I added some logging to my event handler, and it processed all nodes. The DocumentCount in the index would also update and show the expected amount.
- The first visible issue was that any search in the backoffice Examine dashboard on the index would instantly return an empty result, even for values on nodes where my custom fields weren't applied.
- Additionally while the backoffice search returned an empty result each time, my search controller would continue to return something - however, it would always return a subset of the expected results.
There was nothing mentioned in the logs out of the ordinary.
When debugging I noticed a ton of fields for blocks due to the new way each field is indexed. I saw they all had names of the format “something.items[index].something”, so I tried to just remove all of those fields from the index as we don’t use those for our search anyways.
private void IndexOnTransformingIndexValues(object? sender, IndexingItemEventArgs e)
{
    try
    {
        if (e.ValueSet.Category != IndexTypes.Content) return;
        var rawValuesDictionary = e.ValueSet.Values?.ToDictionary(x => x.Key, x => x.Value.ToList());
        // Remove all block grid fields to reduce overall index size
        var valuesDictionary = rawValuesDictionary?.Where(x => !x.Key.Contains("items[")).ToDictionary(x => x.Key, x => x.Value);
        if (e.ValueSet.ItemType != Product.ModelTypeAlias)
        {
            e.SetValues(valuesDictionary?.ToDictionary(x => x.Key, x => (IEnumerable<object>)x.Value));
            return;
        }
        // Add custom fields to the Product type
This drastically reduced the field count, and it also removed all the previously mentioned errors.
Steps to reproduce
Set up several block grids and implement them in different ways on different nodes to generate lots of unique field aliases. At a certain size it breaks.
Can probably supply a database if needed for debugging, would have to check with the client first.
Expected result / actual result
I’d expect it to work out of the box, or at least inform me of which sizes can cause issues and give me a config option to opt out of the frankly crazy amount of additional fields that are added due to the block indexing.
Actual result was that it broke without any indication or error messages telling me what the problem was.
This item has been added to our backlog AB#32578
About this issue
- State: open
- Created 10 months ago
- Reactions: 1
- Comments: 16 (10 by maintainers)
@kjac @bergmania I think you at HQ should consider making Umbraco:CMS:Indexing:ExplicitlyIndexEachNestedProperty false by default in v13.
As described by @jemayn in this issue, it can even break the backoffice with the many, many extra fields that are added. This issue hasn’t been solved, so I don’t think we can close it just yet, @kjac?
On a personal note, I struggle to even see the use case for the extra fields. From what I understand, it should make it easier to target content on a content node that is in a block of some sort, but how would you even use the information when it’s spread across 1000 fields? Then you need to mix the 1000 fields into your search query to be able to use them.
Furthermore, I am left with some of the following questions:
I tried comparing the differences in indexed fields with and without ExplicitlyIndexEachNestedProperty on a solution at my company. I found that with ExplicitlyIndexEachNestedProperty set to true, ~37,000 fields were indexed, and with it set to false, ~680 fields were indexed. This means that it wastes compute on indexing ~36,000 fields, which in no way can be optimal. Furthermore, we don’t even use Examine for search in this solution, so the ExplicitlyIndexEachNestedProperty “feature” just adds bloat to Umbraco in this case.
I think the most optimal solution, if this feature were reimplemented, is to only index one new field that would contain the content node’s “content”, if that makes sense. By “content” I mean the pure text content found inside the blocks. But this probably also wouldn’t be one size fits all, because it most likely differs from solution to solution what content data you want in the content node’s “content” field.
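For anyone who wants to try the setting discussed above, it would presumably go into appsettings.json along these lines. The nesting here is inferred from the colon-separated key quoted in this thread, so treat it as a sketch and check that your Umbraco version actually supports the setting before relying on it.

```json
{
  "Umbraco": {
    "CMS": {
      "Indexing": {
        "ExplicitlyIndexEachNestedProperty": false
      }
    }
  }
}
```

A full index rebuild is most likely needed afterwards, since fields that are already in the index will not disappear on their own.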
Hi everyone 👋
Sorry for the belated reply here. We are aware of this issue and we’re looking to provide a fix in the core. At this time I do not have any specifics on how that fix will look; we’re still hashing out the details of the behavior and how to handle this in a non-breaking way.
Just to be explicit here, the fault is not on Examine. Thank you for chipping in, @Shazwazza ❤️
For anyone affected by this, please read through this issue to find workarounds.
Reopened because we need to solve the backoffice search when facing too many fields.
Ideally Umbraco wouldn’t add more fields than are required. Maybe a feature should be to opt-in to particular block fields. I would imagine most people don’t search on many of these and the index sizes will probably become quite large with more fields and more data.
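Until something like that exists in core, the opt-in idea could be approximated in user code. Below is a minimal sketch, not an Umbraco or Examine API: `BlockFieldFilter` and `KeepAllowedBlockFields` are hypothetical names, and it assumes nested block fields can be recognised by `items[` in the key and that the part after the last dot is the property alias, based on the "something.items[index].something" format reported in this issue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Umbraco or Examine.
public static class BlockFieldFilter
{
    // Keeps every regular field, and keeps a nested block field only when
    // the alias after the last '.' is in the allow-list.
    public static Dictionary<string, IEnumerable<object>> KeepAllowedBlockFields<TValues>(
        IEnumerable<KeyValuePair<string, TValues>> values,
        ISet<string> allowedAliases)
        where TValues : IEnumerable<object>
    {
        var result = new Dictionary<string, IEnumerable<object>>();

        foreach (var pair in values)
        {
            // Assumption: nested block fields look like "grid.items[0].headline".
            var isNestedBlockField = pair.Key.Contains("items[");
            if (!isNestedBlockField)
            {
                result[pair.Key] = pair.Value;
                continue;
            }

            var lastDot = pair.Key.LastIndexOf('.');
            var alias = lastDot >= 0 ? pair.Key.Substring(lastDot + 1) : pair.Key;

            if (allowedAliases.Contains(alias))
            {
                result[pair.Key] = pair.Value;
            }
        }

        return result;
    }
}
```

The returned dictionary has the same shape as the one handed to `e.SetValues(...)` in the handler from the original post, so it could be called from a TransformingIndexValues handler in the same way. An allow-list keeps the handful of block fields you actually search on while dropping the rest.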
I’m not sure why the back office search stops returning results but this would be something within Umbraco.
Thanks for the blog post @Shazwazza, a better way than what I ended up using as a workaround 😊
The new increased field count is due to https://github.com/umbraco/Umbraco-CMS/pull/13819 where block property editors are now indexing each individual block’s properties in their own fields, which ends up producing a ton of additional fields.
In my case I went from 1112 to 336 fields in total when I removed the new ones.
It might be an idea for a config setting in Umbraco to toggle between the new and old ways of indexing blocks?