zola: Missing cachebust when using index_format = "elasticlunr_json", language code is hard coded, and fetch will not work when the site is served from a subdirectory

Bug Report

A new json index_format was added: https://github.com/getzola/zola/pull/1998

When using the new JSON format for the search index (index_format = "elasticlunr_json"), the cachebust is missing.

If you add new posts, repeat visitors may not have those posts in their index if the browser still has the old index cached.

The relevant line is 149 here: https://github.com/getzola/zola/blob/master/docs/static/search.js#L149

https://github.com/getzola/zola/blob/8ae4c623f24d3e7af14e3e94f92fcbcceb954bc5/docs/static/search.js#L147-L158

Because the index is fetched from the search.js file, zola would need to write to search.js to add the hash cachebust to the fetch line. I can think of some fairly simple ways to do this with a regex. Zola would need to know ahead of time which JS file handles the search; for me this is always search.js at the root level.
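To make the regex idea concrete, here is a rough sketch in JavaScript. Zola itself is written in Rust, so this only illustrates the pattern, not a proposed implementation. The function name `addCachebustToFetch`, the `h` query parameter, and the assumption that the fetch call uses a plain string literal ending in `.json` are all mine.

```javascript
// Hypothetical build-step helper: append a cachebust query parameter to
// fetch("...json") calls found in the text of a JavaScript file.
function addCachebustToFetch(source, hash) {
  // Matches fetch("/path/file.json") or fetch('/path/file.json') where the
  // path has no query string yet, so running it twice changes nothing.
  const fetchJson = /(fetch\(\s*)(["'])([^"'?]+\.json)\2/g;
  return source.replace(
    fetchJson,
    (_match, open, quote, path) => `${open}${quote}${path}?h=${hash}${quote}`
  );
}

// Example input line (assumed shape of the fetch call in search.js).
const before = 'index = fetch("/search_index.en.json")';
console.log(addCachebustToFetch(before, "abc123"));
// prints: index = fetch("/search_index.en.json?h=abc123")
```

The weak point is the same one mentioned above: something still has to tell the build step which file to rewrite.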

Another issue that I thought about was that the language code is hard coded. One possible solution would be to have the search.js file check the language code from the page source <html lang="en-gb"> and then fetch the corresponding search index. I can probably submit a pull request for this later today if this sounds like a good solution.
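A minimal sketch of that lookup, assuming the index files follow the `search_index.<lang>.json` naming seen in the fetch line and that the page's `lang` attribute matches the language code Zola used for the file name. The helper name `searchIndexPath` and the `"en"` fallback are hypothetical.

```javascript
// Hypothetical helper: map the page's lang attribute to a search index path.
// Falls back to a default language when the attribute is missing or empty.
function searchIndexPath(htmlLang, defaultLang = "en") {
  const lang = (htmlLang || "").trim() || defaultLang;
  return `/search_index.${lang}.json`;
}

// In the browser this would be called as:
//   fetch(searchIndexPath(document.documentElement.lang))
console.log(searchIndexPath("en-gb")); // prints: /search_index.en-gb.json
console.log(searchIndexPath(""));      // prints: /search_index.en.json
```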

Another issue is that this fetch line grabs the JSON index from the root. This will be an issue for sites that are served from a subdirectory rather than the domain root, e.g. github.io/mysite, because the fetch will try to grab the resource from github.io/search_index.en.json when it should grab it from github.io/mysite/search_index.en.json

One way of resolving this is for the site to set the <base> tag and have the search.js file read that tag while forming the fetch URL. I do exactly that for the old JS index+search bundle: https://github.com/Jieiku/abridge/blob/master/static/search_facade.js (the same principle could be applied here)
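Roughly, the base-relative lookup could look like the sketch below. It is not the code from search_facade.js; `resolveIndexUrl` is a hypothetical name, and the example host is made up. It relies only on the standard `URL` constructor.

```javascript
// Hypothetical helper: resolve the index file against the site's base URL
// instead of the domain root.
function resolveIndexUrl(baseHref, fileName = "search_index.en.json") {
  // Without a trailing slash, URL resolution would replace the last path
  // segment ("mysite") instead of appending to it.
  const base = baseHref.endsWith("/") ? baseHref : baseHref + "/";
  return new URL(fileName, base).href;
}

// In the browser, document.baseURI reflects the <base> tag when one is set:
//   fetch(resolveIndexUrl(document.baseURI))
console.log(resolveIndexUrl("https://example.github.io/mysite"));
// prints: https://example.github.io/mysite/search_index.en.json
```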

It would require less JS DOM access if we simply used the base_url defined in config.toml to form the fetch URL. This would resolve the issue of sites served from a subdirectory, and we could do the same with the language code (meaning: don't do this in JS, just have zola handle these values in addition to the cachebust).

Environment

Zola version: 0.17.1

Expected Behavior

cachebust hash added, and a way to facilitate more than one language code.

Current Behavior

no cachebust, hard coded language code.

Steps to reproduce

The search here can be used to reproduce: https://www.getzola.org/documentation/getting-started/overview/

I am also currently refactoring abridge and have it implemented there (the refactor branch is messy, still a work in progress): https://github.com/Jieiku/abridge/tree/refactor

About this issue

  • State: open
  • Created a year ago
  • Comments: 21 (13 by maintainers)

Most upvoted comments

👋 Popping in to drop a few thoughts!

edit: Apparently pagefind runs after the SSG builds the site… so it would be handled independently of zola, I am not sure what this will entail in practice, but I am interested in trying it with abridge, once I do I will document the steps.

One thing I’m working on right now is a Node.js API which can take in raw content or files and build an index, which allows Pagefind to be integrated into the development server of SSGs. It also allows you to pass direct records in, rather than indexing HTML. Since Pagefind is a binary under the hood, this is actually a generic stdio/out communication system that could be re-implemented fairly trivially from any language.

But also, Pagefind is Rust-based, so it’s totally within reason to expose a lib interface for other Rust packages 👀

I think for any site under 1,000 posts it’s probably better to use elasticlunr or tinysearch, because then the entire index is loaded and search is instantaneous. Once a site gets to a certain size, though, I think pagefind would make a lot of sense.

Pagefind should ideally expose a pagefind.loadAll() function that you can call if you think you have a reasonable amount of content to load up front, at which point search would be instant 🤔

Bundling JS is out of scope, there’s a lot to it involved. We can easily template a JS file though. It’s a bit weird to run search on the generated output of a SSG for pagefind, you might want to fine tune what you’re including rather than just looking at HTML.

I’m still thinking about how to handle that, it’s not easy!

You’ll need to update the paths that get_url is looking at: grep for search_for_file. I’m still not sure it’s the right way though.

That’s an easy change but we need to think about templated files like that and see how other SSGs handle it. Right now the templates dir doesn’t map to an output file so it shouldn’t be in the same folder but even then, I’m still wondering if that’s the right approach. We could eg generate a hash for the current content and tell users to assign it to window in their base template and then they can use that variable for cachebusting in JS without adding any new concepts in Zola.
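A minimal sketch of that last idea, assuming the base template assigns the content hash to a hypothetical `window.searchIndexHash`. The variable name, the helper name `cacheBustedUrl`, and the `h` query parameter are illustrative only.

```javascript
// Hypothetical helper: append a content hash as a query parameter so the
// browser refetches the index whenever the hash changes.
function cacheBustedUrl(url, hash) {
  if (!hash) return url; // no hash available: fall back to the plain URL
  const separator = url.includes("?") ? "&" : "?";
  return `${url}${separator}h=${encodeURIComponent(hash)}`;
}

// In the browser, with the hash set by the base template:
//   fetch(cacheBustedUrl("/search_index.en.json", window.searchIndexHash))
console.log(cacheBustedUrl("/search_index.en.json", "abc123"));
// prints: /search_index.en.json?h=abc123
```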

We can template it but not right now, since Zola is only loading .html files.

Looking at the proposed solution, I think it’s almost better to have people copy the hash manually tbh. An alternative would be to always cachebust by appending a random string as a query param to the URL loading the index, but that’s not super efficient. In practice you probably only want to use it for small enough sites where the index would not be too big.

I am motivated to work on this, just waiting for a little free time. Hopefully within the next week or two.