zola: Missing Cachebust when using index_format = "elasticlunr_json", language code is hard coded, and will not work from subdomain
Bug Report
A new json index_format was added: https://github.com/getzola/zola/pull/1998
When using the new json format for the search index (index_format = "elasticlunr_json"), the cachebust is missing.
If you add new posts, repeat visitors may not have those posts in their index if the browser still has the old index cached.
The relevant line is 149 here: https://github.com/getzola/zola/blob/master/docs/static/search.js#L149
Because the index is fetched from the search.js file, zola would need to write to this search.js file to add the hash cachebust to the fetch line. I can think of some fairly simple ways to do this with regex. Zola would need to know ahead of time which js file handles the search; for me this is always search.js at the root level.
Another issue that I thought about was that the language code is hard coded. One possible solution would be to have the search.js file check the language code from the page source (<html lang="en-gb">) and then fetch the corresponding search index. I can probably submit a pull request for this later today if this sounds like a good solution.
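A minimal sketch of that idea, not the actual search.js code. The helper name searchIndexFile and the "en" fallback are hypothetical, and it assumes the lang attribute matches a language code that Zola generated an index for, with files named search_index.<lang>.json as in the fetch URL discussed in this issue:

```javascript
// Hypothetical helper: pick the index file name from the page's lang attribute.
// Assumes index files follow the search_index.<lang>.json naming pattern.
function searchIndexFile(htmlLang, fallbackLang = "en") {
  // Fall back when the attribute is missing or blank.
  const lang = (htmlLang || "").trim() || fallbackLang;
  return `search_index.${lang}.json`;
}

// In a browser the value would come from <html lang="...">.
if (typeof document !== "undefined") {
  const file = searchIndexFile(document.documentElement.lang);
  // fetch(file) would go here in the real search script.
}
```

If the lang attribute carries a regional variant that has no index of its own, the fetch would fail, so a real implementation would probably need a mapping or a fallback request.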
Another issue is that this fetch line grabs the json index from the root. This will be an issue for sites that are not served from the domain root, eg: github.io/mysite, because this fetch will try to grab the resource from github.io/search_index.en.json when it should grab it from github.io/mysite/search_index.en.json
One way of resolving this is for the site to set the <base> tag, then have the search.js file check this tag while forming the fetch url. I do exactly that for the old js index+search bundle: https://github.com/Jieiku/abridge/blob/master/static/search_facade.js (the same principle could be applied here)
It would require less js DOM access if we simply used the base_url defined in config.toml to form the fetch url. This would resolve the issue of sites that are not at the domain root, and we could do the same with the language code (meaning don't do this in js, just have zola handle these values in addition to the cachebust).
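As a rough illustration of forming the fetch url from a base rather than from the root: the helper name indexUrl is hypothetical, and the base value is assumed to come either from the page's base tag (readable in a browser as document.baseURI) or from a base_url value that zola writes into the script.

```javascript
// Hypothetical helper: resolve the index file against a site base URL.
function indexUrl(baseUrl, lang) {
  // A trailing slash is needed, otherwise the last path segment is replaced
  // during relative URL resolution.
  const base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
  return new URL(`search_index.${lang}.json`, base).href;
}

// In a browser, document.baseURI reflects the <base href> when one is set.
if (typeof document !== "undefined") {
  const url = indexUrl(document.baseURI, "en");
  // fetch(url) would go here in the real search script.
}
```

With a base of https://github.io/mysite this resolves to the index inside /mysite/ instead of the domain root.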
Environment
Zola version: 0.17.1
Expected Behavior
cachebust hash added, and a way to facilitate more than one language code.
Current Behavior
no cachebust, hard coded language code.
Step to reproduce
The search here can be used to reproduce: https://www.getzola.org/documentation/getting-started/overview/ I am also currently refactoring abridge and have it implemented there (refactor branch is messy, still work in progress): https://github.com/Jieiku/abridge/tree/refactor
About this issue
- State: open
- Created a year ago
- Comments: 21 (13 by maintainers)
👋 Popping in to drop a few thoughts!
One thing I’m working on right now is a Node.js API which can take in raw content or files and build an index, which allows Pagefind to be integrated into the development server of SSGs. It also allows you to pass direct records in, rather than indexing HTML. Since Pagefind is a binary under the hood, this is actually a generic stdio/out communication system that could be re-implemented fairly trivially from any language.
But also, Pagefind is Rust-based, so it’s totally within reason to expose a lib interface for other Rust packages 👀
Pagefind should ideally expose a pagefind.loadAll() function that you can call if you think you have a reasonable amount of content to load up front, at which point search would be instant 🤔
Bundling JS is out of scope, there’s a lot involved in it. We can easily template a JS file though. It’s a bit weird to run search on the generated output of a SSG for pagefind, you might want to fine tune what you’re including rather than just looking at HTML.
I’m still thinking about how to handle that, it’s not easy!
You’ll need to update the paths where get_url is looking at: grep for search_for_file. I’m still not sure it’s the right way though.
That’s an easy change but we need to think about templated files like that and see how other SSGs handle it. Right now the templates dir doesn’t map to an output file so it shouldn’t be in the same folder but even then, I’m still wondering if that’s the right approach. We could eg generate a hash for the current content and tell users to assign it to window in their base template and then they can use that variable for cachebusting in JS without adding any new concepts in Zola.
We can template it but not right now, since Zola is only loading .html files.
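The suggestion of assigning a content hash to window could look roughly like the sketch below on the JS side. Everything named here is an assumption for illustration: the variable window.searchIndexHash, the helper withCacheBust, and the h query parameter are not existing Zola features.

```javascript
// Hypothetical: the base template would contain something like
//   <script>window.searchIndexHash = "<hash of current content>";</script>
// and the search script would append that value to the index URL.
function withCacheBust(url, hash) {
  // Without a hash, leave the URL untouched.
  if (!hash) return url;
  const sep = url.includes("?") ? "&" : "?";
  return `${url}${sep}h=${encodeURIComponent(hash)}`;
}

if (typeof window !== "undefined") {
  const url = withCacheBust("search_index.en.json", window.searchIndexHash);
  // fetch(url) would go here in the real search script.
}
```

Because the hash only changes when the content changes, repeat visitors keep their cached index until a new build produces a different value.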
Looking at the proposed solution, I think it’s almost better to have people copy the hash manually tbh. An alternative would be to always cachebust by appending a random string as a query param to the url loading the index but that’s not super efficient. In practice the index is big enough that you probably only want to use it for small enough sites where the index would not be too big.
I am motivated to work on this, just waiting for a little free time. Hopefully within the next week or two.