parcel: File hash does not change after its content updates
This is a 🙋 feature request.
🤔 Expected Behavior
An output file hash should change when its content updates.
😯 Current Behavior
File hash does not change.
🔦 Context
```html
<!-- index.html -->
<html>
<body>
  <script src="./main.js"></script>
</body>
</html>
```
```js
// main.js
console.log(1);
```
Building produces `b695675d84099f097ec37d68c8c83fce.js`:

```sh
parcel build --no-cache --no-minify index.html
```
Then change `main.js`:

```js
// main.js
console.log(2);
```

Build again:

```sh
parcel build --no-cache --no-minify index.html
```
The JavaScript file name is still `b695675d84099f097ec37d68c8c83fce.js`. I am not sure whether this is the expected behavior. However, when I use webpack, the output file hash changes every time its content updates.
🌍 Your Environment
| Software | Version(s) |
| --- | --- |
| Parcel | v1.1.0 |
| Node | v8.9.1 |
| npm/Yarn | yarn v1.3.2 |
| Operating System | macOS High Sierra 10.13.1 |
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 22
- Comments: 40 (14 by maintainers)
@davidnagli Could you reopen this issue please? IMHO, it's okay not to change the hash during development builds. But for production builds, if the hash doesn't change even when the content changes, then `Cache-Control` and `ETag` headers cannot be used effectively. In my case, I put `react.js`, `react-dom.js`, etc. into a separate `vendor.js` bundle, which rarely changes, so I set it to cache for 1 year. If I then add one or two more libs, I can't bust the cache, because the hash never changes and the browser thinks "I already have this file, no need to ask the server again" 😦

To be honest, I'd much rather have a slow build where hashing was content based than have users re-download assets when they shouldn't need to. What about a flag to the CLI?
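For context on the caching argument above, here is roughly how hashed assets are usually served. This is a minimal sketch assuming an Express static server in front of Parcel's `dist` output; the server setup is not part of the original report.

```js
const express = require('express');
const app = express();

app.use(express.static('dist', {
  setHeaders(res, filePath) {
    if (filePath.endsWith('.html')) {
      // The entry HTML references the hashed bundles, so it must revalidate.
      res.setHeader('Cache-Control', 'no-cache');
    } else {
      // A content-hashed bundle never changes under the same name:
      // safe to cache for a year and mark immutable.
      res.setHeader('Cache-Control', 'public, max-age=31536000, immutable');
    }
  },
}));

app.listen(3000);
```

This scheme only works if the filename really does change whenever the content changes, which is exactly what the issue is about.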
Should be solved by #1025 which generates content-hashed filenames for static assets. Please help test using the master branch - a release will hopefully come next week!
Here are some learnings from Assetgraph, where we solved the same issue.
You absolutely want to do content hashing so you can achieve deterministic, content-addressable file names that lend themselves well to far-future cache expiry. A random build-specific hash busts the cache too often. Query parameters aren't always treated correctly by proxies between the server and the client.
You do not, however, need to do content hashing many times. You can get away with doing it once, at the point where you know you are done making source code modifications and are ready to write out to disk.
The hash renaming must be done in a depth-first post-order graph traversal to ensure content hashes update all the way up to the entry points when deeply nested dependencies change. Any other traversal algorithm will result in caching errors.
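As a minimal, hypothetical illustration of content-addressable naming (not Parcel's actual code), the output name can be derived directly from the file's final bytes with Node's built-in `crypto` module; the helper name here is made up:

```js
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: derive a deterministic, content-addressable name from
// the final bytes of a file. The name changes if and only if the bytes do.
function contentHashedName(filePath) {
  const contents = fs.readFileSync(filePath);
  const hash = crypto.createHash('md5').update(contents).digest('hex');
  return hash + path.extname(filePath); // e.g. b695675d84099f097ec37d68c8c83fce.js
}

console.log(contentHashedName('dist/main.js'));
```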
Ok, I fixed my issue by doing `rm -rf .cache`. This might be another issue, but I'm reporting it here in case someone faces the same situation. I'll create the other one when I have more predictable results to share.

This should really be fixed IMHO; I just implemented Parcel in a project, and every time I make any change to JS or CSS I have to manually add a progressive number and change the reference in the HTML, otherwise when I deploy to production (which has browser caching and a CDN) the server won't give me the updated version of those files. In my opinion, the content checksum approach would be best.
This is a typical scenario that I think should be supported. I'm addressing this from a web performance/UX perspective rather than DX.
A scenario like this could save the client from re-downloading hundreds of KiB. If the whole release were versioned, everything would be cache busted.
@benhutton That is exactly the right algorithm, and you describe the correct reason for it.
This image always helps me visualise it best:
Traversal order: A C E D B H I G F
It’s still important to start at your entry point(s) and just remember to put the hashing logic after the child traversal. This is what we do in AssetGraph: https://github.com/assetgraph/assetgraph/blob/master/lib/AssetGraph.js#L445-L462
When you extend Parcel with multiple entry points, you probably want to keep track of seen assets to avoid double work as well.
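A rough sketch of that post-order renaming with a seen-set, under a made-up asset shape (`fileName`, `contents`, `dependencies`); this is not Parcel's or AssetGraph's actual data model, and it assumes an acyclic graph:

```js
const crypto = require('crypto');
const md5 = (s) => crypto.createHash('md5').update(s).digest('hex');

// Post-order renaming: hash and rename all children first, rewrite this
// asset's references to their final names, and only then hash this asset.
// `seen` avoids double work when several entry points share an asset.
// A real bundler would also skip renaming the entry points themselves
// (index.html keeps its name).
function hashRename(asset, seen = new Map()) {
  if (seen.has(asset)) return seen.get(asset);

  for (const dep of asset.dependencies) {
    const childName = hashRename(dep.asset, seen);
    // Rewrite the reference to the child's final, hashed name.
    asset.contents = asset.contents.split(dep.specifier).join(childName);
  }

  const ext = asset.fileName.slice(asset.fileName.lastIndexOf('.'));
  const newName = md5(asset.contents) + ext;
  asset.fileName = newName;
  seen.set(asset, newName);
  return newName;
}

// Usage with a tiny made-up graph:
const child = { fileName: 'main.js', contents: 'console.log(1);', dependencies: [] };
const entry = {
  fileName: 'index.html',
  contents: '<script src="./main.js"></script>',
  dependencies: [{ specifier: './main.js', asset: child }],
};
hashRename(entry);
console.log(child.fileName); // <md5 of final contents>.js
```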
Yes @devongovett, that's a good idea. But it would be extra configuration to achieve cache busting. I am happy with the current setup for now. But IMO, it's nice to have it in the core, so users get cache busting for free!
@DeMoorJasper I think that maybe we’re talking about the same thing? Only change things for production, and do it at the end.
I don’t think there is any way around doing a tree traversal, though. That is, I think that this algorithm will NOT work:
Instead, we need to do the tree traversal that @Munter described.
The idea is that when any given node changes, all the nodes above it will end up changing too as the references trickle up. And any nodes that are NOT affected will NOT change. So you are busting exactly the right caches at the right time.
Here’s the big principle: A file doesn’t get edited after it gets hashed. The hash is of the FINAL content of that file.
@DeMoorJasper does that make sense? @Munter am I describing the algorithm you had in mind accurately?
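A tiny, self-contained demonstration of that principle (the file contents here are made up to match the report above): because the parent is hashed only after its reference is rewritten to the child's hashed name, any change to the child necessarily changes the parent's final bytes, and therefore the parent's hash too.

```js
const crypto = require('crypto');
const md5 = (s) => crypto.createHash('md5').update(s).digest('hex');

// Two versions of the child, as in the original report.
const childV1 = 'console.log(1);';
const childV2 = 'console.log(2);';

// The child's name is the hash of its final contents.
const childNameV1 = `${md5(childV1)}.js`;
const childNameV2 = `${md5(childV2)}.js`;

// The parent is hashed only AFTER its reference is rewritten to the child's
// hashed name, so the parent's final bytes (and hash) change along with it.
const parentV1 = `<script src="./${childNameV1}"></script>`;
const parentV2 = `<script src="./${childNameV2}"></script>`;

console.log(childNameV1 !== childNameV2);      // true: the child was edited
console.log(md5(parentV1) !== md5(parentV2));  // true: the change trickled up
// An asset that does not (directly or transitively) reference the child
// keeps its old name, so only the right caches get busted.
```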
@devongovett what are your thoughts on this one? Is asset fingerprinting something that you agree should be baked into Parcel's core? Is it something we should figure out the correct hooks for, so it can be written as a plugin? Or is it something I should try to accomplish some other way, with a post-processor?
The mtime approach is undesirable, though, because it busts the cache for every asset every time. I'd rather stick to manually renaming assets in that case, because I don't want every user to re-download everything every time I deploy a website (possibly numerous times per day).
What about creating a random string (at build time, perhaps) and using it in the filename?

With a `buildstamp` of `Date.now().toString(36)` you'd get a cache-busted filename with every build, and the contents of the file would not need to be known before the filename.

(The buildstamp would be the same for every file in that build; the entry would not get stamped.)
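A minimal sketch of that build-stamp idea, assuming a made-up `stampedName` helper (this is not a Parcel API):

```js
// One stamp per build, shared by every output file and known before any
// file contents are.
const buildstamp = Date.now().toString(36);

function stampedName(fileName) {
  const dot = fileName.lastIndexOf('.');
  return `${fileName.slice(0, dot)}.${buildstamp}${fileName.slice(dot)}`;
}

console.log(stampedName('main.js')); // e.g. main.jd0abc12.js
// Per the comment above, the entry (index.html) would not get stamped.
```

The trade-off versus content hashing: every file's cache is busted on every build, even for files whose contents did not change.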
The filenames are not currently generated based on the hash of the contents. You could do something like versioning, e.g. http://mycdn.com/v1.0.0/somelib.js. When you publish a new version, the URL will change.