parcel: File hash does not change after its content updates
This is a 🙋 feature request.
🤔 Expected Behavior
An output file hash should change when its content updates.
😯 Current Behavior
File hash does not change.
🔦 Context
```html
<!-- index.html -->
<html>
<body>
  <script src="./main.js"></script>
</body>
</html>
```
```js
// main.js
console.log(1);
```
Building produces `b695675d84099f097ec37d68c8c83fce.js`:

```sh
parcel build --no-cache --no-minify index.html
```
Then change `main.js`:

```js
// main.js
console.log(2);
```

Build again:

```sh
parcel build --no-cache --no-minify index.html
```
The JavaScript file name is still `b695675d84099f097ec37d68c8c83fce.js`. I am not sure whether this is the expected behavior. However, when I use webpack, the output file hash changes every time its content updates.
🌍 Your Environment
| Software | Version(s) |
| --- | --- |
| Parcel | v1.1.0 |
| Node | v8.9.1 |
| npm/Yarn | yarn v1.3.2 |
| Operating System | macOS High Sierra 10.13.1 |
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 22
- Comments: 40 (14 by maintainers)
@davidnagli Could you reopen this issue please? IMHO, it's okay not to change the hash during development builds. But for production builds, if the hash doesn't change even when the content changes, then `Cache-Control` and `ETag` headers cannot be used effectively. In my case, I put `react.js`, `react-dom.js`, etc. into a separate `vendor.js` bundle, which rarely changes, so I set it to cache for 1 year. If I then add one or two more libs, I can't bust the cache, because the hash never changes and the browser thinks "I already have this file, no need to ask the server again" 😦

To be honest, I'd much rather have a slow build where hashing was content based than have users re-download assets when they shouldn't need to. What about a flag to the CLI?
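For context on the caching argument above, here is roughly how hashed assets are usually served. This is a minimal sketch assuming an Express static server in front of Parcel's `dist` output; the server setup is not part of the original report.

```js
const express = require('express');
const app = express();

app.use(express.static('dist', {
  setHeaders(res, filePath) {
    if (filePath.endsWith('.html')) {
      // The entry HTML references the hashed bundles, so it must revalidate.
      res.setHeader('Cache-Control', 'no-cache');
    } else {
      // A content-hashed bundle never changes under the same name:
      // safe to cache for a year and mark immutable.
      res.setHeader('Cache-Control', 'public, max-age=31536000, immutable');
    }
  },
}));

app.listen(3000);
```

This scheme only works if the filename really does change whenever the content changes, which is exactly what the issue is about.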
Should be solved by #1025 which generates content-hashed filenames for static assets. Please help test using the master branch - a release will hopefully come next week!
Here are some learnings from Assetgraph, where we solved the same issue.
You absolutely want to do content hashing so you can achieve deterministic, content-addressable file names that lend themselves well to far-future cache expiry. A random build-specific hash busts the cache too often. Query parameters aren't always treated correctly by proxies between the server and the client.
You do not, however, need to do content hashing many times. You can get away with doing it once, at the point where you know you are done making source code modifications and are ready to write out to disk.
The hash renaming must be done in a depth-first post-order graph traversal to ensure content hashes update all the way up to the entry points when deeply nested dependencies change. Any other traversal algorithm will result in caching errors.
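As a minimal, hypothetical illustration of content-addressable naming (not Parcel's actual code), the output name can be derived directly from the file's final bytes with Node's built-in `crypto` module; the helper name here is made up:

```js
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: derive a deterministic, content-addressable name from
// the final bytes of a file. The name changes if and only if the bytes do.
function contentHashedName(filePath) {
  const contents = fs.readFileSync(filePath);
  const hash = crypto.createHash('md5').update(contents).digest('hex');
  return hash + path.extname(filePath); // e.g. b695675d84099f097ec37d68c8c83fce.js
}

console.log(contentHashedName('dist/main.js'));
```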
Ok, I fixed my issue by doing `rm -rf .cache`. This might be another issue, but I'm reporting it here in case someone faces the same situation. I'll create the other one when I have more predictable results to share.

This should really be fixed IMHO; I just implemented Parcel in a project, and every time I make any change to JS or CSS I have to manually add a progressive number and change the reference in the HTML, otherwise when I deploy to production (which has browser caching and a CDN) the server won't give me the updated version of those files. In my opinion, the content checksum approach would be best.
This is a typical scenario that I think should be supported. I'm addressing this from a web performance/UX perspective rather than DX.
A scenario like this could save the client from re-downloading hundreds of KiB. If the whole release were versioned, everything would be cache busted.
@benhutton That is exactly the right algorithm, and you describe the correct reason for it.
This image always helps me visualise it best:
Traversal order: A C E D B H I G F
It’s still important to start at your entry point(s) and just remember to put the hashing logic after the child traversal. This is what we do in AssetGraph: https://github.com/assetgraph/assetgraph/blob/master/lib/AssetGraph.js#L445-L462
When you extend Parcel with multiple entry points, you probably want to keep track of seen assets to avoid double work as well.
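A rough sketch of that post-order renaming with a seen-set, under a made-up asset shape (`fileName`, `contents`, `dependencies`); this is not Parcel's or AssetGraph's actual data model, and it assumes an acyclic graph:

```js
const crypto = require('crypto');
const md5 = (s) => crypto.createHash('md5').update(s).digest('hex');

// Post-order renaming: hash and rename all children first, rewrite this
// asset's references to their final names, and only then hash this asset.
// `seen` avoids double work when several entry points share an asset.
// A real bundler would also skip renaming the entry points themselves
// (index.html keeps its name).
function hashRename(asset, seen = new Map()) {
  if (seen.has(asset)) return seen.get(asset);

  for (const dep of asset.dependencies) {
    const childName = hashRename(dep.asset, seen);
    // Rewrite the reference to the child's final, hashed name.
    asset.contents = asset.contents.split(dep.specifier).join(childName);
  }

  const ext = asset.fileName.slice(asset.fileName.lastIndexOf('.'));
  const newName = md5(asset.contents) + ext;
  asset.fileName = newName;
  seen.set(asset, newName);
  return newName;
}

// Usage with a tiny made-up graph:
const child = { fileName: 'main.js', contents: 'console.log(1);', dependencies: [] };
const entry = {
  fileName: 'index.html',
  contents: '<script src="./main.js"></script>',
  dependencies: [{ specifier: './main.js', asset: child }],
};
hashRename(entry);
console.log(child.fileName); // <md5 of final contents>.js
```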
Yes @devongovett, that's a good idea. But it would be extra configuration to achieve cache busting. I am happy with the current setup for now. But IMO, it's nice to have it in the core, so users get cache busting for free!
@DeMoorJasper I think that maybe we’re talking about the same thing? Only change things for production, and do it at the end.
I don’t think there is any way around doing a tree traversal, though. That is, I think that this algorithm will NOT work:
Instead, we need to do the tree traversal that @Munter described.
The idea is that when any given node changes, all the nodes above it will end up changing too as the references trickle up. And any nodes that are NOT affected will NOT change. So you are busting exactly the right caches at the right time.
Here’s the big principle: A file doesn’t get edited after it gets hashed. The hash is of the FINAL content of that file.
@DeMoorJasper does that make sense? @Munter am I describing the algorithm you had in mind accurately?
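A tiny, self-contained demonstration of that principle (the file contents here are made up to match the report above): because the parent is hashed only after its reference is rewritten to the child's hashed name, any change to the child necessarily changes the parent's final bytes, and therefore the parent's hash too.

```js
const crypto = require('crypto');
const md5 = (s) => crypto.createHash('md5').update(s).digest('hex');

// Two versions of the child, as in the original report.
const childV1 = 'console.log(1);';
const childV2 = 'console.log(2);';

// The child's name is the hash of its final contents.
const childNameV1 = `${md5(childV1)}.js`;
const childNameV2 = `${md5(childV2)}.js`;

// The parent is hashed only AFTER its reference is rewritten to the child's
// hashed name, so the parent's final bytes (and hash) change along with it.
const parentV1 = `<script src="./${childNameV1}"></script>`;
const parentV2 = `<script src="./${childNameV2}"></script>`;

console.log(childNameV1 !== childNameV2);      // true: the child was edited
console.log(md5(parentV1) !== md5(parentV2));  // true: the change trickled up
// An asset that does not (directly or transitively) reference the child
// keeps its old name, so only the right caches get busted.
```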
@devongovett what are your thoughts on this one? Is asset fingerprinting something that you agree should be baked into Parcel's core? Is it something we should figure out the correct hooks for, so it can be written as a plugin? Or is it something I should try to accomplish some other way, with a post-processor?
The mtime approach is undesirable, though, because it busts the cache for every asset every time. I'd rather stick to manually renaming assets in that case, because I don't want every user to re-download everything every time I deploy a website (possibly numerous times per day).
What about creating a random string (at build time, perhaps) and using it in the filename?

With a `buildstamp` of `Date.now().toString(36)` you'd get a cache-busted filename with every build, and the contents of the file would not need to be known before the filename.

(The buildstamp would be the same for every file in that build; the entry would not get stamped.)
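A minimal sketch of that build-stamp idea, assuming a made-up `stampedName` helper (this is not a Parcel API):

```js
// One stamp per build, shared by every output file and known before any
// file contents are.
const buildstamp = Date.now().toString(36);

function stampedName(fileName) {
  const dot = fileName.lastIndexOf('.');
  return `${fileName.slice(0, dot)}.${buildstamp}${fileName.slice(dot)}`;
}

console.log(stampedName('main.js')); // e.g. main.jd0abc12.js
// Per the comment above, the entry (index.html) would not get stamped.
```

The trade-off versus content hashing: every file's cache is busted on every build, even for files whose contents did not change.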
The filenames are not currently generated based on the hash of the contents. You could do something like versioning, e.g. http://mycdn.com/v1.0.0/somelib.js. When you publish a new version, the URL will change.