common: regression: massive dependency tree on import
Since #242, importing github.com/prometheus/common pulls in 7 million lines of code. Before that, only 1.5 MLOC were imported.
This includes:
- aws-sdk-go and aws-sdk-go-v2, and google.golang.org/api at 1M each
- the entirety of Envoy, Consul, Etcd, and Nats
- many, many more
The root cause is ultimately github.com/go-kit/kit, which has recently started importing a large number of dependencies. Note that client_golang has not updated yet, so the ecosystem is likely not yet fully affected unless projects update prometheus/common directly.
We use it for only ~100 lines of logging code, so we could almost replace it entirely. However, github.com/prometheus/common actually depends on older version(s) of itself due to circular imports. The circular import can be removed if the github.com/mwitkow/go-conntrack dependency is dropped. go-conntrack offers tracing and monitoring (using Prometheus), but we use only the tracing. Therefore, we could remove this circular import with a fork of go-conntrack if desired.
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 38 (19 by maintainers)
Commits related to this issue
- Update dependencies and drop large go-kit update Pulling in the same changes as done in https://github.com/prometheus/common/issues/255 — committed to howardjohn/statsd_exporter by howardjohn 3 years ago
- Update dependencies and drop large go-kit update Pulling in the same changes as done in https://github.com/prometheus/common/issues/255 Signed-off-by: John Howard <howardjohn@google.com> — committed to howardjohn/statsd_exporter by howardjohn 3 years ago
- Replace go-kit/kit with go-kit/log github.com/go-kit/kit is only being used for the log packages and brings in a lot of dependencies as compared to github.com/go-kit/log. This would help determine ho... — committed to rnaveiras/postgres_exporter by rnaveiras 2 years ago
released in v0.26.0
It might not be definitive, but additions to go.sum are often a sign the overall dependency tree is growing.

go mod graph is the actual tree of all transitive dependencies and versions, and shows everything the go command considers when deciding the minimum required version of transitive dependencies. That includes the dependencies/versions captured in this update.

Even if it doesn’t result in vendored/linked code, a large dependency tree greatly increases the likelihood of incompatible diamond dependencies and update gridlock, which is why we work to avoid large numbers of expressed-but-unused deps.
I disagree with this. A Prometheus import is present in virtually every Go project’s dependency tree. Once this propagates to a few key packages (prometheus/client_golang being an obvious one; other common targets like grpc will follow), I don’t see how it can be prevented: someone will always be importing the bloated version.
I think it’s fair to say that there is a difference between a direct dependency (i.e., it shows up in vendor), an indirect dependency (it shows up in go.sum, must be downloaded, etc.), and no dependency.
Claiming that an indirect dependency and no dependency are close enough that importing a ton of bloat doesn’t really matter is reasonable for certain projects, I think, and ultimately this is the Prometheus project’s decision to make for its own projects. However, I encourage you to consider that by making this choice you are essentially forcing this decision on the majority of the Go ecosystem. No project will be able to avoid downloading GBs of dependencies if it depends on any project that imports Prometheus.
Personally, I don’t think that is the correct decision, and will be very disappointed to bring in even more dependencies next time we update our Prometheus dependency.
I am not sure it’s as simple as that. Anyone using client_golang is also using common, directly or indirectly. So this doesn’t just affect Prometheus libraries importing it directly; it also affects anyone importing any Prometheus library (and then anyone importing a library that imports Prometheus, and so on).

I am concerned that once this propagates to more and more downstream dependencies, the entire Go ecosystem will end up importing this bloat. A huge number of libraries import Prometheus, so if any single module in a dependency chain imports a Prometheus version with this dependency on go-kit v0.10, we will get all of these dependencies as well.
@gouthamve go-kit in particular drags in a lot of unnecessary dependencies since a go.mod was added to their repo. This repo relies only on go-kit/log, but that pulls a bucket-load of entries into this repo’s go.sum.

Oh, yes, see the Kubernetes PR above for what we end up seeing. This is bad! Please help fix!
This right here is my main issue with huge monorepos or dependency-thirsty projects.
You can still have a single git repo codebase for multiple modules in Go.
I’ve also seen pgx (jackc/pgx#977) use go-kit for the logging bits of its code. pgx is in turn used by GORM, so dependency hell can easily propagate.
Everything below is just a rant
It is a shame not to see more developers take managing their dependencies seriously. Unneeded dependencies add bloat and risk. What happens when an Nth-degree, random dependency is removed and your build pipelines start failing? How long will it take for the fix to propagate? Can you maintain or replace a dependency if it is no longer maintained? If the code you need from a monolithic repo is a manageable amount and not likely to need updates, just copy it into your codebase.
@dims
I don’t think we’re differing, I also want small dep graphs! 😃
—
@liggitt
Sometimes yes, sometimes no: it’s possible for go.sum to grow while the dependency graph shrinks. Just ignore it. Really. go mod graph is a much better proxy.

We’ve extracted https://github.com/go-kit/log and will make it usable in the short term; I’ll ping here when it’s ready.
This isn’t relevant — go.sum is an essentially append-only log of checksums required to verify the integrity of a build. It has no meaningful correlation to the number of dependencies your module actually uses, nor the size of the ultimate artifact. There is almost no reason for anyone to care about the contents of the go.sum file.
That being said, go mod download is still a pain… (hopefully until lazy module loading lands).

Just a note to correct an unfortunately extremely common misconception: go.sum is a mostly-append-only log of checksums. It exists as a safety mechanism to guard against module corruption and malicious intermediaries. It has very little correlation to the dependency graph of your module, and shouldn’t be used as a proxy for dependency graph size or complexity. It should be checked in and otherwise ignored in almost all cases.

This will be fixed upstream in Go 1.17: https://github.com/golang/go/issues/36460
I’m a newcomer to Go, so unfortunately I can’t offer any solutions, but I wanted to note that an exploded dependency tree is also problematic in organizations where all direct and indirect dependencies must go through open source license review. So far in my review, all of the dependencies have permissive licenses, but it’s a lot to ask of our lawyers.
The important metric for me is the number of dependencies (list_dependencies is an internal tool to list all direct and indirect dependencies of a module):

I understand the nuance of a dependency being in go.sum versus actually being compiled, but it doesn’t make a difference for license review (since you’re just one import away from using the code once it’s in go.sum).

One of the major issues with Go modules is that even if a dependency is not part of the final artifact, Go will still download it (or parts of it, depending on whether it has a go.mod file). This is particularly annoying if, for example, you run go mod download in a Dockerfile to populate the cache: it will then download everything.
I agree that go-kit should probably improve the situation (by splitting up the monorepo or by introducing submodules)
And if that is really an issue, we should keep the vendor directories in our projects, as that spares users from having to fetch them into their cache.
While this is true, it is not irreversible. Also, technically the bloat does not matter: if you don’t import a package that imports it, it will NOT end up as a dependency of your module.
I don’t want to speculate; it would be nice to see concrete examples of other problems this situation can cause.
As for the solution, it’s not that easy to remove the https://github.com/go-kit/kit/tree/master/log dependency by forking, etc. The reason is that everyone depends on go-kit anyway, so there will always be some helper or similar using it. It would need a bigger upstream change. Can we at least ask go-kit to move log into a separate submodule first? They probably won’t accept it, but it will at least highlight the problem on their side. There is also a roadmap, https://github.com/go-kit/kit/issues/843, which says they plan to clean up the packages a bit (dropping unnecessary ones).