yq: Converting YAML to JSON is super slow
Describe the bug
When converting a 5000-line YAML file to JSON using yq:
time docker run \
--rm \
-u $(id -ru):$(id -rg) \
-v $(pwd):/shared \
-w /shared \
mikefarah/yq yq r --tojson swagger.yml
It takes 2 minutes:
...
docker run --rm -u $(id -ru):$(id -rg) -v $(pwd):/shared -w /shared yq r 0,07s user 0,03s system 0% cpu 2:18,42 total
Similarly, a 500-line file takes around 5s, which is itself super slow.
Input Yaml
5000 lines of YAML, not pasting that here.
Command
yq r --tojson swagger.yml
Actual behavior
Super slow: about 2 minutes 18 seconds for the 5000-line file.
Expected behavior
It should take under a second, which is how long a simple Python program takes.
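For reference, the kind of simple Python program meant here could look like the sketch below. It assumes the third-party PyYAML package is installed (the report does not say which library was actually used), and the function names to_json and convert are made up for this illustration.

```python
import json
import sys


def to_json(data):
    """Serialize already-parsed YAML data (dicts, lists, scalars) as JSON text."""
    # default=str covers YAML timestamps, which json cannot encode natively.
    return json.dumps(data, indent=2, default=str)


def convert(path):
    """Parse a YAML file and return its JSON text.

    Assumption: PyYAML is installed. It is imported lazily so that the rest
    of this module can be loaded without it.
    """
    import yaml  # third-party: pip install pyyaml

    with open(path, encoding="utf-8") as handle:
        return to_json(yaml.safe_load(handle))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python yaml2json.py swagger.yml
    print(convert(sys.argv[1]))
```

Timing this against the same swagger.yml gives a rough baseline to compare the yq numbers with.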
About this issue
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 17 (7 by maintainers)
@mikefarah I would love to take a look and open a PR. Before I begin, is there an architectural change that explains why 2.4.x is fast and 3.x is slow?
+1:
A yaml input file of mine is 827 lines long and makes heavy, multi-level use of anchors and aliases; exploding it into a 10294-line output file takes 10-12 minutes. In some situations, I can speed up development by exploding it into a json file and loading that using jq (which takes <1s), but obviously, whenever the original yaml file changes, I am stalled. The only workaround for now is to comment out the aliases that have not changed.
I do understand that anchor / alias support was implemented in a naive way in order to support them at all - and @mikefarah: I do sincerely appreciate the great work 👍 - anchors and aliases enable me to do fantastic stuff with yaml - but whatever makes this faster would bring a significant benefit to my work.
BTW: I know nothing about Go, but from what I’ve heard, it’s really good at parallelizing tasks (“co-routines” and what not ?!?); looking at top while the exploding is running shows yq using ~180% CPU (e.g. CPU usage: 25% user, 5% sys, 70% idle) on a mid-2015 MacBook Pro (Intel Core i7 Quad Core); wouldn’t this ideally max out my CPU ? (at - I don’t know - 400% / 95% user or so ?!?)
@PaulCharlton: Is there anything I can do to support you with the PR you mention ? I’m able and more than happy to test dev versions and report benchmarks - somewhat self-serving, would get me a chance to get GOing (jeez, my puns have seen better days…)
Ah yeah, this started when I moved to 3.0. As part of handling yaml anchors on conversion to JSON, I made it iterate through the yaml structure. Let me play around to see if I can do that better… or even skip it.
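To illustrate why walking the structure to resolve anchors can get expensive, here is a toy Python model. It is not yq's code, and the actual cause in yq 3.x may be different; build_chain, count_naive, and count_memoized are hypothetical names made up for this sketch. An alias is modeled as a second reference to the same dict.

```python
def build_chain(depth):
    """Build a toy document where each level refers to the level below twice,
    the way an anchor plus one alias to it would."""
    node = {"leaf": 1}
    for _ in range(depth):
        node = {"left": node, "right": node}  # same object twice = anchor + alias
    return node


def count_naive(node):
    """Count node visits when every alias is walked again in full."""
    if not isinstance(node, dict):
        return 1
    return 1 + sum(count_naive(child) for child in node.values())


def count_memoized(node, seen=None):
    """Count node visits when each shared (anchored) node is walked only once."""
    if seen is None:
        seen = set()
    if not isinstance(node, dict):
        return 1
    if id(node) in seen:
        return 1  # already resolved earlier; count the reference, skip the walk
    seen.add(id(node))
    return 1 + sum(count_memoized(child, seen) for child in node.values())


if __name__ == "__main__":
    doc = build_chain(10)
    print("naive visits:   ", count_naive(doc))
    print("memoized visits:", count_memoized(doc))
```

With 10 levels of nesting, the naive walk visits 3071 nodes while the memoized walk visits 22, so multi-level anchors and aliases can make a full re-walk grow exponentially even for a small input file.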