yq: Converting YAML to JSON is super slow

Describe the bug

When converting a 5000-line YAML file to JSON using yq:

time docker run \
        --rm \
        -u $(id -ru):$(id -rg) \
        -v $(pwd):/shared \
        -w /shared \
        mikefarah/yq yq r --tojson swagger.yml

It takes 2 minutes:

...
docker run --rm -u $(id -ru):$(id -rg) -v $(pwd):/shared -w /shared  yq r    0,07s user 0,03s system 0% cpu 2:18,42 total

Similarly, a 500-line file takes around 5s, which is itself very slow.

Input Yaml

5000 lines of YAML, not pasting that here.

Command

yq r --tojson swagger.yml

Actual behavior

Super slow

Expected behavior

It should take under a second, which is how long a simple Python program takes.
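For comparison, a baseline along these lines can be sketched in Python. This is only an illustration, not the program the reporter used: it assumes the third-party PyYAML package is installed, and the function name `yaml_to_json` is made up for this example.

```python
import json
import sys
import time

import yaml  # PyYAML, third-party (assumed installed, e.g. via pip install pyyaml)


def yaml_to_json(text: str) -> str:
    """Parse one YAML document and return it as indented JSON."""
    # safe_load resolves aliases while parsing, so no separate expansion pass is needed.
    data = yaml.safe_load(text)
    # default=str is a fallback for values JSON cannot represent, such as dates.
    return json.dumps(data, indent=2, default=str)


def main(path: str) -> None:
    """Convert the file at `path` and report the elapsed time on stderr."""
    with open(path, encoding="utf-8") as handle:
        source = handle.read()
    start = time.perf_counter()
    converted = yaml_to_json(source)
    elapsed = time.perf_counter() - start
    print(converted)
    print(f"converted in {elapsed:.3f}s", file=sys.stderr)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as convert.py, it would be run as `python convert.py swagger.yml`, which gives a rough number to compare against the yq timing above.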

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 17 (7 by maintainers)

Most upvoted comments

@mikefarah I would love to take a look and do a PR. Before I begin, is there some architectural change that explains why 2.4.x is fast and 3.x is slow?

+1:

A yaml input file of mine is 827 lines long and makes heavy, multi-level use of anchors and aliases; exploding it into a 10294-line output file takes 10-12 minutes. In some situations, I can speed up development by exploding it into a json file and loading that using jq (which takes <1s), but obviously, whenever the original yaml file changes, I am stalled. The only workaround for now is to comment out the aliases that have not changed.

I do understand that anchor / alias support was implemented in a naive way in order to support them at all - and @mikefarah: I do sincerely appreciate the great work 👍 - anchors and aliases enable me to do fantastic stuff with yaml - but whatever makes this faster would bring a significant benefit to my work.

BTW: I know nothing about Go, but from what I’ve heard, it’s really good at parallelizing tasks (“co-routines” and what not ?!?); looking at top while the exploding is running shows yq using ~180% CPU (e.g. CPU usage: 25% user, 5% sys, 70% idle) on a mid-2015 MacBook Pro (Intel Core i7 Quad Core); wouldn’t this ideally max out my CPU ? (at - I don’t know - 400% / 95% user or so ?!?)

@PaulCharlton: Is there anything I can do to support you with the PR you mention ? I’m able and more than happy to test dev versions and report benchmarks - somewhat self-serving, would get me a chance to get GOing (jeez, my puns have seen better days…)

Ah yeah, this will be when I moved to 3.0. As part of handling yaml anchors on conversion to JSON I made it iterate through the yaml structure. Let me play around to see if I can do that better…or even skip it
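
To illustrate how walking a structure to expand aliases can get expensive, here is a minimal Python sketch. It is not yq's actual code, and it is not a diagnosis of this specific slowdown. The `ANCHORS` dict and both function names are hypothetical: each anchor maps to its children, and a child starting with `*` stands for an alias to another anchor.

```python
from functools import lru_cache

# Hypothetical model of a YAML document with nested aliases.
# A child starting with "*" is an alias; anything else is a plain leaf.
ANCHORS = {
    "a0": ["leaf", "leaf"],
    "a1": ["*a0", "*a0"],
    "a2": ["*a1", "*a1"],
    "a3": ["*a2", "*a2"],
}


def expanded_size_naive(name, anchors=ANCHORS):
    """Count leaves by re-walking an anchor every time it is referenced."""
    total = 0
    for child in anchors[name]:
        if child.startswith("*"):
            total += expanded_size_naive(child[1:], anchors)
        else:
            total += 1
    return total


def expanded_size_memo(name, anchors=ANCHORS):
    """Count the same leaves, computing each anchor's size only once."""

    @lru_cache(maxsize=None)
    def size(anchor):
        return sum(
            size(child[1:]) if child.startswith("*") else 1
            for child in anchors[anchor]
        )

    return size(name)
```

In this toy model each extra level of aliasing doubles the expanded size, so the naive walk does work proportional to the fully expanded tree, while the memoized version visits each anchor once. Writing the expanded output still costs time proportional to its size, so caching helps only with the traversal, not with the final serialization.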