dbt-docs: Slow response times interacting with data lineage chart on large projects

Describe the bug

Users open the data lineage graph by clicking the button at the bottom-right corner of the page. The graph takes 2-3 seconds to load, and the slowness persists after the initial load: most interactions take 2-3 seconds to complete when several nodes are selected in the lineage graph. Response times are also slow when using selectors; the page becomes briefly unresponsive and keystrokes don’t appear in the text field immediately.

Steps To Reproduce

Serve the data catalog for a large dbt project with relatively large manifest.json and catalog.json files; in my example, 300+ models and 1800+ tests generate a 6.2 MB manifest.json and a 1.2 MB catalog.json.

Click the “data lineage chart” button on the bottom-right corner of the page.

See profiling output below for benchmark.
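To report comparable project-size figures when reproducing this, a small helper along these lines may be useful. It is only a sketch: the function name `summarize_manifest` is made up for illustration, and it assumes the manifest has a top-level `nodes` mapping whose entries carry a `resource_type` field.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_manifest(path):
    """Return (size in MB, Counter of resource types) for a manifest.json file.

    Assumption: entries under the top-level "nodes" key each have a
    "resource_type" value such as "model" or "test".
    """
    path = Path(path)
    size_mb = path.stat().st_size / 1_000_000
    manifest = json.loads(path.read_text(encoding="utf-8"))
    counts = Counter(
        node.get("resource_type", "unknown")
        for node in manifest.get("nodes", {}).values()
    )
    return size_mb, counts
```

Calling `summarize_manifest("target/manifest.json")` after `dbt docs generate` would give the file size and per-type counts to quote alongside a profile.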

Expected behavior

The data lineage chart should load in under 500 ms (or some other arbitrary threshold determined by users’ tolerance).

Screenshots and log output

[Screenshot: the “View Lineage Chart” button]

[Screenshot: my profiling output]

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.18.0
   latest version: 0.18.1

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery: 0.18.0
  - snowflake: 0.18.0
  - redshift: 0.18.0
  - postgres: 0.18.0

The operating system you’re using:

The data catalog is served with the base Docker image library/nginx:1.19.0-alpine.

The documentation is generated with the base Docker image library/python:3.7.7-slim-buster.

The output of python --version: 3.7.7

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

Hey @ajbosco - sorry for the delay! It’s ready. @AlexanderKutz is going to work on publishing it over the next couple days.

I’m not sure where this issue currently stands but @vogt4nick, @drewbanin and @jtcohen6 - we (kraftheinz engineering) are just about wrapped up with converting dbt docs over to React. One of the first things we did was compress the manifest.json file during the build, which solved all of the performance issues. For context, we currently have over 115 projects and the site loads in less than 5 seconds after that change.

We will have a repo to share with everyone once we are done packaging it up for public consumption.
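For readers wondering what a build-time compression step like the one described above could look like, here is a minimal sketch. It is not the commenter's implementation, which was not shown in this thread; the helper name `gzip_file` is hypothetical, and it assumes the web server is configured to serve the precompressed file with a gzip content encoding.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None):
    """Write a gzip-compressed copy of src and return the new path.

    By default the copy sits next to the original with ".gz" appended,
    e.g. manifest.json -> manifest.json.gz.
    """
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_name(src.name + ".gz")
    # Stream the file through gzip so large manifests are not held in memory twice.
    with src.open("rb") as f_in, gzip.open(dst, "wb", compresslevel=9) as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

JSON with many repeated keys usually compresses well, so this mainly cuts transfer time; it does not by itself change how long the browser spends parsing the manifest or building the graph.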

Hello @ajbosco and anyone else interested in checking it out, I have published the React version here

Big +1 on this one! The website loads the manifest.json file over the network, so I don’t believe that any fs/streaming solution is going to be appropriate here. I do think that the way we’re loading/creating nodes for the DAG is synchronous and “blocking” in the frontend, and it makes the website pretty unusable until the entire DAG is loaded!

I think we should:

  1. Relocate this issue to the dbt-docs repo
  2. Experiment with different approaches that, e.g., let us very quickly render the chrome/nav for the website (+ search, for instance) and then more asynchronously and progressively build out the DAG viz

@jtcohen6 you buy it?
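The second suggestion, building the DAG progressively instead of in one blocking pass, can be illustrated with chunked processing. The dbt-docs frontend is JavaScript, so the Python/asyncio sketch below only demonstrates the pattern, not actual dbt-docs code. The function name, `chunk_size`, and the `on_progress` callback are all hypothetical; the sketch assumes each node lists its parents under `depends_on.nodes`.

```python
import asyncio


async def build_graph_in_chunks(nodes, chunk_size=100, on_progress=None):
    """Build a node -> parents map in batches, yielding control between batches.

    Assumption: `nodes` maps node ids to dicts with a "depends_on" -> "nodes" list.
    """
    graph = {}
    items = list(nodes.items())
    total = len(items)
    for start in range(0, total, chunk_size):
        for node_id, node in items[start:start + chunk_size]:
            graph[node_id] = list(node.get("depends_on", {}).get("nodes", []))
        if on_progress is not None:
            # Report (nodes processed so far, total) so a UI could render partial results.
            on_progress(min(start + chunk_size, total), total)
        # Hand control back to the event loop so other tasks (e.g. input handling) can run.
        await asyncio.sleep(0)
    return graph
```

In a browser, the same idea would mean processing a batch of nodes and then scheduling the next batch, so that navigation and search stay responsive while the graph fills in.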

+1 from a dbt Cloud user - the slow response time of the data lineage chart for their larger project made this feature relatively unusable. Any time they tried to move or modify the chart, it lagged and took a few seconds to load correctly again.

Agreed, it is very slow and causes a really poor user experience.

Would love to see this move forward! We’re evangelizing dbt at the org and the docs are a large part of that.

We are very close to publishing the code but would be open to chatting directly prior to (or after).

The site itself looks exactly the same (for the most part). We are using the manifest.json but do not use the catalog.json or run_results.json, because we have also integrated it directly with Snowflake via a backend API. We also had some issues with the catalog.json due to the way we have implemented dbt across all of our projects.

The biggest change was a massive revamp to the lineage graph - this is all done now using HTML Canvas.

As for upstreaming the changes, there would be some work required to have it act just like the existing version. We have not gone through the process to allow it to be part of the dbt docs generate process - this is something we are definitely open to but would need some guidance on. And obviously there would need to be a version that does include the use of run_results and catalog. These are all things we agree with and would be open to helping out on but as of now can’t prioritize it due to a lack of time.

FYI on what a full rewrite can look and feel like for dbt docs: https://dagster.io/blog/dbt-docs-on-react

+1. The page crashes with our large dbt repo when interacting with the lineage chart. Here are some numbers for reference.

Found 1042 models, 1071 tests, 0 snapshots, 0 analyses, 493 macros, 0 operations, 29 seed files, 650 sources, 10 exposures, 0 metrics
