dbt-core: Parse docs blocks when they appear in .sql files

Feature

Feature description

It’s my belief that documentation being in a separate file from the code it’s documenting will result in the documentation falling behind model development in all but the most disciplined organizations. Therefore, I believe that {% docs %} blocks should be parsed from .sql files in addition to .md files.

Granted, syntax highlighting is unlikely to be happy about the presence of markdown in a .sql file, but this is something we already deal with, having Jinja in .sql files. In the same way that we can nest a SQL comment within a Jinja comment, i.e. {# /* comment fo' reals, yo! */ #} so that both SQL and Jinja syntax highlighters will ignore the comment, users could conceivably wrap a docs block in a SQL block comment to the same effect, e.g.:

/* {% docs foo %}
## Bar!
{% enddocs %} */
select ...
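
To make the request concrete, here is a minimal sketch of pulling docs blocks out of the text of a .sql file, including one wrapped in a SQL block comment as above. It is not dbt's actual parser: the `DOCS_BLOCK` pattern and the `extract_docs_blocks` helper are hypothetical names, and a plain regular expression stands in for real Jinja parsing.

```python
import re

# Hypothetical pattern (not dbt's implementation): matches
# {% docs name %} ... {% enddocs %}, with optional whitespace-control dashes.
DOCS_BLOCK = re.compile(
    r"\{%-?\s*docs\s+(?P<name>[A-Za-z_]\w*)\s*-?%\}"
    r"(?P<body>.*?)"
    r"\{%-?\s*enddocs\s*-?%\}",
    re.DOTALL,
)


def extract_docs_blocks(sql_text: str) -> dict[str, str]:
    """Return {block_name: markdown_body} for every docs block in the text."""
    return {
        match.group("name"): match.group("body").strip()
        for match in DOCS_BLOCK.finditer(sql_text)
    }


# The surrounding /* ... */ is invisible to the pattern, so the block is
# found whether or not it sits inside a SQL comment.
model_sql = """/* {% docs foo %}
## Bar!
{% enddocs %} */
select 1 as id
"""

print(extract_docs_blocks(model_sql))  # {'foo': '## Bar!'}
```

Because the scan only looks for the docs tags, it never has to evaluate the other Jinja in the model (e.g. `ref`), which is the part a docs-only parser would not have available.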

Who will this benefit?

This would benefit people who wish to include documentation in the model itself, and would not negatively impact people who choose to document in the .yml file, or people who choose not to document at all (the heathens!)

About this issue

State: closed
Created 6 years ago
Reactions: 31
Comments: 15 (4 by maintainers)

Commits related to this issue

Updates DocumentParser to look for docs in .sql files Since the .sql files have macros that aren't available to the docs parser (e.g., `ref`), we can instead get the AST for the docs macros and direc... — committed to jakebiesinger/dbt-core by deleted user 6 years ago

Most upvoted comments

We are also using dbt more and more, and the project appeals to me in many respects. The documentation is very clear and the API is very straightforward. However, one point that bothers me as well is the artificial separation of model definition and SQL. I likewise think that the schema .yml files are not really necessary.

Background:

Currently we are thinking about splitting our schema .yml into several .yml files, one per model. A single .yml file quickly becomes confusing.

Our folder structure would then look like this.


models
 - model_a.yml 
 - model_a.sql

 - model_b.yml 
 - model_b.sql

This would make the structure clearer with the current tool, but it would require a lot of typing. With the approach described in this issue, neither the model name nor the schema .yml structure would be necessary.

Proposal

Here are my 3 cents on documentation in the .sql file.

The first anonymous docs block found describes the documentation of the model associated with the file:

Content of model_a.sql:

{% docs %}
This model is sooo slick. Learn more [here](example.com)
{% enddocs %}

SELECT
  {% docs .foo %}
    The number of `Foo`s we should create.
  {% enddocs %} 
  1 AS foo,
  2 AS bar

The parser could automatically enrich the anonymous docs block with the name of the file it was found in, i.e. the model name. In this case it would become “docs model_a”.

{% docs %}
This model is sooo slick. Learn more [here](example.com)
{% enddocs %}

Inline documentation of columns could be implemented using a leading “.” :

  {% docs .foo %}
    The number of `Foo`s we should create.
  {% enddocs %}

Again, the parser could infer the model name: {% docs model_a.foo %}
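
The naming rule proposed here can be sketched roughly as follows. This is only an illustration: `DOCS_OPEN` and `qualified_docs_names` are hypothetical, anonymous and dot-prefixed block names are not part of dbt today, and the model name is assumed to be the file name without its extension.

```python
import re
from pathlib import Path

# Hypothetical pattern: the block name is optional, and a leading "."
# marks a column of the model defined in this file.
DOCS_OPEN = re.compile(
    r"\{%-?\s*docs(?:\s+(?P<name>\.?[A-Za-z_][\w.]*))?\s*-?%\}"
)


def qualified_docs_names(sql_path: str, sql_text: str) -> list[str]:
    """Expand anonymous and dot-prefixed docs names using the model name."""
    model = Path(sql_path).stem  # assumption: model name == file stem
    names = []
    for match in DOCS_OPEN.finditer(sql_text):
        name = match.group("name")
        if name is None:
            names.append(model)          # {% docs %}      -> model_a
        elif name.startswith("."):
            names.append(model + name)   # {% docs .foo %} -> model_a.foo
        else:
            names.append(name)           # explicit names stay untouched
    return names


model_a_sql = """{% docs %}
This model is sooo slick. Learn more [here](example.com)
{% enddocs %}

SELECT
  {% docs .foo %}
    The number of `Foo`s we should create.
  {% enddocs %}
  1 AS foo,
  2 AS bar
"""

print(qualified_docs_names("models/model_a.sql", model_a_sql))
# ['model_a', 'model_a.foo']
```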

The schema.yml handling in dbt docs could then default to “description: '{{ doc("model_name") }}'” if no model description was explicitly specified. Analogously, columns could default to “description: '{{ doc("model_name.columnName") }}'”.
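
A rough sketch of that defaulting behavior, under the assumption that the docs blocks have already been collected into a dictionary keyed by qualified name. The helper `apply_default_descriptions` and the dictionary shapes are hypothetical, not dbt internals; explicit descriptions always win, which is what keeps the idea backwards compatible.

```python
def apply_default_descriptions(model: dict, docs: dict[str, str]) -> dict:
    """Fill missing descriptions from docs named <model> and <model>.<column>."""
    name = model["name"]
    resolved = dict(model)

    # Model level: only fall back to the docs block if nothing was specified.
    if not resolved.get("description") and name in docs:
        resolved["description"] = docs[name]

    # Column level: same rule, keyed by "<model>.<column>".
    columns = []
    for column in model.get("columns", []):
        column = dict(column)
        key = f"{name}.{column['name']}"
        if not column.get("description") and key in docs:
            column["description"] = docs[key]
        columns.append(column)
    if columns:
        resolved["columns"] = columns
    return resolved


docs = {
    "model_a": "This model is sooo slick.",
    "model_a.foo": "The number of `Foo`s we should create.",
}
model = {
    "name": "model_a",
    "columns": [
        {"name": "foo"},
        {"name": "bar", "description": "Explicit description wins."},
    ],
}

resolved = apply_default_descriptions(model, docs)
print(resolved["description"])                # This model is sooo slick.
print(resolved["columns"][1]["description"])  # Explicit description wins.
```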

This approach would be, from what I can see so far, backwards compatible and implementable with manageable effort.

What this approach does not yet cover is how to deal with tests. If a readable format can also be found to include them directly in the SQL, it would be possible to describe all the information belonging to the model directly in the model.

martinspaeth on Jan 7, 2022

Just wanted to get a read here from maintainers if possible - is this something that has been discussed and nixed, or is it of interest? We are now in the position of documenting both the .sql files AND putting documentation in schema.yml and it leads to documentation getting out of sync.

The documentation within the model file itself is preferable when reviewing PRs changing a model, updating an existing model, and/or updating documentation about an existing model. schema.yml often gets forgotten about so our documentation tends to get stale.

alexscott-ff on Nov 20, 2023

We’re also dealing with this issue. Mostly writing docs in separate .md files but it creates file clutter and having the documentation separated from the SQL makes it harder to make changes as the goal and intentions of the model are detached from the code.

Having documentation with the code would be best, as everything the model is trying to achieve is all in the same context.

visserp on Nov 24, 2023

This is definitely an issue.

As a developer you really want the same information in both places without the unnecessary duplication and maintenance.

Being able to add the doc block before the SQL makes so much more sense.

Phil-T1 on Nov 29, 2023

The simplest solution might be to only parse a single docs block out of the .sql file and assume it to be the model documentation. Not the cleanest or most intuitive, but it would work.

Or it could be irrelevant entirely: under the current implementation the docs block needs to be referenced in the schema.yml file, and that behavior could be retained. The user would make that link once for each model, and thereafter only need to change the docs block in the model to update the documentation.

this_model.sql

{% docs something %}
...
{% enddocs %}

select
...

schema.yml

version: 2

models:
  - name: this_model
    description: "{{ doc('something') }}"
    ...
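
To illustrate the "link once, edit in the model" idea, here is a small sketch of how a description that references a docs block could be resolved against blocks collected from .sql files. dbt renders `doc()` through Jinja; the regular expression below is only a stand-in, and `DOC_CALL` and `render_description` are hypothetical names.

```python
import re

# Hypothetical stand-in for Jinja rendering: matches {{ doc('name') }}
# or {{ doc("name") }} inside a description string.
DOC_CALL = re.compile(
    r"""\{\{\s*doc\(\s*(['"])(?P<name>[\w.]+)\1\s*\)\s*\}\}"""
)


def render_description(description: str, docs: dict[str, str]) -> str:
    """Replace each doc() reference with the body of the named docs block."""

    def replace(match: re.Match) -> str:
        name = match.group("name")
        if name not in docs:
            raise KeyError(f"no docs block named {name!r}")
        return docs[name]

    return DOC_CALL.sub(replace, description)


# Blocks assumed to have been collected from this_model.sql.
docs = {"something": "Docs that live next to the SQL."}

print(render_description("{{ doc('something') }}", docs))
# Docs that live next to the SQL.
```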

Lastly, maybe it’s possible to have an unnamed docs block, only valid in a model file, which is always model documentation?

{% docs %}
...
{% enddocs %}

select
...

ghost on Oct 2, 2018

One possible convention:

- All docs blocks in .sql files are required to have the model name as a prefix.
- We document the model itself by looking for a block w/a name that matches the model name.
- We document columns by looking for blocks w/name model_name.column_name.
- No restrictions are placed on the positioning of any doc blocks inside the .sql file.

Documenting models would be super-simple, and column docs could live right next to their column definitions. E.g., my_model.sql:

{% docs my_model %}
This model is sooo slick. Learn more [here](example.com)
{% enddocs %}

SELECT
  {% docs my_model.foo %}
    The number of `Foo`s we should create.
  {% enddocs %} 
  1 AS foo,
  2 AS bar
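
A minimal sketch of how this convention could be enforced, assuming a regex scan rather than dbt's real parser. `DOCS_BLOCK` and `collect_model_docs` are hypothetical names, and dotted block names are part of the proposal, not something dbt accepts today.

```python
import re
from typing import Optional

# Hypothetical pattern: block names may contain dots, per the convention above.
DOCS_BLOCK = re.compile(
    r"\{%-?\s*docs\s+(?P<name>[A-Za-z_][\w.]*)\s*-?%\}"
    r"(?P<body>.*?)"
    r"\{%-?\s*enddocs\s*-?%\}",
    re.DOTALL,
)


def collect_model_docs(
    model_name: str, sql_text: str
) -> tuple[Optional[str], dict[str, str]]:
    """Split docs blocks into the model's own doc and per-column docs."""
    model_doc = None
    column_docs = {}
    prefix = model_name + "."
    for match in DOCS_BLOCK.finditer(sql_text):
        name = match.group("name")
        body = match.group("body").strip()
        if name == model_name:
            model_doc = body                        # documents the model
        elif name.startswith(prefix):
            column_docs[name[len(prefix):]] = body  # documents one column
        else:
            # Rule 1: every block in the file must carry the model prefix.
            raise ValueError(f"docs block {name!r} lacks prefix {model_name!r}")
    return model_doc, column_docs


my_model_sql = """{% docs my_model %}
This model is sooo slick. Learn more [here](example.com)
{% enddocs %}

SELECT
  {% docs my_model.foo %}
    The number of `Foo`s we should create.
  {% enddocs %}
  1 AS foo,
  2 AS bar
"""

model_doc, column_docs = collect_model_docs("my_model", my_model_sql)
print(model_doc)    # This model is sooo slick. Learn more [here](example.com)
print(column_docs)  # {'foo': 'The number of `Foo`s we should create.'}
```

Position in the file plays no role in the scan, which matches the last rule of the convention.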

I don’t think I’d mind having to explicitly say the model name or the column names. Doing so still beats defining this stuff in .md or .yml files IMHO.

jakebiesinger on Oct 1, 2018