dbt-core: Parse docs blocks when they appear in .sql files
Feature
Feature description
It’s my belief that documentation being in a separate file from the code it’s documenting will result in the documentation falling behind model development in all but the most disciplined organizations. Therefore, I believe that {% docs %} blocks should be parsed from .sql files in addition to .md files.
Granted, syntax highlighting is unlikely to be happy about the presence of markdown in a .sql file, but this is something we already deal with, having Jinja in .sql files. In the same way that we can nest a SQL comment within a Jinja comment, i.e. {# /* comment fo' reals, yo! */ #} so that both SQL and Jinja syntax highlighters will ignore the comment, users could conceivably wrap a docs block in a SQL block comment to the same effect, e.g.:
/* {% docs foo %}
## Bar!
{% enddocs %} */
select ...
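A minimal sketch of how such blocks could be found in a model file, assuming a plain regular expression is good enough for illustration. The function name extract_docs_blocks is hypothetical, and this is not how dbt-core's parser is implemented; it only shows that the surrounding SQL comment does not get in the way of locating the block.

```python
import re

# Matches {% docs name %} ... {% enddocs %}, allowing Jinja's optional
# whitespace-control dashes ({%- and -%}).
DOCS_BLOCK = re.compile(
    r"\{%-?\s*docs\s+(?P<name>[A-Za-z_][\w.]*)\s*-?%\}"
    r"(?P<body>.*?)"
    r"\{%-?\s*enddocs\s*-?%\}",
    re.DOTALL,
)


def extract_docs_blocks(sql_text: str) -> dict:
    """Return a mapping of docs block name -> stripped markdown body.

    Hypothetical helper for illustration only.
    """
    return {
        match.group("name"): match.group("body").strip()
        for match in DOCS_BLOCK.finditer(sql_text)
    }


model_sql = """/* {% docs foo %}
## Bar!
{% enddocs %} */
select 1 as id
"""

blocks = extract_docs_blocks(model_sql)
print(blocks)  # {'foo': '## Bar!'}
```

The SQL comment markers stay outside the matched span, so the markdown body comes back clean.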
Who will this benefit?
This would benefit people who wish to include documentation in the model itself, and would not negatively impact people who choose to document in the .yml file, or people who choose not to document at all (the heathens!)
About this issue
- State: closed
- Created 6 years ago
- Reactions: 31
- Comments: 15 (4 by maintainers)
We are also using dbt more and more, and the project appeals to me in many respects. The documentation is very clear and the API is very straightforward. However, one point that also bothers me is the artificial separation of model definition and SQL. I likewise think that the schema .yml files are not really necessary.
Background:
Currently we are thinking about splitting our schema .yml into several .yml files, one per model. A single .yml file quickly becomes confusing.
Our folder structure would then look like this.
This would make the structure clearer with the current tool, but it would require a lot of typing. With the approach described in this issue, neither the repeated model name nor the schema .yml structure would be necessary.
Related topics
In the forum I came across a script that can add similar functionality:
https://discourse.getdbt.com/t/here-is-a-way-to-write-dbt-docs-as-sql-comment/1658
The approach described here is quite nice, but preprocessing outside the standard is not an option for us.
The first MRs that would have allowed native development in this direction have unfortunately not been continued: https://github.com/dbt-labs/dbt-core/pull/1042#issuecomment-443898027 The approach described there, integrating .sql files via a config, was unfortunately not pursued further either.
I think the activity on this issue, as well as its many likes, deserves some attention 😃
Proposal
Here are my 3 cents on the topic of documentation in the .sql file.
The first anonymous docs block found is treated as the documentation of the model associated with the file:
Content of model_a.sql:
The parser could automatically enrich the anonymous docs block with the name of the file it was found in, i.e. the model name; in this case “docs model_a”.
Inline documentation of columns could be implemented using a leading “.”:
Again, the parser could extrapolate the model name:
{% docs model_a.foo %}
The schema.yml handling in dbt docs could then default to “description: {{ doc("model_name") }}” if no model description was explicitly specified. Analogously, for columns “description: {{ doc("model_name.columnName") }}” would be possible.
This approach would be, from what I can see so far, backwards compatible and implementable with manageable effort.
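A rough sketch of the naming and fallback rules proposed above, assuming the parser already knows the path of the .sql file and the raw name written in each docs block. Both function names (qualify_docs_name, default_description) are hypothetical and not part of dbt; the sketch only illustrates the proposal.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def qualify_docs_name(raw_name: str, sql_path: str) -> str:
    """Expand a docs block name found in a model file (hypothetical rule set).

    ""      -> "<model>"       anonymous block documents the model
    ".foo"  -> "<model>.foo"   leading dot documents a column
    other   -> unchanged       explicitly named block
    """
    model_name = PurePosixPath(sql_path).stem  # "models/model_a.sql" -> "model_a"
    if raw_name == "":
        return model_name
    if raw_name.startswith("."):
        return model_name + raw_name
    return raw_name


def default_description(
    docs: Dict[str, str],
    model_name: str,
    column: Optional[str] = None,
    explicit: Optional[str] = None,
) -> str:
    """Prefer an explicit schema.yml description, else fall back to the docs block."""
    if explicit:
        return explicit
    key = model_name if column is None else f"{model_name}.{column}"
    return docs.get(key, "")


docs = {
    qualify_docs_name("", "models/model_a.sql"): "Model A docs",
    qualify_docs_name(".foo", "models/model_a.sql"): "The foo column",
}
print(default_description(docs, "model_a"))
print(default_description(docs, "model_a", column="foo"))
print(default_description(docs, "model_a", explicit="Written in schema.yml"))
```

Because an explicit description always wins, existing schema.yml files would behave as before, which is where the backwards compatibility would come from.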
What this approach does not yet cover is how to deal with tests. If a readable format can also be found to include them directly in the SQL, it would be possible to describe all the information belonging to the model directly in the model.
Just wanted to get a read here from maintainers if possible - is this something that has been discussed and nixed, or is it of interest? We are now in the position of documenting both the .sql files AND putting documentation in schema.yml and it leads to documentation getting out of sync.
The documentation within the model file itself is preferable when reviewing PRs changing a model, updating an existing model, and/or updating documentation about an existing model. schema.yml often gets forgotten about so our documentation tends to get stale.
We’re also dealing with this issue. Mostly writing docs in separate .md files but it creates file clutter and having the documentation separated from the SQL makes it harder to make changes as the goal and intentions of the model are detached from the code.
Having documentation with the code would be best as everything the model is trying to achieve is all in the same context
This is definitely an issue.
As a developer you really want the same information in both places without the unnecessary duplication and maintenance.
Being able to add the doc block before the SQL makes so much more sense.
The simplest solution might be to only parse a single docs block out of the .sql file and assume it to be the model documentation. Not the cleanest or most intuitive, but it would work.
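For illustration, the "single docs block" rule could look roughly like the sketch below. The helper name model_doc_from_sql is made up, and raising an error on a second block is only one possible choice; ignoring extra blocks would be another.

```python
import re
from typing import Optional

# Captures the body of any {% docs ... %} ... {% enddocs %} pair,
# whatever name (if any) the block was given.
DOCS_BODY = re.compile(
    r"\{%-?\s*docs\b[^%]*?-?%\}(.*?)\{%-?\s*enddocs\s*-?%\}",
    re.DOTALL,
)


def model_doc_from_sql(sql_text: str) -> Optional[str]:
    """Return the one docs block body in a model file, or None if there is none.

    Hypothetical rule: more than one block in a .sql file is treated as an error.
    """
    bodies = DOCS_BODY.findall(sql_text)
    if len(bodies) > 1:
        raise ValueError("expected at most one docs block per model file")
    return bodies[0].strip() if bodies else None


sql = "{% docs this_model %}\nOne row per order.\n{% enddocs %}\nselect 1 as id"
print(model_doc_from_sql(sql))
```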
Or it could be irrelevant entirely: since under the current implementation the docs block needs to be referenced in the schema.yml file, that behavior could be retained. The user would have to make that link once for each model, but thereafter would only need to change the docs block in the model to update the documentation.
this_model.sql
schema.yml
Lastly, maybe it’s possible to have an unnamed docs block, only valid in a model file, which is always model documentation?
One possible convention:
docs blocks in .sql files are required to have the model name as a prefix, e.g. model_name.column_name. Documenting models would be super-simple, and column docs could live right next to their column definitions. E.g., my_model.sql:
I don’t think I’d mind having to explicitly say the model name or the column names. Doing so still beats defining this stuff in .md or .yml files IMHO.
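A small sketch of how the explicit-prefix convention could be checked, assuming the docs block names from one model file have already been collected. The function split_model_docs is a hypothetical helper, not an existing dbt API.

```python
from typing import Dict, List, Optional, Tuple


def split_model_docs(
    block_names: List[str], model_name: str
) -> Tuple[Optional[str], Dict[str, str], List[str]]:
    """Sort docs block names from one model file into three groups.

    Returns (model doc name or None, {column: block name}, names breaking the rule).
    """
    model_doc = None
    column_docs: Dict[str, str] = {}
    invalid: List[str] = []
    prefix = model_name + "."
    for name in block_names:
        if name == model_name:
            model_doc = name  # block documenting the model itself
        elif name.startswith(prefix) and len(name) > len(prefix):
            column_docs[name[len(prefix):]] = name  # model_name.column_name
        else:
            invalid.append(name)  # missing or wrong model prefix
    return model_doc, column_docs, invalid


names = ["my_model", "my_model.id", "my_model.created_at", "other_model.id"]
model_doc, column_docs, invalid = split_model_docs(names, "my_model")
print(model_doc)
print(column_docs)
print(invalid)
```

Spelling out the model name costs a little typing, but it keeps every docs block name unique across the project without any implicit renaming by the parser.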