dbt-core: [CT-201] Reconcile configs + properties for sources

Follow-up to #2401, #3616

Let’s convert some source properties into configs that can be set inside config: blocks and within dbt_project.yml:

# models/src_whatever.yml
version: 2
sources:
  - name: my_source
    description: ... # not a config
    config:
      enabled: ...
      quoting: {dict}
      freshness: {dict}
      loader: ...
      loaded_at_field: ...
      database: ... # or 'project' in dbt-bigquery
      schema: ... # or 'dataset' in dbt-bigquery
      identifier: ... # this is like an alias for alias
      meta: {dict}
      tags: ...
    tables:
      - name: my_src_table
        config: # all the same stuff as above. these take precedence for specific tables
        description: ... # not a config
        tests: ... # not a config
        columns: ... # not a config

# dbt_project.yml
sources:
  project_name:
    subdirectory:
      +database: raw
      +loader: fivetran
      another_subdirectory:
        +enabled: false
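
To make the hierarchical dbt_project.yml behavior concrete, here is a minimal Python sketch of how `+`-prefixed configs could be collected along a path, with deeper levels overriding shallower ones. It is an illustration under assumptions, not dbt-core's implementation: the function name `collect_project_configs` and the path list `fqn` (project name, subdirectories, source name) are hypothetical.

```python
from typing import Any, Dict, List


def plus_keys(level: Dict[str, Any]) -> Dict[str, Any]:
    """Return the '+'-prefixed entries of one level, with the '+' stripped."""
    return {k[1:]: v for k, v in level.items() if k.startswith("+")}


def collect_project_configs(tree: Dict[str, Any], fqn: List[str]) -> Dict[str, Any]:
    """Walk the `sources:` tree along `fqn`, letting deeper levels win."""
    resolved = plus_keys(tree)
    level: Any = tree
    for part in fqn:
        level = level.get(part)
        if not isinstance(level, dict):
            break  # no more project-level config along this path
        resolved.update(plus_keys(level))
    return resolved


# The `sources:` block from dbt_project.yml above, as a plain dict.
project_sources = {
    "project_name": {
        "subdirectory": {
            "+database": "raw",
            "+loader": "fivetran",
            "another_subdirectory": {"+enabled": False},
        }
    }
}

# Hypothetical path for a source defined in subdirectory/another_subdirectory.
fqn = ["project_name", "subdirectory", "another_subdirectory", "my_source"]
print(collect_project_configs(project_sources, fqn))
```

A source under `another_subdirectory` inherits `database` and `loader` from its parent directory and is also disabled.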

For backwards compatibility, we should still support setting these as top-level properties:

sources:
  - name: my_source
    loaded_at_field: updated_at

But raise an error if the same config is set in both places (even if it’s with the same value):

sources:
  - name: my_source
    loaded_at_field: updated_at
    config:
      loaded_at_field: updated_at
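
The backwards-compatibility rule and the duplicate check could be sketched as follows. This is a hedged illustration only: `CONFIG_KEYS`, `DuplicateConfigError`, and `normalize_source` are hypothetical names, and the key list is taken from the proposal above rather than from dbt-core.

```python
from typing import Any, Dict

# Keys from the proposal that may be set top-level or under `config:`.
CONFIG_KEYS = (
    "enabled", "quoting", "freshness", "loader", "loaded_at_field",
    "database", "schema", "identifier", "meta", "tags",
)


class DuplicateConfigError(Exception):
    """Hypothetical error for a key set both top-level and in `config:`."""


def normalize_source(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Fold top-level properties into one config dict, rejecting duplicates."""
    config = dict(entry.get("config") or {})
    for key in CONFIG_KEYS:
        if key in entry:
            if key in config:
                # Raised even when both places hold the same value.
                raise DuplicateConfigError(
                    f"'{key}' is set both as a property and in config"
                )
            config[key] = entry[key]
    return config


# Top-level only: still supported.
print(normalize_source({"name": "my_source", "loaded_at_field": "updated_at"}))

# Set in both places: rejected.
try:
    normalize_source({
        "name": "my_source",
        "loaded_at_field": "updated_at",
        "config": {"loaded_at_field": "updated_at"},
    })
except DuplicateConfigError as exc:
    print(f"error: {exc}")
```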

Notes

  • We’re thinking that descriptions shouldn’t be configs. They’re rendered with a different context (docs), and they don’t make sense to set hierarchically. The same goes for tests and columns—this would be quite tricky to figure out, since tests actually generate new nodes, rather than adding properties to the existing node.

  • As for the manifest / graph.sources context, I’m open to suggestions. For backwards compatibility, we’d want to store things like loaded_at_field both in node.config and as node-level keys. But I think there’s a valid argument for removing this as a top-level key and only storing it in node.config, so long as we communicate clearly that such a move is taking place.

  • For better state:modified comparisons, we’d want to store the un-rendered version of these configs in node.unrendered_config, regardless of whether they’re set in dbt_project.yml or models/src_whatever.yml. Original issue for this is https://github.com/dbt-labs/dbt/issues/2744.
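
To illustrate why the un-rendered values matter for state:modified, here is a small Python sketch. It is an assumption-laden toy: `render` is a stand-in for Jinja rendering, `target_database` is a hypothetical variable, and `is_modified` is not dbt's comparison logic.

```python
from typing import Dict


def render(value: str, context: Dict[str, str]) -> str:
    """Stand-in for Jinja: replace '{{ name }}' placeholders from context."""
    for name, replacement in context.items():
        value = value.replace("{{ " + name + " }}", replacement)
    return value


def build_node(unrendered: Dict[str, str], context: Dict[str, str]) -> dict:
    """Keep both the rendered config and the text the user actually wrote."""
    return {
        "config": {k: render(v, context) for k, v in unrendered.items()},
        "unrendered_config": dict(unrendered),
    }


def is_modified(old: dict, new: dict) -> bool:
    """Compare what the user wrote, not what it rendered to."""
    return old["unrendered_config"] != new["unrendered_config"]


written = {"database": "{{ target_database }}"}
prod_node = build_node(written, {"target_database": "prod_raw"})
dev_node = build_node(written, {"target_database": "dev_raw"})

# Rendered values differ across environments, but the definition did not change.
print(prod_node["config"] != dev_node["config"])
print(is_modified(prod_node, dev_node))
```

Comparing rendered configs would flag the source as modified in every environment whose target differs, even though nobody edited the definition.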

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 16 (14 by maintainers)

Most upvoted comments

Please make it possible to add a “config:” section at the source-table level in my sources.yml to store information! Configs are reachable programmatically at runtime with {{ config.get('my_config_key1') }}, and I need to get watermark_field for my source.

@alexrosenfeld10 Totally fair question. v1.1 will include the work completed by https://github.com/dbt-labs/dbt-core/pull/5008: supporting config.enabled defined in each source’s .yml definition. That’s what we were able to get done with the time and appetite we had to work on this. Unfortunately, I don’t have an estimate for when we’ll be able to prioritize the remaining steps, since we need to move on to other initiatives.

Context:

  • We already support one configuration for sources, the enabled config in dbt_project.yml
  • Sources support overrides, which allow users to redefine a source with new properties that take precedence over the same properties of the same-named source defined in a package.

In summary, the goal of this issue:

  • Make it possible to define source properties in dbt_project.yml.
  • Make it possible to enable/disable a source table within its .yml file definition

Main exit criteria:

  • Sources and source tables, defined in .yml files, accept a new property, config:
  • That config property accepts enabled: true|false, which has the effect of enabling or disabling the source
  • It’s also possible to define existing properties within that config property, i.e. as configs instead. These configs can be defined at the project, source, and source-table level, and they are inherited/overridden in that order of specificity.
  • All source configs that currently exist as node-level attributes are copied over to those node-level attributes in manifest.json. This offers backwards compatibility for metadata use cases (such as dbt-docs) that depend on accessing those attributes at the top level. If users want fine-grained control over config resolution, they should define these as configs.
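
The exit criteria above can be sketched in Python as a three-level merge followed by a copy to node-level attributes. This is a simplified illustration, not dbt-core's code: `resolve_source_table` and `NODE_LEVEL_KEYS` are hypothetical, and the sketch lets the more specific level replace a value wholesale, whereas the real merge behavior for dict- or list-valued configs such as meta and tags may differ.

```python
from typing import Any, Dict

# Assumed set of configs that already exist as node-level attributes.
NODE_LEVEL_KEYS = (
    "quoting", "freshness", "loader", "loaded_at_field",
    "database", "schema", "identifier", "meta", "tags",
)


def resolve_source_table(
    project_cfg: Dict[str, Any],
    source_cfg: Dict[str, Any],
    table_cfg: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge configs from least to most specific, then mirror them top-level."""
    config = {**project_cfg, **source_cfg, **table_cfg}
    node: Dict[str, Any] = {"config": config}
    for key in NODE_LEVEL_KEYS:
        if key in config:
            node[key] = config[key]  # backwards-compatible copy
    return node


node = resolve_source_table(
    project_cfg={"database": "raw", "loader": "fivetran"},
    source_cfg={"loaded_at_field": "updated_at"},
    table_cfg={"loaded_at_field": "synced_at", "enabled": False},
)
print(node)
```

Here the table-level `loaded_at_field` wins over the source-level one, while `database` and `loader` are inherited from the project level.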

Totally fair question around how to handle source overrides. This might be out of scope for the initial effort, but it’s worth thinking through, since we want to eventually support property overrides for other resource types, too: #4157.

I think configurations should be resolved first, and then override second, but that’s not a strongly held view—as long as we can document consistent behavior. To make this concrete:

# dbt_project.yml
sources:
  my_package:
    my_src_name:
      +database: db_one

# source .yml file in the root project (overrides the package's source)
sources:
  - name: my_src_name
    overrides: my_package
    database: db_two

I would expect both configurations to be resolved, then the override applied, such that my_src_name ends up pointing to db_two.
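
The ordering proposed here (resolve configurations first, apply the override second) could look like this in Python. It is only a sketch of the stated expectation: `apply_override` and the package's `loader` value are hypothetical, and dbt-core may resolve overrides differently.

```python
from typing import Any, Dict


def apply_override(
    package_source: Dict[str, Any],
    project_config: Dict[str, Any],
    override: Dict[str, Any],
) -> Dict[str, Any]:
    """Step 1: resolve project configs. Step 2: apply override properties."""
    resolved = {**package_source, **project_config}
    # `name` and `overrides` identify the target; they are not properties to copy.
    props = {k: v for k, v in override.items() if k not in ("name", "overrides")}
    return {**resolved, **props}


result = apply_override(
    package_source={"name": "my_src_name", "loader": "fivetran"},  # as shipped by my_package
    project_config={"database": "db_one"},  # +database in dbt_project.yml
    override={"name": "my_src_name", "overrides": "my_package", "database": "db_two"},
)
print(result["database"])
```

Because the override is applied last, `my_src_name` ends up pointing to db_two, matching the expectation stated above.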

This is great! Just commenting here to show my support of this feature.