dbt-core: [CT-260] [Bug] `generate_database_name` does not apply to sources
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
When using sources, the `generate_database_name` override doesn't apply to them.
Expected Behavior
Sources would take the overridden database name, just like models.
This is either a very big oversight, or for some reason it’s intentional. It’s quite a blocker for me though. I’ve tried to get around it by adding some jinja in my sources yaml block, but no luck so far. I’m even trying to call macros from there, which I don’t think is possible.
I suppose leaving database explicitly unset might be an OK workaround, as it would take the database of the session which most of the time would be correct.
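To illustrate that workaround, here is a minimal sketch of a source block that leaves `database` unset so dbt falls back to the database of the active connection (the source and table names here are made up for illustration):

```yaml
# models/sources.yml (illustrative; names are hypothetical)
version: 2

sources:
  - name: raw
    # No `database:` key: dbt resolves the source against the
    # database configured in the active profile/target.
    schema: raw_data
    tables:
      - name: orders
```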
Steps To Reproduce
- Set up a dbt project
- Configure a source and a model that consumes the source.
- Add a `generate_database_name` override macro. Here’s an example:
```sql
{% macro generate_database_name(custom_database_name=none, node=none) -%}
    {%- if custom_database_name is not none and node.resource_type in ['seed', 'model'] -%}
        {%- set error_message -%}
            {{ node.resource_type | capitalize }} '{{ node.unique_id }}' has a database configured. This is not allowed because it will be derived from the directory structure and target name.
        {%- endset -%}
        {{ exceptions.raise_compiler_error(error_message) }}
    {%- endif -%}
    {% set path = node.path %}
    {% set database = path.split('/')[0] %}
    {% if target.name in ['prod_us', 'prod_eu'] %}
        {{ database }}
    {% else %}
        {{ database + "_" + target.name }}
    {% endif %}
{%- endmacro %}
```
- Run `dbt compile`. The compiled models will have whatever database is set in your profile, instead of the one returned by the override macro.
Relevant log output
No response
Environment
- OS: macOS
- Python: 3.7.9
- dbt:

  ```
  ❯ dbt --version
  installed version: 1.0.2
     latest version: 1.0.2
  Up to date!

  Plugins:
    - snowflake: 1.0.0 - Up to date!
  ```
### What database are you using dbt with?
snowflake
### Additional Context
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 19 (15 by maintainers)
Jumping in with three quick thoughts, since there’s already some great conversation happening here:
1. dbt doesn’t support `generate_database_name` for sources by default. dbt assumes that (a) sources are preexisting tables with predetermined locations, not things it is tasked with putting in place; and (b) those locations don’t vary across environments. The goal behind the default custom schema behavior is to avoid two developers putting the same model in the same namespace location and stepping on each other’s toes. The same logic applies for why snapshots use `target_database` + `target_schema` configs instead, and don’t use `generate_X_name` macros: the location should be the same for all users.
2. Obviously, in many complex deployments, these objects can vary across environments: sources in dev/staging are sampled, cleansed of PII, perhaps even faked entirely. For snapshots, users can opt back into dynamic naming by setting `target_database = generate_database_name('db')` if they so choose. The default is justifiable, and the alternative is still possible.
3. Unfortunately, dbt doesn’t support macro calls within `.yml` configuration today. (Exception: “late-rendered” config, such as hooks.) This is a capability that we’ve continually worked toward, and we’ve laid a number of the foundational pieces. The reason we haven’t just done this: adding macros to the rendering contexts for configurations would be a significant performance hit at parse time. Could we add support for just `generate_X_name` macros, since those already have special behavior during parsing? Maybe. That would make this possible, at least.

I’ve just come across this behaviour too and was struggling to understand why sources weren’t changing their database value even though I configured the macro to do so.
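For reference, the snapshot route mentioned above might look roughly like the sketch below. This is illustrative only: the snapshot and source names are made up, and the exact arguments you pass depend on how your `generate_database_name` macro is written (the example macro in this issue expects a `node`, so a version callable without one is assumed here):

```sql
{% snapshot orders_snapshot %}
    {{
        config(
            target_database=generate_database_name('analytics'),
            target_schema='snapshots',
            unique_key='id',
            strategy='timestamp',
            updated_at='updated_at'
        )
    }}
    select * from {{ source('raw', 'orders') }}
{% endsnapshot %}
```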
In my case I want any source that uses a `PROD_` database to use the `DEV_` database instead, when the `non-live` profile is used. For now the below Jinja logic works; it’s just not very user-friendly for developers when it comes to adding a new source. Would love for this to be a flag or setting in a future version.
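The commenter’s actual snippet isn’t preserved in this thread, but inline Jinja expressions (as opposed to macro calls) do render in source properties files, so the logic described might look roughly like this hypothetical sketch (database and source names are invented; this assumes `target` is available in the YAML rendering context of your dbt version):

```yaml
version: 2

sources:
  - name: app_data  # hypothetical name
    # Inline Jinja renders here even though macro calls do not:
    database: "{% if target.name == 'non-live' %}DEV_APP{% else %}PROD_APP{% endif %}"
    tables:
      - name: events
```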
Sorry, yes, there’s no way to do this today. I meant option 2 as an illustration of how you could someday re-enable this, for a particular source, in the same way users have done with snapshots.
So option 4 might be an optional configuration that “enrolls” sources into the `generate_x_name` macros (or a dedicated source-specific version). Behind the scenes, this would need to look like running `update_parsed_node_name` over sources when they’re parsed.

For the short term, I still like the third option most. We’re planning to start work on that soon, with the aim of including it in v1.1. After that capability is unlocked, we can see if there remains a significant unmet need for more dynamic, macro-based configuration.
Good thoughts, thanks @jtcohen6. Of those three I am most a fan of the third option.
It’s not uncommon for people to use the same physical instance of a cloud database with separate logical databases (a `_qa` kind of thing) to save on $$. Also, it’s not uncommon for folks to do what I’m doing, where separate sub-orgs within a company go in separate databases. I understand the assumption dbt is currently making, that sources are expected to be in a known location; my point here is to call into question the validity of that assumption.

My main question for you is: why isn’t there a 4th option, a `generate_X_name_sources` (where X is database and schema)? Same exact idea as what we do for models, just applied to sources.

Similarly, you did say that dbt “doesn’t support `generate_x_name` for sources by default”, which implies it can be enabled. Where can I change that setting? (Perhaps there is no way to and I’m just reading in a bit too carefully.)
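To make that proposal concrete: no such macro exists in dbt today, but if it did, it might mirror the model-facing version. A hypothetical sketch, reusing the target names from the macro earlier in this issue:

```sql
{# Hypothetical: does NOT exist in dbt. A source-specific analog of generate_database_name. #}
{% macro generate_database_name_sources(custom_database_name=none, node=none) -%}
    {%- if target.name in ['prod_us', 'prod_eu'] -%}
        {{ custom_database_name }}
    {%- else -%}
        {{ custom_database_name ~ '_' ~ target.name }}
    {%- endif -%}
{%- endmacro %}
```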