meltano: Bug: Null config values don't get passed to plugins, preventing Mappers from working as expected

I'm trying to replicate the behavior described here:

https://sdk.meltano.com/en/latest/stream_maps.html

{
    "stream_maps": {
        "customers": {
            // exclude these since we're capturing them in the pii stream
            "email": null,
            "full_name": null
        },
        "customers_pii": {
            "__source__": "customers",
            // include just the PII and the customer_id
            "customer_id": "customer_id",
            "email": "email",
            "full_name": "full_name",
            // exclude anything not declared
            "__else__": null,
        },
    },
}
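To show why the nulls matter, here is a simplified sketch of these stream-map semantics in Python. It is not the SDK's implementation: apply_stream_map is a hypothetical helper, and it assumes every non-null mapping value is a plain property name, whereas the SDK evaluates full expressions. A null property value excludes that property, and "__else__": null drops every undeclared property.

```python
from typing import Any, Optional

StreamMap = dict[str, Optional[str]]


def apply_stream_map(record: dict[str, Any], stream_map: StreamMap) -> dict[str, Any]:
    """Hypothetical, simplified stream-map semantics (values are plain property names)."""
    # Keys such as __source__ and __else__ are directives, not output properties.
    declared = {k: v for k, v in stream_map.items() if not k.startswith("__")}
    drop_undeclared = "__else__" in stream_map and stream_map["__else__"] is None

    result: dict[str, Any] = {}
    if not drop_undeclared:
        # Without "__else__": null, undeclared properties pass through unchanged.
        result.update({k: v for k, v in record.items() if k not in declared})
    for out_key, source_key in declared.items():
        # A null value excludes the property; it was never copied above.
        if source_key is not None:
            result[out_key] = record[source_key]
    return result


record = {"customer_id": 1, "email": "a@example.com", "full_name": "Ada", "plan": "pro"}
customers = {"email": None, "full_name": None}
customers_pii = {"__source__": "customers", "customer_id": "customer_id",
                 "email": "email", "full_name": "full_name", "__else__": None}

print(apply_stream_map(record, customers))
# {'customer_id': 1, 'plan': 'pro'}
print(apply_stream_map(record, customers_pii))
# {'customer_id': 1, 'email': 'a@example.com', 'full_name': 'Ada'}
```

If the null values are stripped before the plugin sees them, customers becomes an empty map (so email and full_name leak through) and customers_pii loses __else__ (so plan leaks into the PII stream).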

So if you have a meltano.yml like

config:
  stream_maps:
    customers_pii:
      "email": "email"
      __else__: null

and then run meltano invoke --dump=config tap-name, the __else__ key will be missing from the dumped config.

It’s not clear that this is a bug: after some digging, it looks like passing null values was removed as part of https://gitlab.com/meltano/meltano/-/issues/2334 (more specifically https://gitlab.com/meltano/meltano/-/merge_requests/1898) because it was causing a different issue.

UPDATE (AJ, 2022-08-30)

As noted below in https://github.com/meltano/meltano/issues/6382#issuecomment-1208353612, the bug is that Meltano excludes/removes config input values if they are null.
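A minimal sketch of the observed effect, assuming Meltano effectively drops None-valued keys recursively before handing config to the tap. strip_nulls is a hypothetical name for illustration, not Meltano's actual code:

```python
from typing import Any


def strip_nulls(value: Any) -> Any:
    """Hypothetical illustration: recursively drop dict keys whose value is None."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


config = {"stream_maps": {"customers_pii": {"email": "email", "__else__": None}}}
print(strip_nulls(config))
# {'stream_maps': {'customers_pii': {'email': 'email'}}}
```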

I can confirm this is still the behavior:

visch@visch-ubuntu:~/git/tap-abc$ meltano --version
meltano, version 2.4.0

meltano.yml (excerpt):

      config:
        stream_maps:
          customers_pii:
            "email": "email"
            __else__: null

visch@visch-ubuntu:~/git/tap-abc$ meltano invoke --dump=config tap-abc
{
  "stream_maps": {
    "customers_pii": {
      "email": "email"
    }
  }
}
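One way to check a dump for the missing key, sketched with Python's standard json module; has_else_key is a hypothetical helper, not part of Meltano, and the sample string is the dump shown above:

```python
import json


def has_else_key(dumped_config: str, stream: str) -> bool:
    """Return True if the dumped config still declares __else__ for `stream`."""
    config = json.loads(dumped_config)
    stream_map = config.get("stream_maps", {}).get(stream, {})
    return "__else__" in stream_map


# Output of `meltano invoke --dump=config tap-abc` from the session above.
dumped = """
{
  "stream_maps": {
    "customers_pii": {
      "email": "email"
    }
  }
}
"""
print(has_else_key(dumped, "customers_pii"))  # False: the null-valued key was dropped
```

A JSON null parses to None but the key itself is kept, so a dump that preserved "__else__": null would return True.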

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 16 (6 by maintainers)

Most upvoted comments

@tayloramurphy logging here that another user (and I) got stuck on this bug in Slack today: https://meltano.slack.com/archives/C01TCRBBJD7/p1671134681823139.

In my view, mappers and stream maps are a differentiating Meltano feature: they give flexibility and let users work around broken taps/targets, and we constantly recommend them in Slack. It feels like we’re very close to something super powerful if it were cleaned up and documented.

After skimming this issue, I see that it’s a challenge to fix in Meltano, but it really is a Meltano bug 😄. A fix in the SDK helps, but every tap/target would need to be upgraded to regain support, even though the plugins already handle nulls today; it’s Meltano’s bug that makes the null values disappear.

I verified that null values are passed through for mappers but not for taps (it’s really hard to get at the config for mappers, but here we go):

meltano.yml

version: 1
default_environment: dev
environments:
- name: dev
- name: staging
- name: prod
project_id: 8339108a-ba30-48ae-9dcc-8c50702ff803
plugins:
  extractors:
  - name: tap-csv
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-csv.git
    config:
      stream_maps:
        customers_pii:
          email: email
          __else__:
      files:
        - entity: "in"
          path: "in.csv"
          keys: ["id"]
          delimiter: ","
  loaders:
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl
  mappers:
  - name: transform-field
    variant: transferwise
    pip_url: pipelinewise-transform-field
    executable: transform-field
    mappings:
    - name: pipedrive-exclude-custom-fields
      config:
        stream_maps:
          activities:
            id: id
            deal_title: deal_title
            __else__:
        transformations:
          - field_id: "commit"
            tap_stream_name: "commits"
            field_paths: ["author/email", "committer/email"]
            type: "HASH"

meltano run tap-csv pipedrive-exclude-custom-fields target-jsonl

(I monkey-patched a sleep into target-jsonl to keep the run alive long enough to inspect the mapper’s generated config file.)

visch@visch-ubuntu:~/git/meltano-projects/test_null_key_with_tap/.meltano/run/transform-field$ ls
mapper.90aa6695-2412-4909-975c-f1901713d331.config.json
visch@visch-ubuntu:~/git/meltano-projects/test_null_key_with_tap/.meltano/run/transform-field$ cat mapper.90aa6695-2412-4909-975c-f1901713d331.config.json
{
  "stream_maps": {
    "activities": {
      "id": "id",
      "deal_title": "deal_title",
      "__else__": null
    }
  },
  "transformations": [
    {
      "field_id": "commit",
      "tap_stream_name": "commits",
      "field_paths": [
        "author/email",
        "committer/email"
      ],
      "type": "HASH"
    }
  ]
}
@aaronsteers yes, that’s one reason, but the main reason is missing documentation: https://github.com/meltano/meltano/issues/6412#issuecomment-1277792107.

@tayloramurphy, @sbalnojan

Now that the change below has merged, I think we can lower the priority of this issue. Meltano will still purge null values from mappers’ config, but users can now send __NULL__ instead to get the null behavior.
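A minimal sketch of why a sentinel string works: it is an ordinary string, so Meltano has no null to remove, and the plugin turns it back into None. restore_nulls is a hypothetical name and this is not the SDK's implementation; the real SDK may only honor __NULL__ inside stream_maps rather than across the whole config.

```python
from typing import Any

NULL_SENTINEL = "__NULL__"  # string users send in place of a YAML null


def restore_nulls(value: Any) -> Any:
    """Hypothetical illustration: recursively replace the sentinel string with None."""
    if isinstance(value, dict):
        return {k: restore_nulls(v) for k, v in value.items()}
    if isinstance(value, list):
        return [restore_nulls(v) for v in value]
    if value == NULL_SENTINEL:
        return None
    return value


config = {"stream_maps": {"customers_pii": {"email": "email", "__else__": "__NULL__"}}}
print(restore_nulls(config))
# {'stream_maps': {'customers_pii': {'email': 'email', '__else__': None}}}
```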

In the long run, it’s still worth trying to unwind that null handling behavior, but at a lower priority. (Removed from board.)

@sbalnojan, I think you’ve been avoiding the meltano-map-transforms plugin overall in tutorials for this and other reasons, but if any existing references exist, it may also be worth updating tutorials/instructions where applicable.

@tayloramurphy, I’m honestly not sure. We’d have to dig in deeper to know how this would actually behave.

Given all the trouble this creates in the Meltano codebase, and the potential complexity and risk it raises, I’m inclined to just add 'NULL' as an alias for null in the SDK, and then instruct Meltano users to send an all-caps string while we work on our long-term solution…

Logged:

@aaronsteers is there potentially a fourth option where we have a check per plugin on whether or not an env_var is set? I’m thinking:

environments:
  - name: prod
    config:
      plugins:
        extractors:
          - name: tap-github
            env:
              NULL_KEEP: true

This would keep nulls for tap-github only, but if I had it set for the whole environment:

environments:
  - name: dev
    env:
      NULL_KEEP: true

it would keep nulls for every plugin?

I propose this because my only hesitation with the feature flag option is that it’s global, and I can see users wanting to configure this behavior for specific plugins.
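A sketch of how the proposed flag could be resolved, assuming plugin-level env overrides environment-level env. NULL_KEEP is only a proposal in this thread, and keep_nulls is a hypothetical helper; neither exists in Meltano.

```python
from typing import Any, Mapping


def keep_nulls(environment_env: Mapping[str, Any], plugin_env: Mapping[str, Any]) -> bool:
    """Resolve the proposed NULL_KEEP flag; plugin-level env wins over environment-level env."""
    merged = {**environment_env, **plugin_env}
    # YAML `true` may arrive as a bool or as the string "true", so normalize via str().
    return str(merged.get("NULL_KEEP", "false")).strip().lower() in {"1", "true", "yes"}


print(keep_nulls({}, {"NULL_KEEP": "true"}))  # True: set on tap-github only
print(keep_nulls({"NULL_KEEP": "true"}, {}))  # True: set for the whole dev environment
print(keep_nulls({}, {}))                     # False: default keeps today's behavior
```

With this ordering, a plugin can also opt out of an environment-wide setting by setting NULL_KEEP to false.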

Long-term, I like updating the SDK to use something other than null.

@visch can you confirm this is still a bug? @edgarrmondragon have you seen this?