sqlfluff: Internal error when linting dbt model that contains a call statement

Search before asking

  • I searched the issues and found no similar issues.

What Happened

I ran sqlfluff lint --dialect snowflake myfile.sql with the requirements below and got an internal error when linting a dbt model. It appears to be a different error from the one reported in #3611.

dbt-core==1.3
dbt-snowflake==1.3.0
streamlit==1.12.0
sqlfluff==1.4.1
sqlfluff-templater-dbt==1.4.1

I got a warning/error with this traceback:

root@645cc3519edf:/app# sqlfluff lint --dialect snowflake models/error_fluff.sql 
=== [dbt templater] Sorting Nodes...
=== [dbt templater] Compiling dbt project...
=== [dbt templater] Project Compiled.
WARNING    Unable to lint models/error_fluff.sql due to an internal error. Please report this as an issue with your query's contents and stacktrace below!
To hide this warning, add the failing file to .sqlfluffignore
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/linter/runner.py", line 108, in run
    yield partial()
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/linter/linter.py", line 769, in lint_rendered
    parsed = cls.parse_rendered(rendered)
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/linter/linter.py", line 378, in parse_rendered
    tokens, lvs, config = cls._lex_templated_file(
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/linter/linter.py", line 159, in _lex_templated_file
    tokens, lex_vs = lexer.lex(templated_file)
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/parser/lexer.py", line 564, in lex
    segments: Tuple[RawSegment, ...] = self.elements_to_segments(
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/parser/lexer.py", line 620, in elements_to_segments
    placeholders, source_slice, block_stack = _generate_placeholder_segments(
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/parser/lexer.py", line 446, in _generate_placeholder_segments
    block_uuid = block_stack.pop()
IndexError: pop from empty list

Expected Behaviour

  • linting does not fail, or
  • the error message is more informative

Observed Behaviour

I obtained the error and traceback above when linting this file:

{{
  config(
    materialized = 'view'
 )
}}
{% call statement('unique_keys', fetch_result=True) %}
  select 'tests' as key_name
{% endcall %}
{% set unique_keys = load_result('unique_keys') %}

with temp as (
    select 1 as dummy
    union all 
    select 2
    union all 
    select 99
),
prep as (
select 
    row_number() over (order by dummy asc ) as id,
    {%- if execute %}
    {%- for key_name in unique_keys['data'] %}
    {%- if key_name[0] not in ('tests')%}
        'this_was_a_test_indeed'
    {%- else %}
    coalesce('this was not a test','error')
    {%- endif %}
        as was_it_a_test,
    {%- endfor %}
    {% endif %}

    1 as dummy
from temp)
select * from prep

Removing the call statement block resolves the linting error, so the following file lints without failing:

{{
  config(
    materialized = 'view'
 )
}}
{% set unique_keys = load_result('unique_keys') %}

with temp as (
    select 1 as dummy
    union all 
    select 2
    union all 
    select 99
),
prep as (
select 
    row_number() over (order by dummy asc ) as id,
    {%- if execute %}
    {%- for key_name in unique_keys['data'] %}
    {%- if key_name[0] not in ('tests')%}
        'this_was_a_test_indeed'
    {%- else %}
    coalesce('this was not a test','error')
    {%- endif %}
        as was_it_a_test,
    {%- endfor %}
    {% endif %}

    1 as dummy
from temp)
select * from prep

How to reproduce

To reproduce: I’m running this in Docker, but I expect the error is independent of the environment:

docker-compose:

version: '3.3'
services:
    frosty-local:
        command: /bin/bash
        build: .
        volumes:
            - ./:/app/
            # - ~/.dbt:/root/.dbt
            - ~/.ssh:/root/.ssh

        environment:
          # several env vars exported to set up the profiles.yml

        ports:
            - "8085:8085"
            - "8501:8501"

Dockerfile:

FROM python:3.8.2-buster

RUN apt-get update -yqq \
  && apt-get install -yqq \
    postgresql \
    libssl-dev \
    less \
    vim \
    libffi-dev \
    libpq-dev

# upgrade pip
RUN pip install --upgrade pip

RUN echo 'alias ll="ls -lari"' >> ~/.bashrc

# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt

ENV DBT_PROFILES_DIR=/app/profiles
RUN mkdir -p /app
COPY ./ /app/
RUN chmod 777 -R /app
WORKDIR /app/
RUN dbt deps
EXPOSE 8501

error_fluff.sql:

{{
  config(
    materialized = 'view'
 )
}}
{% call statement('unique_keys', fetch_result=True) %}
  select 'tests' as key_name
{% endcall %}
{% set unique_keys = load_result('unique_keys') %}

with temp as (
    select 1 as dummy
    union all 
    select 2
    union all 
    select 99
),
prep as (
select 
    row_number() over (order by dummy asc ) as id,
    {%- if execute %}
    {%- for key_name in unique_keys['data'] %}
    {%- if key_name[0] not in ('tests')%}
        'this_was_a_test_indeed'
    {%- else %}
    coalesce('this was not a test','error')
    {%- endif %}
        as was_it_a_test,
    {%- endfor %}
    {% endif %}

    1 as dummy
from temp)
select * from prep

command: sqlfluff lint --dialect snowflake models/error_fluff.sql

Dialect

snowflake (via sqlfluff-templater-dbt & dbt-snowflake)

Version

1.4.1

Configuration

sqlfluff:

[sqlfluff]
# verbose is an integer (0-2) indicating the level of log output
verbose = 0
# Turn off color formatting of output
nocolor = False
# Supported dialects https://docs.sqlfluff.com/en/stable/dialects.html
# Or run 'sqlfluff dialects'
dialect = None
# One of [raw|jinja|python|placeholder]
templater = dbt
# Comma separated list of rules to check, default to all
rules = L001,L002,L003,L004,L005,L006,L008,L009,L010,L011,L012,L014,L015,L016,L017,L018,L019,L020,L021,L022,L023,L025,L027,L028,L029,L030,L032,L036,L038,L039,L040,L041,L042,L043,L045,L046,L048,L049,L051,L053,L054,L055,L057,L058,L059,L060,L061,L062,L063,L065
# all
# Comma separated list of rules to exclude, or None
exclude_rules = L007,L013,L024,L026,L031,L033,L034,L035,L037,L044,L047,L050,L052,L056,L064,L066
# The depth to recursively parse to (0 for unlimited)
recurse = 0
# Below controls SQLFluff output, see max_line_length for SQL output
output_line_length = 80
# Number of passes to run before admitting defeat
runaway_limit = 10
# Ignore errors by category (one or more of the following, separated by commas: lexing,linting,parsing,templating)
ignore = None
# Ignore linting errors found within sections of code coming directly from
# templated code (e.g. from within Jinja curly braces). Note that it does not
# ignore errors from literal code found within template loops.
ignore_templated_areas = True
# can either be autodetect or a valid encoding e.g. utf-8, utf-8-sig
encoding = autodetect
# Ignore inline overrides (e.g. to test if still required)
disable_noqa = False
# Comma separated list of file extensions to lint
# NB: This config will only apply in the root folder
sql_file_exts = .sql,.sql.j2,.dml,.ddl
# Allow fix to run on files, even if they contain parsing errors
# Note altering this is NOT RECOMMENDED as can corrupt SQL
fix_even_unparsable = False
# Very large files can make the parser effectively hang.
# The more efficient check is the _byte_ limit check which
# is enabled by default. The previous _character_ limit check
# is still present for backward compatibility. This will be
# removed in a future version.
# Set either to 0 to disable.
large_file_skip_char_limit = 0
large_file_skip_byte_limit = 20000
# CPU processes to use while linting.
# If positive, just implies number of processes.
# If negative or zero, implies number_of_cpus - specified_number.
# e.g. -1 means use all processors but one. 0  means all cpus.
processes = 1

[sqlfluff:indentation]
# See https://docs.sqlfluff.com/en/stable/layout.html#configuring-indent-locations
indented_joins = True
indented_ctes = True
indented_using_on = True
indented_on_contents = True
template_blocks_indent = True

# Layout configuration
# See https://docs.sqlfluff.com/en/stable/layout.html#configuring-layout-and-spacing
[sqlfluff:layout:type:comma]
spacing_before = touch
line_position = trailing

[sqlfluff:layout:type:binary_operator]
line_position = leading

[sqlfluff:layout:type:statement_terminator]
spacing_before = touch
line_position = trailing

[sqlfluff:layout:type:end_of_file]
spacing_before = touch

[sqlfluff:layout:type:set_operator]
line_position = alone:strict

[sqlfluff:layout:type:start_bracket]
spacing_after = touch

[sqlfluff:layout:type:end_bracket]
spacing_before = touch

[sqlfluff:layout:type:start_square_bracket]
spacing_after = touch

[sqlfluff:layout:type:end_square_bracket]
spacing_before = touch

[sqlfluff:layout:type:casting_operator]
spacing_before = touch
spacing_after = touch

[sqlfluff:layout:type:comparison_operator]
spacing_within = touch
line_position = leading

[sqlfluff:layout:type:object_reference]
spacing_within = inline

[sqlfluff:layout:type:comment]
spacing_before = any
spacing_after = any

# Template loop tokens shouldn't dictate spacing around them.
[sqlfluff:layout:type:template_loop]
spacing_before = any
spacing_after = any

[sqlfluff:templater]
unwrap_wrapped_queries = True

[sqlfluff:templater:jinja]
apply_dbt_builtins = True

# Some rules can be configured directly from the config common to other rules
[sqlfluff:rules]
tab_space_size = 4
# Max line length is set by default to be in line with the dbt style guide.
# https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md
max_line_length = 80
indent_unit = space
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all

# Some rules have their own specific config
[sqlfluff:rules:L003]
hanging_indents = True

[sqlfluff:rules:L010]
# Keywords
capitalisation_policy = upper
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None

[sqlfluff:rules:L011]
# Aliasing preference for tables
aliasing = explicit

[sqlfluff:rules:L012]
# Aliasing preference for columns
aliasing = explicit

[sqlfluff:rules:L014]
# Unquoted identifiers
extended_capitalisation_policy = lower
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
# unquoted_identifiers_policy = aliases

[sqlfluff:rules:L016]
# Line length
ignore_comment_lines = True
ignore_comment_clauses = False
max_line_length = 100

[sqlfluff:rules:L027]
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None

[sqlfluff:rules:L026]
# References must be in FROM clause
# Disabled for some dialects (e.g. bigquery)
force_enable = False

[sqlfluff:rules:L028]
# References must be consistently used
# Disabled for some dialects (e.g. bigquery)
force_enable = False

[sqlfluff:rules:L029]
# Keywords should not be used as identifiers.
unquoted_identifiers_policy = aliases
quoted_identifiers_policy = none
# Comma separated list of words to ignore for this rule
ignore_words = email,first_name,last_name,value
ignore_words_regex = (^.*_$),(^.*\$.*$)

[sqlfluff:rules:L030]
# Function names
extended_capitalisation_policy = lower
# consistent
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None

[sqlfluff:rules:L031]
# Avoid table aliases in from clauses and join conditions.
# Disabled for some dialects (e.g. bigquery)
force_enable = False

[sqlfluff:rules:L036]
wildcard_policy = single

[sqlfluff:rules:L038]
# Trailing commas
select_clause_trailing_comma = forbid

[sqlfluff:rules:L040]
# Null & Boolean Literals
capitalisation_policy = lower
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None

[sqlfluff:rules:L042]
# By default, allow subqueries in from clauses, but not join clauses
forbid_subquery_in = both
# join

[sqlfluff:rules:L047]
# Consistent syntax to count all rows
prefer_count_1 = False
prefer_count_0 = False

[sqlfluff:rules:L051]
# Fully qualify JOIN clause
fully_qualify_join_types = inner

[sqlfluff:rules:L052]
# Semi-colon formatting approach
multiline_newline = False
require_final_semicolon = False

[sqlfluff:rules:L054]
# GROUP BY/ORDER BY column references
group_by_and_order_by_style = consistent

[sqlfluff:rules:L057]
# Special characters in identifiers
unquoted_identifiers_policy = all
quoted_identifiers_policy = all
allow_space_in_identifier = False
additional_allowed_characters = None
ignore_words = email,first_name,last_name,metadata$filename,metadata$file_row_number 
ignore_words_regex = (^.*_$)

[sqlfluff:rules:L059]
# Policy on quoted and unquoted identifiers
prefer_quoted_identifiers = False
ignore_words = None
ignore_words_regex = None
force_enable = False

[sqlfluff:rules:L062]
# Comma separated list of blocked words that should not be used
blocked_words = None
blocked_regex = None

[sqlfluff:rules:L063]
# Data Types
extended_capitalisation_policy = lower
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None

[sqlfluff:rules:L064]
# Consistent usage of preferred quotes for quoted literals
preferred_quoted_literal_style = consistent
# Disabled for dialects that do not support single and double quotes for quoted literals (e.g. Postgres)
force_enable = False

[sqlfluff:rules:L066]
min_alias_length = None
max_alias_length = None

.sqlfluffignore:

target/
dbt_packages/
/app/dbt_packages/
macros/
models/challenge_20*

Are you willing to work on and submit a PR to address the issue?

  • Yes I am willing to submit a PR!

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 4
  • Comments: 18 (10 by maintainers)

Most upvoted comments

☝️ I’ve demonstrated this failure in the PR mentioned above (see the results of the checks on the first commit: https://github.com/sqlfluff/sqlfluff/actions/runs/4405984521/jobs/7717601722).

I’ve also tried altering the code to see how things fail if we… don’t rename it from block_start to templated, but leave everything else the same.

SQLFluff is not commercial software, and has a limited number of unpaid developers. Maintaining multiple branches is extra work. If people are working for free, please be understanding about their willingness (or lack thereof) to do even more.

Ah - ok, I’ve found out why. There’s a section in [tracer.py#L562] where, if we find an opening call block, we swap the block_type from block_start to templated. Because it’s no longer a block_start, we don’t add its ID to the stack, and then when we get to the block_end (which is still a block_end) there’s nothing left on the stack to pop.

I’m not sure I understand why we re-type it in the tracer, however. @barrywhart - I think this is more your area of expertise. Any ideas?

If I disable that clause (i.e. add an and False: to the end of the if statement), then this bug goes away.
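
For illustration only, a heavily simplified sketch of the stack behaviour described above (the names and shapes here are invented; this is not the actual sqlfluff lexer/tracer code):

# Illustrative sketch only (not real sqlfluff code).
# Opening block tags push an ID onto a stack; closing tags pop one.
# If a call opener is re-typed to "templated", its push is skipped and
# the matching endcall pops from an empty list.
import uuid


def match_blocks(slice_types):
    """Walk slice types in source order, pairing block starts with ends."""
    block_stack = []
    for slice_type in slice_types:
        if slice_type == "block_start":
            block_stack.append(uuid.uuid4())
        elif slice_type == "block_end":
            block_stack.pop()  # IndexError if the opener was never pushed
    return block_stack


# Opener kept as block_start: the start/end pair matches up fine.
match_blocks(["block_start", "templated", "block_end"])

# Opener re-typed to "templated": the end tag has nothing to pop.
try:
    match_blocks(["templated", "templated", "block_end"])
except IndexError as exc:
    print(exc)  # pop from empty list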

I have a feeling I know where this is coming from. To support indentation around Jinja tags, we try to match starts and ends. It looks like some kind of miscategorisation means we exhaust our stack of starts before we match up all the ends.

What would make this significantly easier to resolve would be a minimal query that still exhibits the issue. If we can show that a query with a {% call ... %} block is all that’s required then that makes this much easier to solve than if it’s an interaction between that block and some others.
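
For reference, a candidate minimal model (untested; just a guess at what such a repro might look like, keeping only the {% call %} block from the failing file plus a trivial select):

{{ config(materialized = 'view') }}

{% call statement('unique_keys', fetch_result=True) %}
  select 'tests' as key_name
{% endcall %}

select 1 as dummy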

I’ll assign this one to myself because I think I was the last to touch the block matching code.