sqlfluff: Internal error when linting dbt model that contains a call statement
Search before asking
- I searched the issues and found no similar issues.
What Happened
I ran sqlfluff lint --dialect snowflake myfile.sql with the requirements below and got an internal error when linting a dbt model. It appears to be a different error from the one reported in #3611.
dbt-core==1.3
dbt-snowflake==1.3.0
streamlit==1.12.0
sqlfluff==1.4.1
sqlfluff-templater-dbt==1.4.1
I got a warning/error with this traceback:
root@645cc3519edf:/app# sqlfluff lint --dialect snowflake models/error_fluff.sql
=== [dbt templater] Sorting Nodes...
=== [dbt templater] Compiling dbt project...
=== [dbt templater] Project Compiled.
WARNING Unable to lint models/error_fluff.sql due to an internal error. Please report this as an issue with your query's contents and stacktrace below!
To hide this warning, add the failing file to .sqlfluffignore
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/linter/runner.py", line 108, in run
    yield partial()
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/linter/linter.py", line 769, in lint_rendered
    parsed = cls.parse_rendered(rendered)
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/linter/linter.py", line 378, in parse_rendered
    tokens, lvs, config = cls._lex_templated_file(
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/linter/linter.py", line 159, in _lex_templated_file
    tokens, lex_vs = lexer.lex(templated_file)
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/parser/lexer.py", line 564, in lex
    segments: Tuple[RawSegment, ...] = self.elements_to_segments(
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/parser/lexer.py", line 620, in elements_to_segments
    placeholders, source_slice, block_stack = _generate_placeholder_segments(
  File "/usr/local/lib/python3.8/site-packages/sqlfluff/core/parser/lexer.py", line 446, in _generate_placeholder_segments
    block_uuid = block_stack.pop()
IndexError: pop from empty list
Expected Behaviour
- Linting does not fail, or
- The error message is more informative.
Observed Behaviour
I obtained the error & traceback above when linting this file:
{{
config(
materialized = 'view'
)
}}
{% call statement('unique_keys', fetch_result=True) %}
select 'tests' as key_name
{% endcall %}
{% set unique_keys = load_result('unique_keys') %}
with temp as (
select 1 as dummy
union all
select 2
union all
select 99
),
prep as (
select
row_number() over (order by dummy asc ) as id,
{%- if execute %}
{%- for key_name in unique_keys['data'] %}
{%- if key_name[0] not in ('tests')%}
'this_was_a_test_indeed'
{%- else %}
coalesce('this was not a test','error')
{%- endif %}
as was_it_a_test,
{%- endfor %}
{% endif %}
1 as dummy
from temp)
select * from prep
Removing the call statement block resolves the linting error, so the following version does not fail:
{{
config(
materialized = 'view'
)
}}
{% set unique_keys = load_result('unique_keys') %}
with temp as (
select 1 as dummy
union all
select 2
union all
select 99
),
prep as (
select
row_number() over (order by dummy asc ) as id,
{%- if execute %}
{%- for key_name in unique_keys['data'] %}
{%- if key_name[0] not in ('tests')%}
'this_was_a_test_indeed'
{%- else %}
coalesce('this was not a test','error')
{%- endif %}
as was_it_a_test,
{%- endfor %}
{% endif %}
1 as dummy
from temp)
select * from prep
How to reproduce
I'm running this in Docker, but I would expect the error to be independent of the environment:
docker-compose:
version: '3.3'
services:
  frosty-local:
    command: /bin/bash
    build: .
    volumes:
      - ./:/app/
      # - ~/.dbt:/root/.dbt
      - ~/.ssh:/root/.ssh
    environment:
      - several vars exported to set up the profiles.yml
    ports:
      - "8085:8085"
      - "8501:8501"
Dockerfile:
FROM python:3.8.2-buster
RUN apt-get update -yqq \
&& apt-get install -yqq \
postgresql \
libssl-dev \
less \
vim \
libffi-dev \
libpq-dev
# upgrade pip
RUN pip install --upgrade pip
RUN echo 'alias ll="ls -lari"' >> ~/.bashrc
# Install dependencies:
COPY requirements.txt .
RUN pip install -r requirements.txt
ENV DBT_PROFILES_DIR=/app/profiles
RUN mkdir -p /app
COPY ./ /app/
RUN chmod 777 -R /app
WORKDIR /app/
RUN dbt deps
EXPOSE 8501
error_fluff.sql: identical to the failing model shown under Observed Behaviour above.
command:
sqlfluff lint --dialect snowflake models/error_fluff.sql
Dialect
snowflake (via sqlfluff-templater-dbt & dbt-snowflake)
Version
1.4.1
Configuration
sqlfluff:
[sqlfluff]
# verbose is an integer (0-2) indicating the level of log output
verbose = 0
# Turn off color formatting of output
nocolor = False
# Supported dialects https://docs.sqlfluff.com/en/stable/dialects.html
# Or run 'sqlfluff dialects'
dialect = None
# One of [raw|jinja|python|placeholder]
templater = dbt
# Comma separated list of rules to check, default to all
rules = L001,L002,L003,L004,L005,L006,L008,L009,L010,L011,L012,L014,L015,L016,L017,L018,L019,L020,L021,L022,L023,L025,L027,L028,L029,L030,L032,L036,L038,L039,L040,L041,L042,L043,L045,L046,L048,L049,L051,L053,L054,L055,L057,L058,L059,L060,L061,L062,L063,L065
# all
# Comma separated list of rules to exclude, or None
exclude_rules = L007,L013,L024,L026,L031,L033,L034,L035,L037,L044,L047,L050,L052,L056,L064,L066,
# The depth to recursively parse to (0 for unlimited)
recurse = 0
# Below controls SQLFluff output, see max_line_length for SQL output
output_line_length = 80
# Number of passes to run before admitting defeat
runaway_limit = 10
# Ignore errors by category (one or more of the following, separated by commas: lexing,linting,parsing,templating)
ignore = None
# Ignore linting errors found within sections of code coming directly from
# templated code (e.g. from within Jinja curly braces. Note that it does not
# ignore errors from literal code found within template loops.
ignore_templated_areas = True
# can either be autodetect or a valid encoding e.g. utf-8, utf-8-sig
encoding = autodetect
# Ignore inline overrides (e.g. to test if still required)
disable_noqa = False
# Comma separated list of file extensions to lint
# NB: This config will only apply in the root folder
sql_file_exts = .sql,.sql.j2,.dml,.ddl
# Allow fix to run on files, even if they contain parsing errors
# Note altering this is NOT RECOMMENDED as can corrupt SQL
fix_even_unparsable = False
# Very large files can make the parser effectively hang.
# The more efficient check is the _byte_ limit check which
# is enabled by default. The previous _character_ limit check
# is still present for backward compatibility. This will be
# removed in a future version.
# Set either to 0 to disable.
large_file_skip_char_limit = 0
large_file_skip_byte_limit = 20000
# CPU processes to use while linting.
# If positive, just implies number of processes.
# If negative or zero, implies number_of_cpus - specified_number.
# e.g. -1 means use all processors but one. 0 means all cpus.
processes = 1
[sqlfluff:indentation]
# See https://docs.sqlfluff.com/en/stable/layout.html#configuring-indent-locations
indented_joins = True
indented_ctes = True
indented_using_on = True
indented_on_contents = True
template_blocks_indent = True
# Layout configuration
# See https://docs.sqlfluff.com/en/stable/layout.html#configuring-layout-and-spacing
[sqlfluff:layout:type:comma]
spacing_before = touch
line_position = trailing
[sqlfluff:layout:type:binary_operator]
line_position = leading
[sqlfluff:layout:type:statement_terminator]
spacing_before = touch
line_position = trailing
[sqlfluff:layout:type:end_of_file]
spacing_before = touch
[sqlfluff:layout:type:set_operator]
line_position = alone:strict
[sqlfluff:layout:type:start_bracket]
spacing_after = touch
[sqlfluff:layout:type:end_bracket]
spacing_before = touch
[sqlfluff:layout:type:start_square_bracket]
spacing_after = touch
[sqlfluff:layout:type:end_square_bracket]
spacing_before = touch
[sqlfluff:layout:type:casting_operator]
spacing_before = touch
spacing_after = touch
[sqlfluff:layout:type:comparison_operator]
spacing_within = touch
line_position = leading
[sqlfluff:layout:type:object_reference]
spacing_within = inline
[sqlfluff:layout:type:comment]
spacing_before = any
spacing_after = any
# Template loop tokens shouldn't dictate spacing around them.
[sqlfluff:layout:type:template_loop]
spacing_before = any
spacing_after = any
[sqlfluff:templater]
unwrap_wrapped_queries = True
[sqlfluff:templater:jinja]
apply_dbt_builtins = True
# Some rules can be configured directly from the config common to other rules
[sqlfluff:rules]
tab_space_size = 4
# Max line length is set by default to be in line with the dbt style guide.
# https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md
max_line_length = 80
indent_unit = space
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all
# Some rules have their own specific config
[sqlfluff:rules:L003]
hanging_indents = True
[sqlfluff:rules:L010]
# Keywords
capitalisation_policy = upper
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:L011]
# Aliasing preference for tables
aliasing = explicit
[sqlfluff:rules:L012]
# Aliasing preference for columns
aliasing = explicit
[sqlfluff:rules:L014]
# Unquoted identifiers
extended_capitalisation_policy = lower
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
# unquoted_identifiers_policy = aliases
[sqlfluff:rules:L016]
# Line length
ignore_comment_lines = True
ignore_comment_clauses = False
max_line_length = 100
[sqlfluff:rules:L027]
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:L026]
# References must be in FROM clause
# Disabled for some dialects (e.g. bigquery)
force_enable = False
[sqlfluff:rules:L028]
# References must be consistently used
# Disabled for some dialects (e.g. bigquery)
force_enable = False
[sqlfluff:rules:L029]
# Keywords should not be used as identifiers.
unquoted_identifiers_policy = aliases
quoted_identifiers_policy = none
# Comma separated list of words to ignore for this rule
ignore_words = email,first_name,last_name,value
ignore_words_regex = (^.*_$),(^.*\$.*$)
[sqlfluff:rules:L030]
# Function names
extended_capitalisation_policy = lower
# consistent
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:L031]
# Avoid table aliases in from clauses and join conditions.
# Disabled for some dialects (e.g. bigquery)
force_enable = False
[sqlfluff:rules:L036]
wildcard_policy = single
[sqlfluff:rules:L038]
# Trailing commas
select_clause_trailing_comma = forbid
[sqlfluff:rules:L040]
# Null & Boolean Literals
capitalisation_policy = lower
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:L042]
# By default, allow subqueries in from clauses, but not join clauses
forbid_subquery_in = both
# join
[sqlfluff:rules:L047]
# Consistent syntax to count all rows
prefer_count_1 = False
prefer_count_0 = False
[sqlfluff:rules:L051]
# Fully qualify JOIN clause
fully_qualify_join_types = inner
[sqlfluff:rules:L052]
# Semi-colon formatting approach
multiline_newline = False
require_final_semicolon = False
[sqlfluff:rules:L054]
# GROUP BY/ORDER BY column references
group_by_and_order_by_style = consistent
[sqlfluff:rules:L057]
# Special characters in identifiers
unquoted_identifiers_policy = all
quoted_identifiers_policy = all
allow_space_in_identifier = False
additional_allowed_characters = None
ignore_words = email,first_name,last_name,metadata$filename,metadata$file_row_number
ignore_words_regex = (^.*_$)
[sqlfluff:rules:L059]
# Policy on quoted and unquoted identifiers
prefer_quoted_identifiers = False
ignore_words = None
ignore_words_regex = None
force_enable = False
[sqlfluff:rules:L062]
# Comma separated list of blocked words that should not be used
blocked_words = None
blocked_regex = None
[sqlfluff:rules:L063]
# Data Types
extended_capitalisation_policy = lower
# Comma separated list of words to ignore for this rule
ignore_words = None
ignore_words_regex = None
[sqlfluff:rules:L064]
# Consistent usage of preferred quotes for quoted literals
preferred_quoted_literal_style = consistent
# Disabled for dialects that do not support single and double quotes for quoted literals (e.g. Postgres)
force_enable = False
[sqlfluff:rules:L066]
min_alias_length = None
max_alias_length = None
.sqlfluffignore:
target/
dbt_packages/
/app/dbt_packages/
macros/
models/challenge_20*
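As the warning in the traceback above suggests, the failing model could also be listed here to hide the error temporarily. A hypothetical, workaround-only entry (it suppresses the warning but does not address the underlying bug) would be:

models/error_fluff.sql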
Are you willing to work on and submit a PR to address the issue?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 4
- Comments: 18 (10 by maintainers)
Commits related to this issue
- Test case to demonstrate #4013 — committed to sqlfluff/sqlfluff by alanmcruickshank a year ago
☝️ I've demonstrated this failure in the PR mentioned above (see the results of the checks on the first commit: https://github.com/sqlfluff/sqlfluff/actions/runs/4405984521/jobs/7717601722). I've also tried altering the code to see how things fail if we… don't rename it from block_start to templated, but leave everything else the same.

SQLFluff is not commercial software, and has a limited number of unpaid developers. Maintaining multiple branches is extra work. If people are working for free, please be understanding about their willingness (or lack thereof) to do even more.
Ah - ok, I've found out why. There's a section in [tracer.py#L562] where, if we find an opening call block, we swap the block_type from block_start to templated. Because it's no longer a block_start, we don't add its ID to the stack, and then when we get to the block_end (which is still a block_end) we don't have anything left on the stack to use.

I'm not sure I understand why we re-type it in the tracer, however. @barrywhart - I think this is more your area of expertise. Any ideas?
If I disable that clause (i.e. add an "and False" to the end of the if statement), then this bug goes away.

I have a feeling I know where this is coming from. To support indentation around jinja tags, we try to match starts and ends. It looks like some kind of miscategorisation means we exhaust our stack of starts before we match up all the ends.
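To make that failure mode concrete, here is a purely illustrative Python sketch of the start/end matching. This is not SQLFluff's actual lexer or tracer code (the tag type names and stack handling are simplified), but it shows why re-typing the opening tag leaves nothing on the stack when the closing tag arrives:

import uuid

def match_blocks(block_types):
    """Push an ID for every block_start and pop it at the matching block_end."""
    block_stack = []
    for block_type in block_types:
        if block_type == "block_start":
            block_stack.append(uuid.uuid4())
        elif block_type == "block_end":
            # Raises "IndexError: pop from empty list" if the matching
            # start was never pushed.
            block_stack.pop()
    return block_stack

# A normal block, e.g. {% if execute %} ... {% endif %}: the start is pushed
# and the end pops it, so the stack balances.
match_blocks(["block_start", "literal", "block_end"])

# The {% call %} case described above: the opening tag has been re-typed to
# "templated", so nothing is pushed, and the closing tag pops an empty list.
try:
    match_blocks(["templated", "literal", "block_end"])
except IndexError as exc:
    print(f"IndexError: {exc}")  # "pop from empty list", as in the traceback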
What would make this significantly easier to resolve would be a minimal query that still exhibits the issue. If we can show that a query with a {% call ... %} block is all that's required, then that makes this much easier to solve than if it's an interaction between that block and some others.

I'll assign this one to me because I think I last touched the block matching code.
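For reference, one possible minimal candidate (untested here, and only a sketch derived from the failing model above) would be a model containing little more than the call block itself:

{% call statement('unique_keys', fetch_result=True) %}
    select 'tests' as key_name
{% endcall %}

select 1 as dummy

If that alone raises the same IndexError, the {% call %} block is sufficient on its own; if it lints cleanly, the interaction with the surrounding if/for blocks is likely part of the trigger.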