python-bigquery-dataframes: sqlglot.errors.ParseError for sample from website

Hello everyone,

Running a sample copied and pasted from the site raises a sqlglot.errors.ParseError. Could this be a version issue? Please see the details below.

Your help is appreciated!

Cheers, Sami

Environment details

  • OS type and version: macOS Sonoma 14.2.1 (on M2 Max)
  • Python version (python --version): 3.9.18
  • pip version (pip --version): 23.0.1
  • bigframes version (pip show bigframes): 0.19.0

Steps to reproduce

  1. Run the code sample from https://cloud.google.com/bigquery/docs/bigquery-dataframes#bigframes-ml-regression after adding a project ID

Code example

from bigframes.ml.linear_model import LinearRegression
import bigframes.pandas as bpd

bpd.options.bigquery.project = "our_project_id"

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Filter the data down to the Adelie Penguin species
adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

# Drop the species column
adelie_data = adelie_data.drop(columns=["species"])

# Drop rows with nulls to get training data
training_data = adelie_data.dropna()

# Specify your feature (or input) columns and the label (or output) column:
feature_columns = training_data[
    ["island", "culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "sex"]
]
label_columns = training_data[["body_mass_g"]]

test_data = adelie_data[adelie_data.body_mass_g.isnull()]

# Create the linear model
model = LinearRegression()
model.fit(feature_columns, label_columns)

# Score the model
score = model.score(feature_columns, label_columns)

# Predict using the model
result = model.predict(test_data)

Stack trace

% python src/bq_run.py
Query job bb049054-a4f0-4d88-b128-b97eb020038b is DONE.28.9 kB processed.  
https://console.cloud.google.com/bigquery?project=platform-dev-285607&j=bq:US:bb049054-a4f0-4d88-b128-b97eb020038b&page=queryresults
Traceback (most recent call last):
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1039, in parse_into
    return self._parse(parser, raw_tokens, sql)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1078, in _parse
    self.raise_error("Invalid expression / Unexpected token")
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1119, in raise_error
    raise error
sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 1, Col: 61.
  platform-dev-285607._21e83bdd53455fdc8544000e45591de500adacc2.anon0277dfb0_f1fc_47b2_a519_1493d286435f

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/samiabboud/dev/aampe/modeling/src/bq_run.py", line 33, in <module>
    model.fit(feature_columns, label_columns)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/base.py", line 162, in fit
    return self._fit(X, y)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/linear_model.py", line 136, in _fit
    self._bqml_model = self._bqml_model_factory.create_model(
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/core.py", line 245, in create_model
    input_data = X_train._cached().join(y_train._cached(), how="outer")
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/dataframe.py", line 3045, in _cached
    self._set_block(self._block.cached())
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/blocks.py", line 1677, in cached
    self.session._execute_and_cache(self.expr, cluster_cols=self.index_columns),
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/session/__init__.py", line 1479, in _execute_and_cache
    table_expression = self.ibis_client.table(
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/ibis/backends/bigquery/__init__.py", line 509, in table
    table = sg.parse_one(name, into=sg.exp.Table, read=self.name)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/__init__.py", line 124, in parse_one
    result = dialect.parse_into(into, sql, **opts)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 325, in parse_into
    return self.parser(**opts).parse_into(expression_type, self.tokenize(sql), sql)
  File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1044, in parse_into
    raise ParseError(
sqlglot.errors.ParseError: Failed to parse 'platform-dev-285607._21e83bdd53455fdc8544000e45591de500adacc2.anon0277dfb0_f1fc_47b2_a519_1493d286435f' into <class 'sqlglot.expressions.Table'>
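
The column number in the first traceback can be mapped back onto the table ID to see which part the parser stopped on. The following is a minimal sketch, not part of bigframes or sqlglot: `part_at_column` is a hypothetical helper, and it assumes the reported column is 1-based, which the traceback does not confirm. Under that assumption, column 61 falls on the last character of the dataset part, the anonymous dataset whose name starts with an underscore followed by digits.

```python
def part_at_column(table_id: str, col: int):
    """Return (index, text) of the dot-separated part covering 1-based column `col`.

    Returns None when the column lands on a "." separator or outside the string.
    Assumption: the column in the error message is 1-based.
    """
    start = 1  # 1-based column where the current part begins
    for index, part in enumerate(table_id.split(".")):
        end = start + len(part) - 1
        if start <= col <= end:
            return index, part
        start = end + 2  # step over the "." separator
    return None


# Table ID copied from the stack trace above.
table_id = (
    "platform-dev-285607"
    "._21e83bdd53455fdc8544000e45591de500adacc2"
    ".anon0277dfb0_f1fc_47b2_a519_1493d286435f"
)

# Index 0 is the project, 1 the dataset, 2 the table.
print(part_at_column(table_id, 61))
```

The same helper can be applied to any other failing table ID and column to check whether the parser stops on the same part.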

About this issue

  • State: closed
  • Created 6 months ago
  • Comments: 22 (5 by maintainers)

Most upvoted comments

@tswast @chelsea-lin Just tested and seems fixed! Thank you!!

@ZeroCool2u Thanks for the report. My teammate @chelsea-lin determined that there is a bug in sqlglot's parsing of BigQuery table IDs. It has been reported and will hopefully be fixed in a future sqlglot release. In the meantime, I believe bigframes 0.22.0 works around this issue. Could you please try that version and report back on whether it is fixed?

In a notebook:

%pip install --upgrade bigframes

And then restart your notebook runtime.
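
After upgrading, it can be useful to check that the installed release is at least 0.22.0, the version mentioned above as containing the workaround. This is a minimal sketch with hypothetical helper names (`parse_version`, `has_workaround`). It only handles plain `X.Y.Z` version strings; pre-release or development suffixes would need a real version parser such as the one in the `packaging` library.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(piece) for piece in text.split("."))


def has_workaround(installed: str, minimum: str = "0.22.0") -> bool:
    """Return True if `installed` is at least `minimum` (0.22.0 by default)."""
    return parse_version(installed) >= parse_version(minimum)


print(has_workaround("0.19.0"))  # version from the original report
print(has_workaround("0.22.0"))  # version suggested in this comment
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where "0.9.0" would sort after "0.22.0".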

@chelsea-lin, @tswast sorry for the delayed response.

Environment details: environment.txt

Full call stack:

Traceback (most recent call last):
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1056, in parse_into
    return self._parse(parser, raw_tokens, sql)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1095, in _parse
    self.raise_error("Invalid expression / Unexpected token")
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1136, in raise_error
    raise error
sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 1, Col: 64.
  encoded-hangout-414110._46f61dc8b3e2eb2697eb7be8fa45757c2d44aebe.anon271366cfceeb965c764bc43446c057e69691f83157ca78394ecb85df7904eb22

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/folders/wn/fkylpmg57j11htwlk6n249xr0000gp/T/ipykernel_37862/2695306357.py", line 5, in <module>
    enc.fit(X)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/ml/preprocessing.py", line 510, in fit
    self._bqml_model = self._bqml_model_factory.create_model(
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/ml/core.py", line 243, in create_model
    input_data = X_train._cached()
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
    return method(*args, **kwargs)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/dataframe.py", line 3045, in _cached
    self._set_block(self._block.cached())
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/core/blocks.py", line 1677, in cached
    self.session._execute_and_cache(self.expr, cluster_cols=self.index_columns),
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/bigframes/session/__init__.py", line 1479, in _execute_and_cache
    table_expression = self.ibis_client.table(
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/ibis/backends/bigquery/__init__.py", line 509, in table
    table = sg.parse_one(name, into=sg.exp.Table, read=self.name)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/__init__.py", line 123, in parse_one
    result = dialect.parse_into(into, sql, **opts)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 447, in parse_into
    return self.parser(**opts).parse_into(expression_type, self.tokenize(sql), sql)
  File "/Users/furkan.uysal/anaconda3/envs/myenv/lib/python3.9/site-packages/sqlglot/parser.py", line 1061, in parse_into
    raise ParseError(
sqlglot.errors.ParseError: Failed to parse 'encoded-hangout-414110._46f61dc8b3e2eb2697eb7be8fa45757c2d44aebe.anon271366cfceeb965c764bc43446c057e69691f83157ca78394ecb85df7904eb22' into <class 'sqlglot.expressions.Table'>

In addition to https://github.com/googleapis/python-bigquery-dataframes/issues/315#issuecomment-1960552049 could you also please run the following cell:

import bigframes
bigframes.__version__

This will help us confirm that the version of bigframes loaded in the notebook matches the version reported by pip.
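
Both values can also be collected in one cell. The sketch below is an illustration only: `reported_versions` is a hypothetical helper, and it assumes the pip distribution name and the import name are the same string, which holds for bigframes. It returns `None` for whichever value is unavailable.

```python
import importlib
from importlib import metadata


def reported_versions(package: str):
    """Return (version from installed metadata, version from the imported module)."""
    try:
        pip_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pip_version = None
    try:
        module = importlib.import_module(package)
        module_version = getattr(module, "__version__", None)
    except ImportError:
        module_version = None
    return pip_version, module_version


# A mismatch between the two values suggests the notebook kernel is using
# a different environment than the one pip installed into.
print(reported_versions("bigframes"))
```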

@uysalfurkan Thank you so much for reporting this issue! I apologize for the delay in responding. To help me track down this dependency mismatch, could you please provide the full call stack? Also, so that we can reproduce the error on our side, would you mind sharing your environment details? You can generate them by running the following in a notebook cell:

import sys
!{sys.executable} -m pip freeze