python-bigquery-dataframes: sqlglot.errors.ParseError for sample from website
Hello everyone,
Running a copy/pasted sample from the site raises a sqlglot.errors.ParseError. Version issues maybe? Please see details below.
Your help is appreciated!
Cheers, Sami
Environment details
- OS type and version: Sonoma 14.2.1 (on M2 Max)
- Python version (python --version): 3.9.18
- pip version (pip --version): pip 23.0.1
- bigframes version (pip show bigframes): 0.19.0
Steps to reproduce
- Run code sample from : https://cloud.google.com/bigquery/docs/bigquery-dataframes#bigframes-ml-regression after adding a project id
Code example
from bigframes.ml.linear_model import LinearRegression
import bigframes.pandas as bpd
bpd.options.bigquery.project = "our_project_id"
# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)
# Filter the data down to the Adelie Penguin species
adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]
# Drop the species column
adelie_data = adelie_data.drop(columns=["species"])
# Drop rows with nulls to get training data
training_data = adelie_data.dropna()
# Specify your feature (or input) columns and the label (or output) column:
feature_columns = training_data[
    ["island", "culmen_length_mm", "culmen_depth_mm", "flipper_length_mm", "sex"]
]
label_columns = training_data[["body_mass_g"]]
test_data = adelie_data[adelie_data.body_mass_g.isnull()]
# Create the linear model
model = LinearRegression()
model.fit(feature_columns, label_columns)
# Score the model
score = model.score(feature_columns, label_columns)
# Predict using the model
result = model.predict(test_data)
Stack trace
% python src/bq_run.py
Query job bb049054-a4f0-4d88-b128-b97eb020038b is DONE. 28.9 kB processed.
https://console.cloud.google.com/bigquery?project=platform-dev-285607&j=bq:US:bb049054-a4f0-4d88-b128-b97eb020038b&page=queryresults
Traceback (most recent call last):
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1039, in parse_into
return self._parse(parser, raw_tokens, sql)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1078, in _parse
self.raise_error("Invalid expression / Unexpected token")
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1119, in raise_error
raise error
sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 1, Col: 61.
platform-dev-285607._21e83bdd53455fdc8544000e45591de500adacc2.anon0277dfb0_f1fc_47b2_a519_1493d286435f
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/samiabboud/dev/aampe/modeling/src/bq_run.py", line 33, in <module>
model.fit(feature_columns, label_columns)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/base.py", line 162, in fit
return self._fit(X, y)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
return method(*args, **kwargs)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/linear_model.py", line 136, in _fit
self._bqml_model = self._bqml_model_factory.create_model(
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/ml/core.py", line 245, in create_model
input_data = X_train._cached().join(y_train._cached(), how="outer")
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/log_adapter.py", line 44, in wrapper
return method(*args, **kwargs)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/dataframe.py", line 3045, in _cached
self._set_block(self._block.cached())
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/core/blocks.py", line 1677, in cached
self.session._execute_and_cache(self.expr, cluster_cols=self.index_columns),
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/bigframes/session/__init__.py", line 1479, in _execute_and_cache
table_expression = self.ibis_client.table(
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/ibis/backends/bigquery/__init__.py", line 509, in table
table = sg.parse_one(name, into=sg.exp.Table, read=self.name)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/__init__.py", line 124, in parse_one
result = dialect.parse_into(into, sql, **opts)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/dialects/dialect.py", line 325, in parse_into
return self.parser(**opts).parse_into(expression_type, self.tokenize(sql), sql)
File "/Users/samiabboud/dev/aampe/modeling/venv/lib/python3.9/site-packages/sqlglot/parser.py", line 1044, in parse_into
raise ParseError(
sqlglot.errors.ParseError: Failed to parse 'platform-dev-285607._21e83bdd53455fdc8544000e45591de500adacc2.anon0277dfb0_f1fc_47b2_a519_1493d286435f' into <class 'sqlglot.expressions.Table'>
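To see where "Line 1, Col: 61" lands in the table ID from the trace, the ID can be split into its dot-separated segments. This is only a diagnostic sketch: segment_spans is a hypothetical helper, not part of sqlglot or bigframes, and it assumes the reported column counts characters from the start of the string.

```python
# Table ID copied from the ParseError message above.
table_id = (
    "platform-dev-285607"
    "._21e83bdd53455fdc8544000e45591de500adacc2"
    ".anon0277dfb0_f1fc_47b2_a519_1493d286435f"
)


def segment_spans(dotted_id):
    """Return (segment, start, end) tuples using 1-based, inclusive columns."""
    spans = []
    start = 1
    for segment in dotted_id.split("."):
        end = start + len(segment) - 1
        spans.append((segment, start, end))
        start = end + 2  # step over the "." separator
    return spans


spans = segment_spans(table_id)
for segment, start, end in spans:
    print(f"cols {start:>3}-{end:>3}: {segment}")
```

The second segment (the dataset ID that starts with an underscore) ends at column 61, so the parser appears to give up at the boundary between the dataset ID and the table ID. Whether the column is counted from 0 or from 1, it falls on that boundary.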
About this issue
- Original URL
- State: closed
- Created 6 months ago
- Comments: 22 (5 by maintainers)
@tswast @chelsea-lin Just tested and seems fixed! Thank you!!
@ZeroCool2u Thanks for the report. My teammate @chelsea-lin was able to determine that there is a bug in sqlglot’s parsing of BigQuery table IDs, which has been reported and will hopefully be fixed in a future release. In the meantime, I believe bigframes 0.22.0 will have worked around this issue. Could you please try with that version and report back if it is fixed?
In a notebook:
And then restart your notebook runtime.
@chelsea-lin, @tswast sorry for the delayed response.
Environment details: environment.txt
Full call stack:
In addition to https://github.com/googleapis/python-bigquery-dataframes/issues/315#issuecomment-1960552049 could you also please run the following cell:
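The original cell is not preserved here. As a minimal sketch of that kind of check (not the maintainers' exact cell), the following compares the version of a module the running interpreter actually imports with the version recorded in the environment's package metadata, which is what pip reports. The helper names are hypothetical, and the sketch assumes the module and the distribution share the same name, as bigframes and sqlglot do.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Version recorded in the environment's package metadata, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def imported_version(module_name):
    """Version of the module this interpreter actually imports, or None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare_versions(name):
    """Return (imported, installed, match) for one package name."""
    imported = imported_version(name)
    installed = installed_version(name)
    match = imported is not None and imported == installed
    return imported, installed, match


# Packages that appear in the stack trace of this issue.
for name in ("bigframes", "sqlglot"):
    imported, installed, match = compare_versions(name)
    print(f"{name}: imported={imported} installed={installed} match={match}")
```

If the two versions differ, the notebook kernel is most likely still running an older copy of the package, which is why restarting the runtime after an upgrade matters.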
This will help us confirm that the version of bigframes in the notebook aligns with that claimed by pip.
@uysalfurkan Thank you so much for reporting this issue! I apologize for the delay in responding. To help me track down this dependency mismatch, could you please provide the full call stack? Also, to reproduce the error on our side, would you mind sharing your environment details? You can generate these using the following command: