pudl: Data irregularities cause epacems_to_parquet to fail
After (apparently) successfully running the new data-package-based ETL process on the full EPA CEMS dataset (all years, all states), I tried to run the epacems_to_parquet script, but it encountered errors and ultimately failed. Several of the messages were warnings of the type:
sys:1: DtypeWarning: Columns (8,10,12,14) have mixed types. Specify dtype option on import or set low_memory=False.
But the thing that crashed it eventually was:
Traceback (most recent call last):
  File "/home/zane/miniconda3/envs/pudl-dev/bin/epacems_to_parquet", line 11, in <module>
    load_entry_point('catalystcoop.pudl', 'console_scripts', 'epacems_to_parquet')()
  File "/home/zane/pudl/src/pudl/convert/epacems_to_parquet.py", line 319, in main
    clobber=args.clobber)
  File "/home/zane/pudl/src/pudl/convert/epacems_to_parquet.py", line 205, in epacems_to_parquet
    df = year_from_operating_datetime(df).astype(IN_DTYPES)
  File "/home/zane/pudl/src/pudl/convert/epacems_to_parquet.py", line 123, in year_from_operating_datetime
    df['year'] = df.operating_datetime_utc.dt.year
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.7/site-packages/pandas/core/generic.py", line 5175, in __getattr__
    return object.__getattribute__(self, name)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.7/site-packages/pandas/core/accessor.py", line 175, in __get__
    accessor_obj = self._accessor(obj)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 343, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike " "values")
AttributeError: Can only use .dt accessor with datetimelike values
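For anyone hitting the same AttributeError, here is a minimal sketch of one way this failure mode can arise and be guarded against. It assumes the operating_datetime_utc column was read in as strings (object dtype) because at least one value could not be parsed as a timestamp; that is a guess at the cause, not something confirmed by the traceback. The sample data is hypothetical, and this is not necessarily the fix that was adopted in pudl.

```python
import pandas as pd

# Hypothetical sample: one malformed timestamp among otherwise valid values.
df = pd.DataFrame({
    "operating_datetime_utc": [
        "2018-01-01 05:00:00",
        "2018-01-01 06:00:00",
        "not a timestamp",
    ]
})

# The column holds strings rather than datetimes, so the .dt accessor raises
# the same AttributeError seen in the traceback.
try:
    df.operating_datetime_utc.dt.year
    accessor_failed = False
except AttributeError:
    accessor_failed = True

# Coerce to datetimes explicitly; unparseable values become NaT instead of
# leaving the whole column as strings.
df["operating_datetime_utc"] = pd.to_datetime(
    df["operating_datetime_utc"], errors="coerce", utc=True
)

# Count the rows that failed to parse so they can be inspected or dropped.
n_bad = int(df["operating_datetime_utc"].isna().sum())

# With a datetime column the year can be extracted; NaT rows yield NaN.
df["year"] = df["operating_datetime_utc"].dt.year
```

Coercing with errors="coerce" keeps the conversion from crashing, but it also hides the bad records, so counting the NaT values (n_bad here) and reporting them seems worthwhile before writing anything out to Parquet.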
About this issue
- State: closed
- Created 5 years ago
- Comments: 16 (12 by maintainers)
I wrote some code that should address this, but I’ll test it later this week before creating a PR.