pudl: Data irregularities cause epacems_to_parquet to fail

After (apparently) successfully running the new data-package-based ETL process on the full EPA CEMS dataset (all years, all states), I tried to run the epacems_to_parquet script. It emitted several warnings and ultimately failed. Several of the warnings were of this type:

sys:1: DtypeWarning: Columns (8,10,12,14) have mixed types. Specify dtype option on import or set low_memory=False.
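
For context, this warning usually means pandas inferred different types for the same column in different chunks of the file. Below is a minimal sketch of the remedy the warning itself suggests: passing an explicit dtype mapping to read_csv. The column names, sample values, and dtypes are hypothetical stand-ins, not the actual EPA CEMS schema or the mapping used in pudl.

```python
import io

import pandas as pd

# Hypothetical sample: the flag column mixes text, blanks, and a bare number,
# which is the kind of content that leads to mixed-type inference.
csv_text = (
    "plant_id,so2_mass_measure_flg,so2_mass_lbs\n"
    "3,Measured,1.5\n"
    "3,,2.0\n"
    "3,1,\n"
)

# Hypothetical dtype mapping: declare every column's type up front so pandas
# does not have to guess per chunk.
dtypes = {
    "plant_id": "int64",
    "so2_mass_measure_flg": "string",
    "so2_mass_lbs": "float64",
}

df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(df.dtypes)
```

With the mapping in place, the bare number in the flag column is kept as text and blank cells become missing values instead of changing the column's type.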

The error that ultimately crashed the script was:

Traceback (most recent call last):
  File "/home/zane/miniconda3/envs/pudl-dev/bin/epacems_to_parquet", line 11, in <module>
    load_entry_point('catalystcoop.pudl', 'console_scripts', 'epacems_to_parquet')()
  File "/home/zane/pudl/src/pudl/convert/epacems_to_parquet.py", line 319, in main
    clobber=args.clobber)
  File "/home/zane/pudl/src/pudl/convert/epacems_to_parquet.py", line 205, in epacems_to_parquet
    df = year_from_operating_datetime(df).astype(IN_DTYPES)
  File "/home/zane/pudl/src/pudl/convert/epacems_to_parquet.py", line 123, in year_from_operating_datetime
    df['year'] = df.operating_datetime_utc.dt.year
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.7/site-packages/pandas/core/generic.py", line 5175, in __getattr__
    return object.__getattribute__(self, name)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.7/site-packages/pandas/core/accessor.py", line 175, in __get__
    accessor_obj = self._accessor(obj)
  File "/home/zane/miniconda3/envs/pudl-dev/lib/python3.7/site-packages/pandas/core/indexes/accessors.py", line 343, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike " "values")
AttributeError: Can only use .dt accessor with datetimelike values
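
The traceback shows that operating_datetime_utc was not a datetime column when .dt.year was called. One plausible cause, consistent with the title but not confirmed here, is that irregular values left the column with object dtype. The sketch below reproduces the error on a small hypothetical frame and shows one possible workaround: coercing unparseable values to NaT with pd.to_datetime. The sample data and timestamp format are assumptions, and this is not necessarily the fix that was applied in pudl.

```python
import pandas as pd

# Hypothetical sample: one malformed value keeps the column as plain strings.
df = pd.DataFrame(
    {
        "operating_datetime_utc": [
            "2018-01-01 00:00:00",
            "2018-01-01 01:00:00",
            "not a date",
        ]
    }
)

# On a non-datetime column, the .dt accessor raises the AttributeError
# seen in the traceback.
try:
    df["operating_datetime_utc"].dt.year
except AttributeError as err:
    print(f"AttributeError: {err}")

# Coerce to timezone-aware datetimes; values that do not match the
# (assumed) format become NaT instead of raising.
df["operating_datetime_utc"] = pd.to_datetime(
    df["operating_datetime_utc"],
    format="%Y-%m-%d %H:%M:%S",
    errors="coerce",
    utc=True,
)

# Count the rows that failed to parse so they can be inspected, not silently lost.
bad_rows = int(df["operating_datetime_utc"].isna().sum())
print(f"Unparseable timestamps: {bad_rows}")

# The accessor now works; rows with NaT get a missing year.
df["year"] = df["operating_datetime_utc"].dt.year
```

Coercion hides bad records, so it is worth logging or inspecting the rows that become NaT before dropping or filling them.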

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (12 by maintainers)

Most upvoted comments

I wrote some code that should address this, but I’ll test it later this week before creating a PR.