fastparquet: to_pandas() doesn't work with parquet file - AttributeError

Hi all, I’m loading some parquet files generated by a Spark ETL job.

I get this error when calling parquet_file.to_pandas().

AttributeError                            Traceback (most recent call last)
<ipython-input-9-7098f6946da6> in <module>()
----> 1 profiles.to_pandas()

/home/springcoil/miniconda3/envs/py35/lib/python3.5/site-packages/fastparquet/api.py in to_pandas(self, columns, categories, filters, index, timestamp96)
    332                     self.read_row_group(rg, columns, categories, infile=f,
    333                                         index=index, assign=parts,
--> 334                                         timestamp96=timestamp96)
    335                     start += rg.num_rows
    336         else:

/home/springcoil/miniconda3/envs/py35/lib/python3.5/site-packages/fastparquet/api.py in read_row_group(self, rg, columns, categories, infile, index, assign, timestamp96)
    184                 infile, rg, columns, categories, self.schema, self.cats,
    185                 self.selfmade, index=index, assign=assign,
--> 186                 timestamp96=timestamp96, sep=self.sep)
    187         if ret:
    188             return df

/home/springcoil/miniconda3/envs/py35/lib/python3.5/site-packages/fastparquet/core.py in read_row_group(file, rg, columns, categories, schema_helper, cats, selfmade, index, assign, timestamp96, sep)
    336         raise RuntimeError('Going with pre-allocation!')
    337     read_row_group_arrays(file, rg, columns, categories, schema_helper,
--> 338                           cats, selfmade, assign=assign, timestamp96=timestamp96)
    339 
    340     for cat in cats:

/home/springcoil/miniconda3/envs/py35/lib/python3.5/site-packages/fastparquet/core.py in read_row_group_arrays(file, rg, columns, categories, schema_helper, cats, selfmade, assign, timestamp96)
    313                  selfmade=selfmade, assign=out[name],
    314                  catdef=out[name+'-catdef'] if use else None,
--> 315                  timestamp96=mr)
    316 
    317         if _is_map_like(schema_helper, column):

/home/springcoil/miniconda3/envs/py35/lib/python3.5/site-packages/fastparquet/core.py in read_col(column, schema_helper, infile, use_cat, grab_dict, selfmade, assign, catdef, timestamp96)
    237             skip_nulls = False
    238         defi, rep, val = read_data_page(infile, schema_helper, ph, cmd,
--> 239                                         skip_nulls, selfmade=selfmade)
    240         if rep is not None and assign.dtype.kind != 'O':  # pragma: no cover
    241             # this should never get called

/home/springcoil/miniconda3/envs/py35/lib/python3.5/site-packages/fastparquet/core.py in read_data_page(f, helper, header, metadata, skip_nulls, selfmade)
    103                                            dtype=np.uint8))
    104 
--> 105     repetition_levels = read_rep(io_obj, daph, helper, metadata)
    106 
    107     if skip_nulls and not helper.is_required(metadata.path_in_schema):

/home/springcoil/miniconda3/envs/py35/lib/python3.5/site-packages/fastparquet/core.py in read_rep(io_obj, daph, helper, metadata)
     83             metadata.path_in_schema)
     84         bit_width = encoding.width_from_max_int(max_repetition_level)
---> 85         repetition_levels = read_data(io_obj, daph.repetition_level_encoding,
     86                                       daph.num_values,
     87                                       bit_width)[:daph.num_values]

AttributeError: 'NoneType' object has no attribute 'repetition_level_encoding'


Has anyone seen anything like this before?

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 5
  • Comments: 41 (18 by maintainers)

Most upvoted comments

I have the same problem here.

OK, so: there appear to be multiple dictionary pages, which is not supposed to happen, but I can deal with that. Also, the encoding is “bit-packed (deprecated)”, which, as the name suggests, is not supposed to be around any more. I can maybe code it up, since the spec is well-stated, and I can compare the result against the ground truth given by Spark. I’ll get back to you.
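
For anyone curious what decoding the deprecated bit-packed encoding involves, here is a minimal sketch in plain Python. It assumes the layout described in the Parquet format spec for the deprecated BIT_PACKED encoding: values are packed back to back, starting from the most significant bit of each byte. The function name `read_bitpacked_deprecated` and its parameters are hypothetical and are not part of fastparquet's API; this is not the code that ended up in the library.

```python
def read_bitpacked_deprecated(raw, width, count):
    """Decode `count` unsigned values of `width` bits each from `raw`.

    Hypothetical helper, assuming the deprecated BIT_PACKED layout:
    values are stored contiguously, most significant bit first.
    """
    if width == 0:
        # A bit width of zero means every value is zero and no bytes are stored.
        return [0] * count

    needed_bits = width * count
    if len(raw) * 8 < needed_bits:
        raise ValueError("buffer too short for the requested number of values")

    # Read only the bytes that hold the requested values, as one big-endian
    # integer, so the first byte's top bit becomes the integer's top bit.
    nbytes = (needed_bits + 7) // 8
    big = int.from_bytes(raw[:nbytes], "big")
    total_bits = nbytes * 8

    mask = (1 << width) - 1
    # Value i occupies bits [i * width, (i + 1) * width) counted from the top.
    return [(big >> (total_bits - (i + 1) * width)) & mask for i in range(count)]


# The spec's worked example: the numbers 0..7 at bit width 3 are stored
# as the three bytes 00000101 00111001 01110111.
print(read_bitpacked_deprecated(bytes([0x05, 0x39, 0x77]), 3, 8))
```

The list-based version favours readability over speed; a real reader would more likely work on NumPy arrays and would need checking against values read back by Spark, as suggested above.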