datasets: Command tfds.as_dataframe fails to make dataframe

Short description

When I call tfds.as_dataframe, it fails with the error below.

Environment information

  • Operating System: Ubuntu 20.04

  • Python version: 3.8

  • tensorflow-datasets/tfds-nightly version: tfds-nightly v3.2.1.dev202009090105

  • tensorflow/tf-nightly version: tensorflow v2.3

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes

Reproduction instructions

import tensorflow.compat.v2 as tf
import tensorflow_datasets as tfds
import pandas as pd

tfds.disable_progress_bar()
tf.enable_v2_behavior()

(ds_train, ds_test), ds_info = tfds.load(
    'mnist',
    split=['train', 'test'],
    shuffle_files=True,
    as_supervised=True,
    with_info=True,
)

def normalize_img(image, label):
  """Normalizes images: `uint8` -> `float32`."""
  return tf.cast(image, tf.float32) / 255., label

ds_test = ds_test.map(
    normalize_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
ds_test = ds_test.batch(128)
ds_test = ds_test.cache()
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

df = tfds.as_dataframe(ds_test.take(10), ds_info)

Link to logs

Traceback (most recent call last):
  File "mnist_test.py", line 31, in <module>
    df = tfds.as_dataframe(ds_test.take(10), ds_info)
  File "/home/ubuntu/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow_datasets/core/as_dataframe.py", line 192, in as_dataframe
    columns = _make_columns(ds.element_spec, ds_info=ds_info)
  File "/home/ubuntu/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow_datasets/core/as_dataframe.py", line 148, in _make_columns
    return [
  File "/home/ubuntu/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow_datasets/core/as_dataframe.py", line 149, in <listcomp>
    ColumnInfo.from_spec(path, ds_info)
  File "/home/ubuntu/miniconda3/envs/tensorflow/lib/python3.8/site-packages/tensorflow_datasets/core/as_dataframe.py", line 61, in from_spec
    name = '/'.join(path)
TypeError: sequence item 0: expected str instance, int found

Expected behavior

The dataset is converted into a pandas DataFrame.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 3
  • Comments: 15 (5 by maintainers)

Most upvoted comments

Open the installed as_dataframe.py (in my case C:\Users\elver\PycharmProjects\ds_to_csv\venv\lib\site-packages\tensorflow_datasets\core\as_dataframe.py) and cast each element of path to str on the failing line: name = '/'.join(map(str, path))
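
The failing line from the traceback can be reproduced without TensorFlow. The sketch below uses a hypothetical path tuple containing an integer, on the assumption (suggested by the error message, not confirmed in the thread) that the tuple elements returned with as_supervised=True are addressed by position rather than by name.

```python
# Hypothetical path: assumes a tuple element (image, label) is addressed
# by its integer position, which is what the TypeError suggests.
path = (0,)

# Original line from as_dataframe.py: str.join rejects non-str items.
try:
    name = '/'.join(path)
except TypeError as err:
    error_message = str(err)

# Suggested workaround: convert every path element to str before joining.
fixed_name = '/'.join(map(str, path))

print(error_message)
print(fixed_name)
```

This only shows why the join raises and why the cast avoids it; editing a file inside site-packages is a local patch that is lost on the next package upgrade.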

When as_supervised=True is used, ds_info must be passed to tfds.as_dataframe; otherwise you get the "str-int" TypeError. If you don't use the as_supervised flag, passing ds_info is optional and neither choice raises an error. Tested on TF 2.6.0.
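
A minimal sketch of the case without as_supervised, where each element is a dict keyed by feature name instead of a tuple. The helper name mnist_test_dataframe is hypothetical, and the sketch assumes tensorflow-datasets is installed and MNIST can be downloaded; it has not been checked against the nightly build from the report.

```python
def mnist_test_dataframe(num_examples=10):
    """Loads MNIST without as_supervised and converts a few examples.

    Hypothetical helper for illustration, not part of TFDS.
    """
    # Imported here so that defining the helper does not require TFDS.
    import tensorflow_datasets as tfds

    # Without as_supervised, elements are dicts such as
    # {'image': ..., 'label': ...}, so the column paths are strings.
    ds, ds_info = tfds.load('mnist', split='test', with_info=True)

    # ds_info is optional here; passing it lets TFDS use feature metadata.
    return tfds.as_dataframe(ds.take(num_examples), ds_info)
```

Calling mnist_test_dataframe() should return a pandas DataFrame. The conversion is applied to unbatched examples here, unlike the batched pipeline in the reproduction.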