tensorflow: Cannot download celeb_a dataset from tensorflow_datasets :(
Running this:
`(train_data, test_data), info = tfds.load(name='celeb_a', split=['train', 'test'], as_supervised=True, shuffle_files=True, with_info=True)`
gives this:
NonMatchingChecksumError: Artifact https://drive.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM, downloaded to /root/tensorflow_datasets/downloads/ucexport_download_id_0B7EVK8r0v71pZjFTYXZWM3FlDDaXUAQO8EGH_a7VqGNLRtW52mva1LzDrb-V723OQN8.tmp.4ec0de7ede1541dca88a21190e298882/uc, has wrong checksum.
As far as I know, this issue occurs because the dataset is hosted on Google Drive.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 3
- Comments: 17 (1 by maintainers)
If you're running into the same problem, you can download the dataset from the source.
I think I have solved this problem (tested with TFDS v4.9.2). First, download the source code; I downloaded the whole `celeb_a` directory. Line 31 of the file `celeb_a_dataset_builder.py` is `import tensorflow_datasets.public_api as tfds`. I changed this line to `import tensorflow_datasets as tfds`. Next, build the dataset manually: open a terminal, `cd` into the `celeb_a` directory where you downloaded the source code, and run the command `tfds build celeb_a`. Surprisingly, the dataset then downloads automatically :)

Posting an alternative solution here since (1) automatically downloading CelebA still doesn't work, and (2) I found the above solution too manual. This solution (tested with TFDS v3.2.1) simply overrides a method of the CelebA dataset builder to use manually downloaded data.
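Sketched as shell commands, the patch-and-build steps from the first solution above might look like this (the repository path and the `sed` edit are assumptions based on the TFDS source layout around v4.9, not commands from the thread):

```shell
# Grab the TFDS source, which contains the celeb_a builder.
git clone --depth 1 https://github.com/tensorflow/datasets.git
cd datasets/tensorflow_datasets/datasets/celeb_a

# Replace the public_api import with the full-package import
# (the one-line edit described in the comment above).
sed -i 's/import tensorflow_datasets.public_api as tfds/import tensorflow_datasets as tfds/' \
    celeb_a_dataset_builder.py

# Build the dataset from the patched source.
tfds build celeb_a
```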
Step 1. Manually download the data. This should get you the files `{DATA_DIR}/img_align_celeba.zip`, `{DATA_DIR}/list_eval_partition.txt`, `{DATA_DIR}/list_landmarks_align_celeba.txt`, and `{DATA_DIR}/list_attr_celeba.txt`. The download links are provided in the source code for the CelebA dataset builder.

Step 2. Override the `_split_generators` method of the CelebA builder class, then call `download_and_prepare`. Full code is provided below.

You can now call `tfds.load('celeb_a')` or `builder = tfds.builder('celeb_a'); builder.download_and_prepare()` and reuse the prepared dataset.

Hi there, I think there is a way, and I've managed to do what I wanted. Here's my code:
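(The code block from this comment was not preserved when the thread was archived. The following is a rough sketch of what such a pipeline might look like; the `augment` steps and all names are assumptions, and a small synthetic dataset stands in for `tfds.load('celeb_a')` so the snippet runs without downloading anything.)

```python
import tensorflow as tf

# Hypothetical augment function: normalize, flip, fixed-size crop.
# CelebA aligned images are 218x178x3, so a 178x178 crop is valid.
@tf.function
def augment(image, label):
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_crop(image, (178, 178, 3))
    return image, label

# Synthetic stand-in for the (image, attributes) pairs from tfds.load.
images = tf.cast(
    tf.random.uniform((4, 218, 178, 3), maxval=256, dtype=tf.int32), tf.uint8)
labels = tf.zeros((4,), dtype=tf.int32)

train = (tf.data.Dataset.from_tensor_slices((images, labels))
         .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
         .batch(2)
         .prefetch(tf.data.AUTOTUNE))

for batch_images, batch_labels in train:
    print(batch_images.shape)  # (2, 178, 178, 3)
```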
With this code, everything is working fine for now. Also, please share your opinion on reading the data this way (performance- and memory-wise) and any modifications I should make. And if everything looks okay, then please add this example to the official TensorFlow documentation; it would be very helpful for others as well.
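On the performance question: one common pattern (a general `tf.data` recommendation, not something stated in this thread) is to `cache()` before the shuffle and the random augmentation, so decoding happens only once while each epoch still sees fresh augmentations, and to end the pipeline with `prefetch`:

```python
import tensorflow as tf

# Synthetic stand-in for the decoded CelebA images.
ds = tf.data.Dataset.from_tensor_slices(
    tf.random.uniform((8, 218, 178, 3), dtype=tf.float32))

ds = (ds.cache()           # cache the decoded images once
        .shuffle(8)        # reshuffle every epoch
        .map(lambda x: tf.image.random_crop(x, (178, 178, 3)),
             num_parallel_calls=tf.data.AUTOTUNE)  # random ops after cache
        .batch(4)
        .prefetch(tf.data.AUTOTUNE))

for batch in ds:
    print(batch.shape)  # (4, 178, 178, 3)
```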
Now, one thing I'd still like to achieve: inside the `augment` function, I can't get the dimensions of the image. If I query the image's shape inside the `augment` function to get the minimum of the height and width, it gives me `<unknown>`. I want to replace the line `image = tf.image.random_crop(image, (178, 178, 3))` with a crop based on those dimensions. I tried removing the `@tf.function` decorator, but it still doesn't work. So is there a way I can get this last part done?
Thanks for your time.
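One possible answer to that last question (a sketch, not the poster's code): inside a `tf.function`, the static `image.shape` can legitimately be `<unknown>`, but the dynamic `tf.shape(image)` is always defined at run time and its components can feed `tf.image.random_crop`:

```python
import tensorflow as tf

# The [None, None, 3] signature mimics the situation in the question:
# the static height/width are unknown inside the traced function.
@tf.function(input_signature=[tf.TensorSpec([None, None, 3], tf.float32)])
def augment(image):
    shape = tf.shape(image)                  # dynamic shape, always defined
    side = tf.minimum(shape[0], shape[1])    # min(height, width) as a tensor
    return tf.image.random_crop(image, (side, side, 3))

image = tf.random.uniform((218, 178, 3))
print(augment(image).shape)  # (178, 178, 3)
```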