datasets: ImportError: cannot import name 'array_record_module' from 'array_record.python'

Short description Description of the bug.

Environment information

  • Operating System: Win64

  • Python version: 3.8

  • tensorflow-datasets/tfds-nightly version: 4.8.3

  • tensorflow/tf-nightly version: (not provided)

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)? Yes

Link to logs

Traceback (most recent call last):
  File "flan/v2/run_example.py", line 10, in <module>
    import seqio
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\seqio\__init__.py", line 18, in <module>
    from seqio.dataset_providers import *
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\seqio\dataset_providers.py", line 34, in <module>
    from seqio import utils
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\seqio\utils.py", line 25, in <module>
    import tensorflow_datasets as tfds
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\tensorflow_datasets\__init__.py", line 43, in <module>
    import tensorflow_datasets.core.logging as _tfds_logging
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\tensorflow_datasets\core\__init__.py", line 22, in <module>
    from tensorflow_datasets.core import community
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\tensorflow_datasets\core\community\__init__.py", line 18, in <module>
    from tensorflow_datasets.core.community.huggingface_wrapper import mock_builtin_to_use_gfile
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\tensorflow_datasets\core\community\huggingface_wrapper.py", line 31, in <module>
    from tensorflow_datasets.core import dataset_builder
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\tensorflow_datasets\core\dataset_builder.py", line 34, in <module>
    from tensorflow_datasets.core import dataset_info
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\tensorflow_datasets\core\dataset_info.py", line 46, in <module>
    from tensorflow_datasets.core import file_adapters
  File "D:\anaconda\envs\zzc_flan\lib\site-packages\tensorflow_datasets\core\file_adapters.py", line 29, in <module>
    from array_record.python import array_record_module
ImportError: cannot import name 'array_record_module' from 'array_record.python' (D:\anaconda\envs\zzc_flan\lib\site-packages\array_record\python\__init__.py)


Expected behavior
How to solve this bug?

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 9
  • Comments: 32

Most upvoted comments

Basically get the same error here

Environment information

  • Operating System: Windows 10, x86_64
  • Hardware: Ryzen 5 5600X CPU, GTX 760 GPU (not used)
  • Python version: 3.9
  • tensorflow-datasets version: 4.9.0
  • tensorflow version: 2.12.0

Clean install + downgrading tfds to tensorflow-datasets==4.8.3 fixes the error for me. Clean install + tfds==4.9 still gives the error.

I solved it by downgrading from version 4.9.0 to version 4.8.3. It’s amazing that it worked.

TFDS 4.9.1 is out with a fix for the installation on macOS.

pip install --upgrade tensorflow-datasets==4.9.1

We did a post-mortem. The main outputs are:

  • We are going to add macos-latest and windows-latest to our CI/CD in GitHub Actions, as we see more users on these platforms.
  • We are going to make ArrayRecord compatible with Windows/macOS in future releases.

I am closing the issue, but will happily re-open it in case we’re missing anything. Thank you all for your understanding!

This solved it for me, just a single command!

pip install -U tensorflow-datasets==4.8.3
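
Before downgrading or upgrading, it can help to check which version is actually installed in the active environment. The snippet below is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8). The helper names `installed_version` and `is_affected` are hypothetical, and treating only 4.9.0 as affected is an assumption based on the reports in this thread, not an official compatibility list.

```python
from importlib import metadata
from typing import Optional

# Assumption from this thread: 4.9.0 fails on Windows/macOS,
# 4.8.3 works, and 4.9.1 was announced as the fixed release.
AFFECTED_VERSION = "4.9.0"


def installed_version(package: str = "tensorflow-datasets") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_affected(version: Optional[str]) -> bool:
    """True only for the release reported as broken in this thread."""
    return version == AFFECTED_VERSION


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("tensorflow-datasets is not installed in this environment")
    elif is_affected(version):
        print(f"tensorflow-datasets {version} is the release reported as broken here")
    else:
        print(f"tensorflow-datasets {version} is not the release reported as broken here")
```

Running this inside the same environment that raises the ImportError (for example the conda environment shown in the traceback) confirms whether the pip command changed the version that Python actually imports.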

Hi all, thank you for reporting the issue.

This happens because the ArrayRecord dependency (https://github.com/google/array_record) does not support macOS or Windows. This dependency is core to one of our new features. The issue https://github.com/tensorflow/datasets/issues/4852 also tracks the progress on this problem.

The PR https://github.com/tensorflow/datasets/pull/4856 should fix the issue for all macOS users by lazily loading ArrayRecord.

You can confirm it works locally on your side with:

git clone https://github.com/tensorflow/datasets /tmp/datasets
python3 -m pip install -e /tmp/datasets
python3 -c 'import tensorflow_datasets as tfds; tfds.load("mnist")'

If so, we will deploy the fix by end of today in TFDS 4.9.1.

Thanks for your understanding!

Does this also take care of the issue within windows WSL too? I noticed that you mentioned that the fix was available for the installation on macOS, however, it was also noted earlier that the bug affected both win64 and macOS. Please advise. Thank you.

Either that’s another bug (a versioning failure on TF’s part), or it’s because of being on nightly. Does anybody care to explain? Thanks!

I’m not sure why you think this is a “versioning fail”. The nightly build contained pulls that ended up in the next version, which is totally normal. The nightly build doesn’t have its version bumped until they are ready to do a release, which is also normal.

Doesn’t look like a GPU error then, it could just be the VRAM/RAM

I don’t think that RAM or GPU are relevant to this issue.

That’s why I asked, thank you for your answer 😃

I agree, it might’ve been an initial coincidence.

Thanks @9characters, this worked for me as well, although I had to restart PyCharm after downgrading to 4.8.3!

Worked for me as well! Thank you.

I’m on an M1 Mac (macOS). Does anybody know how to resolve this issue there?