tensorflow: third_party/icu/data missing big-endian conversion data files


Issue Type

Build/Install

Have you reproduced the bug with TF nightly?

Yes

Source

source

Tensorflow Version

tf 2.11

Custom Code

Yes

OS Platform and Distribution

Linux Ubuntu 20.04 s390x

Mobile device

No response

Python version

3.10

Bazel version

5.3.0

GCC/Compiler version

gcc 9

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

Two of the Unicode unit tests fail on s390x because the icu_conversion_data.c.gz.* files contain ICU data in little-endian format.

Could big-endian formatted versions of the icu_conversion_data.c.gz.* files be added so that they can be selected when building on s390x?
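Every ICU data file begins with a small header (`MappedData` followed by `UDataInfo`), and the `isBigEndian` byte in that header records which byte order the data uses. The snippet below is a minimal sketch of a hypothetical helper (not part of TensorFlow or ICU) that reads that flag from raw ICU data bytes, such as an extracted `.dat` package. It assumes the standard ICU header layout. It does not handle the gzipped C source form that third_party/icu/data stores.

```python
import struct
import sys
from typing import NamedTuple

# Magic bytes that follow the 16-bit headerSize in every ICU data file.
ICU_MAGIC = (0xDA, 0x27)


class IcuDataHeader(NamedTuple):
    header_size: int
    is_big_endian: bool
    charset_family: int  # 0 = ASCII family, 1 = EBCDIC family
    data_format: str     # four-character format tag, e.g. "CmnD"


def parse_icu_header(blob: bytes) -> IcuDataHeader:
    """Parse the MappedData + UDataInfo prefix of an ICU data blob (hypothetical helper)."""
    if len(blob) < 24:
        raise ValueError("too short to hold an ICU data header")
    if (blob[2], blob[3]) != ICU_MAGIC:
        raise ValueError("missing ICU magic bytes 0xda 0x27")
    # Byte 8 is UDataInfo.isBigEndian; multi-byte fields are stored in that order.
    is_big = blob[8] != 0
    (header_size,) = struct.unpack_from(">H" if is_big else "<H", blob, 0)
    return IcuDataHeader(
        header_size=header_size,
        is_big_endian=is_big,
        charset_family=blob[9],
        data_format=blob[12:16].decode("ascii", errors="replace"),
    )


def matches_host(header: IcuDataHeader) -> bool:
    """True if the data's byte order matches the running machine ("big" on s390x)."""
    return header.is_big_endian == (sys.byteorder == "big")
```

On s390x, `sys.byteorder` is `"big"`, so data whose header reports little-endian would not match the host. ICU's `icupkg` tool has a `-t` option for rewriting a package in a different byte order (`b` selects big-endian).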

Standalone code to reproduce the issue

The unit tests:

* //tensorflow/python/kernel_tests/strings_ops:unicode_decode_op_test
* //tensorflow/python/kernel_tests/strings_ops:unicode_transcode_op_test

fail on s390x.

Relevant log output

======================================================================
ERROR: testDecodeWithDifferentEncodings5 ('SHIFT-JIS', ['Hello', 'こんにちは']) (__main__.UnicodeDecodeTest)
UnicodeDecodeTest.testDecodeWithDifferentEncodings5 ('SHIFT-JIS', ['Hello', 'こんにちは'])
testDecodeWithDifferentEncodings('SHIFT-JIS', ['Hello', 'こんにちは'])
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jalbrecht/.cache/bazel/_bazel_jalbrecht/d88fb95a52e214eb99836e3d1a65a951/execroot/org_tensorflow/bazel-out/s390x-opt/bin/tensorflow/python/kernel_tests/strings_ops/unicode_decode_op_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1378, in _do_call
    return fn(*args)
  File "/home/jalbrecht/.cache/bazel/_bazel_jalbrecht/d88fb95a52e214eb99836e3d1a65a951/execroot/org_tensorflow/bazel-out/s390x-opt/bin/tensorflow/python/kernel_tests/strings_ops/unicode_decode_op_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1361, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/jalbrecht/.cache/bazel/_bazel_jalbrecht/d88fb95a52e214eb99836e3d1a65a951/execroot/org_tensorflow/bazel-out/s390x-opt/bin/tensorflow/python/kernel_tests/strings_ops/unicode_decode_op_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1454, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Could not create converter for input encoding: SHIFT-JIS
         [[{{node UnicodeDecode/UnicodeDecode}}]]

About this issue

  • State: open
  • Created a year ago
  • Comments: 19 (14 by maintainers)

Most upvoted comments

I’m starting to think that the solution here might be to just generate the conversion files at build time instead of storing them in the repo. There are several requirements that could be pushing in that direction.
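If the files were generated at build time, one step would be embedding an ICU data package, already converted to the target byte order, into C source. The function below is a minimal, hypothetical Python sketch of that step. The function and symbol names are made up. It also ignores two things a real TensorFlow build rule would likely need: the data alignment that ICU expects, and the gzip/split packaging of the existing files.

```python
def bytes_to_c_source(symbol: str, data: bytes, per_line: int = 16) -> str:
    """Render `data` as C source defining `const uint8_t symbol[len(data)]` (hypothetical helper)."""
    if not data:
        # Standard C does not allow zero-length arrays.
        raise ValueError("data must be non-empty")
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        # Trailing commas are legal in C initializers, so every row can end with one.
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    return (
        "#include <stdint.h>\n\n"
        f"const uint8_t {symbol}[{len(data)}] = {{\n"
        + "\n".join(rows)
        + "\n};\n"
    )
```

A build rule could call `bytes_to_c_source("demo_icu_dat", blob)` on whichever byte-order variant matches the target CPU and write the result to a `.c` file. That would avoid checking one pre-generated copy per endianness into the repo.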

Unfortunately, I no longer work on TF (I left the team nearly a year ago). I consult from time to time (especially on security topics, but also on OSS) and might contribute a PR here and there, but the bulk of my work is no longer in TF, so it will take some time before I can handle this.