tensorflow: third_party/icu/data missing big-endian conversion data files
Issue Type
Build/Install
Have you reproduced the bug with TF nightly?
Yes
Source
source
Tensorflow Version
tf 2.11
Custom Code
Yes
OS Platform and Distribution
Linux Ubuntu 20.04 s390x
Mobile device
No response
Python version
3.10
Bazel version
5.3.0
GCC/Compiler version
gcc 9
CUDA/cuDNN version
No response
GPU model and memory
No response
Current Behaviour?
Two of the Unicode unit tests fail on s390x because the icu_conversion_data.c.gz.* files are in little-endian format.
Could big-endian formatted versions of the icu_conversion_data.c.gz.* files be added so that they can be selected when building on s390x?
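As a rough way to confirm which byte order a given ICU data blob was built for, the sketch below reads the isBigEndian flag from the ICU data header (the MappedData and UDataInfo structs in the ICU sources). It assumes you already have the raw data bytes in hand; the files in third_party/icu/data appear, from their names, to be gzip-compressed C source split into parts, so they would need to be unpacked first, and that step is not shown. The function names and the synthetic header builder are hypothetical helpers for illustration, not part of TensorFlow or ICU.

```python
import struct
import sys

ICU_MAGIC = b"\xda\x27"  # magic1, magic2 from ICU's MappedData header


def icu_data_is_big_endian(blob: bytes) -> bool:
    """Return the isBigEndian flag from an ICU data header."""
    if len(blob) < 9:
        raise ValueError("too short to contain an ICU data header")
    if blob[2:4] != ICU_MAGIC:
        raise ValueError("ICU magic bytes 0xDA 0x27 not found")
    # Byte 8 is UDataInfo.isBigEndian: 4 bytes of MappedData, then
    # UDataInfo.size (2 bytes) and UDataInfo.reservedWord (2 bytes).
    return blob[8] == 1


def matches_host(blob: bytes) -> bool:
    """True if the data's byte order is the one this machine uses."""
    return icu_data_is_big_endian(blob) == (sys.byteorder == "big")


def make_header(big_endian: bool) -> bytes:
    """Build a synthetic 12-byte header prefix (illustration only)."""
    order = ">" if big_endian else "<"
    # Fields: headerSize, magic1, magic2, info.size, info.reservedWord,
    # isBigEndian, charsetFamily, sizeofUChar, reservedByte.
    return struct.pack(order + "HBBHHBBBB", 32, 0xDA, 0x27, 20, 0,
                       int(big_endian), 0, 2, 0)


if __name__ == "__main__":
    little = make_header(big_endian=False)
    print("host byte order:", sys.byteorder)
    print("little-endian data usable on this host:", matches_host(little))
```

On an s390x host, where `sys.byteorder` is `"big"`, `matches_host` would return False for little-endian data, which is consistent with the converter lookup failure shown in the log below.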
Standalone code to reproduce the issue
The unit tests:
* //tensorflow/python/kernel_tests/strings_ops:unicode_decode_op_test
* //tensorflow/python/kernel_tests/strings_ops:unicode_transcode_op_test
fail on s390x.
Relevant log output
======================================================================
ERROR: testDecodeWithDifferentEncodings5 ('SHIFT-JIS', ['Hello', 'こんにちは']) (__main__.UnicodeDecodeTest)
UnicodeDecodeTest.testDecodeWithDifferentEncodings5 ('SHIFT-JIS', ['Hello', 'こんにちは'])
testDecodeWithDifferentEncodings('SHIFT-JIS', ['Hello', 'こんにちは'])
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jalbrecht/.cache/bazel/_bazel_jalbrecht/d88fb95a52e214eb99836e3d1a65a951/execroot/org_tensorflow/bazel-out/s390x-opt/bin/tensorflow/python/kernel_tests/strings_ops/unicode_decode_op_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1378, in _do_call
return fn(*args)
File "/home/jalbrecht/.cache/bazel/_bazel_jalbrecht/d88fb95a52e214eb99836e3d1a65a951/execroot/org_tensorflow/bazel-out/s390x-opt/bin/tensorflow/python/kernel_tests/strings_ops/unicode_decode_op_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1361, in _run_fn
return self._call_tf_sessionrun(options, feed_dict, fetch_list,
File "/home/jalbrecht/.cache/bazel/_bazel_jalbrecht/d88fb95a52e214eb99836e3d1a65a951/execroot/org_tensorflow/bazel-out/s390x-opt/bin/tensorflow/python/kernel_tests/strings_ops/unicode_decode_op_test.runfiles/org_tensorflow/tensorflow/python/client/session.py", line 1454, in _call_tf_sessionrun
return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Could not create converter for input encoding: SHIFT-JIS
[[{{node UnicodeDecode/UnicodeDecode}}]]
About this issue
- State: open
- Created a year ago
- Comments: 19 (14 by maintainers)
I’m starting to think that the solution here might be to just generate the conversion files at build time, instead of storing them in the repo. There are several requirements that could be pushing toward that.
Unfortunately, I no longer work on TF (I left the team nearly a year ago). I consult from time to time (especially on security topics, but also on OSS) and might contribute here and there with some PRs, but the bulk of my work is no longer in TF. So it will take some time until I can handle this.