tokenizers: The Conda package doesn't work on CentOS 7 and Ubuntu 18.04

import tokenizers fails on CentOS 7 (and RHEL 7) because these systems ship glibc 2.17, while the Conda package (at least for Python 3.8) was compiled against a newer version (the error says version 'GLIBC_2.18' not found).
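To check quickly whether a machine is affected, something like the following might help. It's a minimal sketch: the helper names (parse_version, glibc_satisfies) are made up for illustration, and it relies on Python's standard platform.libc_ver() to detect the system glibc.

```python
import platform


def parse_version(text):
    """Turn a string such as '2.17' into (2, 17) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(required, available=None):
    """Return True if the available glibc is at least the required version.

    When `available` is not given, the version is detected from the
    running interpreter via platform.libc_ver().
    """
    if available is None:
        lib, available = platform.libc_ver()
        if lib != "glibc" or not available:
            raise RuntimeError("could not detect glibc on this system")
    return parse_version(available) >= parse_version(required)
```

For example, glibc_satisfies("2.18", "2.17") is False, which matches the CentOS 7 failure described above. Comparing tuples of integers avoids the string-comparison trap where "2.9" would sort after "2.17".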

I believe CentOS and RHEL 7 are both still popular, though I can’t back up this claim (I can’t find a link). My university cluster uses it at least. And note version 7 is the latest CentOS version with long-term support.

It does work with the PyPI version, so I just use that one. However, it’d be good if it worked from Conda. So it seems it’s related to the glibc version specified by Conda, while the one provided by ubuntu-latest (Ubuntu 18.04 as of today, IIUC) seems to work fine. Conda docs say they don’t provide their own glibc version. Because there isn’t such a libc package in Conda, they provide a solution via virtual packages. So it seems that all that’s needed is to set an env var like:

CONDA_OVERRIDE_GLIBC=2.17

(note Rust dynamically links the compiled binary to the available glibc version at build time).
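If the build is driven from a Python script, the override could be passed to a conda subprocess roughly like this. It's only a sketch: the helper names (env_with_glibc_override, conda_info_with_override) are hypothetical, CONDA_OVERRIDE_GLIBC is the variable mentioned above, and it's untested whether setting it is enough to fix the package build.

```python
import os
import subprocess


def env_with_glibc_override(version, base_env=None):
    """Return a copy of the environment with CONDA_OVERRIDE_GLIBC set.

    The base environment (os.environ by default) is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CONDA_OVERRIDE_GLIBC"] = version
    return env


def conda_info_with_override(version):
    """Run `conda info` with the override so the reported virtual packages can be inspected."""
    return subprocess.run(
        ["conda", "info"],
        env=env_with_glibc_override(version),
        check=True,
    )
```

Calling conda_info_with_override("2.17") requires conda on the PATH; the output should show which glibc version conda believes it is working with.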

Could you take a look at it? Not sure how it can be tested w/o pushing a release.

Related issues (some people also mention CentOS 7):

Update: it’s got a bit worse, as now it requires a newer GLIBC version:

/lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found

It’s not working for me on Ubuntu 18.04 (GLIBC 2.27).
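To see which glibc version a compiled extension actually asks for, one rough approach is to scan the shared object for GLIBC_x.y version tags and take the highest one. The sketch below is a heuristic under that assumption (the function names are made up for illustration); a tool like objdump -T gives a more precise answer.

```python
import re

# Symbol version tags look like GLIBC_2.17 or GLIBC_2.2.5 inside the binary.
GLIBC_TAG = re.compile(rb"GLIBC_(\d+(?:\.\d+)+)")


def required_glibc_versions(data):
    """Return the sorted, de-duplicated GLIBC versions found in raw bytes."""
    versions = {
        tuple(int(part) for part in match.group(1).decode("ascii").split("."))
        for match in GLIBC_TAG.finditer(data)
    }
    return sorted(versions)


def max_required_glibc(path):
    """Return the highest GLIBC version tag found in the file, or None."""
    with open(path, "rb") as handle:
        versions = required_glibc_versions(handle.read())
    return versions[-1] if versions else None
```

Pointing max_required_glibc at the tokenizers .so file inside the Conda environment should report the version the system glibc needs to match or exceed.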

What this article details may be the solution for it: Building Rust binaries in CI that work with older GLIBC

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 26
  • Comments: 15

Most upvoted comments

Same issue on Ubuntu 18.04.5 LTS. Ubuntu 18.04’s latest GLIBC version is 2.27. Conda-installed tokenizers (through transformers installation) version 0.10.2 requires GLIBC 2.29.

As a workaround, I’ve installed the previous tokenizers version, and everything works fine now:

conda install -c huggingface tokenizers=0.10.1 transformers=4.4.2

Use pip instead of conda:

conda uninstall tokenizers transformers
pip install transformers

Same issue for me on Ubuntu 18.04. I just changed to pip installation, which seems to be working.