tiktoken: pyinstaller has some bug that results in improper packaging of tiktoken

What could be the fix for this error? I am trying out the library for the first time.

import tiktoken
enc = tiktoken.get_encoding("gpt2")
assert enc.decode(enc.encode("hello world")) == "hello world"
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [47], in <cell line: 2>()
      1 import tiktoken
----> 2 enc = tiktoken.get_encoding("gpt2")
      3 assert enc.decode(enc.encode("hello world")) == "hello world"

File ~/work/p3ds/lib/python3.10/site-packages/tiktoken/registry.py:60, in get_encoding(encoding_name)
     57     assert ENCODING_CONSTRUCTORS is not None
     59 if encoding_name not in ENCODING_CONSTRUCTORS:
---> 60     raise ValueError(f"Unknown encoding {encoding_name}")
     62 constructor = ENCODING_CONSTRUCTORS[encoding_name]
     63 enc = Encoding(**constructor())

ValueError: Unknown encoding gpt2


About this issue

  • State: closed
  • Created a year ago
  • Comments: 22

Most upvoted comments

I have solved it using methods below:

1. Add --hidden-import=tiktoken_ext.openai_public --hidden-import=tiktoken_ext when you run pyinstaller to build the executable.

2. Delete the code with open(os.path.join(_SCRIPT_DIR, "VERSION")) as _version_file: __version__ = _version_file.read().strip() from __init__.py in the “blobfile” module.

Hope it works for you.
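
For anyone who drives the build from Python rather than the command line, here is a minimal sketch of step 1. It assumes PyInstaller is installed and uses its documented PyInstaller.__main__.run entry point; the helper names and the script name app.py are hypothetical, not from this thread.

```python
# Modules named in step 1; PyInstaller's static analysis does not pick them up.
HIDDEN_IMPORTS = ["tiktoken_ext.openai_public", "tiktoken_ext"]


def build_pyinstaller_args(script, hidden_imports=HIDDEN_IMPORTS):
    """Return PyInstaller arguments with one --hidden-import flag per module."""
    args = [script]
    for name in hidden_imports:
        args.append(f"--hidden-import={name}")
    return args


def run_build(script):
    """Run PyInstaller on `script` (hypothetical helper; needs PyInstaller)."""
    # Imported lazily so build_pyinstaller_args works without PyInstaller.
    import PyInstaller.__main__

    PyInstaller.__main__.run(build_pyinstaller_args(script))
```

Calling run_build("app.py") should be equivalent to passing the same two --hidden-import flags to the pyinstaller command.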

It looks like some of the issue here is the blobfile dependency. Most people won’t need that; I can make that an optional dependency.

I haven’t ever used pyinstaller; it sounds like there’s a bug in it? The tiktoken distribution on PyPI contains two packages, tiktoken and tiktoken_ext, and needs both of them for tiktoken.get_encoding("gpt2") to work.
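
A quick way to check whether both packages made it into a given environment is to ask the import system for each module. This is only a diagnostic sketch: the function name is made up for illustration, and whether find_spec behaves identically inside a frozen executable is an assumption.

```python
import importlib.util


def missing_modules(names):
    """Return the dotted module names that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing
```

Running missing_modules(["tiktoken", "tiktoken_ext", "tiktoken_ext.openai_public"]) early in the program lists whichever of the three could not be found; an empty list means all of them are locatable.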

Maybe see if pyinstaller people know what the issue is. I’m willing to make minor adjustments to how tiktoken specifies packaging metadata to support the use case.

A simpler solution: just add these lines to the code that imports tiktoken.

from tiktoken_ext import openai_public
import tiktoken_ext

It doesn’t need an __init__.py; tiktoken_ext is a namespace package. We use this to allow extensibility, e.g. see https://github.com/openai/tiktoken#extending-tiktoken
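
To illustrate why a packager that only follows import statements can miss such plugins, here is a self-contained sketch of run-time discovery over a namespace package. It assumes tiktoken's registry works roughly along these lines (scanning tiktoken_ext and reading ENCODING_CONSTRUCTORS from each submodule); the package demo_ext, the module my_plugin, and both function names are hypothetical.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_constructors(namespace_pkg_name):
    """Import every submodule of a namespace package and merge their registries."""
    pkg = importlib.import_module(namespace_pkg_name)
    found = {}
    # iter_modules scans the package's search path at run time, so no import
    # statement in the source code ever names the plugin modules.
    for info in pkgutil.iter_modules(pkg.__path__, pkg.__name__ + "."):
        module = importlib.import_module(info.name)
        found.update(getattr(module, "ENCODING_CONSTRUCTORS", {}))
    return found


def demo():
    """Create a throwaway namespace package with one plugin and discover it."""
    with tempfile.TemporaryDirectory() as root:
        plugin_dir = Path(root) / "demo_ext"  # no __init__.py: namespace package
        plugin_dir.mkdir()
        (plugin_dir / "my_plugin.py").write_text(
            "ENCODING_CONSTRUCTORS = {'demo': lambda: {'name': 'demo'}}\n"
        )
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            return sorted(discover_constructors("demo_ext"))
        finally:
            # Undo the path and module-cache changes made for the demo.
            sys.path.remove(root)
            for name in [m for m in sys.modules if m.split(".")[0] == "demo_ext"]:
                del sys.modules[name]
```

Because the plugin is found by scanning a directory rather than by an explicit import, a tool that analyses imports statically has nothing to follow, which is consistent with the explicit imports or --hidden-import flags being needed.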

Thank you very much! That solved my problem!

I’ve made blobfile an optional dependency in 0.3.1.

Based on Jeremy-ttt’s message, it sounds like the rest of this can be handled by pyinstaller’s --hidden-import.

Let me know if there’s anything else I can do here; if not, I’ll close this issue soon.