tiktoken: pyinstaller has some bug that results in improper packaging of tiktoken
What could be the fix for this error? I am trying out the library for the first time.
import tiktoken
enc = tiktoken.get_encoding("gpt2")
assert enc.decode(enc.encode("hello world")) == "hello world"
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [47], in <cell line: 2>()
1 import tiktoken
----> 2 enc = tiktoken.get_encoding("gpt2")
3 assert enc.decode(enc.encode("hello world")) == "hello world"
File ~/work/p3ds/lib/python3.10/site-packages/tiktoken/registry.py:60, in get_encoding(encoding_name)
57 assert ENCODING_CONSTRUCTORS is not None
59 if encoding_name not in ENCODING_CONSTRUCTORS:
---> 60 raise ValueError(f"Unknown encoding {encoding_name}")
62 constructor = ENCODING_CONSTRUCTORS[encoding_name]
63 enc = Encoding(**constructor())
ValueError: Unknown encoding gpt2
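A quick way to narrow this error down is to check whether the relevant modules can be located in the failing environment. This is a minimal sketch, assuming (as the replies in this thread indicate) that the encodings come from the tiktoken_ext.openai_public module. The helper name missing_modules is made up for this illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of dotted module names that cannot be located."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is itself absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# Module names taken from this thread; an empty list means all were found.
print(missing_modules(["tiktoken", "tiktoken_ext", "tiktoken_ext.openai_public"]))
```

If tiktoken_ext.openai_public shows up as missing while tiktoken itself is found, the packaging step has probably dropped the plugin package.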
About this issue
- State: closed
- Created a year ago
- Comments: 22
Commits related to this issue
- Add necessary hidden-imports (see: https://github.com/openai/tiktoken/issues/43) — committed to refstudio/refstudio by gjreda 10 months ago
- feat: Add support for Ollama as a local LLM via litellm (#525) * Adopt litellm for rewrite - phase 1 * Adopt litellm * Improve OpenAiSettingsPane showing options * Code cleanup * tests pa... — committed to refstudio/refstudio by cguedes 10 months ago
I solved it using the methods below:
1. Add
--hidden-import=tiktoken_ext.openai_public --hidden-import=tiktoken_ext
when you use pyinstaller to make it executable.
2. Delete the code
with open(os.path.join(_SCRIPT_DIR, "VERSION")) as _version_file:
    __version__ = _version_file.read().strip()
in the “blobfile” module’s __init__.py.
Hope it works for you.
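The first step above can be sketched as a small helper that assembles the pyinstaller command line. This is only an illustration: the script name app.py and the helper pyinstaller_command are hypothetical, and the only flag used is the --hidden-import option quoted in this comment.

```python
import shlex


def pyinstaller_command(script, hidden_imports):
    """Build a pyinstaller argument list with one --hidden-import per module."""
    cmd = ["pyinstaller"]
    cmd += [f"--hidden-import={name}" for name in hidden_imports]
    cmd.append(script)
    return cmd


# "app.py" is a placeholder for the script that imports tiktoken.
cmd = pyinstaller_command("app.py", ["tiktoken_ext.openai_public", "tiktoken_ext"])
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where pyinstaller is installed.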
It looks like some of the issue here is the blobfile dependency. Most people won’t need that; I can make that an optional dependency.
I haven’t ever used pyinstaller, sounds like there’s a bug in it? The tiktoken distribution on PyPI contains two packages, tiktoken and tiktoken_ext, and needs both of them for tiktoken.get_encoding("gpt2") to work.
Maybe see if pyinstaller people know what the issue is. I’m willing to make minor adjustments to how tiktoken specifies packaging metadata to support the use case.
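To illustrate why a second package can go missing, here is a sketch of runtime plugin discovery in general. It assumes the encodings are found by scanning a package's submodules at run time. registry.py is not shown in full in this thread, so this is not presented as tiktoken's actual code, and discover_plugin_modules is a made-up name. The standard-library json package stands in for the plugin package so the example runs anywhere.

```python
import importlib
import pkgutil


def discover_plugin_modules(package_name):
    """Return the dotted names of the submodules found under a package."""
    package = importlib.import_module(package_name)
    prefix = package.__name__ + "."
    return sorted(
        info.name for info in pkgutil.iter_modules(package.__path__, prefix)
    )


# json is used only as a stand-in package that exists everywhere.
print(discover_plugin_modules("json"))
```

Nothing in code like this contains an import statement naming the plugin modules, so a bundler that relies on static analysis of imports has no reason to include them unless told to.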
A simpler solution: just add these lines to your code that imports tiktoken.
Doesn’t need the __init__.py; tiktoken_ext is a namespace package. We use this to allow extensibility, e.g. see https://github.com/openai/tiktoken#extending-tiktoken
Thank you very much! That solved my problem!
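For the "add these lines" suggestion above, a minimal sketch follows. It assumes the lines were explicit imports of the plugin package named in the --hidden-import flags earlier in the thread. The flag PLUGINS_IMPORTED is a hypothetical name, and the try/except is added here only so the snippet also runs where tiktoken is not installed.

```python
# Explicit import statements give static import analysis something to find.
# Assumption: these are the modules meant by the suggestion in this thread.
try:
    import tiktoken_ext.openai_public  # noqa: F401  (imported for bundling only)
    PLUGINS_IMPORTED = True
except ImportError:
    # tiktoken (or its plugin package) is not installed in this environment.
    PLUGINS_IMPORTED = False

print("tiktoken_ext.openai_public imported:", PLUGINS_IMPORTED)
```

Importing tiktoken_ext.openai_public also imports its parent package tiktoken_ext, so a single statement covers both names.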
I’ve made blobfile an optional dependency in 0.3.1.
Based on Jeremy-ttt’s message, it sounds like the rest of this can be handled by pyinstaller’s --hidden-import.
Let me know if there’s anything else I can do here; if not, I’ll close this issue soon.