private-gpt: python ingest.py error: install pypandoc wheels with included pandoc.
macOS 13.4 / Intel i7
$ python -V
Python 3.10.11
$ pip list
Package Version
----------------------- -----------
aiohttp 3.8.4
aiosignal 1.3.1
anyio 3.6.2
argilla 1.7.0
async-timeout 4.0.2
attrs 23.1.0
backoff 2.2.1
beautifulsoup4 4.12.2
certifi 2023.5.7
cffi 1.15.1
chardet 5.1.0
charset-normalizer 3.1.0
chromadb 0.3.22
click 8.1.3
clickhouse-connect 0.5.24
colorclass 2.2.2
commonmark 0.9.1
compressed-rtf 1.0.6
cryptography 40.0.2
dataclasses-json 0.5.7
Deprecated 1.2.13
duckdb 0.7.1
easygui 0.98.3
ebcdic 1.1.1
et-xmlfile 1.1.0
extract-msg 0.41.1
fastapi 0.95.1
filelock 3.12.0
frozenlist 1.3.3
fsspec 2023.5.0
greenlet 2.0.2
h11 0.14.0
hnswlib 0.7.0
httpcore 0.16.3
httptools 0.5.0
httpx 0.23.3
huggingface-hub 0.14.1
idna 3.4
IMAPClient 2.3.1
Jinja2 3.1.2
joblib 1.2.0
langchain 0.0.166
lark-parser 0.12.0
llama-cpp-python 0.1.48
lxml 4.9.2
lz4 4.3.2
Markdown 3.4.3
MarkupSafe 2.1.2
marshmallow 3.19.0
marshmallow-enum 1.5.1
monotonic 1.6
mpmath 1.3.0
msg-parser 1.2.0
msoffcrypto-tool 5.0.1
multidict 6.0.4
mypy-extensions 1.0.0
networkx 3.1
nltk 3.8.1
numexpr 2.8.4
numpy 1.23.5
olefile 0.46
oletools 0.60.1
openapi-schema-pydantic 1.2.4
openpyxl 3.1.2
packaging 23.1
pandas 1.5.3
pandoc 2.3
pcodedmp 1.2.6
pdfminer.six 20221105
Pillow 9.5.0
pip 23.1.2
plumbum 1.8.1
ply 3.11
posthog 3.0.1
pycparser 2.21
pydantic 1.10.7
Pygments 2.15.1
pygpt4all 1.1.0
pygptj 2.0.3
pyllamacpp 2.1.3
pypandoc 1.11
pyparsing 2.4.7
python-dateutil 2.8.2
python-docx 0.8.11
python-dotenv 1.0.0
python-magic 0.4.27
python-pptx 0.6.21
pytz 2023.3
pytz-deprecation-shim 0.1.0.post0
PyYAML 6.0
red-black-tree-mod 1.20
regex 2023.5.5
requests 2.30.0
rfc3986 1.5.0
rich 13.0.1
RTFDE 0.0.2
scikit-learn 1.2.2
scipy 1.10.1
sentence-transformers 2.2.2
sentencepiece 0.1.99
setuptools 67.7.2
six 1.16.0
sniffio 1.3.0
soupsieve 2.4.1
SQLAlchemy 2.0.13
starlette 0.26.1
sympy 1.12
tabulate 0.9.0
tenacity 8.2.2
threadpoolctl 3.1.0
tokenizers 0.13.3
torch 2.0.1
torchvision 0.15.2
tqdm 4.65.0
transformers 4.29.1
typer 0.9.0
typing_extensions 4.5.0
typing-inspect 0.8.0
tzdata 2023.3
tzlocal 4.2
unstructured 0.6.5
urllib3 2.0.2
uvicorn 0.22.0
uvloop 0.17.0
watchfiles 0.19.0
websockets 11.0.3
wheel 0.40.0
wrapt 1.14.1
XlsxWriter 3.1.0
yarl 1.9.2
zstandard 0.21.0
$ python ingest.py
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 2%|▎ | 2/131 [00:03<03:12, 1.50s/it][nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /Users/51pwn/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
Loading new documents: 2%|▎ | 2/131 [00:04<05:13, 2.43s/it]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 89, in load_single_document
return loader.load()[0]
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/langchain/document_loaders/unstructured.py", line 70, in load
elements = self._get_elements()
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/langchain/document_loaders/epub.py", line 22, in _get_elements
return partition_epub(filename=self.file_path, **self.unstructured_kwargs)
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/unstructured/partition/epub.py", line 24, in partition_epub
return convert_and_partition_html(
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/unstructured/partition/html.py", line 124, in convert_and_partition_html
html_text = convert_file_to_html_text(source_format=source_format, filename=filename, file=file)
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/unstructured/file_utils/file_conversion.py", line 44, in convert_file_to_html_text
html_text = convert_file_to_text(
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/unstructured/file_utils/file_conversion.py", line 12, in convert_file_to_text
text = pypandoc.convert_file(filename, target_format, format=source_format)
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/pypandoc/__init__.py", line 168, in convert_file
return _convert_input(discovered_source_files, format, 'path', to, extra_args=extra_args,
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/pypandoc/__init__.py", line 324, in _convert_input
_ensure_pandoc_path()
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/site-packages/pypandoc/__init__.py", line 750, in _ensure_pandoc_path
raise OSError("No pandoc was found: either install pandoc and add it\n"
OSError: No pandoc was found: either install pandoc and add it
to your PATH or or call pypandoc.download_pandoc(...) or
install pypandoc wheels with included pandoc.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 167, in <module>
main()
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 157, in main
texts = process_documents()
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 119, in process_documents
documents = load_documents(source_directory, ignored_files)
File "/Users/51pwn/MyWork/privateGPT/ingest.py", line 108, in load_documents
for i, doc in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
File "/Users/51pwn/anaconda3/envs/privateGPT/lib/python3.10/multiprocessing/pool.py", line 873, in next
raise value
OSError: No pandoc was found: either install pandoc and add it
to your PATH or or call pypandoc.download_pandoc(...) or
install pypandoc wheels with included pandoc.
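The traceback above goes through langchain's epub loader into pypandoc, so in this run it was an .epub file that needed pandoc. As a rough sketch (not part of ingest.py), the following lists the files in a folder that would hit that path. The helper name `files_needing_pandoc` and the default extension tuple are assumptions made for illustration; other formats may also be converted through pandoc.

```python
from pathlib import Path


def files_needing_pandoc(directory, extensions=(".epub",)):
    """Return sorted paths under `directory` whose suffix is in `extensions`.

    Hypothetical helper: the extension list is an assumption based on the
    traceback, where the epub loader ends up calling pypandoc.
    """
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        str(path)
        for path in Path(directory).rglob("*")
        if path.is_file() and path.suffix.lower() in wanted
    )


if __name__ == "__main__":
    # "source_documents" is the folder name shown in the ingest.py output.
    if Path("source_documents").is_dir():
        for name in files_needing_pandoc("source_documents"):
            print(name)
```

If the list is empty, ingestion should not reach the pandoc call at all; if it is not, one of the fixes in the comments below is needed before running ingest.py again.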
About this issue
- State: closed
- Created a year ago
- Comments: 17
Problem solved with:
pip install pypandoc-binary
(no need to worry about the PATH with this command)
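To confirm the fix took effect before re-running ingest.py, a small check like the one below may help. It is only a sketch: the function name `pandoc_available` is made up here, and it relies on `pypandoc.get_pandoc_version()`, which raises the same OSError seen in the traceback when no pandoc can be located.

```python
def pandoc_available():
    """Return True if pypandoc is importable and can locate pandoc."""
    try:
        import pypandoc

        # Raises OSError ("No pandoc was found ...") when pandoc is missing.
        pypandoc.get_pandoc_version()
    except (ImportError, OSError):
        return False
    return True


if __name__ == "__main__":
    print("pandoc found" if pandoc_available() else "pandoc missing")
```

Run it inside the same environment (here the privateGPT conda env) that ingest.py uses, since the package is installed per environment.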
I ran into the same issue. You have to install pandoc and add it to your PATH.
I’m on Win10. I ran the following Python script. It will download and run the pandoc installer. Then add “C:\Users\Username\AppData\Local\Pandoc” to your PATH. That’s where mine was installed; yours might be different.
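The script mentioned in this comment is not included in the thread, so the following is not the commenter's script. It is a minimal sketch of the same two steps, assuming `pypandoc.download_pandoc()` (the call named in the error message) is used for the download. The helper names `prepend_to_path` and `download_pandoc_if_missing` are hypothetical, and the PATH change only affects the current Python process, not the system-wide setting.

```python
import os
import shutil


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env["PATH"] = os.pathsep.join(parts)
    return env["PATH"]


def download_pandoc_if_missing():
    """Download pandoc through pypandoc when it is not already on PATH."""
    if shutil.which("pandoc"):
        return
    import pypandoc  # imported lazily so the PATH helper works without it

    pypandoc.download_pandoc()
```

After the download, calling `prepend_to_path` with the folder where pandoc was installed (the comment above reports `C:\Users\Username\AppData\Local\Pandoc` on one machine) makes it visible to the current process. A permanent change still has to be made in the operating system's environment settings.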
It runs normally for me on macOS with an Intel i7, and I’m sharing what I did for everyone.
However, I found that using it to build an AI search engine still falls far short of expectations.
I’ve found a solution. I need to install pandoc with brew first:
brew install pandoc
More details: https://pandoc.org/installing.html