word_cloud: KeyError: 'foo' building simple word cloud with spaces in words
Description
Attempting to build a word cloud from input containing one phrase per line results in a KeyError.
Steps/Code to Reproduce
$ echo 'Foo Bar\nFoo Bar\n' | rwt wordcloud -- -m wordcloud --imagefile authors.png --regexp '[ \w]+'
Collecting wordcloud
Using cached https://files.pythonhosted.org/packages/c7/07/e43a7094a58e602e85a09494d9b99e7b5d71ca4789852287386e21e74c33/wordcloud-1.5.0-cp37-cp37m-macosx_10_6_x86_64.whl
Collecting numpy>=1.6.1 (from wordcloud)
Using cached https://files.pythonhosted.org/packages/a8/e1/838e35e6f44e2bc19bf902e945c34ae730a33a8e346d6208bf7c4d751416/numpy-1.15.3-cp37-cp37m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Collecting pillow (from wordcloud)
Using cached https://files.pythonhosted.org/packages/99/c8/550f3416afe7b6726efc8a7f2249a38d6ae65c3514ef6c36bdc8485868b7/Pillow-5.3.0-cp37-cp37m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl
Installing collected packages: numpy, pillow, wordcloud
Successfully installed numpy-1.15.3 pillow-5.3.0 wordcloud-1.5.0
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/rwt-21h77osc/wordcloud/__main__.py", line 37, in <module>
main()
File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/rwt-21h77osc/wordcloud/__main__.py", line 33, in main
wordcloud_cli_main(*wordcloud_cli_parse_args(sys.argv[1:]))
File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/rwt-21h77osc/wordcloud/wordcloud_cli.py", line 89, in main
wordcloud.generate(text)
File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/rwt-21h77osc/wordcloud/wordcloud.py", line 605, in generate
return self.generate_from_text(text)
File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/rwt-21h77osc/wordcloud/wordcloud.py", line 586, in generate_from_text
words = self.process_text(text)
File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/rwt-21h77osc/wordcloud/wordcloud.py", line 563, in process_text
word_counts = unigrams_and_bigrams(words, self.normalize_plurals)
File "/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/rwt-21h77osc/wordcloud/tokenization.py", line 55, in unigrams_and_bigrams
word1 = standard_form[bigram[0].lower()]
KeyError: 'foo'
Expected Results
A wordcloud should be generated with “Foo Bar” as the only phrase.
Versions
Darwin-18.0.0-x86_64-i386-64bit
Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 03:13:28)
[Clang 6.0 (clang-600.0.57)]
Other versions appear above.
About this issue
- State: open
- Created 6 years ago
- Comments: 24 (11 by maintainers)
Regex probably needs to include -. I’ll retry. Yes, that’s better:
git -C ~/p/word_cloud log --format=%an | python -m pip-run -q wordcloud -- -m wordcloud --imagefile authors.png --regexp r'[ \w\.\-]+' --no_collocations
I confirm 😃
On Mon, Nov 5, 2018, 4:34 PM Andreas Mueller <notifications@github.com> wrote:
I just ran this command:
git -C ~/p/word_cloud log --format=%an | python -m pip-run -q wordcloud -- -m wordcloud --imagefile authors.png --regexp r'[ \w\.]+' --no_collocations
and it produced:
So that confirms --no_collocations works. 😕
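For anyone wondering why --no_collocations sidesteps the error, here is a minimal standard-library sketch of what the traceback line `standard_form[bigram[0].lower()]` seems to imply. It assumes the collocation step joins adjacent tokens with a space and later splits them back on spaces; `tokenize` and `count_with_bigrams` are hypothetical stand-ins written for illustration, not the library's actual code.

```python
import re


def tokenize(text, regexp):
    """Split text into tokens using a user-supplied pattern."""
    return re.findall(regexp, text)


def count_with_bigrams(words):
    """Hypothetical stand-in for the collocation step (not wordcloud's code)."""
    # Map each lowercased token to the form that would be displayed.
    standard_form = {w.lower(): w for w in words}
    # Build bigram strings by joining adjacent tokens with a space.
    bigram_strings = [" ".join(pair) for pair in zip(words, words[1:])]
    for bigram_string in bigram_strings:
        # Splitting on spaces only recovers the tokens if they contain no spaces.
        first = bigram_string.split(" ")[0]
        # Lookup fails with KeyError when the token itself had a space in it.
        standard_form[first.lower()]
    return standard_form


text = "Foo Bar\nFoo Bar\n"
# The pattern from the report: tokens may contain spaces.
words = tokenize(text, r"[ \w]+")

try:
    count_with_bigrams(words)
    error = None
except KeyError as exc:
    error = exc

print(words, repr(error))
```

Under these assumptions the tokens are whole phrases ("Foo Bar"), the joined bigram is "Foo Bar Foo Bar", and splitting it on spaces yields "Foo", which was never a token. Skipping the bigram step entirely avoids the lookup, which would be consistent with --no_collocations working.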
Providing a runpy entry point is really valuable, and making it a simple name like wordcloud is really convenient… especially when the console_scripts like wordcloud_cli aren’t available. I’d say it should be supported, and was pleased when I found out it is (ref).