nltk: text.generate() does not exist, but is still referenced

I’ve been trying to follow along with the Natural Language Processing book, but right in the first chapter I’m running into issues. After importing everything from nltk.book, my first thought was to try text3.generate(), as demonstrated in one of the examples. I got a lovely AttributeError, because the Text class apparently doesn’t have that method in the NLTK that I installed. Furthermore, nltk.text.demo() also tries to generate text and fails with the same error. I couldn’t find any documentation for the generate() method, so I’m assuming it was removed; if that’s the case, you should remove references to it from nltk.text.demo() and from the textbook.

I’m using Python 2.7.8 with NLTK 3.0.0b1 (which was the version available via Windows installer package from PyPI at the time of this writing). The text3.generate() example is in both the old and current versions of the textbook.

About this issue

  • State: closed
  • Created 10 years ago
  • Reactions: 3
  • Comments: 28 (8 by maintainers)

Most upvoted comments

This is still referred to in Chapter one of the book: http://www.nltk.org/book/ch01.html

It’s no biggy, but I’m probably typical in spending a few minutes googling this as an issue. Those few minutes times however many people are working their way through the book… 😃

Bump, generate() references are still there.

I agree that the generate() references should be removed.

I see two issues with the current code (besides the fact that it does not work), user-experience-wise:

  1. In the book, the method does not take any parameters (at least when it first appears in the book), but the current signature requires a words parameter. So when you follow the book, what you get is the following:

    >>> text1.generate()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: generate() missing 1 required positional argument: 'words'
    

    This is not very user-friendly. I suggest giving words a default value in the signature (e.g. words=None) to avoid this issue.

  2. Using the regular Python console and following the book’s instructions, I do not see the DeprecationWarning that the code logs. This is what I did, starting from scratch:

    [adrian@chakra temporal]$ mkdir nltk
    [adrian@chakra temporal]$ cd nltk/
    [adrian@chakra nltk]$ python3 -m venv venv
    [adrian@chakra nltk]$ . venv/bin/activate
    (venv) [adrian@chakra nltk]$ pip install nltk
    Collecting nltk
    Using cached nltk-3.2.2.tar.gz
    Collecting six (from nltk)
    Using cached six-1.10.0-py2.py3-none-any.whl
    Installing collected packages: six, nltk
    Running setup.py install for nltk ... done
    Successfully installed nltk-3.2.2 six-1.10.0
    You are using pip version 8.1.1, however version 9.0.1 is available.
    You should consider upgrading via the 'pip install --upgrade pip' command.
    (venv) [adrian@chakra nltk]$ python
    Python 3.5.2 (default, Jan 18 2017, 23:05:33) 
    [GCC 5.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import nltk
    >>> # Download the book resources, which requires GUI interaction.
    ... 
    >>> nltk.download()
    showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
    True
    >>> from nltk.book import *
    *** Introductory Examples for the NLTK Book ***
    Loading text1, ..., text9 and sent1, ..., sent9
    Type the name of the text or sentence to view it.
    Type: 'texts()' or 'sents()' to list the materials.
    text1: Moby Dick by Herman Melville 1851
    text2: Sense and Sensibility by Jane Austen 1811
    text3: The Book of Genesis
    text4: Inaugural Address Corpus
    text5: Chat Corpus
    text6: Monty Python and the Holy Grail
    text7: Wall Street Journal
    text8: Personals Corpus
    text9: The Man Who Was Thursday by G . K . Chesterton 1908
    >>> text1.generate(words=None)
    >>> 
    

    Since generate() is not merely deprecated in the sense of “it will be removed” but also in the sense of “it does not work anymore”, I suggest raising a NotImplementedError instead of logging a warning. Doing so would make existing code fail (which is what I, as a developer, would want, rather than having it appear to succeed without doing what it used to do) and would also show up in the console for book readers.
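A minimal sketch of what the two suggestions above could look like together. The `Text` class here is a hypothetical stand-in, not the real `nltk.text.Text`, and the error message wording is my own assumption:

```python
class Text:
    """Hypothetical stand-in for nltk.text.Text, for illustration only."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    def generate(self, words=None):
        # words=None lets the book's zero-argument call reach this point
        # instead of failing earlier with a TypeError about a missing argument.
        # Raising makes the breakage visible in a plain console, where a
        # DeprecationWarning may not be shown.
        raise NotImplementedError(
            "generate() no longer produces text in this version of the library."
        )


text1 = Text(["Call", "me", "Ishmael", "."])
try:
    text1.generate()
except NotImplementedError as err:
    message = str(err)
    print(message)
```

With this shape, both `text1.generate()` and `text1.generate(words=None)` fail loudly with the same message, rather than one raising a TypeError and the other silently returning nothing.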

If you agree with these changes but do not have the time or motivation to implement them, just let me know and I will send a merge request.

It’s still there now.

Could also build your own generate, if ya want: http://www.cyber-omelette.com/2017/01/markov.html
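For anyone who wants to try that, here is a rough sketch of a first-order Markov chain generator in plain Python. It is not the code from the linked post and not what NLTK's old generate() did internally; the function names, the `seed` parameter, and the restart-on-dead-end behaviour are all my own assumptions:

```python
import random
from collections import defaultdict


def build_chain(tokens):
    """Map each token to the list of tokens observed directly after it."""
    chain = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        chain[current].append(following)
    return chain


def generate(tokens, length=20, seed=None):
    """Return `length` tokens sampled from a first-order Markov chain."""
    if not tokens or length <= 0:
        return []
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    chain = build_chain(tokens)
    word = rng.choice(tokens)
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (the token only occurs at the very end of the input):
        # restart from a random token instead of stopping early.
        word = rng.choice(followers) if followers else rng.choice(tokens)
        output.append(word)
    return output


sample = "in the beginning god created the heaven and the earth".split()
print(" ".join(generate(sample, length=12, seed=0)))
```

After `from nltk.book import *`, something like `generate(text3.tokens, length=30)` should work the same way, since the book's Text objects expose their token list as `tokens`.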

Hi, what are the current alternatives for automatically generating text, as a replacement for this function? I am reading the generated text in Chapter 1 of the book, and I too would like to produce marvelous sentences such as “In the beginning of his brother is a hairy man” or this fundamental question: “so shall thy wages be?”.

The generate method should be present as of NLTK 3.4. Check out the nltk.lm package!

I also wonder if we could remove the references to generate() from the book. At least in chapter 1, they do not seem to be required by other sections of the chapter.