setuptools: Newlines in the `description` field produce a malformed PKG-INFO

We discovered this accidentally by way of https://github.com/zopefoundation/zc.relation/issues/4#issuecomment-397532224: if you pass a string containing newlines to the description argument of setup(), setuptools will generate a malformed PKG-INFO.

To reproduce:

# setup.py
from setuptools import setup
setup(
    name='test-package',
    version='0.1',
    author='Blah Blah',
    author_email='blah@example.com',
    description='description\n\n',
    py_modules=['blah'],
)

(The contents of blah.py do not matter, but the file should exist.)

Run python setup.py sdist and the inspect test_package.egg-info/PKG-INFO. For me, with setuptools 39.1.0, it looks like this:

Metadata-Version: 1.0
Name: test-package
Version: 0.1
Summary: description


Home-page: UNKNOWN
Author: Blah Blah
Author-email: blah@example.com
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN

The extra newlines lead tools to treat the rest of the PKG-INFO as a long_description.

I would expect setuptools to complain about the newlines in the description field, or at least escape them properly (i.e. prepend whitespace, like it does for the long_description field).

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 6
  • Comments: 42 (22 by maintainers)

Commits related to this issue

Most upvoted comments

I see the options as:

  1. Escape newlines in the text so they aren’t interpreted as delimiters.
  2. Raise an error whenever double newlines are found outside of long_description.
  3. Silently s/\n+/\n/g for everything except long_description.

I think I prefer 1 as least disruptive, but I’m ambivalent between 2 and 3. 2 is more explicit, but 3 creates less “churn” (since this will just be auto-magically fixed).

We should fail hard for configurations that are invalid. There should be an error message that gives a clear indication on how to fix the problem. This is a developer API, and developers should deal with their broken code. We shouldn’t try to guess-fix broken stuff.

Do the fields/kwargs have an explicit specification of the data they expect? This should be enforced. The fields description and license obviously expect single-line data. Hence, when a newline is submitted in the data for those fields the generation of PKG-INFO should be aborted with a clear error message, e.g.

$ python3 setup.py sdist
ERROR: The "description" field requires a single line of text. Multi-line text was supplied.
Aborting.

setuptools 51.3.0 landed and I immediately got some error report about broken (internal) builds.

Reporting error here is not a good idea, really.

Please, don’t go that way. Drop description altogether if you don’t like it and can’t fix it, but don’t break installations. Ecosystem is huge, libraries are plenty, automated builds are numerous…

Just wanted to point out that this is not limited to just the Summary field – any field with a double newline will cause the rest of the PKG-INFO file after those newlines to be interpreted as the “message body” and thus become the Long-Description.

I would have expect that setuptools does not hardly break for a simple escape all \n workaround.

It seems like an unfortunate wart to carry to support transforming \n to forever. Is that what you’re proposing, that Setuptools should allow invalid inputs and maintain responsibility for making them valid?

To be sure, Setuptools is not happy about breaking the builds. It’s conceivable we could back out the removal and restore the Deprecation warning for an extended period. Would that help?

In retrospect, the change introduced was probably too aggressive, especially because it affects third-party packages and older releases. I’ll back off the change to emit a warning and remove the newlines with the intention of making the behavior invalid in the future.

OK, anyone who wants to submit a PR, let’s go with a hard-failure if newlines are found in fields other than long_description.

I think the easiest way to do this is to implement the check right before PKG-INFO is actually written out. Hopefully that doesn’t disconnect the error message too much from the location where these inputs are actually being generated.

@Mekk Immediately issuing a fatal error here is perhaps not a good idea (if you are affected, fetch 51.3.3 and get a warning instead), but please do recognize that if these errors break your build, your build was already subtly broken. It’s not that setuptools dropped support for newlines in the description field, it’s that it never supported them properly in the first place and produced bad and wrong output when they were encountered, only that bad and wrong output was still good enough that it still worked in most cases (though see my comment above re python_requires). Again, this is not about dropping something that used to be supported, this is turning silent breakage into loud breakage. Which I think we can all agree is a good thing.

Just out of curiosity: Why are newlines not possible? What breaks with newlines?

In its final form, the PKG-INFO metadata file is an RFC 822 document. Double newlines at any point in this document indicate a switch from the message headers to the message body. This causes everything after a double newline to be interpreted as part of the message body, including important metadata fields.

@bittner Not sure I follow… The validator(s) would be part of the packaging library, which would understand how to validate a field based on the field name. You shouldn’t have to do an add_validator here at all.

I’d really prefer to introduce a class for this that can encapsulate the validation logic for all values. It feels too much like spaghetti code, otherwise. I hope that’s fine.

One of my top priorities for this weekend’s sprint is to move metadata validation logic out of Warehouse and into pypa/packaging for reuse elsewhere, and this would be a good example of “elsewhere”.

The API will likely behave a lot like how it’s currently being used in Warehouse: there will be a class which is initialized with a dict adhering to JSON-compatible metadata, and an instance of that class will have a .validate() function and a .errors attribute.

If you want to move ahead with that assumption, we can probably just connect the dots later.