pandas: DOC: fix EX03 errors in docstrings

pandas has a script for validating docstrings:

https://github.com/pandas-dev/pandas/blob/b7e2202459eadc9dd599cbe58251ecc930798b97/ci/code_checks.sh#L72-L172

Currently, some methods fail the EX03 check.

The task here is:

take 2-4 methods
run: scripts/validate_docstrings.py --format=actions --errors=EX03 method-name
check whether docstring validation passes for those methods and, if necessary, fix the docstrings according to the reported errors
remove those methods from code_checks.sh
commit, push, open pull request
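The validation step above can be scripted when you take several methods. This is a minimal sketch, assuming it is run from the root of a pandas checkout. `validation_command` and `validate_methods` are hypothetical helper names, and the flags are simply the ones from the task description. The sketch captures the printed report as text instead of relying on the script's exit status, which I have not verified.

```python
import subprocess
import sys


def validation_command(method: str) -> list[str]:
    """Build the validate_docstrings.py call from the task description."""
    return [
        sys.executable,
        "scripts/validate_docstrings.py",
        "--format=actions",
        "--errors=EX03",
        method,
    ]


def validate_methods(methods: list[str]) -> dict[str, str]:
    """Run the validator once per method and collect its printed output.

    Must be run from the repository root so the relative script path resolves.
    """
    outputs = {}
    for method in methods:
        completed = subprocess.run(
            validation_command(method),
            capture_output=True,
            text=True,
        )
        outputs[method] = completed.stdout + completed.stderr
    return outputs


# Example usage (not executed here): check the methods you claimed, then
# read the "Validation" section at the end of each report.
# for name, report in validate_methods(["pandas.Timestamp.ceil"]).items():
#     print(name, report, sep="\n")
```

Running one method per call keeps each report separate, which makes it easier to see which docstring a given error belongs to.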

Please don’t comment "take", as multiple people can work on this issue. You also don’t need to ask for permission to work on this; just comment on which methods you are going to work on.

If you’re a new contributor, please check the contributing guide.

Thanks @MarcoGorelli for giving me the idea for this issue.

About this issue

State: closed
Created 6 months ago
Reactions: 1
Comments: 66 (54 by maintainers)

Most upvoted comments

Yeah, that’s the fix. Sorry if it wasn’t clear; it was more of an explanation for people who had trouble figuring out which lines were affected.

asishm on Jan 11, 2024

Explanation of what to look for:

EX03 reports flake8 errors in the example code blocks of a function’s or method’s documentation.

For pandas.errors.SpecificationError, the examples show:

Examples
--------
>>> df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
...                    'B': range(5),
...                    'C': range(5)})
>>> df.groupby('A').B.agg({'foo': 'count'}) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported

>>> df.groupby('A').agg({'B': {'foo': ['sum', 'max']}}) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported

>>> df.groupby('A').agg(['min', 'min']) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported

Line 4 here is the fourth line of the examples, which is >>> df.groupby('A').B.agg({'foo': 'count'}) # doctest: +SKIP

Line 6 is >>> df.groupby('A').agg({'B': {'foo': ['sum', 'max']}}) # doctest: +SKIP (the "... # SpecificationError" continuation line is line 5, and the blank line is not counted).

asishm on Jan 11, 2024
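To map an EX03 line number back to the docstring, it can help to list the examples the way a checker would see them. This sketch uses the standard library’s `doctest.DocTestParser`, which strips the `>>> ` / `... ` prompts and drops expected output and blank lines. I am assuming the validator numbers lines the same way; that assumption agrees with the SpecificationError numbering explained above. `example_source_lines` is a hypothetical helper.

```python
import doctest


def example_source_lines(docstring: str) -> list[str]:
    """Return the code lines of all >>> examples, in order.

    Prompts and expected output are dropped, and blank lines between
    examples are not part of any example, so index 0 is "line 1".
    """
    parser = doctest.DocTestParser()
    lines: list[str] = []
    for example in parser.get_examples(docstring):
        # example.source keeps "..." continuation lines and ends with "\n"
        lines.extend(example.source.splitlines())
    return lines


SPECIFICATION_ERROR_EXAMPLES = """
>>> df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
...                    'B': range(5),
...                    'C': range(5)})
>>> df.groupby('A').B.agg({'foo': 'count'}) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported

>>> df.groupby('A').agg({'B': {'foo': ['sum', 'max']}}) # doctest: +SKIP
... # SpecificationError: nested renamer is not supported
"""

for number, line in enumerate(example_source_lines(SPECIFICATION_ERROR_EXAMPLES), start=1):
    print(f"{number}: {line}")
```

With prompts removed, the space before `# doctest` on line 4 sits at column 40, which is where the E261 message quoted later in this thread points.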

@jordan-d-murphy, I agree; it seems we have fixed all the flake8 errors. Thank you for working on this issue so intensively and for helping other contributors. We can now close this issue.

natmokval on Jan 23, 2024

@jordan-d-murphy Great, I’ll take those!

jordan-betterman on Jan 22, 2024

I’ll take:

pandas.io.formats.style.Styler.to_latex
pandas.read_parquet

tiffanyxiao on Jan 14, 2024

Okay! Makes sense. I hope the photo helps someone else, then 🙂

jordan-d-murphy on Jan 11, 2024

I’ve opened a PR for the remaining 4 functions. I believe this will close this issue.

pandas.Series.plot.line
pandas.Series.to_sql
pandas.read_json
pandas.DataFrame.to_sql

jordan-d-murphy on Jan 23, 2024

I tried all of that and it still didn’t work. I’m going to stop working on it and find another issue to take on. Thanks for all the help!

jordan-betterman on Jan 22, 2024

Hmm, okay. Yes, your approach seems correct, but when I run this on the latest branch, I see no EX03 errors for pandas.Series.plot.line.

I’ve been using the following approach to set up my dev env and working branch before working on my PRs, which ensures my branch is up to date with the latest version of main.

Can you try running these commands, then run your script again and see if it helps?

Updating the development environment

git checkout main
git merge upstream/main
mamba activate pandas-dev
mamba env update -f environment.yml --prune

Creating a feature branch

git checkout main
git pull upstream main --ff-only
git checkout -b shiny-new-feature

(NOTE: shiny-new-feature should be your new working branch name)

After running the above commands, running scripts/validate_docstrings.py --format=actions --errors=EX03 pandas.Series.plot.line produces output that ends with:

################################################################################
################################## Validation ##################################
################################################################################

1 Errors found for `pandas.Series.plot.line`:
	PR02	Unknown parameters {'color'}

jordan-d-murphy on Jan 22, 2024

script: scripts/validate_docstrings.py --format=actions --errors=EX03 pandas.Series.plot.line

Output:

################################################################################
##################### Docstring (pandas.Series.plot.line)  #####################
################################################################################

Plot Series or DataFrame as lines.

This function is useful to plot lines using DataFrame's values
as coordinates.

Parameters
----------
x : label or position, optional
    Allows plotting of one column versus another. If not specified,
    the index of the DataFrame is used.
y : label or position, optional
    Allows plotting of one column versus another. If not specified,
    all numerical columns are used.
color : str, array-like, or dict, optional
    The color for each of the DataFrame's columns. Possible values are:

    - A single color string referred to by name, RGB or RGBA code,
        for instance 'red' or '#a98d19'.

    - A sequence of color strings referred to by name, RGB or RGBA
        code, which will be used for each column recursively. For
        instance ['green','yellow'] each column's line will be filled in
        green or yellow, alternatively. If there is only a single column to
        be plotted, then only the first color from the color list will be
        used.

    - A dict of the form {column name : color}, so that each column will be
        colored accordingly. For example, if your columns are called `a` and
        `b`, then passing {'a': 'green', 'b': 'red'} will color lines for
        column `a` in green and lines for column `b` in red.

**kwargs
    Additional keyword arguments are documented in
    :meth:`DataFrame.plot`.

Returns
-------
matplotlib.axes.Axes or np.ndarray of them
    An ndarray is returned with one :class:`matplotlib.axes.Axes`
    per column when ``subplots=True``.

        See Also
        --------
        matplotlib.pyplot.plot : Plot y versus x as lines and/or markers.

        Examples
        --------

        .. plot::
            :context: close-figs

            >>> s = pd.Series([1, 3, 2])
            >>> s.plot.line()  # doctest: +SKIP

        .. plot::
            :context: close-figs

            The following example shows the populations for some animals
            over the years.

            >>> df = pd.DataFrame({
            ...    'pig': [20, 18, 489, 675, 1776],
            ...    'horse': [4, 25, 281, 600, 1900]
            ...    }, index=[1990, 1997, 2003, 2009, 2014])
            >>> lines = df.plot.line()

        .. plot::
           :context: close-figs

           An example with subplots, so an array of axes is returned.

           >>> axes = df.plot.line(subplots=True)
           >>> type(axes)
           <class 'numpy.ndarray'>

        .. plot::
           :context: close-figs

           Let's repeat the same example, but specifying colors for
           each column (in this case, for each animal).

           >>> axes = df.plot.line(
           ...     subplots=True, color={"pig": "pink", "horse": "#742802"}
           ... )

        .. plot::
            :context: close-figs

            The following example shows the relationship between both
            populations.

            >>> lines = df.plot.line(x='pig', y='horse')

################################################################################
################################## Validation ##################################
################################################################################

3 Errors found for `pandas.Series.plot.line`:
	PR02	Unknown parameters {'color'}
	EX03	flake8 error: line 4, col 4: E121 continuation line under-indented for hanging indent
	EX03	flake8 error: line 6, col 4: E123 closing bracket does not match indentation of opening bracket's line

jordan-betterman on Jan 22, 2024
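For the pandas.Series.plot.line report above, the E121 and E123 messages point at line 4 (the 'pig' line) and line 6 (the closing-brace line) of the examples: once the `... ` prompt is stripped, both are indented by three spaces. One possible re-indentation, not necessarily what the merged PR used, gives the entries a four-space hanging indent and puts the closing brace back at the indentation of the line that opened it. The sketch below only extracts the re-indented example with the standard library and checks that it still compiles; it does not run flake8, and `extracted_source` is a hypothetical helper.

```python
import doctest

# One possible fix: 4-space hanging indent, closing brace at column 0.
FIXED_EXAMPLE = """
>>> df = pd.DataFrame({
...     'pig': [20, 18, 489, 675, 1776],
...     'horse': [4, 25, 281, 600, 1900]
... }, index=[1990, 1997, 2003, 2009, 2014])
>>> lines = df.plot.line()
"""


def extracted_source(docstring: str) -> str:
    """Join the code of every >>> example with the prompts removed."""
    parser = doctest.DocTestParser()
    return "".join(example.source for example in parser.get_examples(docstring))


source = extracted_source(FIXED_EXAMPLE)
print(source)

# compile() only checks syntax, so pd and df need not exist here.
compile(source, "<plot.line example>", "exec")
```

Rerunning validate_docstrings.py on the edited docstring is still the authoritative check.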

Otherwise, I’d take:

pandas.errors.SettingWithCopyWarning

pandas.errors.SpecificationError

pandas.errors.UndefinedVariableError

I can’t work on these because I can’t get numpydoc working to test for validity, so they are up for grabs.

lukasld on Jan 20, 2024

@natmokval, should pandas.arrays.DatetimeArray be added to the list? I see a flake8 error in my PR’s tests:

Error: /home/runner/work/pandas/pandas/pandas/core/arrays/datetimes.py:179:EX03:pandas.arrays.DatetimeArray:flake8 error: line 2, col 4: E121 continuation line under-indented for hanging indent

Edit: it looks like this is fixed in #56855.

alpakpinar on Jan 17, 2024

I’ll take these:

pandas.io.formats.style.Styler.highlight_quantile
pandas.io.formats.style.Styler.background_gradient
pandas.io.formats.style.Styler.text_gradient

jordan-d-murphy on Jan 16, 2024

Working on:

pandas.DataFrame.plot.hexbin
pandas.DataFrame.plot.line

alpakpinar on Jan 16, 2024

I’ll take these:

pandas.io.formats.style.Styler.set_tooltips
pandas.io.formats.style.Styler.set_uuid
pandas.io.formats.style.Styler.pipe
pandas.io.formats.style.Styler.highlight_between

jordan-d-murphy on Jan 15, 2024

Working on:

pandas.DataFrame.groupby
pandas.DataFrame.values
pandas.DataFrame.sort_values

alpakpinar on Jan 16, 2024

Working on:

pandas.ExcelWriter

yuanx749 on Jan 15, 2024

I’ll take:

pandas.io.formats.style.Styler.format_index
pandas.io.formats.style.Styler.relabel_index
pandas.io.formats.style.Styler.hide
pandas.io.formats.style.Styler.set_td_classes

jordan-d-murphy on Jan 15, 2024

I’ll take these:

pandas.io.json.build_table_schema
pandas.read_stata
pandas.plotting.scatter_matrix
pandas.Index.droplevel
pandas.Grouper

jordan-d-murphy on Jan 14, 2024

I’ll take:

pandas.Timestamp.ceil
pandas.Timestamp.floor
pandas.Timestamp.round

jordan-d-murphy on Jan 14, 2024

working on:

pandas.errors.PossibleDataLossError
pandas.errors.PossiblePrecisionLoss
pandas.errors.SettingWithCopyError
pandas.errors.ValueLabelTypeMismatch

yuanx749 on Jan 14, 2024

Hi @natmokval, the command scripts/validate_docstrings.py --format=actions --errors=EX03 method-name outputs all kinds of errors, not just EX03 errors.

https://github.com/pandas-dev/pandas/blob/c778746f2219601ac3c38f4f287f9a4e68905655/scripts/validate_docstrings.py#L444-L458

erichxchen on Jan 14, 2024
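Since the script also prints non-EX03 errors, as pointed out above, a small filter can keep only the EX03 messages. This is a sketch: the two line shapes it recognizes are taken from output quoted in this thread (the tab-separated lines of a single-method report, and the `Error: ...` line shown for pandas.arrays.DatetimeArray), not from a documented format. `keep_ex03` is a hypothetical name.

```python
def keep_ex03(report: str) -> list[str]:
    """Return only the EX03 messages from validate_docstrings.py output.

    Recognizes two line shapes seen in this thread (assumed, not a spec):
      "\tEX03\t<message>"                            single-method report
      "Error: <path>:<line>:EX03:<name>:<message>"   actions-style line
    """
    messages = []
    for line in report.splitlines():
        if line.startswith("Error: "):
            # maxsplit=4 keeps any colons inside the message itself
            parts = line[len("Error: "):].split(":", 4)
            if len(parts) == 5 and parts[2] == "EX03":
                messages.append(parts[4])
        else:
            fields = line.strip().split("\t")
            if len(fields) == 2 and fields[0] == "EX03":
                messages.append(fields[1])
    return messages


REPORT = (
    "3 Errors found for `pandas.Series.plot.line`:\n"
    "\tPR02\tUnknown parameters {'color'}\n"
    "\tEX03\tflake8 error: line 4, col 4: E121 continuation line under-indented for hanging indent\n"
    "\tEX03\tflake8 error: line 6, col 4: E123 closing bracket does not match "
    "indentation of opening bracket's line\n"
)

for message in keep_ex03(REPORT):
    print(message)  # the PR02 line is dropped
```

Splitting on tabs rather than searching for the substring "EX03" avoids matching a method name or message that happens to contain those characters.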

Working on:

pandas.DataFrame.hist
pandas.read_json
pandas.DataFrame.to_sql

erichxchen on Jan 14, 2024

PR opened for the following:

pandas.DataFrame.var
pandas.DatetimeIndex.day_name
pandas.core.groupby.DataFrameGroupBy.apply
pandas.DatetimeIndex.month_name
pandas.core.groupby.DataFrameGroupBy.hist
pandas.core.groupby.SeriesGroupBy.apply
pandas.core.groupby.SeriesGroupBy.transform
pandas.DataFrame.plot.hist
pandas.DataFrame.tz_localize
pandas.CategoricalIndex.set_categories
pandas.core.groupby.DataFrameGroupBy.boxplot
pandas.core.groupby.SeriesGroupBy.pipe
pandas.DataFrame.plot.bar
pandas.DataFrame.tz_convert
pandas.core.groupby.DataFrameGroupBy.pipe
pandas.DataFrame.skew
pandas.core.window.rolling.Rolling.corr

asishm on Jan 13, 2024

Working on:

pandas.MultiIndex.names
pandas.MultiIndex.droplevel

tqa236 on Jan 13, 2024

I’ll take these:

pandas.core.resample.Resampler.interpolate
pandas.pivot
pandas.merge_asof
pandas.wide_to_long
pandas.Index.rename
pandas.Index.isin
pandas.IndexSlice

jordan-d-murphy on Jan 12, 2024

I’m new to this, so maybe I’m overlooking something fundamental: I wanted to take on some of these docstrings, but I ran into an issue and am not sure what I’m doing wrong.

After executing:
python3 validate_docstrings.py --format=actions --errors=EX03 pandas.errors.SpecificationError
I get a list of flake8 errors:
flake8 error: line 4, col 40: E261 at least two spaces before inline comment
...
However, after adding an extra space and saving the docstring for SpecificationError (in this case in ./pandas/errors/__init__.py) and rerunning validate_docstrings.py, the script returns the same errors, as if the change had no effect.

I’m also new and facing a similar issue. Even after making the changes, the error logs don’t change.

svrashank on Jan 11, 2024
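On the E261 question above: pycodestyle’s E261 asks for at least two spaces before an inline comment, so `) # doctest: +SKIP` needs to become `)  # doctest: +SKIP`. The sketch below is a simplified, E261-only check built on the standard `tokenize` module. It can be used to test an example locally, but it only approximates what flake8 reports and is no substitute for rerunning validate_docstrings.py. `find_e261` is a hypothetical helper.

```python
import io
import tokenize


def find_e261(source: str) -> list[tuple[int, int]]:
    """Return (line, col) for inline comments with fewer than two spaces before them.

    Columns are 1-based and point just past the code that precedes the
    comment, matching the "line 4, col 40" message quoted above.
    """
    problems = []
    prev_end = None  # (row, col) where the last code token ended
    skip = {tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            # Inline means some code token ended earlier on the same row
            inline = prev_end is not None and prev_end[0] == tok.start[0]
            if inline and tok.start[1] < prev_end[1] + 2:
                problems.append((prev_end[0], prev_end[1] + 1))
        elif tok.type not in skip:
            prev_end = tok.end
    return problems


# The SpecificationError example with the ">>> " / "... " prompts removed.
BEFORE = """df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': range(5),
                   'C': range(5)})
df.groupby('A').B.agg({'foo': 'count'}) # doctest: +SKIP
# SpecificationError: nested renamer is not supported
"""
AFTER = BEFORE.replace(") # doctest", ")  # doctest")

print(find_e261(BEFORE))  # [(4, 40)]
print(find_e261(AFTER))   # []
```

Working from tokens instead of searching for `#` avoids false positives from `#` characters inside string literals.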

Working on:

pandas.errors.DatabaseError
pandas.errors.IndexingError
pandas.errors.InvalidColumnName

Thank you!

tiffanyxiao on Jan 11, 2024

Working on:

pandas.Series.plot.line
pandas.Series.to_sql

svrashank on Jan 10, 2024

I will work on:

pandas.Series.cat.set_categories
pandas.Series.plot.bar
pandas.Series.plot.hist

luke396 on Jan 10, 2024

I can take the first two methods.

pandas.Series.dt.day_name
pandas.Series.str.len

roadrollerdafjorst on Jan 10, 2024