pandas: BUG: on_bad_lines=callable does not invoke callable for all bad lines

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [29]:
import pandas as pd
pd.__version__
Out [29]:
'1.4.3'

In [30]:
len(open("bad.csv").readlines())
Out [30]:
3

In [31]:
df1 = pd.read_csv("bad.csv", on_bad_lines='warn', engine='python')
Skipping line 3: ',' expected after '"'


In [32]:
df2 = pd.read_csv("bad.csv", on_bad_lines=print, engine='python')

In [33]:
len(df1), len(df2)
Out [33]:
(1, 1)

Issue Description

The data file (shown below) has a header plus two data rows, three lines in total. Line 2 is valid; line 3 is malformed.

For df1, I’m setting on_bad_lines='warn', and I see a warning for line 3.

For df2, I’m passing on_bad_lines=print, and nothing is printed: the bad line is silently skipped.

❯ cat bad.csv
country,founded,id,industry,linkedin_url,locality,name,region,size,website
united states,"",heritage-equine-equipment-llc,farming,linkedin.com/company/heritage-equine-equipment-llc,"",heritage equine equipment llc,"",1-10,heritageequineequip.com
chile,"",contacto-corporación-colina,hospital & health care,linkedin.com/company/contacto-corporación-colina,colina,"contacto \" corporación colina",santiago metropolitan,11-50,corporacioncolina.cl
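For what it's worth, the warning text above looks like the message the standard library csv module raises in strict mode, so the rejection can probably be reproduced without pandas. The snippet below is a minimal sketch under that assumption; the shortened `row` is a hypothetical three-column cut of line 3, not the full record.

```python
import csv

# Hypothetical reduction of line 3 of bad.csv to three columns.
# The raw string keeps the backslash and the embedded quote exactly as in the file.
row = r'colina,"contacto \" corporación colina",santiago metropolitan'

# Default (lenient) csv parsing accepts the stray quote and returns a row.
lenient_fields = next(csv.reader([row]))

# Strict parsing rejects the same text with a csv.Error.
try:
    next(csv.reader([row], strict=True))
    strict_error = None
except csv.Error as exc:
    strict_error = str(exc)

print(lenient_fields)
print(strict_error)
```

With no escapechar set, the backslash is an ordinary character, so the quote after it closes the field and the following space is what the strict parser objects to.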

Expected Behavior

I would expect the bad line to be printed in the second case.
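Until the callable is invoked for this kind of line, one possible workaround is to pre-filter the file with the standard csv module and hand the rejected lines to a handler yourself. This is only a sketch: `split_bad_lines` and the `on_bad_line(lineno, line, exc)` signature are names made up for illustration, not pandas API, and checking one physical line at a time assumes no quoted field spans multiple lines.

```python
import csv
import io


def split_bad_lines(lines, on_bad_line):
    """Return the lines a strict csv parse accepts; report the rest.

    `on_bad_line` is a hypothetical handler called as
    on_bad_line(lineno, line, exc) for every rejected line.
    """
    good = []
    for lineno, line in enumerate(lines, start=1):
        try:
            # Parse each physical line on its own in strict mode.
            list(csv.reader([line], strict=True))
        except csv.Error as exc:
            on_bad_line(lineno, line, exc)
        else:
            good.append(line)
    return good


# Hypothetical three-column cut of bad.csv.
sample_lines = [
    'locality,name,region',
    '"",heritage equine equipment llc,""',
    r'colina,"contacto \" corporación colina",santiago metropolitan',
]

rejected = []
clean = split_bad_lines(
    sample_lines,
    lambda lineno, line, exc: rejected.append((lineno, str(exc))),
)

# File-like object holding only the accepted lines.
cleaned_csv = io.StringIO("\n".join(clean))
print(rejected)
```

The resulting `cleaned_csv` buffer could then be passed to pd.read_csv in place of the file name, while `rejected` holds the line numbers and messages that on_bad_lines=print was expected to show.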

Installed Versions

pd.show_versions()

INSTALLED VERSIONS

commit           : e8093ba372f9adfe79439d90fe74b0b5b6dea9d6
python           : 3.9.12.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.11.0-49-generic
Version          : #55-Ubuntu SMP Wed Jan 12 17:36:34 UTC 2022
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.4.3
numpy            : 1.23.1
pytz             : 2022.1
dateutil         : 2.8.2
setuptools       : 60.6.0
pip              : 22.0.3
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 8.4.0
pandas_datareader: None
bs4              : None
bottleneck       : None
brotli           : None
fastparquet      : None
fsspec           : None
gcsfs            : None
markupsafe       : None
matplotlib       : None
numba            : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyreadstat       : None
pyxlsb           : None
s3fs             : None
scipy            : None
snappy           : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
zstandard        : None

/home/venky/dev/instant-science/explore/.venv/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 17 (12 by maintainers)

Most upvoted comments

You can cross-reference this issue when making a PR, and we can leave this issue open to further discuss a broader scope for “bad line”.