pandas: BUG: erroneous initialization of a DataFrame with Series objects

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

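import pandas as pd

x = pd.Series(["a", "b", "c"])
y = pd.Series([1, 2, 3])

# Series as data, Series as index: every value comes out as NaN
pd.DataFrame(y, x)
#     0
# a NaN
# b NaN
# c NaN

# Arguments swapped: the values are shifted and the last row is NaN
pd.DataFrame(x, y)
#      0
# 1    b
# 2    c
# 3  NaN

# Plain NumPy arrays: rows are paired by position
pd.DataFrame(x.values, y.values)
#    0
# 1  a
# 2  b
# 3  c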

Problem description

I would expect pd.Series objects to be valid inputs for the DataFrame constructor, for both the data and the index argument. If this is not the case, a warning (or even an error) would be nice…
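The outputs above are consistent with label alignment: when the data argument is a Series, the constructor appears to look up each requested index label in the Series' own index (here the default 0, 1, 2) instead of pairing rows by position. The following is a minimal sketch of that reading, not a statement about the pandas internals. The variable names `aligned`, `reindexed`, and `positional` are made up for illustration, and the equivalence with `Series.reindex` is an assumption drawn from the observed results.

```python
import pandas as pd

x = pd.Series(["a", "b", "c"])  # default index: 0, 1, 2
y = pd.Series([1, 2, 3])        # default index: 0, 1, 2

# Series data is matched to the requested index by label, not by position.
# Labels 1 and 2 exist in x's index; label 3 does not, so that row is NaN.
aligned = pd.DataFrame(x, index=y)

# Assumed equivalent for illustration: reindex x on the new labels.
reindexed = x.reindex(y.to_numpy())

# Positional pairing: drop the labels from the data before constructing.
positional = pd.DataFrame(x.to_numpy(), index=y.to_numpy())

print(aligned)
print(positional)
```

If positional pairing is what is wanted, converting the data with `to_numpy()` (or `.values`, as in the sample above) avoids the alignment step.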

Output of pd.show_versions()

INSTALLED VERSIONS

commit : c7f7443c1bad8262358114d5e88cd9c8a308e8aa
python : 3.9.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.17763
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en
LOCALE : German_Austria.1252

pandas : 1.3.1
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.1
setuptools : 49.6.0.post20210108
Cython : None
pytest : 6.2.4
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

@tyuyoshi sure. go for it!

Hi, can I pick this issue as my first OSS contribution?