pandas: BUG: Cannot convert existing column to categorical
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

x = pd.DataFrame({
    "A": pd.Categorical(["A", "B"], categories=["A", "B"]),
    "B": [1, 2],
    "C": ["D", "E"],
})
print(x.dtypes)
# Assign the categorical version of column "C" back through .loc
x.loc[:, "C"] = pd.Categorical(x.loc[:, "C"], categories=["D", "E"])
print(x.dtypes)
Issue Description
When an existing column is set to its categorical equivalent through .loc[:, column], the column's dtype stays the same (object here) instead of becoming category.
Expected Behavior
Output now:
A    category
B       int64
C      object
dtype: object
A    category
B       int64
C      object
dtype: object
Expected output:
A    category
B       int64
C      object
dtype: object
A    category
B       int64
C    category  <----
dtype: object
Installed Versions
INSTALLED VERSIONS
commit : 478d340667831908b5b4bf09a2787a11a14560c9
python : 3.9.14.final.0
python-bits : 64
OS : Darwin
OS-release : 22.4.0
Version : Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:58 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 2.0.0
numpy : 1.23.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 67.4.0
pip : 23.0.1
Cython : 0.29.33
pytest : 7.2.2
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : 3.0.8
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.5
jinja2 : 3.0.3
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2023.1.0
gcsfs : None
matplotlib : 3.7.0
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.1
snappy : None
sqlalchemy : 1.4.46
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2022.7
qtpy : None
pyqt5 : None
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 1
- Comments: 18 (12 by maintainers)
Still seems to fail for me
C still seems to be object.
Just wanted to convert dtype (float -> int) of multiple columns in my df and encountered this issue:
pandas 2.1.3
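The snippet from this comment is not preserved in the text above, so the following is only a guess at the scenario, with a hypothetical frame and hypothetical column names. It contrasts writing the converted values back through .loc with a possible workaround: calling DataFrame.astype with a column-to-dtype mapping, which returns a new frame instead of writing into the existing columns.

```python
import pandas as pd

# Hypothetical data; the commenter's actual frame is not shown in the thread.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": ["x", "y"]})
cols = ["a", "b"]  # hypothetical names of the float columns to convert

# Writing through .loc sets values into the existing float columns,
# so the dtype may stay float64 (the behaviour reported in this thread).
via_loc = df.copy()
via_loc.loc[:, cols] = via_loc[cols].astype("int64")
print(via_loc.dtypes)

# astype with a mapping returns a new frame with the converted columns.
converted = df.astype({col: "int64" for col in cols})
print(converted.dtypes)
```

Reassigning the result (df = df.astype(...)) avoids the in-place write altogether, which is why it should not depend on how .loc handles dtypes.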
I have the same issue
The fact that it works without .loc but doesn't with .loc is very confusing.
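For anyone landing here, a minimal sketch of the contrast described in this comment, reusing the data from the report. The variable names (direct, via_astype, via_loc) are made up for illustration. The comments reflect what is reported in this thread for pandas 2.0.0 and 2.1.3; other versions may behave differently.

```python
import pandas as pd

x = pd.DataFrame({"B": [1, 2], "C": ["D", "E"]})
cat = pd.Categorical(x["C"], categories=["D", "E"])

# Plain column assignment replaces the column, so the categorical dtype is kept.
direct = x.copy()
direct["C"] = cat

# Converting with astype and assigning the result back also replaces the column.
via_astype = x.copy()
via_astype["C"] = via_astype["C"].astype(pd.CategoricalDtype(["D", "E"]))

# .loc[:, "C"] writes the values into the existing column instead;
# on the versions reported in this thread the dtype is left unchanged.
via_loc = x.copy()
via_loc.loc[:, "C"] = cat

print(direct.dtypes["C"], via_astype.dtypes["C"], via_loc.dtypes["C"])
```

So until the .loc behaviour is settled, x["C"] = ... or astype looks like the safer way to change a column's dtype.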