pandas: Cannot convert numbers to strings when reading an Excel spreadsheet

I’m reading some excel spreadsheets (xlsx format) into pandas using read_excel, which generally works great. The problem I have is that when a column contains numbers, pandas converts these to float64 type, and I would like them to be treated as strings. After reading them in, I can convert the column to str:

 my_frame.my_col = my_frame.my_col.astype('str')

This works as far as assigning the right type to the column, but when I view the values in this column, the strings are formatted in scientific-format e.g. 8.027770e+14, which is not what I want. I like to be able to tell Pandas to read the columns as strings. My current solution involves dropping down to xlrd to read the spreadsheet.

See stackoverflow question: http://stackoverflow.com/q/20970483/690890

About this issue

Original URL
State: closed
Created 10 years ago
Comments: 34 (17 by maintainers)

Most upvoted comments

the ‘real’ soln here is to support a dtype={} argument similar to read_csv, see issue #4932 which would allow you to specify that certain columns can be retained as object dtype (iow they wouldnt’ be coerced).

PR anyone?

jreback on Mar 20, 2014

vote for suggested solution which is to support dtype for excel file.

naoko on Jun 17, 2014

Here is a workaround to the problem of leading zeros. If you know that your column has a fixed-width you can first turn the column into a string, and then add the gone padding zeros with the string method zfill, as done in here: http://stackoverflow.com/questions/339007/nicest-way-to-pad-zeroes-to-string

Overall, I agree that read_excel should have a dtype option.

paupereira on Feb 19, 2015

well, my company receives several files of client data (normally Excel files) on a constant basis. I am in charge of ensuring that data is formatted according to our main import logic.

I built a Flask app to help our account managers formatting the files. This app relies on pandas heavily, because pandas is amazing.

However, since these files are coming straight from clients, my app doesn’t know what is on each file, so each account manager uses my app to decide if a column should be an int, or a date, etc.

So how i built the app it would turn every column into a dtype object column and then format every column according to the requirements.

This works great with csv files (because of the dtype arg), but with excel files the date columns are giving me a hard type.

manugarri on Mar 17, 2014

I like to be able to tell Pandas to read the columns as strings. My current solution involves dropping down to xlrd to read the spreadsheet.

How exactly are you doing that?

As far as I can see from the xlrd code, when it reads a number from an xlsx file it immediately converts it to a float before returning the cell object to the user.

So unless I am missing something I don’t see how Pandas can be made to read the columns as strings. It will always get a float from xlrd for an xls or xlsx file if Excel has stored them as numbers.

The CSV readers behave differently and can be type specified because they return all read data as strings regardless of whether it looks like a number of not.

Perhaps @jreback or @jtratner can suggest a different way to convert a column from float64 to formatted strings.

But apart from a workaround I don’t think this issue is fixable in the way that you would like.

jmcnamara on Jan 18, 2014

Thanks, but the dtype argument isn’t available for read_excel

deanjones on Jan 9, 2014