pandas: Pandas DataFrame is not JSON serializable

Hello,

I’m trying to build a RESTful api with Flask-RESTful which will return Pandas DataFrame as JSON data.

from flask import Flask
from flask.ext import restful
from flask.ext.restful import Resource, Api
import pandas as pd
import click
import logging

app = Flask(__name__)
api = restful.Api(app)

class DataWebservice(Resource):
    def get(self, size):
        logging.info("get %d" % size)
        # get DB conn
        # df = pd.read_sql(...) # USE LIMIT
        df = pd.DataFrame({"col1": [1]*size, "col2": [2]*size})
        #return(df.to_json())
        return(df)

api.add_resource(DataWebservice, '/api/v1/solar/df/get/<int:size>')

@click.command()
@click.option('--host', default='127.0.0.1', \
    help="host (127.0.0.1 or 0.0.0.0 to accept all ip)")
@click.option('--debug/--no-debug', default=False, help="debug mode")
def main(debug, host):
    app.run(host=host, debug=debug)

if __name__ == '__main__':
    main()

I run server using $ python server.py --debug

I run client using $ curl http://127.0.0.1:5000/api/v1/solar/df/get/10

but I get the following error

TypeError:    col1  col2
0     1     2
1     1     2
...
8     1     2
9     1     2

[10 rows x 2 columns] is not JSON serializable

So it seems that Pandas DataFrame are not JSON serializable.

I try this using IPython

size = 10
df = pd.DataFrame({"col1": [1]*size, "col2": [2]*size})
json.dumps(df)

It raises same error. I’m aware that DataFrame have method named to_json() but it doesn’t help me much as my server will return escaped strings such as

"{\"col1\":{\"0\":1,\"1\":1,\"2\":1,\"3\":1,\"4\":1,\"5\":1,\"6\":1,\"7\":1,\"8\":1,\"9\":1},\"col2\":{\"0\":2,\"1\":2,\"2\":2,\"3\":2,\"4\":2,\"5\":2,\"6\":2,\"7\":2,\"8\":2,\"9\":2}}"

see https://github.com/twilio/flask-restful/issues/269

Kind regards

About this issue

  • Original URL
  • State: closed
  • Created 10 years ago
  • Comments: 20 (14 by maintainers)

Most upvoted comments

you can use make_response from flask , e.g. resp = make_response(df.to_json(orient = “records”)) and then simply return it.

@scls19fr this is obviously an old issue, but seeing as I stumbled upon it. The easiest way to nest a dataframe in a larger JSON blob is to use

demo = {
    'key': df.to_dict(orient='record')
}

json.dump(demo)

It works much better with this custom encoder (with to_dict method):

class JSONEncoder(json.JSONEncoder):
    def default(self, obj):
        if hasattr(obj,'to_dict'):
            return obj.to_dict()
        return json.JSONEncoder.default(self, obj)

In: size = 10
In: df = pd.DataFrame({"col1": [1]*size, "col2": [2]*size})
In: ser = json.dumps(d, cls=JSONEncoder)
Out: '{"return": {"col2": {"0": 2, "1": 2, "2": 2, "3": 2, "4": 2, "5": 2, "6": 2, "7": 2, "8": 2, "9": 2}, "col1": {"0": 1, "1": 1, "2": 1, "3": 1, "4": 1, "5": 1, "6": 1, "7": 1, "8": 1, "9": 1}}, "success": 1}'
unser=json.loads(ser)

In: print(unser)
Out: {u'return': {u'col2': {u'1': 2, u'0': 2, u'3': 2, u'2': 2, u'5': 2, u'4': 2, u'7': 2, u'6': 2, u'9': 2, u'8': 2}, u'col1': {u'1': 1, u'0': 1, u'3': 1, u'2': 1, u'5': 1, u'4': 1, u'7': 1, u'6': 1, u'9': 1, u'8': 1}}, u'success': 1}

In: type(unser)
Out: dict

pd.DataFrame(unser['return'])

There is no extra quotes.

Thanks

This is the typical way to extend the default json encoder

In [9]: class JSONEncoder(json.JSONEncoder):
   ...:     def default(self, obj):
   ...:         if hasattr(obj,'to_json'):
   ...:             return obj.to_json()
   ...:         return json.JSONEncoder.default(self, obj)
   ...:     

In [10]: json.dumps(df, cls=JSONEncoder)
Out[10]: '"{\\"col1\\":{\\"0\\":1,\\"1\\":1,\\"2\\":1,\\"3\\":1,\\"4\\":1,\\"5\\":1,\\"6\\":1,\\"7\\":1,\\"8\\":1,\\"9\\":1},\\"col2\\":{\\"0\\":2,\\"1\\":2,\\"2\\":2,\\"3\\":2,\\"4\\":2,\\"5\\":2,\\"6\\":2,\\"7\\":2,\\"8\\":2,\\"9\\":2}}"'

Hello,

Thanks I know to_json() method I thought Pandas DataFrame could inherit an other class to become directly “JSON serializable”. So json.dumps(df) could return exactly the same result as df.to_json(). So in this Flask view we could directly return DataFrame (in fact jsonify(df)) instead of doing:

    resp = Response(response=df.to_json(),
        status=200,
        mimetype="application/json")
    return(resp)

But maybe I’m wrong and there is no way for json.dumps(df) to return a JSON string. I thought that what I’m asking is more a syntactic sugar than a major improvement.

Kind regards

@scls19fr this is obviously an old issue, but seeing as I stumbled upon it. The easiest way to nest a dataframe in a larger JSON blob is to use

demo = {
    'key': df.to_dict(orient='record')
}

json.dump(demo)

That is the solution 👍 💯

I understand your reply… but imagine you have several dataframes to output into the same JSON message… doing things this way is not very clear. I think it’s much more clear to have a dict structure which can contains several df (and other data) and after serialize it.

About deserialization… yes that’s not a problem… I know structure and where are dataframes.

Please try this:

d = {"success":1 , "return": df}

ser=json.dumps(d, cls=JSONEncoder)

unser=json.loads(ser)

type(unser["return"])

it returns unicode I could expect dict