prefect: `Flow.serialize_parameters` can return invalid JSON representation
First check
- I added a descriptive title to this issue.
- I used the GitHub search to find a similar request and didn’t find it.
- I searched the Prefect documentation for this feature.
Prefect Version
2.x
Describe the current behavior
Currently, when a user passes a non-JSON-serializable object (e.g. a pandas DataFrame) as a parameter value, they get an error:
TypeError: Dict key must be str
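A minimal reproduction sketch of the behavior described above (not from the issue itself; the flow and variable names are illustrative, and per the thread only some DataFrames trigger the error):

```python
import pandas as pd
from prefect import flow


@flow
def my_flow(df):
    # The flow body is irrelevant; the failure happens while the flow run
    # is created and the parameters are serialized.
    return len(df)


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    # Per the report, passing a DataFrame as a parameter can fail with:
    #   TypeError: Dict key must be str
    my_flow(df)
```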
Describe the proposed behavior
We could surface a clear error stating that the value of parameter X is not JSON serializable.
Example Use
https://linen.prefect.io/t/2584051/oh-scratch-that-to-json-did-work-i-just-tried-it-again
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 16 (10 by maintainers)
Hm, interesting. This should not be happening at all, actually. This error is happening when the `FlowRunCreate` object is being cast to JSON. It should be receiving parameters that are already serialized by the call to `flow.serialize_parameters(parameters)`, which captures type errors and just reports the type when a value is not serializable. It looks like this is actually a bug in the output of `Flow.serialize_parameters`.

If we update your MRE to display the output for the working and error cases, e.g.

`print(flow1.serialize_parameters({"df": df_error}))`

we get:

So the issue here is that we are actually successfully serializing the dataframe in the second case but creating invalid JSON. This is some weird behavior from the `jsonable_encoder`.

To recreate the error: I wrote a script that demos a dataframe that works, and one that throws the TypeError.
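To illustrate the "invalid JSON" point, here is a small sketch (not from the thread) assuming the serialized parameters end up containing non-string dict keys, such as a DataFrame's integer index. Stdlib `json` and `orjson` disagree on how to handle such keys:

```python
import json
import orjson

data = {0: "value"}  # non-string key, e.g. what a DataFrame's integer index can produce

# The standard library coerces non-string keys to strings:
print(json.dumps(data))            # {"0": "value"}

# orjson rejects them by default; this matches the error in the report:
try:
    orjson.dumps(data)
except TypeError as exc:
    print(exc)                     # Dict key must be str

# orjson will accept them only when told to:
print(orjson.dumps(data, option=orjson.OPT_NON_STR_KEYS))  # b'{"0":"value"}'
```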
Thanks! This should not be affected by results; it is definitely the parameter serialization error that this issue is about. You may be able to work around this by using our `quote` utility on the parameter. Otherwise, this is waiting for a PR to pass the proper setting to `orjson`.
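A sketch of the suggested `quote` workaround (the import path and behavior are assumed for Prefect 2.x, and the comment above only says it *may* help):

```python
import pandas as pd
from prefect import flow
from prefect.utilities.annotations import quote  # assumed Prefect 2.x import path


@flow
def my_flow(df):
    ...  # depending on how the annotation is handled, df may arrive wrapped


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Wrapping the parameter in quote() asks Prefect not to introspect the value,
# which may avoid the failing serialization path described above.
my_flow(quote(df))
```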
I think the first step here is to solve our incompatibility between `orjson` and `json`. The dataframe here is just an example of how you can end up with non-string keys; this issue will surface again if it is addressed just for dataframes. Solving this at a lower level is a bit trickier, though; there are a lot of considerations about where to make the adjustment. I'm tempted to try overloading the standard library `json` with `orjson` while serializing API models, if the server will accept it in that format.

Also running into this issue; I am a fan of always ignoring DataFrames when sending to the server (or at least having this option somewhere). Open to make my first contribution by implementing this 😄
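For what it's worth, a rough, hypothetical sketch of the "overload the standard library `json` with `orjson`" idea mentioned a couple of comments up; this is not Prefect's code, just an illustration of the approach:

```python
import json
import orjson

def _orjson_dumps(obj, **kwargs):
    # orjson returns bytes and does not accept json.dumps keyword arguments;
    # a real implementation would need to translate or reject them.
    return orjson.dumps(obj, option=orjson.OPT_NON_STR_KEYS).decode()

# Route stdlib json.dumps through orjson while serializing API models.
json.dumps = _orjson_dumps
```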