prefect: `Flow.serialize_parameters` can return invalid JSON representation

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar request and didn’t find it.
  • I searched the Prefect documentation for this feature.

Prefect Version

2.x

Describe the current behavior

Currently, when a user passes a non-JSON-serializable object as a parameter value (e.g. a pandas DataFrame), they get an error that does not say which parameter caused it:

TypeError: Dict key must be str

Describe the proposed behavior

We could report which parameter is at fault, e.g. that the value of parameter X is not JSON serializable.
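
As a rough illustration of the proposed behavior, the sketch below checks each parameter on its own and records which ones fail to encode. `find_unserializable_parameters` is a hypothetical helper, not part of Prefect, and the standard library `json` module stands in for whatever encoder the client actually uses.

```python
import json


def find_unserializable_parameters(parameters):
    """Map each parameter name that fails JSON encoding to a short reason.

    Hypothetical helper for illustration only; not a Prefect API.
    """
    problems = {}
    for name, value in parameters.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError) as exc:
            # Keep the parameter name so the user knows where to look.
            problems[name] = f"{type(value).__name__}: {exc}"
    return problems


# A set is used here as a simple stand-in for a non-serializable value.
params = {"threshold": 0.5, "labels": {"a", "b"}}
problems = find_unserializable_parameters(params)
for name, reason in problems.items():
    print(f"Parameter {name!r} is not JSON serializable ({reason})")
```

Note that stdlib `json` accepts integer dict keys, so a check like this would need to use the same encoder as the client to catch the `Dict key must be str` case from this issue.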

Example Use

https://linen.prefect.io/t/2584051/oh-scratch-that-to-json-did-work-i-just-tried-it-again

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 16 (10 by maintainers)

Most upvoted comments

Hm, interesting. This should not be happening at all, actually. The error is raised when the FlowRunCreate object is cast to JSON. By that point the parameters should already have been serialized by the call to flow.serialize_parameters(parameters), which catches type errors and reports just the type for values that are not serializable. So this looks like a bug in the output of Flow.serialize_parameters:

flow_run = await client.create_flow_run(
    flow,
    # Send serialized parameters to the backend
    parameters=flow.serialize_parameters(parameters),
    state=state,
    tags=TagsContext.get().current_tags,
)

If we update your MRE to print the output for the working and error cases, e.g. print(flow1.serialize_parameters({"df": df_error})), we get:

# Working case
{'df': '<DataFrame>'}

# Error case
{'df': {'col': {0: '1'}}}

So the issue here is that we are actually serializing the dataframe successfully in the second case, but the result is not valid JSON: the inner dict is keyed by the integer index 0, and JSON object keys must be strings. This is some weird behavior from the jsonable_encoder.
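
To make the problem with the second payload concrete, here is a small sketch using only the standard library. The dict literal is copied from the "Error case" output above, and `has_non_str_keys` is a hypothetical helper, not part of Prefect. The stdlib `json` module quietly coerces the integer key to a string, whereas orjson by default rejects non-string keys with the `Dict key must be str` error reported in this issue.

```python
import json

# Payload copied from the "Error case" output above: the inner dict is keyed
# by the DataFrame's integer index.
payload = {"df": {"col": {0: "1"}}}


def has_non_str_keys(obj):
    """Return True if any dict nested inside obj has a non-string key.

    Hypothetical helper for illustration only.
    """
    if isinstance(obj, dict):
        return any(
            not isinstance(key, str) or has_non_str_keys(value)
            for key, value in obj.items()
        )
    if isinstance(obj, (list, tuple)):
        return any(has_non_str_keys(item) for item in obj)
    return False


# The standard library turns the integer key 0 into the string "0".
stdlib_output = json.dumps(payload)
print(stdlib_output)
print(has_non_str_keys(payload))
```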

To recreate the error: I wrote a script that demos one dataframe that works and one that throws the TypeError.

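from prefect import task, flow
import pandas as pd

@task
def task1():
    return 'complete'

@flow
def flow1(df):
    # The dataframe is only passed in as a flow parameter; the task ignores it
    task1()

# Integer column values: the flow run is created without error
df_works = pd.DataFrame({'col':[1]})
flow1(df_works)

# String column values: raises "TypeError: Dict key must be str"
df_error = pd.DataFrame({'col':['1']})
flow1(df_error)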

Thanks! This should not be affected by results; it is definitely the parameter serialization error that this issue is about. You may be able to work around this by using our quote utility on the parameter. Otherwise, this is waiting for a PR to pass the proper setting to orjson.

I think the first step here is to solve our incompatibility between orjson and json. The dataframe here is just an example of how you can end up with non-string keys. This issue will surface again if addressed just for dataframes.
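
One generic way to handle non-string keys, rather than special-casing dataframes, is to normalize keys before encoding. The sketch below is only an illustration: `stringify_keys` is a hypothetical helper, not Prefect code, and it uses `str()` on every key, which is an assumption about the desired output. orjson also provides an `OPT_NON_STR_KEYS` option for its own encoder; whether that is the setting meant in this thread is not confirmed here.

```python
import json


def stringify_keys(obj):
    """Return a copy of obj in which every dict key is a string.

    Hypothetical helper for illustration only; tuples become lists,
    as they would in JSON anyway.
    """
    if isinstance(obj, dict):
        return {str(key): stringify_keys(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [stringify_keys(item) for item in obj]
    return obj


# An integer key (as produced by a DataFrame index) and a tuple key,
# which even the standard library json module refuses to encode.
raw = {"df": {"col": {0: "1"}}, "lookup": {(1, 2): "pair"}}
clean = stringify_keys(raw)
encoded = json.dumps(clean)
print(encoded)
```

A drawback of this approach is that distinct keys such as `1` and `"1"` collapse into the same string, so one of the two values would be silently lost.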

Solving this at a lower level is a bit trickier, though: there are a lot of considerations about where to make the adjustment. I’m tempted to try overloading the standard library json with orjson while serializing API models, if the server will accept it in that format.

Also running into this issue. I am a fan of always ignoring DataFrames when sending to the server (or at least having this option somewhere). Open to making my first contribution by implementing this 😄