prefect: Duplicate runs scheduled
Description
Occasionally a specific flow gets double-scheduled in the latest version of Prefect Server (we’ve run into this bug several times on previous versions). That is, for a single clock we get two identical runs at the same time (both visible in the UI, and both then executing at the same time).
Unfortunately, this does not happen on every scheduler cycle, or we could troubleshoot it more easily.
This may or may not be relevant, but it only happens at the times when the schedule is set to run with specific parameters. Another clock on the same schedule does not have this problem.
We can temporarily fix it by:
- Toggling the schedule on and off in the UI (only works sometimes).
- Deleting the Postgres data and re-registering the flow.
Expected Behavior
Any unique combination of time, flow, and parameters should only get scheduled once by the server.
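To make the expectation concrete, here is a minimal sketch of deduplicating runs by a deterministic key built from the flow, the scheduled time, and the parameters. The names `run_key` and `RunRegistry` are hypothetical and only illustrate the idea; this is not how Prefect Server is implemented.

```python
import hashlib
import json
from datetime import datetime, timezone


def run_key(flow_id: str, scheduled_time: datetime, parameters: dict) -> str:
    """Build a deterministic key for one (flow, time, parameters) combination."""
    payload = json.dumps(
        {
            "flow_id": flow_id,
            # Normalize to UTC so the same instant always serializes identically.
            "scheduled_time": scheduled_time.astimezone(timezone.utc).isoformat(),
            "parameters": parameters,
        },
        sort_keys=True,  # key order must not change the hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RunRegistry:
    """Hypothetical in-memory stand-in for a table of scheduled runs."""

    def __init__(self) -> None:
        self._keys = set()

    def schedule(self, flow_id: str, scheduled_time: datetime, parameters: dict) -> bool:
        """Return True if a new run was recorded, False if it was a duplicate."""
        key = run_key(flow_id, scheduled_time, parameters)
        if key in self._keys:
            return False
        self._keys.add(key)
        return True
```

With a check like this, a second request for the same time and parameters would be rejected, while the same time with different parameters would still be scheduled as a separate run.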
Reproduction
Unfortunately we can’t always reproduce it, but this problem eventually recurs even when we completely wipe the server setup and redeploy.
The schedule in question is generated as:
import pendulum
from prefect.schedules import Schedule, clocks

# normally imported from a config:
CRON_TIMES_LOADONLY = ["0 6 * * *"]
CRON_TIMES_FULLRUN = ["0 9 * * *"]
start_date = pendulum.now(tz="America/Chicago")  # pin clock to CDT
clocks_loadonly = [
    clocks.CronClock(s, start_date=start_date) for s in CRON_TIMES_LOADONLY
]
clocks_fullrun = [
    clocks.CronClock(
        s, start_date=start_date, parameter_defaults={"run-dbtbuild": True}
    )
    for s in CRON_TIMES_FULLRUN
]
schedule = Schedule(clocks=clocks_loadonly + clocks_fullrun)
This definition style hasn’t caused us any issues when not used with parameter defaults.
Environment
{
  "config_overrides": {
    "server": {
      "host": true,
      "telemetry": {
        "enabled": true
      },
      "ui": {
        "apollo_url": true
      }
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "Linux-4.15.0-118-generic-x86_64-with-glibc2.10",
    "prefect_backend": "cloud",
    "prefect_version": "0.13.14",
    "python_version": "3.8.5"
  }
}
About this issue
- State: closed
- Created 4 years ago
- Reactions: 9
- Comments: 15 (6 by maintainers)
Same here. I also use parameter_defaults. Flow re-registration seems to fix this issue temporarily. In my case, the scheduling goes wrong again after one or two days. Really difficult to reproduce it locally 😕
Update: the fix for the duplicate scheduled runs has been merged into server and the next core release will use the updated images 👍
Interesting! It should give the same hash on repeated runs. If you dig into it and have an MRE, feel free to tag me in an issue and I’ll take a look. You could also use a different key, like the git last-modified timestamp for the file with the flow, instead of the serialized hash.
Hi @davidmorch - this is a different issue; Prefect Server does not implement any lock on states, so if you horizontally scale agents there is a chance they submit the same run twice. Prefect Cloud uses a global lock that prevents this behavior.
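As a rough illustration of why a lock on states matters, the sketch below has several threads stand in for agents that all try to claim the same run. The `RunStore` class, its state names, and the `agent` function are assumptions made for this example; they do not reflect the actual data model or locking mechanism of Prefect Server or Prefect Cloud.

```python
import threading


class RunStore:
    """Hypothetical store of run states guarded by a single lock."""

    def __init__(self) -> None:
        self._states = {}
        self._lock = threading.Lock()

    def add(self, run_id: str) -> None:
        self._states[run_id] = "Scheduled"

    def claim(self, run_id: str) -> bool:
        """Atomically move a run from Scheduled to Submitted.

        The check and the update happen under the same lock, so only one
        caller per run can ever get True back.
        """
        with self._lock:
            if self._states.get(run_id) != "Scheduled":
                return False
            self._states[run_id] = "Submitted"
            return True


def agent(store: RunStore, run_id: str, submitted: list, name: str) -> None:
    # Each simulated agent submits the run only if it wins the claim.
    if store.claim(run_id):
        submitted.append(name)


store = RunStore()
store.add("run-1")
submitted = []
threads = [
    threading.Thread(target=agent, args=(store, "run-1", submitted, f"agent-{i}"))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock around the check and the update, two agents could both observe the Scheduled state and both submit the run, which is the double-submission scenario described above.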