sentry-python: Sentry is missing some cron check-ins when used with sentry-python
How do you use Sentry?
Sentry Saas (sentry.io)
Version
1.39.1
Steps to Reproduce
I’m using this task to send check-ins from airflow:
import logging
from typing import Optional

import sentry_sdk
from sentry_sdk import crons
from sentry_sdk.integrations.logging import LoggingIntegration


def sentry_checkin(status: str, init_task_id: Optional[str] = None, **kwargs) -> str:
    sentry_logging = LoggingIntegration(
        level=logging.WARNING,  # Capture warning and above as breadcrumbs
        event_level=logging.ERROR,  # Send errors as events
    )
    sentry_sdk.init(
        environment="production", dsn=MY_PROJECT, integrations=[sentry_logging]
    )
    check_in_id = None
    if init_task_id:
        # Reuse the check-in ID that the initial check-in task pushed to XCom
        ti = kwargs["ti"]
        check_in_id = ti.xcom_pull(task_ids=init_task_id)
    return crons.capture_checkin(monitor_slug="my-monitor", check_in_id=check_in_id, status=status)
I use this task in my Airflow DAG. It runs on every execution, once before the main logic and once after.
I used to have celery and django integrations included because I needed them in the past. Removing them improved the results.
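For readers unfamiliar with the pattern, the before/after pairing described above can be sketched in plain Python. This is a minimal illustration, not the reporter's actual DAG: `run_with_checkins` and the `capture` parameter are hypothetical names, and the status strings `"in_progress"`, `"ok"` and `"error"` are assumed to match what the monitor expects. In real use, `capture` would be something like `sentry_sdk.crons.capture_checkin`.

```python
from typing import Any, Callable, Optional


def run_with_checkins(
    job: Callable[[], Any],
    capture: Callable[..., Optional[str]],
    monitor_slug: str = "my-monitor",
) -> Any:
    """Run `job` between an opening and a closing check-in (hypothetical helper)."""
    # Opening check-in: the returned ID ties the closing check-in to this run.
    check_in_id = capture(monitor_slug=monitor_slug, status="in_progress")
    try:
        result = job()
    except Exception:
        # Closing check-in for a failed run, then let the error propagate.
        capture(monitor_slug=monitor_slug, check_in_id=check_in_id, status="error")
        raise
    # Closing check-in for a successful run.
    capture(monitor_slug=monitor_slug, check_in_id=check_in_id, status="ok")
    return result
```

If either of the two `capture` calls fails to reach Sentry, the monitor would show a missed or incomplete check-in even though the job itself ran, which is consistent with the symptoms reported in this issue.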
Expected Result
All check-ins sent to Sentry are visible in the Crons tab of the Sentry UI. There are no missed check-ins when the tasks are working as expected.
The job runs every 10 minutes, so I expect evenly spaced successful check-ins.
Actual Result
Both initial check-ins and completion check-ins may be missing from time to time.
With celery and django integrations the history of check-ins looks like this:
Without celery and django integrations:
About this issue
- State: open
- Created 6 months ago
- Comments: 27 (13 by maintainers)
@sentrivana hi! Unfortunately, it didn’t improve the stability of my monitor… Thank you for the suggestion!
Now that we’ve had the new settings applied for a few days I wanted to confirm here that things seem to work much better now 🙏
7 day view:
There have been a couple of hiccups that I still don’t completely understand (especially the longish period of red in dev and production about 6 days ago), but as long as these stay rare, Crons is now a much more helpful tool for us 👍
These errors might be legitimate application errors that we need to investigate, and now they’re not drowning in false alerts. The last row in the screenshot for example was red because of a database configuration issue and Crons reported the failure accurately, which pointed us towards the issue and we made a fix and got the monitor back to green.
In our case I would expect more or less 100% green everywhere - I could see a few tasks being missed in cases where they are scheduled to run exactly when we deploy a new revision of the app or something, but mostly all the tasks seem to be running as they should, and cron monitoring is just mistaken in reporting errors.
I created a debug task today and deployed it to our dev environment only. It runs once per minute, sleeps for 10 seconds during execution, then returns.
I deployed it with debug=True on sentry-sdk 1.40 first and we had three misses that you can see in the screenshot. Then I deployed the bump to 1.41 and the socket options, and it seems maybe to have improved, but we did get another miss later.
Each miss seems to be accompanied by an exception during the send_envelope request. I notice there is some NewRelic instrumentation that is affecting the http requests, will have to dig a little to see if that could have anything to do with this.
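For anyone trying to reproduce this setup, a debug configuration along the lines described in this comment might look roughly like the sketch below. The thread does not show which socket options were actually applied, so the TCP keep-alive values here are illustrative assumptions, and `keepalive_socket_options` and `init_sentry_debug` are hypothetical helper names. It also assumes the installed sentry-sdk version accepts `debug` and `socket_options` in `sentry_sdk.init`.

```python
import socket
from typing import List, Tuple


def keepalive_socket_options(
    idle: int = 45, interval: int = 10, count: int = 6
) -> List[Tuple[int, int, int]]:
    """Build TCP keep-alive socket options (the values are assumptions)."""
    options = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    # The TCP_KEEP* constants are platform-dependent, so add only those that exist.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        if hasattr(socket, name):
            options.append((socket.IPPROTO_TCP, getattr(socket, name), value))
    return options


def init_sentry_debug(dsn: str) -> None:
    """Initialise the SDK with debug logging and the keep-alive options above."""
    import sentry_sdk  # imported lazily so the helper above needs only the stdlib

    sentry_sdk.init(dsn=dsn, debug=True, socket_options=keepalive_socket_options())
```

With `debug=True` the SDK logs its own activity, which is how exceptions around the envelope request become visible in the task output.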
Thanks for following up @SoerenWeber, glad that upgrading has fixed the issue for you! We might be dealing with two different issues but in any case worth a try – @IevgeniiB could you see if upgrading to 1.39.2 changes anything?
Thank you, @sentrivana! Please let me know if I can provide more information to help debug this issue.
@gaprl thank you for looking into it, I’ve sent the URL now. Please let me know if this is not the URL you’re looking for or if you need something else.
Hey @IevgeniiB and @SoerenWeber, thanks for reaching out. Could you please share the affected monitor URLs with us so we can further investigate on our end? You can email it directly to us at crons-feedback@sentry.io. Thanks.
Hey @SoerenWeber, I don’t think this is related to https://github.com/getsentry/sentry-python/pull/2598 since if the type was wrong, I’d expect no check-ins at all. But the issue here appears to be intermittent.