sentry-python: Sentry is missing some cron check-ins when used with sentry-python

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.39.1

Steps to Reproduce

I’m using this task to send check-ins from airflow:

import logging
from typing import Optional

import sentry_sdk
from sentry_sdk import crons
from sentry_sdk.integrations.logging import LoggingIntegration


def sentry_checkin(status: str, init_task_id: Optional[str] = None, **kwargs) -> str:
    sentry_logging = LoggingIntegration(
        level=logging.WARNING,  # Capture warning and above as breadcrumbs
        event_level=logging.ERROR,  # Send errors as events
    )
    sentry_sdk.init(
        environment="production", dsn=MY_PROJECT, integrations=[sentry_logging]
    )
    # Completion check-ins reuse the id produced by the init task, passed via XCom.
    check_in_id = None
    if init_task_id:
        ti = kwargs["ti"]
        check_in_id = ti.xcom_pull(task_ids=init_task_id)
    return crons.capture_checkin(
        monitor_slug="my-monitor", check_in_id=check_in_id, status=status
    )

I use this task in my Airflow DAG; it runs on every execution, both before the main logic and after it.
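To make the init/completion handoff concrete, here is a minimal sketch of the intended lifecycle. It deliberately avoids importing Airflow or sentry-sdk: XCom is simulated with a plain dict, and `capture_checkin` is a hypothetical stand-in that mimics `sentry_sdk.crons.capture_checkin` returning a check-in id.

```python
xcom = {}  # stand-in for Airflow's XCom store


def capture_checkin(monitor_slug, check_in_id=None, status=None):
    # Stand-in for sentry_sdk.crons.capture_checkin: reuses an existing
    # check-in id if given, otherwise "creates" a new one.
    return check_in_id or "generated-id"


def sentry_checkin(status, init_task_id=None):
    # Completion check-ins pull the id that the init task stored in XCom,
    # so Sentry pairs the "in_progress" and "ok" check-ins into one run.
    check_in_id = xcom.get(init_task_id) if init_task_id else None
    return capture_checkin("my-monitor", check_in_id=check_in_id, status=status)


# Before the main logic: open the check-in and record its id.
xcom["checkin_init"] = sentry_checkin("in_progress")
# After the main logic: close the same check-in.
done = sentry_checkin("ok", init_task_id="checkin_init")
```

If either half of this pair fails to reach Sentry, the run shows up as missed or timed out, which matches the intermittent gaps described below.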

I used to include the celery and django integrations because I needed them in the past. Removing them improved the results.

Expected Result

All check-ins to Sentry are visible in the Crons tab in the Sentry UI, with no missed check-ins when the tasks are working as expected.

The job runs every 10 minutes, so I expect evenly spaced successful check-ins.

Actual Result

Both initial check-ins and completion check-ins may be missing from time to time.

With the celery and django integrations, the history of check-ins looks like this: [screenshot]

Without the celery and django integrations: [screenshot]

About this issue

  • Original URL
  • State: open
  • Created 6 months ago
  • Comments: 27 (13 by maintainers)

Most upvoted comments

@sentrivana hi! Unfortunately, it didn’t improve the stability of my monitor… Thank you for the suggestion!

Now that we’ve had the new settings applied for a few days I wanted to confirm here that things seem to work much better now 🙏

7-day view: [screenshot]

There have been a couple of hiccups that I still don’t completely understand (especially the longish period of red in dev and production about 6 days ago), but as long as this stays rare, Crons is now a much more helpful tool for us 👍

These errors might be legitimate application errors that we need to investigate, and now they’re not drowning in false alerts. The last row in the screenshot for example was red because of a database configuration issue and Crons reported the failure accurately, which pointed us towards the issue and we made a fix and got the monitor back to green.

In our case I would expect more or less 100% green everywhere - I could see a few tasks being missed in cases where they are scheduled to run exactly when we deploy a new revision of the app or something, but mostly all the tasks seem to be running as they should, and cron monitoring is just mistaken in reporting errors.

I created a debug task today and deployed it to our dev environment only. It runs once per minute, sleeps for 10 seconds during execution, then returns.

[screenshot]

I deployed it with debug=True on sentry-sdk 1.40 first, and we had three misses that you can see in the screenshot. Then I deployed the bump to 1.41 together with the socket options; it seems to have improved somewhat, but we did get another miss later.
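For reference, the "socket options" mentioned here refers to the `socket_options` argument that sentry-sdk accepts as of 1.41 to enable TCP keep-alive on the transport's pooled connections. A minimal sketch, assuming that parameter; the idle/interval/count knobs (`TCP_KEEPIDLE` etc.) are Linux-specific, so this probes for them before adding:

```python
import socket

# Enable TCP keep-alive so idle pooled connections to Sentry are less
# likely to be silently dropped by intermediaries.
keep_alive = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]

# Linux-only fine-tuning: start probing after 45s idle, probe every 10s,
# give up after 6 failed probes.
for name, value in (("TCP_KEEPIDLE", 45), ("TCP_KEEPINTVL", 10), ("TCP_KEEPCNT", 6)):
    if hasattr(socket, name):
        keep_alive.append((socket.IPPROTO_TCP, getattr(socket, name), value))

# Passed to the SDK at init time (MY_PROJECT as in the snippet above):
# sentry_sdk.init(dsn=MY_PROJECT, socket_options=keep_alive)
```

This targets exactly the failure mode in the traceback below: a stale keep-alive-less connection being reused and the remote end closing it mid-request.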

Each miss seems to be accompanied by an exception during the send_envelope request. I notice there is some NewRelic instrumentation that is affecting the http requests, will have to dig a little to see if that could have anything to do with this.

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/newrelic/hooks/external_urllib3.py", line 32, in _nr_wrapper_make_request_
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
    return real_getresponse(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/newrelic/hooks/external_httplib.py", line 77, in httplib_getresponse_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/transport.py", line 535, in send_envelope_wrapper
    self._send_envelope(envelope)
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/transport.py", line 434, in _send_envelope
    self._send_request(
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/transport.py", line 245, in _send_request
    response = self._pool.request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/request.py", line 81, in request
    return self.request_encode_body(
  File "/usr/local/lib/python3.10/site-packages/urllib3/request.py", line 173, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.10/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.10/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/newrelic/hooks/external_urllib3.py", line 32, in _nr_wrapper_make_request_
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 462, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.10/site-packages/sentry_sdk/integrations/stdlib.py", line 126, in getresponse
    return real_getresponse(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/newrelic/hooks/external_httplib.py", line 77, in httplib_getresponse_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.10/http/client.py", line 287, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Thanks for following up @SoerenWeber, glad that upgrading fixed the issue for you! We might be dealing with two different issues, but in any case it's worth a try – @IevgeniiB, could you see if upgrading to 1.39.2 changes anything?

Thank you, @sentrivana! Please let me know if I can provide more information to help debug this issue.

@gaprl thank you for looking into it, I’ve sent the URL now. Please let me know if this is not the URL you’re looking for or if you need something else.

Hey @IevgeniiB and @SoerenWeber, thanks for reaching out. Could you please share the affected monitor URLs with us so we can further investigate on our end? You can email it directly to us at crons-feedback@sentry.io. Thanks.

Hey @SoerenWeber, I don’t think this is related to https://github.com/getsentry/sentry-python/pull/2598 since if the type was wrong, I’d expect no check-ins at all. But the issue here appears to be intermittent.