google-apis-rs: Process stuck, frozen on `CalendarHub.events().doit()` with high CPU usage

Hi,

To start off, thank you very much for your libraries!

I’m running into an issue. I have a program that fetches calendar events periodically. It runs just fine for a while, but at some point (after roughly 15 minutes) the process always ends up freezing here (the block_on is futures::executor::block_on):

block_on(
    self.hub
        .events()
        .list("primary")
        .add_scope(Scope::EventReadonly)
        .max_results(10)
        .time_min(&time_min.to_rfc3339())
        .time_max(&time_max.to_rfc3339())
        .single_events(true)
        .order_by("starttime")
        .doit(), // here!
)
.map(|(response, events)| {
    // this scope is never reached
})
.map_err(|_| EventsProviderError {})

These are the last log lines I get with RUST_LOG=debug before the process gets stuck:

[2022-12-21T21:56:26Z DEBUG yup_oauth2::authenticator] access token requested for scopes: [https://www.googleapis.com/auth/calendar.events.readonly]
[2022-12-21T21:56:26Z DEBUG yup_oauth2::authenticator] found valid token in cache: TokenInfo { access_token: "redacted", refresh_token: Some("redacted"), expires_at: Some(2022-12-21 22:28:22.147643846 +00:00:00), id_token: None }

The process hangs indefinitely (I’ve left it running for hours) and CPU usage jumps noticeably: every time this happens it climbs to about 12% and stays there. 12% is roughly 100/8, i.e. one of my 8 CPUs pegged at 100%.

I’ve tried wrapping the future in tokio::time::timeout_at, but to my great surprise the timeout never fires.
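Roughly, the wrapping looked like this (a simplified sketch; fetch_events() and the error handling are hypothetical stand-ins for the builder chain shown above):

use std::time::Duration;
use futures::executor::block_on;
use tokio::time::{timeout_at, Instant};

struct EventsProviderError {}

// Hypothetical stand-in for the hub.events()...doit() chain above.
async fn fetch_events() -> Result<(), EventsProviderError> {
    Ok(())
}

fn fetch_with_timeout() -> Result<(), EventsProviderError> {
    // Give the whole request 30 seconds; assumes an ambient Tokio runtime
    // is driving the timer, as in my actual program.
    let deadline = Instant::now() + Duration::from_secs(30);
    match block_on(timeout_at(deadline, fetch_events())) {
        Ok(result) => result,                         // request finished in time
        Err(_elapsed) => Err(EventsProviderError {}), // timed out; never reached in my case
    }
}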

I ran strace on my program; these are the last syscalls before the hang:

futex(0x55a7020ddf00, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55a70208a118, FUTEX_WAKE_PRIVATE, 1) = 1
recvmsg(9, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}], 3, 936) = 0 (Timeout)
statx(AT_FDCWD, "/etc/localtime", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFLNK|0777, stx_size=33, ...}) = 0
futex(0x7f52c0d955e0, FUTEX_WAKE_PRIVATE, 1) = 1

All of these strace lines also show up during normal operation, though.

As a last resort, I tried to get a core dump from the running process, but it ignored every signal except SIGKILL, which suggests it’s stuck waiting on IO.


Do you have any ideas as to what might be going on? Maybe I need to set a timeout somewhere, or maybe the library should do that itself?

Thank you in advance!

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 25 (5 by maintainers)

Most upvoted comments

Thank you, very interesting. Lesson learned: use tokio tooling with tokio only, and don’t expect all crates to interact nicely with each other.

https://github.com/hyperium/hyper/issues/3140#issuecomment-1419583828

futures-executor-0.3.25/src/local_pool.rs:95

This is a red flag to me. I would check where you are calling into the local pool (is that the block_on you mention at the top?) and rip those calls out. They don’t play well with Tokio’s reactor. Replace them with either a block_on on a Tokio runtime or, better, make the function itself async and just use #[tokio::main].
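A minimal sketch of both options, assuming a hypothetical fetch_events() async fn that wraps the request shown at the top of the issue:

use tokio::runtime::Runtime;

// Hypothetical wrapper around the hub.events()...doit() call.
async fn fetch_events() -> Vec<String> {
    Vec::new()
}

// Option 1: own a Tokio runtime and use its block_on from synchronous code.
fn fetch_sync() -> Vec<String> {
    let rt = Runtime::new().expect("failed to build Tokio runtime");
    rt.block_on(fetch_events())
}

// Option 2 (preferred): make the caller async and just .await the request.
#[tokio::main]
async fn main() {
    let events = fetch_events().await;
    println!("fetched {} events", events.len());
}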

The reason they conflict: Tokio’s reactor calls epoll_wait when it needs to wait on IO, and when epoll returns, the reactor processes all the events it noticed. futures-executor’s block_on does something a little different: when the future is not ready, it parks the thread (std::thread::park()) until the future is woken. The problem is that the reactor lives on that same thread; once the thread is parked, it stalls there and never reaches epoll_wait. Using futures-executor like this is a very common cause of deadlocks.
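For illustration, the same failure mode can be reproduced in isolation on a single-threaded runtime; in this sketch a timer stands in for the HTTP request, since any future that needs the reactor will do:

use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    println!("blocking the only runtime thread...");
    // futures-executor's block_on parks this thread until the sleep is woken,
    // but the sleep can only be woken by Tokio's driver, which runs on this
    // same (now parked) thread, so the program hangs here forever.
    futures::executor::block_on(tokio::time::sleep(Duration::from_secs(1)));
    println!("never reached");
}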

Problem solved. I removed futures::executor::block_on and just used async/await on top of #[tokio::main]. I learned quite a bit from this!

Thanks a lot! Maybe GOAWAY is indeed something new for hyper to learn; let’s hope there is a solution for that.

For now I assume the issue is indeed somewhere in hyper and will close this one - please feel free to reopen once you know that a fix should be applied here. Thank you!

Thanks for sharing your extensive notes, and good luck tracking down the culprit! This isn’t the first time I’ve noticed strange hangs, which makes me feel a bit uneasy about the state of async in general.

That aside, maybe you could try the latest version, published 10 days ago - maybe that changes something.