requests-oauthlib: Problems with Token refreshing.

First of all, thank you for this wonderful library. It works great and saved me tons of time.

I am having issues with one aspect: auto token refresh. I have identified four distinct problems, which I describe below along with suggested solutions. I will also reference this issue from several other related issues. I am happy to provide a PR once we can agree on a suitable solution.

I derived my solution from the following, as my issue is a variant of the same issue:

https://github.com/requests/requests-oauthlib/issues/260

All of the problems start here:

https://github.com/requests/requests-oauthlib/blob/master/requests_oauthlib/oauth2_session.py#L338

    auth = kwargs.pop('auth', None)
    if client_id and client_secret and (auth is None):
        log.debug('Encoding client_id "%s" with client_secret as Basic auth credentials.', client_id)
        auth = requests.auth.HTTPBasicAuth(client_id, client_secret)
    token = self.refresh_token(
        self.auto_refresh_url, auth=auth, **kwargs)

Issue 1

If I pass an auth kwarg intended for the request() method, it is popped here and never reaches the actual request. I am not sure whether this is an issue in practice.

Issue 2

Box, Google and OneDrive do not expect the client_id and client_secret to be provided via Basic authentication in the Authorization header; they expect them POSTed in the request body. There is no way to customize this behavior. I also don't know which method other vendors expect; I only know the three above because those are the providers I am working with.
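
For contrast, a rough sketch of the two styles (CLIENT_ID, CLIENT_SECRET, TOKEN_URL and REFRESH_TOKEN are placeholders, not names from the library):

    import requests

    # What the auto refresh builds today when client_id/client_secret are
    # supplied: credentials in an Authorization: Basic header.
    basic = requests.auth.HTTPBasicAuth(CLIENT_ID, CLIENT_SECRET)
    requests.post(TOKEN_URL,
                  data={"grant_type": "refresh_token",
                        "refresh_token": REFRESH_TOKEN},
                  auth=basic)

    # What Box, Google and OneDrive expect: credentials in the POST body.
    requests.post(TOKEN_URL,
                  data={"grant_type": "refresh_token",
                        "refresh_token": REFRESH_TOKEN,
                        "client_id": CLIENT_ID,
                        "client_secret": CLIENT_SECRET})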

Issue 3

Any kwargs I pass that are intended for the request() method also end up being passed to refresh_token() and are used when calling the token endpoint. For example, if my original request() uploads a file, that file will be sent along to the token endpoint when attempting to refresh the token.
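
A sketch of the call pattern that triggers this (CLIENT_ID, TOKEN_URL, UPLOAD_URL, expired_token and save_token are placeholders):

    from requests_oauthlib import OAuth2Session

    session = OAuth2Session(
        client_id=CLIENT_ID,
        token=expired_token,
        auto_refresh_url=TOKEN_URL,
        token_updater=save_token,
    )

    # Once the access token has expired, request() catches TokenExpiredError
    # and calls self.refresh_token(self.auto_refresh_url, auth=auth, **kwargs)
    # with these same kwargs, so files= is forwarded to the token endpoint
    # call as well.
    session.post(UPLOAD_URL, files={"file": open("report.pdf", "rb")})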

Issue 4

client_id and client_secret are not normally passed to request(), but they are necessary for token refresh. Passing them is not a problem in itself, except for the unintended consequence that they are passed along to super().request().

Proposed solution

I think all of the above can be solved by making three changes (a usage sketch follows the list).

  1. Pass the client_id and client_secret to __init__(), where they are stored as attributes (of OAuth2Session or the underlying WebApplicationClient). In fact, client_id is already handled this way, but fetch_token(), refresh_token(), etc. do not utilize it and expect it to be passed again as a kwarg. This addresses issue 4.

  2. Add a refresh_kwargs argument that accepts a dict of kwargs to be passed along to refresh_token(), keeping them separate from the regular kwargs intended for request(). This solves issues 1 & 3.

  3. Place the client_id and client_secret into the POST body by default; all three of the providers I have tried expect this instead of Basic auth. Callers whose provider expects Basic auth can use refresh_kwargs to provide an auth kwarg. Optionally, when auth is passed, the client_id and client_secret could be omitted from the POST body (it is unlikely that a provider expects both). This solves issue 2.
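
To make this concrete, here is a hypothetical usage sketch of the proposal; client_secret as an __init__() argument and refresh_kwargs do not exist today, and the remaining names are placeholders:

    from requests_oauthlib import OAuth2Session

    session = OAuth2Session(
        client_id=CLIENT_ID,             # change 1: stored on the session
        client_secret=CLIENT_SECRET,     # change 1: hypothetical new attribute
        token=token,
        auto_refresh_url=TOKEN_URL,
        token_updater=save_token,
        # change 2: kwargs reserved for refresh_token() only
        refresh_kwargs={"timeout": 10},
    )

    # change 3: on auto refresh, client_id/client_secret go into the POST body
    # by default, and kwargs meant for this request (files=, auth=, ...) are
    # no longer forwarded to the token endpoint.
    session.post(UPLOAD_URL, files={"file": open("report.pdf", "rb")})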

I present my current workaround, which works perfectly for my providers.


    from oauthlib.oauth2 import TokenExpiredError
    from requests_oauthlib import OAuth2Session


    # Trimmed from a larger class; self.provider, self.oauth_access,
    # self.REFRESH_TOKEN_URL and self._refresh_token_callback are set elsewhere.
    class OAuthClient:

        def __init__(self, token, **kwargs):
            # Omit the auto refresh arguments; we refresh manually below.
            self.oauthsession = OAuth2Session(token=token, **kwargs)

        def request(self, method, url, chunk, headers=None, **kwargs):
            """
            Perform HTTP request for OAuth.
            """
            while True:
                try:
                    return self.oauthsession.request(method, url, headers=headers,
                                                     **kwargs)
                except TokenExpiredError:
                    # Do our own refresh, since the auto refresh has the problems
                    # described above. Passing client_id and client_secret as
                    # kwargs places them in the POST body.
                    token = self.oauthsession.refresh_token(
                        self.REFRESH_TOKEN_URL,
                        refresh_token=self.oauth_access.refresh_token,
                        client_id=self.provider.client_id,
                        client_secret=self.provider.client_secret)
                    self._refresh_token_callback(token)
                    continue
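
As far as I can tell, this works because refresh_token() folds extra keyword arguments into the refresh request body, so client_id and client_secret end up POSTed in the body as these providers expect, and because the refresh is triggered explicitly, none of the kwargs meant for request() can leak into the token request.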

Most upvoted comments

@xmedeko Heartbleed is not a sensible threat model in this case. In particular, if get_client_secret() returns the client secret as a byte string, then that byte string is subject to garbage collection. Python provides no low-level control of memory to securely erase byte strings, so that data may remain in memory indefinitely after use.

More generally, the only way to truly avoid Heartbleed-style attacks is to keep secret material outside of the process in question altogether. For OAuth2, given that the client_secret is sent to the remote peer, the only way to do that is to have the TLS-terminating process entirely outside of the process that actually builds the HTTP request. Essentially, none of the Python HTTP client libraries in use today are suitable against such a threat model.

The best defense against Heartbleed is to upgrade your OpenSSL.

Recently, while playing with our internal OAuth provider and inspecting the logs, I noticed the following message: Prepared refresh token request body grant_type=refresh_token&scope=0-0-0-0-0&refresh_token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX&allow_redirects=True. The trailing allow_redirects=True pair is included because requests.sessions.Session#get() sets it by default and, as discussed above, all extra **kwargs of the request() method are passed to refresh_token(). As a result, basically any keyword argument of get(), post(), etc., such as params or headers, besides those explicitly declared in refresh_token(), ends up in the body of the auto refresh request. I agree that a dedicated parameter for specifying which extra arguments should be passed to refresh_token() would be the proper general-case solution. But for the time being, at least not including arbitrary request-method parameters in the body of the auto refresh seems like a reasonable workaround (#277).
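
For reference, a minimal sketch that reproduces this (CLIENT_ID, TOKEN_URL, API_URL, expired_token and save_token are placeholders); requests.Session.get() sets allow_redirects=True by default, and during auto refresh that kwarg rides along into refresh_token():

    import logging

    from requests_oauthlib import OAuth2Session

    # Debug logging shows the "Prepared refresh token request body ..." message.
    logging.basicConfig(level=logging.DEBUG)

    session = OAuth2Session(
        client_id=CLIENT_ID,
        token=expired_token,
        auto_refresh_url=TOKEN_URL,
        token_updater=save_token,
    )

    # get() sets allow_redirects=True; when the token has expired, the auto
    # refresh forwards request()'s **kwargs to refresh_token(), and the pair
    # allow_redirects=True ends up in the refresh request body.
    session.get(API_URL)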

Yeah, it’s a real problem. It’s why, ultimately, most “managed-memory” languages (Python, Go, Ruby, etc.) are bad choices for writing code that handles cryptographic keying material. Such code almost always needs to be written in a language that allows low-level control of memory, so that keys can be securely wiped when they are no longer needed.