Merge pull request #4155 from Christiandike/update/oauth.md

Add link to OAuth 2
This commit is contained in:
Min RK
2022-10-24 10:44:16 +02:00
committed by GitHub

View File

@@ -1,26 +1,26 @@
# JupyterHub and OAuth
JupyterHub uses OAuth 2 internally as a mechanism for authenticating users.
JupyterHub uses [OAuth 2](https://oauth.net/2/) as an internal mechanism for authenticating users.
As such, JupyterHub itself always functions as an OAuth **provider**.
More on what that means [below](oauth-terms).
You can find out more about what that means [below](oauth-terms).
Additionally, JupyterHub is _often_ deployed with [oauthenticator](https://oauthenticator.readthedocs.io),
Additionally, JupyterHub is _often_ deployed with [OAuthenticator](https://oauthenticator.readthedocs.io),
where an external identity provider, such as GitHub or KeyCloak, is used to authenticate users.
When this is the case, there are _two_ nested oauth flows:
an _internal_ oauth flow where JupyterHub is the **provider**,
and and _external_ oauth flow, where JupyterHub is a **client**.
When this is the case, there are _two_ nested OAuth flows:
an _internal_ OAuth flow where JupyterHub is the **provider**,
and an _external_ OAuth flow, where JupyterHub is the **client**.
This means that when you are using JupyterHub, there is always _at least one_ and often two layers of OAuth involved in a user logging in and accessing their server.
Some relevant points:
The following points are noteworthy:
- Single-user servers _never_ need to communicate with or be aware of the upstream provider configured in your Authenticator.
As far as they are concerned, only JupyterHub is an OAuth provider,
As far as the servers are concerned, only JupyterHub is an OAuth provider,
and how users authenticate with the Hub itself is irrelevant.
- When talking to a single-user server,
- When interacting with a single-user server,
there are ~always two tokens:
a token issued to the server itself to communicate with the Hub API,
and a second per-user token in the browser to represent the completed login process and authorized permissions.
first, a token issued to the server itself to communicate with the Hub API,
and second, a per-user token in the browser to represent the completed login process and authorized permissions.
More on this [later](two-tokens).
(oauth-terms)=
@@ -28,64 +28,64 @@ Some relevant points:
## Key OAuth terms
Here are some key definitions to keep in mind when we are talking about OAuth.
You can also read more detail [here](https://www.oauth.com/oauth2-servers/definitions/).
You can also read more in detail [here](https://www.oauth.com/oauth2-servers/definitions/).
- **provider**: The entity responsible for managing identity and authorization,
- **provider**: The entity responsible for managing identity and authorization;
always a web server.
JupyterHub is _always_ an oauth provider for JupyterHub's components.
When OAuthenticator is used, an external service, such as GitHub or KeyCloak, is also an oauth provider.
- **client**: An entity that requests OAuth **tokens** on a user's behalf,
JupyterHub is _always_ an OAuth provider for JupyterHub's components.
When OAuthenticator is used, an external service, such as GitHub or KeyCloak, is also an OAuth provider.
- **client**: An entity that requests OAuth **tokens** on a user's behalf;
generally a web server of some kind.
OAuth **clients** are services that _delegate_ authentication and/or authorization
to an OAuth **provider**.
JupyterHub _services_ or single-user _servers_ are OAuth **clients** of the JupyterHub **provider**.
When OAuthenticator is used, JupyterHub is itself _also_ an OAuth **client** for the external oauth **provider**, e.g. GitHub.
When OAuthenticator is used, JupyterHub is itself _also_ an OAuth **client** for the external OAuth **provider**, e.g. GitHub.
- **browser**: A user's web browser, which makes requests and stores things like cookies.
- **token**: The secret value used to represent a user's authorization. This is the final product of the OAuth process.
- **code**: A short-lived temporary secret that the **client** exchanges
for a **token** at the conclusion of oauth,
in what's generally called the "oauth callback handler."
for a **token** at the conclusion of OAuth,
in what's generally called the "OAuth callback handler."
## One oauth flow
OAuth **flow** is what we call the sequence of HTTP requests involved in authenticating a user and issuing a token, ultimately used for authorized access to a service or single-user server.
OAuth **flow** is what we call the sequence of HTTP requests involved in authenticating a user and issuing a token, ultimately used for authorizing access to a service or single-user server.
A single oauth flow generally goes like this:
A single OAuth flow typically goes like this:
### OAuth request and redirect
1. A **browser** makes an HTTP request to an oauth **client**.
2. There are no credentials, so the client _redirects_ the browser to an "authorize" page on the oauth **provider** with some extra information:
- the oauth **client id** of the client itself.
- the **redirect uri** to be redirected back to after completion.
1. A **browser** makes an HTTP request to an OAuth **client**.
2. There are no credentials, so the client _redirects_ the browser to an "authorize" page on the OAuth **provider** with some extra information:
- the OAuth **client ID** of the client itself.
- the **redirect URI** to be redirected back to after completion.
- the **scopes** requested, which the user should be presented with to confirm.
This is the "X would like to be able to Y on your behalf. Allow this?" page you see on all the "Login with ..." pages around the Internet.
3. During this authorize step,
the browser must be _authenticated_ with the provider.
This is often already stored in a cookie,
but if not the provider webapp must begin its _own_ authentication process before serving the authorization page.
This _may_ even begin another oauth flow!
This _may_ even begin another OAuth flow!
4. After the user tells the provider that they want to proceed with the authorization,
the provider records this authorization in a short-lived record called an **oauth code**.
5. Finally, the oauth provider redirects the browser _back_ to the oauth client's "redirect uri"
(or "oauth callback uri"),
with the oauth code in a url parameter.
the provider records this authorization in a short-lived record called an **OAuth code**.
5. Finally, the oauth provider redirects the browser _back_ to the oauth client's "redirect URI"
(or "OAuth callback URI"),
with the OAuth code in a URL parameter.
That's the end of the requests made between the **browser** and the **provider**.
That marks the end of the requests made between the **browser** and the **provider**.
### State after redirect
At this point:
- The browser is authenticated with the _provider_.
- The user's authorized permissions are recorded in an _oauth code_.
- The _provider_ knows that the given oauth client's requested permissions have been granted, but the client doesn't know this yet.
- All requests so far have been made directly by the browser.
No requests have originated at the client or provider.
- The user's authorized permissions are recorded in an _OAuth code_.
- The _provider_ knows that the permissions requested by the OAuth client have been granted, but the client doesn't know this yet.
- All the requests so far have been made directly by the browser.
No requests have originated from the client or provider.
### OAuth Client Handles Callback Request
Now we get to finish the OAuth process.
At this stage, we get to finish the OAuth process.
Let's dig into what the OAuth client does when it handles
the OAuth callback request.
@@ -95,12 +95,12 @@ the OAuth callback request.
makes a second API request to the _provider_
to retrieve information about the owner of the token (the user).
This is the step where behavior diverges for different OAuth providers.
Up to this point, all oauth providers are the same, following the oauth specification.
However, oauth does not define a standard for exchanging tokens for information about their owner or permissions ([OpenID Connect](https://openid.net/connect/) does that),
Up to this point, all OAuth providers are the same, following the OAuth specification.
However, OAuth does not define a standard for issuing tokens in exchange for information about their owner or permissions ([OpenID Connect](https://openid.net/connect/) does that),
so this step may be different for each OAuth provider.
- Finally, the oauth client stores its own record that the user is authorized in a cookie.
- Finally, the OAuth client stores its own record that the user is authorized in a cookie.
This could be the token itself, or any other appropriate representation of successful authentication.
- Last of all, now that credentials have been established,
- Now that credentials have been established,
the browser can be redirected to the _original_ URL where it started,
to try the request again.
If the client wasn't able to keep track of the original URL all this time
@@ -114,7 +114,7 @@ So that's _one_ OAuth process.
## Full sequence of OAuth in JupyterHub
Let's go through the above OAuth process in JupyterHub,
with specific examples of each HTTP request and what information is contained.
with specific examples of each HTTP request and what information it contains.
For bonus points, we are using the double-OAuth example of JupyterHub configured with GitHubOAuthenticator.
To disambiguate, we will call the OAuth process where JupyterHub is the **provider** "internal OAuth,"
@@ -184,7 +184,7 @@ The first:
- JupyterHub->GitHub
- `POST https://github.com/login/oauth/access_token`
- request made with oauth **code** from url parameter
- request made with OAuth **code** from URL parameter
- response includes an access **token**
The second:
@@ -271,15 +271,15 @@ To handle this, OAuth tokens and the various places they are stored can _expire_
which should have the same effect as no credentials,
and trigger the authorization process again.
In JupyterHub's internal oauth, we have these layers of information that can go stale:
In JupyterHub's internal OAuth, we have these layers of information that can go stale:
- The oauth client has a **cache** of Hub responses for tokens,
- The OAuth client has a **cache** of Hub responses for tokens,
so it doesn't need to make API requests to the Hub for every request it receives.
This cache has an expiry of five minutes by default,
and is governed by the configuration `HubAuth.cache_max_age` in the single-user server.
- The internal oauth token is stored in a cookie, which has its own expiry (default: 14 days),
- The internal OAuth token is stored in a cookie, which has its own expiry (default: 14 days),
governed by `JupyterHub.cookie_max_age_days`.
- The internal oauth token can also itself expire,
- The internal OAuth token itself can also expire,
which is by default the same as the cookie expiry,
since it makes sense for the token itself and the place it is stored to expire at the same time.
This is governed by `JupyterHub.cookie_max_age_days` first,
@@ -317,9 +317,9 @@ triggering the external login process anew before letting a user proceed.
- If the token has expired, but is still in the cookie:
when the token response cache expires,
the next time the server asks the hub about the token,
no user will be identified and the internal oauth process begins again.
no user will be identified and the internal OAuth process begins again.
- If the token _cookie_ expires, the next browser request will be made with no credentials,
and the internal oauth process will begin again.
and the internal OAuth process will begin again.
This will usually have the form of a transparent redirect browsers won't notice.
However, if this occurs on an API request in a long-lived page visit
such as a JupyterLab session, the API request may fail and require
@@ -352,7 +352,7 @@ Logging out of JupyterHub means clearing and revoking many of these credentials:
### A tale of two tokens
**TODO**: discuss API token issued to server at startup ($JUPYTERHUB_API_TOKEN)
and oauth-issued token in the cookie,
and OAuth-issued token in the cookie,
and some details of how JupyterLab currently deals with that.
They are different, and JupyterLab should be making requests using the token from the cookie,
not the token from the server,