mirror of https://github.com/jupyterhub/jupyterhub.git (synced 2025-10-17 15:03:02 +00:00)
update oauth.md
@@ -1,26 +1,26 @@
 # JupyterHub and OAuth
 
-JupyterHub uses OAuth 2 internally as a mechanism for authenticating users.
+JupyterHub uses OAuth 2 as an internal mechanism for authenticating users.
 As such, JupyterHub itself always functions as an OAuth **provider**.
-More on what that means [below](oauth-terms).
+You can find out more about what that means [below](oauth-terms).
 
-Additionally, JupyterHub is _often_ deployed with [oauthenticator](https://oauthenticator.readthedocs.io),
+Additionally, JupyterHub is _often_ deployed with [OAuthenticator](https://oauthenticator.readthedocs.io),
 where an external identity provider, such as GitHub or KeyCloak, is used to authenticate users.
-When this is the case, there are _two_ nested oauth flows:
-an _internal_ oauth flow where JupyterHub is the **provider**,
-and and _external_ oauth flow, where JupyterHub is a **client**.
+When this is the case, there are _two_ nested OAuth flows:
+an _internal_ OAuth flow where JupyterHub is the **provider**,
+and an _external_ OAuth flow, where JupyterHub is the **client**.
 
 This means that when you are using JupyterHub, there is always _at least one_ and often two layers of OAuth involved in a user logging in and accessing their server.
 
-Some relevant points:
+The following points are noteworthy:
 
 - Single-user servers _never_ need to communicate with or be aware of the upstream provider configured in your Authenticator.
-As far as they are concerned, only JupyterHub is an OAuth provider,
+As far as the servers are concerned, only JupyterHub is an OAuth provider,
 and how users authenticate with the Hub itself is irrelevant.
-- When talking to a single-user server,
+- When interacting with a single-user server,
 there are ~always two tokens:
-a token issued to the server itself to communicate with the Hub API,
-and a second per-user token in the browser to represent the completed login process and authorized permissions.
+first, a token issued to the server itself to communicate with the Hub API,
+and second, a per-user token in the browser to represent the completed login process and authorized permissions.
 More on this [later](two-tokens).
 
 (oauth-terms)=
@@ -28,64 +28,64 @@ Some relevant points:
 ## Key OAuth terms
 
 Here are some key definitions to keep in mind when we are talking about OAuth.
-You can also read more detail [here](https://www.oauth.com/oauth2-servers/definitions/).
+You can also read more in detail [here](https://www.oauth.com/oauth2-servers/definitions/).
 
-- **provider**: The entity responsible for managing identity and authorization,
+- **provider**: The entity responsible for managing identity and authorization;
 always a web server.
-JupyterHub is _always_ an oauth provider for JupyterHub's components.
-When OAuthenticator is used, an external service, such as GitHub or KeyCloak, is also an oauth provider.
-- **client**: An entity that requests OAuth **tokens** on a user's behalf,
+JupyterHub is _always_ an OAuth provider for JupyterHub's components.
+When OAuthenticator is used, an external service, such as GitHub or KeyCloak, is also an OAuth provider.
+- **client**: An entity that requests OAuth **tokens** on a user's behalf;
 generally a web server of some kind.
 OAuth **clients** are services that _delegate_ authentication and/or authorization
 to an OAuth **provider**.
 JupyterHub _services_ or single-user _servers_ are OAuth **clients** of the JupyterHub **provider**.
-When OAuthenticator is used, JupyterHub is itself _also_ an OAuth **client** for the external oauth **provider**, e.g. GitHub.
+When OAuthenticator is used, JupyterHub is itself _also_ an OAuth **client** for the external OAuth **provider**, e.g. GitHub.
 - **browser**: A user's web browser, which makes requests and stores things like cookies.
 - **token**: The secret value used to represent a user's authorization. This is the final product of the OAuth process.
 - **code**: A short-lived temporary secret that the **client** exchanges
-for a **token** at the conclusion of oauth,
-in what's generally called the "oauth callback handler."
+for a **token** at the conclusion of OAuth,
+in what's generally called the "OAuth callback handler."
 
 ## One oauth flow
 
-OAuth **flow** is what we call the sequence of HTTP requests involved in authenticating a user and issuing a token, ultimately used for authorized access to a service or single-user server.
+OAuth **flow** is what we call the sequence of HTTP requests involved in authenticating a user and issuing a token, ultimately used for authorizing access to a service or single-user server.
 
-A single oauth flow generally goes like this:
+A single OAuth flow typically goes like this:
 
 ### OAuth request and redirect
 
-1. A **browser** makes an HTTP request to an oauth **client**.
-2. There are no credentials, so the client _redirects_ the browser to an "authorize" page on the oauth **provider** with some extra information:
-- the oauth **client id** of the client itself.
-- the **redirect uri** to be redirected back to after completion.
+1. A **browser** makes an HTTP request to an OAuth **client**.
+2. There are no credentials, so the client _redirects_ the browser to an "authorize" page on the OAuth **provider** with some extra information:
+- the OAuth **client ID** of the client itself.
+- the **redirect URI** to be redirected back to after completion.
 - the **scopes** requested, which the user should be presented with to confirm.
 This is the "X would like to be able to Y on your behalf. Allow this?" page you see on all the "Login with ..." pages around the Internet.
 3. During this authorize step,
 the browser must be _authenticated_ with the provider.
 This is often already stored in a cookie,
 but if not the provider webapp must begin its _own_ authentication process before serving the authorization page.
-This _may_ even begin another oauth flow!
+This _may_ even begin another OAuth flow!
 4. After the user tells the provider that they want to proceed with the authorization,
-the provider records this authorization in a short-lived record called an **oauth code**.
-5. Finally, the oauth provider redirects the browser _back_ to the oauth client's "redirect uri"
-(or "oauth callback uri"),
-with the oauth code in a url parameter.
+the provider records this authorization in a short-lived record called an **OAuth code**.
+5. Finally, the oauth provider redirects the browser _back_ to the oauth client's "redirect URI"
+(or "OAuth callback URI"),
+with the OAuth code in a URL parameter.
 
-That's the end of the requests made between the **browser** and the **provider**.
+That marks the end of the requests made between the **browser** and the **provider**.
 
 ### State after redirect
 
 At this point:
 
 - The browser is authenticated with the _provider_.
-- The user's authorized permissions are recorded in an _oauth code_.
-- The _provider_ knows that the given oauth client's requested permissions have been granted, but the client doesn't know this yet.
-- All requests so far have been made directly by the browser.
-No requests have originated at the client or provider.
+- The user's authorized permissions are recorded in an _OAuth code_.
+- The _provider_ knows that the permissions requested by the OAuth client have been granted, but the client doesn't know this yet.
+- All the requests so far have been made directly by the browser.
+No requests have originated from the client or provider.
 
 ### OAuth Client Handles Callback Request
 
-Now we get to finish the OAuth process.
+At this stage, we get to finish the OAuth process.
 Let's dig into what the OAuth client does when it handles
 the OAuth callback request.
 
@@ -95,12 +95,12 @@ the OAuth callback request.
 makes a second API request to the _provider_
 to retrieve information about the owner of the token (the user).
 This is the step where behavior diverges for different OAuth providers.
-Up to this point, all oauth providers are the same, following the oauth specification.
-However, oauth does not define a standard for exchanging tokens for information about their owner or permissions ([OpenID Connect](https://openid.net/connect/) does that),
+Up to this point, all OAuth providers are the same, following the OAuth specification.
+However, OAuth does not define a standard for issuing tokens in exchange for information about their owner or permissions ([OpenID Connect](https://openid.net/connect/) does that),
 so this step may be different for each OAuth provider.
-- Finally, the oauth client stores its own record that the user is authorized in a cookie.
+- Finally, the OAuth client stores its own record that the user is authorized in a cookie.
 This could be the token itself, or any other appropriate representation of successful authentication.
-- Last of all, now that credentials have been established,
+- Now that credentials have been established,
 the browser can be redirected to the _original_ URL where it started,
 to try the request again.
 If the client wasn't able to keep track of the original URL all this time
@@ -114,7 +114,7 @@ So that's _one_ OAuth process.
 ## Full sequence of OAuth in JupyterHub
 
 Let's go through the above OAuth process in JupyterHub,
-with specific examples of each HTTP request and what information is contained.
+with specific examples of each HTTP request and what information it contains.
 For bonus points, we are using the double-OAuth example of JupyterHub configured with GitHubOAuthenticator.
 
 To disambiguate, we will call the OAuth process where JupyterHub is the **provider** "internal OAuth,"
@@ -184,7 +184,7 @@ The first:
 
 - JupyterHub->GitHub
 - `POST https://github.com/login/oauth/access_token`
-- request made with oauth **code** from url parameter
+- request made with OAuth **code** from URL parameter
 - response includes an access **token**
 
 The second:
@@ -271,15 +271,15 @@ To handle this, OAuth tokens and the various places they are stored can _expire_
 which should have the same effect as no credentials,
 and trigger the authorization process again.
 
-In JupyterHub's internal oauth, we have these layers of information that can go stale:
+In JupyterHub's internal OAuth, we have these layers of information that can go stale:
 
-- The oauth client has a **cache** of Hub responses for tokens,
+- The OAuth client has a **cache** of Hub responses for tokens,
 so it doesn't need to make API requests to the Hub for every request it receives.
 This cache has an expiry of five minutes by default,
 and is governed by the configuration `HubAuth.cache_max_age` in the single-user server.
-- The internal oauth token is stored in a cookie, which has its own expiry (default: 14 days),
+- The internal OAuth token is stored in a cookie, which has its own expiry (default: 14 days),
 governed by `JupyterHub.cookie_max_age_days`.
-- The internal oauth token can also itself expire,
+- The internal OAuth token itself can also expire,
 which is by default the same as the cookie expiry,
 since it makes sense for the token itself and the place it is stored to expire at the same time.
 This is governed by `JupyterHub.cookie_max_age_days` first,
@@ -317,9 +317,9 @@ triggering the external login process anew before letting a user proceed.
 - If the token has expired, but is still in the cookie:
 when the token response cache expires,
 the next time the server asks the hub about the token,
-no user will be identified and the internal oauth process begins again.
+no user will be identified and the internal OAuth process begins again.
 - If the token _cookie_ expires, the next browser request will be made with no credentials,
-and the internal oauth process will begin again.
+and the internal OAuth process will begin again.
 This will usually have the form of a transparent redirect browsers won't notice.
 However, if this occurs on an API request in a long-lived page visit
 such as a JupyterLab session, the API request may fail and require
@@ -352,7 +352,7 @@ Logging out of JupyterHub means clearing and revoking many of these credentials:
 ### A tale of two tokens
 
 **TODO**: discuss API token issued to server at startup ($JUPYTERHUB_API_TOKEN)
-and oauth-issued token in the cookie,
+and OAuth-issued token in the cookie,
 and some details of how JupyterLab currently deals with that.
 They are different, and JupyterLab should be making requests using the token from the cookie,
 not the token from the server,