mirror of
https://github.com/jupyterhub/jupyterhub.git
synced 2025-10-09 02:54:09 +00:00
# JupyterHub and OAuth

JupyterHub uses OAuth 2 internally as a mechanism for authenticating users.
As such, JupyterHub itself always functions as an OAuth **provider**.
More on what that means below.

Additionally, JupyterHub is _often_ deployed with [oauthenticator](https://oauthenticator.readthedocs.io),
where an external identity provider, such as GitHub or KeyCloak, is used to authenticate users.
When this is the case, there are _two_ nested OAuth flows:
an _internal_ flow, where JupyterHub is the **provider**,
and an _external_ flow, where JupyterHub is a **client**.

This means that when you are using JupyterHub, there is always _at least one_ and often two layers of OAuth involved in a user logging in and accessing their server.

Some relevant points:

- Single-user servers _never_ need to communicate with or be aware of the upstream provider.
  As far as they are concerned, only JupyterHub is an OAuth provider,
  and how users authenticate with the Hub itself is irrelevant.
- When talking to a single-user server,
  there are almost always two tokens:
  a token issued to the server itself to communicate with the Hub API,
  and a second per-user token in the browser to represent the completed login process and authorized permissions.
  More on this later.

### Key OAuth terms

- **provider**: The entity responsible for managing identity and authorization.
  JupyterHub is _always_ an OAuth provider for JupyterHub's components.
  When OAuthenticator is used, an external service, such as GitHub or KeyCloak, is also an OAuth provider.
- **client**: An entity that requests OAuth tokens on a user's behalf.
  JupyterHub _services_ or single-user _servers_ are OAuth clients of the JupyterHub _provider_.
  When OAuthenticator is used, JupyterHub is itself also an OAuth _client_ for the external OAuth _provider_, e.g. GitHub.
- **browser**: A user's web browser, which makes requests and stores things like cookies.
- **token**: The secret value used to represent a user's authorization. This is the final product of the OAuth process.

### The OAuth flow

OAuth flow is what we call the sequence of HTTP requests involved in authenticating a user and issuing a token, ultimately used for authorized access to a service or single-user server.

It generally goes like this:

#### OAuth request and redirect

1. A _browser_ makes an HTTP request to an OAuth _client_.
2. There are no credentials, so the client _redirects_ the browser to an "authorize" page on the OAuth _provider_ with some extra information:
   - the OAuth **client id** of the client itself
   - the **redirect uri** to be redirected back to after completion
   - the **scopes** requested, which the user should be presented with to confirm.
     This is the "X would like to be able to Y on your behalf. Allow this?" page you see on all the "Login with ..." pages around the Internet.
3. During this authorize step,
   the browser must be _authenticated_ with the provider.
   This is often already stored in a cookie,
   but if not the provider webapp must begin its _own_ authentication process before serving the authorization page.
4. After the user tells the provider that they want to proceed with the authorization,
   the provider records this authorization in a short-lived record called an **OAuth code**.
5. Finally,
   the OAuth provider redirects the browser _back_ to the OAuth client's "redirect uri"
   (or "OAuth callback uri"),
   with the OAuth code in a URL parameter.

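As a rough illustration of step 2, the redirect to the provider's "authorize" page can be thought of as a URL with the client id, redirect uri, and scopes encoded as query parameters. The sketch below is not JupyterHub's implementation: the function name `build_authorize_redirect` and the scope names are made up for illustration, and `response_type=code` is the standard OAuth 2 parameter for requesting a code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorize_redirect(authorize_url, client_id, redirect_uri, scopes):
    """Return the URL an OAuth client would redirect the browser to.

    Hypothetical helper for illustration only.
    """
    params = {
        "response_type": "code",  # ask the provider for an OAuth code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # OAuth 2 scopes are space-separated
    }
    return f"{authorize_url}?{urlencode(params)}"


# Example values; the scope names are placeholders, not real JupyterHub scopes.
location = build_authorize_redirect(
    "/hub/api/oauth2/authorize",
    client_id="jupyterhub-user-danez",
    redirect_uri="/user/danez/oauth_callback",
    scopes=["example-read", "example-write"],
)

# The provider can recover the same values from the query string.
query = parse_qs(urlsplit(location).query)
```

The provider reads these parameters back from the query string to decide which client is asking, where to send the browser afterwards, and what to show on the confirmation page.
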
#### State after redirect

At this point:

- The browser is authenticated with the _provider_.
- The user's authorized permissions are recorded in an _OAuth code_.
- The _provider_ knows that the given OAuth client's requested permissions have been granted, but the client doesn't know this yet.
- All requests so far have been made directly by the browser.
  No requests have originated at the client or provider.

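To make the idea of a short-lived OAuth code concrete, here is a minimal sketch of how a provider might record and later redeem such codes. Everything in it is an assumption for illustration: the `CodeStore` class, its method names, and the 600-second lifetime are hypothetical and do not describe how JupyterHub stores codes.

```python
import secrets
import time


class CodeStore:
    """In-memory store of single-use, short-lived OAuth codes (illustrative)."""

    def __init__(self, lifetime_seconds=600):
        # The lifetime is an arbitrary assumption; the point is that it is short.
        self.lifetime_seconds = lifetime_seconds
        self._codes = {}

    def issue(self, client_id, user, scopes, now=None):
        """Record an authorization and return the code that represents it."""
        now = time.time() if now is None else now
        code = secrets.token_urlsafe(16)
        self._codes[code] = {
            "client_id": client_id,
            "user": user,
            "scopes": list(scopes),
            "expires_at": now + self.lifetime_seconds,
        }
        return code

    def redeem(self, code, client_id, now=None):
        """Return the record for a valid code, or None. A code works only once."""
        now = time.time() if now is None else now
        record = self._codes.pop(code, None)  # removed whether or not it is valid
        if record is None:
            return None
        if record["client_id"] != client_id or now > record["expires_at"]:
            return None
        return record
```

The code only records that permission was granted. It is the provider, not the client, that holds this record until the client comes back to redeem it.
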
#### OAuth client handles callback request

Now we get to finish the OAuth process.
Let's dig into what the OAuth client does when it handles
the OAuth callback request that carries the code:

- The OAuth client receives the _code_ and makes an API request to the _provider_ to exchange the code for a real _token_.
  This is the first direct request between the OAuth _client_ and the _provider_.
- Once the token is retrieved, the client _usually_
  makes a second API request to the _provider_
  to retrieve information about the owner of the token (the user).
- Finally, the OAuth client stores its own record that the user is authorized in a cookie.
  This could be the token itself, or any other appropriate representation of successful authentication.

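The three callback steps above can be sketched with a toy in-memory provider. This is only an illustration of the order of operations: `FakeProvider`, `handle_callback`, and the shape of the cookie value are all hypothetical, and the "API requests" are plain method calls rather than HTTP.

```python
import secrets


class FakeProvider:
    """Stand-in for an OAuth provider; not a real API."""

    def __init__(self):
        self.codes = {}   # code -> username
        self.tokens = {}  # token -> username

    def authorize(self, username):
        """What happens during the browser redirect: issue a code."""
        code = secrets.token_urlsafe(8)
        self.codes[code] = username
        return code

    def exchange_code(self, code):
        """First client -> provider request: trade the code for a token."""
        username = self.codes.pop(code)  # a code is consumed when used
        token = secrets.token_urlsafe(16)
        self.tokens[token] = username
        return token

    def get_user(self, token):
        """Second client -> provider request: who owns this token?"""
        return {"name": self.tokens[token]}


def handle_callback(provider, code):
    """What an OAuth client does when the browser arrives with a code."""
    token = provider.exchange_code(code)
    user = provider.get_user(token)
    # The client's own record of the login, to be stored in a cookie.
    return {"token": token, "user": user["name"]}
```

For example, `handle_callback(provider, provider.authorize("danez"))` returns a cookie value whose `user` field is `danez`, and the code can no longer be exchanged afterwards.
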
_phew_

So that's _one_ OAuth process.

## Full sequence of OAuth in JupyterHub

Let's go through the above OAuth process in JupyterHub,
with specific examples of each HTTP request and what information is contained.

Our starting point:

- A user, whom we'll call `danez`, has a single-user server running.
- JupyterHub is running with GitHub as an OAuth provider.
- Danez has a fresh browser session with no cookies yet.

First request:

- browser->single-user server running JupyterLab or Jupyter Classic
- `GET /user/danez/notebooks/mynotebook.ipynb`
- no credentials, so client starts OAuth process with JupyterHub
- response: 302 redirect -> `/hub/api/oauth2/authorize`
  with:
  - client-id=`jupyterhub-user-danez`
  - redirect-uri=`/user/danez/oauth_callback` (we'll come back later!)

Second request, following redirect:

- browser->JupyterHub
- `GET /hub/api/oauth2/authorize`
- no credentials, so JupyterHub starts OAuth process _with GitHub_
- response: 302 redirect -> `https://github.com/login/oauth/authorize`
  with:
  - client-id=`jupyterhub-client-uuid`
  - redirect-uri=`/hub/oauth_callback` (we'll come back later!)

Third request, following redirect:

- browser->GitHub
- `GET https://github.com/login/oauth/authorize`

GitHub prompts for login and asks for confirmation of authorization.

After successful authorization
(either by looking up a pre-existing authorization,
or recording it via form submission),
GitHub issues an OAuth code and redirects to `/hub/oauth_callback?code=github-code`.

Next request:

- browser->JupyterHub
- `GET /hub/oauth_callback?code=github-code`

Inside the callback handler, JupyterHub makes two API requests:

The first:

- JupyterHub->GitHub
- `POST https://github.com/login/oauth/access_token`
- request made with OAuth code from URL parameter
- response includes an access token

The second:

- JupyterHub->GitHub
- `GET https://api.github.com/user`
- request made with access token in the `Authorization` header
- response is the user model, including username, email, etc.

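To show what those two requests to GitHub carry, the sketch below builds them as `urllib.request.Request` objects without sending anything. It is not how OAuthenticator is implemented. The helper names are hypothetical, the client id and secret are placeholders, and the `Bearer` scheme in the `Authorization` header is an assumption, since the text above only says the token goes in that header.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(code, client_id, client_secret):
    """First request: exchange the OAuth code for an access token."""
    body = urlencode(
        {"client_id": client_id, "client_secret": client_secret, "code": code}
    ).encode()
    return Request(
        "https://github.com/login/oauth/access_token",
        data=body,
        headers={"Accept": "application/json"},  # ask for a JSON response
        method="POST",
    )


def build_user_request(access_token):
    """Second request: look up the user who owns the access token."""
    return Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"Bearer {access_token}",  # scheme is an assumption
            "Accept": "application/json",
        },
    )


# Placeholder credentials; nothing is sent over the network here.
token_req = build_token_request("github-code", "example-id", "example-secret")
user_req = build_user_request("example-access-token")
```

Passing either object to `urllib.request.urlopen` would actually send it. That step is left out so the example runs offline.
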
Now the OAuth callback request completes with:

- set cookie on `/hub/` recording JupyterHub authentication so we don't need to do OAuth with GitHub again for a while
- redirect -> `/hub/api/oauth2/authorize`

Now, we get our first repeated request:

- browser->JupyterHub
- `GET /hub/api/oauth2/authorize`
- this time with credentials,
  so JupyterHub either
  1. serves the authorization confirmation page, or
  2. automatically accepts authorization (shortcut taken when a user is visiting their own server)
- redirect -> `/user/danez/oauth_callback?code=jupyterhub-code`

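The choice the Hub faces on this repeated request can be summarized as a small decision function. This is a simplified sketch rather than JupyterHub's actual logic: the function name and return values are hypothetical, and it assumes a user's own server can be recognized by the `jupyterhub-user-<name>` client id pattern seen in the first request.

```python
def authorize_action(authenticated_user, client_id):
    """Decide how the authorize endpoint responds (illustrative only).

    authenticated_user: username from the Hub login cookie, or None.
    client_id: OAuth client id sent by the client in the redirect.
    """
    if authenticated_user is None:
        # No credentials yet: the Hub must run its own login first.
        return "login"
    if client_id == f"jupyterhub-user-{authenticated_user}":
        # The user is visiting their own server: skip the confirmation page.
        return "auto-accept"
    # Anything else: show the authorization confirmation page.
    return "confirm"
```

With no cookie the result is `"login"`, which corresponds to the second request earlier in the sequence. For Danez visiting `jupyterhub-user-danez` it is `"auto-accept"`.
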
Here, we start the same OAuth callback process as before, but at Danez's single-user server:

- browser->single-user server
- `GET /user/danez/oauth_callback?code=jupyterhub-code`

Inside the callback handler, Danez's server makes two API requests to JupyterHub:

The first:

- single-user server->JupyterHub
- `POST /hub/api/oauth2/token`
- request made with OAuth code from URL parameter
- response includes an API token

The second:

- single-user server->JupyterHub
- `GET /hub/api/user`
- request made with token in the `Authorization` header
- response is the user model, including username, groups, etc.

Finally completing `GET /user/danez/oauth_callback`:

- response sets cookie, storing encrypted access token
- _finally_ redirects back to the original `/user/danez/notebooks/mynotebook.ipynb`

Final request:

- browser->single-user server
- `GET /user/danez/notebooks/mynotebook.ipynb`
- encrypted JupyterHub token in cookie

To authenticate this request, the single token stored in the encrypted cookie is passed to the Hub for verification:

- single-user server->Hub
- `GET /hub/api/user`
- browser's token in `Authorization` header
- response: user model with name, groups, etc.

If the user model matches who should be allowed (e.g. Danez),
then the request is allowed.

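The final check amounts to comparing the user model returned by the Hub with whoever is allowed to use the server. A minimal sketch follows, assuming the user model is a dict with a `name` key and that only explicitly listed users are allowed. The function name and the `allowed_users` parameter are hypothetical.

```python
def is_request_allowed(user_model, allowed_users):
    """Return True if the Hub's user model identifies an allowed user.

    user_model: dict returned by the Hub for the browser's token,
        or None if the Hub rejected the token.
    allowed_users: set of usernames permitted on this server.
    """
    if user_model is None:
        return False
    return user_model.get("name") in allowed_users
```

For Danez's server, `is_request_allowed({"name": "danez", "groups": []}, {"danez"})` is true, while a valid token belonging to another user is refused.
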
_the end_

## A tale of two tokens

**TODO**: discuss API token issued to server at startup and OAuth-issued token in cookie, and some details of how JupyterLab currently deals with that.

## Notes

- I omitted some information about the distinction between tokens issued to the server, due to RBAC changes. But they are different!