Restructured references section of the docs

This commit is contained in:
alwasega
2023-02-09 12:25:07 +03:00
parent 8ef43941e8
commit 1837c33a56
24 changed files with 145 additions and 93 deletions

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,29 @@
# Configuration Reference
:::{important}
Make sure the version of JupyterHub for this documentation matches your
installation version, as the output of this command may change between versions.
:::
## JupyterHub configuration
As explained in the [Configuration Basics](generate-config-file)
section, the `jupyterhub_config.py` can be automatically generated via
> ```bash
> jupyterhub --generate-config
> ```
The following contains the output of that command for reference.
```{eval-rst}
.. jupyterhub-generate-config::
```
## JupyterHub help command output
This section contains the output of the command `jupyterhub --help-all`.
```{eval-rst}
.. jupyterhub-help-all::
```

View File

@@ -0,0 +1,413 @@
# Services
## Definition of a Service
When working with JupyterHub, a **Service** is defined as a process that interacts
with the Hub's REST API. A Service may perform a specific
action or task. For example, the following tasks can each be a unique Service:
- shutting down individuals' single user notebook servers that have been idle
for some time
- registering additional web servers which should use the Hub's authentication
and be served behind the Hub's proxy.
Two key features help define a Service:
- Is the Service **managed** by JupyterHub?
- Does the Service have a web server that should be added to the proxy's
table?
Currently, these characteristics distinguish two types of Services:
- A **Hub-Managed Service** which is managed by JupyterHub
- An **Externally-Managed Service** which runs its own web server and
communicates operation instructions via the Hub's API.
## Properties of a Service
A Service may have the following properties:
- `name: str` - the name of the service
- `admin: bool (default - false)` - whether the service should have
administrative privileges
- `url: str (default - None)` - The URL where the service is/should be. If a
url is specified for where the Service runs its own web server,
the service will be added to the proxy at `/services/:name`
- `api_token: str (default - None)` - For Externally-Managed Services you need to specify
an API token to perform API requests to the Hub
- `display: bool (default - True)` - When set to true, display a link to the
service's URL under the 'Services' dropdown in user's hub home page.
- `oauth_no_confirm: bool (default - False)` - When set to true,
skip the OAuth confirmation page when users access this service.
By default, when users authenticate with a service using JupyterHub,
they are prompted to confirm that they want to grant that service
access to their credentials.
Skipping the confirmation page is useful for admin-managed services that are considered part of the Hub
and shouldn't need extra prompts for login.
If a service is also to be managed by the Hub, it has a few extra options:
- `command: (str/Popen list)` - Command for JupyterHub to spawn the service. - Only use this if the service should be a subprocess. - If command is not specified, the Service is assumed to be managed
externally. - If a command is specified for launching the Service, the Service will
be started and managed by the Hub.
- `environment: dict` - additional environment variables for the Service.
- `user: str` - the name of a system user to manage the Service. If
unspecified, run as the same user as the Hub.
## Hub-Managed Services
A **Hub-Managed Service** is started by the Hub, and the Hub is responsible
for the Service's actions. A Hub-Managed Service can only be a local
subprocess of the Hub. The Hub will take care of starting the process and
restart the service if the service stops.
While Hub-Managed Services share some similarities with notebook Spawners,
there are no plans for Hub-Managed Services to support the same spawning
abstractions as a notebook Spawner.
If you wish to run a Service in a Docker container or other deployment
environments, the Service can be registered as an
**Externally-Managed Service**, as described below.
## Launching a Hub-Managed Service
A Hub-Managed Service is characterized by its specified `command` for launching
the Service. For example, a 'cull idle' notebook server task configured as a
Hub-Managed Service would include:
- the Service name,
- admin permissions, and
- the `command` to launch the Service which will cull idle servers after a
timeout interval
This example would be configured as follows in `jupyterhub_config.py`:
```python
c.JupyterHub.load_roles = [
{
"name": "idle-culler",
"scopes": [
"read:users:activity", # read user last_activity
"servers", # start and stop servers
# 'admin:users' # needed if culling idle users as well
]
}
]
c.JupyterHub.services = [
{
'name': 'idle-culler',
'command': [sys.executable, '-m', 'jupyterhub_idle_culler', '--timeout=3600']
}
]
```
A Hub-Managed Service may also be configured with additional optional
parameters, which describe the environment needed to start the Service process:
- `environment: dict` - additional environment variables for the Service.
- `user: str` - name of the user to run the server if different from the Hub.
Requires Hub to be root.
- `cwd: path` directory in which to run the Service, if different from the
Hub directory.
The Hub will pass the following environment variables to launch the Service:
(service-env)=
```bash
JUPYTERHUB_SERVICE_NAME: The name of the service
JUPYTERHUB_API_TOKEN: API token assigned to the service
JUPYTERHUB_API_URL: URL for the JupyterHub API (default, http://127.0.0.1:8080/hub/api)
JUPYTERHUB_BASE_URL: Base URL of the Hub (https://mydomain[:port]/)
JUPYTERHUB_SERVICE_PREFIX: URL path prefix of this service (/services/:service-name/)
JUPYTERHUB_SERVICE_URL: Local URL where the service is expected to be listening.
Only for proxied web services.
JUPYTERHUB_OAUTH_SCOPES: JSON-serialized list of scopes to use for allowing access to the service
(deprecated in 3.0, use JUPYTERHUB_OAUTH_ACCESS_SCOPES).
JUPYTERHUB_OAUTH_ACCESS_SCOPES: JSON-serialized list of scopes to use for allowing access to the service (new in 3.0).
JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES: JSON-serialized list of scopes that can be requested by the oauth client on behalf of users (new in 3.0).
```
For the previous 'cull idle' Service example, these environment variables
would be passed to the Service when the Hub starts the 'cull idle' Service:
```bash
JUPYTERHUB_SERVICE_NAME: 'idle-culler'
JUPYTERHUB_API_TOKEN: API token assigned to the service
JUPYTERHUB_API_URL: http://127.0.0.1:8080/hub/api
JUPYTERHUB_BASE_URL: https://mydomain[:port]
JUPYTERHUB_SERVICE_PREFIX: /services/idle-culler/
```
See the GitHub repo for additional information about the [jupyterhub_idle_culler][].
## Externally-Managed Services
You may prefer to use your own service management tools, such as Docker or
systemd, to manage a JupyterHub Service. These **Externally-Managed
Services**, unlike Hub-Managed Services, are not subprocesses of the Hub. You
must tell JupyterHub which API token the Externally-Managed Service is using
to perform its API requests. Each Externally-Managed Service will need a
unique API token, because the Hub authenticates each API request and the API
token is used to identify the originating Service or user.
A configuration example of an Externally-Managed Service with admin access and
running its own web server is:
```python
c.JupyterHub.services = [
{
'name': 'my-web-service',
'url': 'https://10.0.1.1:1984',
# any secret >8 characters, you'll use api_token to
# authenticate api requests to the hub from your service
'api_token': 'super-secret',
}
]
```
In this case, the `url` field will be passed along to the Service as
`JUPYTERHUB_SERVICE_URL`.
## Writing your own Services
When writing your own services, you have a few decisions to make (in addition
to what your service does!):
1. Does my service need a public URL?
2. Do I want JupyterHub to start/stop the service?
3. Does my service need to authenticate users?
When a Service is managed by JupyterHub, the Hub will pass the necessary
information to the Service via the environment variables described above. A
flexible Service, whether managed by the Hub or not, can make use of these
same environment variables.
When you run a service that has a URL, it will be accessible under a
`/services/` prefix, such as `https://myhub.horse/services/my-service/`. For
your service to route proxied requests properly, it must take
`JUPYTERHUB_SERVICE_PREFIX` into account when routing requests. For example, a
web service would normally service its root handler at `'/'`, but the proxied
service would need to serve `JUPYTERHUB_SERVICE_PREFIX`.
Note that `JUPYTERHUB_SERVICE_PREFIX` will contain a trailing slash. This must
be taken into consideration when creating the service routes. If you include an
extra slash you might get unexpected behavior. For example if your service has a
`/foo` endpoint, the route would be `JUPYTERHUB_SERVICE_PREFIX + foo`, and
`/foo/bar` would be `JUPYTERHUB_SERVICE_PREFIX + foo/bar`.
## Hub Authentication and Services
JupyterHub provides some utilities for using the Hub's authentication
mechanism to govern access to your service.
Requests to all JupyterHub services are made with OAuth tokens.
These can either be requests with a token in the `Authorization` header,
or url parameter `?token=...`,
or browser requests which must complete the OAuth authorization code flow,
which results in a token that should be persisted for future requests
(persistence is up to the service,
but an encrypted cookie confined to the service path is appropriate,
and provided by default).
:::{versionchanged} 2.0
The shared `jupyterhub-services` cookie is removed.
OAuth must be used to authenticate browser requests with services.
:::
JupyterHub includes a reference implementation of Hub authentication that
can be used by services. You may go beyond this reference implementation and
create custom hub-authenticating clients and services. We describe the process
below.
The reference, or base, implementation is the {class}`.HubAuth` class,
which implements the API requests to the Hub that resolve a token to a User model.
There are two levels of authentication with the Hub:
- {class}`.HubAuth` - the most basic authentication,
for services that should only accept API requests authorized with a token.
- {class}`.HubOAuth` - For services that should use oauth to authenticate with the Hub.
This should be used for any service that serves pages that should be visited with a browser.
To use HubAuth, you must set the `.api_token` instance variable. This can be
done either programmatically when constructing the class, or via the
`JUPYTERHUB_API_TOKEN` environment variable. A number of the examples in the
root of the jupyterhub git repository set the `JUPYTERHUB_API_TOKEN` variable
so consider having a look at those for futher reading
([cull-idle](https://github.com/jupyterhub/jupyterhub/tree/master/examples/cull-idle),
[external-oauth](https://github.com/jupyterhub/jupyterhub/tree/master/examples/external-oauth),
[service-notebook](https://github.com/jupyterhub/jupyterhub/tree/master/examples/service-notebook)
and [service-whoiami](https://github.com/jupyterhub/jupyterhub/tree/master/examples/service-whoami))
(TODO: Where is this API TOKen set?)
Most of the logic for authentication implementation is found in the
{meth}`.HubAuth.user_for_token` methods,
which makes a request of the Hub, and returns:
- None, if no user could be identified, or
- a dict of the following form:
```python
{
"name": "username",
"groups": ["list", "of", "groups"],
"scopes": [
"access:servers!server=username/",
],
}
```
You are then free to use the returned user information to take appropriate
action.
HubAuth also caches the Hub's response for a number of seconds,
configurable by the `cookie_cache_max_age` setting (default: five minutes).
If your service would like to make further requests _on behalf of users_,
it should use the token issued by this OAuth process.
If you are using tornado,
you can access the token authenticating the current request with {meth}`.HubAuth.get_token`.
:::{versionchanged} 2.2
{meth}`.HubAuth.get_token` adds support for retrieving
tokens stored in tornado cookies after the completion of OAuth.
Previously, it only retrieved tokens from URL parameters or the Authorization header.
Passing `get_token(handler, in_cookie=False)` preserves this behavior.
:::
### Flask Example
For example, you have a Flask service that returns information about a user.
JupyterHub's HubAuth class can be used to authenticate requests to the Flask
service. See the `service-whoami-flask` example in the
[JupyterHub GitHub repo](https://github.com/jupyterhub/jupyterhub/tree/HEAD/examples/service-whoami-flask)
for more details.
```{literalinclude} ../../../../../../jupyterhub/examples/service-whoami-flask/whoami-flask.py
:language: python
```
### Authenticating tornado services with JupyterHub
Since most Jupyter services are written with tornado,
we include a mixin class, [`HubOAuthenticated`][huboauthenticated],
for quickly authenticating your own tornado services with JupyterHub.
Tornado's {py:func}`~.tornado.web.authenticated` decorator calls a Handler's {py:meth}`~.tornado.web.RequestHandler.get_current_user`
method to identify the user. Mixing in {class}`.HubAuthenticated` defines
{meth}`~.HubAuthenticated.get_current_user` to use HubAuth. If you want to configure the HubAuth
instance beyond the default, you'll want to define an {py:meth}`~.tornado.web.RequestHandler.initialize` method,
such as:
```python
class MyHandler(HubOAuthenticated, web.RequestHandler):
def initialize(self, hub_auth):
self.hub_auth = hub_auth
@web.authenticated
def get(self):
...
```
The HubAuth class will automatically load the desired configuration from the Service
[environment variables](service-env).
:::{versionchanged} 2.0
Access scopes are used to govern access to services.
Prior to 2.0,
sets of users and groups could be used to grant access
by defining `.hub_groups` or `.hub_users` on the authenticated handler.
These are ignored if the 2.0 `.hub_scopes` is defined.
:::
:::{seealso}
{meth}`.HubAuth.check_scopes`
:::
### Implementing your own Authentication with JupyterHub
If you don't want to use the reference implementation
(e.g. you find the implementation a poor fit for your Flask app),
you can implement authentication via the Hub yourself.
JupyterHub is a standard OAuth2 provider,
so you can use any OAuth 2 client implementation appropriate for your toolkit.
See the [FastAPI example][] for an example of using JupyterHub as an OAuth provider with [FastAPI][],
without using any code imported from JupyterHub.
On completion of OAuth, you will have an access token for JupyterHub,
which can be used to identify the user and the permissions (scopes)
the user has authorized for your service.
You will only get to this stage if the user has the required `access:services!service=$service-name` scope.
To retrieve the user model for the token, make a request to `GET /hub/api/user` with the token in the Authorization header.
For example, using flask:
```{literalinclude} ../../../../../../jupyterhub/examples/service-whoami-flask/whoami-flask.py
:language: python
```
We recommend looking at the [`HubOAuth`][huboauth] class implementation for reference,
and taking note of the following process:
1. retrieve the token from the request.
2. Make an API request `GET /hub/api/user`,
with the token in the `Authorization` header.
For example, with [requests][]:
```python
r = requests.get(
"http://127.0.0.1:8081/hub/api/user",
headers = {
'Authorization' : f'token {api_token}',
},
)
r.raise_for_status()
user = r.json()
```
3. On success, the reply will be a JSON model describing the user:
```python
{
"name": "inara",
# groups may be omitted, depending on permissions
"groups": ["serenity", "guild"],
# scopes is new in JupyterHub 2.0
"scopes": [
"access:services",
"read:users:name",
"read:users!user=inara",
"..."
]
}
```
The `scopes` field can be used to manage access.
Note: a user will have access to a service to complete oauth access to the service for the first time.
Individual permissions may be revoked at any later point without revoking the token,
in which case the `scopes` field in this model should be checked on each access.
The default required scopes for access are available from `hub_auth.oauth_scopes` or `$JUPYTERHUB_OAUTH_ACCESS_SCOPES`.
An example of using an Externally-Managed Service and authentication is
in the [nbviewer README][nbviewer example] section on securing the notebook viewer,
and an example of its configuration is found [here](https://github.com/jupyter/nbviewer/blob/ed942b10a52b6259099e2dd687930871dc8aac22/nbviewer/providers/base.py#L95).
nbviewer can also be run as a Hub-Managed Service as described [nbviewer README][nbviewer example]
section on securing the notebook viewer.
[requests]: https://docs.python-requests.org/en/master/
[services_auth]: ../api/services.auth.html
[nbviewer example]: https://github.com/jupyter/nbviewer#securing-the-notebook-viewer
[fastapi example]: https://github.com/jupyterhub/jupyterhub/tree/HEAD/examples/service-fastapi
[fastapi]: https://fastapi.tiangolo.com
[jupyterhub_idle_culler]: https://github.com/jupyterhub/jupyterhub-idle-culler

View File

@@ -0,0 +1,254 @@
(jupyterhub-url)=
# JupyterHub URL scheme
This document describes how JupyterHub routes requests.
This does not include the [REST API](using-jupyterhub-rest-api) URLs.
In general, all URLs can be prefixed with `c.JupyterHub.base_url` to
run the whole JupyterHub application on a prefix.
All authenticated handlers redirect to `/hub/login` to log-in users
before being redirected back to the originating page.
The returned request should preserve all query parameters.
## `/`
The top-level request is always a simple redirect to `/hub/`,
to be handled by the default JupyterHub handler.
In general, all requests to `/anything` that do not start with `/hub/`
but are routed to the Hub, will be redirected to `/hub/anything` before being handled by the Hub.
## `/hub/`
This is an authenticated URL.
This handler redirects users to the default URL of the application,
which defaults to the user's default server.
That is, the handler redirects to `/hub/spawn` if the user's server is not running,
or to the server itself (`/user/:name`) if the server is running.
This default URL behavior can be customized in two ways:
First, to redirect users to the JupyterHub home page (`/hub/home`)
instead of spawning their server,
set `redirect_to_server` to False:
```python
c.JupyterHub.redirect_to_server = False
```
This might be useful if you have a Hub where you expect
users to be managing multiple server configurations
but automatic spawning is not desirable.
Second, you can customise the landing page to any page you like,
such as a custom service you have deployed e.g. with course information:
```python
c.JupyterHub.default_url = '/services/my-landing-service'
```
## `/hub/home`
![The Hub home page with named servers enabled](/images/named-servers-home.png)
By default, the Hub home page has just one or two buttons
for starting and stopping the user's server.
If named servers are enabled, there will be some additional
tools for management of the named servers.
_Version added: 1.0_ named server UI is new in 1.0.
## `/hub/login`
This is the JupyterHub login page.
If you have a form-based username+password login,
such as the default [PAMAuthenticator](https://en.wikipedia.org/wiki/Pluggable_authentication_module),
this page will render the login form.
![A login form](/images/login-form.png)
If login is handled by an external service,
e.g. with OAuth, this page will have a button,
declaring "Log in with ..." which users can click
to log in with the chosen service.
![A login redirect button](/images/login-button.png)
If you want to skip the user interaction and initiate login
via the button, you can set:
```python
c.Authenticator.auto_login = True
```
This can be useful when the user is "already logged in" via some mechanism.
However, a handshake via `redirects` is necessary to complete the authentication with JupyterHub.
## `/hub/logout`
Visiting `/hub/logout` clears [cookies](https://en.wikipedia.org/wiki/HTTP_cookie) from the current browser.
Note that **logging out does not stop a user's server(s)** by default.
If you would like to shut down user servers on logout,
you can enable this behavior with:
```python
c.JupyterHub.shutdown_on_logout = True
```
Be careful with this setting because logging out one browser
does not mean the user is no longer actively using their server from another machine.
## `/user/:username[/:servername]`
If a user's server is running, this URL is handled by the user's given server,
not by the Hub.
The username is the first part, and if using named servers,
the server name is the second part.
If the user's server is _not_ running, this will be redirected to `/hub/user/:username/...`
## `/hub/user/:username[/:servername]`
This URL indicates a request for a user server that is not running
(because `/user/...` would have been handled by the notebook server
if the specified server were running).
Handling this URL depends on two conditions: whether a requested user is found
as a match and the state of the requested user's notebook server,
for example:
1. the server is not active
a. user matches
b. user doesn't match
2. the server is ready
3. the server is pending, but not ready
If the server is pending spawn,
the browser will be redirected to `/hub/spawn-pending/:username/:servername`
to see a progress page while waiting for the server to be ready.
If the server is not active at all,
a page will be served with a link to `/hub/spawn/:username/:servername`.
Following that link will launch the requested server.
The HTTP status will be 503 in this case because a request has been made for a server that is not running.
If the server is ready, it is assumed that the proxy has not yet registered the route.
Some checks are performed and a delay is added before redirecting back to `/user/:username/:servername/...`.
If something is really wrong, this can result in a redirect loop.
Visiting this page will never result in triggering the spawn of servers
without additional user action (i.e. clicking the link on the page).
![Visiting a URL for a server that's not running](/images/not-running.png)
_Version changed: 1.0_
Prior to 1.0, this URL itself was responsible for spawning servers.
If the progress page was pending, the URL redirected it to running servers.
This was useful because it made sure that the requested servers were restarted after they stopped.
However, it could also be harmful because unused servers would continuously be restarted if e.g.
an idle JupyterLab frontend that constantly makes polling requests was openly pointed at it.
### Special handling of API requests
Requests to `/user/:username[/:servername]/api/...` are assumed to be
from applications connected to stopped servers.
These requests fail with a `503` status code and an informative JSON error message
that indicates how to spawn the server.
This is meant to help applications such as JupyterLab,
that are connected to a server that has stopped.
_Version changed: 1.0_
JupyterHub version 0.9 failed these API requests with status `404`,
but version 1.0 uses 503.
## `/user-redirect/...`
The `/user-redirect/...` URL is for sharing a URL that will redirect a user
to a path on their own default server.
This is useful when different users have the same file at the same URL on their servers,
and you want a single link to give to any user that will open that file on their server.
e.g. a link to `/user-redirect/notebooks/Index.ipynb`
will send user `hortense` to `/user/hortense/notebooks/Index.ipynb`
**DO NOT** share links to your own server with other users.
This will not work in general,
unless you grant those users access to your server.
**Contributions welcome:** The JupyterLab "shareable link" should share this link
when run with JupyterHub, but it does not.
See [jupyterlab-hub](https://github.com/jupyterhub/jupyterlab-hub)
where this should probably be done and
[this issue in JupyterLab](https://github.com/jupyterlab/jupyterlab/issues/5388)
that is intended to make it possible.
## Spawning
### `/hub/spawn[/:username[/:servername]]`
Requesting `/hub/spawn` will spawn the default server for the current user.
If the `username` and optionally `servername` are specified,
then the specified server for the specified user will be spawned.
Once spawn has been requested,
the browser is redirected to `/hub/spawn-pending/...`.
If `Spawner.options_form` is used,
this will render a form,
and a POST request will trigger the actual spawn and redirect.
![The spawn form](/images/spawn-form.png)
_Version added: 1.0_
1.0 adds the ability to specify `username` and `servername`.
Prior to 1.0, only `/hub/spawn` was recognized for the default server.
_Version changed: 1.0_
Prior to 1.0, this page redirected back to `/hub/user/:username`,
which was responsible for triggering spawn and rendering progress, etc.
### `/hub/spawn-pending[/:username[/:servername]]`
![The spawn pending page](/images/spawn-pending.png)
_Version added: 1.0_ this URL is new in JupyterHub 1.0.
This page renders the progress view for the given spawn request.
Once the server is ready,
the browser is redirected to the running server at `/user/:username/:servername/...`.
If this page is requested at any time after the specified server is ready,
the browser will be redirected to the running server.
Requesting this page will never trigger any side effects.
If the server is not running (e.g. because the spawn has failed),
the spawn failure message (if applicable) will be displayed,
and the page will show a link back to `/hub/spawn/...`.
## `/hub/token`
![The token management page](/images/token-page.png)
On this page, users can manage their JupyterHub API tokens.
They can revoke access and request new tokens for writing scripts
against the [JupyterHub REST API](using-jupyterhub-rest-api).
## `/hub/admin`
![The admin panel](/images/named-servers-admin.png)
Administrators can take various administrative actions from this page:
- add/remove users
- grant admin privileges
- start/stop user servers
- shutdown JupyterHub itself

View File

@@ -0,0 +1,195 @@
(gallery-of-deployments)=
# A Gallery of JupyterHub Deployments
**A JupyterHub Community Resource**
We've compiled this list of JupyterHub deployments to help the community
see the breadth and growth of JupyterHub's use in education, research, and
high performance computing.
Please submit pull requests to update information or to add new institutions or uses.
## Academic Institutions, Research Labs, and Supercomputer Centers
### University of California Berkeley
- [BIDS - Berkeley Institute for Data Science](https://bids.berkeley.edu/)
- [Teaching with Jupyter notebooks and JupyterHub](https://bids.berkeley.edu/resources/videos/teaching-ipythonjupyter-notebooks-and-jupyterhub)
- [Data 8](http://data8.org/)
- [GitHub organization](https://github.com/data-8)
- [NERSC](https://www.nersc.gov/)
- [Press release on Jupyter and Cori](https://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2016/jupyter-notebooks-will-open-up-new-possibilities-on-nerscs-cori-supercomputer/)
- [Moving and sharing data](https://www.nersc.gov/assets/Uploads/03-MovingAndSharingData-Cholia.pdf)
- [Research IT](https://research-it.berkeley.edu)
- [JupyterHub server supports campus research computation](https://research-it.berkeley.edu/blog/17/01/24/free-fully-loaded-jupyterhub-server-supports-campus-research-computation)
### University of California Davis
- [Spinning up multiple Jupyter Notebooks on AWS for a tutorial](https://github.com/mblmicdiv/course2017/blob/HEAD/exercises/sourmash-setup.md)
Although not technically a JupyterHub deployment, this tutorial setup
may be helpful to others in the Jupyter community.
Thank you C. Titus Brown for sharing this with the Software Carpentry
mailing list.
```
* I started a big Amazon machine;
* I installed Docker and built a custom image containing my software of
interest;
* I ran multiple containers, one connected to port 8000, one on 8001,
etc. and gave each student a different port;
* students could connect in and use the Terminal program in Jupyter to
execute commands, and could upload/download files via the Jupyter
console interface;
* in theory I could have used notebooks too, but for this I didnt have
need.
I am aware that JupyterHub can probably do all of this including manage
the containers, but Im still a bit shy of diving into that; this was
fairly straightforward, gave me disposable containers that were isolated
for each individual student, and worked almost flawlessly. Should be
easy to do with RStudio too.
```
### Cal Poly San Luis Obispo
- [jupyterhub-deploy-teaching](https://github.com/jupyterhub/jupyterhub-deploy-teaching) based on work by Brian Granger for Cal Poly's Data Science 301 Course
### Chameleon
[Chameleon](https://www.chameleoncloud.org) is a NSF-funded configurable experimental environment for large-scale computer science systems research with [bare metal reconfigurability](https://chameleoncloud.readthedocs.io/en/latest/technical/baremetal.html). Chameleon users utilize JupyterHub to document and reproduce their complex CISE and networking experiments.
- [Shared JupyterHub](https://jupyter.chameleoncloud.org): provides a common "workbench" environment for any Chameleon user.
- [Trovi](https://www.chameleoncloud.org/experiment/share): a sharing portal of experiments, tutorials, and examples, which users can launch as a dedicated isolated environments on Chameleon's JupyterHub.
### Clemson University
- Advanced Computing
- [Palmetto cluster and JupyterHub](https://citi.sites.clemson.edu/2016/08/18/JupyterHub-for-Palmetto-Cluster.html)
### University of Colorado Boulder
- (CU Research Computing) CURC
- [JupyterHub User Guide](https://curc.readthedocs.io/en/latest/gateways/jupyterhub.html)
- Slurm job dispatched on Crestone compute cluster
- log troubleshooting
- Profiles in IPython Clusters tab
- [Parallel Processing with JupyterHub tutorial](https://curc.readthedocs.io/en/latest/gateways/parallel-programming-jupyter.html)
### George Washington University
- [JupyterHub](https://go.gwu.edu/jupyter) with university single-sign-on. Deployed early 2017.
### HTCondor
- [HTCondor Python Bindings Tutorial from HTCondor Week 2017 includes information on their JupyterHub tutorials](https://research.cs.wisc.edu/htcondor/HTCondorWeek2017/presentations/TueBockelman_Python.pdf)
### University of Illinois
- https://datascience.business.illinois.edu (currently down; checked 10/26/22)
### IllustrisTNG Simulation Project
- [JupyterHub/Lab-based analysis platform, part of the TNG public data release](https://www.tng-project.org/data/)
### MIT and Lincoln Labs
- https://supercloud.mit.edu/
### Michigan State University
- [Setting up JupyterHub](https://mediaspace.msu.edu/media/Setting+Up+Your+JupyterHub+Password/1_hgv13aag/11980471)
### University of Minnesota
- [JupyterHub Inside HPC](https://insidehpc.com/tag/jupyterhub/)
### University of Missouri
- https://dsa.missouri.edu/faq/
### Paderborn University
- [Data Science (DICE) group](https://dice-research.org)
- [nbgraderutils](https://github.com/dice-group/nbgraderutils): Use JupyterHub + nbgrader + iJava kernel for online Java exercises. Used in lecture Statistical Natural Language Processing.
### Penn State University
- [Press release](https://news.psu.edu/story/523093/2018/05/24/new-open-source-web-apps-available-students-and-faculty): "New open-source web apps available for students and faculty"
### University of California San Diego
- San Diego Supercomputer Center - Andrea Zonca
- [Deploy JupyterHub on a Supercomputer with SSH](https://zonca.github.io/2017/05/jupyterhub-hpc-batchspawner-ssh.html)
- [Run Jupyterhub on a Supercomputer](https://zonca.github.io/2015/04/jupyterhub-hpc.html)
- [Deploy JupyterHub on a VM for a Workshop](https://zonca.github.io/2016/04/jupyterhub-sdsc-cloud.html)
- [Customize your Python environment in Jupyterhub](https://zonca.github.io/2017/02/customize-python-environment-jupyterhub.html)
- [Jupyterhub deployment on multiple nodes with Docker Swarm](https://zonca.github.io/2016/05/jupyterhub-docker-swarm.html)
- [Sample deployment of Jupyterhub in HPC on SDSC Comet](https://zonca.github.io/2017/02/sample-deployment-jupyterhub-hpc.html)
- Educational Technology Services - Paul Jamason
- [datahub.ucsd.edu](https://datahub.ucsd.edu)
### TACC University of Texas
### Texas A&M
- Kristen Thyng - Oceanography
- [Teaching with JupyterHub and nbgrader](http://kristenthyng.com/blog/2016/09/07/jupyterhub+nbgrader/)
### Elucidata
- What's new in Jupyter Notebooks @[Elucidata](https://elucidata.io/):
- [Using Jupyter Notebooks with Jupyterhub on GCP, managed by GKE](https://medium.com/elucidata/why-you-should-be-using-a-jupyter-notebook-8385a4ccd93d)
## Service Providers
### AWS
- [Run Jupyter Notebook and JupyterHub on Amazon EMR](https://aws.amazon.com/blogs/big-data/running-jupyter-notebook-and-jupyterhub-on-amazon-emr/)
### Google Cloud Platform
- [Using Tensorflow and JupyterHub in Classrooms](https://cloud.google.com/solutions/using-tensorflow-jupyterhub-classrooms)
- [using-tensorflow-and-jupyterhub blog post](https://opensource.googleblog.com/2016/10/using-tensorflow-and-jupyterhub.html)
### Everware
[Everware](https://github.com/everware) Reproducible and reusable science powered by jupyterhub and docker. Like nbviewer, but executable. CERN, Geneva [website](http://everware.xyz/)
### Microsoft Azure
- [Azure Data Science Virtual Machine release notes](https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-linux-dsvm-intro)
### Rackspace Carina
- https://getcarina.com/blog/learning-how-to-whale/
- https://carolynvanslyck.com/talk/carina/jupyterhub/#/ (but carolynvanslyck is currently down; checked 10/26/22)
### Hadoop
- [Deploying JupyterHub on Hadoop](https://jupyterhub-on-hadoop.readthedocs.io)
## Miscellaneous
- https://medium.com/@ybarraud/setting-up-jupyterhub-with-sudospawner-and-anaconda-844628c0dbee#.rm3yt87e1
- [Mailing list UT deployment](https://groups.google.com/g/jupyter/c/nkPSEeMr8c0)
- [JupyterHub setup on Centos](https://gist.github.com/johnrc/604971f7d41ebf12370bf5729bf3e0a4)
- [Deploy JupyterHub to Docker Swarm](https://jupyterhub.surge.sh)
- https://www.laketide.com/building-your-lab-part-3/
- https://estrellita.hatenablog.com/entry/2015/07/31/083202
- https://www.walkingrandomly.com/?p=5734
- https://wrdrd.com/docs/consulting/education-technology
- https://bitbucket.org/jackhale/fenics-jupyter
- [LinuxCluster blog](https://linuxcluster.wordpress.com/category/application/jupyterhub/)
- [Spark Cluster on OpenStack with Multi-User Jupyter Notebook](https://arnesund.com/2015/09/21/spark-cluster-on-openstack-with-multi-user-jupyter-notebook/)

View File

@@ -0,0 +1,20 @@
# Monitoring
This section covers details on monitoring the state of your JupyterHub installation.
JupyterHub expose the `/metrics` endpoint that returns text describing its current
operational state formatted in a way [Prometheus](https://prometheus.io) understands.
Prometheus is a separate open source tool that can be configured to repeatedly poll
JupyterHub's `/metrics` endpoint to parse and save its current state.
By doing so, Prometheus can describe JupyterHub's evolving state over time.
This evolving state can then be accessed through Prometheus that expose its underlying
storage to those allowed to access it, and be presented with dashboards by a
tool like [Grafana](https://grafana.com).
```{toctree}
:maxdepth: 2
/reference/metrics
```

View File

@@ -0,0 +1,315 @@
(authenticators-reference)=
# Authenticators
The {class}`.Authenticator` is the mechanism for authorizing users to use the
Hub and single user notebook servers.
## The default PAM Authenticator
JupyterHub ships with the default [PAM][]-based Authenticator, for
logging in with local user accounts via a username and password.
## The OAuthenticator
Some login mechanisms, such as [OAuth][], don't map onto username and
password authentication, and instead use tokens. When using these
mechanisms, you can override the login handlers.
You can see an example implementation of an Authenticator that uses
[GitHub OAuth][] at [OAuthenticator][].
JupyterHub's [OAuthenticator][] currently supports the following
popular services:
- Auth0
- Bitbucket
- CILogon
- GitHub
- GitLab
- Globus
- Google
- MediaWiki
- Okpy
- OpenShift
A [generic implementation](https://github.com/jupyterhub/oauthenticator/blob/master/oauthenticator/generic.py), which you can use for OAuth authentication with any provider, is also available.
## The Dummy Authenticator
When testing, it may be helpful to use the
{class}`jupyterhub.auth.DummyAuthenticator`. This allows for any username and
password unless if a global password has been set. Once set, any username will
still be accepted but the correct password will need to be provided.
## Additional Authenticators
A partial list of other authenticators is available on the
[JupyterHub wiki](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).
## Technical Overview of Authentication
### How the Base Authenticator works
The base authenticator uses simple username and password authentication.
The base Authenticator has one central method:
#### Authenticator.authenticate method
Authenticator.authenticate(handler, data)
This method is passed the Tornado `RequestHandler` and the `POST data`
from JupyterHub's login form. Unless the login form has been customized,
`data` will have two keys:
- `username`
- `password`
The `authenticate` method's job is simple:
- return the username (non-empty str) of the authenticated user if
authentication is successful
- return `None` otherwise
Writing an Authenticator that looks up passwords in a dictionary
requires only overriding this one method:
```python
from IPython.utils.traitlets import Dict
from jupyterhub.auth import Authenticator
class DictionaryAuthenticator(Authenticator):
passwords = Dict(config=True,
help="""dict of username:password for authentication"""
)
async def authenticate(self, handler, data):
if self.passwords.get(data['username']) == data['password']:
return data['username']
```
#### Normalize usernames
Since the Authenticator and Spawner both use the same username,
sometimes you want to transform the name coming from the authentication service
(e.g. turning email addresses into local system usernames) before adding them to the Hub service.
Authenticators can define `normalize_username`, which takes a username.
The default normalization is to cast names to lowercase
For simple mappings, a configurable dict `Authenticator.username_map` is used to turn one name into another:
```python
c.Authenticator.username_map = {
'service-name': 'localname'
}
```
When using `PAMAuthenticator`, you can set
`c.PAMAuthenticator.pam_normalize_username = True`, which will
normalize usernames using PAM (basically round-tripping them: username
to uid to username), which is useful in case you use some external
service that allows multiple usernames mapping to the same user (such
as ActiveDirectory, yes, this really happens). When
`pam_normalize_username` is on, usernames are _not_ normalized to
lowercase.
#### Validate usernames
In most cases, there is a very limited set of acceptable usernames.
Authenticators can define `validate_username(username)`,
which should return True for a valid username and False for an invalid one.
The primary effect this has is improving error messages during user creation.
The default behavior is to use configurable `Authenticator.username_pattern`,
which is a regular expression string for validation.
To only allow usernames that start with 'w':
```python
c.Authenticator.username_pattern = r'w.*'
```
### How to write a custom authenticator
You can use custom Authenticator subclasses to enable authentication
via other mechanisms. One such example is using [GitHub OAuth][].
Because the username is passed from the Authenticator to the Spawner,
a custom Authenticator and Spawner are often used together.
For example, the Authenticator methods, {meth}`.Authenticator.pre_spawn_start`
and {meth}`.Authenticator.post_spawn_stop`, are hooks that can be used to do
auth-related startup (e.g. opening PAM sessions) and cleanup
(e.g. closing PAM sessions).
See a list of custom Authenticators [on the wiki](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).
If you are interested in writing a custom authenticator, you can read
[this tutorial](http://jupyterhub-tutorial.readthedocs.io/en/latest/authenticators.html).
### Registering custom Authenticators via entry points
As of JupyterHub 1.0, custom authenticators can register themselves via
the `jupyterhub.authenticators` entry point metadata.
To do this, in your `setup.py` add:
```python
setup(
...
entry_points={
'jupyterhub.authenticators': [
'myservice = mypackage:MyAuthenticator',
],
},
)
```
If you have added this metadata to your package,
admins can select your authenticator with the configuration:
```python
c.JupyterHub.authenticator_class = 'myservice'
```
instead of the full
```python
c.JupyterHub.authenticator_class = 'mypackage:MyAuthenticator'
```
previously required.
Additionally, configurable attributes for your authenticator will
appear in jupyterhub help output and auto-generated configuration files
via `jupyterhub --generate-config`.
### Authentication state
JupyterHub 0.8 adds the ability to persist state related to authentication,
such as auth-related tokens.
If such state should be persisted, `.authenticate()` should return a dictionary of the form:
```python
{
'name': username,
'auth_state': {
'key': 'value',
}
}
```
where `username` is the username that has been authenticated,
and `auth_state` is any JSON-serializable dictionary.
Because `auth_state` may contain sensitive information,
it is encrypted before being stored in the database.
To store auth_state, two conditions must be met:
1. persisting auth state must be enabled explicitly via configuration
```python
c.Authenticator.enable_auth_state = True
```
2. encryption must be enabled by the presence of `JUPYTERHUB_CRYPT_KEY` environment variable,
which should be a hex-encoded 32-byte key.
For example:
```bash
export JUPYTERHUB_CRYPT_KEY=$(openssl rand -hex 32)
```
JupyterHub uses [Fernet](https://cryptography.io/en/latest/fernet/) to encrypt auth_state.
To facilitate key-rotation, `JUPYTERHUB_CRYPT_KEY` may be a semicolon-separated list of encryption keys.
If there are multiple keys present, the **first** key is always used to persist any new auth_state.
#### Using auth_state
Typically, if `auth_state` is persisted it is desirable to affect the Spawner environment in some way.
This may mean defining environment variables, placing certificate in the user's home directory, etc.
The {meth}`Authenticator.pre_spawn_start` method can be used to pass information from authenticator state
to Spawner environment:
```python
class MyAuthenticator(Authenticator):
async def authenticate(self, handler, data=None):
username = await identify_user(handler, data)
upstream_token = await token_for_user(username)
return {
'name': username,
'auth_state': {
'upstream_token': upstream_token,
},
}
async def pre_spawn_start(self, user, spawner):
"""Pass upstream_token to spawner via environment variable"""
auth_state = await user.get_auth_state()
if not auth_state:
# auth_state not enabled
return
spawner.environment['UPSTREAM_TOKEN'] = auth_state['upstream_token']
```
Note that environment variable names and values are always strings, so passing multiple values means setting multiple environment variables or serializing more complex data into a single variable, e.g. as a JSON string.
auth state can also be used to configure the spawner via _config_ without subclassing
by setting `c.Spawner.auth_state_hook`. This function will be called with `(spawner, auth_state)`,
only when auth_state is defined.
For example:
(for KubeSpawner)
```python
def auth_state_hook(spawner, auth_state):
spawner.volumes = auth_state['user_volumes']
spawner.mounts = auth_state['user_mounts']
c.Spawner.auth_state_hook = auth_state_hook
```
(authenticator-groups)=
## Authenticator-managed group membership
:::{versionadded} 2.2
:::
Some identity providers may have their own concept of group membership that you would like to preserve in JupyterHub.
This is now possible with `Authenticator.managed_groups`.
You can set the config:
```python
c.Authenticator.manage_groups = True
```
to enable this behavior.
The default is False for Authenticators that ship with JupyterHub,
but may be True for custom Authenticators.
Check your Authenticator's documentation for manage_groups support.
If True, {meth}`.Authenticator.authenticate` and {meth}`.Authenticator.refresh_user` may include a field `groups`
which is a list of group names the user should be a member of:
- Membership will be added for any group in the list
- Membership in any groups not in the list will be revoked
- Any groups not already present in the database will be created
- If `None` is returned, no changes are made to the user's group membership
If authenticator-managed groups are enabled,
all group-management via the API is disabled.
## pre_spawn_start and post_spawn_stop hooks
Authenticators use two hooks, {meth}`.Authenticator.pre_spawn_start` and
{meth}`.Authenticator.post_spawn_stop(user, spawner)` to add pass additional state information
between the authenticator and a spawner. These hooks are typically used auth-related
startup, i.e. opening a PAM session, and auth-related cleanup, i.e. closing a
PAM session.
## JupyterHub as an OAuth provider
Beginning with version 0.8, JupyterHub is an OAuth provider.
[pam]: https://en.wikipedia.org/wiki/Pluggable_authentication_module
[oauth]: https://en.wikipedia.org/wiki/OAuth
[github oauth]: https://developer.github.com/v3/oauth/
[oauthenticator]: https://github.com/jupyterhub/oauthenticator

View File

@@ -0,0 +1,412 @@
(spawners-reference)=
# Spawners
A [Spawner][] starts each single-user notebook server.
The Spawner represents an abstract interface to a process,
and a custom Spawner needs to be able to take three actions:
- start a process
- poll whether a process is still running
- stop a process
## Examples
Custom Spawners for JupyterHub can be found on the [JupyterHub wiki](https://github.com/jupyterhub/jupyterhub/wiki/Spawners).
Some examples include:
- [DockerSpawner](https://github.com/jupyterhub/dockerspawner) for spawning user servers in Docker containers
- `dockerspawner.DockerSpawner` for spawning identical Docker containers for
each user
- `dockerspawner.SystemUserSpawner` for spawning Docker containers with an
environment and home directory for each user
- both `DockerSpawner` and `SystemUserSpawner` also work with Docker Swarm for
launching containers on remote machines
- [SudoSpawner](https://github.com/jupyterhub/sudospawner) enables JupyterHub to
run without being root, by spawning an intermediate process via `sudo`
- [BatchSpawner](https://github.com/jupyterhub/batchspawner) for spawning remote
servers using batch systems
- [YarnSpawner](https://github.com/jupyterhub/yarnspawner) for spawning notebook
servers in YARN containers on a Hadoop cluster
- [SSHSpawner](https://github.com/NERSC/sshspawner) to spawn notebooks
on a remote server using SSH
- [KubeSpawner](https://github.com/jupyterhub/kubespawner) to spawn notebook servers on kubernetes cluster.
## Spawner control methods
### Spawner.start
`Spawner.start` should start a single-user server for a single user.
Information about the user can be retrieved from `self.user`,
an object encapsulating the user's name, authentication, and server info.
The return value of `Spawner.start` should be the `(ip, port)` of the running server,
or a full URL as a string.
Most `Spawner.start` functions will look similar to this example:
```python
async def start(self):
self.ip = '127.0.0.1'
self.port = random_port()
# get environment variables,
# several of which are required for configuring the single-user server
env = self.get_env()
cmd = []
# get jupyterhub command to run,
# typically ['jupyterhub-singleuser']
cmd.extend(self.cmd)
cmd.extend(self.get_args())
await self._actually_start_server_somehow(cmd, env)
# url may not match self.ip:self.port, but it could!
url = self._get_connectable_url()
return url
```
When `Spawner.start` returns, the single-user server process should actually be running,
not just requested. JupyterHub can handle `Spawner.start` being very slow
(such as PBS-style batch queues, or instantiating whole AWS instances)
via relaxing the `Spawner.start_timeout` config value.
#### Note on IPs and ports
`Spawner.ip` and `Spawner.port` attributes set the _bind_ URL,
which the single-user server should listen on
(passed to the single-user process via the `JUPYTERHUB_SERVICE_URL` environment variable).
The _return_ value is the IP and port (or full URL) the Hub should _connect to_.
These are not necessarily the same, and usually won't be in any Spawner that works with remote resources or containers.
The default for `Spawner.ip`, and `Spawner.port` is `127.0.0.1:{random}`,
which is appropriate for Spawners that launch local processes,
where everything is on localhost and each server needs its own port.
For remote or container Spawners, it will often make sense to use a different value,
such as `ip = '0.0.0.0'` and a fixed port, e.g. `8888`.
The defaults can be changed in the class,
preserving configuration with traitlets:
```python
from traitlets import default
from jupyterhub.spawner import Spawner
class MySpawner(Spawner):
@default("ip")
def _default_ip(self):
return '0.0.0.0'
@default("port")
def _default_port(self):
return 8888
async def start(self):
env = self.get_env()
cmd = []
# get jupyterhub command to run,
# typically ['jupyterhub-singleuser']
cmd.extend(self.cmd)
cmd.extend(self.get_args())
remote_server_info = await self._actually_start_server_somehow(cmd, env)
url = self.get_public_url_from(remote_server_info)
return url
```
#### Exception handling
When `Spawner.start` raises an Exception, a message can be passed on to the user via the exception using a `.jupyterhub_html_message` or `.jupyterhub_message` attribute.
When the Exception has a `.jupyterhub_html_message` attribute, it will be rendered as HTML to the user.
Alternatively `.jupyterhub_message` is rendered as unformatted text.
If both attributes are not present, the Exception will be shown to the user as unformatted text.
### Spawner.poll
`Spawner.poll` checks if the spawner is still running.
It should return `None` if it is still running,
and an integer exit status, otherwise.
In the case of local processes, `Spawner.poll` uses `os.kill(PID, 0)`
to check if the local process is still running. On Windows, it uses `psutil.pid_exists`.
### Spawner.stop
`Spawner.stop` should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
## Spawner state
JupyterHub should be able to stop and restart without tearing down
single-user notebook servers. To do this task, a Spawner may need to persist
some information that can be restored later.
A JSON-able dictionary of state can be used to store persisted information.
Unlike start, stop, and poll methods, the state methods must not be coroutines.
In the case of single processes, the Spawner state is only the process ID of the server:
```python
def get_state(self):
"""get the current state"""
state = super().get_state()
if self.pid:
state['pid'] = self.pid
return state
def load_state(self, state):
"""load state from the database"""
super().load_state(state)
if 'pid' in state:
self.pid = state['pid']
def clear_state(self):
"""clear any state (called after shutdown)"""
super().clear_state()
self.pid = 0
```
## Spawner options form
(new in 0.4)
Some deployments may want to offer options to users to influence how their servers are started.
This may include cluster-based deployments, where users specify what resources should be available,
or docker-based deployments where users can select from a list of base images.
This feature is enabled by setting `Spawner.options_form`, which is an HTML form snippet
inserted unmodified into the spawn form.
If the `Spawner.options_form` is defined, when a user tries to start their server, they will be directed to a form page, like this:
![spawn-form](/images/spawn-form.png)
If `Spawner.options_form` is undefined, the user's server is spawned directly, and no spawn page is rendered.
See [this example](https://github.com/jupyterhub/jupyterhub/blob/HEAD/examples/spawn-form/jupyterhub_config.py) for a form that allows custom CLI args for the local spawner.
### `Spawner.options_from_form`
Options from this form will always be a dictionary of lists of strings, e.g.:
```python
{
'integer': ['5'],
'text': ['some text'],
'select': ['a', 'b'],
}
```
When `formdata` arrives, it is passed through `Spawner.options_from_form(formdata)`,
which is a method to turn the form data into the correct structure.
This method must return a dictionary, and is meant to interpret the lists-of-strings into the correct types. For example, the `options_from_form` for the above form would look like:
```python
def options_from_form(self, formdata):
options = {}
options['integer'] = int(formdata['integer'][0]) # single integer value
options['text'] = formdata['text'][0] # single string value
options['select'] = formdata['select'] # list already correct
options['notinform'] = 'extra info' # not in the form at all
return options
```
which would return:
```python
{
'integer': 5,
'text': 'some text',
'select': ['a', 'b'],
'notinform': 'extra info',
}
```
When `Spawner.start` is called, this dictionary is accessible as `self.user_options`.
[spawner]: https://github.com/jupyterhub/jupyterhub/blob/HEAD/jupyterhub/spawner.py
## Writing a custom spawner
If you are interested in building a custom spawner, you can read [this tutorial](https://jupyterhub-tutorial.readthedocs.io/en/latest/spawners.html).
### Registering custom Spawners via entry points
As of JupyterHub 1.0, custom Spawners can register themselves via
the `jupyterhub.spawners` entry point metadata.
To do this, in your `setup.py` add:
```python
setup(
...
entry_points={
'jupyterhub.spawners': [
'myservice = mypackage:MySpawner',
],
},
)
```
If you have added this metadata to your package,
users can select your spawner with the configuration:
```python
c.JupyterHub.spawner_class = 'myservice'
```
instead of the full
```python
c.JupyterHub.spawner_class = 'mypackage:MySpawner'
```
previously required.
Additionally, configurable attributes for your spawner will
appear in jupyterhub help output and auto-generated configuration files
via `jupyterhub --generate-config`.
## Environment variables and command-line arguments
Spawners mainly do one thing: launch a command in an environment.
The command-line is constructed from user configuration:
- Spawner.cmd (default: `['jupyterhub-singleuser']`)
- Spawner.args (CLI args to pass to the cmd, default: empty)
where the configuration:
```python
c.Spawner.cmd = ["my-singleuser-wrapper"]
c.Spawner.args = ["--debug", "--flag"]
```
would result in spawning the command:
```bash
my-singleuser-wrapper --debug --flag
```
The `Spawner.get_args()` method is how `Spawner.args` is accessed,
and can be used by Spawners to customize/extend user-provided arguments.
Prior to 2.0, JupyterHub unconditionally added certain options _if specified_ to the command-line,
such as `--ip={Spawner.ip}` and `--port={Spawner.port}`.
These have now all been moved to environment variables,
and from JupyterHub 2.0,
the command-line launched by JupyterHub is fully specified by overridable configuration `Spawner.cmd + Spawner.args`.
Most process configuration is passed via environment variables.
Additional variables can be specified via the `Spawner.environment` configuration.
The process environment is returned by `Spawner.get_env`, which specifies the following environment variables:
- `JUPYTERHUB_SERVICE_URL` - the _bind_ URL where the server should launch its HTTP server (`http://127.0.0.1:12345`).
This includes `Spawner.ip` and `Spawner.port`; _new in 2.0, prior to 2.0 IP, port were on the command-line and only if specified_
- `JUPYTERHUB_SERVICE_PREFIX` - the URL prefix the service will run on (e.g. `/user/name/`)
- `JUPYTERHUB_USER` - the JupyterHub user's username
- `JUPYTERHUB_SERVER_NAME` - the server's name, if using named servers (default server has an empty name)
- `JUPYTERHUB_API_URL` - the full URL for the JupyterHub API (http://17.0.0.1:8001/hub/api)
- `JUPYTERHUB_BASE_URL` - the base URL of the whole jupyterhub deployment, i.e. the bit before `hub/` or `user/`,
as set by `c.JupyterHub.base_url` (default: `/`)
- `JUPYTERHUB_API_TOKEN` - the API token the server can use to make requests to the Hub.
This is also the OAuth client secret.
- `JUPYTERHUB_CLIENT_ID` - the OAuth client ID for authenticating visitors.
- `JUPYTERHUB_OAUTH_CALLBACK_URL` - the callback URL to use in OAuth, typically `/user/:name/oauth_callback`
- `JUPYTERHUB_OAUTH_ACCESS_SCOPES` - the scopes required to access the server (called `JUPYTERHUB_OAUTH_SCOPES` prior to 3.0)
- `JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES` - the scopes the service is allowed to request.
If no scopes are requested explicitly, these scopes will be requested.
Optional environment variables, depending on configuration:
- `JUPYTERHUB_SSL_[KEYFILE|CERTFILE|CLIENT_CI]` - SSL configuration, when `internal_ssl` is enabled
- `JUPYTERHUB_ROOT_DIR` - the root directory of the server (notebook directory), when `Spawner.notebook_dir` is defined (new in 2.0)
- `JUPYTERHUB_DEFAULT_URL` - the default URL for the server (for redirects from `/user/:name/`),
if `Spawner.default_url` is defined
(new in 2.0, previously passed via CLI)
- `JUPYTERHUB_DEBUG=1` - generic debug flag, sets maximum log level when `Spawner.debug` is True
(new in 2.0, previously passed via CLI)
- `JUPYTERHUB_DISABLE_USER_CONFIG=1` - disable loading user config,
sets maximum log level when `Spawner.debug` is True (new in 2.0,
previously passed via CLI)
- `JUPYTERHUB_[MEM|CPU]_[LIMIT_GUARANTEE]` - the values of CPU and memory limits and guarantees.
These are not expected to be enforced by the process,
but are made available as a hint,
e.g. for resource monitoring extensions.
## Spawners, resource limits, and guarantees (Optional)
Some spawners of the single-user notebook servers allow setting limits or
guarantees on resources, such as CPU and memory. To provide a consistent
experience for sysadmins and users, we provide a standard way to set and
discover these resource limits and guarantees, such as for memory and CPU.
For the limits and guarantees to be useful, **the spawner must implement
support for them**. For example, `LocalProcessSpawner`, the default
spawner, does not support limits and guarantees. One of the spawners
that supports limits and guarantees is the
[`systemdspawner`](https://github.com/jupyterhub/systemdspawner).
### Memory Limits & Guarantees
`c.Spawner.mem_limit`: A **limit** specifies the _maximum amount of memory_
that may be allocated, though there is no promise that the maximum amount will
be available. In supported spawners, you can set `c.Spawner.mem_limit` to
limit the total amount of memory that a single-user notebook server can
allocate. Attempting to use more memory than this limit will cause errors. The
single-user notebook server can discover its own memory limit by looking at
the environment variable `MEM_LIMIT`, which is specified in absolute bytes.
`c.Spawner.mem_guarantee`: Sometimes, a **guarantee** of a _minimum amount of
memory_ is desirable. In this case, you can set `c.Spawner.mem_guarantee` to
to provide a guarantee that at minimum this much memory will always be
available for the single-user notebook server to use. The environment variable
`MEM_GUARANTEE` will also be set in the single-user notebook server.
**The spawner's underlying system or cluster is responsible for enforcing these
limits and providing these guarantees.** If these values are set to `None`, no
limits or guarantees are provided, and no environment values are set.
### CPU Limits & Guarantees
`c.Spawner.cpu_limit`: In supported spawners, you can set
`c.Spawner.cpu_limit` to limit the total number of cpu-cores that a
single-user notebook server can use. These can be fractional - `0.5` means 50%
of one CPU core, `4.0` is 4 CPU-cores, etc. This value is also set in the
single-user notebook server's environment variable `CPU_LIMIT`. The limit does
not claim that you will be able to use all the CPU up to your limit as other
higher priority applications might be taking up CPU.
`c.Spawner.cpu_guarantee`: You can set `c.Spawner.cpu_guarantee` to provide a
guarantee for CPU usage. The environment variable `CPU_GUARANTEE` will be set
in the single-user notebook server when a guarantee is being provided.
**The spawner's underlying system or cluster is responsible for enforcing these
limits and providing these guarantees.** If these values are set to `None`, no
limits or guarantees are provided, and no environment values are set.
### Encryption
Communication between the `Proxy`, `Hub`, and `Notebook` can be secured by
turning on `internal_ssl` in `jupyterhub_config.py`. For a custom spawner to
utilize these certs, there are two methods of interest on the base `Spawner`
class: `.create_certs` and `.move_certs`.
The first method, `.create_certs` will sign a key-cert pair using an internally
trusted authority for notebooks. During this process, `.create_certs` can
apply `ip` and `dns` name information to the cert via an `alt_names` `kwarg`.
This is used for certificate authentication (verification). Without proper
verification, the `Notebook` will be unable to communicate with the `Hub` and
vice versa when `internal_ssl` is enabled. For example, given a deployment
using the `DockerSpawner` which will start containers with `ips` from the
`docker` subnet pool, the `DockerSpawner` would need to instead choose a
container `ip` prior to starting and pass that to `.create_certs` (TODO: edit).
In general though, this method will not need to be changed and the default
`ip`/`dns` (localhost) info will suffice.
When `.create_certs` is run, it will create the certificates in a default,
central location specified by `c.JupyterHub.internal_certs_location`. For
`Spawners` that need access to these certs elsewhere (i.e. on another host
altogether), the `.move_certs` method can be overridden to move the certs
appropriately. Again, using `DockerSpawner` as an example, this would entail
moving certs to a directory that will get mounted into the container this
spawner starts.

View File

@@ -0,0 +1,132 @@
# Technical Overview
The **Technical Overview** section gives you a high-level view of:
- JupyterHub's major Subsystems: Hub, Proxy, Single-User Notebook Server
- how the subsystems interact
- the process from JupyterHub access to user login
- JupyterHub's default behavior
- customizing JupyterHub
The goal of this section is to share a deeper technical understanding of
JupyterHub and how it works.
## The Major Subsystems: Hub, Proxy, Single-User Notebook Server
JupyterHub is a set of processes that together, provide a single-user Jupyter
Notebook server for each person in a group. Three subsystems are started
by the `jupyterhub` command line program:
- **Hub** (Python/Tornado): manages user accounts, authentication, and
coordinates Single User Notebook Servers using a [Spawner](spawners-reference).
- **Proxy**: the public-facing part of JupyterHub that uses a dynamic proxy
to route HTTP requests to the Hub and Single User Notebook Servers.
[configurable http proxy](https://github.com/jupyterhub/configurable-http-proxy)
(node-http-proxy) is the default proxy.
- **Single-User Notebook Server** (Python/Tornado): a dedicated,
single-user, Jupyter Notebook server is started for each user on the system
when the user logs in. The object that starts the single-user notebook
servers is called a **[Spawner](spawners-reference)**.
![JupyterHub subsystems](/images/jhub-parts.png)
## How the Subsystems Interact
Users access JupyterHub through a web browser, by going to the IP address or
the domain name of the server.
The basic principles of operation are:
- The Hub spawns the proxy (in the default JupyterHub configuration)
- The proxy forwards all requests to the Hub by default
- The Hub handles login and spawns single-user notebook servers on demand
- The Hub configures the proxy to forward URL prefixes to single-user notebook
servers
The proxy is the only process that listens on a public interface. The Hub sits
behind the proxy at `/hub`. Single-user servers sit behind the proxy at
`/user/[username]`.
Different **[authenticators](authenticators-reference)** control access
to JupyterHub. The default one [(PAM)](https://en.wikipedia.org/wiki/Pluggable_authentication_module) uses the user accounts on the server where
JupyterHub is running. If you use this, you will need to create a user account
on the system for each user on your team. However, using other authenticators you can
allow users to sign in with e.g. a GitHub account, or with any single-sign-on
system your organization has.
Next, **[spawners](spawners-reference)** control how JupyterHub starts
the individual notebook server for each user. The default spawner will
start a notebook server on the same machine running under their system username.
The other main option is to start each server in a separate container, often using [Docker](https://jupyterhub-dockerspawner.readthedocs.io/en/latest/).
## The Process from JupyterHub Access to User Login
When a user accesses JupyterHub, the following events take place:
- Login data is handed to the [Authenticator](authenticators-reference) instance for
validation
- The Authenticator returns the username if the login information is valid
- A single-user notebook server instance is [spawned](spawners-reference) for the
logged-in user
- When the single-user notebook server starts, the proxy is notified to forward
requests made to `/user/[username]/*`, to the single-user notebook server.
- A [cookie](https://en.wikipedia.org/wiki/HTTP_cookie) is set on `/hub/`, containing an encrypted token. (Prior to version
0.8, a cookie for `/user/[username]` was used too.)
- The browser is redirected to `/user/[username]`, and the request is handled by
the single-user notebook server.
How does the single-user server identify the user with the Hub via OAuth?
- On request, the single-user server checks a cookie
- If no cookie is set, the single-user server redirects to the Hub for verification via OAuth
- After verification at the Hub, the browser is redirected back to the
single-user server
- The token is verified and stored in a cookie
- If no user is identified, the browser is redirected back to `/hub/login`
## Default Behavior
By default, the **Proxy** listens on all public interfaces on port 8000.
Thus you can reach JupyterHub through either:
- `http://localhost:8000`
- or any other public IP or domain pointing to your system.
In their default configuration, the other services, the **Hub** and
**Single-User Notebook Servers**, all communicate with each other on localhost
only.
By default, starting JupyterHub will write two files to disk in the current
working directory:
- `jupyterhub.sqlite` is the SQLite database containing all of the state of the
**Hub**. This file allows the **Hub** to remember which users are running and
where, as well as storing other information enabling you to restart parts of
JupyterHub separately. It is important to note that this database contains
**no** sensitive information other than **Hub** usernames.
- `jupyterhub_cookie_secret` is the encryption key used for securing cookies.
This file needs to persist so that a **Hub** server restart will avoid
invalidating cookies. Conversely, deleting this file and restarting the server
effectively invalidates all login cookies. The cookie secret file is discussed
in the [Cookie Secret section of the Security Settings document](security-basics).
The location of these files can be specified via configuration settings. It is
recommended that these files be stored in standard UNIX filesystem locations,
such as `/etc/jupyterhub` for all configuration files and `/srv/jupyterhub` for
all security and runtime files.
## Customizing JupyterHub
There are two basic extension points for JupyterHub:
- How users are authenticated by [Authenticators](authenticators-reference)
- How user's single-user notebook server processes are started by
[Spawners](spawners-reference)
Each is governed by a customizable class, and JupyterHub ships with basic
defaults for each.
To enable custom authentication and/or spawning, subclass `Authenticator` or
`Spawner`, and override the relevant methods.