mirror of
https://github.com/jupyterhub/jupyterhub.git
synced 2025-10-17 06:52:59 +00:00
Restructure doc folder structure
This commit is contained in:
142
docs/source/reference/authenticators.md
Normal file
@@ -0,0 +1,142 @@
|
||||
# Authenticators
|
||||
|
||||
The [Authenticator][] is the mechanism for authorizing users to use the
|
||||
Hub and single user notebook servers.
|
||||
|
||||
## The default PAM Authenticator
|
||||
|
||||
JupyterHub ships only with the default [PAM][]-based Authenticator,
|
||||
for logging in with local user accounts via a username and password.
|
||||
|
||||
## The OAuthenticator
|
||||
|
||||
Some login mechanisms, such as [OAuth][], don't map onto username and
|
||||
password authentication, and instead use tokens. When using these
|
||||
mechanisms, you can override the login handlers.
|
||||
|
||||
You can see an example implementation of an Authenticator that uses
|
||||
[GitHub OAuth][] at [OAuthenticator][].
|
||||
|
||||
JupyterHub's [OAuthenticator][] currently supports the following
|
||||
popular services:
|
||||
|
||||
- Auth0
|
||||
- Bitbucket
|
||||
- CILogon
|
||||
- GitHub
|
||||
- GitLab
|
||||
- Globus
|
||||
- Google
|
||||
- MediaWiki
|
||||
- Okpy
|
||||
- OpenShift
|
||||
|
||||
A generic implementation, which you can use for OAuth authentication
|
||||
with any provider, is also available.
|
||||
|
||||
## Additional Authenticators
|
||||
|
||||
- ldapauthenticator for LDAP
|
||||
- tmpauthenticator for temporary accounts
|
||||
|
||||
## Technical Overview of Authentication
|
||||
|
||||
### How the Base Authenticator works
|
||||
|
||||
The base authenticator uses simple username and password authentication.
|
||||
|
||||
The base Authenticator has one central method:
|
||||
|
||||
#### Authenticator.authenticate method
|
||||
|
||||
Authenticator.authenticate(handler, data)
|
||||
|
||||
This method is passed the Tornado `RequestHandler` and the `POST data`
|
||||
from JupyterHub's login form. Unless the login form has been customized,
|
||||
`data` will have two keys:
|
||||
|
||||
- `username`
|
||||
- `password`
|
||||
|
||||
The `authenticate` method's job is simple:
|
||||
|
||||
- return the username (non-empty str) of the authenticated user if
|
||||
authentication is successful
|
||||
- return `None` otherwise
|
||||
|
||||
Writing an Authenticator that looks up passwords in a dictionary
|
||||
requires only overriding this one method:
|
||||
|
||||
```python
from tornado import gen
from traitlets import Dict

from jupyterhub.auth import Authenticator


class DictionaryAuthenticator(Authenticator):

    passwords = Dict(config=True,
        help="""dict of username:password for authentication"""
    )

    @gen.coroutine
    def authenticate(self, handler, data):
        # Return the username on a match; falling through returns None (failure)
        if self.passwords.get(data['username']) == data['password']:
            return data['username']
```
|
||||
|
||||
#### Normalize usernames
|
||||
|
||||
Since the Authenticator and Spawner both use the same username,
|
||||
sometimes you want to transform the name coming from the authentication service
|
||||
(e.g. turning email addresses into local system usernames) before adding them to the Hub service.
|
||||
Authenticators can define `normalize_username`, which takes a username and returns the normalized name.
The default normalization is to cast names to lowercase.
|
||||
|
||||
For simple mappings, a configurable dict `Authenticator.username_map` is used to turn one name into another:
|
||||
|
||||
```python
c.Authenticator.username_map = {
    'service-name': 'localname'
}
```
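
The normalization described above can be sketched as a standalone function. This is not JupyterHub's implementation: the function below, its email-stripping rule, and the `USERNAME_MAP` constant are assumptions made for illustration, reusing the mapping from the configuration example.

```python
# Hypothetical mapping, mirroring the `username_map` example above
USERNAME_MAP = {'service-name': 'localname'}


def normalize_username(username, username_map=USERNAME_MAP):
    """Lowercase the name, drop any email domain, then apply the map."""
    name = username.lower()
    # Assumed rule for this sketch: keep only the part before '@'
    name = name.split('@', 1)[0]
    return username_map.get(name, name)


print(normalize_username('Alice@Example.org'))
print(normalize_username('Service-Name'))
```

In a real deployment the same logic would live in a `normalize_username` method on an Authenticator subclass rather than in a free function.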
|
||||
|
||||
#### Validate usernames
|
||||
|
||||
In most cases, there is a very limited set of acceptable usernames.
|
||||
Authenticators can define `validate_username(username)`,
|
||||
which should return True for a valid username and False for an invalid one.
|
||||
The primary effect this has is improving error messages during user creation.
|
||||
|
||||
The default behavior is to use configurable `Authenticator.username_pattern`,
|
||||
which is a regular expression string for validation.
|
||||
|
||||
To only allow usernames that start with 'w':
|
||||
|
||||
```python
|
||||
c.Authenticator.username_pattern = r'w.*'
|
||||
```
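
As a rough illustration of pattern-based validation, the sketch below checks a name against a regular expression. It assumes the pattern is matched from the first character of the username (as `re.match` does); the standalone function and its defaults are hypothetical and only approximate what an Authenticator does.

```python
import re


def validate_username(username, username_pattern=r'w.*'):
    """Return True if the username matches the pattern from its first character."""
    if not username_pattern:
        # No pattern configured: accept any name
        return True
    return re.match(username_pattern, username) is not None


print(validate_username('willow'))
print(validate_username('alice'))
```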
|
||||
|
||||
### How to write a custom authenticator
|
||||
|
||||
You can use custom Authenticator subclasses to enable authentication
|
||||
via other mechanisms. One such example is using [GitHub OAuth][].
|
||||
|
||||
Because the username is passed from the Authenticator to the Spawner,
|
||||
a custom Authenticator and Spawner are often used together.
|
||||
|
||||
See a list of custom Authenticators [on the wiki](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).
|
||||
|
||||
If you are interested in writing a custom authenticator, you can read
|
||||
[this tutorial](http://jupyterhub-tutorial.readthedocs.io/en/latest/authenticators.html).
|
||||
|
||||
|
||||
## JupyterHub as an OAuth provider
|
||||
|
||||
Beginning with version 0.8, JupyterHub is an OAuth provider.
|
||||
|
||||
|
||||
[Authenticator]: https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/auth.py
|
||||
[PAM]: https://en.wikipedia.org/wiki/Pluggable_authentication_module
|
||||
[OAuth]: https://en.wikipedia.org/wiki/OAuth
|
||||
[GitHub OAuth]: https://developer.github.com/v3/oauth/
|
||||
[OAuthenticator]: https://github.com/jupyterhub/oauthenticator
|
211
docs/source/reference/config-examples.md
Normal file
@@ -0,0 +1,211 @@
|
||||
# Configuration examples
|
||||
|
||||
This section provides examples, including configuration files and tips, for the
|
||||
following configurations:
|
||||
|
||||
- Using GitHub OAuth
|
||||
- Using nginx reverse proxy
|
||||
|
||||
## Using GitHub OAuth
|
||||
|
||||
In this example, we show a configuration file for a fairly standard JupyterHub
|
||||
deployment with the following assumptions:
|
||||
|
||||
* Running JupyterHub on a single cloud server
|
||||
* Using SSL on the standard HTTPS port 443
|
||||
* Using GitHub OAuth (using oauthenticator) for login
|
||||
* Users exist locally on the server
|
||||
* Users' notebooks to be served from `~/assignments` to allow users to browse
|
||||
for notebooks within other users' home directories
|
||||
* You want the landing page for each user to be a `Welcome.ipynb` notebook in
|
||||
their assignments directory.
|
||||
* All runtime files are put into `/srv/jupyterhub` and log files in `/var/log`.
|
||||
|
||||
The `jupyterhub_config.py` file would have these settings:
|
||||
|
||||
```python
# jupyterhub_config.py file
c = get_config()

import os
pjoin = os.path.join

runtime_dir = '/srv/jupyterhub'
ssl_dir = pjoin(runtime_dir, 'ssl')
if not os.path.exists(ssl_dir):
    os.makedirs(ssl_dir)

# Allow multiple single-user servers per user
c.JupyterHub.allow_named_servers = True

# https on :443
c.JupyterHub.port = 443
c.JupyterHub.ssl_key = pjoin(ssl_dir, 'ssl.key')
c.JupyterHub.ssl_cert = pjoin(ssl_dir, 'ssl.cert')

# put the JupyterHub cookie secret and state db
# in /srv/jupyterhub
c.JupyterHub.cookie_secret_file = pjoin(runtime_dir, 'cookie_secret')
c.JupyterHub.db_url = pjoin(runtime_dir, 'jupyterhub.sqlite')
# or `--db=/path/to/jupyterhub.sqlite` on the command-line

# put the log file in /var/log
c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log'

# use GitHub OAuthenticator for local users
c.JupyterHub.authenticator_class = 'oauthenticator.LocalGitHubOAuthenticator'
c.GitHubOAuthenticator.oauth_callback_url = os.environ['OAUTH_CALLBACK_URL']

# create system users that don't exist yet
c.LocalAuthenticator.create_system_users = True

# specify users and admin
c.Authenticator.whitelist = {'rgbkrk', 'minrk', 'jhamrick'}
c.Authenticator.admin_users = {'jhamrick', 'rgbkrk'}

# start single-user notebook servers in ~/assignments,
# with ~/assignments/Welcome.ipynb as the default landing page
# this config could also be put in
# /etc/jupyter/jupyter_notebook_config.py
c.Spawner.notebook_dir = '~/assignments'
c.Spawner.args = ['--NotebookApp.default_url=/notebooks/Welcome.ipynb']
```
|
||||
|
||||
Using the GitHub Authenticator requires a few additional
environment variables to be set prior to launching JupyterHub:
|
||||
|
||||
```bash
|
||||
export GITHUB_CLIENT_ID=github_id
|
||||
export GITHUB_CLIENT_SECRET=github_secret
|
||||
export OAUTH_CALLBACK_URL=https://example.com/hub/oauth_callback
|
||||
export CONFIGPROXY_AUTH_TOKEN=super-secret
|
||||
jupyterhub -f /etc/jupyterhub/jupyterhub_config.py
|
||||
```
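
Because the configuration file reads `OAUTH_CALLBACK_URL` with `os.environ[...]`, a missing variable only surfaces as a `KeyError` at startup. A small pre-flight check, sketched below, can report every missing name at once. The helper `missing_variables` and the `REQUIRED_VARS` tuple are hypothetical and simply list the variables exported above.

```python
import os

# Variables exported in the shell snippet above
REQUIRED_VARS = (
    'GITHUB_CLIENT_ID',
    'GITHUB_CLIENT_SECRET',
    'OAUTH_CALLBACK_URL',
    'CONFIGPROXY_AUTH_TOKEN',
)


def missing_variables(environ=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_variables()
if missing:
    print('Missing environment variables: ' + ', '.join(missing))
```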
|
||||
|
||||
## Using nginx reverse proxy
|
||||
|
||||
In the following example, we show configuration files for a JupyterHub server
|
||||
running locally on port `8000` but accessible from the outside on the standard
|
||||
SSL port `443`. This could be useful if the JupyterHub server machine is also
|
||||
hosting other domains or content on `443`. The goal in this example is to
|
||||
satisfy the following:
|
||||
|
||||
* JupyterHub is running on a server, accessed *only* via `HUB.DOMAIN.TLD:443`
|
||||
* On the same machine, `NO_HUB.DOMAIN.TLD` strictly serves different content,
|
||||
also on port `443`
|
||||
* `nginx` is used to manage the web servers / reverse proxy (which means that
|
||||
only nginx will be able to bind two servers to `443`)
|
||||
* After testing, the server in question should be able to score an A+ on the
|
||||
Qualys SSL Labs [SSL Server Test](https://www.ssllabs.com/ssltest/)
|
||||
|
||||
Let's start out with needed JupyterHub configuration in `jupyterhub_config.py`:
|
||||
|
||||
```python
|
||||
# Force the proxy to only listen to connections to 127.0.0.1
|
||||
c.JupyterHub.ip = '127.0.0.1'
|
||||
```
|
||||
|
||||
The **`nginx` server config file** is fairly standard fare except for the two
|
||||
`location` blocks within the `HUB.DOMAIN.TLD` config file:
|
||||
|
||||
```bash
|
||||
# HTTP server to redirect all 80 traffic to SSL/HTTPS
|
||||
server {
|
||||
listen 80;
|
||||
server_name HUB.DOMAIN.TLD;
|
||||
|
||||
# Tell all requests to port 80 to be 302 redirected to HTTPS
|
||||
return 302 https://$host$request_uri;
|
||||
}
|
||||
|
||||
# HTTPS server to handle JupyterHub
|
||||
server {
|
||||
listen 443;
|
||||
ssl on;
|
||||
|
||||
server_name HUB.DOMAIN.TLD;
|
||||
|
||||
ssl_certificate /etc/letsencrypt/live/HUB.DOMAIN.TLD/fullchain.pem;
|
||||
ssl_certificate_key /etc/letsencrypt/live/HUB.DOMAIN.TLD/privkey.pem;
|
||||
|
||||
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
|
||||
ssl_prefer_server_ciphers on;
|
||||
ssl_dhparam /etc/ssl/certs/dhparam.pem;
|
||||
ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';
|
||||
ssl_session_timeout 1d;
|
||||
ssl_session_cache shared:SSL:50m;
|
||||
ssl_stapling on;
|
||||
ssl_stapling_verify on;
|
||||
add_header Strict-Transport-Security max-age=15768000;
|
||||
|
||||
# Managing literal requests to the JupyterHub front end
|
||||
location / {
|
||||
proxy_pass http://127.0.0.1:8000;
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
||||
}
|
||||
|
||||
# Managing WebSocket requests between hub user servers and external proxy
|
||||
location ~* /(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket)/? {
|
||||
proxy_pass http://127.0.0.1:8000;
|
||||
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
||||
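# WebSocket support ($connection_upgrade must be defined by a
# `map $http_upgrade $connection_upgrade` block in the http context)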
|
||||
proxy_http_version 1.1;
|
||||
proxy_set_header Upgrade $http_upgrade;
|
||||
proxy_set_header Connection $connection_upgrade;
|
||||
|
||||
}
|
||||
|
||||
# Managing requests to verify letsencrypt host
|
||||
location ~ /.well-known {
|
||||
allow all;
|
||||
}
|
||||
|
||||
|
||||
}
|
||||
```
|
||||
|
||||
`nginx` will now be the front facing element of JupyterHub on `443` which means
|
||||
it is also free to bind other servers, like `NO_HUB.DOMAIN.TLD` to the same port
|
||||
on the same machine and network interface. In fact, one can use the same
server blocks as above for `NO_HUB` and simply add a line for the root directory
of the site as well as the applicable location call:
|
||||
|
||||
```bash
|
||||
server {
|
||||
listen 80;
|
||||
server_name NO_HUB.DOMAIN.TLD;
|
||||
|
||||
# Tell all requests to port 80 to be 302 redirected to HTTPS
|
||||
return 302 https://$host$request_uri;
|
||||
}
|
||||
|
||||
server {
|
||||
listen 443;
|
||||
ssl on;
|
||||
|
||||
# INSERT OTHER SSL PARAMETERS HERE AS ABOVE
|
||||
|
||||
# Set the appropriate root directory
|
||||
root /var/www/html;
|
||||
|
||||
# Set URI handling
|
||||
location / {
|
||||
try_files $uri $uri/ =404;
|
||||
}
|
||||
|
||||
# Managing requests to verify letsencrypt host
|
||||
location ~ /.well-known {
|
||||
allow all;
|
||||
}
|
||||
|
||||
}
|
||||
```
|
||||
|
||||
Now just restart `nginx`, restart JupyterHub, and enjoy accessing
|
||||
https://HUB.DOMAIN.TLD while serving other content securely on
|
||||
https://NO_HUB.DOMAIN.TLD.
|
14
docs/source/reference/index.rst
Normal file
@@ -0,0 +1,14 @@
|
||||
Technical Reference
|
||||
===================
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
|
||||
technical-overview
|
||||
websecurity
|
||||
authenticators
|
||||
spawners
|
||||
services
|
||||
rest
|
||||
upgrading
|
||||
config-examples
|
132
docs/source/reference/rest.md
Normal file
@@ -0,0 +1,132 @@
|
||||
# Using JupyterHub's REST API
|
||||
|
||||
This section will give you information on:

- what you can do with the API
- creating an API token
- adding API tokens to the config files
- making an API request programmatically using the requests library
- learning more about JupyterHub's API
|
||||
|
||||
## What you can do with the API
|
||||
|
||||
Using the [JupyterHub REST API][], you can perform actions on the Hub,
|
||||
such as:
|
||||
|
||||
- checking which users are active
|
||||
- adding or removing users
|
||||
- stopping or starting single user notebook servers
|
||||
- authenticating services
|
||||
|
||||
A [REST](https://en.wikipedia.org/wiki/Representational_state_transfer)
|
||||
API provides a standard way for users to get and send information to the
|
||||
Hub.
|
||||
|
||||
## Create an API token
|
||||
|
||||
To send requests using JupyterHub API, you must pass an API token with
|
||||
the request.
|
||||
|
||||
As of [version 0.6.0](./changelog.html), the preferred way of
|
||||
generating an API token is:
|
||||
|
||||
```bash
|
||||
openssl rand -hex 32
|
||||
```
|
||||
|
||||
This `openssl` command generates a potential token that can then be
added to JupyterHub using the `api_tokens` configuration setting in
`jupyterhub_config.py`.
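
If `openssl` is not at hand, a token of the same shape can be produced with Python's standard library. This is a sketch of an equivalent approach, not something JupyterHub requires; `generate_api_token` is a hypothetical helper name.

```python
import secrets


def generate_api_token(nbytes=32):
    """Return a random token as 2 * nbytes hexadecimal characters."""
    return secrets.token_hex(nbytes)


token = generate_api_token()
print(len(token))  # 64
```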
|
||||
|
||||
|
||||
Alternatively, use the `jupyterhub token` command to generate a token
|
||||
for a specific hub user by passing the 'username':
|
||||
|
||||
```bash
|
||||
jupyterhub token <username>
|
||||
```
|
||||
|
||||
This command generates a random string to use as a token and registers
|
||||
it for the given user with the Hub's database.
|
||||
|
||||
In [version 0.8.0](./changelog.html), a TOKEN request page for
|
||||
generating an API token is available from the JupyterHub user interface:
|
||||
|
||||

|
||||
|
||||
|
||||
## Add API tokens to the config file
|
||||
|
||||
You may also add a dictionary of API tokens and usernames to the hub's
|
||||
configuration file, `jupyterhub_config.py` (note that
|
||||
the **key** is the 'secret-token' while the **value** is the 'username'):
|
||||
|
||||
```python
|
||||
c.JupyterHub.api_tokens = {
|
||||
'secret-token': 'username',
|
||||
}
|
||||
```
|
||||
|
||||
## Make an API request
|
||||
|
||||
To authenticate your requests, pass the API token in the request's
|
||||
Authorization header.
|
||||
|
||||
### Use requests
|
||||
|
||||
Using the popular Python [requests](http://docs.python-requests.org/en/master/)
|
||||
library, here's example code to make an API request for the users of a JupyterHub
|
||||
deployment. An API GET request is made, and the request sends an API token for
|
||||
authorization. The response contains information about the users:
|
||||
|
||||
```python
import requests

api_url = 'http://127.0.0.1:8081/hub/api'
# API token created as described above (placeholder value)
token = 'secret-token'

r = requests.get(api_url + '/users',
    headers={
        'Authorization': 'token %s' % token,
    }
)

r.raise_for_status()
users = r.json()
```
|
||||
|
||||
This example provides a slightly more complicated request, yet the
|
||||
process is very similar:
|
||||
|
||||
```python
import requests

api_url = 'http://127.0.0.1:8081/hub/api'
# API token created as described above (placeholder value)
token = 'secret-token'

data = {'name': 'mygroup', 'users': ['user1', 'user2']}

r = requests.post(api_url + '/groups/formgrade-data301/users',
    headers={
        'Authorization': 'token %s' % token,
    },
    json=data
)
r.raise_for_status()
r.json()
```
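
Where the third-party requests library is not available, the same `Authorization` header can be attached using only the standard library. The sketch below builds the request without sending it; `build_api_request` is a hypothetical helper, and the URL and token are the placeholder values used in the examples above.

```python
from urllib.request import Request


def build_api_request(api_url, path, token, method='GET'):
    """Build (but do not send) a request that carries the API token header."""
    return Request(
        api_url + path,
        headers={'Authorization': 'token %s' % token},
        method=method,
    )


req = build_api_request('http://127.0.0.1:8081/hub/api', '/users', 'secret-token')
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would then send it to a running Hub.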
|
||||
|
||||
Note that the API token authorizes **JupyterHub** REST API requests. The same
|
||||
token does **not** authorize access to the [Jupyter Notebook REST API][]
|
||||
provided by notebook servers managed by JupyterHub. A different token is used
|
||||
to access the **Jupyter Notebook** API.
|
||||
|
||||
## Learn more about the API
|
||||
|
||||
You can see the full [JupyterHub REST API][] for details. This REST API Spec can
|
||||
be viewed in a more [interactive style on swagger's petstore][].
|
||||
Both resources contain the same information and differ only in their display.
Note: The Swagger specification has been renamed the OpenAPI Specification, which is maintained by the [OpenAPI Initiative][].
|
||||
|
||||
[interactive style on swagger's petstore]: http://petstore.swagger.io/?url=https://raw.githubusercontent.com/jupyterhub/jupyterhub/master/docs/rest-api.yml#!/default
|
||||
[OpenAPI Initiative]: https://www.openapis.org/
|
||||
[JupyterHub REST API]: ./_static/rest-api/index.html
|
||||
[Jupyter Notebook REST API]: http://petstore.swagger.io/?url=https://raw.githubusercontent.com/jupyter/notebook/master/notebook/services/api/api.yaml
|
361
docs/source/reference/services.md
Normal file
@@ -0,0 +1,361 @@
|
||||
# Services
|
||||
|
||||
With version 0.7, JupyterHub adds support for **Services**.
|
||||
|
||||
This section provides the following information about Services:
|
||||
|
||||
- [Definition of a Service](#definition-of-a-service)
|
||||
- [Properties of a Service](#properties-of-a-service)
|
||||
- [Hub-Managed Services](#hub-managed-services)
|
||||
- [Launching a Hub-Managed Service](#launching-a-hub-managed-service)
|
||||
- [Externally-Managed Services](#externally-managed-services)
|
||||
- [Writing your own Services](#writing-your-own-services)
|
||||
- [Hub Authentication and Services](#hub-authentication-and-services)
|
||||
|
||||
## Definition of a Service
|
||||
|
||||
When working with JupyterHub, a **Service** is defined as a process that interacts
|
||||
with the Hub's REST API. A Service may perform a specific
action or task. For example, the following tasks can each be a unique Service:
|
||||
|
||||
- shutting down individuals' single user notebook servers that have been idle
|
||||
for some time
|
||||
- registering additional web servers which should use the Hub's authentication
|
||||
and be served behind the Hub's proxy.
|
||||
|
||||
Two key features help define a Service:
|
||||
|
||||
- Is the Service **managed** by JupyterHub?
|
||||
- Does the Service have a web server that should be added to the proxy's
|
||||
table?
|
||||
|
||||
Currently, these characteristics distinguish two types of Services:
|
||||
|
||||
- A **Hub-Managed Service** which is managed by JupyterHub
|
||||
- An **Externally-Managed Service** which runs its own web server and
|
||||
communicates operation instructions via the Hub's API.
|
||||
|
||||
## Properties of a Service
|
||||
|
||||
A Service may have the following properties:
|
||||
|
||||
- `name: str` - the name of the service
|
||||
- `admin: bool (default - false)` - whether the service should have
|
||||
administrative privileges
|
||||
- `url: str (default - None)` - The URL where the service is/should be. If a
|
||||
url is specified for where the Service runs its own web server,
|
||||
the service will be added to the proxy at `/services/:name`
|
||||
- `api_token: str (default - None)` - For Externally-Managed Services you need to specify
|
||||
an API token to perform API requests to the Hub
|
||||
|
||||
If a service is also to be managed by the Hub, it has a few extra options:
|
||||
|
||||
- `command: (str/Popen list)` - Command for JupyterHub to spawn the service.
|
||||
- Only use this if the service should be a subprocess.
|
||||
- If command is not specified, the Service is assumed to be managed
|
||||
externally.
|
||||
- If a command is specified for launching the Service, the Service will
|
||||
be started and managed by the Hub.
|
||||
- `environment: dict` - additional environment variables for the Service.
|
||||
- `user: str` - the name of a system user to manage the Service. If
|
||||
unspecified, run as the same user as the Hub.
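
The rule that a `command` distinguishes the two kinds of Service can be written down directly. The function below is only a sketch of that rule applied to a service definition dict; `service_kind` is a hypothetical name, not part of JupyterHub.

```python
def service_kind(service):
    """Classify a service definition by whether it specifies a `command`."""
    if service.get('command'):
        # The Hub starts and manages the process
        return 'hub-managed'
    # No command: something other than the Hub runs the process
    return 'externally-managed'


print(service_kind({'name': 'cull-idle', 'command': ['python', 'cull.py']}))
print(service_kind({'name': 'my-web-service', 'url': 'https://10.0.1.1:1984'}))
```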
|
||||
|
||||
## Hub-Managed Services
|
||||
|
||||
A **Hub-Managed Service** is started by the Hub, and the Hub is responsible
|
||||
for the Service's actions. A Hub-Managed Service can only be a local
|
||||
subprocess of the Hub. The Hub will take care of starting the process and
restarting it if it stops.
|
||||
|
||||
While Hub-Managed Services share some similarities with notebook Spawners,
|
||||
there are no plans for Hub-Managed Services to support the same spawning
|
||||
abstractions as a notebook Spawner.
|
||||
|
||||
If you wish to run a Service in a Docker container or other deployment
|
||||
environments, the Service can be registered as an
|
||||
**Externally-Managed Service**, as described below.
|
||||
|
||||
## Launching a Hub-Managed Service
|
||||
|
||||
A Hub-Managed Service is characterized by its specified `command` for launching
|
||||
the Service. For example, a 'cull idle' notebook server task configured as a
|
||||
Hub-Managed Service would include:
|
||||
|
||||
- the Service name,
|
||||
- admin permissions, and
|
||||
- the `command` to launch the Service which will cull idle servers after a
|
||||
timeout interval
|
||||
|
||||
This example would be configured as follows in `jupyterhub_config.py`:
|
||||
|
||||
```python
c.JupyterHub.services = [
    {
        'name': 'cull-idle',
        'admin': True,
        # --timeout needs a value: idle seconds before culling (3600 is an example)
        'command': ['python', '/path/to/cull-idle.py', '--timeout=3600']
    }
]
```
|
||||
|
||||
A Hub-Managed Service may also be configured with additional optional
|
||||
parameters, which describe the environment needed to start the Service process:
|
||||
|
||||
- `environment: dict` - additional environment variables for the Service.
|
||||
- `user: str` - name of the user to run the Service as, if different from the Hub.
  Requires Hub to be root.
- `cwd: path` - directory in which to run the Service, if different from the
  Hub directory.
|
||||
|
||||
The Hub will pass the following environment variables to launch the Service:
|
||||
|
||||
```bash
|
||||
JUPYTERHUB_SERVICE_NAME: The name of the service
|
||||
JUPYTERHUB_API_TOKEN: API token assigned to the service
|
||||
JUPYTERHUB_API_URL: URL for the JupyterHub API (default, http://127.0.0.1:8081/hub/api)
|
||||
JUPYTERHUB_BASE_URL: Base URL of the Hub (https://mydomain[:port]/)
|
||||
JUPYTERHUB_SERVICE_PREFIX: URL path prefix of this service (/services/:service-name/)
|
||||
JUPYTERHUB_SERVICE_URL: Local URL where the service is expected to be listening.
|
||||
Only for proxied web services.
|
||||
```
|
||||
|
||||
For the previous 'cull idle' Service example, these environment variables
|
||||
would be passed to the Service when the Hub starts the 'cull idle' Service:
|
||||
|
||||
```bash
|
||||
JUPYTERHUB_SERVICE_NAME: 'cull-idle'
|
||||
JUPYTERHUB_API_TOKEN: API token assigned to the service
|
||||
JUPYTERHUB_API_URL: http://127.0.0.1:8081/hub/api
|
||||
JUPYTERHUB_BASE_URL: https://mydomain[:port]
|
||||
JUPYTERHUB_SERVICE_PREFIX: /services/cull-idle/
|
||||
```
|
||||
|
||||
See the JupyterHub GitHub repo for additional information about the
|
||||
[`cull-idle` example](https://github.com/jupyterhub/jupyterhub/tree/master/examples/cull-idle).
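
A Service process typically reads these variables once at startup. The sketch below gathers them into a dict; `read_service_env` and the key names of the returned dict are hypothetical, and only the environment variable names come from the list above.

```python
import os


def read_service_env(environ=os.environ):
    """Collect the Hub-provided settings a Service needs at startup."""
    return {
        'name': environ.get('JUPYTERHUB_SERVICE_NAME'),
        # Required: fail loudly if the Hub did not provide a token
        'api_token': environ['JUPYTERHUB_API_TOKEN'],
        'api_url': environ.get('JUPYTERHUB_API_URL'),
        'prefix': environ.get('JUPYTERHUB_SERVICE_PREFIX', '/'),
        # Only set for proxied web services
        'url': environ.get('JUPYTERHUB_SERVICE_URL'),
    }


settings = read_service_env({
    'JUPYTERHUB_SERVICE_NAME': 'cull-idle',
    'JUPYTERHUB_API_TOKEN': 'example-token',
    'JUPYTERHUB_SERVICE_PREFIX': '/services/cull-idle/',
})
print(settings['prefix'])
```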
|
||||
|
||||
## Externally-Managed Services
|
||||
|
||||
You may prefer to use your own service management tools, such as Docker or
|
||||
systemd, to manage a JupyterHub Service. These **Externally-Managed
|
||||
Services**, unlike Hub-Managed Services, are not subprocesses of the Hub. You
|
||||
must tell JupyterHub which API token the Externally-Managed Service is using
|
||||
to perform its API requests. Each Externally-Managed Service will need a
|
||||
unique API token, because the Hub authenticates each API request and the API
|
||||
token is used to identify the originating Service or user.
|
||||
|
||||
A configuration example of an Externally-Managed Service with admin access and
|
||||
running its own web server is:
|
||||
|
||||
```python
|
||||
c.JupyterHub.services = [
|
||||
{
|
||||
'name': 'my-web-service',
|
||||
'url': 'https://10.0.1.1:1984',
|
||||
'api_token': 'super-secret',
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
In this case, the `url` field will be passed along to the Service as
|
||||
`JUPYTERHUB_SERVICE_URL`.
|
||||
|
||||
## Writing your own Services
|
||||
|
||||
When writing your own services, you have a few decisions to make (in addition
|
||||
to what your service does!):
|
||||
|
||||
1. Does my service need a public URL?
|
||||
2. Do I want JupyterHub to start/stop the service?
|
||||
3. Does my service need to authenticate users?
|
||||
|
||||
When a Service is managed by JupyterHub, the Hub will pass the necessary
|
||||
information to the Service via the environment variables described above. A
|
||||
flexible Service, whether managed by the Hub or not, can make use of these
|
||||
same environment variables.
|
||||
|
||||
When you run a service that has a url, it will be accessible under a
|
||||
`/services/` prefix, such as `https://myhub.horse/services/my-service/`. For
|
||||
your service to route proxied requests properly, it must take
|
||||
`JUPYTERHUB_SERVICE_PREFIX` into account when routing requests. For example, a
|
||||
web service would normally serve its root handler at `'/'`, but the proxied
service would need to serve `JUPYTERHUB_SERVICE_PREFIX`, which already ends in `/`.
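
One way to avoid doubled or missing slashes when combining the prefix with a route is a small join helper. This is a sketch under the assumption that the prefix may or may not carry a trailing slash; `prefixed_route` is a hypothetical name.

```python
def prefixed_route(prefix, route='/'):
    """Join the service prefix and a route with exactly one slash between them."""
    return prefix.rstrip('/') + '/' + route.lstrip('/')


print(prefixed_route('/services/my-service/', '/'))
print(prefixed_route('/services/my-service/', '/status'))
```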
|
||||
|
||||
## Hub Authentication and Services
|
||||
|
||||
JupyterHub 0.7 introduces some utilities for using the Hub's authentication
|
||||
mechanism to govern access to your service. When a user logs into JupyterHub,
|
||||
the Hub sets a **cookie (`jupyterhub-services`)**. The service can use this
|
||||
cookie to authenticate requests.
|
||||
|
||||
JupyterHub ships with a reference implementation of Hub authentication that
|
||||
can be used by services. You may go beyond this reference implementation and
|
||||
create custom hub-authenticating clients and services. We describe the process
|
||||
below.
|
||||
|
||||
The reference, or base, implementation is the [`HubAuth`][HubAuth] class,
|
||||
which implements the requests to the Hub.
|
||||
|
||||
To use HubAuth, you must set the `.api_token`, either programmatically when constructing the class,
|
||||
or via the `JUPYTERHUB_API_TOKEN` environment variable.
|
||||
|
||||
Most of the logic for authentication implementation is found in the
|
||||
[`HubAuth.user_for_cookie`](services.auth.html#jupyterhub.services.auth.HubAuth.user_for_cookie)
|
||||
method, which makes a request of the Hub, and returns:
|
||||
|
||||
- None, if no user could be identified, or
|
||||
- a dict of the following form:
|
||||
|
||||
```python
|
||||
{
|
||||
"name": "username",
|
||||
"groups": ["list", "of", "groups"],
|
||||
"admin": False, # or True
|
||||
}
|
||||
```
|
||||
|
||||
You are then free to use the returned user information to take appropriate
|
||||
action.
|
||||
|
||||
HubAuth also caches the Hub's response for a number of seconds,
|
||||
configurable by the `cookie_cache_max_age` setting (default: five minutes).
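
The idea behind a time-limited cache can be illustrated in a few lines. The class below is a generic sketch and not HubAuth's implementation; `ExpiringCache` and its injectable `clock` parameter are assumptions chosen to keep the example testable.

```python
import time


class ExpiringCache:
    """Store values that are discarded once they are older than max_age seconds."""

    def __init__(self, max_age=300, clock=time.monotonic):
        self.max_age = max_age
        self._clock = clock
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self.max_age:
            # Entry is stale: drop it so the caller asks the Hub again
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock(), value)


cache = ExpiringCache(max_age=60)
cache.set('cookie-value', {'name': 'inara'})
print(cache.get('cookie-value'))
```

A shorter `max_age` means revoked logins are noticed sooner, at the cost of more requests to the Hub.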
|
||||
|
||||
### Flask Example
|
||||
|
||||
For example, suppose you have a Flask service that returns information about a user.
|
||||
JupyterHub's HubAuth class can be used to authenticate requests to the Flask
|
||||
service. See the `service-whoami-flask` example in the
|
||||
[JupyterHub GitHub repo](https://github.com/jupyterhub/jupyterhub/tree/master/examples/service-whoami-flask)
|
||||
for more details.
|
||||
|
||||
```python
from functools import wraps
import json
import os
from urllib.parse import quote

from flask import Flask, redirect, request, Response

from jupyterhub.services.auth import HubAuth

# JUPYTERHUB_SERVICE_PREFIX already ends with a slash, e.g. /services/whoami/
prefix = os.environ.get('JUPYTERHUB_SERVICE_PREFIX', '/')

auth = HubAuth(
    api_token=os.environ['JUPYTERHUB_API_TOKEN'],
    cookie_cache_max_age=60,
)

app = Flask(__name__)


def authenticated(f):
    """Decorator for authenticating with the Hub"""
    @wraps(f)
    def decorated(*args, **kwargs):
        cookie = request.cookies.get(auth.cookie_name)
        if cookie:
            user = auth.user_for_cookie(cookie)
        else:
            user = None
        if user:
            return f(user, *args, **kwargs)
        else:
            # redirect to login url on failed auth
            return redirect(auth.login_url + '?next=%s' % quote(request.path))
    return decorated


@app.route(prefix)
@authenticated
def whoami(user):
    return Response(
        json.dumps(user, indent=1, sort_keys=True),
        mimetype='application/json',
    )
```
|
||||
|
||||
|
||||
### Authenticating tornado services with JupyterHub
|
||||
|
||||
Since most Jupyter services are written with tornado,
|
||||
we include a mixin class, [`HubAuthenticated`][HubAuthenticated],
|
||||
for quickly authenticating your own tornado services with JupyterHub.
|
||||
|
||||
Tornado's `@web.authenticated` decorator calls a Handler's `.get_current_user`
|
||||
method to identify the user. Mixing in `HubAuthenticated` defines
|
||||
`get_current_user` to use HubAuth. If you want to configure the HubAuth
|
||||
instance beyond the default, you'll want to define an `initialize` method,
|
||||
such as:
|
||||
|
||||
```python
from tornado import web
from jupyterhub.services.auth import HubAuthenticated


class MyHandler(HubAuthenticated, web.RequestHandler):
    hub_users = {'inara', 'mal'}

    def initialize(self, hub_auth):
        self.hub_auth = hub_auth

    @web.authenticated
    def get(self):
        ...
```
|
||||
|
||||
|
||||
The HubAuth will automatically load the desired configuration from the Service
|
||||
environment variables.
|
||||
|
||||
If you want to limit user access, you can whitelist users through either the
|
||||
`.hub_users` attribute or `.hub_groups`. These are sets that check against the
|
||||
username and user group list, respectively. If a user matches neither the user
|
||||
list nor the group list, they will not be allowed access. If both are left
|
||||
undefined, then any user will be allowed.
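
The access rule described above can be sketched as a small standalone function. This is only an illustration of the rule as stated in this section, not the `HubAuthenticated` implementation; the name `is_allowed` and its arguments are hypothetical.

```python
def is_allowed(user_model, hub_users=None, hub_groups=None):
    """Return True if the user passes the allow-list check.

    user_model is a dict shaped like the Hub's user model:
    {"name": ..., "groups": [...]}.
    hub_users / hub_groups are sets, or None when left undefined.
    (Hypothetical helper for illustration only.)
    """
    if hub_users is None and hub_groups is None:
        # nothing configured: any authenticated user is allowed
        return True
    if hub_users and user_model["name"] in hub_users:
        # the username is on the user list
        return True
    if hub_groups and set(user_model.get("groups", [])) & set(hub_groups):
        # at least one of the user's groups is on the group list
        return True
    # the user matched neither list
    return False
```

Note that a user only has to match one of the two lists: membership in an allowed group is enough even when the username itself is not listed.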
|
||||
|
||||
|
||||
### Implementing your own Authentication with JupyterHub
|
||||
|
||||
If you don't want to use the reference implementation
|
||||
(e.g. you find the implementation a poor fit for your Flask app),
|
||||
you can implement authentication via the Hub yourself.
|
||||
We recommend looking at the [`HubAuth`][HubAuth] class implementation for reference,
|
||||
and taking note of the following process:
|
||||
|
||||
1. Retrieve the cookie `jupyterhub-services` from the request.
|
||||
2. Make an API request `GET /hub/api/authorizations/cookie/jupyterhub-services/cookie-value`,
|
||||
where cookie-value is the url-encoded value of the `jupyterhub-services` cookie.
|
||||
This request must be authenticated with a Hub API token in the `Authorization` header.
|
||||
For example, with [requests][]:
|
||||
|
||||
```python
import requests
from urllib.parse import quote

# encrypted_cookie: value of the `jupyterhub-services` cookie from the request
# api_token: the Hub API token of your service
r = requests.get(
    '/'.join([
        "http://127.0.0.1:8081/hub/api",
        "authorizations/cookie/jupyterhub-services",
        quote(encrypted_cookie, safe=''),
    ]),
    headers={
        'Authorization': 'token %s' % api_token,
    },
)
r.raise_for_status()
user = r.json()
```
|
||||
|
||||
3. On success, the reply will be a JSON model describing the user:
|
||||
|
||||
```json
{
  "name": "inara",
  "groups": ["serenity", "guild"]
}
```
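
The url-encoding in step 2 matters because a raw cookie value may contain characters such as `/` that would change the request path. A minimal sketch of building the request URL with only the standard library follows; the helper name `cookie_auth_url` is an assumption made for this example.

```python
from urllib.parse import quote


def cookie_auth_url(api_url, cookie_name, cookie_value):
    """Build the Hub API URL used to look up the user for a cookie value.

    (Hypothetical helper for illustration only.)
    """
    return "/".join([
        api_url.rstrip("/"),
        "authorizations/cookie",
        cookie_name,
        # safe='' also escapes '/', which would otherwise split the URL path
        quote(cookie_value, safe=""),
    ])
```

For instance, a cookie value of `a/b|c=` ends up in the path as `a%2Fb%7Cc%3D`.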
|
||||
|
||||
An example of using an Externally-Managed Service and authentication is
in the [nbviewer README][nbviewer example] section on securing the notebook viewer,
and an example of its configuration is found [here](https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/base.py#L94).
nbviewer can also be run as a Hub-Managed Service, as described in the same
[nbviewer README][nbviewer example] section.
|
||||
|
||||
|
||||
[requests]: http://docs.python-requests.org/en/master/
|
||||
[services_auth]: api/services.auth.html
|
||||
[HubAuth]: api/services.auth.html#jupyterhub.services.auth.HubAuth
|
||||
[HubAuthenticated]: api/services.auth.html#jupyterhub.services.auth.HubAuthenticated
|
||||
[nbviewer example]: https://github.com/jupyter/nbviewer#securing-the-notebook-viewer
|
213
docs/source/reference/spawners.md
Normal file
@@ -0,0 +1,213 @@
|
||||
# Spawners
|
||||
|
||||
A [Spawner][] starts each single-user notebook server.
|
||||
The Spawner represents an abstract interface to a process,
|
||||
and a custom Spawner needs to be able to take three actions:
|
||||
|
||||
- start the process
|
||||
- poll whether the process is still running
|
||||
- stop the process
|
||||
|
||||
|
||||
## Examples
|
||||
Custom Spawners for JupyterHub can be found on the [JupyterHub wiki](https://github.com/jupyterhub/jupyterhub/wiki/Spawners).
|
||||
Some examples include:
|
||||
|
||||
- [DockerSpawner](https://github.com/jupyterhub/dockerspawner) for spawning user servers in Docker containers
|
||||
* `dockerspawner.DockerSpawner` for spawning identical Docker containers for
|
||||
each user
|
||||
* `dockerspawner.SystemUserSpawner` for spawning Docker containers with an
|
||||
environment and home directory for each user
|
||||
* both `DockerSpawner` and `SystemUserSpawner` also work with Docker Swarm for
|
||||
launching containers on remote machines
|
||||
- [SudoSpawner](https://github.com/jupyterhub/sudospawner) enables JupyterHub to
|
||||
run without being root, by spawning an intermediate process via `sudo`
|
||||
- [BatchSpawner](https://github.com/jupyterhub/batchspawner) for spawning remote
|
||||
servers using batch systems
|
||||
- [RemoteSpawner](https://github.com/zonca/remotespawner) to spawn notebooks
|
||||
on a remote server and tunnel the port via SSH
|
||||
|
||||
|
||||
## Spawner control methods
|
||||
|
||||
### Spawner.start
|
||||
|
||||
`Spawner.start` should start the single-user server for a single user.
|
||||
Information about the user can be retrieved from `self.user`,
|
||||
an object encapsulating the user's name, authentication, and server info.
|
||||
|
||||
The return value of `Spawner.start` should be the (ip, port) of the running server.
|
||||
|
||||
**NOTE:** When writing coroutines, *never* `yield` in between a database change and a commit.
|
||||
|
||||
Most `Spawner.start` functions will look similar to this example:
|
||||
|
||||
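```python
from tornado import gen

from jupyterhub.utils import random_port


@gen.coroutine
def start(self):
    self.ip = '127.0.0.1'
    self.port = random_port()
    # wait until the server is actually running, not just requested
    yield self._actually_start_server_somehow()
    return (self.ip, self.port)
```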
|
||||
|
||||
When `Spawner.start` returns, the single-user server process should actually be running,
|
||||
not just requested. JupyterHub can handle `Spawner.start` being very slow
|
||||
(such as PBS-style batch queues, or instantiating whole AWS instances)
|
||||
via relaxing the `Spawner.start_timeout` config value.
|
||||
|
||||
### Spawner.poll
|
||||
|
||||
`Spawner.poll` should check if the spawner is still running.
|
||||
It should return `None` if it is still running,
|
||||
and an integer exit status otherwise.
|
||||
|
||||
For the local process case, `Spawner.poll` uses `os.kill(PID, 0)`
|
||||
to check if the local process is still running.
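
The `os.kill(PID, 0)` check can be sketched as follows. This assumes a POSIX system, where signal `0` performs the existence and permission checks without delivering a signal; the helper name `process_is_running` is hypothetical and this is not the code of the local process Spawner.

```python
import os


def process_is_running(pid):
    """Check whether a local process exists, without affecting it (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False  # no process with this PID
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A `poll` implementation built on such a check would return `None` while the check succeeds and an integer exit status once it fails.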
|
||||
|
||||
### Spawner.stop
|
||||
|
||||
`Spawner.stop` should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
|
||||
|
||||
|
||||
## Spawner state
|
||||
|
||||
JupyterHub should be able to stop and restart without tearing down
|
||||
single-user notebook servers. To do this task, a Spawner may need to persist
|
||||
some information that can be restored later.
|
||||
A JSON-able dictionary of state can be used to store persisted information.
|
||||
|
||||
Unlike start, stop, and poll methods, the state methods must not be coroutines.
|
||||
|
||||
For the single-process case, the Spawner state is only the process ID of the server:
|
||||
|
||||
```python
def get_state(self):
    """get the current state"""
    state = super().get_state()
    if self.pid:
        state['pid'] = self.pid
    return state

def load_state(self, state):
    """load state from the database"""
    super().load_state(state)
    if 'pid' in state:
        self.pid = state['pid']

def clear_state(self):
    """clear any state (called after shutdown)"""
    super().clear_state()
    self.pid = 0
```
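
To see why the state must be JSON-able, the round trip through a restart can be sketched without JupyterHub itself. `BaseState` below is a hypothetical stand-in for the Spawner base class and `PidState` repeats the three methods above; neither is part of JupyterHub.

```python
import json


class BaseState:
    """Hypothetical stand-in for the Spawner base class (illustration only)."""

    def get_state(self):
        return {}

    def load_state(self, state):
        pass

    def clear_state(self):
        pass


class PidState(BaseState):
    pid = 0

    def get_state(self):
        state = super().get_state()
        if self.pid:
            state['pid'] = self.pid
        return state

    def load_state(self, state):
        super().load_state(state)
        if 'pid' in state:
            self.pid = state['pid']

    def clear_state(self):
        super().clear_state()
        self.pid = 0


# before the Hub stops: the state is serialized for storage
running = PidState()
running.pid = 4242
saved = json.dumps(running.get_state())

# after the Hub restarts: a fresh object picks up where the old one left off
restored = PidState()
restored.load_state(json.loads(saved))
```

After `load_state`, the restored object knows the PID again and can poll or stop the process that was left running.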
|
||||
|
||||
|
||||
## Spawner options form
|
||||
|
||||
(new in 0.4)
|
||||
|
||||
Some deployments may want to offer options to users to influence how their servers are started.
|
||||
This may include cluster-based deployments, where users specify what resources should be available,
|
||||
or docker-based deployments where users can select from a list of base images.
|
||||
|
||||
This feature is enabled by setting `Spawner.options_form`, which is an HTML form snippet
|
||||
inserted unmodified into the spawn form.
|
||||
If the `Spawner.options_form` is defined, when a user tries to start their server, they will be directed to a form page, like this:
|
||||
|
||||

|
||||
|
||||
If `Spawner.options_form` is undefined, the user's server is spawned directly, and no spawn page is rendered.
|
||||
|
||||
See [this example](https://github.com/jupyterhub/jupyterhub/blob/master/examples/spawn-form/jupyterhub_config.py) for a form that allows custom CLI args for the local spawner.
|
||||
|
||||
### `Spawner.options_from_form`
|
||||
|
||||
Options from this form will always be a dictionary of lists of strings, e.g.:
|
||||
|
||||
```python
{
    'integer': ['5'],
    'text': ['some text'],
    'select': ['a', 'b'],
}
```
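
The dictionary-of-lists shape is what HTML form submissions generally look like once parsed, because any field may appear more than once. The standard library shows the same shape; this is only an illustration of the data structure, not how JupyterHub itself parses the request.

```python
from urllib.parse import parse_qs

# a urlencoded form body as a browser would submit it;
# the 'select' field was submitted with two values
body = "integer=5&text=some+text&select=a&select=b"
formdata = parse_qs(body)
```

Every value is a list of strings, even for fields that can only occur once, which is why the values need to be converted before use.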
|
||||
|
||||
When `formdata` arrives, it is passed through `Spawner.options_from_form(formdata)`,
|
||||
which is a method to turn the form data into the correct structure.
|
||||
This method must return a dictionary, and is meant to interpret the lists-of-strings into the correct types. For example, the `options_from_form` for the above form would look like:
|
||||
|
||||
```python
def options_from_form(self, formdata):
    options = {}
    options['integer'] = int(formdata['integer'][0])  # single integer value
    options['text'] = formdata['text'][0]  # single string value
    options['select'] = formdata['select']  # list already correct
    options['notinform'] = 'extra info'  # not in the form at all
    return options
```
|
||||
|
||||
which would return:
|
||||
|
||||
```python
{
    'integer': 5,
    'text': 'some text',
    'select': ['a', 'b'],
    'notinform': 'extra info',
}
```
|
||||
|
||||
When `Spawner.start` is called, this dictionary is accessible as `self.user_options`.
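
Since form fields can be missing or malformed, an `options_from_form` implementation may want to validate its input before the result reaches `Spawner.start`. One possible sketch is shown below as a plain function; the name `parse_options` and the fallback value are assumptions made for this example.

```python
def parse_options(formdata, default_integer=1):
    """Turn dict-of-lists form data into typed options, tolerating bad input.

    (Hypothetical helper for illustration only.)
    """
    options = {}
    raw = formdata.get('integer', [''])[0]
    try:
        options['integer'] = int(raw)
    except ValueError:
        # missing or non-numeric field: fall back instead of failing the spawn
        options['integer'] = default_integer
    options['text'] = formdata.get('text', [''])[0].strip()
    options['select'] = list(formdata.get('select', []))
    return options
```

Whether to fall back to a default or to reject the form is a deployment decision; the sketch takes the lenient route.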
|
||||
|
||||
|
||||
[Spawner]: https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/spawner.py
|
||||
|
||||
## Writing a custom spawner
|
||||
|
||||
If you are interested in building a custom spawner, you can read [this tutorial](http://jupyterhub-tutorial.readthedocs.io/en/latest/spawners.html).
|
||||
|
||||
## Spawners, resource limits, and guarantees (Optional)
|
||||
|
||||
Some spawners of the single-user notebook servers allow setting limits or
|
||||
guarantees on resources, such as CPU and memory. To provide a consistent
|
||||
experience for sysadmins and users, we provide a standard way to set and
|
||||
discover these resource limits and guarantees, such as for memory and CPU. For
|
||||
the limits and guarantees to be useful, the spawner must implement support for
|
||||
them.
|
||||
|
||||
### Memory Limits & Guarantees
|
||||
|
||||
`c.Spawner.mem_limit`: A **limit** specifies the *maximum amount of memory*
|
||||
that may be allocated, though there is no promise that the maximum amount will
|
||||
be available. In supported spawners, you can set `c.Spawner.mem_limit` to
|
||||
limit the total amount of memory that a single-user notebook server can
|
||||
allocate. Attempting to use more memory than this limit will cause errors. The
|
||||
single-user notebook server can discover its own memory limit by looking at
|
||||
the environment variable `MEM_LIMIT`, which is specified in absolute bytes.
|
||||
|
||||
`c.Spawner.mem_guarantee`: Sometimes, a **guarantee** of a *minimum amount of
memory* is desirable. In this case, you can set `c.Spawner.mem_guarantee` to
provide a guarantee that at minimum this much memory will always be
|
||||
available for the single-user notebook server to use. The environment variable
|
||||
`MEM_GUARANTEE` will also be set in the single-user notebook server.
|
||||
|
||||
The spawner's underlying system or cluster is responsible for enforcing these
|
||||
limits and providing these guarantees. If these values are set to `None`, no
|
||||
limits or guarantees are provided, and no environment values are set.
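
Code running inside the single-user notebook server can read these values from its environment. A minimal sketch, assuming the variables hold a plain number of bytes as described above; the helper name `read_bytes_env` is hypothetical.

```python
import os


def read_bytes_env(name, environ=os.environ):
    """Return the byte count stored in an environment variable, or None if unset."""
    raw = environ.get(name)
    if not raw:
        # no limit or guarantee was configured
        return None
    # the value is in absolute bytes; int(float(...)) also tolerates "1e9"
    return int(float(raw))
```

For example, `read_bytes_env("MEM_LIMIT")` returns `None` when no limit was set, so callers can tell "unlimited" apart from a concrete value.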
|
||||
|
||||
### CPU Limits & Guarantees
|
||||
|
||||
`c.Spawner.cpu_limit`: In supported spawners, you can set
|
||||
`c.Spawner.cpu_limit` to limit the total number of cpu-cores that a
|
||||
single-user notebook server can use. These can be fractional - `0.5` means 50%
|
||||
of one CPU core, `4.0` is 4 cpu-cores, etc. This value is also set in the
|
||||
single-user notebook server's environment variable `CPU_LIMIT`. The limit does
|
||||
not claim that you will be able to use all the CPU up to your limit as other
|
||||
higher priority applications might be taking up CPU.
|
||||
|
||||
`c.Spawner.cpu_guarantee`: You can set `c.Spawner.cpu_guarantee` to provide a
|
||||
guarantee for CPU usage. The environment variable `CPU_GUARANTEE` will be set
|
||||
in the single-user notebook server when a guarantee is being provided.
|
||||
|
||||
The spawner's underlying system or cluster is responsible for enforcing these
|
||||
limits and providing these guarantees. If these values are set to `None`, no
|
||||
limits or guarantees are provided, and no environment values are set.
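
User code can use `CPU_LIMIT` to avoid starting more parallel workers than the limit allows. The sketch below is one way to do that, assuming the variable holds a possibly fractional number of cores; both helper names are hypothetical.

```python
import os


def read_cpu_env(name, environ=os.environ):
    """Return the (possibly fractional) core count from the environment, or None."""
    raw = environ.get(name)
    return float(raw) if raw else None


def worker_count(cpu_limit, available=None):
    """Pick a number of worker processes that fits within the CPU limit."""
    available = available or os.cpu_count() or 1
    if cpu_limit is None:
        # no limit configured: use every core the machine reports
        return available
    # a fractional limit such as 0.5 still gets one worker
    return max(1, min(available, int(cpu_limit)))
```

With a limit of `0.5` this yields a single worker, and with `4.0` on an eight-core machine it yields four.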
|
133
docs/source/reference/technical-overview.md
Normal file
@@ -0,0 +1,133 @@
|
||||
# Technical Overview
|
||||
|
||||
The **Technical Overview** section gives you a high-level view of:
|
||||
|
||||
- JupyterHub's Subsystems: Hub, Proxy, Single-User Notebook Server
|
||||
- how the subsystems interact
|
||||
- the process from JupyterHub access to user login
|
||||
- JupyterHub's default behavior
|
||||
- customizing JupyterHub
|
||||
|
||||
The goal of this section is to share a deeper technical understanding of
|
||||
JupyterHub and how it works.
|
||||
|
||||
## The Subsystems: Hub, Proxy, Single-User Notebook Server
|
||||
|
||||
JupyterHub is a set of processes that together provide a single user Jupyter
|
||||
Notebook server for each person in a group. Three major subsystems are started
|
||||
by the `jupyterhub` command line program:
|
||||
|
||||
- **Hub** (Python/Tornado): manages user accounts, authentication, and
|
||||
coordinates Single User Notebook Servers using a Spawner.
|
||||
|
||||
- **Proxy**: the public facing part of JupyterHub that uses a dynamic proxy
|
||||
to route HTTP requests to the Hub and Single User Notebook Servers.
|
||||
[configurable http proxy](https://github.com/jupyterhub/configurable-http-proxy)
|
||||
(node-http-proxy) is the default proxy.
|
||||
|
||||
- **Single-User Notebook Server** (Python/Tornado): a dedicated,
|
||||
single-user, Jupyter Notebook server is started for each user on the system
|
||||
when the user logs in. The object that starts the single-user notebook
|
||||
servers is called a **Spawner**.
|
||||
|
||||

|
||||
|
||||
## How the Subsystems Interact
|
||||
|
||||
Users access JupyterHub through a web browser, by going to the IP address or
|
||||
the domain name of the server.
|
||||
|
||||
The basic principles of operation are:
|
||||
|
||||
- The Hub spawns the proxy (in the default JupyterHub configuration)
|
||||
- The proxy forwards all requests to the Hub by default
|
||||
- The Hub handles login, and spawns single-user notebook servers on demand
|
||||
- The Hub configures the proxy to forward url prefixes to single-user notebook
|
||||
servers
|
||||
|
||||
The proxy is the only process that listens on a public interface. The Hub sits
|
||||
behind the proxy at `/hub`. Single-user servers sit behind the proxy at
|
||||
`/user/[username]`.
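
The routing idea can be sketched as a longest-prefix lookup over a table that the Hub keeps up to date. This is a simplified model for illustration only: the function name and the route targets are hypothetical, and the actual proxy's matching rules may differ.

```python
def pick_target(path, routes):
    """Return the target for the longest route prefix that matches path."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    return routes[max(matches, key=len)]


routes = {
    # default route: everything goes to the Hub
    "/": "hub",
    # added by the Hub when this user's server starts
    "/user/inara/": "server of inara",
}
```

A request for a user whose server is not running has no specific route, so it falls through to the Hub, which can then handle login or spawning.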
|
||||
|
||||
Different **[authenticators](./authenticators.html)** control access
|
||||
to JupyterHub. The default one (PAM) uses the user accounts on the server where
|
||||
JupyterHub is running. If you use this, you will need to create a user account
|
||||
on the system for each user on your team. Using other authenticators, you can
|
||||
allow users to sign in with e.g. a GitHub account, or with any single-sign-on
|
||||
system your organization has.
|
||||
|
||||
Next, **[spawners](./spawners.html)** control how JupyterHub starts
|
||||
the individual notebook server for each user. The default spawner will
|
||||
start a notebook server on the same machine running under their system username.
|
||||
The other main option is to start each server in a separate container, often
|
||||
using Docker.
|
||||
|
||||
## The Process from JupyterHub Access to User Login
|
||||
|
||||
When a user accesses JupyterHub, the following events take place:
|
||||
|
||||
- Login data is handed to the [Authenticator](./authenticators.html) instance for
|
||||
validation
|
||||
- The Authenticator returns the username if the login information is valid
|
||||
- A single-user notebook server instance is [spawned](./spawners.html) for the
|
||||
logged-in user
|
||||
- When the single-user notebook server starts, the proxy is notified to forward
|
||||
requests to `/user/[username]/*` to the single-user notebook server.
|
||||
- A cookie is set on `/hub/`, containing an encrypted token. (Prior to version
|
||||
0.8, a cookie for `/user/[username]` was used too.)
|
||||
- The browser is redirected to `/user/[username]`, and the request is handled by
|
||||
the single-user notebook server.
|
||||
|
||||
The single-user server identifies the user with the Hub via OAuth:
|
||||
|
||||
- on request, the single-user server checks a cookie
|
||||
- if no cookie is set, redirect to the Hub for verification via OAuth
|
||||
- after verification at the Hub, the browser is redirected back to the
|
||||
single-user server
|
||||
- the token is verified and stored in a cookie
|
||||
- if no user is identified, the browser is redirected back to `/hub/login`
|
||||
|
||||
## Default Behavior
|
||||
|
||||
By default, the **Proxy** listens on all public interfaces on port 8000.
|
||||
Thus you can reach JupyterHub through either:
|
||||
|
||||
- `http://localhost:8000`
|
||||
- or any other public IP or domain pointing to your system.
|
||||
|
||||
In their default configuration, the other services, the **Hub** and
|
||||
**Single-User Notebook Servers**, all communicate with each other on localhost
|
||||
only.
|
||||
|
||||
By default, starting JupyterHub will write two files to disk in the current
|
||||
working directory:
|
||||
|
||||
- `jupyterhub.sqlite` is the SQLite database containing all of the state of the
|
||||
**Hub**. This file allows the **Hub** to remember which users are running and
|
||||
where, as well as storing other information enabling you to restart parts of
|
||||
JupyterHub separately. It is important to note that this database contains
|
||||
**no** sensitive information other than **Hub** usernames.
|
||||
- `jupyterhub_cookie_secret` is the encryption key used for securing cookies.
|
||||
This file needs to persist so that a **Hub** server restart will avoid
|
||||
invalidating cookies. Conversely, deleting this file and restarting the server
|
||||
effectively invalidates all login cookies. The cookie secret file is discussed
|
||||
in the [Cookie Secret section of the Security Settings document](./security-basics.html).
|
||||
|
||||
The location of these files can be specified via configuration settings. It is
|
||||
recommended that these files be stored in standard UNIX filesystem locations,
|
||||
such as `/etc/jupyterhub` for all configuration files and `/srv/jupyterhub` for
|
||||
all security and runtime files.
|
||||
|
||||
## Customizing JupyterHub
|
||||
|
||||
There are two basic extension points for JupyterHub:
|
||||
|
||||
- How users are authenticated by [Authenticators](./authenticators.html)
|
||||
- How each user's single-user notebook server processes are started by
|
||||
[Spawners](./spawners.html)
|
||||
|
||||
Each is governed by a customizable class, and JupyterHub ships with basic
|
||||
defaults for each.
|
||||
|
||||
To enable custom authentication and/or spawning, subclass `Authenticator` or
|
||||
`Spawner`, and override the relevant methods.
|
106
docs/source/reference/upgrading.md
Normal file
@@ -0,0 +1,106 @@
|
||||
# Upgrading JupyterHub and its database
|
||||
|
||||
From time to time, you may wish to upgrade JupyterHub to take advantage
|
||||
of new releases. Much of this process is automated using scripts,
|
||||
such as those generated by alembic for database upgrades. Before upgrading a
|
||||
JupyterHub deployment, it's critical to back up your data and configurations
|
||||
before shutting down the JupyterHub process and server.
|
||||
|
||||
## Databases: SQLite (default) or RDBMS (PostgreSQL, MySQL)
|
||||
|
||||
The default database for JupyterHub is a [SQLite](https://sqlite.org) database.
|
||||
We have chosen SQLite as JupyterHub's default for its lightweight simplicity
|
||||
in certain uses such as testing, small deployments and workshops.
|
||||
|
||||
When running a long term deployment or a production system, we recommend using
|
||||
a traditional RDBMS database, such as [PostgreSQL](https://www.postgresql.org)
|
||||
or [MySQL](https://www.mysql.com), that supports the SQL `ALTER TABLE`
|
||||
statement.
|
||||
|
||||
For production systems, SQLite has some disadvantages when used with JupyterHub:
|
||||
|
||||
- `upgrade-db` may not work, and you may need to start with a fresh database
|
||||
- `downgrade-db` **will not** work if you want to rollback to an earlier
|
||||
version, so backup the `jupyterhub.sqlite` file before upgrading
|
||||
|
||||
The sqlite documentation provides a helpful page about [when to use sqlite and
|
||||
where traditional RDBMS may be a better choice](https://sqlite.org/whentouse.html).
|
||||
|
||||
## The upgrade process
|
||||
|
||||
Five fundamental process steps are needed when upgrading JupyterHub and its
|
||||
database:
|
||||
|
||||
1. Back up the JupyterHub database
2. Back up the JupyterHub configuration file
3. Shut down the Hub
4. Upgrade JupyterHub
5. Upgrade the database by running `jupyterhub upgrade-db`
|
||||
|
||||
Let's take a closer look at each step in the upgrade process as well as some
|
||||
additional information about JupyterHub databases.
|
||||
|
||||
### Backup JupyterHub database
|
||||
|
||||
To prevent unintended loss of data or configuration information, you should
|
||||
back up the JupyterHub database (the default SQLite database or a RDBMS
|
||||
database using PostgreSQL, MySQL, or others supported by SQLAlchemy):
|
||||
|
||||
- If using the default SQLite database, back up the `jupyterhub.sqlite`
|
||||
database.
|
||||
- If using an RDBMS database such as PostgreSQL, MySQL, or other supported by
|
||||
SQLAlchemy, back up the JupyterHub database.
|
||||
|
||||
Losing the Hub database is often not a big deal. Information that resides only
|
||||
in the Hub database includes:
|
||||
|
||||
- active login tokens (user cookies, service tokens)
|
||||
- users added via the JupyterHub UI, instead of config files
|
||||
- info about running servers
|
||||
|
||||
If the following conditions are true, you should be fine clearing the Hub
|
||||
database and starting over:
|
||||
|
||||
- users specified in config file
|
||||
- user servers are stopped during upgrade
|
||||
- don't mind causing users to login again after upgrade
|
||||
|
||||
### Backup JupyterHub configuration file
|
||||
|
||||
Additionally, back up your configuration file, `jupyterhub_config.py`, to
a secure location.
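
For a deployment that uses the default SQLite database, both backups can be scripted. The following is a minimal sketch, assuming Python 3.7 or newer; the function name and the timestamped folder layout are choices made for this example, and an RDBMS should be backed up with its own tools instead.

```python
import shutil
import sqlite3
from datetime import datetime
from pathlib import Path


def backup_hub_files(db_path, config_path, backup_root):
    """Copy the Hub's SQLite database and config file into a timestamped folder."""
    db_path, config_path = Path(db_path), Path(config_path)
    if not db_path.is_file():
        # sqlite3.connect would silently create an empty database otherwise
        raise FileNotFoundError(str(db_path))

    dest = Path(backup_root) / datetime.now().strftime("%Y%m%d-%H%M%S")
    dest.mkdir(parents=True)

    # sqlite3's backup API takes a consistent snapshot even if the file is in use
    src = sqlite3.connect(str(db_path))
    dst = sqlite3.connect(str(dest / db_path.name))
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()

    # copy2 keeps the timestamps and permissions of the config file
    shutil.copy2(str(config_path), str(dest / config_path.name))
    return dest
```

Calling `backup_hub_files("jupyterhub.sqlite", "jupyterhub_config.py", "backups")` from the Hub's working directory would leave both files in a new folder under `backups/`.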
|
||||
|
||||
### Shutdown JupyterHub
|
||||
|
||||
Prior to shutting down JupyterHub, you should notify the Hub users of the
|
||||
scheduled downtime. This gives users the opportunity to finish any outstanding
|
||||
work in process.
|
||||
|
||||
Next, shut down the JupyterHub service.
|
||||
|
||||
### Upgrade JupyterHub
|
||||
|
||||
Follow directions that correspond to your package manager, `pip` or `conda`,
|
||||
for the new JupyterHub release. These directions will guide you to the
|
||||
specific command. In general, the command is `pip install -U jupyterhub` or
`conda upgrade jupyterhub`.
|
||||
|
||||
### Upgrade JupyterHub databases
|
||||
|
||||
To run the upgrade process for JupyterHub databases, enter:
|
||||
|
||||
```
|
||||
jupyterhub upgrade-db
|
||||
```
|
||||
|
||||
## Upgrade checklist
|
||||
|
||||
1. Back up the JupyterHub database:
    - `jupyterhub.sqlite` when using the default sqlite database
    - Your JupyterHub database when using an RDBMS
2. Back up the JupyterHub configuration file: `jupyterhub_config.py`
3. Shut down the Hub
4. Upgrade JupyterHub
    - `pip install -U jupyterhub` when using `pip`
    - `conda upgrade jupyterhub` when using `conda`
5. Upgrade the database by running `jupyterhub upgrade-db`
|
112
docs/source/reference/websecurity.md
Normal file
@@ -0,0 +1,112 @@
|
||||
# Security Overview
|
||||
|
||||
The **Security Overview** section helps you learn about:
|
||||
|
||||
- the design of JupyterHub with respect to web security
|
||||
- the semi-trusted user
|
||||
- the available mitigations to protect untrusted users from each other
|
||||
- the value of periodic security audits.
|
||||
|
||||
This overview also helps you obtain a deeper understanding of how JupyterHub
|
||||
works.
|
||||
|
||||
## Semi-trusted and untrusted users
|
||||
|
||||
JupyterHub is designed to be a *simple multi-user server for modestly sized
|
||||
groups* of **semi-trusted** users. While the design reflects serving semi-trusted
|
||||
users, JupyterHub is not necessarily unsuitable for serving **untrusted** users.
|
||||
|
||||
Using JupyterHub with **untrusted** users does mean more work by the
|
||||
administrator. Much care is required to secure a Hub, with extra caution on
|
||||
protecting users from each other as the Hub is serving untrusted users.
|
||||
|
||||
One aspect of JupyterHub's *design simplicity* for **semi-trusted** users is that
|
||||
the Hub and single-user servers are placed in a *single domain*, behind a
|
||||
[*proxy*][configurable-http-proxy]. If the Hub is serving untrusted
|
||||
users, many of the web's cross-site protections are not applied between
|
||||
single-user servers and the Hub, or between single-user servers and each
|
||||
other, since browsers see the whole thing (proxy, Hub, and single user
|
||||
servers) as a single website (i.e. single domain).
|
||||
|
||||
## Protect users from each other
|
||||
|
||||
To protect users from each other, a user must **never** be able to write arbitrary
|
||||
HTML and serve it to another user on the Hub's domain. JupyterHub's
|
||||
authentication setup prevents a user writing arbitrary HTML and serving it to
|
||||
another user because only the owner of a given single-user notebook server is
|
||||
allowed to view user-authored pages served by the given single-user notebook
|
||||
server.
|
||||
|
||||
To protect all users from each other, JupyterHub administrators must
|
||||
ensure that:
|
||||
|
||||
* A user **does not have permission** to modify their single-user notebook server,
|
||||
including:
|
||||
- A user **may not** install new packages in the Python environment that runs
|
||||
their single-user server.
|
||||
- If the `PATH` is used to resolve the single-user executable (instead of
|
||||
using an absolute path), a user **may not** create new files in any `PATH`
|
||||
directory that precedes the directory containing `jupyterhub-singleuser`.
|
||||
- A user may not modify environment variables (e.g. PATH, PYTHONPATH) for
|
||||
their single-user server.
|
||||
* A user **may not** modify the configuration of the notebook server
|
||||
(the `~/.jupyter` or `JUPYTER_CONFIG_DIR` directory).
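
The `PATH` condition above can be checked with a short script, run as the user in question. This is a rough audit sketch rather than a complete security check; both function names are hypothetical.

```python
import os


def dirs_before_executable(name, path):
    """List PATH directories searched before the one that contains `name`.

    Returns None if `name` is not found on the given PATH at all.
    """
    earlier = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return earlier
        earlier.append(directory)
    return None


def writable_dirs_before(name, path):
    """Directories where the current user could plant a shadowing executable."""
    earlier = dirs_before_executable(name, path) or []
    return [d for d in earlier if os.access(d, os.W_OK)]
```

If `writable_dirs_before("jupyterhub-singleuser", os.environ["PATH"])` returns a non-empty list for an ordinary user, that user could place their own `jupyterhub-singleuser` ahead of the real one.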
|
||||
|
||||
If any additional services are run on the same domain as the Hub, the services
|
||||
**must never** display user-authored HTML that is neither *sanitized* nor *sandboxed*
|
||||
(e.g. IFramed) to any user that lacks authentication as the author of a file.
|
||||
|
||||
## Mitigate security issues
|
||||
|
||||
Several approaches to mitigating these issues with configuration
|
||||
options provided by JupyterHub include:
|
||||
|
||||
### Enable subdomains
|
||||
|
||||
JupyterHub provides the ability to run single-user servers on their own
|
||||
subdomains. This means the cross-origin protections between servers have the
|
||||
desired effect, and user servers and the Hub are protected from each other. A
|
||||
user's single-user server will be at `username.jupyter.mydomain.com`. This also
|
||||
requires all user subdomains to point to the same address, which is most easily
|
||||
accomplished with wildcard DNS. Since this spreads the service across multiple
|
||||
domains, you will need wildcard SSL, as well. Unfortunately, for many
|
||||
institutional domains, wildcard DNS and SSL are not available. **If you do plan
|
||||
to serve untrusted users, enabling subdomains is highly encouraged**, as it
|
||||
resolves the cross-site issues.
|
||||
|
||||
### Disable user config
|
||||
|
||||
If subdomains are not available or not desirable, JupyterHub provides a
|
||||
configuration option `Spawner.disable_user_config`, which can be set to prevent
|
||||
the user-owned configuration files from being loaded. After implementing this
|
||||
option, package installation and PATHs are the other things that the
|
||||
admin must enforce.
|
||||
|
||||
### Prevent spawners from evaluating shell configuration files
|
||||
|
||||
For most Spawners, `PATH` is not something users can influence, but care should
|
||||
be taken to ensure that the Spawner does *not* evaluate shell configuration
|
||||
files prior to launching the server.
|
||||
|
||||
### Isolate packages using virtualenv
|
||||
|
||||
Package isolation is most easily handled by running the single-user server in
|
||||
a virtualenv with disabled system-site-packages. The user should not have
|
||||
permission to install packages into this environment.
|
||||
|
||||
It is important to note that the control over the environment only affects the
|
||||
single-user server, and not the environment(s) in which the user's kernel(s)
|
||||
may run. Installing additional packages in the kernel environment does not
|
||||
pose additional risk to the web application's security.
|
||||
|
||||
## Security audits
|
||||
|
||||
We recommend that you do periodic reviews of your deployment's security. It's
|
||||
good practice to keep JupyterHub, configurable-http-proxy, and nodejs
|
||||
versions up to date.
|
||||
|
||||
A handy website for testing your deployment is
|
||||
[Qualys' SSL analyzer tool](https://www.ssllabs.com/ssltest/analyze.html).
|
||||
|
||||
|
||||
[configurable-http-proxy]: https://github.com/jupyterhub/configurable-http-proxy
|