Restructure doc folder structure

This commit is contained in:
Carol Willing
2017-08-09 13:22:59 -07:00
parent 4f826d8245
commit eb6a2f9e89
17 changed files with 16 additions and 12 deletions

View File

@@ -0,0 +1,142 @@
# Authenticators
The [Authenticator][] is the mechanism for authorizing users to use the
Hub and single user notebook servers.
## The default PAM Authenticator
JupyterHub ships only with the default [PAM][]-based Authenticator,
for logging in with local user accounts via a username and password.
## The OAuthenticator
Some login mechanisms, such as [OAuth][], don't map onto username and
password authentication, and instead use tokens. When using these
mechanisms, you can override the login handlers.
You can see an example implementation of an Authenticator that uses
[GitHub OAuth][] at [OAuthenticator][].
JupyterHub's [OAuthenticator][] currently supports the following
popular services:
- Auth0
- Bitbucket
- CILogon
- GitHub
- GitLab
- Globus
- Google
- MediaWiki
- Okpy
- OpenShift
A generic implementation, which you can use for OAuth authentication
with any provider, is also available.
## Additional Authenticators
- ldapauthenticator for LDAP
- tmpauthenticator for temporary accounts
## Technical Overview of Authentication
### How the Base Authenticator works
The base authenticator uses simple username and password authentication.
The base Authenticator has one central method:
#### Authenticator.authenticate method
Authenticator.authenticate(handler, data)
This method is passed the Tornado `RequestHandler` and the `POST data`
from JupyterHub's login form. Unless the login form has been customized,
`data` will have two keys:
- `username`
- `password`
The `authenticate` method's job is simple:
- return the username (non-empty str) of the authenticated user if
authentication is successful
- return `None` otherwise
Writing an Authenticator that looks up passwords in a dictionary
requires only overriding this one method:
```python
from tornado import gen
from IPython.utils.traitlets import Dict
from jupyterhub.auth import Authenticator
class DictionaryAuthenticator(Authenticator):
passwords = Dict(config=True,
help="""dict of username:password for authentication"""
)
@gen.coroutine
def authenticate(self, handler, data):
if self.passwords.get(data['username']) == data['password']:
return data['username']
```
#### Normalize usernames
Since the Authenticator and Spawner both use the same username,
sometimes you want to transform the name coming from the authentication service
(e.g. turning email addresses into local system usernames) before adding them to the Hub service.
Authenticators can define `normalize_username`, which takes a username.
The default normalization is to cast names to lowercase
For simple mappings, a configurable dict `Authenticator.username_map` is used to turn one name into another:
```python
c.Authenticator.username_map = {
'service-name': 'localname'
}
```
#### Validate usernames
In most cases, there is a very limited set of acceptable usernames.
Authenticators can define `validate_username(username)`,
which should return True for a valid username and False for an invalid one.
The primary effect this has is improving error messages during user creation.
The default behavior is to use configurable `Authenticator.username_pattern`,
which is a regular expression string for validation.
To only allow usernames that start with 'w':
```python
c.Authenticator.username_pattern = r'w.*'
```
### How to write a custom authenticator
You can use custom Authenticator subclasses to enable authentication
via other mechanisms. One such example is using [GitHub OAuth][].
Because the username is passed from the Authenticator to the Spawner,
a custom Authenticator and Spawner are often used together.
See a list of custom Authenticators [on the wiki](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).
If you are interested in writing a custom authenticator, you can read
[this tutorial](http://jupyterhub-tutorial.readthedocs.io/en/latest/authenticators.html).
## JupyterHub as an OAuth provider
Beginning with version 0.8, JupyterHub is an OAuth provider.
[Authenticator]: https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/auth.py
[PAM]: https://en.wikipedia.org/wiki/Pluggable_authentication_module
[OAuth]: https://en.wikipedia.org/wiki/OAuth
[GitHub OAuth]: https://developer.github.com/v3/oauth/
[OAuthenticator]: https://github.com/jupyterhub/oauthenticator

View File

@@ -0,0 +1,211 @@
# Configuration examples
This section provides examples, including configuration files and tips, for the
following configurations:
- Using GitHub OAuth
- Using nginx reverse proxy
## Using GitHub OAuth
In this example, we show a configuration file for a fairly standard JupyterHub
deployment with the following assumptions:
* Running JupyterHub on a single cloud server
* Using SSL on the standard HTTPS port 443
* Using GitHub OAuth (using oauthenticator) for login
* Users exist locally on the server
* Users' notebooks to be served from `~/assignments` to allow users to browse
for notebooks within other users' home directories
* You want the landing page for each user to be a `Welcome.ipynb` notebook in
their assignments directory.
* All runtime files are put into `/srv/jupyterhub` and log files in `/var/log`.
The `jupyterhub_config.py` file would have these settings:
```python
# jupyterhub_config.py file
c = get_config()
import os
pjoin = os.path.join
runtime_dir = os.path.join('/srv/jupyterhub')
ssl_dir = pjoin(runtime_dir, 'ssl')
if not os.path.exists(ssl_dir):
os.makedirs(ssl_dir)
# Allows multiple single-server per user
c.JupyterHub.allow_named_servers = True
# https on :443
c.JupyterHub.port = 443
c.JupyterHub.ssl_key = pjoin(ssl_dir, 'ssl.key')
c.JupyterHub.ssl_cert = pjoin(ssl_dir, 'ssl.cert')
# put the JupyterHub cookie secret and state db
# in /var/run/jupyterhub
c.JupyterHub.cookie_secret_file = pjoin(runtime_dir, 'cookie_secret')
c.JupyterHub.db_url = pjoin(runtime_dir, 'jupyterhub.sqlite')
# or `--db=/path/to/jupyterhub.sqlite` on the command-line
# put the log file in /var/log
c.JupyterHub.extra_log_file = '/var/log/jupyterhub.log'
# use GitHub OAuthenticator for local users
c.JupyterHub.authenticator_class = 'oauthenticator.LocalGitHubOAuthenticator'
c.GitHubOAuthenticator.oauth_callback_url = os.environ['OAUTH_CALLBACK_URL']
# create system users that don't exist yet
c.LocalAuthenticator.create_system_users = True
# specify users and admin
c.Authenticator.whitelist = {'rgbkrk', 'minrk', 'jhamrick'}
c.Authenticator.admin_users = {'jhamrick', 'rgbkrk'}
# start single-user notebook servers in ~/assignments,
# with ~/assignments/Welcome.ipynb as the default landing page
# this config could also be put in
# /etc/jupyter/jupyter_notebook_config.py
c.Spawner.notebook_dir = '~/assignments'
c.Spawner.args = ['--NotebookApp.default_url=/notebooks/Welcome.ipynb']
```
Using the GitHub Authenticator requires a few additional
environment variable to be set prior to launching JupyterHub:
```bash
export GITHUB_CLIENT_ID=github_id
export GITHUB_CLIENT_SECRET=github_secret
export OAUTH_CALLBACK_URL=https://example.com/hub/oauth_callback
export CONFIGPROXY_AUTH_TOKEN=super-secret
jupyterhub -f /etc/jupyterhub/jupyterhub_config.py
```
## Using nginx reverse proxy
In the following example, we show configuration files for a JupyterHub server
running locally on port `8000` but accessible from the outside on the standard
SSL port `443`. This could be useful if the JupyterHub server machine is also
hosting other domains or content on `443`. The goal in this example is to
satisfy the following:
* JupyterHub is running on a server, accessed *only* via `HUB.DOMAIN.TLD:443`
* On the same machine, `NO_HUB.DOMAIN.TLD` strictly serves different content,
also on port `443`
* `nginx` is used to manage the web servers / reverse proxy (which means that
only nginx will be able to bind two servers to `443`)
* After testing, the server in question should be able to score an A+ on the
Qualys SSL Labs [SSL Server Test](https://www.ssllabs.com/ssltest/)
Let's start out with needed JupyterHub configuration in `jupyterhub_config.py`:
```python
# Force the proxy to only listen to connections to 127.0.0.1
c.JupyterHub.ip = '127.0.0.1'
```
The **`nginx` server config file** is fairly standard fare except for the two
`location` blocks within the `HUB.DOMAIN.TLD` config file:
```bash
# HTTP server to redirect all 80 traffic to SSL/HTTPS
server {
listen 80;
server_name HUB.DOMAIN.TLD;
# Tell all requests to port 80 to be 302 redirected to HTTPS
return 302 https://$host$request_uri;
}
# HTTPS server to handle JupyterHub
server {
listen 443;
ssl on;
server_name HUB.DOMAIN.TLD;
ssl_certificate /etc/letsencrypt/live/HUB.DOMAIN.TLD/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/HUB.DOMAIN.TLD/privkey.pem;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_prefer_server_ciphers on;
ssl_dhparam /etc/ssl/certs/dhparam.pem;
ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';
ssl_session_timeout 1d;
ssl_session_cache shared:SSL:50m;
ssl_stapling on;
ssl_stapling_verify on;
add_header Strict-Transport-Security max-age=15768000;
# Managing literal requests to the JupyterHub front end
location / {
proxy_pass https://127.0.0.1:8000;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
# Managing WebHook/Socket requests between hub user servers and external proxy
location ~* /(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket)/? {
proxy_pass https://127.0.0.1:8000;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
}
# Managing requests to verify letsencrypt host
location ~ /.well-known {
allow all;
}
}
```
`nginx` will now be the front facing element of JupyterHub on `443` which means
it is also free to bind other servers, like `NO_HUB.DOMAIN.TLD` to the same port
on the same machine and network interface. In fact, one can simply use the same
server blocks as above for `NO_HUB` and simply add line for the root directory
of the site as well as the applicable location call:
```bash
server {
listen 80;
server_name NO_HUB.DOMAIN.TLD;
# Tell all requests to port 80 to be 302 redirected to HTTPS
return 302 https://$host$request_uri;
}
server {
listen 443;
ssl on;
# INSERT OTHER SSL PARAMETERS HERE AS ABOVE
# Set the appropriate root directory
root /var/www/html
# Set URI handling
location / {
try_files $uri $uri/ =404;
}
# Managing requests to verify letsencrypt host
location ~ /.well-known {
allow all;
}
}
```
Now just restart `nginx`, restart the JupyterHub, and enjoy accessing
https://HUB.DOMAIN.TLD while serving other content securely on
https://NO_HUB.DOMAIN.TLD.

View File

@@ -0,0 +1,14 @@
Technical Reference
===================
.. toctree::
:maxdepth: 2
technical-overview
websecurity
authenticators
spawners
services
rest
upgrading
config-examples

View File

@@ -0,0 +1,132 @@
# Using JupyterHub's REST API
This section will give you information on:
- what you can do with the API
- create an API token
- add API tokens to the config files
- make an API request programmatically using the requests library
- learn more about JupyterHub's API
## What you can do with the API
Using the [JupyterHub REST API][], you can perform actions on the Hub,
such as:
- checking which users are active
- adding or removing users
- stopping or starting single user notebook servers
- authenticating services
A [REST](https://en.wikipedia.org/wiki/Representational_state_transfer)
API provides a standard way for users to get and send information to the
Hub.
## Create an API token
To send requests using JupyterHub API, you must pass an API token with
the request.
As of [version 0.6.0](./changelog.html), the preferred way of
generating an API token is:
```bash
openssl rand -hex 32
```
This `openssl` command generates a potential token that can then be
added to JupyterHub using `.api_tokens` configuration setting in
`jupyterhub_config.py`.
Alternatively, use the `jupyterhub token` command to generate a token
for a specific hub user by passing the 'username':
```bash
jupyterhub token <username>
```
This command generates a random string to use as a token and registers
it for the given user with the Hub's database.
In [version 0.8.0](./changelog.html), a TOKEN request page for
generating an API token is available from the JupyterHub user interface:
![Request API TOKEN page](images/api-token-request.png)
## Add API tokens to the config file
You may also add a dictionary of API tokens and usernames to the hub's
configuration file, `jupyterhub_config.py` (note that
the **key** is the 'secret-token' while the **value** is the 'username'):
```python
c.JupyterHub.api_tokens = {
'secret-token': 'username',
}
```
## Make an API request
To authenticate your requests, pass the API token in the request's
Authorization header.
### Use requests
Using the popular Python [requests](http://docs.python-requests.org/en/master/)
library, here's example code to make an API request for the users of a JupyterHub
deployment. An API GET request is made, and the request sends an API token for
authorization. The response contains information about the users:
```python
import requests
api_url = 'http://127.0.0.1:8081/hub/api'
r = requests.get(api_url + '/users',
headers={
'Authorization': 'token %s' % token,
}
)
r.raise_for_status()
users = r.json()
```
This example provides a slightly more complicated request, yet the
process is very similar:
```python
import requests
api_url = 'http://127.0.0.1:8081/hub/api'
data = {'name': 'mygroup', 'users': ['user1', 'user2']}
r = requests.post(api_url + '/groups/formgrade-data301/users',
headers={
'Authorization': 'token %s' % token,
},
json=data
)
r.raise_for_status()
r.json()
```
Note that the API token authorizes **JupyterHub** REST API requests. The same
token does **not** authorize access to the [Jupyter Notebook REST API][]
provided by notebook servers managed by JupyterHub. A different token is used
to access the **Jupyter Notebook** API.
## Learn more about the API
You can see the full [JupyterHub REST API][] for details. This REST API Spec can
be viewed in a more [interactive style on swagger's petstore][].
Both resources contain the same information and differ only in its display.
Note: The Swagger specification is being renamed the [OpenAPI Initiative][].
[interactive style on swagger's petstore]: http://petstore.swagger.io/?url=https://raw.githubusercontent.com/jupyterhub/jupyterhub/master/docs/rest-api.yml#!/default
[OpenAPI Initiative]: https://www.openapis.org/
[JupyterHub REST API]: ./_static/rest-api/index.html
[Jupyter Notebook REST API]: http://petstore.swagger.io/?url=https://raw.githubusercontent.com/jupyter/notebook/master/notebook/services/api/api.yaml

View File

@@ -0,0 +1,361 @@
# Services
With version 0.7, JupyterHub adds support for **Services**.
This section provides the following information about Services:
- [Definition of a Service](#definition-of-a-service)
- [Properties of a Service](#properties-of-a-service)
- [Hub-Managed Services](#hub-managed-services)
- [Launching a Hub-Managed Service](#launching-a-hub-managed-service)
- [Externally-Managed Services](#externally-managed-services)
- [Writing your own Services](#writing-your-own-services)
- [Hub Authentication and Services](#hub-authentication-and-services)
## Definition of a Service
When working with JupyterHub, a **Service** is defined as a process that interacts
with the Hub's REST API. A Service may perform a specific or
action or task. For example, the following tasks can each be a unique Service:
- shutting down individuals' single user notebook servers that have been idle
for some time
- registering additional web servers which should use the Hub's authentication
and be served behind the Hub's proxy.
Two key features help define a Service:
- Is the Service **managed** by JupyterHub?
- Does the Service have a web server that should be added to the proxy's
table?
Currently, these characteristics distinguish two types of Services:
- A **Hub-Managed Service** which is managed by JupyterHub
- An **Externally-Managed Service** which runs its own web server and
communicates operation instructions via the Hub's API.
## Properties of a Service
A Service may have the following properties:
- `name: str` - the name of the service
- `admin: bool (default - false)` - whether the service should have
administrative privileges
- `url: str (default - None)` - The URL where the service is/should be. If a
url is specified for where the Service runs its own web server,
the service will be added to the proxy at `/services/:name`
- `api_token: str (default - None)` - For Externally-Managed Services you need to specify
an API token to perform API requests to the Hub
If a service is also to be managed by the Hub, it has a few extra options:
- `command: (str/Popen list`) - Command for JupyterHub to spawn the service.
- Only use this if the service should be a subprocess.
- If command is not specified, the Service is assumed to be managed
externally.
- If a command is specified for launching the Service, the Service will
be started and managed by the Hub.
- `environment: dict` - additional environment variables for the Service.
- `user: str` - the name of a system user to manage the Service. If
unspecified, run as the same user as the Hub.
## Hub-Managed Services
A **Hub-Managed Service** is started by the Hub, and the Hub is responsible
for the Service's actions. A Hub-Managed Service can only be a local
subprocess of the Hub. The Hub will take care of starting the process and
restarts it if it stops.
While Hub-Managed Services share some similarities with notebook Spawners,
there are no plans for Hub-Managed Services to support the same spawning
abstractions as a notebook Spawner.
If you wish to run a Service in a Docker container or other deployment
environments, the Service can be registered as an
**Externally-Managed Service**, as described below.
## Launching a Hub-Managed Service
A Hub-Managed Service is characterized by its specified `command` for launching
the Service. For example, a 'cull idle' notebook server task configured as a
Hub-Managed Service would include:
- the Service name,
- admin permissions, and
- the `command` to launch the Service which will cull idle servers after a
timeout interval
This example would be configured as follows in `jupyterhub_config.py`:
```python
c.JupyterHub.services = [
{
'name': 'cull-idle',
'admin': True,
'command': ['python', '/path/to/cull-idle.py', '--timeout']
}
]
```
A Hub-Managed Service may also be configured with additional optional
parameters, which describe the environment needed to start the Service process:
- `environment: dict` - additional environment variables for the Service.
- `user: str` - name of the user to run the server if different from the Hub.
Requires Hub to be root.
- `cwd: path` directory in which to run the Service, if different from the
Hub directory.
The Hub will pass the following environment variables to launch the Service:
```bash
JUPYTERHUB_SERVICE_NAME: The name of the service
JUPYTERHUB_API_TOKEN: API token assigned to the service
JUPYTERHUB_API_URL: URL for the JupyterHub API (default, http://127.0.0.1:8080/hub/api)
JUPYTERHUB_BASE_URL: Base URL of the Hub (https://mydomain[:port]/)
JUPYTERHUB_SERVICE_PREFIX: URL path prefix of this service (/services/:service-name/)
JUPYTERHUB_SERVICE_URL: Local URL where the service is expected to be listening.
Only for proxied web services.
```
For the previous 'cull idle' Service example, these environment variables
would be passed to the Service when the Hub starts the 'cull idle' Service:
```bash
JUPYTERHUB_SERVICE_NAME: 'cull-idle'
JUPYTERHUB_API_TOKEN: API token assigned to the service
JUPYTERHUB_API_URL: http://127.0.0.1:8080/hub/api
JUPYTERHUB_BASE_URL: https://mydomain[:port]
JUPYTERHUB_SERVICE_PREFIX: /services/cull-idle/
```
See the JupyterHub GitHub repo for additional information about the
[`cull-idle` example](https://github.com/jupyterhub/jupyterhub/tree/master/examples/cull-idle).
## Externally-Managed Services
You may prefer to use your own service management tools, such as Docker or
systemd, to manage a JupyterHub Service. These **Externally-Managed
Services**, unlike Hub-Managed Services, are not subprocesses of the Hub. You
must tell JupyterHub which API token the Externally-Managed Service is using
to perform its API requests. Each Externally-Managed Service will need a
unique API token, because the Hub authenticates each API request and the API
token is used to identify the originating Service or user.
A configuration example of an Externally-Managed Service with admin access and
running its own web server is:
```python
c.JupyterHub.services = [
{
'name': 'my-web-service',
'url': 'https://10.0.1.1:1984',
'api_token': 'super-secret',
}
]
```
In this case, the `url` field will be passed along to the Service as
`JUPYTERHUB_SERVICE_URL`.
## Writing your own Services
When writing your own services, you have a few decisions to make (in addition
to what your service does!):
1. Does my service need a public URL?
2. Do I want JupyterHub to start/stop the service?
3. Does my service need to authenticate users?
When a Service is managed by JupyterHub, the Hub will pass the necessary
information to the Service via the environment variables described above. A
flexible Service, whether managed by the Hub or not, can make use of these
same environment variables.
When you run a service that has a url, it will be accessible under a
`/services/` prefix, such as `https://myhub.horse/services/my-service/`. For
your service to route proxied requests properly, it must take
`JUPYTERHUB_SERVICE_PREFIX` into account when routing requests. For example, a
web service would normally service its root handler at `'/'`, but the proxied
service would need to serve `JUPYTERHUB_SERVICE_PREFIX + '/'`.
## Hub Authentication and Services
JupyterHub 0.7 introduces some utilities for using the Hub's authentication
mechanism to govern access to your service. When a user logs into JupyterHub,
the Hub sets a **cookie (`jupyterhub-services`)**. The service can use this
cookie to authenticate requests.
JupyterHub ships with a reference implementation of Hub authentication that
can be used by services. You may go beyond this reference implementation and
create custom hub-authenticating clients and services. We describe the process
below.
The reference, or base, implementation is the [`HubAuth`][HubAuth] class,
which implements the requests to the Hub.
To use HubAuth, you must set the `.api_token`, either programmatically when constructing the class,
or via the `JUPYTERHUB_API_TOKEN` environment variable.
Most of the logic for authentication implementation is found in the
[`HubAuth.user_for_cookie`](services.auth.html#jupyterhub.services.auth.HubAuth.user_for_cookie)
method, which makes a request of the Hub, and returns:
- None, if no user could be identified, or
- a dict of the following form:
```python
{
"name": "username",
"groups": ["list", "of", "groups"],
"admin": False, # or True
}
```
You are then free to use the returned user information to take appropriate
action.
HubAuth also caches the Hub's response for a number of seconds,
configurable by the `cookie_cache_max_age` setting (default: five minutes).
### Flask Example
For example, you have a Flask service that returns information about a user.
JupyterHub's HubAuth class can be used to authenticate requests to the Flask
service. See the `service-whoami-flask` example in the
[JupyterHub GitHub repo](https://github.com/jupyterhub/jupyterhub/tree/master/examples/service-whoami-flask)
for more details.
```python
from functools import wraps
import json
import os
from urllib.parse import quote
from flask import Flask, redirect, request, Response
from jupyterhub.services.auth import HubAuth
prefix = os.environ.get('JUPYTERHUB_SERVICE_PREFIX', '/')
auth = HubAuth(
api_token=os.environ['JUPYTERHUB_API_TOKEN'],
cookie_cache_max_age=60,
)
app = Flask(__name__)
def authenticated(f):
"""Decorator for authenticating with the Hub"""
@wraps(f)
def decorated(*args, **kwargs):
cookie = request.cookies.get(auth.cookie_name)
if cookie:
user = auth.user_for_cookie(cookie)
else:
user = None
if user:
return f(user, *args, **kwargs)
else:
# redirect to login url on failed auth
return redirect(auth.login_url + '?next=%s' % quote(request.path))
return decorated
@app.route(prefix + '/')
@authenticated
def whoami(user):
return Response(
json.dumps(user, indent=1, sort_keys=True),
mimetype='application/json',
)
```
### Authenticating tornado services with JupyterHub
Since most Jupyter services are written with tornado,
we include a mixin class, [`HubAuthenticated`][HubAuthenticated],
for quickly authenticating your own tornado services with JupyterHub.
Tornado's `@web.authenticated` method calls a Handler's `.get_current_user`
method to identify the user. Mixing in `HubAuthenticated` defines
`get_current_user` to use HubAuth. If you want to configure the HubAuth
instance beyond the default, you'll want to define an `initialize` method,
such as:
```python
class MyHandler(HubAuthenticated, web.RequestHandler):
hub_users = {'inara', 'mal'}
def initialize(self, hub_auth):
self.hub_auth = hub_auth
@web.authenticated
def get(self):
...
```
The HubAuth will automatically load the desired configuration from the Service
environment variables.
If you want to limit user access, you can whitelist users through either the
`.hub_users` attribute or `.hub_groups`. These are sets that check against the
username and user group list, respectively. If a user matches neither the user
list nor the group list, they will not be allowed access. If both are left
undefined, then any user will be allowed.
### Implementing your own Authentication with JupyterHub
If you don't want to use the reference implementation
(e.g. you find the implementation a poor fit for your Flask app),
you can implement authentication via the Hub yourself.
We recommend looking at the [`HubAuth`][HubAuth] class implementation for reference,
and taking note of the following process:
1. retrieve the cookie `jupyterhub-services` from the request.
2. Make an API request `GET /hub/api/authorizations/cookie/jupyterhub-services/cookie-value`,
where cookie-value is the url-encoded value of the `jupyterhub-services` cookie.
This request must be authenticated with a Hub API token in the `Authorization` header.
For example, with [requests][]:
```python
r = requests.get(
'/'.join((["http://127.0.0.1:8081/hub/api",
"authorizations/cookie/jupyterhub-services",
quote(encrypted_cookie, safe=''),
]),
headers = {
'Authorization' : 'token %s' % api_token,
},
)
r.raise_for_status()
user = r.json()
```
3. On success, the reply will be a JSON model describing the user:
```json
{
"name": "inara",
"groups": ["serenity", "guild"],
}
```
An example of using an Externally-Managed Service and authentication is
in [nbviewer README]_ section on securing the notebook viewer,
and an example of its configuration is found [here](https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/base.py#L94).
nbviewer can also be run as a Hub-Managed Service as described [nbviewer README]_
section on securing the notebook viewer.
[requests]: http://docs.python-requests.org/en/master/
[services_auth]: api/services.auth.html
[HubAuth]: api/services.auth.html#jupyterhub.services.auth.HubAuth
[HubAuthenticated]: api/services.auth.html#jupyterhub.services.auth.HubAuthenticated
[nbviewer example]: https://github.com/jupyter/nbviewer#securing-the-notebook-viewer

View File

@@ -0,0 +1,213 @@
# Spawners
A [Spawner][] starts each single-user notebook server.
The Spawner represents an abstract interface to a process,
and a custom Spawner needs to be able to take three actions:
- start the process
- poll whether the process is still running
- stop the process
## Examples
Custom Spawners for JupyterHub can be found on the [JupyterHub wiki](https://github.com/jupyterhub/jupyterhub/wiki/Spawners).
Some examples include:
- [DockerSpawner](https://github.com/jupyterhub/dockerspawner) for spawning user servers in Docker containers
* `dockerspawner.DockerSpawner` for spawning identical Docker containers for
each users
* `dockerspawner.SystemUserSpawner` for spawning Docker containers with an
environment and home directory for each users
* both `DockerSpawner` and `SystemUserSpawner` also work with Docker Swarm for
launching containers on remote machines
- [SudoSpawner](https://github.com/jupyterhub/sudospawner) enables JupyterHub to
run without being root, by spawning an intermediate process via `sudo`
- [BatchSpawner](https://github.com/jupyterhub/batchspawner) for spawning remote
servers using batch systems
- [RemoteSpawner](https://github.com/zonca/remotespawner) to spawn notebooks
and a remote server and tunnel the port via SSH
## Spawner control methods
### Spawner.start
`Spawner.start` should start the single-user server for a single user.
Information about the user can be retrieved from `self.user`,
an object encapsulating the user's name, authentication, and server info.
The return value of `Spawner.start` should be the (ip, port) of the running server.
**NOTE:** When writing coroutines, *never* `yield` in between a database change and a commit.
Most `Spawner.start` functions will look similar to this example:
```python
def start(self):
self.ip = '127.0.0.1'
self.port = random_port()
yield self._actually_start_server_somehow()
return (self.ip, self.port)
```
When `Spawner.start` returns, the single-user server process should actually be running,
not just requested. JupyterHub can handle `Spawner.start` being very slow
(such as PBS-style batch queues, or instantiating whole AWS instances)
via relaxing the `Spawner.start_timeout` config value.
### Spawner.poll
`Spawner.poll` should check if the spawner is still running.
It should return `None` if it is still running,
and an integer exit status, otherwise.
For the local process case, `Spawner.poll` uses `os.kill(PID, 0)`
to check if the local process is still running.
### Spawner.stop
`Spawner.stop` should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
## Spawner state
JupyterHub should be able to stop and restart without tearing down
single-user notebook servers. To do this task, a Spawner may need to persist
some information that can be restored later.
A JSON-able dictionary of state can be used to store persisted information.
Unlike start, stop, and poll methods, the state methods must not be coroutines.
For the single-process case, the Spawner state is only the process ID of the server:
```python
def get_state(self):
"""get the current state"""
state = super().get_state()
if self.pid:
state['pid'] = self.pid
return state
def load_state(self, state):
"""load state from the database"""
super().load_state(state)
if 'pid' in state:
self.pid = state['pid']
def clear_state(self):
"""clear any state (called after shutdown)"""
super().clear_state()
self.pid = 0
```
## Spawner options form
(new in 0.4)
Some deployments may want to offer options to users to influence how their servers are started.
This may include cluster-based deployments, where users specify what resources should be available,
or docker-based deployments where users can select from a list of base images.
This feature is enabled by setting `Spawner.options_form`, which is an HTML form snippet
inserted unmodified into the spawn form.
If the `Spawner.options_form` is defined, when a user tries to start their server, they will be directed to a form page, like this:
![spawn-form](images/spawn-form.png)
If `Spawner.options_form` is undefined, the user's server is spawned directly, and no spawn page is rendered.
See [this example](https://github.com/jupyterhub/jupyterhub/blob/master/examples/spawn-form/jupyterhub_config.py) for a form that allows custom CLI args for the local spawner.
### `Spawner.options_from_form`
Options from this form will always be a dictionary of lists of strings, e.g.:
```python
{
'integer': ['5'],
'text': ['some text'],
'select': ['a', 'b'],
}
```
When `formdata` arrives, it is passed through `Spawner.options_from_form(formdata)`,
which is a method to turn the form data into the correct structure.
This method must return a dictionary, and is meant to interpret the lists-of-strings into the correct types. For example, the `options_from_form` for the above form would look like:
```python
def options_from_form(self, formdata):
options = {}
options['integer'] = int(formdata['integer'][0]) # single integer value
options['text'] = formdata['text'][0] # single string value
options['select'] = formdata['select'] # list already correct
options['notinform'] = 'extra info' # not in the form at all
return options
```
which would return:
```python
{
'integer': 5,
'text': 'some text',
'select': ['a', 'b'],
'notinform': 'extra info',
}
```
When `Spawner.start` is called, this dictionary is accessible as `self.user_options`.
[Spawner]: https://github.com/jupyterhub/jupyterhub/blob/master/jupyterhub/spawner.py
## Writing a custom spawner
If you are interested in building a custom spawner, you can read [this tutorial](http://jupyterhub-tutorial.readthedocs.io/en/latest/spawners.html).
## Spawners, resource limits, and guarantees (Optional)
Some spawners of the single-user notebook servers allow setting limits or
guarantees on resources, such as CPU and memory. To provide a consistent
experience for sysadmins and users, we provide a standard way to set and
discover these resource limits and guarantees, such as for memory and CPU. For
the limits and guarantees to be useful, the spawner must implement support for
them.
### Memory Limits & Guarantees
`c.Spawner.mem_limit`: A **limit** specifies the *maximum amount of memory*
that may be allocated, though there is no promise that the maximum amount will
be available. In supported spawners, you can set `c.Spawner.mem_limit` to
limit the total amount of memory that a single-user notebook server can
allocate. Attempting to use more memory than this limit will cause errors. The
single-user notebook server can discover its own memory limit by looking at
the environment variable `MEM_LIMIT`, which is specified in absolute bytes.
`c.Spawner.mem_guarantee`: Sometimes, a **guarantee** of a *minumum amount of
memory* is desirable. In this case, you can set `c.Spawner.mem_guarantee` to
to provide a guarantee that at minimum this much memory will always be
available for the single-user notebook server to use. The environment variable
`MEM_GUARANTEE` will also be set in the single-user notebook server.
The spawner's underlying system or cluster is responsible for enforcing these
limits and providing these guarantees. If these values are set to `None`, no
limits or guarantees are provided, and no environment values are set.
### CPU Limits & Guarantees
`c.Spawner.cpu_limit`: In supported spawners, you can set
`c.Spawner.cpu_limit` to limit the total number of cpu-cores that a
single-user notebook server can use. These can be fractional - `0.5` means 50%
of one CPU core, `4.0` is 4 cpu-cores, etc. This value is also set in the
single-user notebook server's environment variable `CPU_LIMIT`. The limit does
not claim that you will be able to use all the CPU up to your limit as other
higher priority applications might be taking up CPU.
`c.Spawner.cpu_guarantee`: You can set `c.Spawner.cpu_guarantee` to provide a
guarantee for CPU usage. The environment variable `CPU_GUARANTEE` will be set
in the single-user notebook server when a guarantee is being provided.
The spawner's underlying system or cluster is responsible for enforcing these
limits and providing these guarantees. If these values are set to `None`, no
limits or guarantees are provided, and no environment values are set.

View File

@@ -0,0 +1,133 @@
# Technical Overview
The **Technical Overview** section gives you a high-level view of:
- JupyterHub's Subsystems: Hub, Proxy, Single-User Notebook Server
- how the subsystems interact
- the process from JupyterHub access to user login
- JupyterHub's default behavior
- customizing JupyterHub
The goal of this section is to share a deeper technical understanding of
JupyterHub and how it works.
## The Subsystems: Hub, Proxy, Single-User Notebook Server
JupyterHub is a set of processes that together provide a single user Jupyter
Notebook server for each person in a group. Three major subsystems are started
by the `jupyterhub` command line program:
- **Hub** (Python/Tornado): manages user accounts, authentication, and
coordinates Single User Notebook Servers using a Spawner.
- **Proxy**: the public facing part of JupyterHub that uses a dynamic proxy
to route HTTP requests to the Hub and Single User Notebook Servers.
[configurable http proxy](https://github.com/jupyterhub/configurable-http-proxy)
(node-http-proxy) is the default proxy.
- **Single-User Notebook Server** (Python/Tornado): a dedicated,
single-user, Jupyter Notebook server is started for each user on the system
when the user logs in. The object that starts the single-user notebook
servers is called a **Spawner**.
![JupyterHub subsystems](images/jhub-parts.png)
## How the Subsystems Interact
Users access JupyterHub through a web browser, by going to the IP address or
the domain name of the server.
The basic principles of operation are:
- The Hub spawns the proxy (in the default JupyterHub configuration)
- The proxy forwards all requests to the Hub by default
- The Hub handles login, and spawns single-user notebook servers on demand
- The Hub configures the proxy to forward url prefixes to single-user notebook
servers
The proxy is the only process that listens on a public interface. The Hub sits
behind the proxy at `/hub`. Single-user servers sit behind the proxy at
`/user/[username]`.
Different **[authenticators](./authenticators.html)** control access
to JupyterHub. The default one (PAM) uses the user accounts on the server where
JupyterHub is running. If you use this, you will need to create a user account
on the system for each user on your team. Using other authenticators, you can
allow users to sign in with e.g. a GitHub account, or with any single-sign-on
system your organization has.
Next, **[spawners](./spawners.html)** control how JupyterHub starts
the individual notebook server for each user. The default spawner will
start a notebook server on the same machine running under their system username.
The other main option is to start each server in a separate container, often
using Docker.
## The Process from JupyterHub Access to User Login
When a user accesses JupyterHub, the following events take place:
- Login data is handed to the [Authenticator](./authenticators.html) instance for
validation
- The Authenticator returns the username if the login information is valid
- A single-user notebook server instance is [spawned](./spawners.html) for the
logged-in user
- When the single-user notebook server starts, the proxy is notified to forward
requests to `/user/[username]/*` to the single-user notebook server.
- A cookie is set on `/hub/`, containing an encrypted token. (Prior to version
0.8, a cookie for `/user/[username]` was used too.)
- The browser is redirected to `/user/[username]`, and the request is handled by
the single-user notebook server.
The single-user server identifies the user with the Hub via OAuth:
- on request, the single-user server checks a cookie
- if no cookie is set, redirect to the Hub for verification via OAuth
- after verification at the Hub, the browser is redirected back to the
single-user server
- the token is verified and stored in a cookie
- if no user is identified, the browser is redirected back to `/hub/login`
## Default Behavior
By default, the **Proxy** listens on all public interfaces on port 8000.
Thus you can reach JupyterHub through either:
- `http://localhost:8000`
- or any other public IP or domain pointing to your system.
In their default configuration, the other services, the **Hub** and
**Single-User Notebook Servers**, all communicate with each other on localhost
only.
By default, starting JupyterHub will write two files to disk in the current
working directory:
- `jupyterhub.sqlite` is the SQLite database containing all of the state of the
**Hub**. This file allows the **Hub** to remember which users are running and
where, as well as storing other information enabling you to restart parts of
JupyterHub separately. It is important to note that this database contains
**no** sensitive information other than **Hub** usernames.
- `jupyterhub_cookie_secret` is the encryption key used for securing cookies.
This file needs to persist so that a **Hub** server restart will avoid
invalidating cookies. Conversely, deleting this file and restarting the server
effectively invalidates all login cookies. The cookie secret file is discussed
in the [Cookie Secret section of the Security Settings document](./security-basics.html).
The location of these files can be specified via configuration settings. It is
recommended that these files be stored in standard UNIX filesystem locations,
such as `/etc/jupyterhub` for all configuration files and `/srv/jupyterhub` for
all security and runtime files.
## Customizing JupyterHub
There are two basic extension points for JupyterHub:
- How users are authenticated by [Authenticators](./authenticators.html)
- How user's single-user notebook server processes are started by
[Spawners](./spawners.html)
Each is governed by a customizable class, and JupyterHub ships with basic
defaults for each.
To enable custom authentication and/or spawning, subclass `Authenticator` or
`Spawner`, and override the relevant methods.

View File

@@ -0,0 +1,106 @@
# Upgrading JupyterHub and its database
From time to time, you may wish to upgrade JupyterHub to take advantage
of new releases. Much of this process is automated using scripts,
such as those generated by alembic for database upgrades. Before upgrading a
JupyterHub deployment, it's critical to backup your data and configurations
before shutting down the JupyterHub process and server.
## Databases: SQLite (default) or RDBMS (PostgreSQL, MySQL)
The default database for JupyterHub is a [SQLite](https://sqlite.org) database.
We have chosen SQLite as JupyterHub's default for its lightweight simplicity
in certain uses such as testing, small deployments and workshops.
When running a long term deployment or a production system, we recommend using
a traditional RDBMS database, such as [PostgreSQL](https://www.postgresql.org)
or [MySQL](https://www.mysql.com), that supports the SQL `ALTER TABLE`
statement.
For production systems, SQLite has some disadvantages when used with JupyterHub:
- `upgrade-db` may not work, and you may need to start with a fresh database
- `downgrade-db` **will not** work if you want to rollback to an earlier
version, so backup the `jupyterhub.sqlite` file before upgrading
The sqlite documentation provides a helpful page about [when to use sqlite and
where traditional RDBMS may be a better choice](https://sqlite.org/whentouse.html).
## The upgrade process
Five fundamental process steps are needed when upgrading JupyterHub and its
database:
1. Backup JupyterHub database
2. Backup JupyterHub configuration file
3. Shutdown the Hub
4. Upgrade JupyterHub
5. Upgrade the database using run `jupyterhub upgrade-db`
Let's take a closer look at each step in the upgrade process as well as some
additional information about JupyterHub databases.
### Backup JupyterHub database
To prevent unintended loss of data or configuration information, you should
back up the JupyterHub database (the default SQLite database or a RDBMS
database using PostgreSQL, MySQL, or others supported by SQLAlchemy):
- If using the default SQLite database, back up the `jupyterhub.sqlite`
database.
- If using an RDBMS database such as PostgreSQL, MySQL, or other supported by
SQLAlchemy, back up the JupyterHub database.
Losing the Hub database is often not a big deal. Information that resides only
in the Hub database includes:
- active login tokens (user cookies, service tokens)
- users added via GitHub UI, instead of config files
- info about running servers
If the following conditions are true, you should be fine clearing the Hub
database and starting over:
- users specified in config file
- user servers are stopped during upgrade
- don't mind causing users to login again after upgrade
### Backup JupyterHub configuration file
Additionally, backing up your configuration file, `jupyterhub_config.py`, to
a secure location.
### Shutdown JupyterHub
Prior to shutting down JupyterHub, you should notify the Hub users of the
scheduled downtime. This gives users the opportunity to finish any outstanding
work in process.
Next, shutdown the JupyterHub service.
### Upgrade JupyterHub
Follow directions that correspond to your package manager, `pip` or `conda`,
for the new JupyterHub release. These directions will guide you to the
specific command. In general, `pip install -U jupyterhub` or
`conda upgrade jupyterhub`
### Upgrade JupyterHub databases
To run the upgrade process for JupyterHub databases, enter:
```
jupyterhub upgrade-db
```
## Upgrade checklist
1. Backup JupyterHub database:
- `jupyterhub.sqlite` when using the default sqlite database
- Your JupyterHub database when using an RDBMS
2. Backup JupyterHub configuration file: `jupyterhub_config.py`
3. Shutdown the Hub
4. Upgrade JupyterHub
- `pip install -U jupyterhub` when using `pip`
- `conda upgrade jupyterhub` when using `conda`
5. Upgrade the database using run `jupyterhub upgrade-db`

View File

@@ -0,0 +1,112 @@
# Security Overview
The **Security Overview** section helps you learn about:
- the design of JupyterHub with respect to web security
- the semi-trusted user
- the available mitigations to protect untrusted users from each other
- the value of periodic security audits.
This overview also helps you obtain a deeper understanding of how JupyterHub
works.
## Semi-trusted and untrusted users
JupyterHub is designed to be a *simple multi-user server for modestly sized
groups* of **semi-trusted** users. While the design reflects serving semi-trusted
users, JupyterHub is not necessarily unsuitable for serving **untrusted** users.
Using JupyterHub with **untrusted** users does mean more work by the
administrator. Much care is required to secure a Hub, with extra caution on
protecting users from each other as the Hub is serving untrusted users.
One aspect of JupyterHub's *design simplicity* for **semi-trusted** users is that
the Hub and single-user servers are placed in a *single domain*, behind a
[*proxy*][configurable-http-proxy]. If the Hub is serving untrusted
users, many of the web's cross-site protections are not applied between
single-user servers and the Hub, or between single-user servers and each
other, since browsers see the whole thing (proxy, Hub, and single user
servers) as a single website (i.e. single domain).
## Protect users from each other
To protect users from each other, a user must **never** be able to write arbitrary
HTML and serve it to another user on the Hub's domain. JupyterHub's
authentication setup prevents a user writing arbitrary HTML and serving it to
another user because only the owner of a given single-user notebook server is
allowed to view user-authored pages served by the given single-user notebook
server.
To protect all users from each other, JupyterHub administrators must
ensure that:
* A user **does not have permission** to modify their single-user notebook server,
including:
- A user **may not** install new packages in the Python environment that runs
their single-user server.
- If the `PATH` is used to resolve the single-user executable (instead of
using an absolute path), a user **may not** create new files in any `PATH`
directory that precedes the directory containing `jupyterhub-singleuser`.
- A user may not modify environment variables (e.g. PATH, PYTHONPATH) for
their single-user server.
* A user **may not** modify the configuration of the notebook server
(the `~/.jupyter` or `JUPYTER_CONFIG_DIR` directory).
If any additional services are run on the same domain as the Hub, the services
**must never** display user-authored HTML that is neither *sanitized* nor *sandboxed*
(e.g. IFramed) to any user that lacks authentication as the author of a file.
## Mitigate security issues
Several approaches to mitigating these issues with configuration
options provided by JupyterHub include:
### Enable subdomains
JupyterHub provides the ability to run single-user servers on their own
subdomains. This means the cross-origin protections between servers has the
desired effect, and user servers and the Hub are protected from each other. A
user's single-user server will be at `username.jupyter.mydomain.com`. This also
requires all user subdomains to point to the same address, which is most easily
accomplished with wildcard DNS. Since this spreads the service across multiple
domains, you will need wildcard SSL, as well. Unfortunately, for many
institutional domains, wildcard DNS and SSL are not available. **If you do plan
to serve untrusted users, enabling subdomains is highly encouraged**, as it
resolves the cross-site issues.
### Disable user config
If subdomains are not available or not desirable, JupyterHub provides a a
configuration option `Spawner.disable_user_config`, which can be set to prevent
the user-owned configuration files from being loaded. After implementing this
option, PATHs and package installation and PATHs are the other things that the
admin must enforce.
### Prevent spawners from evaluating shell configuration files
For most Spawners, `PATH` is not something users can influence, but care should
be taken to ensure that the Spawner does *not* evaluate shell configuration
files prior to launching the server.
### Isolate packages using virtualenv
Package isolation is most easily handled by running the single-user server in
a virtualenv with disabled system-site-packages. The user should not have
permission to install packages into this environment.
It is important to note that the control over the environment only affects the
single-user server, and not the environment(s) in which the user's kernel(s)
may run. Installing additional packages in the kernel environment does not
pose additional risk to the web application's security.
## Security audits
We recommend that you do periodic reviews of your deployment's security. It's
good practice to keep JupyterHub, configurable-http-proxy, and nodejs
versions up to date.
A handy website for testing your deployment is
[Qualsys' SSL analyzer tool](https://www.ssllabs.com/ssltest/analyze.html).
[configurable-http-proxy]: https://github.com/jupyterhub/configurable-http-proxy