Merge branch 'master' into named_servers

Min RK
2017-07-25 18:29:01 +02:00
committed by GitHub
7 changed files with 134 additions and 149 deletions


@@ -4,7 +4,7 @@ Configuration Reference
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
howitworks technical-overview
websecurity websecurity
rest rest
authenticators authenticators


@@ -4,7 +4,6 @@ Getting Started
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 2
technical-overview
config-basics config-basics
networking-basics networking-basics
security-basics security-basics


@@ -1,77 +0,0 @@
# How JupyterHub works
JupyterHub is a multi-user server that manages and proxies multiple instances of the single-user Jupyter notebook server.
There are three basic processes involved:
- multi-user Hub (Python/Tornado)
- [configurable http proxy](https://github.com/jupyterhub/configurable-http-proxy) (node-http-proxy)
- multiple single-user IPython notebook servers (Python/IPython/Tornado)
The proxy is the only process that listens on a public interface.
The Hub sits behind the proxy at `/hub`.
Single-user servers sit behind the proxy at `/user/[username]`.
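As a rough illustration of the routing described above, the sketch below resolves a request path against a small prefix table. The `ROUTES` table, the backend addresses, the user name `alice`, and the `resolve` helper are all hypothetical, and longest-prefix matching is an assumption about how a dynamic proxy might pick a target, not a description of configurable-http-proxy's internals.

```python
# Hypothetical routing table: URL prefix -> backend address.
# The addresses and the user name "alice" are made up for illustration.
ROUTES = {
    "/": "http://127.0.0.1:8081",             # default route: the Hub
    "/user/alice": "http://127.0.0.1:54321",  # alice's single-user server
}


def resolve(path, routes=ROUTES):
    """Return the backend for the longest prefix that matches path."""
    matches = [
        prefix for prefix in routes
        if path == prefix or path.startswith(prefix.rstrip("/") + "/")
    ]
    # The default "/" route matches any path that starts with "/".
    return routes[max(matches, key=len)]
```

With this table, `/hub/login` falls through to the default route, while `/user/alice/tree` goes to the single-user server.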
## Logging in
When a new browser logs in to JupyterHub, the following events take place:
- Login data is handed to the [Authenticator](#authentication) instance for validation
- The Authenticator returns the username, if login information is valid
- A single-user server instance is [Spawned](#spawning) for the logged-in user
- When the server starts, the proxy is notified to forward `/user/[username]/*` to the single-user server
- Two cookies are set, one for `/hub/` and another for `/user/[username]`,
containing an encrypted token.
- The browser is redirected to `/user/[username]`, which is handled by the single-user server
Logging into a single-user server is authenticated via the Hub:
- On request, the single-user server forwards the encrypted cookie to the Hub for verification
- The Hub replies with the username if it is a valid cookie
- If the user is the owner of the server, access is allowed
- If it is the wrong user or an invalid cookie, the browser is redirected to `/hub/login`
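The access decision in the list above can be sketched as a single function. This is a minimal illustration, not JupyterHub's code: `check_access` and the `hub_tokens` mapping are hypothetical, with the mapping standing in for the request a single-user server would make to the Hub.

```python
def check_access(cookie_token, server_owner, hub_tokens):
    """Decide what a single-user server does with an incoming request.

    hub_tokens is a hypothetical stand-in for asking the Hub to verify a
    cookie: it maps valid tokens to usernames.
    """
    username = hub_tokens.get(cookie_token)  # None for an invalid cookie
    if username is not None and username == server_owner:
        return "allow"
    # Wrong user or invalid cookie: send the browser back to log in.
    return "redirect:/hub/login"
```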
## Customizing JupyterHub
There are two basic extension points for JupyterHub: How users are authenticated,
and how their server processes are started.
Each is governed by a customizable class,
and JupyterHub ships with just the most basic version of each.
To enable custom authentication and/or spawning,
subclass Authenticator or Spawner,
and override the relevant methods.
### Authentication
Authentication is customizable via the Authenticator class.
Authentication can be replaced by any mechanism,
such as OAuth, Kerberos, etc.
JupyterHub only ships with [PAM](https://en.wikipedia.org/wiki/Pluggable_authentication_module) authentication,
which requires the server to be run as root,
or at least with access to the PAM service,
which regular users typically do not have
(on Ubuntu, this requires being added to the `shadow` group).
[More info on custom Authenticators](authenticators.html).
See a list of custom Authenticators [on the wiki](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).
### Spawning
Each single-user server is started by a Spawner.
The Spawner represents an abstract interface to a process,
and needs to be able to take three actions:
1. start the process
2. poll whether the process is still running
3. stop the process
[More info on custom Spawners](spawners.html).
See a list of custom Spawners [on the wiki](https://github.com/jupyterhub/jupyterhub/wiki/Spawners).
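To make the three Spawner actions concrete, here is a toy sketch that manages a placeholder child process. `SleepProcessSpawner` is a hypothetical name and deliberately does not subclass JupyterHub's `Spawner`; it only mirrors the start/poll/stop shape, with `poll` returning `None` while the process is running and an exit status otherwise.

```python
import subprocess
import sys


class SleepProcessSpawner:
    """Toy stand-in for a Spawner; it does not subclass JupyterHub's class."""

    def __init__(self):
        self.proc = None

    def start(self):
        # Launch a placeholder child process instead of a notebook server.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"]
        )

    def poll(self):
        # None while the process is running, its exit status otherwise.
        if self.proc is None:
            return 0  # never started: report an "unknown" status of 0
        return self.proc.poll()

    def stop(self):
        # Terminate the child if it is still alive and wait for it to exit.
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
            self.proc.wait()
```

A real Spawner would start a single-user notebook server and report where it is listening; the sketch only shows the lifecycle.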


@@ -40,7 +40,6 @@ Contents
**Getting Started** **Getting Started**
* :doc:`technical-overview`
* :doc:`config-basics` * :doc:`config-basics`
* :doc:`networking-basics` * :doc:`networking-basics`
* :doc:`security-basics` * :doc:`security-basics`
@@ -50,7 +49,7 @@ Contents
**Configuration Reference** **Configuration Reference**
* :doc:`howitworks` * :doc:`technical-overview`
* :doc:`websecurity` * :doc:`websecurity`
* :doc:`rest` * :doc:`rest`
* :doc:`authenticators` * :doc:`authenticators`


@@ -2,7 +2,11 @@
## Platform support ## Platform support
JupyterHub is supported on Linux/Unix based systems. JupyterHub is supported on Linux/Unix based systems. To use JupyterHub, you need
a Unix server (typically Linux) running somewhere that is accessible to your
team on the network. The JupyterHub server can be on an internal network at your
organization, or it can run on the public internet (in which case, take care
with the Hub's [security](./security-basics.html)).
JupyterHub officially **does not** support Windows. You may be able to use JupyterHub officially **does not** support Windows. You may be able to use
JupyterHub on Windows if you use a Spawner and Authenticator that work on JupyterHub on Windows if you use a Spawner and Authenticator that work on
@@ -12,11 +16,13 @@ minor Windows compatibility issues (such as basic installation) **may** be accep
however. For Windows-based systems, we would recommend running JupyterHub in a however. For Windows-based systems, we would recommend running JupyterHub in a
docker container or Linux VM. docker container or Linux VM.
[Additional Reference:](http://www.tornadoweb.org/en/stable/#installation) Tornado's documentation on Windows platform support [Additional Reference:](http://www.tornadoweb.org/en/stable/#installation)
Tornado's documentation on Windows platform support
## Planning your installation ## Planning your installation
Prior to beginning installation, it's helpful to consider some of the following: Prior to beginning installation, it's helpful to consider some of the following:
- deployment system (bare metal, Docker) - deployment system (bare metal, Docker)
- Authentication (PAM, OAuth, etc.) - Authentication (PAM, OAuth, etc.)
- Spawner of singleuser notebook servers (Docker, Batch, etc.) - Spawner of singleuser notebook servers (Docker, Batch, etc.)
@@ -29,6 +35,6 @@ Prior to beginning installation, it's helpful to consider some of the following:
It is recommended to put all of the files used by JupyterHub into standard It is recommended to put all of the files used by JupyterHub into standard
UNIX filesystem locations. UNIX filesystem locations.
* `/srv/jupyterhub` for all security and runtime files - `/srv/jupyterhub` for all security and runtime files
* `/etc/jupyterhub` for all configuration files - `/etc/jupyterhub` for all configuration files
* `/var/log` for log files - `/var/log` for log files


@@ -1,82 +1,93 @@
## Technical Overview # Technical Overview
The **Technical Overview** section gives you a high-level view of:
- JupyterHub's Subsystems: Hub, Proxy, Single-User Notebook Server
- how the subsystems interact
- the process from JupyterHub access to user login
- JupyterHub's default behavior
- customizing JupyterHub
The goal of this section is to share a deeper technical understanding of
JupyterHub and how it works.
## The Subsystems: Hub, Proxy, Single-User Notebook Server
JupyterHub is a set of processes that together provide a single user Jupyter JupyterHub is a set of processes that together provide a single user Jupyter
Notebook server for each person in a group. Notebook server for each person in a group. Three major subsystems are started
by the `jupyterhub` command line program:
### Three subsystems - **Hub** (Python/Tornado): manages user accounts, authentication, and
Three major subsystems run by the `jupyterhub` command line program: coordinates Single User Notebook Servers using a Spawner.
- **Single-User Notebook Server**: a dedicated, single-user, Jupyter Notebook server is
started for each user on the system when the user logs in. The object that
starts these servers is called a **Spawner**.
- **Proxy**: the public facing part of JupyterHub that uses a dynamic proxy - **Proxy**: the public facing part of JupyterHub that uses a dynamic proxy
to route HTTP requests to the Hub and Single User Notebook Servers. to route HTTP requests to the Hub and Single User Notebook Servers.
- **Hub**: manages user accounts, authentication, and coordinates Single User [configurable http proxy](https://github.com/jupyterhub/configurable-http-proxy)
Notebook Servers using a Spawner. (node-http-proxy) is the default proxy.
- **Single-User Notebook Server** (Python/Tornado): a dedicated,
single-user, Jupyter Notebook server is started for each user on the system
when the user logs in. The object that starts the single-user notebook
servers is called a **Spawner**.
![JupyterHub subsystems](images/jhub-parts.png) ![JupyterHub subsystems](images/jhub-parts.png)
### Deployment server ## How the Subsystems Interact
To use JupyterHub, you need a Unix server (typically Linux) running somewhere
that is accessible to your team on the network. The JupyterHub server can be
on an internal network at your organization, or it can run on the public
internet (in which case, take care with the Hub's
[security](#security)).
### Basic operation
Users access JupyterHub through a web browser, by going to the IP address or Users access JupyterHub through a web browser, by going to the IP address or
the domain name of the server. the domain name of the server.
Basic principles of operation: The basic principles of operation are:
* Hub spawns proxy - The Hub spawns the proxy (in the default JupyterHub configuration)
* Proxy forwards all requests to hub by default - The proxy forwards all requests to the Hub by default
* Hub handles login, and spawns single-user servers on demand - The Hub handles login, and spawns single-user notebook servers on demand
* Hub configures proxy to forward url prefixes to single-user servers - The Hub configures the proxy to forward url prefixes to single-user notebook
servers
Different **[authenticators](authenticators.html)** control access The proxy is the only process that listens on a public interface. The Hub sits
behind the proxy at `/hub`. Single-user servers sit behind the proxy at
`/user/[username]`.
Different **[authenticators](./authenticators.html)** control access
to JupyterHub. The default one (PAM) uses the user accounts on the server where to JupyterHub. The default one (PAM) uses the user accounts on the server where
JupyterHub is running. If you use this, you will need to create a user account JupyterHub is running. If you use this, you will need to create a user account
on the system for each user on your team. Using other authenticators, you can on the system for each user on your team. Using other authenticators, you can
allow users to sign in with e.g. a GitHub account, or with any single-sign-on allow users to sign in with e.g. a GitHub account, or with any single-sign-on
system your organization has. system your organization has.
Next, **[spawners](spawners.html)** control how JupyterHub starts Next, **[spawners](./spawners.html)** control how JupyterHub starts
the individual notebook server for each user. The default spawner will the individual notebook server for each user. The default spawner will
start a notebook server on the same machine running under their system username. start a notebook server on the same machine running under their system username.
The other main option is to start each server in a separate container, often The other main option is to start each server in a separate container, often
using Docker. using Docker.
### Default behavior ## The Process from JupyterHub Access to User Login
**IMPORTANT: You should not run JupyterHub without SSL encryption on a public network.** When a user accesses JupyterHub, the following events take place:
See [Security documentation](#security) for how to configure JupyterHub to use SSL, - Login data is handed to the [Authenticator](./authenticators.html) instance for
or put it behind SSL termination in another proxy server, such as nginx. validation
- The Authenticator returns the username if the login information is valid
- A single-user notebook server instance is [spawned](./spawners.html) for the
logged-in user
- When the single-user notebook server starts, the proxy is notified to forward
requests to `/user/[username]/*` to the single-user notebook server.
- A cookie is set on `/hub/`, containing an encrypted token. (Prior to version
0.8, a cookie for `/user/[username]` was used too.)
- The browser is redirected to `/user/[username]`, and the request is handled by
the single-user notebook server.
--- The single-user server identifies the user with the Hub via OAuth:
**Deprecation note:** Removed `--no-ssl` in version 0.7. - on request, the single-user server checks a cookie
- if no cookie is set, redirect to the Hub for verification via OAuth
- after verification at the Hub, the browser is redirected back to the
single-user server
- the token is verified and stored in a cookie
- if no user is identified, the browser is redirected back to `/hub/login`
JupyterHub versions 0.5 and 0.6 require extra confirmation via `--no-ssl` to ## Default Behavior
allow running without SSL using the command `jupyterhub --no-ssl`. The
`--no-ssl` command line option is not needed anymore in version 0.7.
---
To start JupyterHub in its default configuration, type the following at the command line:
```bash
sudo jupyterhub
```
The default Authenticator that ships with JupyterHub authenticates users
with their system name and password (via [PAM][]).
Any user on the system with a password will be allowed to start a single-user notebook server.
The default Spawner starts servers locally as each user, one dedicated server per user.
These servers listen on localhost, and start in the given user's home directory.
By default, the **Proxy** listens on all public interfaces on port 8000. By default, the **Proxy** listens on all public interfaces on port 8000.
Thus you can reach JupyterHub through either: Thus you can reach JupyterHub through either:
@@ -84,21 +95,39 @@ Thus you can reach JupyterHub through either:
- `http://localhost:8000` - `http://localhost:8000`
- or any other public IP or domain pointing to your system. - or any other public IP or domain pointing to your system.
In their default configuration, the other services, the **Hub** and **Single-User Servers**, In their default configuration, the other services, the **Hub** and
all communicate with each other on localhost only. **Single-User Notebook Servers**, all communicate with each other on localhost
only.
By default, starting JupyterHub will write two files to disk in the current working directory: By default, starting JupyterHub will write two files to disk in the current
working directory:
- `jupyterhub.sqlite` is the sqlite database containing all of the state of the **Hub**. - `jupyterhub.sqlite` is the SQLite database containing all of the state of the
This file allows the **Hub** to remember what users are running and where, **Hub**. This file allows the **Hub** to remember which users are running and
as well as other information enabling you to restart parts of JupyterHub separately. It is where, as well as storing other information enabling you to restart parts of
important to note that this database contains *no* sensitive information other than **Hub** JupyterHub separately. It is important to note that this database contains
usernames. **no** sensitive information other than **Hub** usernames.
- `jupyterhub_cookie_secret` is the encryption key used for securing cookies. - `jupyterhub_cookie_secret` is the encryption key used for securing cookies.
This file needs to persist in order for restarting the Hub server to avoid invalidating cookies. This file needs to persist so that a **Hub** server restart will avoid
Conversely, deleting this file and restarting the server effectively invalidates all login cookies. invalidating cookies. Conversely, deleting this file and restarting the server
The cookie secret file is discussed in the [Cookie Secret documentation](#cookie-secret). effectively invalidates all login cookies. The cookie secret file is discussed
in the [Cookie Secret section of the Security Settings document](./security-basics.html).
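Why replacing the cookie secret invalidates existing cookies can be shown with a small signing sketch. The `sign` and `verify` helpers and the `value|signature` layout are assumptions chosen for illustration; JupyterHub's actual cookie format may differ.

```python
import hashlib
import hmac
import secrets


def sign(value, secret):
    """Return value with an HMAC-SHA256 signature appended."""
    sig = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}|{sig}"


def verify(cookie, secret):
    """Return the original value if the signature matches, else None."""
    value, _, sig = cookie.rpartition("|")
    expected = hmac.new(secret, value.encode(), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(sig, expected) else None


old_secret = secrets.token_bytes(32)
cookie = sign("alice", old_secret)
new_secret = secrets.token_bytes(32)  # what a regenerated secret file would hold
```

A cookie signed with `old_secret` verifies against `old_secret` but not against `new_secret`, which is the effect of deleting the secret file and restarting.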
The location of these files can be specified via configuration. The location of these files can be specified via configuration settings. It is
recommended that these files be stored in standard UNIX filesystem locations,
such as `/etc/jupyterhub` for all configuration files and `/srv/jupyterhub` for
all security and runtime files.
[PAM]: https://en.wikipedia.org/wiki/Pluggable_authentication_module ## Customizing JupyterHub
There are two basic extension points for JupyterHub:
- How users are authenticated by [Authenticators](./authenticators.html)
- How users' single-user notebook server processes are started by
[Spawners](./spawners.html)
Each is governed by a customizable class, and JupyterHub ships with basic
defaults for each.
To enable custom authentication and/or spawning, subclass `Authenticator` or
`Spawner`, and override the relevant methods.
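As a hedged sketch of what overriding the relevant method can look like, the toy class below checks credentials against an in-memory dictionary. `DictAuthenticator` and its `passwords` table are hypothetical; the class is standalone so the example runs without JupyterHub installed, whereas a real implementation would subclass `jupyterhub.auth.Authenticator`.

```python
import asyncio


class DictAuthenticator:
    """Standalone toy; a real one would subclass jupyterhub.auth.Authenticator."""

    # Hypothetical credentials, for illustration only.
    passwords = {"alice": "wonderland"}

    async def authenticate(self, handler, data):
        # Return the username on success, None on failure.
        username = data.get("username")
        if username in self.passwords and self.passwords[username] == data.get("password"):
            return username
        return None
```

For example, `asyncio.run(DictAuthenticator().authenticate(None, {"username": "alice", "password": "wonderland"}))` yields the username, and a wrong password yields `None`.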


@@ -6,7 +6,7 @@
import re import re
from datetime import timedelta from datetime import timedelta
from http.client import responses from http.client import responses
from urllib.parse import urlparse from urllib.parse import urlparse, urlunparse, parse_qs, urlencode
from jinja2 import TemplateNotFound from jinja2 import TemplateNotFound
@@ -20,7 +20,7 @@ from .. import __version__
from .. import orm from .. import orm
from ..objects import Server from ..objects import Server
from ..spawner import LocalProcessSpawner from ..spawner import LocalProcessSpawner
from ..utils import url_path_join from ..utils import url_path_join, DT_SCALE
# pattern for the authentication token header # pattern for the authentication token header
auth_header_pat = re.compile(r'^(?:token|bearer)\s+([^\s]+)$', flags=re.IGNORECASE) auth_header_pat = re.compile(r'^(?:token|bearer)\s+([^\s]+)$', flags=re.IGNORECASE)
@@ -535,6 +535,7 @@ class UserSpawnHandler(BaseHandler):
@gen.coroutine @gen.coroutine
def get(self, name, user_path): def get(self, name, user_path):
current_user = self.get_current_user() current_user = self.get_current_user()
if current_user and current_user.name == name: if current_user and current_user.name == name:
# If people visit /user/:name directly on the Hub, # If people visit /user/:name directly on the Hub,
# the redirects will just loop, because the proxy is bypassed. # the redirects will just loop, because the proxy is bypassed.
@@ -569,12 +570,40 @@ class UserSpawnHandler(BaseHandler):
return return
else: else:
yield self.spawn_single_user(current_user) yield self.spawn_single_user(current_user)
# We do exponential backoff here - since otherwise we can get stuck in a redirect loop!
# This is important in many distributed proxy implementations - those are often eventually
# consistent and can take up to a couple of seconds to actually apply throughout the cluster.
try:
redirects = int(self.get_argument('redirects', 0))
except ValueError:
self.log.warning("Invalid redirects argument %r", self.get_argument('redirects'))
redirects = 0
if redirects >= self.settings.get('user_redirect_limit', 5):
# We stop if we've been redirected too many times.
raise web.HTTPError(500, "Redirect loop detected.")
# set login cookie anew # set login cookie anew
self.set_login_cookie(current_user) self.set_login_cookie(current_user)
without_prefix = self.request.uri[len(self.hub.base_url):] without_prefix = self.request.uri[len(self.hub.base_url):]
target = url_path_join(self.base_url, without_prefix) target = url_path_join(self.base_url, without_prefix)
if self.subdomain_host: if self.subdomain_host:
target = current_user.host + target target = current_user.host + target
# record redirect count in query parameter
if redirects:
self.log.warning("Redirect loop detected on %s", self.request.uri)
yield gen.sleep(min(1 * (DT_SCALE ** redirects), 10))
# rewrite target url with new `redirects` query value
url_parts = urlparse(target)
query_parts = parse_qs(url_parts.query)
query_parts['redirects'] = redirects + 1
url_parts = url_parts._replace(query=urlencode(query_parts))
target = urlunparse(url_parts)
else:
target = url_concat(target, {'redirects': 1})
self.redirect(target) self.redirect(target)
self.statsd.incr('redirects.user_after_login') self.statsd.incr('redirects.user_after_login')
elif current_user: elif current_user:
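The redirect counting and capped backoff in the hunk above can be sketched in isolation. The helper names `backoff_delay` and `bump_redirects` are hypothetical, and the default `scale=2` is an assumed stand-in for `DT_SCALE`, whose value is not shown in this diff. The sketch passes `doseq=True` to `urlencode` because `parse_qs` returns list values.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def backoff_delay(redirects, scale=2, cap=10):
    """Seconds to wait before the next redirect, capped at cap."""
    return min(1 * (scale ** redirects), cap)


def bump_redirects(url):
    """Return url with its `redirects` query value incremented by one."""
    parts = urlparse(url)
    query = parse_qs(parts.query)  # values are lists, e.g. {'next': ['/x']}
    try:
        count = int(query.get("redirects", ["0"])[0])
    except ValueError:
        count = 0  # treat a malformed counter as "no redirects yet"
    query["redirects"] = [str(count + 1)]
    # doseq=True expands list values back into repeated key=value pairs.
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```

Under these assumptions the delay doubles with each redirect until it reaches the cap, and other query parameters survive the rewrite unchanged.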