Mirror of https://github.com/jupyterhub/jupyterhub.git (synced 2025-10-18 15:33:02 +00:00)
Merge branch 'master' into named_servers
@@ -4,7 +4,7 @@ Configuration Reference
.. toctree::
   :maxdepth: 2

   howitworks
   technical-overview
   websecurity
   rest
   authenticators

@@ -4,7 +4,6 @@ Getting Started
.. toctree::
   :maxdepth: 2

   technical-overview
   config-basics
   networking-basics
   security-basics

@@ -1,77 +0,0 @@
# How JupyterHub works

JupyterHub is a multi-user server that manages and proxies multiple instances of the single-user Jupyter notebook server.

There are three basic processes involved:

- multi-user Hub (Python/Tornado)
- [configurable http proxy](https://github.com/jupyterhub/configurable-http-proxy) (node-http-proxy)
- multiple single-user IPython notebook servers (Python/IPython/Tornado)

The proxy is the only process that listens on a public interface.
The Hub sits behind the proxy at `/hub`.
Single-user servers sit behind the proxy at `/user/[username]`.


## Logging in

When a new browser logs in to JupyterHub, the following events take place:

- Login data is handed to the [Authenticator](#authentication) instance for validation
- The Authenticator returns the username, if login information is valid
- A single-user server instance is [Spawned](#spawning) for the logged-in user
- When the server starts, the proxy is notified to forward `/user/[username]/*` to the single-user server
- Two cookies are set, one for `/hub/` and another for `/user/[username]`,
  containing an encrypted token.
- The browser is redirected to `/user/[username]`, which is handled by the single-user server

Logging into a single-user server is authenticated via the Hub:

- On request, the single-user server forwards the encrypted cookie to the Hub for verification
- The Hub replies with the username if it is a valid cookie
- If the user is the owner of the server, access is allowed
- If it is the wrong user or an invalid cookie, the browser is redirected to `/hub/login`


## Customizing JupyterHub

There are two basic extension points for JupyterHub: How users are authenticated,
and how their server processes are started.
Each is governed by a customizable class,
and JupyterHub ships with just the most basic version of each.

To enable custom authentication and/or spawning,
subclass Authenticator or Spawner,
and override the relevant methods.


### Authentication

Authentication is customizable via the Authenticator class.
Authentication can be replaced by any mechanism,
such as OAuth, Kerberos, etc.

JupyterHub only ships with [PAM](https://en.wikipedia.org/wiki/Pluggable_authentication_module) authentication,
which requires the server to be run as root,
or at least with access to the PAM service,
which regular users typically do not have
(on Ubuntu, this requires being added to the `shadow` group).

[More info on custom Authenticators](authenticators.html).

See a list of custom Authenticators [on the wiki](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).


### Spawning

Each single-user server is started by a Spawner.
The Spawner represents an abstract interface to a process,
and needs to be able to take three actions:

1. start the process
2. poll whether the process is still running
3. stop the process

[More info on custom Spawners](spawners.html).

See a list of custom Spawners [on the wiki](https://github.com/jupyterhub/jupyterhub/wiki/Spawners).

@@ -40,7 +40,6 @@ Contents

**Getting Started**

* :doc:`technical-overview`
* :doc:`config-basics`
* :doc:`networking-basics`
* :doc:`security-basics`
@@ -50,7 +49,7 @@ Contents

**Configuration Reference**

* :doc:`howitworks`
* :doc:`technical-overview`
* :doc:`websecurity`
* :doc:`rest`
* :doc:`authenticators`

@@ -2,7 +2,11 @@

## Platform support

JupyterHub is supported on Linux/Unix based systems.
JupyterHub is supported on Linux/Unix based systems. To use JupyterHub, you need
a Unix server (typically Linux) running somewhere that is accessible to your
team on the network. The JupyterHub server can be on an internal network at your
organization, or it can run on the public internet (in which case, take care
with the Hub's [security](./security-basics.html)).

JupyterHub officially **does not** support Windows. You may be able to use
JupyterHub on Windows if you use a Spawner and Authenticator that work on
@@ -12,11 +16,13 @@ minor Windows compatibility issues (such as basic installation) **may** be accep
however. For Windows-based systems, we would recommend running JupyterHub in a
docker container or Linux VM.

[Additional Reference:](http://www.tornadoweb.org/en/stable/#installation) Tornado's documentation on Windows platform support
[Additional Reference:](http://www.tornadoweb.org/en/stable/#installation)
Tornado's documentation on Windows platform support

## Planning your installation

Prior to beginning installation, it's helpful to consider some of the following:

- deployment system (bare metal, Docker)
- Authentication (PAM, OAuth, etc.)
- Spawner of singleuser notebook servers (Docker, Batch, etc.)
@@ -29,6 +35,6 @@ Prior to beginning installation, it's helpful to consider some of the following:
It is recommended to put all of the files used by JupyterHub into standard
UNIX filesystem locations.

* `/srv/jupyterhub` for all security and runtime files
* `/etc/jupyterhub` for all configuration files
* `/var/log` for log files
- `/srv/jupyterhub` for all security and runtime files
- `/etc/jupyterhub` for all configuration files
- `/var/log` for log files

@@ -1,82 +1,93 @@
## Technical Overview
# Technical Overview

The **Technical Overview** section gives you a high-level view of:

- JupyterHub's Subsystems: Hub, Proxy, Single-User Notebook Server
- how the subsystems interact
- the process from JupyterHub access to user login
- JupyterHub's default behavior
- customizing JupyterHub

The goal of this section is to share a deeper technical understanding of
JupyterHub and how it works.

## The Subsystems: Hub, Proxy, Single-User Notebook Server

JupyterHub is a set of processes that together provide a single user Jupyter
Notebook server for each person in a group.
Notebook server for each person in a group. Three major subsystems are started
by the `jupyterhub` command line program:

### Three subsystems
Three major subsystems run by the `jupyterhub` command line program:
- **Hub** (Python/Tornado): manages user accounts, authentication, and
  coordinates Single User Notebook Servers using a Spawner.

- **Single-User Notebook Server**: a dedicated, single-user, Jupyter Notebook server is
  started for each user on the system when the user logs in. The object that
  starts these servers is called a **Spawner**.
- **Proxy**: the public facing part of JupyterHub that uses a dynamic proxy
  to route HTTP requests to the Hub and Single User Notebook Servers.
- **Hub**: manages user accounts, authentication, and coordinates Single User
  Notebook Servers using a Spawner.
  [configurable http proxy](https://github.com/jupyterhub/configurable-http-proxy)
  (node-http-proxy) is the default proxy.

- **Single-User Notebook Server** (Python/Tornado): a dedicated,
  single-user, Jupyter Notebook server is started for each user on the system
  when the user logs in. The object that starts the single-user notebook
  servers is called a **Spawner**.

![JupyterHub subsystems](images/jhub-parts.png)

### Deployment server
## How the Subsystems Interact

To use JupyterHub, you need a Unix server (typically Linux) running somewhere
that is accessible to your team on the network. The JupyterHub server can be
on an internal network at your organization, or it can run on the public
internet (in which case, take care with the Hub's
[security](#security)).

### Basic operation
Users access JupyterHub through a web browser, by going to the IP address or
the domain name of the server.

Basic principles of operation:
The basic principles of operation are:

* Hub spawns proxy
* Proxy forwards all requests to hub by default
* Hub handles login, and spawns single-user servers on demand
* Hub configures proxy to forward url prefixes to single-user servers
- The Hub spawns the proxy (in the default JupyterHub configuration)
- The proxy forwards all requests to the Hub by default
- The Hub handles login, and spawns single-user notebook servers on demand
- The Hub configures the proxy to forward url prefixes to single-user notebook
  servers

Different **[authenticators](authenticators.html)** control access
The proxy is the only process that listens on a public interface. The Hub sits
behind the proxy at `/hub`. Single-user servers sit behind the proxy at
`/user/[username]`.
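
As a rough illustration of the routing described above, the sketch below models the proxy's routing table as a longest-prefix match. It is not JupyterHub or configurable-http-proxy code: the `RoutingTable` class, its method names, and the backend addresses are hypothetical. Only the "forward everything to the Hub by default" rule and the `/user/[username]` prefixes come from the text.

```python
class RoutingTable:
    """Toy model of a prefix-based routing table (hypothetical, for illustration)."""

    def __init__(self, default_target):
        # Every request goes to the default target (the Hub) unless a
        # more specific prefix has been registered.
        self.routes = {"/": default_target}

    def add_route(self, prefix, target):
        # Normalise so "/user/alice/" and "/user/alice" are the same route.
        self.routes[prefix.rstrip("/") or "/"] = target

    def resolve(self, path):
        # Longest matching prefix wins; match only on whole path segments,
        # so "/user/alice" does not capture "/user/alicex".
        best = "/"
        for prefix in self.routes:
            if prefix == "/":
                continue
            if path == prefix or path.startswith(prefix + "/"):
                if len(prefix) > len(best):
                    best = prefix
        return self.routes[best]


# Assumed addresses, chosen only for the example.
table = RoutingTable(default_target="http://127.0.0.1:8081")
table.add_route("/user/alice", "http://127.0.0.1:49153")

print(table.resolve("/hub/login"))        # no specific route: goes to the Hub
print(table.resolve("/user/alice/tree"))  # alice's single-user server
print(table.resolve("/user/bob/tree"))    # bob has no server yet: goes to the Hub
```

In this model, a request for a user whose server is not running falls through to the Hub, which can then handle login and spawning.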

Different **[authenticators](./authenticators.html)** control access
to JupyterHub. The default one (PAM) uses the user accounts on the server where
JupyterHub is running. If you use this, you will need to create a user account
on the system for each user on your team. Using other authenticators, you can
allow users to sign in with e.g. a GitHub account, or with any single-sign-on
system your organization has.

Next, **[spawners](spawners.html)** control how JupyterHub starts
Next, **[spawners](./spawners.html)** control how JupyterHub starts
the individual notebook server for each user. The default spawner will
start a notebook server on the same machine running under their system username.
The other main option is to start each server in a separate container, often
using Docker.

### Default behavior
## The Process from JupyterHub Access to User Login

**IMPORTANT: You should not run JupyterHub without SSL encryption on a public network.**
When a user accesses JupyterHub, the following events take place:

See [Security documentation](#security) for how to configure JupyterHub to use SSL,
or put it behind SSL termination in another proxy server, such as nginx.
- Login data is handed to the [Authenticator](./authenticators.html) instance for
  validation
- The Authenticator returns the username if the login information is valid
- A single-user notebook server instance is [spawned](./spawners.html) for the
  logged-in user
- When the single-user notebook server starts, the proxy is notified to forward
  requests to `/user/[username]/*` to the single-user notebook server.
- A cookie is set on `/hub/`, containing an encrypted token. (Prior to version
  0.8, a cookie for `/user/[username]` was used too.)
- The browser is redirected to `/user/[username]`, and the request is handled by
  the single-user notebook server.

---
The single-user server identifies the user with the Hub via OAuth:

**Deprecation note:** Removed `--no-ssl` in version 0.7.
- on request, the single-user server checks a cookie
- if no cookie is set, redirect to the Hub for verification via OAuth
- after verification at the Hub, the browser is redirected back to the
  single-user server
- the token is verified and stored in a cookie
- if no user is identified, the browser is redirected back to `/hub/login`
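
The decision steps in the OAuth list above can be sketched as a small pure function. This is a toy model under stated assumptions, not the single-user server's actual implementation: the cookie name `oauth-token`, the `lookup_user` callback, and the returned action strings are all hypothetical. Only the `/hub/login` redirect target is taken from the text.

```python
def handle_request(cookies, lookup_user):
    """Return the action a single-user server would take for one request.

    cookies:     dict of cookie name -> value sent by the browser
    lookup_user: callable mapping a token to a username, or None if the
                 token does not identify a user (stands in for the Hub)
    """
    token = cookies.get("oauth-token")  # hypothetical cookie name
    if token is None:
        # No cookie yet: send the browser to the Hub for OAuth verification.
        return "redirect:hub-oauth"
    user = lookup_user(token)
    if user is None:
        # Token did not identify a user: back to the Hub login page.
        return "redirect:/hub/login"
    return "serve:" + user


# A stand-in for the Hub's token database.
known_tokens = {"abc123": "alice"}

print(handle_request({}, known_tokens.get))
print(handle_request({"oauth-token": "stale"}, known_tokens.get))
print(handle_request({"oauth-token": "abc123"}, known_tokens.get))
```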

JupyterHub versions 0.5 and 0.6 require extra confirmation via `--no-ssl` to
allow running without SSL using the command `jupyterhub --no-ssl`. The
`--no-ssl` command line option is not needed anymore in version 0.7.

---

To start JupyterHub in its default configuration, type the following at the command line:

```bash
sudo jupyterhub
```

The default Authenticator that ships with JupyterHub authenticates users
with their system name and password (via [PAM][]).
Any user on the system with a password will be allowed to start a single-user notebook server.

The default Spawner starts servers locally as each user, one dedicated server per user.
These servers listen on localhost, and start in the given user's home directory.
## Default Behavior

By default, the **Proxy** listens on all public interfaces on port 8000.
Thus you can reach JupyterHub through either:
@@ -84,21 +95,39 @@ Thus you can reach JupyterHub through either:
- `http://localhost:8000`
- or any other public IP or domain pointing to your system.

In their default configuration, the other services, the **Hub** and **Single-User Servers**,
all communicate with each other on localhost only.
In their default configuration, the other services, the **Hub** and
**Single-User Notebook Servers**, all communicate with each other on localhost
only.

By default, starting JupyterHub will write two files to disk in the current working directory:
By default, starting JupyterHub will write two files to disk in the current
working directory:

- `jupyterhub.sqlite` is the sqlite database containing all of the state of the **Hub**.
  This file allows the **Hub** to remember what users are running and where,
  as well as other information enabling you to restart parts of JupyterHub separately. It is
  important to note that this database contains *no* sensitive information other than **Hub**
  usernames.
- `jupyterhub.sqlite` is the SQLite database containing all of the state of the
  **Hub**. This file allows the **Hub** to remember which users are running and
  where, as well as storing other information enabling you to restart parts of
  JupyterHub separately. It is important to note that this database contains
  **no** sensitive information other than **Hub** usernames.
- `jupyterhub_cookie_secret` is the encryption key used for securing cookies.
  This file needs to persist in order for restarting the Hub server to avoid invalidating cookies.
  Conversely, deleting this file and restarting the server effectively invalidates all login cookies.
  The cookie secret file is discussed in the [Cookie Secret documentation](#cookie-secret).
  This file needs to persist so that a **Hub** server restart will avoid
  invalidating cookies. Conversely, deleting this file and restarting the server
  effectively invalidates all login cookies. The cookie secret file is discussed
  in the [Cookie Secret section of the Security Settings document](./security-basics.html).

The location of these files can be specified via configuration.
The location of these files can be specified via configuration settings. It is
recommended that these files be stored in standard UNIX filesystem locations,
such as `/etc/jupyterhub` for all configuration files and `/srv/jupyterhub` for
all security and runtime files.

[PAM]: https://en.wikipedia.org/wiki/Pluggable_authentication_module
## Customizing JupyterHub

There are two basic extension points for JupyterHub:

- How users are authenticated by [Authenticators](./authenticators.html)
- How user's single-user notebook server processes are started by
  [Spawners](./spawners.html)

Each is governed by a customizable class, and JupyterHub ships with basic
defaults for each.

To enable custom authentication and/or spawning, subclass `Authenticator` or
`Spawner`, and override the relevant methods.
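
To make the subclassing step concrete, here is a minimal sketch of a custom authenticator that overrides `authenticate`, which returns the username on success and `None` otherwise. The `DictAuthenticator` class and its `passwords` table are hypothetical and for illustration only. The fallback base class is a stand-in so the snippet also runs where JupyterHub is not installed, and the use of `async def` (rather than a Tornado coroutine) is an assumption about the installed version.

```python
import asyncio
import hmac

try:
    from jupyterhub.auth import Authenticator
except ImportError:
    class Authenticator:  # minimal stand-in so the sketch runs without JupyterHub
        pass


class DictAuthenticator(Authenticator):
    """Hypothetical example: check logins against an in-memory dict."""

    # Illustration only: real deployments should never store plaintext passwords.
    passwords = {"alice": "correct horse"}

    async def authenticate(self, handler, data):
        # `data` carries the submitted login form fields.
        expected = self.passwords.get(data["username"])
        if expected is not None and hmac.compare_digest(
            expected.encode(), data["password"].encode()
        ):
            return data["username"]
        return None  # login rejected


auth = DictAuthenticator()
result = asyncio.run(
    auth.authenticate(None, {"username": "alice", "password": "correct horse"})
)
print(result)
```

A class like this would typically be selected in the JupyterHub configuration file; a custom `Spawner` follows the same pattern, overriding its start, poll, and stop methods.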

@@ -6,7 +6,7 @@
import re
from datetime import timedelta
from http.client import responses
from urllib.parse import urlparse
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode

from jinja2 import TemplateNotFound

@@ -20,7 +20,7 @@ from .. import __version__
from .. import orm
from ..objects import Server
from ..spawner import LocalProcessSpawner
from ..utils import url_path_join
from ..utils import url_path_join, DT_SCALE

# pattern for the authentication token header
auth_header_pat = re.compile(r'^(?:token|bearer)\s+([^\s]+)$', flags=re.IGNORECASE)
@@ -535,6 +535,7 @@ class UserSpawnHandler(BaseHandler):
    @gen.coroutine
    def get(self, name, user_path):
        current_user = self.get_current_user()

        if current_user and current_user.name == name:
            # If people visit /user/:name directly on the Hub,
            # the redirects will just loop, because the proxy is bypassed.
@@ -569,12 +570,40 @@ class UserSpawnHandler(BaseHandler):
                    return
                else:
                    yield self.spawn_single_user(current_user)

            # We do exponential backoff here - since otherwise we can get stuck in a redirect loop!
            # This is important in many distributed proxy implementations - those are often eventually
            # consistent and can take up to a couple of seconds to actually apply throughout the cluster.
            try:
                redirects = int(self.get_argument('redirects', 0))
            except ValueError:
                self.log.warning("Invalid redirects argument %r", self.get_argument('redirects'))
                redirects = 0

            if redirects >= self.settings.get('user_redirect_limit', 5):
                # We stop if we've been redirected too many times.
                raise web.HTTPError(500, "Redirect loop detected.")

            # set login cookie anew
            self.set_login_cookie(current_user)
            without_prefix = self.request.uri[len(self.hub.base_url):]
            target = url_path_join(self.base_url, without_prefix)
            if self.subdomain_host:
                target = current_user.host + target

            # record redirect count in query parameter
            if redirects:
                self.log.warning("Redirect loop detected on %s", self.request.uri)
                yield gen.sleep(min(1 * (DT_SCALE ** redirects), 10))
                # rewrite target url with new `redirects` query value
                url_parts = urlparse(target)
                query_parts = parse_qs(url_parts.query)
                query_parts['redirects'] = redirects + 1
                url_parts = url_parts._replace(query=urlencode(query_parts))
                target = urlunparse(url_parts)
            else:
                target = url_concat(target, {'redirects': 1})

            self.redirect(target)
            self.statsd.incr('redirects.user_after_login')
        elif current_user:
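
The capped exponential backoff and the `redirects` query rewrite in the hunk above can be tried in isolation with the standalone sketch below. It is not the handler code: `backoff_delay` and `bump_redirects` are hypothetical helper names, and `DT_SCALE = 2` is an assumed value, since the diff imports `DT_SCALE` without showing its definition. One deliberate difference is `doseq=True`: `parse_qs` returns list values, and `urlencode` would otherwise encode the string form of each list rather than its items.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

DT_SCALE = 2        # assumed value; not shown in the diff
MAX_DELAY = 10      # seconds; the cap used in the handler
REDIRECT_LIMIT = 5  # the handler's default for 'user_redirect_limit'


def backoff_delay(redirects):
    """Seconds to wait before issuing redirect number `redirects`."""
    return min(1 * (DT_SCALE ** redirects), MAX_DELAY)


def bump_redirects(url, redirects):
    """Return `url` with its `redirects` query value set to redirects + 1."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["redirects"] = [str(redirects + 1)]
    # doseq=True encodes each list item instead of the list's repr.
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


# Walk through the redirect counts the handler accepts before giving up.
# (The handler itself only sleeps when the count is non-zero.)
for n in range(1, REDIRECT_LIMIT):
    print(n, backoff_delay(n), bump_redirects("/user/alice/tree?x=1", n))
```

Under the assumed scale of 2, the delays grow as 2, 4, and 8 seconds and are then capped at 10, so a request that keeps looping waits progressively longer before the redirect limit stops it.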