Merge branch 'master' into named_servers

2025-10-18 15:33:02 +00:00 · 2017-07-25 18:29:01 +02:00
parent e15a6bb758 f364f8e832
commit 194d6c9d4c
7 changed files with 134 additions and 149 deletions
--- a/docs/source/configuration-guide.rst
+++ b/docs/source/configuration-guide.rst
@@ -4,7 +4,7 @@ Configuration Reference
 .. toctree::
   :maxdepth: 2

-   howitworks
+   technical-overview
   websecurity
   rest
   authenticators
--- a/docs/source/getting-started.rst
+++ b/docs/source/getting-started.rst
@@ -4,7 +4,6 @@ Getting Started
 .. toctree::
   :maxdepth: 2

-   technical-overview
   config-basics
   networking-basics
   security-basics
--- a/docs/source/howitworks.md
+++ b/docs/source/howitworks.md
@@ -1,77 +0,0 @@
-# How JupyterHub works
-
-JupyterHub is a multi-user server that manages and proxies multiple instances of the single-user Jupyter notebook server.
-
-There are three basic processes involved:
-
-- multi-user Hub (Python/Tornado)
-- [configurable http proxy](https://github.com/jupyterhub/configurable-http-proxy) (node-http-proxy)
-- multiple single-user IPython notebook servers (Python/IPython/Tornado)
-
-The proxy is the only process that listens on a public interface.
-The Hub sits behind the proxy at `/hub`.
-Single-user servers sit behind the proxy at `/user/[username]`.
-
-
-## Logging in
-
-When a new browser logs in to JupyterHub, the following events take place:
-
-- Login data is handed to the [Authenticator](#authentication) instance for validation
-- The Authenticator returns the username, if login information is valid
-- A single-user server instance is [Spawned](#spawning) for the logged-in user
-- When the server starts, the proxy is notified to forward `/user/[username]/*` to the single-user server
-- Two cookies are set, one for `/hub/` and another for `/user/[username]`,
-  containing an encrypted token.
-- The browser is redirected to `/user/[username]`, which is handled by the single-user server
-
-Logging into a single-user server is authenticated via the Hub:
-
-- On request, the single-user server forwards the encrypted cookie to the Hub for verification
-- The Hub replies with the username if it is a valid cookie
-- If the user is the owner of the server, access is allowed
-- If it is the wrong user or an invalid cookie, the browser is redirected to `/hub/login`
-
-
-## Customizing  JupyterHub
-
-There are two basic extension points for JupyterHub: How users are authenticated,
-and how their server processes are started.
-Each is governed by a customizable class,
-and JupyterHub ships with just the most basic version of each.
-
-To enable custom authentication and/or spawning,
-subclass Authenticator or Spawner,
-and override the relevant methods.
-
-
-### Authentication
-
-Authentication is customizable via the Authenticator class.
-Authentication can be replaced by any mechanism,
-such as OAuth, Kerberos, etc.
-
-JupyterHub only ships with [PAM](https://en.wikipedia.org/wiki/Pluggable_authentication_module) authentication,
-which requires the server to be run as root,
-or at least with access to the PAM service,
-which regular users typically do not have
-(on Ubuntu, this requires being added to the `shadow` group).
-
-[More info on custom Authenticators](authenticators.html).
-
-See a list of custom Authenticators [on the wiki](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).
-
-
-### Spawning
-
-Each single-user server is started by a Spawner.
-The Spawner represents an abstract interface to a process,
-and needs to be able to take three actions:
-
-1. start the process
-2. poll whether the process is still running
-3. stop the process
-
-[More info on custom Spawners](spawners.html).
-
-See a list of custom Spawners [on the wiki](https://github.com/jupyterhub/jupyterhub/wiki/Spawners).
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -40,7 +40,6 @@ Contents

 **Getting Started**

-* :doc:`technical-overview`
 * :doc:`config-basics`
 * :doc:`networking-basics`
 * :doc:`security-basics`
@@ -50,7 +49,7 @@ Contents

 **Configuration Reference**

-* :doc:`howitworks`
+* :doc:`technical-overview`
 * :doc:`websecurity`
 * :doc:`rest`
 * :doc:`authenticators`
--- a/docs/source/installation-basics.md
+++ b/docs/source/installation-basics.md
@@ -2,7 +2,11 @@

 ## Platform support

-JupyterHub is supported on Linux/Unix based systems.
+JupyterHub is supported on Linux/Unix-based systems. To use JupyterHub, you need
+a Unix server (typically Linux) running somewhere that is accessible to your
+team on the network. The JupyterHub server can be on an internal network at your
+organization, or it can run on the public internet (in which case, take care
+with the Hub's [security](./security-basics.html)).

 JupyterHub officially **does not** support Windows. You may be able to use
 JupyterHub on Windows if you use a Spawner and Authenticator that work on
@@ -12,11 +16,13 @@ minor Windows compatibility issues (such as basic installation) **may** be accep
 however. For Windows-based systems, we would recommend running JupyterHub in a
 docker container or Linux VM.

-[Additional Reference:](http://www.tornadoweb.org/en/stable/#installation) Tornado's documentation on Windows platform support
+[Additional Reference:](http://www.tornadoweb.org/en/stable/#installation)
+Tornado's documentation on Windows platform support

 ## Planning your installation

 Prior to beginning installation, it's helpful to consider some of the following:
+
 - deployment system (bare metal, Docker)
 - Authentication (PAM, OAuth, etc.)
 - Spawner of singleuser notebook servers (Docker, Batch, etc.)
@@ -29,6 +35,6 @@ Prior to beginning installation, it's helpful to consider some of the following:
 It is recommended to put all of the files used by JupyterHub into standard
 UNIX filesystem locations.

-* `/srv/jupyterhub` for all security and runtime files
-* `/etc/jupyterhub` for all configuration files
-* `/var/log` for log files
+- `/srv/jupyterhub` for all security and runtime files
+- `/etc/jupyterhub` for all configuration files
+- `/var/log` for log files
--- a/docs/source/technical-overview.md
+++ b/docs/source/technical-overview.md
@@ -1,82 +1,93 @@
-## Technical Overview
+# Technical Overview
+
+The **Technical Overview** section gives you a high-level view of:
+
+- JupyterHub's Subsystems: Hub, Proxy, Single-User Notebook Server
+- how the subsystems interact
+- the process from JupyterHub access to user login
+- JupyterHub's default behavior
+- customizing JupyterHub
+
+The goal of this section is to share a deeper technical understanding of
+JupyterHub and how it works.
+
+## The Subsystems: Hub, Proxy, Single-User Notebook Server

 JupyterHub is a set of processes that together provide a single user Jupyter
-Notebook server for each person in a group.
+Notebook server for each person in a group. Three major subsystems are started
+by the `jupyterhub` command line program:

-### Three subsystems
-Three major subsystems run by the `jupyterhub` command line program:
+- **Hub** (Python/Tornado): manages user accounts, authentication, and
+  coordinates Single User Notebook Servers using a Spawner.

-- **Single-User Notebook Server**: a dedicated, single-user, Jupyter Notebook server is
-  started for each user on the system when the user logs in. The object that
-  starts these servers is called a **Spawner**.
 - **Proxy**: the public facing part of JupyterHub that uses a dynamic proxy
  to route HTTP requests to the Hub and Single User Notebook Servers.
-- **Hub**: manages user accounts, authentication, and coordinates Single User
-  Notebook Servers using a Spawner.
+  [configurable http proxy](https://github.com/jupyterhub/configurable-http-proxy)
+  (node-http-proxy) is the default proxy.
+
+- **Single-User Notebook Server** (Python/Tornado): a dedicated,
+  single-user, Jupyter Notebook server is started for each user on the system
+  when the user logs in. The object that starts the single-user notebook
+  servers is called a **Spawner**.

 ![JupyterHub subsystems](images/jhub-parts.png)

-### Deployment server
+## How the Subsystems Interact

-To use JupyterHub, you need a Unix server (typically Linux) running somewhere
-that is accessible to your team on the network. The JupyterHub server can be
-on an internal network at your organization, or it can run on the public
-internet (in which case, take care with the Hub's
-[security](#security)).
-
-### Basic operation
 Users access JupyterHub through a web browser, by going to the IP address or
 the domain name of the server.

-Basic principles of operation:
+The basic principles of operation are:

-* Hub spawns proxy
-* Proxy forwards all requests to hub by default
-* Hub handles login, and spawns single-user servers on demand
-* Hub configures proxy to forward url prefixes to single-user servers
+- The Hub spawns the proxy (in the default JupyterHub configuration)
+- The proxy forwards all requests to the Hub by default
+- The Hub handles login, and spawns single-user notebook servers on demand
+- The Hub configures the proxy to forward URL prefixes to single-user notebook
+  servers

-Different **[authenticators](authenticators.html)** control access
+The proxy is the only process that listens on a public interface. The Hub sits
+behind the proxy at `/hub`. Single-user servers sit behind the proxy at
+`/user/[username]`.
+
+Different **[authenticators](./authenticators.html)** control access
 to JupyterHub. The default one (PAM) uses the user accounts on the server where
 JupyterHub is running. If you use this, you will need to create a user account
 on the system for each user on your team. Using other authenticators, you can
 allow users to sign in with e.g. a GitHub account, or with any single-sign-on
 system your organization has.

-Next, **[spawners](spawners.html)** control how JupyterHub starts
+Next, **[spawners](./spawners.html)** control how JupyterHub starts
 the individual notebook server for each user. The default spawner will
 start a notebook server on the same machine running under their system username.
 The other main option is to start each server in a separate container, often
 using Docker.

-### Default behavior
+## The Process from JupyterHub Access to User Login

-**IMPORTANT: You should not run JupyterHub without SSL encryption on a public network.**
+When a user accesses JupyterHub, the following events take place:

-See [Security documentation](#security) for how to configure JupyterHub to use SSL,
-or put it behind SSL termination in another proxy server, such as nginx.
+- Login data is handed to the [Authenticator](./authenticators.html) instance for
+  validation
+- The Authenticator returns the username if the login information is valid
+- A single-user notebook server instance is [spawned](./spawners.html) for the
+  logged-in user
+- When the single-user notebook server starts, the proxy is notified to forward
+  requests to `/user/[username]/*` to the single-user notebook server.
+- A cookie is set on `/hub/`, containing an encrypted token. (Prior to version
+  0.8, a cookie for `/user/[username]` was used too.)
+- The browser is redirected to `/user/[username]`, and the request is handled by
+  the single-user notebook server.

----
+The single-user server identifies the user with the Hub via OAuth:

-**Deprecation note:** Removed `--no-ssl` in version 0.7.
+- On request, the single-user server checks for a cookie
+- If no cookie is set, the browser is redirected to the Hub for OAuth verification
+- After verification at the Hub, the browser is redirected back to the
+  single-user server
+- The token is verified and stored in a cookie
+- If no user is identified, the browser is redirected back to `/hub/login`

-JupyterHub versions 0.5 and 0.6 require extra confirmation via `--no-ssl` to
-allow running without SSL using the command `jupyterhub --no-ssl`. The
-`--no-ssl` command line option is not needed anymore in version 0.7.
-
----
-
-To start JupyterHub in its default configuration, type the following at the command line:
-
-```bash
-    sudo jupyterhub
-```
-
-The default Authenticator that ships with JupyterHub authenticates users
-with their system name and password (via [PAM][]).
-Any user on the system with a password will be allowed to start a single-user notebook server.
-
-The default Spawner starts servers locally as each user, one dedicated server per user.
-These servers listen on localhost, and start in the given user's home directory.
+## Default Behavior

 By default, the **Proxy** listens on all public interfaces on port 8000.
 Thus you can reach JupyterHub through either:
@@ -84,21 +95,39 @@ Thus you can reach JupyterHub through either:
 - `http://localhost:8000`
 - or any other public IP or domain pointing to your system.

-In their default configuration, the other services, the **Hub** and **Single-User Servers**,
-all communicate with each other on localhost only.
+In their default configuration, the other services, the **Hub** and
+**Single-User Notebook Servers**, all communicate with each other on localhost
+only.

-By default, starting JupyterHub will write two files to disk in the current working directory:
+By default, starting JupyterHub will write two files to disk in the current
+working directory:

-- `jupyterhub.sqlite` is the sqlite database containing all of the state of the **Hub**.
-  This file allows the **Hub** to remember what users are running and where,
-  as well as other information enabling you to restart parts of JupyterHub separately. It is
-  important to note that this database contains *no* sensitive information other than **Hub**
-  usernames.
+- `jupyterhub.sqlite` is the SQLite database containing all of the state of the
+  **Hub**. This file allows the **Hub** to remember which users are running and
+  where, as well as storing other information enabling you to restart parts of
+  JupyterHub separately. It is important to note that this database contains
+  **no** sensitive information other than **Hub** usernames.
 - `jupyterhub_cookie_secret` is the encryption key used for securing cookies.
-  This file needs to persist in order for restarting the Hub server to avoid invalidating cookies.
-  Conversely, deleting this file and restarting the server effectively invalidates all login cookies.
-  The cookie secret file is discussed in the [Cookie Secret documentation](#cookie-secret).
+  This file needs to persist so that a **Hub** server restart will avoid
+  invalidating cookies. Conversely, deleting this file and restarting the server
+  effectively invalidates all login cookies. The cookie secret file is discussed
+  in the [Cookie Secret section of the Security Settings document](./security-basics.html).

-The location of these files can be specified via configuration.
+The location of these files can be specified via configuration settings. It is
+recommended that these files be stored in standard UNIX filesystem locations,
+such as `/etc/jupyterhub` for all configuration files and `/srv/jupyterhub` for
+all security and runtime files.

-[PAM]: https://en.wikipedia.org/wiki/Pluggable_authentication_module
+## Customizing JupyterHub
+
+There are two basic extension points for JupyterHub:
+
+- How users are authenticated by [Authenticators](./authenticators.html)
+- How users' single-user notebook server processes are started by
+  [Spawners](./spawners.html)
+
+Each is governed by a customizable class, and JupyterHub ships with basic
+defaults for each.
+
+To enable custom authentication and/or spawning, subclass `Authenticator` or
+`Spawner`, and override the relevant methods.
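The routing described in the new technical overview (the proxy forwards everything to the Hub by default, and the Hub registers `/user/[username]` prefixes as single-user servers start) behaves like longest-prefix matching. The following is a minimal, hypothetical sketch of that idea only; it is not configurable-http-proxy's implementation, and the `RoutingTable` class, the trailing-slash normalization, and the example target URLs are assumptions made for illustration.

```python
class RoutingTable:
    """Hypothetical longest-prefix routing table (illustration only)."""

    def __init__(self, default_target):
        # "/" matches every path, so unrouted requests fall through to the Hub.
        self.routes = {"/": default_target}

    @staticmethod
    def _normalize(prefix):
        # Store prefixes with a trailing slash so "/user/alice" matches
        # "/user/alice/tree" but not "/user/alicex".
        return prefix.rstrip("/") + "/"

    def add_route(self, prefix, target):
        # In this sketch, called when a single-user server starts.
        self.routes[self._normalize(prefix)] = target

    def remove_route(self, prefix):
        # In this sketch, called when a single-user server stops.
        self.routes.pop(self._normalize(prefix), None)

    def target_for(self, path):
        # Compare with a trailing slash, then pick the longest matching prefix.
        probe = path if path.endswith("/") else path + "/"
        best = max((p for p in self.routes if probe.startswith(p)),
                   key=len, default="/")
        return self.routes[best]
```

For example, after `add_route("/user/alice", ...)`, a request for `/user/alice/tree` goes to Alice's server, while `/hub/login` and `/user/bob/tree` still reach the Hub, which in the flow above would handle login and spawn Bob's server before adding his route.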
--- a/jupyterhub/handlers/base.py
+++ b/jupyterhub/handlers/base.py
@@ -6,7 +6,7 @@
 import re
 from datetime import timedelta
 from http.client import responses
-from urllib.parse import urlparse
+from urllib.parse import urlparse, urlunparse, parse_qs, urlencode

 from jinja2 import TemplateNotFound

@@ -20,7 +20,7 @@ from .. import __version__
 from .. import orm
 from ..objects import Server
 from ..spawner import LocalProcessSpawner
-from ..utils import url_path_join
+from ..utils import url_path_join, DT_SCALE

 # pattern for the authentication token header
 auth_header_pat = re.compile(r'^(?:token|bearer)\s+([^\s]+)$', flags=re.IGNORECASE)
@@ -535,6 +535,7 @@ class UserSpawnHandler(BaseHandler):
    @gen.coroutine
    def get(self, name, user_path):
        current_user = self.get_current_user()
+
        if current_user and current_user.name == name:
            # If people visit /user/:name directly on the Hub,
            # the redirects will just loop, because the proxy is bypassed.
@@ -569,12 +570,40 @@ class UserSpawnHandler(BaseHandler):
                    return
                else:
                    yield self.spawn_single_user(current_user)
+
+            # We do exponential backoff here - since otherwise we can get stuck in a redirect loop!
+            # This is important in many distributed proxy implementations - those are often eventually
+            # consistent and can take up to a couple of seconds to actually apply throughout the cluster.
+            try:
+                redirects = int(self.get_argument('redirects', 0))
+            except ValueError:
+                self.log.warning("Invalid redirects argument %r", self.get_argument('redirects'))
+                redirects = 0
+
+            if redirects >= self.settings.get('user_redirect_limit', 5):
+                # We stop if we've been redirected too many times.
+                raise web.HTTPError(500, "Redirect loop detected.")
+
            # set login cookie anew
            self.set_login_cookie(current_user)
            without_prefix = self.request.uri[len(self.hub.base_url):]
            target = url_path_join(self.base_url, without_prefix)
            if self.subdomain_host:
                target = current_user.host + target
+
+            # record redirect count in query parameter
+            if redirects:
+                self.log.warning("Redirect loop detected on %s", self.request.uri)
+                yield gen.sleep(min(1 * (DT_SCALE ** redirects), 10))
+                # rewrite target url with new `redirects` query value
+                url_parts = urlparse(target)
+                query_parts = parse_qs(url_parts.query)
+                query_parts['redirects'] = redirects + 1
+                url_parts = url_parts._replace(query=urlencode(query_parts, doseq=True))
+                target = urlunparse(url_parts)
+            else:
+                target = url_concat(target, {'redirects': 1})
+
            self.redirect(target)
            self.statsd.incr('redirects.user_after_login')
        elif current_user:
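The redirect backoff added to `UserSpawnHandler.get` can be reasoned about outside Tornado. The sketch below is a hypothetical, standalone version of the same decision: it stops once the count reaches the limit (5 is the fallback for `user_redirect_limit` in this diff), waits `min(scale ** redirects, 10)` seconds on repeat visits, and rewrites the `redirects` query parameter. It assumes a scale factor of 2 in place of `DT_SCALE`, whose actual value lives in `jupyterhub.utils` and is not shown here, and it uses the standard library for the first redirect where the handler uses `url_concat`. Because `parse_qs` returns list values, the sketch passes `doseq=True` to `urlencode` so that other query parameters survive the round trip.

```python
from urllib.parse import urlparse, urlunparse, parse_qs, urlencode


def plan_redirect(target, redirects, limit=5, scale=2, cap=10):
    """Return (delay_seconds, new_target) for the next redirect attempt.

    Hypothetical helper mirroring the handler logic; `scale` stands in
    for DT_SCALE, and its value of 2 is an assumption.
    """
    if redirects >= limit:
        # Same stopping rule as the handler's HTTP 500 "Redirect loop detected."
        raise RuntimeError("Redirect loop detected.")
    # The first redirect is immediate; repeats back off exponentially, capped.
    delay = min(1 * (scale ** redirects), cap) if redirects else 0
    parts = urlparse(target)
    query = parse_qs(parts.query)  # values are lists, e.g. {'foo': ['bar']}
    query["redirects"] = [str(redirects + 1)]
    # doseq=True encodes each list element as its own key=value pair.
    new_query = urlencode(query, doseq=True)
    return delay, urlunparse(parts._replace(query=new_query))
```

With these assumed defaults, the waits for redirect counts 1 through 4 are 2, 4, 8 and 10 seconds, and a request arriving with `redirects=5` is refused. Without `doseq=True`, a list value such as `['bar']` would be encoded literally as `%5B%27bar%27%5D`.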