add docs on custom authenticators and spawners

2025-10-17 15:03:02 +00:00 · 2015-03-07 15:55:29 -08:00
parent b6ab62ae3a
commit e6e9856861
3 changed files with 239 additions and 0 deletions
--- a/docs/authenticators.md
+++ b/docs/authenticators.md
@@ -0,0 +1,78 @@
+# Writing a custom Authenticator
+
+The [Authenticator][] is the mechanism for authenticating users.
+Basic authenticators use simple username and password authentication.
+JupyterHub ships only with a [PAM][]-based Authenticator,
+for logging in with local user accounts.
+
+You can use custom Authenticator subclasses to enable authentication via other systems.
+One such example is using [GitHub OAuth][].
+
+Because the username is passed from the Authenticator to the Spawner,
+a custom Authenticator and Spawner are often used together.
+
+
+## Basics of Authenticators
+
+A basic Authenticator has one central method:
+
+
+### Authenticator.authenticate
+
+    Authenticator.authenticate(handler, data)
+
+This method is passed the tornado RequestHandler and the POST data from the login form.
+Unless the login form has been customized, `data` will have two keys:
+
+- `username` (self-explanatory)
+- `password` (also self-explanatory)
+
+`authenticate`'s job is simple:
+
+- return a username (non-empty str)
+  of the authenticated user if authentication is successful
+- return `None` otherwise
+
+Writing an Authenticator that looks up passwords in a dictionary
+requires only overriding this one method:
+
+```python
+from tornado import gen
+from IPython.utils.traitlets import Dict
+from jupyterhub.auth import Authenticator
+
+class DictionaryAuthenticator(Authenticator):
+
+    passwords = Dict(config=True,
+        help="""dict of username:password for authentication"""
+    )
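+
+    @gen.coroutine
+    def authenticate(self, handler, data):
+        # unknown usernames yield None from .get(), which never equals a submitted password string
+        if self.passwords.get(data['username']) == data['password']:
+            return data['username']
+        # any other outcome returns None, signalling failed authentication
+        return None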
+```
+
+
+### Authenticator.whitelist
+
+Authenticators can specify a whitelist of usernames that are allowed to authenticate.
+For local user authentication (e.g. PAM), this lets you limit which users
+can log in.
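+
+The check itself amounts to a set membership test. The following is a minimal sketch,
+not the Hub's actual code: the function name `is_whitelisted` is hypothetical, and
+treating an empty whitelist as "allow everyone" is an assumption made for illustration.
+
+```python
+def is_whitelisted(username, whitelist):
+    """Return True if `username` may log in under `whitelist`."""
+    if not whitelist:
+        # assumption: an empty whitelist places no restriction on users
+        return True
+    return username in whitelist
+
+
+allowed = {'alice', 'bob'}
+print(is_whitelisted('alice', allowed))    # True
+print(is_whitelisted('mallory', allowed))  # False
+```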
+
+
+## OAuth and other non-password logins
+
+Some login mechanisms, such as [OAuth][], don't map onto username+password.
+For these, you can override the login handlers.
+
+You can see an example implementation of an Authenticator that uses [GitHub OAuth][]
+at [OAuthenticator][].
+
+
+[Authenticator]: ../jupyterhub/auth.py
+[PAM]: http://en.wikipedia.org/wiki/Pluggable_authentication_module
+[OAuth]: http://en.wikipedia.org/wiki/OAuth
+[GitHub OAuth]: https://developer.github.com/v3/oauth/
+[OAuthenticator]: https://github.com/jupyter/oauthenticator
+
--- a/docs/howitworks.md
+++ b/docs/howitworks.md
@@ -0,0 +1,75 @@
+# How JupyterHub works
+
+JupyterHub is a multi-user server that manages and proxies multiple instances of the single-user <del>IPython</del> Jupyter notebook server.
+
+There are three basic processes involved:
+
+- multi-user Hub (Python/Tornado)
+- configurable http proxy (nodejs)
+- multiple single-user IPython notebook servers (Python/IPython/Tornado)
+
+The proxy is the only process that listens on a public interface.
+The Hub sits behind the proxy at `/hub`.
+Single-user servers sit behind the proxy at `/user/[username]`.
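+
+The routing rule above can be sketched as a prefix lookup. This is an illustration only,
+not the proxy's implementation: the `route` function and its return labels are hypothetical,
+and sending unmatched paths to the Hub is an assumption.
+
+```python
+def route(path, usernames):
+    """Pick the backend that should receive a request for `path`."""
+    if path == '/hub' or path.startswith('/hub/'):
+        return 'hub'
+    for name in usernames:
+        prefix = '/user/' + name
+        # match the prefix itself or anything below it, but not '/user/<name>x'
+        if path == prefix or path.startswith(prefix + '/'):
+            return 'user:' + name
+    # assumption: anything else falls through to the Hub
+    return 'hub'
+
+
+print(route('/hub/login', ['alice']))        # hub
+print(route('/user/alice/tree', ['alice']))  # user:alice
+```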
+
+
+## Logging in
+
+When a new browser logs in to JupyterHub, the following events take place:
+
+- Login data is handed to the [Authenticator](#authentication) instance for validation
+- The Authenticator returns the username, if login information is valid
+- A single-user server instance is [Spawned](#spawning) for the logged-in user
+- When the server starts, the proxy is notified to forward `/user/[username]/*` to the single-user server
+- Two cookies are set, one for `/hub/` and another for `/user/[username]`,
+  containing an encrypted token.
+- The browser is redirected to `/user/[username]`, which is handled by the single-user server
+
+Logging into a single-user server is authenticated via the Hub:
+
+- On request, the single-user server forwards the encrypted cookie to the Hub for verification
+- The Hub replies with the username if it is a valid cookie
+- If the user is the owner of the server, access is allowed
+- If it is the wrong user or an invalid cookie, the browser is redirected to `/hub/login`
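+
+The decision made by the single-user server can be summarized in a few lines.
+This is a sketch of the logic described above, not the server's real code;
+`check_access` and its return shape are hypothetical names chosen for the example.
+
+```python
+def check_access(cookie_user, server_owner):
+    """Decide how to answer a request to a single-user server.
+
+    cookie_user: username the Hub returned for the cookie, or None if the cookie is invalid
+    server_owner: username that owns this single-user server
+    Returns (allowed, redirect_url).
+    """
+    if cookie_user is not None and cookie_user == server_owner:
+        return True, None
+    # wrong user or invalid cookie: send the browser back to the Hub's login page
+    return False, '/hub/login'
+
+
+print(check_access('alice', 'alice'))  # (True, None)
+print(check_access('bob', 'alice'))    # (False, '/hub/login')
+print(check_access(None, 'alice'))     # (False, '/hub/login')
+```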
+
+
+## Customizing JupyterHub
+
+There are two basic extension points for JupyterHub: how users are authenticated,
+and how their server processes are started.
+Each is governed by a customizable class,
+and JupyterHub ships with just the most basic version of each.
+
+To enable custom authentication and/or spawning,
+subclass Authenticator or Spawner,
+and override the relevant methods.
+
+
+### Authentication
+
+Authentication is customizable via the Authenticator class.
+Authentication can be replaced by any mechanism,
+such as OAuth, Kerberos, etc.
+
+JupyterHub only ships with [PAM](http://en.wikipedia.org/wiki/Pluggable_authentication_module) authentication,
+which requires the server to be run as root,
+or at least with access to the PAM service,
+which regular users typically do not have
+(on Ubuntu, this requires being added to the `shadow` group).
+
+[More info on custom Authenticators](authenticators.md).
+
+
+### Spawning
+
+Each single-user server is started by a Spawner.
+The Spawner represents an abstract interface to a process,
+and needs to be able to take three actions:
+
+1. start the process
+2. poll whether the process is still running
+3. stop the process
+
+[More info on custom Spawners](spawners.md).
+
+[An example using Docker](https://github.com/jupyter/dockerspawner).
--- a/docs/spawners.md
+++ b/docs/spawners.md
@@ -0,0 +1,86 @@
+# Writing a custom Spawner
+
+Each single-user server is started by a [Spawner][].
+The Spawner represents an abstract interface to a process,
+and a custom Spawner needs to be able to take three actions:
+
+1. start the process
+2. poll whether the process is still running
+3. stop the process
+
+## Spawner.start
+
+`Spawner.start` should start the single-user server for a single user.
+Information about the user can be retrieved from `self.user`,
+an object encapsulating the user's name, authentication, and server info.
+
+When `Spawner.start` returns, it should have stored the IP and port
+of the single-user server in `self.user.server`.
+
+**NOTE:** when writing coroutines, *never* `yield` in between a db change and a commit.
+Most `Spawner.start`s should have something looking like:
+
+```python
+@gen.coroutine  # requires: from tornado import gen
+def start(self):
+    self.user.server.ip = 'localhost' # or other host or IP address, as seen by the Hub
+    self.user.server.port = 1234 # port selected somehow
+    self.db.commit() # always commit before yield, if modifying db values
+    yield self._actually_start_server_somehow()
+```
+
+When `Spawner.start` returns, the single-user server process should actually be running,
+not just requested. JupyterHub can handle a very slow `Spawner.start`
+(such as PBS-style batch queues, or instantiating whole AWS instances)
+if you increase the `Spawner.start_timeout` config value.
+
+
+## Spawner.poll
+
+`Spawner.poll` should check whether the single-user server is still running.
+It should return `None` if it is still running,
+and an integer exit status otherwise.
+
+For the local process case, this uses `os.kill(PID, 0)`
+to check if the process is still around.
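+
+A minimal sketch of that check on a POSIX system follows. The function name `poll_pid`
+is hypothetical, and reporting `0` for a vanished process is an assumption, since the
+real exit status cannot be recovered from the PID alone. Do not use this on Windows,
+where `os.kill` with other signal values terminates the process.
+
+```python
+import os
+
+
+def poll_pid(pid):
+    """Return None if `pid` is still running, else an integer exit status."""
+    try:
+        # signal 0 performs error checking only; no signal is delivered
+        os.kill(pid, 0)
+    except ProcessLookupError:
+        # no such process; assumption: report 0 because the true status is unknown
+        return 0
+    except PermissionError:
+        # the process exists but belongs to another user, so it is still running
+        return None
+    return None
+
+
+print(poll_pid(os.getpid()))  # None, because this process is running
+```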
+
+
+## Spawner.stop
+
+`Spawner.stop` should stop the process. It must be a tornado coroutine,
+and should return when the process has finished exiting.
+
+
+## Spawner state
+
+JupyterHub should be able to stop and restart without having to tear down
+single-user servers. This means that a Spawner may need to persist
+some information so that it can be restored.
+A dictionary of JSON-able state can be used to store this information.
+
+Unlike start/stop/poll, the state methods must not be coroutines.
+
+In the single-process case, this is only the process ID of the server:
+
+```python
+def get_state(self):
+    """get the current state"""
+    state = super().get_state()
+    if self.pid:
+        state['pid'] = self.pid
+    return state
+
+def load_state(self, state):
+    """load state from the database"""
+    super().load_state(state)
+    if 'pid' in state:
+        self.pid = state['pid']
+
+def clear_state(self):
+    """clear any state (called after shutdown)"""
+    super().clear_state()
+    self.pid = 0
+```
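+
+To illustrate why the state must be JSON-able, here is a self-contained round trip
+through `json`. `PidState` is a hypothetical stand-in that mirrors the methods above
+without subclassing the real Spawner, and serializing with `json.dumps` is an
+assumption about how the state is persisted.
+
+```python
+import json
+
+
+class PidState:
+    """Toy stand-in for a Spawner that only tracks a process ID."""
+
+    def __init__(self):
+        self.pid = 0
+
+    def get_state(self):
+        state = {}
+        if self.pid:
+            state['pid'] = self.pid
+        return state
+
+    def load_state(self, state):
+        if 'pid' in state:
+            self.pid = state['pid']
+
+
+before = PidState()
+before.pid = 4321
+saved = json.dumps(before.get_state())  # what would survive a Hub restart
+
+after = PidState()
+after.load_state(json.loads(saved))
+print(after.pid)  # 4321
+```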
+
+
+
+[Spawner]: ../jupyterhub/spawner.py