add docs on custom authenticators and spawners

This commit is contained in:
Min RK
2015-03-07 15:55:29 -08:00
parent b6ab62ae3a
commit e6e9856861
3 changed files with 239 additions and 0 deletions

78
docs/authenticators.md Normal file
View File

@@ -0,0 +1,78 @@
# Writing a custom Authenticator
The [Authenticator][] is the mechanism for authorizing users.
Basic authenticators use simple username and password authentication.
JupyterHub ships only with a [PAM][]-based Authenticator,
for logging in with local user accounts.
You can use custom Authenticator subclasses to enable authentication via other systems.
One such example is using [GitHub OAuth][].
Because the username is passed from the Authenticator to the Spawner,
a custom Authenticator and Spawner are often used together.
## Basics of Authenticators
A basic Authenticator has one central method:
### Authenticator.authenticate
Authenticator.authenticate(handler, data)
This method is passed the tornado RequestHandler and the POST data from the login form.
Unless the login form has been customized, `data` will have two keys:
- `username` (self-explanatory)
- `password` (also self-explanatory)
`authenticate`'s job is simple:
- return a username (non-empty str)
of the authenticated user if authentication is successful
- return `None` otherwise
Writing an Authenticator that looks up passwords in a dictionary
requires only overriding this one method:
```python
from tornado import gen
from IPython.utils.traitlets import Dict
from jupyterhub.auth import Authenticator
class DictionaryAuthenticator(Authenticator):
passwords = Dict(config=True,
help="""dict of username:password for authentication"""
)
@gen.coroutine
def authenticate(self, handler, data):
if self.passwords.get(data['username']) == data['password']:
return data['username']
```
### Authenticator.whitelist
Authenticators can specify a whitelist of usernames to allow authentication.
For local user authentication (e.g. PAM), this lets you limit which users
can login.
## OAuth and other non-password logins
Some login mechanisms, such as [OAuth][], don't map onto username+password.
For these, you can override the login handlers.
You can see an example implementation of an Authenticator that uses [GitHub OAuth][]
at [OAuthenticator][].
[Authenticator]: ../jupyterhub/auth.py
[PAM]: http://en.wikipedia.org/wiki/Pluggable_authentication_module
[OAuth]: http://en.wikipedia.org/wiki/OAuth
[GitHub OAuth]: https://developer.github.com/v3/oauth/
[OAuthenticator]: https://github.com/jupyter/oauthenticator

75
docs/howitworks.md Normal file
View File

@@ -0,0 +1,75 @@
# How JupyterHub works
JupyterHub is a multi-user server that manages and proxies multiple instances of the single-user <del>IPython</del> Jupyter notebook server.
There are three basic processes involved:
- multi-user Hub (Python/Tornado)
- configurable http proxy (nodejs)
- multiple single-user IPython notebook servers (Python/IPython/Tornado)
The proxy is the only process that listens on a public interface.
The Hub sits behind the proxy at `/hub`.
Single-user servers sit behind the proxy at `/user/[username]`.
## Logging in
When a new browser logs in to JupyterHub, the following events take place:
- Login data is handed to the [Authenticator](#authentication) instance for validation
- The Authenticator returns the username, if login information is valid
- A single-user server instance is [Spawned](#spawning) for the logged-in user
- When the server starts, the proxy is notified to forward `/user/[username]/*` to the single-user server
- Two cookies are set, one for `/hub/` and another for `/user/[username]`,
containing an encrypted token.
- The browser is redirected to `/user/[username]`, which is handled by the single-user server
Logging into a single-user server is authenticated via the Hub:
- On request, the single-user server forwards the encrypted cookie to the Hub for verification
- The Hub replies with the username if it is a valid cookie
- If the user is the owner of the server, access is allowed
- If it is the wrong user or an invalid cookie, the browser is redirected to `/hub/login`
## Customizing JupyterHub
There are two basic extension points for JupyterHub: How users are authenticated,
and how their server processes are started.
Each is governed by a customizable class,
and JupyterHub ships with just the most basic version of each.
To enable custom authentication and/or spawning,
subclass Authenticator or Spawner,
and override the relevant methods.
### Authentication
Authentication is customizable via the Authenticator class.
Authentication can be replaced by any mechanism,
such as OAuth, Kerberos, etc.
JupyterHub only ships with [PAM](http://en.wikipedia.org/wiki/Pluggable_authentication_module) authentication,
which requires the server to be run as root,
or at least with access to the PAM service,
which regular users typically do not have
(on Ubuntu, this requires being added to the `shadow` group).
[More info on custom Authenticators](authenticators.md).
### Spawning
Each single-user server is started by a Spawner.
The Spawner represents an abstract interface to a process,
and needs to be able to take three actions:
1. start the process
2. poll whether the process is still running
3. stop the process
[More info on custom Spawners](spawners.md).
[An example using Docker](https://github.com/jupyter/dockerspawner).

86
docs/spawners.md Normal file
View File

@@ -0,0 +1,86 @@
# Writing a custom Spawner
Each single-user server is started by a [Spawner][].
The Spawner represents an abstract interface to a process,
and a custom Spawner needs to be able to take three actions:
1. start the process
2. poll whether the process is still running
3. stop the process
## Spawner.start
`Spawner.start` should start the single-user server for a single user.
Information about the user can be retrieved from `self.user`,
an object encapsulating the user's name, authentication, and server info.
When `Spawner.start` returns, it should have stored the IP and port
of the single-user server in `self.user.server`.
**NOTE:** when writing coroutines, *never* `yield` in between a db change and a commit.
Most `Spawner.start`s should have something looking like:
```python
def start(self):
self.user.server.ip = 'localhost' # or other host or IP address, as seen by the Hub
self.user.server.port = 1234 # port selected somehow
self.db.commit() # always commit before yield, if modifying db values
yield self._actually_start_server_somehow()
```
When `Spawner.start` returns, the single-user server process should actually be running,
not just requested. JupyterHub can handle `Spawner.start` being very slow
(such as PBS-style batch queues, or instantiating whole AWS instances)
via relaxing the `Spawner.start_timeout` config value.
## Spawner.poll
`Spawner.poll` should check if the spawner is still running.
It should return `None` if it is still running,
and an integer exit status, otherwise.
For the local process case, this uses `os.kill(PID, 0)`
to check if the process is still around.
## Spawner.stop
`Spawner.stop` should stop the process. It must be a tornado coroutine,
and should return when the process has finished exiting.
## Spawner state
JupyterHub should be able to stop and restart without having to teardown
single-user servers. This means that a Spawner may need to persist
some information that it can be restored.
A dictionary of JSON-able state can be used to store this information.
Unlike start/stop/poll, the state methods must not be coroutines.
In the single-process case, this is only the process ID of the server:
```python
def get_state(self):
"""get the current state"""
state = super().get_state()
if self.pid:
state['pid'] = self.pid
return state
def load_state(self, state):
"""load state from the database"""
super().load_state(state)
if 'pid' in state:
self.pid = state['pid']
def clear_state(self):
"""clear any state (called after shutdown)"""
super().clear_state()
self.pid = 0
```
[Spawner]: ../jupyterhub/spawner.py