Files
jupyterhub/docs/source/spawners.md
2016-11-08 16:19:25 -08:00

194 lines
7.6 KiB
Markdown

# Spawners
A [Spawner][] starts each single-user notebook server.
The Spawner represents an abstract interface to a process,
and a custom Spawner needs to be able to take three actions:
- start the process
- poll whether the process is still running
- stop the process
## Examples
Custom Spawners for JupyterHub can be found on the [JupyterHub wiki](https://github.com/jupyter/jupyterhub/wiki/Spawners).
Some examples include:
- [DockerSpawner](https://github.com/jupyter/dockerspawner) for spawning user servers in Docker containers
* `dockerspawner.DockerSpawner` for spawning identical Docker containers for
each users
* `dockerspawner.SystemUserSpawner` for spawning Docker containers with an
environment and home directory for each users
* both `DockerSpawner` and `SystemUserSpawner` also work with Docker Swarm for
launching containers on remote machines
- [SudoSpawner](https://github.com/jupyter/sudospawner) enables JupyterHub to
run without being root, by spawning an intermediate process via `sudo`
- [BatchSpawner](https://github.com/jupyterhub/batchspawner) for spawning remote
servers using batch systems
- [RemoteSpawner](https://github.com/zonca/remotespawner) to spawn notebooks
and a remote server and tunnel the port via SSH
## Spawner control methods
### Spawner.start
`Spawner.start` should start the single-user server for a single user.
Information about the user can be retrieved from `self.user`,
an object encapsulating the user's name, authentication, and server info.
When `Spawner.start` returns, it should have stored the IP and port
of the single-user server in `self.user.server`.
**NOTE:** When writing coroutines, *never* `yield` in between a database change and a commit.
Most `Spawner.start` functions will look similar to this example:
```python
def start(self):
self.user.server.ip = 'localhost' # or other host or IP address, as seen by the Hub
self.user.server.port = 1234 # port selected somehow
self.db.commit() # always commit before yield, if modifying db values
yield self._actually_start_server_somehow()
```
When `Spawner.start` returns, the single-user server process should actually be running,
not just requested. JupyterHub can handle `Spawner.start` being very slow
(such as PBS-style batch queues, or instantiating whole AWS instances)
via relaxing the `Spawner.start_timeout` config value.
### Spawner.poll
`Spawner.poll` should check if the spawner is still running.
It should return `None` if it is still running,
and an integer exit status, otherwise.
For the local process case, `Spawner.poll` uses `os.kill(PID, 0)`
to check if the local process is still running.
### Spawner.stop
`Spawner.stop` should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
## Spawner state
JupyterHub should be able to stop and restart without tearing down
single-user notebook servers. To do this task, a Spawner may need to persist
some information that can be restored later.
A JSON-able dictionary of state can be used to store persisted information.
Unlike start, stop, and poll methods, the state methods must not be coroutines.
For the single-process case, the Spawner state is only the process ID of the server:
```python
def get_state(self):
"""get the current state"""
state = super().get_state()
if self.pid:
state['pid'] = self.pid
return state
def load_state(self, state):
"""load state from the database"""
super().load_state(state)
if 'pid' in state:
self.pid = state['pid']
def clear_state(self):
"""clear any state (called after shutdown)"""
super().clear_state()
self.pid = 0
```
## Spawner options form
(new in 0.4)
Some deployments may want to offer options to users to influence how their servers are started.
This may include cluster-based deployments, where users specify what resources should be available,
or docker-based deployments where users can select from a list of base images.
This feature is enabled by setting `Spawner.options_form`, which is an HTML form snippet
inserted unmodified into the spawn form.
If the `Spawner.options_form` is defined, when a user tries to start their server, they will be directed to a form page, like this:
![spawn-form](images/spawn-form.png)
If `Spawner.options_form` is undefined, the user's server is spawned directly, and no spawn page is rendered.
See [this example](https://github.com/jupyter/jupyterhub/blob/master/examples/spawn-form/jupyterhub_config.py) for a form that allows custom CLI args for the local spawner.
### `Spawner.options_from_form`
Options from this form will always be a dictionary of lists of strings, e.g.:
```python
{
'integer': ['5'],
'text': ['some text'],
'select': ['a', 'b'],
}
```
When `formdata` arrives, it is passed through `Spawner.options_from_form(formdata)`,
which is a method to turn the form data into the correct structure.
This method must return a dictionary, and is meant to interpret the lists-of-strings into the correct types. For example, the `options_from_form` for the above form would look like:
```python
def options_from_form(self, formdata):
options = {}
options['integer'] = int(formdata['integer'][0]) # single integer value
options['text'] = formdata['text'][0] # single string value
options['select'] = formdata['select'] # list already correct
options['notinform'] = 'extra info' # not in the form at all
return options
```
which would return:
```python
{
'integer': 5,
'text': 'some text',
'select': ['a', 'b'],
'notinform': 'extra info',
}
```
When `Spawner.start` is called, this dictionary is accessible as `self.user_options`.
[Spawner]: https://github.com/jupyter/jupyterhub/blob/master/jupyterhub/spawner.py
## Writing a custom spawner
If you are interested in building a custom spawner, you can read [this tutorial](http://jupyterhub-tutorial.readthedocs.io/en/latest/spawners.html).
## Exposing resource limits & guarantees
Some spawners allow setting CPU / Memory limits and / or guarantees on the spawned single-user servers. Such spawners should expose the limits/guarantees to the single-user servers via environment variables.
### Limits
If limits are being enforced, the following environment variables should be set in the single-user notebooks' environment:
* `LIMIT_MEM`: Maximum amount of memory the single user notebook is allowed to use, specified in integer bytes.
* `LIMIT_CPU`: Maximum number of 'cpu cores' a single user notebook is allowed to use, specified as a float. For example, `1.0` allows you to use one CPU core, `0.5` half a CPU core, `4.0` four CPU cores, etc.
If no resource limits are being enforced, these env variables should just not be set. Limits are also not guarantees - there are no guarantees that a single user notebook can use all the resources up to its limit.
If other resources are being limited, it is useful for the spawner to put a suitable value in an environment variable that starts with `LIMIT_`.
### Guarantees
Some spawners offer guarantees that specific resources will be available to the single-user notebook. These are probably reserved in some form in the underlying cluster system.
If guarantees are provided by the spawner, the following environment variables should be set in the single-user notebooks' environment:
* `GUARANTEE_MEM`: Minimum amount of memory that the single user notebook is guaranteed to be able to use, specified in integer bytes.
* `GUARANTEE_CPU`: Minimum number of 'cpu cores' a single-user notebook is allowed to use, specified as a float.
If other resources are being guaranteed, it is useful for the spawner to put a suitable value in an environment variable that starts with `GUARANTEE_`.