mirror of
https://github.com/jupyterhub/jupyterhub.git
synced 2025-10-07 18:14:10 +00:00
422 lines
17 KiB
Markdown
422 lines
17 KiB
Markdown
(spawners-reference)=
|
|
|
|
# Spawners
|
|
|
|
A [Spawner][] starts each single-user notebook server.
|
|
The Spawner represents an abstract interface to a process,
|
|
and a custom Spawner needs to be able to take three actions:
|
|
|
|
- start a process
|
|
- poll whether a process is still running
|
|
- stop a process
|
|
|
|
## Examples
|
|
|
|
Additional Spawners can be installed from separate packages.
|
|
Some examples include:
|
|
|
|
- [DockerSpawner](https://github.com/jupyterhub/dockerspawner) for spawning user servers in Docker containers
|
|
- `dockerspawner.DockerSpawner` for spawning identical Docker containers for
|
|
each user
|
|
- `dockerspawner.SystemUserSpawner` for spawning Docker containers with an
|
|
environment and home directory for each user
|
|
- both `DockerSpawner` and `SystemUserSpawner` also work with Docker Swarm for
|
|
launching containers on remote machines
|
|
- [SudoSpawner](https://github.com/jupyterhub/sudospawner) enables JupyterHub to
|
|
run without being root, by spawning an intermediate process via `sudo`
|
|
- [BatchSpawner](https://github.com/jupyterhub/batchspawner) for spawning remote
|
|
servers using batch systems
|
|
- [YarnSpawner](https://github.com/jupyterhub/yarnspawner) for spawning notebook
|
|
servers in YARN containers on a Hadoop cluster
|
|
- [SSHSpawner](https://github.com/NERSC/sshspawner) to spawn notebooks
|
|
on a remote server using SSH
|
|
- [KubeSpawner](https://github.com/jupyterhub/kubespawner) to spawn notebook servers on kubernetes cluster.
|
|
- [NomadSpawner](https://github.com/mxab/jupyterhub-nomad-spawner) to spawn a notebook server as a Nomad job inside HashiCorp's Nomad cluster
|
|
|
|
## Spawner control methods
|
|
|
|
### Spawner.start
|
|
|
|
`Spawner.start` should start a single-user server for a single user.
|
|
Information about the user can be retrieved from `self.user`,
|
|
an object encapsulating the user's name, authentication, and server info.
|
|
|
|
The return value of `Spawner.start` should be the `(ip, port)` of the running server,
|
|
or a full URL as a string.
|
|
|
|
Most `Spawner.start` functions will look similar to this example:
|
|
|
|
```python
|
|
async def start(self):
|
|
self.ip = '127.0.0.1'
|
|
self.port = random_port()
|
|
# get environment variables,
|
|
# several of which are required for configuring the single-user server
|
|
env = self.get_env()
|
|
cmd = []
|
|
# get jupyterhub command to run,
|
|
# typically ['jupyterhub-singleuser']
|
|
cmd.extend(self.cmd)
|
|
cmd.extend(self.get_args())
|
|
|
|
await self._actually_start_server_somehow(cmd, env)
|
|
# url may not match self.ip:self.port, but it could!
|
|
url = self._get_connectable_url()
|
|
return url
|
|
```
|
|
|
|
When `Spawner.start` returns, the single-user server process should actually be running,
|
|
not just requested. JupyterHub can handle `Spawner.start` being very slow
|
|
(such as PBS-style batch queues, or instantiating whole AWS instances)
|
|
via relaxing the `Spawner.start_timeout` config value.
|
|
|
|
#### Note on IPs and ports
|
|
|
|
`Spawner.ip` and `Spawner.port` attributes set the _bind_ URL,
|
|
which the single-user server should listen on
|
|
(passed to the single-user process via the `JUPYTERHUB_SERVICE_URL` environment variable).
|
|
The _return_ value is the IP and port (or full URL) the Hub should _connect to_.
|
|
These are not necessarily the same, and usually won't be in any Spawner that works with remote resources or containers.
|
|
|
|
The default for `Spawner.ip`, and `Spawner.port` is `127.0.0.1:{random}`,
|
|
which is appropriate for Spawners that launch local processes,
|
|
where everything is on localhost and each server needs its own port.
|
|
For remote or container Spawners, it will often make sense to use a different value,
|
|
such as `ip = '0.0.0.0'` and a fixed port, e.g. `8888`.
|
|
The defaults can be changed in the class,
|
|
preserving configuration with traitlets:
|
|
|
|
```python
|
|
from traitlets import default
|
|
from jupyterhub.spawner import Spawner
|
|
|
|
class MySpawner(Spawner):
|
|
@default("ip")
|
|
def _default_ip(self):
|
|
return '0.0.0.0'
|
|
|
|
@default("port")
|
|
def _default_port(self):
|
|
return 8888
|
|
|
|
async def start(self):
|
|
env = self.get_env()
|
|
cmd = []
|
|
# get jupyterhub command to run,
|
|
# typically ['jupyterhub-singleuser']
|
|
cmd.extend(self.cmd)
|
|
cmd.extend(self.get_args())
|
|
|
|
remote_server_info = await self._actually_start_server_somehow(cmd, env)
|
|
url = self.get_public_url_from(remote_server_info)
|
|
return url
|
|
```
|
|
|
|
#### Exception handling
|
|
|
|
When `Spawner.start` raises an Exception, a message can be passed on to the user via the exception using a `.jupyterhub_html_message` or `.jupyterhub_message` attribute.
|
|
|
|
When the Exception has a `.jupyterhub_html_message` attribute, it will be rendered as HTML to the user.
|
|
|
|
Alternatively `.jupyterhub_message` is rendered as unformatted text.
|
|
|
|
If both attributes are not present, the Exception will be shown to the user as unformatted text.
|
|
|
|
### Spawner.poll
|
|
|
|
`Spawner.poll` checks if the spawner is still running.
|
|
It should return `None` if it is still running,
|
|
and an integer exit status, otherwise.
|
|
|
|
In the case of local processes, `Spawner.poll` uses `os.kill(PID, 0)`
|
|
to check if the local process is still running. On Windows, it uses `psutil.pid_exists`.
|
|
|
|
### Spawner.stop
|
|
|
|
`Spawner.stop` should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
|
|
|
|
## Spawner state
|
|
|
|
JupyterHub should be able to stop and restart without tearing down
|
|
single-user notebook servers. To do this task, a Spawner may need to persist
|
|
some information that can be restored later.
|
|
A JSON-able dictionary of state can be used to store persisted information.
|
|
|
|
Unlike start, stop, and poll methods, the state methods must not be coroutines.
|
|
|
|
In the case of single processes, the Spawner state is only the process ID of the server:
|
|
|
|
```python
|
|
def get_state(self):
|
|
"""get the current state"""
|
|
state = super().get_state()
|
|
if self.pid:
|
|
state['pid'] = self.pid
|
|
return state
|
|
|
|
def load_state(self, state):
|
|
"""load state from the database"""
|
|
super().load_state(state)
|
|
if 'pid' in state:
|
|
self.pid = state['pid']
|
|
|
|
def clear_state(self):
|
|
"""clear any state (called after shutdown)"""
|
|
super().clear_state()
|
|
self.pid = 0
|
|
```
|
|
|
|
## Spawner options form
|
|
|
|
(new in 0.4)
|
|
|
|
Some deployments may want to offer options to users to influence how their servers are started.
|
|
This may include cluster-based deployments, where users specify what resources should be available,
|
|
or docker-based deployments where users can select from a list of base images.
|
|
|
|
This feature is enabled by setting `Spawner.options_form`, which is an HTML form snippet
|
|
inserted unmodified into the spawn form.
|
|
If the `Spawner.options_form` is defined, when a user tries to start their server, they will be directed to a form page, like this:
|
|
|
|

|
|
|
|
If `Spawner.options_form` is undefined, the user's server is spawned directly, and no spawn page is rendered.
|
|
|
|
See [this example](https://github.com/jupyterhub/jupyterhub/blob/HEAD/examples/spawn-form/jupyterhub_config.py) for a form that allows custom CLI args for the local spawner.
|
|
|
|
### `Spawner.options_from_form`
|
|
|
|
Options from this form will always be a dictionary of lists of strings, e.g.:
|
|
|
|
```python
|
|
{
|
|
'integer': ['5'],
|
|
'text': ['some text'],
|
|
'select': ['a', 'b'],
|
|
}
|
|
```
|
|
|
|
When `formdata` arrives, it is passed through `Spawner.options_from_form(formdata)`,
|
|
which is a method to turn the form data into the correct structure.
|
|
This method must return a dictionary, and is meant to interpret the lists-of-strings into the correct types. For example, the `options_from_form` for the above form would look like:
|
|
|
|
```python
|
|
def options_from_form(self, formdata):
|
|
options = {}
|
|
options['integer'] = int(formdata['integer'][0]) # single integer value
|
|
options['text'] = formdata['text'][0] # single string value
|
|
options['select'] = formdata['select'] # list already correct
|
|
options['notinform'] = 'extra info' # not in the form at all
|
|
return options
|
|
```
|
|
|
|
which would return:
|
|
|
|
```python
|
|
{
|
|
'integer': 5,
|
|
'text': 'some text',
|
|
'select': ['a', 'b'],
|
|
'notinform': 'extra info',
|
|
}
|
|
```
|
|
|
|
When `Spawner.start` is called, this dictionary is accessible as `self.user_options`.
|
|
|
|
[spawner]: https://github.com/jupyterhub/jupyterhub/blob/HEAD/jupyterhub/spawner.py
|
|
|
|
## Writing a custom spawner
|
|
|
|
If you are interested in building a custom spawner, you can read [this tutorial](https://jupyterhub-tutorial.readthedocs.io/en/latest/spawners.html).
|
|
|
|
### Registering custom Spawners via entry points
|
|
|
|
As of JupyterHub 1.0, custom Spawners can register themselves via
|
|
the `jupyterhub.spawners` entry point metadata.
|
|
To do this, in your `setup.py` add:
|
|
|
|
```python
|
|
setup(
|
|
...
|
|
entry_points={
|
|
'jupyterhub.spawners': [
|
|
'myservice = mypackage:MySpawner',
|
|
],
|
|
},
|
|
)
|
|
```
|
|
|
|
If you have added this metadata to your package,
|
|
users can select your spawner with the configuration:
|
|
|
|
```python
|
|
c.JupyterHub.spawner_class = 'myservice'
|
|
```
|
|
|
|
instead of the full
|
|
|
|
```python
|
|
c.JupyterHub.spawner_class = 'mypackage:MySpawner'
|
|
```
|
|
|
|
previously required.
|
|
Additionally, configurable attributes for your spawner will
|
|
appear in jupyterhub help output and auto-generated configuration files
|
|
via `jupyterhub --generate-config`.
|
|
|
|
## Environment variables and command-line arguments
|
|
|
|
Spawners mainly do one thing: launch a command in an environment.
|
|
|
|
The command-line is constructed from user configuration:
|
|
|
|
- Spawner.cmd (default: `['jupyterhub-singleuser']`)
|
|
- Spawner.args (CLI args to pass to the cmd, default: empty)
|
|
|
|
where the configuration:
|
|
|
|
```python
|
|
c.Spawner.cmd = ["my-singleuser-wrapper"]
|
|
c.Spawner.args = ["--debug", "--flag"]
|
|
```
|
|
|
|
would result in spawning the command:
|
|
|
|
```bash
|
|
my-singleuser-wrapper --debug --flag
|
|
```
|
|
|
|
The `Spawner.get_args()` method is how `Spawner.args` is accessed,
|
|
and can be used by Spawners to customize/extend user-provided arguments.
|
|
|
|
Prior to 2.0, JupyterHub unconditionally added certain options _if specified_ to the command-line,
|
|
such as `--ip={Spawner.ip}` and `--port={Spawner.port}`.
|
|
These have now all been moved to environment variables,
|
|
and from JupyterHub 2.0,
|
|
the command-line launched by JupyterHub is fully specified by overridable configuration `Spawner.cmd + Spawner.args`.
|
|
|
|
Most process configuration is passed via environment variables.
|
|
Additional variables can be specified via the `Spawner.environment` configuration.
|
|
|
|
The process environment is returned by `Spawner.get_env`, which specifies the following environment variables:
|
|
|
|
- `JUPYTERHUB_SERVICE_URL` - the _bind_ URL where the server should launch its HTTP server (`http://127.0.0.1:12345`).
|
|
This includes `Spawner.ip` and `Spawner.port`; _new in 2.0, prior to 2.0 IP, port were on the command-line and only if specified_
|
|
- `JUPYTERHUB_SERVICE_PREFIX` - the URL prefix the service will run on (e.g. `/user/name/`)
|
|
- `JUPYTERHUB_USER` - the JupyterHub user's username
|
|
- `JUPYTERHUB_SERVER_NAME` - the server's name, if using named servers (default server has an empty name)
|
|
- `JUPYTERHUB_API_URL` - the full URL for the JupyterHub API (http://17.0.0.1:8001/hub/api)
|
|
- `JUPYTERHUB_BASE_URL` - the base URL of the whole jupyterhub deployment, i.e. the bit before `hub/` or `user/`,
|
|
as set by `c.JupyterHub.base_url` (default: `/`)
|
|
- `JUPYTERHUB_API_TOKEN` - the API token the server can use to make requests to the Hub.
|
|
This is also the OAuth client secret.
|
|
- `JUPYTERHUB_CLIENT_ID` - the OAuth client ID for authenticating visitors.
|
|
- `JUPYTERHUB_OAUTH_CALLBACK_URL` - the callback URL to use in OAuth, typically `/user/:name/oauth_callback`
|
|
- `JUPYTERHUB_OAUTH_ACCESS_SCOPES` - the scopes required to access the server (called `JUPYTERHUB_OAUTH_SCOPES` prior to 3.0)
|
|
- `JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES` - the scopes the service is allowed to request.
|
|
If no scopes are requested explicitly, these scopes will be requested.
|
|
- `JUPYTERHUB_PUBLIC_URL` - the public URL of the server,
|
|
e.g. `https://jupyterhub.example.org/user/name/`.
|
|
Empty if no public URL is specified (default).
|
|
Will be available if subdomains are configured.
|
|
- `JUPYTERHUB_PUBLIC_HUB_URL` - the public URL of JupyterHub as a whole,
|
|
e.g. `https://jupyterhub.example.org/`.
|
|
Empty if no public URL is specified (default).
|
|
Will be available if subdomains are configured.
|
|
|
|
Optional environment variables, depending on configuration:
|
|
|
|
- `JUPYTERHUB_SSL_[KEYFILE|CERTFILE|CLIENT_CI]` - SSL configuration, when `internal_ssl` is enabled
|
|
- `JUPYTERHUB_ROOT_DIR` - the root directory of the server (notebook directory), when `Spawner.notebook_dir` is defined (new in 2.0)
|
|
- `JUPYTERHUB_DEFAULT_URL` - the default URL for the server (for redirects from `/user/:name/`),
|
|
if `Spawner.default_url` is defined
|
|
(new in 2.0, previously passed via CLI)
|
|
- `JUPYTERHUB_DEBUG=1` - generic debug flag, sets maximum log level when `Spawner.debug` is True
|
|
(new in 2.0, previously passed via CLI)
|
|
- `JUPYTERHUB_DISABLE_USER_CONFIG=1` - disable loading user config,
|
|
sets maximum log level when `Spawner.debug` is True (new in 2.0,
|
|
previously passed via CLI)
|
|
|
|
- `JUPYTERHUB_[MEM|CPU]_[LIMIT_GUARANTEE]` - the values of CPU and memory limits and guarantees.
|
|
These are not expected to be enforced by the process,
|
|
but are made available as a hint,
|
|
e.g. for resource monitoring extensions.
|
|
|
|
## Spawners, resource limits, and guarantees (Optional)
|
|
|
|
Some spawners of the single-user notebook servers allow setting limits or
|
|
guarantees on resources, such as CPU and memory. To provide a consistent
|
|
experience for sysadmins and users, we provide a standard way to set and
|
|
discover these resource limits and guarantees, such as for memory and CPU.
|
|
For the limits and guarantees to be useful, **the spawner must implement
|
|
support for them**. For example, `LocalProcessSpawner`, the default
|
|
spawner, does not support limits and guarantees. One of the spawners
|
|
that supports limits and guarantees is the
|
|
[`systemdspawner`](https://github.com/jupyterhub/systemdspawner).
|
|
|
|
### Memory Limits & Guarantees
|
|
|
|
`c.Spawner.mem_limit`: A **limit** specifies the _maximum amount of memory_
|
|
that may be allocated, though there is no promise that the maximum amount will
|
|
be available. In supported spawners, you can set `c.Spawner.mem_limit` to
|
|
limit the total amount of memory that a single-user notebook server can
|
|
allocate. Attempting to use more memory than this limit will cause errors. The
|
|
single-user notebook server can discover its own memory limit by looking at
|
|
the environment variable `MEM_LIMIT`, which is specified in absolute bytes.
|
|
|
|
`c.Spawner.mem_guarantee`: Sometimes, a **guarantee** of a _minimum amount of
|
|
memory_ is desirable. In this case, you can set `c.Spawner.mem_guarantee` to
|
|
to provide a guarantee that at minimum this much memory will always be
|
|
available for the single-user notebook server to use. The environment variable
|
|
`MEM_GUARANTEE` will also be set in the single-user notebook server.
|
|
|
|
**The spawner's underlying system or cluster is responsible for enforcing these
|
|
limits and providing these guarantees.** If these values are set to `None`, no
|
|
limits or guarantees are provided, and no environment values are set.
|
|
|
|
### CPU Limits & Guarantees
|
|
|
|
`c.Spawner.cpu_limit`: In supported spawners, you can set
|
|
`c.Spawner.cpu_limit` to limit the total number of cpu-cores that a
|
|
single-user notebook server can use. These can be fractional - `0.5` means 50%
|
|
of one CPU core, `4.0` is 4 CPU-cores, etc. This value is also set in the
|
|
single-user notebook server's environment variable `CPU_LIMIT`. The limit does
|
|
not claim that you will be able to use all the CPU up to your limit as other
|
|
higher priority applications might be taking up CPU.
|
|
|
|
`c.Spawner.cpu_guarantee`: You can set `c.Spawner.cpu_guarantee` to provide a
|
|
guarantee for CPU usage. The environment variable `CPU_GUARANTEE` will be set
|
|
in the single-user notebook server when a guarantee is being provided.
|
|
|
|
**The spawner's underlying system or cluster is responsible for enforcing these
|
|
limits and providing these guarantees.** If these values are set to `None`, no
|
|
limits or guarantees are provided, and no environment values are set.
|
|
|
|
### Encryption
|
|
|
|
Communication between the `Proxy`, `Hub`, and `Notebook` can be secured by
|
|
turning on `internal_ssl` in `jupyterhub_config.py`. For a custom spawner to
|
|
utilize these certs, there are two methods of interest on the base `Spawner`
|
|
class: `.create_certs` and `.move_certs`.
|
|
|
|
The first method, `.create_certs` will sign a key-cert pair using an internally
|
|
trusted authority for notebooks. During this process, `.create_certs` can
|
|
apply `ip` and `dns` name information to the cert via an `alt_names` `kwarg`.
|
|
This is used for certificate authentication (verification). Without proper
|
|
verification, the `Notebook` will be unable to communicate with the `Hub` and
|
|
vice versa when `internal_ssl` is enabled. For example, given a deployment
|
|
using the `DockerSpawner` which will start containers with `ips` from the
|
|
`docker` subnet pool, the `DockerSpawner` would need to instead choose a
|
|
container `ip` prior to starting and pass that to `.create_certs` (TODO: edit).
|
|
|
|
In general though, this method will not need to be changed and the default
|
|
`ip`/`dns` (localhost) info will suffice.
|
|
|
|
When `.create_certs` is run, it will create the certificates in a default,
|
|
central location specified by `c.JupyterHub.internal_certs_location`. For
|
|
`Spawners` that need access to these certs elsewhere (i.e. on another host
|
|
altogether), the `.move_certs` method can be overridden to move the certs
|
|
appropriately. Again, using `DockerSpawner` as an example, this would entail
|
|
moving certs to a directory that will get mounted into the container this
|
|
spawner starts.
|