reflow proxy doc

Carol Willing
2018-05-07 20:17:14 -07:00
parent 58f9237b12
commit 6e212fa476

@@ -1,22 +1,26 @@
# Writing a custom Proxy implementation # Writing a custom Proxy implementation
JupyterHub 0.8 introduced the ability to write a custom implementation of the proxy. JupyterHub 0.8 introduced the ability to write a custom implementation of the
This enables deployments with different needs than the default proxy, proxy. This enables deployments with different needs than the default proxy,
configurable-http-proxy (CHP). configurable-http-proxy (CHP). CHP is a single-process nodejs proxy that the
CHP is a single-process nodejs proxy that the Hub manages by default as a subprocess Hub manages by default as a subprocess (it can be run externally, as well, and
(it can be run externally, as well, and typically is in production deployments). typically is in production deployments).
The upside to CHP, and why we use it by default, is that it's easy to install and run (if you have nodejs, you are set!). The upside to CHP, and why we use it by default, is that it's easy to install
The downsides are that it's a single process and does not support any persistence of the routing table. and run (if you have nodejs, you are set!). The downsides are that it's a
So if the proxy process dies, your whole JupyterHub instance is inaccessible until the Hub notices, restarts the proxy, and restores the routing table. single process and does not support any persistence of the routing table. So
For deployments that want to avoid such a single point of failure, if the proxy process dies, your whole JupyterHub instance is inaccessible
or leverage existing proxy infrastructure in their chosen deployment (such as Kubernetes ingress objects), until the Hub notices, restarts the proxy, and restores the routing table. For
the Proxy API provides a way to do that. deployments that want to avoid such a single point of failure, or leverage
existing proxy infrastructure in their chosen deployment (such as Kubernetes
ingress objects), the Proxy API provides a way to do that.
In general, for a proxy to be usable by JupyterHub, it must: In general, for a proxy to be usable by JupyterHub, it must:
1. support websockets without prior knowledge of the URL where websockets may occur 1. support websockets without prior knowledge of the URL where websockets may
2. support trie-based routing (i.e. allow different routes on `/foo` and `/foo/bar` and route based on specificity) occur
2. support trie-based routing (i.e. allow different routes on `/foo` and
`/foo/bar` and route based on specificity)
3. adding or removing a route should not cause existing connections to drop 3. adding or removing a route should not cause existing connections to drop
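To illustrate requirement 2, here is a minimal sketch of specificity-based routing for path-based routes. It uses a linear longest-prefix scan rather than an actual trie, and the function name `match_route`, the route table, and the target URLs are all hypothetical; it is not how CHP or JupyterHub implement routing.

```python
def match_route(routes, path):
    """Return the most specific routespec that is a prefix of path, or None."""
    # Add a trailing slash so that '/foo' matches the '/foo/' route
    # but '/foobar' does not.
    if not path.endswith('/'):
        path = path + '/'
    candidates = [spec for spec in routes if path.startswith(spec)]
    # The longest matching prefix is the most specific route.
    return max(candidates, key=len, default=None)


# Hypothetical route table: routespec -> target
routes = {
    '/': 'http://127.0.0.1:8081',
    '/foo/': 'http://127.0.0.1:9001',
    '/foo/bar/': 'http://127.0.0.1:9002',
}

most_specific = match_route(routes, '/foo/bar/baz')  # matches '/foo/bar/'
fallback = match_route(routes, '/foobar')            # only '/' matches
```

A real proxy would typically use a trie so that lookups do not scan every route, but the selection rule is the same: the longest matching prefix wins.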
Optionally, if the JupyterHub deployment is to use host-based routing, Optionally, if the JupyterHub deployment is to use host-based routing,
@@ -35,10 +39,10 @@ class MyProxy(Proxy):
... ...
``` ```
## Starting and stopping the proxy ## Starting and stopping the proxy
If your proxy should be launched when the Hub starts, you must define how to start and stop your proxy: If your proxy should be launched when the Hub starts, you must define how
to start and stop your proxy:
```python ```python
from tornado import gen from tornado import gen
@@ -55,8 +59,8 @@ class MyProxy(Proxy):
These methods **may** be coroutines. These methods **may** be coroutines.
`c.Proxy.should_start` is a configurable flag that determines whether the Hub should call these methods when the Hub itself starts and stops. `c.Proxy.should_start` is a configurable flag that determines whether the
Hub should call these methods when the Hub itself starts and stops.
### Purely external proxies ### Purely external proxies
@@ -70,31 +74,30 @@ class MyProxy(Proxy):
should_start = False should_start = False
``` ```
## Routes
## Adding and removing routes At its most basic, a Proxy implementation defines a mechanism to add, remove,
and retrieve routes. A proxy that implements these three methods is complete.
At its most basic, a Proxy implementation defines a mechanism to add, remove, and retrieve routes.
A proxy that implements these three methods is complete.
Each of these methods **may** be a coroutine. Each of these methods **may** be a coroutine.
**Definition:** routespec **Definition:** routespec
A routespec, which will appear in these methods, is a string describing a route to be proxied, A routespec, which will appear in these methods, is a string describing a
such as `/user/name/`. A routespec will: route to be proxied, such as `/user/name/`. A routespec will:
1. always end with `/` 1. always end with `/`
2. always start with `/` if it is a path-based route `/proxy/path/` 2. always start with `/` if it is a path-based route `/proxy/path/`
3. precede the leading `/` with a host for host-based routing, e.g. `host.tld/proxy/path/` 3. precede the leading `/` with a host for host-based routing, e.g.
`host.tld/proxy/path/`
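The three rules above can be sketched as a small helper that splits a routespec into its host and path parts. The helper name `split_routespec` is hypothetical and is not part of the JupyterHub API; it only illustrates the format described here.

```python
def split_routespec(routespec):
    """Split a routespec into (host, path); host is '' for path-based routes."""
    # Rule 1: a routespec always ends with '/'
    if not routespec.endswith('/'):
        raise ValueError("routespec must end with '/': %r" % routespec)
    # Rule 2: path-based routes start with '/'
    if routespec.startswith('/'):
        return '', routespec
    # Rule 3: host-based routes put the host before the first '/'
    host, _, rest = routespec.partition('/')
    return host, '/' + rest


path_based = split_routespec('/user/name/')
host_based = split_routespec('host.tld/proxy/path/')
```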
### Adding a route ### Adding a route
When adding a route, JupyterHub may pass a JSON-serializable dict as a `data` argument When adding a route, JupyterHub may pass a JSON-serializable dict as a `data`
that should be attached to the proxy route. argument that should be attached to the proxy route. When that route is
When that route is retrieved, the `data` argument should be returned as well. retrieved, the `data` argument should be returned as well. If your proxy
If your proxy implementation doesn't support storing data attached to routes, implementation doesn't support storing data attached to routes, then your
then your Python wrapper may have to handle storing the `data` piece itself, Python wrapper may have to handle storing the `data` piece itself, e.g. in a
e.g. in a simple file or database. simple file or database.
```python ```python
@gen.coroutine @gen.coroutine
@@ -113,12 +116,10 @@ proxy.add_route('/user/pgeorgiou/', 'http://127.0.0.1:1227',
{'user': 'pgeorgiou'}) {'user': 'pgeorgiou'})
``` ```
### Removing routes ### Removing routes
`delete_route()` is given a routespec to delete. `delete_route()` is given a routespec to delete. If there is no such route,
If there is no such route, `delete_route` should still succeed, `delete_route` should still succeed, but a warning may be issued.
but a warning may be issued.
```python ```python
@gen.coroutine @gen.coroutine
@@ -126,18 +127,17 @@ def delete_route(self, routespec):
"""Delete the route""" """Delete the route"""
``` ```
### Retrieving routes ### Retrieving routes
For retrieval, you only *need* to implement a single method that retrieves all routes. For retrieval, you only *need* to implement a single method that retrieves all
The return value for this function should be a dictionary, keyed by `routespec`, routes. The return value for this function should be a dictionary, keyed by
of dicts whose keys are the same three arguments passed to `add_route` `routespec`, of dicts whose keys are the same three arguments passed to
(`routespec`, `target`, `data`) `add_route` (`routespec`, `target`, `data`)
```python ```python
@gen.coroutine @gen.coroutine
def get_all_routes(self): def get_all_routes(self):
"""Return all routes, keyed by routespec"""" """Return all routes, keyed by routespec"""
``` ```
```python ```python
@@ -150,15 +150,15 @@ def get_all_routes(self):
} }
``` ```
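Putting the three methods together, the following is a minimal in-memory sketch of the route-handling contract described above. The class `InMemoryRoutes` is a hypothetical stand-in that does not subclass JupyterHub's `Proxy` and does not talk to a real proxy; since the methods only *may* be coroutines, plain methods are used here to keep the example self-contained.

```python
import logging

log = logging.getLogger(__name__)


class InMemoryRoutes:
    """Hypothetical stand-in for a Proxy implementation; keeps routes in a dict."""

    def __init__(self):
        self._routes = {}

    def add_route(self, routespec, target, data):
        """Store the route along with its JSON-serializable data."""
        self._routes[routespec] = {
            'routespec': routespec,
            'target': target,
            'data': dict(data or {}),
        }

    def delete_route(self, routespec):
        """Delete the route; a missing route only logs a warning."""
        if self._routes.pop(routespec, None) is None:
            log.warning("Route %s does not exist; nothing to delete", routespec)

    def get_all_routes(self):
        """Return all routes, keyed by routespec (shallow copies of each entry)."""
        return {spec: dict(route) for spec, route in self._routes.items()}


proxy = InMemoryRoutes()
proxy.add_route('/user/pgeorgiou/', 'http://127.0.0.1:1227', {'user': 'pgeorgiou'})
all_routes = proxy.get_all_routes()
proxy.delete_route('/user/pgeorgiou/')
proxy.delete_route('/user/pgeorgiou/')  # second delete succeeds, with a warning
```

A real implementation would forward each call to the proxy it wraps; the dict here plays the role of the "simple file or database" fallback for proxies that cannot store `data` themselves.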
## Note on activity tracking
JupyterHub can track activity of users, for use in services such as culling
#### Note on activity tracking idle servers. As of JupyterHub 0.8, this activity tracking is the
responsibility of the proxy. If your proxy implementation can track activity
JupyterHub can track activity of users, for use in services such as culling idle servers. to endpoints, it may add a `last_activity` key to the `data` of routes
As of JupyterHub 0.8, this activity tracking is the responsibility of the proxy. retrieved in `.get_all_routes()`. If present, the value of `last_activity`
If your proxy implementation can track activity to endpoints, should be an [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) UTC date
it may add a `last_activity` key to the `data` of routes retrieved in `.get_all_routes()`. string:
If present, the value of `last_activity` should be an [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) UTC date string:
```python ```python
{ {
@@ -173,11 +173,9 @@ If present, the value of `last_activity` should be an [ISO8601](https://en.wikip
} }
``` ```
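As a sketch of how a proxy wrapper might produce such a value, the helper below formats a timezone-aware `datetime` as an ISO 8601 UTC string. The helper name `isoformat_utc`, the microsecond precision, and the trailing `Z` are assumptions made for illustration; the source only requires an ISO 8601 UTC date string.

```python
from datetime import datetime, timezone


def isoformat_utc(dt):
    """Format a timezone-aware datetime as an ISO 8601 UTC string."""
    # Convert to UTC first so the trailing 'Z' is accurate.
    return dt.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%S.%fZ')


# Hypothetical route entry, as it might be returned by get_all_routes()
route = {
    'routespec': '/user/pgeorgiou/',
    'target': 'http://127.0.0.1:1227',
    'data': {'user': 'pgeorgiou'},
}
seen = datetime(2017, 10, 3, 10, 33, 49, 570000, tzinfo=timezone.utc)
route['data']['last_activity'] = isoformat_utc(seen)
```

Pass timezone-aware datetimes only: `astimezone` treats a naive datetime as local time, which may not be what the proxy recorded.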
If the proxy does not track activity, then only activity to the Hub itself is
tracked, and services such as cull-idle will not work.
If the proxy does not track activity, then only activity to the Hub itself is tracked, Now that `notebook-5.0` tracks activity internally, we can retrieve activity
and services such as cull-idle will not work. information from the single-user servers instead, removing the need to track
activity in the proxy. But this is not yet implemented in JupyterHub 0.8.0.
Now that `notebook-5.0` tracks activity internally,
we can retrieve activity information from the single-user servers instead,
removing the need to track activity in the proxy.
But this is not yet implemented in JupyterHub 0.8.0.