Merge pull request #1462 from minrk/proxy-docs

Document custom proxy implementations
2025-10-18 15:33:02 +00:00 · 2017-10-03 08:36:02 -07:00
parent b34be77fec 01a67ba156
commit 3fc74bd79e
5 changed files with 189 additions and 4 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -6,6 +6,7 @@ node_modules
 /build
 dist
 docs/_build
 docs/build
 docs/source/_static/rest-api
 .ipynb_checkpoints
 # ignore config file at the top-level of the repo
--- a/docs/source/api/services.auth.rst
+++ b/docs/source/api/services.auth.rst
@@ -17,7 +17,7 @@ Module: :mod:`jupyterhub.services.auth`
    :members:
 :class:`HubOAuth`
----------------
+-----------------
 .. autoconfigurable:: HubOAuth
    :members:
@@ -30,7 +30,7 @@ Module: :mod:`jupyterhub.services.auth`
    :members:
 :class:`HubOAuthenticated`
-------------------------
+--------------------------
 .. autoclass:: HubOAuthenticated
--- a/docs/source/changelog.md
+++ b/docs/source/changelog.md
@@ -23,7 +23,7 @@ in your Dockerfile is sufficient.
 #### Added
- JupyterHub now defined a `.Proxy` API for custom
+- JupyterHub now defined a `Proxy` API for custom
  proxy implementations other than the default.
  The defaults are unchanged,
  but configuration of the proxy is now done on the `ConfigurableHTTPProxy` class instead of the top-level JupyterHub.
@@ -32,7 +32,7 @@ in your Dockerfile is sufficient.
  (anything that uses HubAuth)
  can now accept token-authenticated requests via the Authentication header.
 - Authenticators can now store state in the Hub's database.
-  To do so, the `.authenticate` method should return a dict of the form
+  To do so, the `authenticate` method should return a dict of the form
  ```python
  {
--- a/docs/source/reference/index.rst
+++ b/docs/source/reference/index.rst
@@ -9,6 +9,7 @@ Technical Reference
   authenticators
   spawners
   services
   proxy
   rest
   upgrading
   config-examples
--- a/docs/source/reference/proxy.md
+++ b/docs/source/reference/proxy.md
@@ -0,0 +1,183 @@
 # Writing a custom Proxy implementation
 JupyterHub 0.8 introduced the ability to write a custom implementation of the proxy.
 This enables deployments with different needs than the default proxy,
 configurable-http-proxy (CHP).
 CHP is a single-process nodejs proxy that the Hub manages by default as a subprocess
 (it can be run externally, as well, and typically is in production deployments).
 The upside to CHP, and why we use it by default, is that it's easy to install and run (if you have nodejs, you are set!).
 The downsides are that it's a single process and does not support any persistence of the routing table.
 So if the proxy process dies, your whole JupyterHub instance is inaccessible until the Hub notices, restarts the proxy, and restores the routing table.
 For deployments that want to avoid such a single point of failure,
 or leverage existing proxy infrastructure in their chosen deployment (such as Kubernetes ingress objects),
 the Proxy API provides a way to do that.
 In general, for a proxy to be usable by JupyterHub, it must:
 1. support websockets without prior knowledge of the URL where websockets may occur
 2. support trie-based routing (i.e. allow different routes on `/foo` and `/foo/bar` and route based on specificity)
 3. add or remove routes without dropping existing connections
 Optionally, if the JupyterHub deployment is to use host-based routing,
 the Proxy must additionally support routing based on the Host of the request.
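
To illustrate the second requirement, here is a minimal sketch of specificity-based matching, where the longest matching prefix wins. The function name `match_route`, the route table, and the target addresses are hypothetical and only serve the example; a real proxy would typically use its own routing structure.

```python
def match_route(routes, path):
    """Return the target of the most specific route that is a prefix of `path`.

    `routes` maps path prefixes (each ending in '/') to targets.
    Returns None when no route matches.
    """
    # Normalize so that a request for '/foo' matches the route '/foo/'
    if not path.endswith('/'):
        path = path + '/'
    best = None
    for prefix in routes:
        if path.startswith(prefix):
            # Prefer the longest (most specific) matching prefix
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None


# Hypothetical routing table with overlapping prefixes
routes = {
    '/': 'http://127.0.0.1:8081',
    '/foo/': 'http://127.0.0.1:9001',
    '/foo/bar/': 'http://127.0.0.1:9002',
}
```

With this table, a request for `/foo/bar/baz` resolves to the `/foo/bar/` target, while `/foo/other` falls back to `/foo/`.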
 ## Subclassing Proxy
 To start, any Proxy implementation should subclass the base Proxy class,
 as is done with custom Spawners and Authenticators.
 ```python
 from jupyterhub.proxy import Proxy
 class MyProxy(Proxy):
    """My Proxy implementation"""
    ...
 ```
 ## Starting and stopping the proxy
 If your proxy should be launched when the Hub starts, you must define how to start and stop your proxy:
 ```python
 from tornado import gen
 class MyProxy(Proxy):
    ...
    @gen.coroutine
    def start(self):
        """Start the proxy"""
    @gen.coroutine
    def stop(self):
        """Stop the proxy"""
 ```
 These methods **may** be coroutines.
 `c.Proxy.should_start` is a configurable flag that determines whether the Hub should call these methods when the Hub itself starts and stops.
 ### Purely external proxies
 Most custom proxies will probably be externally managed,
 such as Kubernetes ingress-based implementations.
 In this case, you do not need to define `start` and `stop`.
 To disable the methods, you can define `should_start = False` at the class level:
 ```python
 class MyProxy(Proxy):
    should_start = False
 ```
 ## Adding and removing routes
 At its most basic, a Proxy implementation defines a mechanism to add, remove, and retrieve routes.
 A proxy that implements these three methods is complete.
 Each of these methods **may** be a coroutine.
 **Definition:** routespec
 A routespec, which will appear in these methods, is a string describing a route to be proxied,
 such as `/user/name/`. A routespec will:
 1. always end with `/`
 2. always start with `/` if it is a path-based route `/proxy/path/`
 3. precede the leading `/` with a host for host-based routing, e.g. `host.tld/proxy/path/`
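
The three rules above can be checked mechanically. The following is a small sketch, not part of the JupyterHub API: the helper name `split_routespec` is hypothetical, and it simply separates the optional host from the path.

```python
def split_routespec(routespec):
    """Split a routespec into (host, path).

    `host` is '' for path-based routes.
    Raises ValueError if the routespec does not end with '/'.
    """
    if not routespec.endswith('/'):
        raise ValueError("routespec must end with '/': %r" % routespec)
    if routespec.startswith('/'):
        # Path-based route: no host component
        return '', routespec
    # Host-based route: everything before the first '/' is the host
    host, _, rest = routespec.partition('/')
    return host, '/' + rest
```

For example, `split_routespec('host.tld/proxy/path/')` separates the host `host.tld` from the path `/proxy/path/`, while a path-based routespec comes back with an empty host.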
 ### Adding a route
 When adding a route, JupyterHub may pass a JSON-serializable dict as a `data` argument
 that should be attached to the proxy route.
 When that route is retrieved, the `data` argument should be returned as well.
 If your proxy implementation doesn't support storing data attached to routes,
 then your Python wrapper may have to handle storing the `data` piece itself,
 e.g. in a simple file or database.
 ```python
 @gen.coroutine
 def add_route(self, routespec, target, data):
    """Proxy `routespec` to `target`.
    Store `data` associated with the routespec
    for retrieval later.
    """
 ```
 Adding a route for a user looks like this:
 ```python
 proxy.add_route('/user/pgeorgiou/', 'http://127.0.0.1:1227',
                {'user': 'pgeorgiou'})
 ```
 ### Removing routes
 `delete_route()` is given a routespec to delete.
 If there is no such route, `delete_route` should still succeed,
 but a warning may be issued.
 ```python
 @gen.coroutine
 def delete_route(self, routespec):
    """Delete the route"""
 ```
 ### Retrieving routes
 For retrieval, you only *need* to implement a single method that retrieves all routes.
 The return value for this function should be a dictionary, keyed by `routespec`,
 of dicts whose keys are the same three arguments passed to `add_route`
 (`routespec`, `target`, `data`).
 ```python
 @gen.coroutine
 def get_all_routes(self):
     """Return all routes, keyed by routespec"""
 ```
 ```python
 {
  '/proxy/path/': {
    'routespec': '/proxy/path/',
    'target': 'http://...',
    'data': {},
  },
 }
 ```
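
As a rough sketch of how the three methods fit together, the class below keeps the routing table in a plain dictionary and stores `data` itself. The class name `InMemoryRouteTable` is hypothetical, and it deliberately does not subclass `Proxy` so that it runs without JupyterHub installed; a real implementation would subclass `jupyterhub.proxy.Proxy` and forward each change to the actual proxy.

```python
import copy
import logging

log = logging.getLogger(__name__)


class InMemoryRouteTable:
    """Hypothetical route store showing the add/delete/retrieve contract."""

    def __init__(self):
        self._routes = {}

    def add_route(self, routespec, target, data):
        # Keep `data` alongside the target so it can be returned later
        self._routes[routespec] = {
            'routespec': routespec,
            'target': target,
            'data': copy.deepcopy(data) if data else {},
        }

    def delete_route(self, routespec):
        # Deleting a missing route still succeeds, but logs a warning
        if self._routes.pop(routespec, None) is None:
            log.warning("Route %s was not in the routing table", routespec)

    def get_all_routes(self):
        # Return a deep copy so callers cannot mutate the stored table
        return copy.deepcopy(self._routes)
```

Because the methods may be coroutines but do not have to be, plain methods are enough for this in-memory case.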
 #### Note on activity tracking
 JupyterHub can track activity of users, for use in services such as culling idle servers.
 As of JupyterHub 0.8, this activity tracking is the responsibility of the proxy.
 If your proxy implementation can track activity to endpoints,
 it may add a `last_activity` key to the `data` of routes retrieved in `.get_all_routes()`.
 If present, the value of `last_activity` should be an [ISO8601](https://en.wikipedia.org/wiki/ISO_8601) UTC date string:
 ```python
 {
  '/user/pgeorgiou/': {
    'routespec': '/user/pgeorgiou/',
    'target': 'http://127.0.0.1:1227',
    'data': {
      'user': 'pgeorgiou',
      'last_activity': '2017-10-03T10:33:49.570Z',
    },
  },
 }
 ```
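
A timestamp in the format shown above could be produced as follows. This is a minimal sketch: the helper name `format_last_activity` is hypothetical, and it assumes the proxy wrapper already has a timezone-aware `datetime` for the last request it saw.

```python
from datetime import datetime, timezone


def format_last_activity(dt):
    """Format a timezone-aware datetime as an ISO8601 UTC string.

    Uses millisecond precision and a trailing 'Z', as in the example above.
    """
    # Convert to UTC first so the offset is always +00:00
    dt = dt.astimezone(timezone.utc)
    return dt.isoformat(timespec='milliseconds').replace('+00:00', 'Z')


# The timestamp from the example above, as an aware datetime
seen = datetime(2017, 10, 3, 10, 33, 49, 570000, tzinfo=timezone.utc)
last_activity = format_last_activity(seen)
```
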
 If the proxy does not track activity, then only activity to the Hub itself is tracked,
 and services such as cull-idle will not work.
 Now that `notebook-5.0` tracks activity internally,
 JupyterHub could retrieve activity information from the single-user servers instead,
 removing the need to track activity in the proxy.
 This is not yet implemented in JupyterHub 0.8.0, however.