When request uri matching with base_url in PrefixRedirectHandler,
it's better to ensure uri with tariling slash. That's will avoid
redirecting /foobar to /foobar/hub/foobar.
Currently, to check if the proxy is running, os.kill(pid,0) is used,
which doesn't work on Windows. Wrapped call into a new function that
adds a Windows case.
Signed-off-by: Alejandro del Castillo <alejandro.delcastillo@ni.com>
The Hub will exit if consecutive failure count reaches this threshold
Any successful spawn will reset the count to 0
useful for auto-restarting / self-healing deployments such as kubernetes/systemd/docker where restarting the Hub
default is disabled, since it would bring down the Hub if it’s not an auto-restarting deployment
raise SystemExit on sigterm instead of calling atexit directly
- ensure fresh asyncio eventloop is created (not just IOLoop)
- makes cleanup more likely to run (one source of orphaned proxies)
this is the routespec for sending requests to the hub
It is [host]/prefix/ (not /hub/) so it receives all
requests, not just those destined for the hub
When the Hub listens on all ips by default, the connection ip is the hostname.
in some cases (e.g. certain kubernetes deployments) the hub’s container’s hostname is not connectable from itself, preventing managed services from connecting to the hub.
This ensures that managed service processes talk to the hub over localhost in this case, rather than via the hostname.
JupyterHub packages are in the 'conda-forge' channel of Anaconda packages; if the Anaconda installation doesn't already have 'conda-forge' enabled, `conda install jupyterhub` fails.
Rather than adding instructions to enable 'conda-forge' in Anaconda, this patch modifies the installation command to specify that channel.
Added a command that worked for me to fix the situation that localhost:8000 is unable to reach the hub even though the published command for Docker exposes the correct port.
- Using the new template_vars setting (#1872), allow the variable
`announcement` to create a header message on all the pages in the
title, or the variables `announcement_{home,login,logout,spawn}` to
set variables on these single pages.
- This is not the most powerful method of putting an announcement into
the templates, because it requires a server restart to change. But
the invasiveness is very low, and allows minimal message
without having to touch the templates themselves.
- Closes: #1836
escape with `_` instead of `%`.
This is not technically rigorous, as collisions are possible (users foo_40 and foo@ have the same domain)
and other domain restrictions are not applied (length, starting characters, etc.).
Username normalization can be used to apply stricter, more rigorous structure.
Addresses issue #1747.
These additions aren't perfect -- it's unfortunate that I've added
mention of reverse proxies on two separate pages. I don't _think_
these can reasonably be put on the same page -- perhaps a cross
reference?
- This message is presented when the last spawn failed, along with a
HTTP 500. The current text is quite confusing, especially when the
problem may just be solvable by trying to respawn again.
On the Windows case, the configurable-http-proxy is spwaned using a
shell. To stop the proxy, we need to terminate both the main process
(shell) and its child (proxy).
Signed-off-by: Alejandro del Castillo <alejandro.delcastillo@ni.com>
from the sqlalchemy docs
checks if a connection is valid via `SELECT 1` prior to using it.
Since we have long-running connections, this helps us survive database restarts, disconnects, etc.
- update config to single tuple instead of two integers
- call it spawn_throttle_retry_range
- fix setting Retry-After header without disabling error pages
enables `c.JupyterHub.bind_url = 'unix+http://%2Fsrv%2Fjupyterhub%2Fjupyterhub.sock'`
for listening on a bsd socket.
Similarly, bind_url and connect_url work as overrides everywhere
Add the hub_socket option to the JupyterHub class, which takes
precedence over the hub_ip and hub_port setting. It does not forward
this setting to the Hub class though, and a few log messages still say
the hub is listening on `http://:8000` that works fine when testing with
netcat:
```
$ nc -U /tmp/jhub.sock
GET /login HTTP/1.1
HTTP/1.1 302 Found
Server: TornadoServer/4.5.1
Content-Type: text/html; charset=UTF-8
Date: Fri, 28 Jul 2017 02:05:36 GMT
X-Jupyterhub-Version: 0.8.0.dev
Content-Security-Policy: frame-ancestors 'self'; report-uri /hub/security/csp-report
Location: /hub/login
Content-Length: 0
```
Should still be better documented I guess.
- Instead of creating many options for different timeouts of users and
servers, just add a note that the whole culler can be run multiple
times with different options. See discussion in #1834.
- Closes: #1834
- This will allow, for example, cull_idle_servers to be more
intelligent when culling servers.
- This is only given to admin API users, because we don't know if all
spawners expect their state to be made available to users.
in order to use sqlalchemy's expire_on_commit=False optimization,
we need to make sure that objects are kept up to date.
This means we cannot rely on ForeignKey ondelete/onupdate behavior,
we must use sqlalchemy's local relationship cascades
The main key here is that we must use relationships to set foreign-key relations,
e.g. APIToken.user = user instead of APIToken.user_id = user.id.
It also means that we cannot use passive_deletes,
which allows sqlalchemy to defer to the database's more efficient ON DELETE behavior.
This makes deletions more expensive in particular,
but should improve db performance overall.
allows opening an IPython shell with a connection to your database
alembic moved from `python -m jupyterhub.dbutil` to `python -m jupyterhub.dbutil alembic` subcommand
function-scoped fixture for shutting down servers
avoids servers leaking into neighbor tests without having to teardown the app itself after every test
allows iterating through an async generator, yielding items until another Future resolves
if/when that deadline Future resolves, ready items will continue to be yielded until there is one that actually needs to wait
at which point the iteration will halt
- add Spawner.progress method. Must be an async generator of JSON-able progress events
- add /api/users/:user/server-progress eventstream endpoint
- use eventstream to fill progress bar on the spawn pending page
use iso8601 Z suffix for UTC timestamps
use dateutil to parse dates from proxy, as well
even though CHP uses iso8601 UTC timestamps, we no longer assume CHP, so use more general parsing
in our db we are stuck with naïve datetime objects, so use those internally.
But ensure we put 'Z' on timestamps we ship externally
Now that it's unambiguous whether the proxy should start or not,
we don't need a warning about generating tokens causing issues for hub restart.
We can raise a strict, early error if proxy s external and token is still unspecified,
rather than running into a 403 error due to a generated token
JH can now differentiate between authenticated and authorized users via PAM
This allows JH to respect PAM-accessible user access controls.
This also fixes missing PAMAuthenticator.encoding usages.
- ignore spawner state when determining redirect destination
- remove implicit spawn from login handler (rely on redirect to user.url for spawn)
- settings.redirect_to_server determines if login sends users to /user/:name vs /hub/home
- visiting `/hub/` should result in the same destination regardless of login state or spawner state
asyncio has different coroutine start mechanics than tornado
tornado starts coroutines immediately,
whereas asyncio doesn't until they are scheduled with either ensure_future or waited upon.
In part to cleanup a few remnants of early design where jupyterhub was ‘jupyter-hub’ instead of ‘jupyterhub’.
Should also clarify to some degree what the cookies are for.
- hub login cookie is now ‘jupyterhub-hub-login’ instead of ‘jupyter-hub-token’
- user server cookie is now ‘jupyterhub-user-<name>’ instead of ‘user-name’ to keep jupyterhub prefix on all cookies
All cookies at this point:
- jupyterhub-session-id on /
- jupyterhub-hub-login on /hub/ (the main login cookie)
- jupyterhub-services on /services/
- jupyterhub-user-<name> on /user/:name
- jupyterhub-user-<name>-oauth-state on /user/:name during oauth
send SIGINFO (ctrl-T) to jupyterhub and it will dump
process info
- if psutil is available, show cpu, memory, FD counts
- always show stacks of non-idle threads
avoids filling up hidden .Trash directory when files are deleted
since there's no UI for trash in a jupyterhub deployment, this mainly results in files never being deleted and possibly filling up disks
put the Spawner instance back so it can reuse the token
'real' reuse cases don't need this because the info is stored in their own storage,
e.g. a stopped container.
HasTraits are expensive to instantiate, so make Users as light as possible
Removes immediate instantiation of Spawners during User init. Spawners will only be instantiated while running
Avoids instantiating too many objects before they are used
- deletes Spawner instances after they stop to avoid lingering instances
- use user_dict cache more often instead of db queries
- check for empty spawners dict to avoid a few Spawner instantiations
puts each check for a running spawner in a coroutine and runs them all concurrently.
Note: this will only improve performance when a large number of Spawners are running and `yield spawner.poll()` takes a nontrivial amount of time.
This is because these are coroutines, not threads. If instantiating Spawners themselves takes a long time, performance will not be affected.
This is still causing problems in all fresh deployments we are doing. I am fine with another solution, but at least wanted to proposed this as a fix for now.
If true, the user can have custom templates (specified in
template_paths) that extend the default templates, by referencing them
as "BASE:filename.html". This makes it easier to add information to
exising templates.
- ._running private attribute is removed. We don't need it anymore,
since we were only using it while the application was run in a background thread.
- call blocking cleanup in a thread because asyncio doesn't allow multiple loops in one thread.
Try to hit every possible exit point from the spawn_single_server
method, with an appropriate status code.
The default histogram buckets are also meant for request latencies,
but spawning usually takes longer so we use custom buckets
This patch introduces Prometheus for exposing metrics
about JupyterHub's operation. We expose a standard /metrics
endpoint that can be queried without authentication. We
take on prometheus_client as an unconditional dependency
to both simplify code & because it is a pure python package
with no dependencies itself.
The first pass adds 'RED' style metrics for all HTTP requests.
http://rancher.com/red-method-for-prometheus-3-key-metrics-for-monitoring/
has some info on the RED method, but to summarize:
For each request type, record at least the following metrics
Rate – the number of requests, per second, your services are serving.
Errors – the number of failed requests per second.
Duration – The amount of time each request takes expressed as a time interval.
This instantly gives us a lot of useful metrics in a very
compact form.
.terminate() only sends the signal,
it doesn't wait for the process to exit.
If the process doesn't exit promptly,
the next instance may try to grab the port before the previous process has released it,
causing failure with EADDRINUSE.
and add loud warning about discarding information
this has been the cause of many debugging difficulties,
when redirecting output seems to be a better option in ~all cases.
- add `authorization` to default Access-Control-Allow-Headers
- allow overriding `Access-Control-Allow-Headers` just like everything else in case default is inappropriate
- ensure case-insensitive comparison for proper header checks
Upgrades the database (if needed) on start.
This is opt-in, for uses like the helm chart where explicit 'upgrade-db' steps are hard to insert.
This ought to be safe for sqlite users, where an automatic backup file is created *if an upgrade will occur*.
In 0.8, the jupyterhub api token can also be used to make requests to
hte jupyter notebook given some conditions. This commit updates that
documentation
Installing the loop instructs the tornado loop to point to the ayncio loop and use
that. IOLoop.configure told the tornado loop to create a new ioloop when
a loop was needed, which is not what we want.
in case of multiple simultaneous
- state arg is strictly required now
- default cookie name in case of no collision is unchanged
- in case of collision, randomize cookie name with a suffix and store cookie_name in state
- expire state cookies after 10 minutes, not 1 day
- set ONDELETE='set null' on spawner->server relation (fixes error when deleting servers that stopped)
- set `spawner.server = None`, which is not triggered when deleting orm_spawner.server
proxies may not route `/user/name` correctly, only `/user/name/...`, so make sure that `/user/name` is redirected to `/user/name/`
this manifests as a redirect loop between /user/name and /hub/user/name when a route exists but /user/name is still
being routed to the Hub
so the error message is the same, however it was arrived at.
potential downside: it could look like the current request is spawning and failing,
rather than the reality that a previous spawn failed and we are just re-presenting the earlier error.
It's possible for there to have been a long time in between spawn and error.
require users to visit /hub/home and click 'Start My Server' to get a new server
Visits to /hub/user/:name will get an error if the previous spawn failed,
rather than triggering a new spawn.
This should guarantee that a user sees an error if their spawn failed,
regardless of when the failure occurred and how long it took.
Some cases of slow errors could result in triggering a new spawn indefinitely without
the user seeing an error message.
/hub/spawn was a simple redirect to /user/:name in the absence of a spawn form,
but now clears the `_spawn_future` prior to redirect
to signal that a new spawn has been explicitly requested in the case of a prior failure.
this is the most likely cause of redirect loops when using docker,
so record the spawner version and check it when a redirect is detected.
In the event of a redirect and mismatch, fail with a message explaining the version mismatch and how to fix it.
1. set _proxy_pending before first wait to ensure that there is never a gap between setting spawn flags
2. always call `finish_user_spawn` to reduce the number of finalization cases
3. wait for proxy to finish on the slow_spawn timeout, not just start, because we are only interested in the total duration for page responsiveness
handle will_resume case correctly, where an API token *may* be re-used.
Previously, we only did it right if the token was *always* reused,
but clearing out a container would get it into a bad state.
ensures that singleuser cookies do not persist across single-user instances
relaunching a singleuser instance invalidates all cookies used with that instance
- test that .will_resume preserves tokens (worked, but wasn't tested)
If a Spawner reuses a token, validate it in the db:
- verify that it's in the db
- if it doesn't map onto the right user, revoke the token
- if it's not in the db, insert it as a user-provided token
The most likely case is prior unclean shutdown of something like DockerSpawner,
where a spawn failed and thus the token was revoked,
but the container was in fact created.
user-provided tokens are added in exactly one place,
so switch default handling of tokens to generated=True
and explicitly distrust user tokens.
Add JupyterHub.trust_user_provided_tokens flag so that users can avoid the extra hashing
if they know they are providing good keys.
This shouldn’t happen, raise if it does.
If a token API request is authenticated with no user or service, delete the token because it is invalid and return with 404 because it doesn’t correspond to an existing user.
it was made a method for handing named_servers,
but that made things way more complicated and replaced a boolean flag with a callable,
which would behave unexpectedly but without error if a boolean flag was expected.
Spawners have properties for dealing with this now, so use spawners
Restore `user.running` as an alias for `user.spawner.ready`
produces summary of active/pending/ready spawner counts
Avoids brittle bookkeeping of running counts,
computing the value upon request.
For 10k users this is still only a few milliseconds, which seems worth it
fail with informative error if version mismatches
Since we weren't always tagging before,
we have to handle no tag being present:
- database empty (use latest because we are about to create everything anew)
- if 'spawners' is present, assume 0.8.dev
- if 'services' is present, assume 0.7.x
- else: assume base revision when we started tracking this stuff
Allows authenticators to optionally enable this flag
and signal that auth_state will be used,
enabling early check and exit if encryption is not available.
Currently Spawners need to overwrite start, stop, poll. When this is not
done, it will fail at runtime.
This replicate this check at class definition time, meaning that
potential errors will be caught way earlier. It also have not runtime
cost as the check is a class definition time (ie often import time).
This takes only effect on Python 3.6+ which introduce __init_subclass__,
we could do it with metaclasses, but that's might be too complicated.
If one want to create a class the avoid these restriction they can
overwrite __init_subclass__ and not call the super() method.
only benefit of privy was KDF, but if users provide good 32B keys, this doesn't help.
Fernet already adds randomness, etc. to tokens, so is good enough on its own if keys are good.
privy is used for encryption
- db only has blob column, no knowledge of encryption
- add CryptKeeper for handling encryption
- use privy for encryption, so we have fewer choices to make
- storing/loading encrypted auth_state runs in a ThreadPool
- MultFernet allows key rotation via `AUTH_STATE_KEY=secret2;secret1;secret0`
- Failure to decrypt results in cleared state
- Attempting to set auth_state without encryption is a hard failure
- Absent encryption, auth_state will always be None
prevent `.check_routes` from firing while we wait for a new proxy to come up
We check explicitly that it comes up with no routes, so makes sure check_routes hasn't restored its state, which is causing intermittent failures
for easier re-use of login in custom handlers
Further, enable auto_login + no custom login handler to mean that auth info is already present in requests
(e.g. REMOTE_USER)
- apply jitter to first iteration
- due to jitter, double start_wait to 0.2 so that <first wait> is still 0.1
- keep scaling by start_wait, rather than previous dt
- limit last wait to deadline so timeout is not overshot
- remove use of deprecated Hub.server
- add deprecation warning to Hub.server property
- move cookie_name declaration to Hub
It should now be possible to use Hub.from_url('http://1.2.3.4:1234/hub/') without missing information
Lots of custom proxy implementations that are distributed are
eventually consistent, and it might take upto a few seconds for
all the components to start redirecting properly. If we do
exponential backoff when doing these redirects, it gives the
proxies a lot of time to catch up. We also explicitly raise an
error if it's going on too long, instead of giving the user
juts a 'redirected too many times' error.
self.authenticate can return None, in which case you can't subscript.
So move extracting data into the branch checking whether authenticate is
not `None`.
Now that extracting the username is inside the if branch, it can't be
used in the else one, so extract username from the request itself.
This can be easily reproduce with the default PAM login with a wrong
non existing/ wrong username.
- always set content security policy header, to workaround bug in notebook 5.0
- set x-jupyterhub-version on all requests, not just our own
- fix version comparison in _check_version (leftover `__version__`)
- even log version matches at debug-level (verifies that check happened)
- remove reduntant None, allow_none in Any
- remove callable check (if it's not callable, let the error raise)
- let outer error handling deal with failed pre-spawn hook
- add missing `return` in pre_spawn_hook
Waiting for servers to come up and shut down was polled at an even interval of 100ms. If things are slow and busy, this is a lot if waiting events. exponential backoff reduces the number of callbacks triggered by slow spawners.
This may improve the load a bit when there’s a bunch of outstanding spawns.
flag for waiting for the proxy to be updated
avoids User.running being True when the user's server has not yet been added to the proxy,
causing potential redirect loops.
in both directions - Hub checks singleuser server on spawn and singleuser server checks Hub on startup
if minor versions mismatch, log at warning level, otherwise debug
store id on outer User, rather than accessing orm_user.id, which seems to fail sometimes
this may fix the recent increase in intermittent test failures
namedtuple(path, host)
everywhere that accepts a RouteSpec must also accept a string
and treat it as RouteSpec(string).
RouteSpec.as_routespec(spec_or_string) handles this.
only raise ImportError on pamela if PAMAuthenticator is actually used
avoids failure to start in rare cases where pamela is not importable (e.g. broken libpam)
env is often easier to deal with for Spawners
Now, only optional args are passed on the command-line and all required args come from environment variables.
raise UserNotAllowed exception in generic `check_hub_user`
when a user or service is identified and not allowed.
turn it into `HTTPError(403)` in tornado `get_current_user` wrapper,
caching `None` so that subsequent calls don't re-trigger the same error.
pytest appears to close captured FDs prematurely,
causing huge "I/O operation on closed file" tracebacks
whenever tests stop early due to a failure.
This should quiet the extra traceback, though it could potentially silence useful log messages during cleanup in rare cases
allows specifying the connect ip/hostname for the Hub
when it differs from hub_ip (the bind address).
Used when the Hub is not on the same host as the spawners and/or proxy (e.g. docker, kubernetes, etc.)
opt-in option for deleting users that have been invalidated,
e.g. for LocalAuthenticators when system users have been removed and `create_system_users` is False.
Since it’s opt-in, log config to do so when the error is seen and option is not enabled.
OAuth access tokens can only be used to identify users, not perform actions on their behalf, which API tokens do.
Implementing OAuth scopes would allow us to achieve this limitation without separating the two items, but that would be a much bigger change, including having an OAuth "Would you like to grant permissions..." confirmation page.
Simplifies login URL, handler login
- all login redirects go to `settings['login_url']`
- `login_url` is unconditionally `/hub/login`
- `/hub/login` renders form page or 'login with...' button
- enabling auto_login redirects from /hub/login to Authenticator.login_url()
- OAuth access tokens *are* APITokens.
oauth_access_tokens table only stores extra oauth metadata.
- only store hashed client_secret in database,
using HashedCompare to allow comparison.
In some spawners you want to unset .cmd - for example, in
KubeSpawner setting it to None will use the CMD metadata that
is set in the Docker Image. Currently there's no way to set a
None value - you can't set it to [] either. Treating None and
empty values as separate is a useful thing to have.
This makes the logout link more discoverable by screen readers,
which sort links based on what they say. Since our icon was
in front of and not behind 'Logout', someone looking for Logout
will not find this
Fixes Issue #997. Also updated Traitlets to 4.3.2 since the change in singleuser.py relies on trait default values being checked through validator, which was added in traitlets 4.3.2.
Fixes#972. Currently, Jupyterhub actually has a hard requirement on the 4.3 traitlets API, otherwise you'll run into the crash described in that issue for any traitlets version older than that.
/hub/login always renders a page,
whereas `authenticator.login_url` may automatically log the user in via redirects,
causing logout to appear not to work, as redirects immediately cause login again.
- log that external services are added (helps catch accidental external services due to missing fields)
- check connectivity of services with web endpoints periodically
The documentation stated that the key `token` should be used to specify
the pregenerated token in `JupyterHub.services`. This is wrong as the key
should be `api_token`.
This changes the help on the trait, along with changing the module
docstring in `service.py`.
likely cause is `set('string')` typo instead of `set(['string'])`,
so include that in the error message:
whitelist contains single-character names: ['i', 'k', 'm', 'n', 'r']; did you mean set(['ikmnr']) instead of set('ikmnr')?
- Allows us to standardize this on the spawner base class,
so there's a consistent interface for different spawners
to implement this.
- Specify the supported suffixes and various units we accept
for memory and cpu units.
- Standardize the way we expose resource limit / guarantees
to single-user servers
Some Spawners may not need state,
and they should be allowed to resume on Hub restart as well.
Adds some detail about when .poll may be called and how it should behave in less obvious circumstances
.spawn_pending used for the *whole* window, from request to responsive (added to proxy)
.waiting_for_response is just used for the window between Spawner.start returning (process started, http endpoint known) and http endpoint becoming responsive
.waiting_for_response will never be True while .spawn_pending is False
We already added running users, but we didn't handle removing users from the proxy
if the user's server was stopped (e.g. while the Hub was restarting).
Subclasses prior to 0.6 may assume return value
of LocalProcessSpawner.start can be ignored
instead of passing it through.
For these cases, keep setting ip/port in the deprecated way
so that it still works with a warning,
rather than failing with the wrong port.
/api/ is not authenticated, and just reports JupyterHub's version for now.
/api/info is admin-only, and reports more detailed info about Python, authenticators/spawners in use, etc.
service_tokens supersedes api_tokens,
since they now map to a new services collection,
rather than regular Hub usernames.
Services in the ORM have:
- API tokens
- servers (multiple, can be 0)
- pid (0 if not managed)
since we only have generated alembic.ini, present a command that generates one and uses it.
enables generating new revisions with:
python -m jupyterhub.dbutil revision -m msg
should avoid needing to cram user-detection / intent into other endpoints.
That functionality isn't removed,
but warnings are added indicating that /user-redirect/ should be used instead.
I don't like including this, but I don't know a better way to get a starting db
other than doing a complete installation of old JupyterHub in an env.
At least it's small.
call poll_and_notify to ensure triggering of dead-server events in a few places:
- `/hub/home` page view
- user start and stop API endpoints
This should avoid the failure to stop a server that's died uncleanly because the server hasn't noticed yet
The've removed web app config, in favor of codecov.yml,
discarding our existing config,
which means coverage reports are showing up in most Jupyter PRs now.
When you have two JupyterHub windows and log out successfully in one of them. If you try to click the logout button in the other window, you will receive a 500 error.
It happened because there were operations being done in a None user object.
This is the first small part of easing the pain of services,
which is generating the API tokens,
and used to require initializing the JupyterHub database.
This allows deployments to configure environment variables
that need to be different for each user / container (such as
credentials for various services, etc).
it's often buggy and rarely necessary,
so allow it to be disabled when it's causing problems.
It's still on by default for backward-compatibility,
though maybe it shouldn't be.
- Ensures add_user is called as part of startup *for all users*.
This was previously only true for users not already in the db.
- Normalize usernames in whitelist and admin sets
- Call add_user on new users logged in when there is no whitelist.
- Otherwise does not work with MySQL
- Change JSONDict to be TEXT (Unbounded) rather than VARCHAR.
This makes most sense, since you can't index these anyway.
- The 'ip' field in Server is set to 255, since that is the
max allowed length of DNS entries.
- Most of the rest of the Unicodes have approximately high
values that most people should not mostly run into
(famous last words).
If a user, alice, requests /user/bob/notebooks/mynotebook.ipynb,
redirect her to /user/alice/notebooks/mynotebook.ipynb.
Currently, such requests get stuck in a redirect loop because
the request will be redirected to login page with a next parameter
that when followed is again redirected.
When notebook_dir is consistent across users, this will allow
users to share notebook URLs. Fixes#424.
check_routes checks for missing routes for running users.
This is meant for when the proxy has been relaunched outside the Hub.
If spawners are slow to start, it's possible for check_routes to fire in the middle of spawning,
triggering addition of the user's server (which has no defined location yet) to the proxy before it's up.
If the spawning fails, the route will remain indefinitely (because it never should have been added in the first place), and the user will see 503 until their server is launched manually again.
Checking `spawn_pending` in user.running prevents this.
relies on CHP's host-based routing (a feature I didn't add!)
requires wildcard DNS and wildcard SSL for a proper setup
still lots to workout and cleanup in terms of cookies and where to use host, domain, path, but it works locally.
moves `spawn_pending` flag to only around start, not the HTTP wait.
Some Spawners may not know how to poll until start has finished (DockerSpawner).
Let's not require that they do.
Usernames that have an `@'-separated domain component
break JupyterHub when the server expects to see query
strings that contain an `@', when browsers and other
clients send `%40'.
in check_routes, it has been reported that users without a running server are attempted to be added.
So something is wrong, either in sqlalchemy or my understanding of what it does (likely the latter),
because a filter for users with a non-None server is returning at least one result whose server is None.
If you are reporting an issue with JupyterHub, please use the [GitHub issue](https://github.com/jupyterhub/jupyterhub/issues) search feature to check if your issue has been asked already. If it has, please add your comments to the existing issue.
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
**Additional context**
Add any other context about the problem here.
- Running `jupyter troubleshoot` from the command line, if possible, and posting
its output would also be helpful.
- Running in `--debug` mode can also be helpful for troubleshooting.
We mainly follow the [IPython Contributing Guide](https://github.com/ipython/ipython/blob/master/CONTRIBUTING.md).
Welcome! As a [Jupyter](https://jupyter.org) project, we follow the [Jupyter contributor guide](https://jupyter.readthedocs.io/en/latest/contributor/content-contributor.html).
## Set up your development system
For a development install, clone the [repository](https://github.com/jupyterhub/jupyterhub)
Notes on the `pip` command used in the installation directions below:
-`sudo` may be needed for `pip install`, depending on the user's filesystem permissions.
- JupyterHub requires Python >= 3.3, so `pip3` may be required on some machines for package installation instead of `pip` (especially when both Python 2 and Python 3 are installed on a machine). If `pip3` is not found, install it using (on Linux Debian/Ubuntu):
Basic principles for operation are:
sudo apt-get install python3-pip
- Hub launches a proxy.
- Proxy forwards all requests to Hub by default.
- Hub handles login, and spawns single-user servers on demand.
- Hub configures proxy to forward url prefixes to the single-user notebook
For example, install it on Linux (Debian/Ubuntu) using:
### Development install
```
sudo apt-get install npm nodejs-legacy
```
For a development install, clone the repository and then install from source:
The `nodejs-legacy` package installs the `node` executable and is currently
required for npm to work on Debian/Ubuntu.
git clone https://github.com/jupyter/jupyterhub
cd jupyterhub
pip3 install -r dev-requirements.txt -e .
- TLS certificate and key for HTTPS communication
- Domain name
If the `pip3 install` command fails and complains about `lessc` being unavailable, you may need to explicitly install some additional JavaScript dependencies:
### Install packages
npm install
#### Using `conda`
This will fetch client-side JavaScript dependencies necessary to compile CSS.
To install JupyterHub along with its dependencies including nodejs/npm:
You may also need to manually update JavaScript and CSS after some development updates, with:
- [Documentation for JupyterHub](https://jupyterhub.readthedocs.io/en/latest/) | [PDF (latest)](https://media.readthedocs.org/pdf/jupyterhub/latest/jupyterhub.pdf) | [PDF (stable)](https://media.readthedocs.org/pdf/jupyterhub/stable/jupyterhub.pdf)
- [Documentation for JupyterHub's REST API](http://petstore.swagger.io/?url=https://raw.githubusercontent.com/jupyter/jupyterhub/master/docs/rest-api.yml#/default)
- [Documentation for Project Jupyter](http://jupyter.readthedocs.io/en/latest/index.html) | [PDF](https://media.readthedocs.org/pdf/jupyter/latest/jupyter.pdf)
- [Project Jupyter website](https://jupyter.org)
- [Documentation for JupyterHub](http://jupyterhub.readthedocs.org/en/latest/) [[PDF](https://media.readthedocs.org/pdf/jupyterhub/latest/jupyterhub.pdf)]
- [Documentation for Project Jupyter](http://jupyter.readthedocs.org/en/latest/index.html) [[PDF](https://media.readthedocs.org/pdf/jupyter/latest/jupyter.pdf)]
description:Used by single-user servers to hand off cookie authentication to the Hub
description:Used by single-user notebook servers to hand off cookie authentication to the Hub
parameters:
- name:cookie_name
in:path
@@ -229,13 +509,94 @@ paths:
'200':
description:The user identified by the cookie
schema:
$ref:'#!/definitions/User'
$ref:'#/definitions/User'
'404':
description:A user is not found.
/oauth2/authorize:
get:
summary:'OAuth 2.0 authorize endpoint'
description:|
Redirect users to this URL to begin the OAuth process.
It is not an API endpoint.
parameters:
- name:client_id
description:The client id
in:query
required:true
type:string
- name:response_type
description:The response type (always 'code')
in:query
required:true
type:string
- name:state
description:A state string
in:query
required:false
type:string
- name:redirect_uri
description:The redirect url
in:query
required:true
type:string
/oauth2/token:
post:
summary:Request an OAuth2 token
description:|
Request an OAuth2 token from an authorization code.
This request completes the OAuth process.
consumes:
- application/x-www-form-urlencoded
parameters:
- name:client_id
description:The client id
in:form
required:true
type:string
- name:client_secret
description:The client secret
in:form
required:true
type:string
- name:grant_type
description:The grant type (always 'authorization_code')
in:form
required:true
type:string
- name:code
description:The code provided by the authorization redirect
in:form
required:true
type:string
- name:redirect_uri
description:The redirect url
in:form
required:true
type:string
responses:
'200':
description:JSON response including the token
schema:
type:object
properties:
access_token:
type:string
description:The new API token for the user
token_type:
type:string
description:Will always be 'Bearer'
/shutdown:
post:
summary:Shutdown the Hub
responses:
'200':
description:Hub has shutdown
parameters:
- name:proxy
in:body
type:boolean
description:Whether the proxy should be shutdown as well (default from Hub config)
- name:servers
in:body
type:boolean
description:Whether users' notebook servers should be shutdown as well (default from Hub config)
definitions:
User:
type:object
@@ -246,14 +607,133 @@ definitions:
admin:
type:boolean
description:Whether the user is an admin
groups:
type:array
description:The names of groups where this user is a member
items:
type:string
server:
type:string
description:The user's server's base URL, if running; null if not.
description:The user's notebook server's base URL, if running; null if not.
pending:
type:string
enum:["spawn","stop"]
enum:["spawn","stop",null]
description:The currently pending action, if any
last_activity:
type:string
format:ISO8601 Timestamp
format:date-time
description:Timestamp of last-seen activity from the user
servers:
type:object
description:The active servers for this user.
items:
schema:
$ref:'#/definitions/Server'
Server:
type:object
properties:
name:
type:string
description:The server's name. The user's default server has an empty name ('')
ready:
type:boolean
description:|
Whether the server is ready for traffic.
Will always be false when any transition is pending.
pending:
type:string
enum:["spawn","stop",null]
description:|
The currently pending action, if any.
A server is not ready if an action is pending.
url:
type:string
description:|
The URL where the server can be accessed
(typically /user/:name/:server.name/).
progress_url:
type:string
description:|
The URL for an event-stream to retrieve events during a spawn.
started:
type:string
format:date-time
description:UTC timestamp when the server was last started.
last_activity:
type:string
format:date-time
description:UTC timestamp last-seen activity on this server.
state:
type:object
description:Arbitrary internal state from this server's spawner. Only available on the hub's users list or get-user-by-name method, and only if a hub admin. None otherwise.
Group:
type:object
properties:
name:
type:string
description:The group's name
users:
type:array
description:The names of users who are members of this group
items:
type:string
Service:
type:object
properties:
name:
type:string
description:The service's name
admin:
type:boolean
description:Whether the service is an admin
url:
type:string
description:The internal url where the service is running
prefix:
type:string
description:The proxied URL prefix to the service's url
pid:
type:number
description:The PID of the service process (if managed)
command:
type:array
description:The command used to start the service (if managed)
items:
type:string
info:
type:object
description:|
Additional information a deployment can attach to a service.
JupyterHub does not use this field.
Token:
type:object
properties:
token:
type:string
description:The token itself. Only present in responses to requests for a new token.
id:
type:string
description:The id of the API token. Used for modifying or deleting the token.
user:
type:string
description:The user that owns a token (undefined if owned by a service)
service:
type:string
description:The service that owns the token (undefined of owned by a user)
note:
type:string
description:A note about the token, typically describing what it was created for.
created:
type:string
format:date-time
description:Timestamp when this token was created
expires_at:
type:string
format:date-time
description:Timestamp when this token expires. Null if there is no expiry.
JupyterHub also provides a REST API for administration of the Hub and users.
The documentation on `Using JupyterHub's REST API <../reference/rest.html>`_ provides
information on:
- what you can do with the API
- creating an API token
- adding API tokens to the config files
- making an API request programmatically using the requests library
- learning more about JupyterHub's API
The same JupyterHub API spec, as found here, is available in an interactive form
`here (on swagger's petstore) <http://petstore.swagger.io/?url=https://raw.githubusercontent.com/jupyterhub/jupyterhub/master/docs/rest-api.yml#!/default>`__.
The `OpenAPI Initiative`_ (fka Swagger™) is a project used to describe
For detailed changes from the prior release, click on the version number, and
its link will bring up a GitHub listing of changes. Use `git log` on the
command line for details.
## [Unreleased]
## 0.9
### [0.9.2] 2018-08-10
JupyterHub 0.9.2 contains small bugfixes and improvements.
- Documentation and example improvements
- Add `Spawner.consecutive_failure_limit` config for aborting the Hub if too many spawns fail in a row.
- Fix for handling SIGTERM when run with asyncio (tornado 5)
- Windows compatibility fixes
### [0.9.1] 2018-07-04
JupyterHub 0.9.1 contains a number of small bugfixes on top of 0.9.
- Use a PID file for the proxy to decrease the likelihood that a leftover proxy process will prevent JupyterHub from restarting
-`c.LocalProcessSpawner.shell_cmd` is now configurable
- API requests to stopped servers (requests to the hub for `/user/:name/api/...`) fail with 404 rather than triggering a restart of the server
- Compatibility fix for notebook 5.6.0 which will introduce further
security checks for local connections
- Managed services always use localhost to talk to the Hub if the Hub listening on all interfaces
- When using a URL prefix, the Hub route will be `JupyterHub.base_url` instead of unconditionally `/`
- additional fixes and improvements
### [0.9.0] 2018-06-15
JupyterHub 0.9 is a major upgrade of JupyterHub.
There are several changes to the database schema,
so make sure to backup your database and run:
jupyterhub upgrade-db
after upgrading jupyterhub.
The biggest change for 0.9 is the switch to asyncio coroutines everywhere
instead of tornado coroutines. Custom Spawners and Authenticators are still
free to use tornado coroutines for async methods, as they will continue to
work. As part of this upgrade, JupyterHub 0.9 drops support for Python <3.5
andtornado<5.0.
#### Changed
-RequirePython>= 3.5
- Require tornado >= 5.0
- Use asyncio coroutines throughout
- Set status 409 for conflicting actions instead of 400,
e.g. creating users or groups that already exist.
- timestamps in REST API continue to be UTC, but now include 'Z' suffix
to identify them as such.
- REST API User model always includes `servers` dict,
not just when named servers are enabled.
-`server` info is no longer available to oauth identification endpoints,
only user info and group membership.
-`User.last_activity` may be None if a user has not been seen,
rather than starting with the user creation time
which is now separately stored as `User.created`.
- static resources are now found in `$PREFIX/share/jupyterhub` instead of `share/jupyter/hub` for improved consistency.
- Deprecate `.extra_log_file` config. Use pipe redirection instead:
jupyterhub &>> /var/log/jupyterhub.log
- Add `JupyterHub.bind_url` config for setting the full bind URL of the proxy.
Sets ip, port, base_url all at once.
- Add `JupyterHub.hub_bind_url` for setting the full host+port of the Hub.
`hub_bind_url` supports unix domain sockets, e.g.
`unix+http://%2Fsrv%2Fjupyterhub.sock`
- Deprecate `JupyterHub.hub_connect_port` config in favor of `JupyterHub.hub_connect_url`. `hub_connect_ip` is not deprecated
and can still be used in the common case where only the ip address of the hub differs from the bind ip.
#### Added
- Spawners can define a `.progress` method which should be an async generator.
The generator should yield events of the form:
```python
{
"message": "some-state-message",
"progress": 50,
}
```
These messages will be shown with a progress bar on the spawn-pending page.
The `async_generator` package can be used to make async generators
compatible with Python 3.5.
- track activity of individual API tokens
- new REST API for managing API tokens at `/hub/api/user/tokens[/token-id]`
- allow viewing/revoking tokens via token page
- User creation time is available in the REST API as `User.created`
- Server start time is stored as `Server.started`
- `Spawner.start` may return a URL for connecting to a notebook instead of `(ip, port)`. This enables Spawners to launch servers that setup their own HTTPS.
- Optimize database performance by disabling sqlalchemy expire_on_commit by default.
- Add `python -m jupyterhub.dbutil shell` entrypoint for quickly
launching an IPython session connected to your JupyterHub database.
- Include `User.auth_state` in user model on single-user REST endpoints for admins only.
- Include `Server.state` in server model on REST endpoints for admins only.
- Add `Authenticator.blacklist` for blacklisting users instead of whitelisting.
- Pass `c.JupyterHub.tornado_settings['cookie_options']` down to Spawners
so that cookie options (e.g. `expires_days`) can be set globally for the whole application.
- SIGINFO (`ctrl-t`) handler showing the current status of all running threads,
coroutines, and CPU/memory/FD consumption.
- Add async `Spawner.get_options_form` alternative to `.options_form`, so it can be a coroutine.
- Add `JupyterHub.redirect_to_server` config to govern whether
users should be sent to their server on login or the JuptyerHub home page.
- html page templates can be more easily customized and extended.
- Allow registering external OAuth clients for using the Hub as an OAuth provider.
- Add basic prometheus metrics at `/hub/metrics` endpoint.
- Add session-id cookie, enabling immediate revocation of login tokens.
- Authenticators may specify that users are admins by specifying the `admin` key when return the user model as a dict.
- Added "Start All" button to admin page for launching all user servers at once.
- Services have an `info` field which is a dictionary.
This is accessible via the REST API.
- `JupyterHub.extra_handlers` allows defining additonal tornado RequestHandlers attached to the Hub.
- API tokens may now expire.
Expiry is available in the REST model as `expires_at`,
and settable when creating API tokens by specifying `expires_in`.
#### Fixed
- Remove green from theme to improve accessibility
- Fix error when proxy deletion fails due to route already being deleted
- clear `?redirects` from URL on successful launch
- disable send2trash by default, which is rarely desirable for jupyterhub
- Put PAM calls in a thread so they don't block the main application
in cases where PAM is slow (e.g. LDAP).
- Remove implicit spawn from login handler,
instead relying on subsequent request for `/user/:name` to trigger spawn.
- Fixed several inconsistencies for initial redirects,
depending on whether server is running or not and whether the user is logged in or not.
- Admin requests for `/user/:name` (when admin-access is enabled) launch the right server if it's not running instead of redirecting to their own.
- Major performance improvement starting up JupyterHub with many users,
especially when most are inactive.
- Various fixes in race conditions and performance improvements with the default proxy.
- Fixes for CORS headers
- Stop setting `.form-control` on spawner form inputs unconditionally.
- Better recovery from database errors and database connection issues
without having to restart the Hub.
- Fix handling of `~` character in usernames.
- Fix jupyterhub startup when `getpass.getuser()` would fail,
e.g. due to missing entry in passwd file in containers.
## 0.8
### [0.8.1] 2017-11-07
JupyterHub 0.8.1 is a collection of bugfixes and small improvements on 0.8.
#### Added
- Run tornado with AsyncIO by default
- Add `jupyterhub --upgrade-db` flag for automatically upgrading the database as part of startup.
This is useful for cases where manually running `jupyterhub upgrade-db`
as a separate step is unwieldy.
- Avoid creating backups of the database when no changes are to be made by
`jupyterhub upgrade-db`.
#### Fixed
- Add some further validation to usernames - `/` is not allowed in usernames.
- Fix empty logout page when using auto_login
- Fix autofill of username field in default login form.
- Fix listing of users on the admin page who have not yet started their server.
- Fix ever-growing traceback when re-raising Exceptions from spawn failures.
- Remove use of deprecated `bower` for javascript client dependencies.
### [0.8.0] 2017-10-03
JupyterHub 0.8 is a big release!
Perhaps the biggest change is the use of OAuth to negotiate authentication
between the Hub and single-user services.
Due to this change, it is important that the single-user server
and Hub are both running the same version of JupyterHub.
If you are using containers (e.g. via DockerSpawner or KubeSpawner),
this means upgrading jupyterhub in your user images at the same time as the Hub.
In most cases, a
pip install jupyterhub==version
in your Dockerfile is sufficient.
#### Added
- JupyterHub now defined a `Proxy` API for custom
proxy implementations other than the default.
The defaults are unchanged,
but configuration of the proxy is now done on the `ConfigurableHTTPProxy` class instead of the top-level JupyterHub.
TODO: docs for writing a custom proxy.
- Single-user servers and services
(anything that uses HubAuth)
can now accept token-authenticated requests via the Authentication header.
- Authenticators can now store state in the Hub's database.
To do so, the `authenticate` method should return a dict of the form
```python
{
'username': 'name',
'state': {}
}
```
This data will be encrypted and requires `JUPYTERHUB_CRYPT_KEY` environment variable to be set
and the `Authenticator.enable_auth_state` flag to be True.
If these are not set, auth_state returned by the Authenticator will not be stored.
- There is preliminary support for multiple (named) servers per user in the REST API.
Named servers can be created via API requests, but there is currently no UI for managing them.
- Add `LocalProcessSpawner.popen_kwargs` and `LocalProcessSpawner.shell_cmd`
for customizing how user server processes are launched.
- Add `Authenticator.auto_login` flag for skipping the "Login with..." page explicitly.
- Add `JupyterHub.hub_connect_ip` configuration
for the ip that should be used when connecting to the Hub.
This is promoting (and deprecating) `DockerSpawner.hub_ip_connect`
for use by all Spawners.
- Add `Spawner.pre_spawn_hook(spawner)` hook for customizing
pre-spawn events.
- Add `JupyterHub.active_server_limit` and `JupyterHub.concurrent_spawn_limit`
for limiting the total number of running user servers and the number of pending spawns, respectively.
#### Changed
- more arguments to spawners are now passed via environment variables (`.get_env()`)
rather than CLI arguments (`.get_args()`)
- internally generated tokens no longer get extra hash rounds,
significantly speeding up authentication.
The hash rounds were deemed unnecessary because the tokens were already
generated with high entropy.
- `JUPYTERHUB_API_TOKEN` env is available at all times,
rather than being removed during single-user start.
The token is now accessible to kernel processes,
enabling user kernels to make authenticated API requests to Hub-authenticated services.
- Cookie secrets should be 32B hex instead of large base64 secrets.
- pycurl is used by default, if available.
#### Fixed
So many things fixed!
- Collisions are checked when users are renamed
- Fix bug where OAuth authenticators could not logout users
due to being redirected right back through the login process.
- If there are errors loading your config files,
JupyterHub will refuse to start with an informative error.
Previously, the bad config would be ignored and JupyterHub would launch with default configuration.
- Raise 403 error on unauthorized user rather than redirect to login,
which could cause redirect loop.
- Set `httponly` on cookies because it's prudent.
- Improve support for MySQL as the database backend
- Many race conditions and performance problems under heavy load have been fixed.
- Fix alembic tagging of database schema versions.
#### Removed
- End support for Python 3.3
## 0.7
### [0.7.2] - 2017-01-09
#### Added
- Support service environment variables and defaults in `jupyterhub-singleuser`
for easier deployment of notebook servers as a Service.
- Add `--group` parameter for deploying `jupyterhub-singleuser` as a Service with group authentication.
- Include URL parameters when redirecting through `/user-redirect/`
### Fixed
- Fix group authentication for HubAuthenticated services
### [0.7.1] - 2017-01-02
#### Added
- `Spawner.will_resume` for signaling that a single-user server is paused instead of stopped.
This is needed for cases like `DockerSpawner.remove_containers = False`,
where the first API token is re-used for subsequent spawns.
- Warning on startup about single-character usernames,
caused by common `set('string')` typo in config.
#### Fixed
- Removed spurious warning about empty `next_url`, which is AOK.
### [0.7.0] - 2016-12-2
#### Added
- Implement Services API [\#705](https://github.com/jupyterhub/jupyterhub/pull/705)
- Add `/api/` and `/api/info` endpoints [\#675](https://github.com/jupyterhub/jupyterhub/pull/675)
- Add documentation for JupyterLab, pySpark configuration, troubleshooting,
and more.
- Add logging of error if adding users already in database. [\#689](https://github.com/jupyterhub/jupyterhub/pull/689)
- Add HubAuth class for authenticating with JupyterHub. This class can
be used by any application, even outside tornado.
- Add user groups.
- Add `/hub/user-redirect/...` URL for redirecting users to a file on their own server.
#### Changed
- Always install with setuptools but not eggs (effectively require
- statsd is an optional dependency, only needed if in use
- Notice more quickly when servers have crashed
- Better error pages for proxy errors
- Add Stop All button to admin panel for stopping all servers at once
### [0.6.0] - 2016-04-25
- JupyterHub has moved to a new `jupyterhub` namespace on GitHub and Docker. What was `juptyer/jupyterhub` is now `jupyterhub/jupyterhub`, etc.
- `jupyterhub/jupyterhub` image on DockerHub no longer loads the jupyterhub_config.py in an ONBUILD step. A new `jupyterhub/jupyterhub-onbuild` image does this
- Add statsd support, via `c.JupyterHub.statsd_{host,port,prefix}`
- Update to traitlets 4.1 `@default`, `@observe` APIs for traits
- Allow disabling PAM sessions via `c.PAMAuthenticator.open_sessions = False`. This may be needed on SELinux-enabled systems, where our PAM session logic often does not work properly
- Add `Spawner.environment` configurable, for defining extra environment variables to load for single-user servers
- JupyterHub API tokens can be pregenerated and loaded via `JupyterHub.api_tokens`, a dict of `token: username`.
- JupyterHub API tokens can be requested via the REST API, with a POST request to `/api/authorizations/token`.
This can only be used if the Authenticator has a username and password.
- Various fixes for user URLs and redirects
## [0.5] - 2016-03-07
- Single-user server must be run with Jupyter Notebook ≥ 4.0
- Require `--no-ssl` confirmation to allow the Hub to be run without SSL (e.g. behind SSL termination in nginx)
- Add lengths to text fields for MySQL support
- Add `Spawner.disable_user_config` for preventing user-owned configuration from modifying single-user servers.
- Fixes for MySQL support.
- Add ability to run each user's server on its own subdomain. Requires wildcard DNS and wildcard SSL to be feasible. Enable subdomains by setting `JupyterHub.subdomain_host = 'https://jupyterhub.domain.tld[:port]'`.
- Use `127.0.0.1` for local communication instead of `localhost`, avoiding issues with DNS on some systems.
- Fix race that could add users to proxy prematurely if spawning is slow.
## 0.4
### 0.4.1
### [0.4.1] - 2016-02-03
Fix removal of `/login` page in 0.4.0, breaking some OAuth providers.
### 0.4.0
### [0.4.0] - 2016-02-01
- Add `Spawner.user_options_form` for specifying an HTML form to present to users,
allowing users to influence the spawning of their own servers.
@@ -19,7 +383,7 @@ Fix removal of `/login` page in 0.4.0, breaking some OAuth providers.
- 0.4 will be the last JupyterHub release where single-user servers running IPython 3 is supported instead of Notebook ≥ 4.0.
## 0.3
## [0.3] - 2015-11-04
- No longer make the user starting the Hub an admin
- start PAM sessions on login
@@ -27,13 +391,30 @@ Fix removal of `/login` page in 0.4.0, breaking some OAuth providers.
allowing deeper interaction between Spawner/Authenticator pairs.
- login redirect fixes
## 0.2
## [0.2] - 2015-07-12
- Based on standalone traitlets instead of IPython.utils.traitlets
- [Press release on Jupyter and Cori](http://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2016/jupyter-notebooks-will-open-up-new-possibilities-on-nerscs-cori-supercomputer/)
- [Moving and sharing data](https://www.nersc.gov/assets/Uploads/03-MovingAndSharingData-Cholia.pdf)
- [Research IT](http://research-it.berkeley.edu)
- [JupyterHub server supports campus research computation](http://research-it.berkeley.edu/blog/17/01/24/free-fully-loaded-jupyterhub-server-supports-campus-research-computation)
### University of California Davis
- [Spinning up multiple Jupyter Notebooks on AWS for a tutorial](https://github.com/mblmicdiv/course2017/blob/master/exercises/sourmash-setup.md)
Although not technically a JupyterHub deployment, this tutorial setup
may be helpful to others in the Jupyter community.
Thank you C. Titus Brown for sharing this with the Software Carpentry
mailing list.
```
* I started a big Amazon machine;
* I installed Docker and built a custom image containing my software of
interest;
* I ran multiple containers, one connected to port 8000, one on 8001,
etc. and gave each student a different port;
* students could connect in and use the Terminal program in Jupyter to
execute commands, and could upload/download files via the Jupyter
console interface;
* in theory I could have used notebooks too, but for this I didn’t have
need.
I am aware that JupyterHub can probably do all of this including manage
the containers, but I’m still a bit shy of diving into that; this was
fairly straightforward, gave me disposable containers that were isolated
for each individual student, and worked almost flawlessly. Should be
easy to do with RStudio too.
```
### Cal Poly San Luis Obispo
- [jupyterhub-deploy-teaching](https://github.com/jupyterhub/jupyterhub-deploy-teaching) based on work by Brian Granger for Cal Poly's Data Science 301 Course
### Clemson University
- Advanced Computing
- [Palmetto cluster and JupyterHub](http://citi.sites.clemson.edu/2016/08/18/JupyterHub-for-Palmetto-Cluster.html)
### University of Colorado Boulder
- (CU Research Computing) CURC
- [JupyterHub User Guide](https://www.rc.colorado.edu/support/user-guide/jupyterhub.html)
- Slurm job dispatched on Crestone compute cluster
- log troubleshooting
- Profiles in IPython Clusters tab
- [Parallel Processing with JupyterHub tutorial](https://www.rc.colorado.edu/support/examples-and-tutorials/parallel-processing-with-jupyterhub.html)
- [Parallel Programming with JupyterHub document](https://www.rc.colorado.edu/book/export/html/833)
- Earth Lab at CU
- [Tutorial on Parallel R on JupyterHub](https://earthdatascience.org/tutorials/parallel-r-on-jupyterhub/)
### HTCondor
- [HTCondor Python Bindings Tutorial from HTCondor Week 2017 includes information on their JupyterHub tutorials](https://research.cs.wisc.edu/htcondor/HTCondorWeek2017/presentations/TueBockelman_Python.pdf)
### University of Illinois
- https://datascience.business.illinois.edu
### MIT and Lincoln Labs
### Michigan State University
- [Setting up JupyterHub](https://mediaspace.msu.edu/media/Setting+Up+Your+JupyterHub+Password/1_hgv13aag/11980471)
- [Using Tensorflow and JupyterHub in Classrooms](https://cloud.google.com/solutions/using-tensorflow-jupyterhub-classrooms)
- [using-tensorflow-and-jupyterhub blog post](https://opensource.googleblog.com/2016/10/using-tensorflow-and-jupyterhub.html)
### Everware
[Everware](https://github.com/everware) Reproducible and reusable science powered by jupyterhub and docker. Like nbviewer, but executable. CERN, Geneva [website](http://everware.xyz/)
This document describes some of the basics of configuring JupyterHub to do what you want.
JupyterHub is highly customizable, so there's a lot to cover.
## Installation
See [the readme](https://github.com/jupyter/jupyterhub/blob/master/README.md) for help installing JupyterHub.
## Overview
JupyterHub is a set of processes that together provide a multiuser Jupyter Notebook server.
There are three main categories of processes run by the `jupyterhub` command line program:
- **Single User Server**: a dedicated, single-user, Jupyter Notebook is started for each user on the system
when they log in. The object that starts these processes is called a Spawner.
- **Proxy**: the public facing part of the server that uses a dynamic proxy to route HTTP requests
to the Hub and Single User Servers.
- **Hub**: manages user accounts and authentication and coordinates Single Users Servers using a Spawner.
## JupyterHub's default behavior
To start JupyterHub in its default configuration, type the following at the command line:
sudo jupyterhub
The default Authenticator that ships with JupyterHub authenticates users
with their system name and password (via [PAM][]).
Any user on the system with a password will be allowed to start a single-user notebook server.
The default Spawner starts servers locally as each user, one dedicated server per user.
These servers listen on localhost, and start in the given user's home directory.
By default, the **Proxy** listens on all public interfaces on port 8000.
Thus you can reach JupyterHub through either:
http://localhost:8000
or any other public IP or domain pointing to your system.
In their default configuration, the other services, the **Hub** and **Single-User Servers**,
all communicate with each other on localhost only.
**NOTE:** In its default configuration, JupyterHub runs without SSL encryption (HTTPS).
You should not run JupyterHub without SSL encryption on a public network.
See [Security documentation](#Security) for how to configure JupyterHub to use SSL.
By default, starting JupyterHub will write two files to disk in the current working directory:
-`jupyterhub.sqlite` is the sqlite database containing all of the state of the **Hub**.
This file allows the **Hub** to remember what users are running and where,
as well as other information enabling you to restart parts of JupyterHub separately.
-`jupyterhub_cookie_secret` is the encryption key used for securing cookies.
This file needs to persist in order for restarting the Hub server to avoid invalidating cookies.
Conversely, deleting this file and restarting the server effectively invalidates all login cookies.
The cookie secret file is discussed in the [Cookie Secret documentation](#Cookie secret).
The location of these files can be specified via configuration, discussed below.
## How to configure JupyterHub
JupyterHub is configured in two ways:
1. Command-line arguments
2. Configuration files
Type the following for brief information about the command line arguments:
jupyterhub -h
or:
jupyterhub --help-all
for the full command line help.
By default, JupyterHub will look for a configuration file (can be missing)
named `jupyterhub_config.py` in the current working directory.
You can create an empty configuration file with
jupyterhub --generate-config
This empty configuration file has descriptions of all configuration variables and their default
values. You can load a specific config file with:
jupyterhub -f /path/to/jupyterhub_config.py
See also: [general docs](http://ipython.org/ipython-doc/dev/development/config.html)
on the config system Jupyter uses.
## Networking
### Configuring the Proxy's IP address and port
The Proxy's main IP address setting determines where JupyterHub is available to users.
By default, JupyterHub is configured to be available on all network interfaces
(`''`) on port 8000. **Note**: Use of `'*'` is discouraged for IP configuration;
instead, use of `'0.0.0.0'` is preferred.
Changing the IP address and port can be done with the following command line
arguments:
jupyterhub --ip=192.168.1.2 --port=443
Or by placing the following lines in a configuration file:
```python
c.JupyterHub.ip='192.168.1.2'
c.JupyterHub.port=443
```
Port 443 is used as an example since 443 is the default port for SSL/HTTPS.
Configuring only the main IP and port of JupyterHub should be sufficient for most deployments of JupyterHub.
However, more customized scenarios may need additional networking details to
be configured.
### Configuring the Proxy's REST API communication IP address and port (optional)
The Hub service talks to the proxy via a REST API on a secondary port,
whose network interface and port can be configured separately.
By default, this REST API listens on port 8081 of localhost only.
If running the Proxy separate from the Hub,
configure the REST API communication IP address and port with:
```python
# ideally a private network address
c.JupyterHub.proxy_api_ip='10.0.1.4'
c.JupyterHub.proxy_api_port=5432
```
### Configuring the Hub if Spawners or Proxy are remote or isolated in containers
The Hub service also listens only on localhost (port 8080) by default.
The Hub needs needs to be accessible from both the proxy and all Spawners.
When spawning local servers, an IP address setting of localhost is fine.
If *either* the Proxy *or* (more likely) the Spawners will be remote or
isolated in containers, the Hub must listen on an IP that is accessible.
```python
c.JupyterHub.hub_ip='10.0.1.4'
c.JupyterHub.hub_port=54321
```
## Security
Security is the most important aspect of configuring Jupyter. There are three main aspects of the
security configuration:
1. SSL encryption (to enable HTTPS)
2. Cookie secret (a key for encrypting browser cookies)
3. Proxy authentication token (used for the Hub and other services to authenticate to the Proxy)
## SSL encryption
Since JupyterHub includes authentication and allows arbitrary code execution, you should not run
it without SSL (HTTPS). This will require you to obtain an official, trusted SSL certificate or
create a self-signed certificate. Once you have obtained and installed a key and certificate you
need to specify their locations in the configuration file as follows:
```python
c.JupyterHub.ssl_key='/path/to/my.key'
c.JupyterHub.ssl_cert='/path/to/my.cert'
```
It is also possible to use letsencrypt (https://letsencrypt.org/) to obtain a free, trusted SSL
certificate. If you run letsencrypt using the default options, the needed configuration is (replace `your.domain.com` by your fully qualified domain name):
In most deployments of JupyterHub, you should point this to a secure location on the file
system, such as `/srv/jupyterhub/cookie_secret`. If the cookie secret file doesn't exist when
the Hub starts, a new cookie secret is generated and stored in the file. The recommended
permissions for the cookie secret file should be 600 (owner-only rw).
If you would like to avoid the need for files, the value can be loaded in the Hub process from
the `JPY_COOKIE_SECRET` environment variable:
```bash
exportJPY_COOKIE_SECRET=`openssl rand -hex 1024`
```
For security reasons, this environment variable should only be visible to the Hub.
## Proxy authentication token
The Hub authenticates its requests to the Proxy using a secret token that the Hub and Proxy agree upon. The value of this string should be a random string (for example, generated by `openssl rand -hex 32`). You can pass this value to the Hub and Proxy using either the `CONFIGPROXY_AUTH_TOKEN` environment variable:
If you don't set the Proxy authentication token, the Hub will generate a random key itself, which
means that any time you restart the Hub you **must also restart the Proxy**. If the proxy is a
subprocess of the Hub, this should happen automatically (this is the default configuration).
Another time you must set the Proxy authentication token yourself is if you want other services, such as [nbgrader](https://github.com/jupyter/nbgrader) to also be able to connect to the Proxy.
## Configuring authentication
The default Authenticator uses [PAM][] to authenticate system users with their username and password.
The default behavior of this Authenticator is to allow any user with an account and password on the system to login.
You can restrict which users are allowed to login with `Authenticator.whitelist`:
A [Spawner][] starts each single-user notebook server.
The Spawner represents an abstract interface to a process,
and a custom Spawner needs to be able to take three actions:
- start the process
- poll whether the process is still running
- stop the process
## Examples
Custom Spawners for JupyterHub can be found on the [JupyterHub wiki](https://github.com/jupyterhub/jupyterhub/wiki/Spawners).
Some examples include:
- [DockerSpawner](https://github.com/jupyterhub/dockerspawner) for spawning user servers in Docker containers
*`dockerspawner.DockerSpawner` for spawning identical Docker containers for
each users
*`dockerspawner.SystemUserSpawner` for spawning Docker containers with an
environment and home directory for each users
* both `DockerSpawner` and `SystemUserSpawner` also work with Docker Swarm for
launching containers on remote machines
- [SudoSpawner](https://github.com/jupyterhub/sudospawner) enables JupyterHub to
run without being root, by spawning an intermediate process via `sudo`
- [BatchSpawner](https://github.com/jupyterhub/batchspawner) for spawning remote
servers using batch systems
- [RemoteSpawner](https://github.com/zonca/remotespawner) to spawn notebooks
and a remote server and tunnel the port via SSH
## Spawner control methods
### Spawner.start
`Spawner.start` should start the single-user server for a single user.
Information about the user can be retrieved from `self.user`,
an object encapsulating the user's name, authentication, and server info.
The return value of `Spawner.start` should be the (ip, port) of the running server.
**NOTE:** When writing coroutines, *never*`yield` in between a database change and a commit.
Most `Spawner.start` functions will look similar to this example:
```python
defstart(self):
self.ip='127.0.0.1'
self.port=random_port()
# get environment variables,
# several of which are required for configuring the single-user server
env=self.get_env()
cmd=[]
# get jupyterhub command to run,
# typically ['jupyterhub-singleuser']
cmd.extend(self.cmd)
cmd.extend(self.get_args())
yieldself._actually_start_server_somehow(cmd,env)
return(self.ip,self.port)
```
When `Spawner.start` returns, the single-user server process should actually be running,
not just requested. JupyterHub can handle `Spawner.start` being very slow
(such as PBS-style batch queues, or instantiating whole AWS instances)
via relaxing the `Spawner.start_timeout` config value.
### Spawner.poll
`Spawner.poll` should check if the spawner is still running.
It should return `None` if it is still running,
and an integer exit status, otherwise.
For the local process case, `Spawner.poll` uses `os.kill(PID, 0)`
to check if the local process is still running.
### Spawner.stop
`Spawner.stop` should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
## Spawner state
JupyterHub should be able to stop and restart without tearing down
single-user notebook servers. To do this task, a Spawner may need to persist
some information that can be restored later.
A JSON-able dictionary of state can be used to store persisted information.
Unlike start, stop, and poll methods, the state methods must not be coroutines.
For the single-process case, the Spawner state is only the process ID of the server:
```python
defget_state(self):
"""get the current state"""
state=super().get_state()
ifself.pid:
state['pid']=self.pid
returnstate
defload_state(self,state):
"""load state from the database"""
super().load_state(state)
if'pid'instate:
self.pid=state['pid']
defclear_state(self):
"""clear any state (called after shutdown)"""
super().clear_state()
self.pid=0
```
## Spawner options form
(new in 0.4)
Some deployments may want to offer options to users to influence how their servers are started.
This may include cluster-based deployments, where users specify what resources should be available,
or docker-based deployments where users can select from a list of base images.
This feature is enabled by setting `Spawner.options_form`, which is an HTML form snippet
inserted unmodified into the spawn form.
If the `Spawner.options_form` is defined, when a user tries to start their server, they will be directed to a form page, like this:

If `Spawner.options_form` is undefined, the user's server is spawned directly, and no spawn page is rendered.
See [this example](https://github.com/jupyterhub/jupyterhub/blob/master/examples/spawn-form/jupyterhub_config.py) for a form that allows custom CLI args for the local spawner.
### `Spawner.options_from_form`
Options from this form will always be a dictionary of lists of strings, e.g.:
```python
{
'integer':['5'],
'text':['some text'],
'select':['a','b'],
}
```
When `formdata` arrives, it is passed through `Spawner.options_from_form(formdata)`,
which is a method to turn the form data into the correct structure.
This method must return a dictionary, and is meant to interpret the lists-of-strings into the correct types. For example, the `options_from_form` for the above form would look like:
```python
defoptions_from_form(self,formdata):
options={}
options['integer']=int(formdata['integer'][0])# single integer value
options['text']=formdata['text'][0]# single string value
options['select']=formdata['select']# list already correct
options['notinform']='extra info'# not in the form at all
returnoptions
```
which would return:
```python
{
'integer':5,
'text':'some text',
'select':['a','b'],
'notinform':'extra info',
}
```
When `Spawner.start` is called, this dictionary is accessible as `self.user_options`.
`Spawner.start` should start the single-user server for a single user.
Information about the user can be retrieved from `self.user`,
an object encapsulating the user's name, authentication, and server info.
When `Spawner.start` returns, it should have stored the IP and port
of the single-user server in `self.user.server`.
**NOTE:** When writing coroutines, *never*`yield` in between a database change and a commit.
Most `Spawner.start` functions will look similar to this example:
```python
defstart(self):
self.user.server.ip='localhost'# or other host or IP address, as seen by the Hub
self.user.server.port=1234# port selected somehow
self.db.commit()# always commit before yield, if modifying db values
yieldself._actually_start_server_somehow()
```
When `Spawner.start` returns, the single-user server process should actually be running,
not just requested. JupyterHub can handle `Spawner.start` being very slow
(such as PBS-style batch queues, or instantiating whole AWS instances)
via relaxing the `Spawner.start_timeout` config value.
### Spawner.poll
`Spawner.poll` should check if the spawner is still running.
It should return `None` if it is still running,
and an integer exit status, otherwise.
For the local process case, `Spawner.poll` uses `os.kill(PID, 0)`
to check if the local process is still running.
### Spawner.stop
`Spawner.stop` should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
## Spawner state
JupyterHub should be able to stop and restart without tearing down
single-user notebook servers. To do this task, a Spawner may need to persist
some information that can be restored later.
A JSON-able dictionary of state can be used to store persisted information.
Unlike start, stop, and poll methods, the state methods must not be coroutines.
For the single-process case, the Spawner state is only the process ID of the server:
```python
defget_state(self):
"""get the current state"""
state=super().get_state()
ifself.pid:
state['pid']=self.pid
returnstate
defload_state(self,state):
"""load state from the database"""
super().load_state(state)
if'pid'instate:
self.pid=state['pid']
defclear_state(self):
"""clear any state (called after shutdown)"""
super().clear_state()
self.pid=0
```
## Spawner options form
(new in 0.4)
Some deployments may want to offer options to users to influence how their servers are started.
This may include cluster-based deployments, where users specify what resources should be available,
or docker-based deployments where users can select from a list of base images.
This feature is enabled by setting `Spawner.options_form`, which is an HTML form snippet
inserted unmodified into the spawn form.
If the `Spawner.options_form` is defined, when a user tries to start their server, they will be directed to a form page, like this:

If `Spawner.options_form` is undefined, the user's server is spawned directly, and no spawn page is rendered.
See [this example](https://github.com/jupyter/jupyterhub/blob/master/examples/spawn-form/jupyterhub_config.py) for a form that allows custom CLI args for the local spawner.
### `Spawner.options_from_form`
Options from this form will always be a dictionary of lists of strings, e.g.:
```python
{
'integer':['5'],
'text':['some text'],
'select':['a','b'],
}
```
When `formdata` arrives, it is passed through `Spawner.options_from_form(formdata)`,
which is a method to turn the form data into the correct structure.
This method must return a dictionary, and is meant to interpret the lists-of-strings into the correct types. For example, the `options_from_form` for the above form would look like:
```python
defoptions_from_form(self,formdata):
options={}
options['integer']=int(formdata['integer'][0])# single integer value
options['text']=formdata['text'][0]# single string value
options['select']=formdata['select']# list already correct
options['notinform']='extra info'# not in the form at all
returnoptions
```
which would return:
```python
{
'integer':5,
'text':'some text',
'select':['a','b'],
'notinform':'extra info',
}
```
When `Spawner.spawn` is called, this dictionary is accessible as `self.user_options`.
[Cloudera documentation for configuring spark on YARN applications](https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_running_spark_on_yarn.html#spark_on_yarn_config_apps)
provides additional information. The [pySpark configuration documentation](https://spark.apache.org/docs/0.9.0/configuration.html)
is also helpful for programmatic configuration examples.
### How do I use JupyterLab's prerelease version with JupyterHub?
While JupyterLab is still under active development, we have had users
ask about how to try out JupyterLab with JupyterHub.
You need to install and enable the JupyterLab extension system-wide,
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.