main effect: timeout is 30 seconds instead of 5, but the principle matches better
in that we expect things to be happening behind the scenes,
rather than asserting something about the _current_ state when the expect is called
`test_services::test_proxy_service` is failing in some cases because a request is being made to `/@/space%20word/services/test-service-NN//foo` (note the double `//`) and EchoHandler is returning that URL unchanged, instead of `/@/space%20word/services/test-service-NN/foo`
Issue #4833 proposes allowing configuration of buckets for server spawn
duration. It was resolved with PR #4967.
This follows a similar pattern to support the same kind of configuration
for server stop duration.
- add docs, tests
- deprecate DummyAuthenticator.password, pointing to new class
- accept no password as valid config (no login possible)
- log warnings for suspicious config (e.g. passwords not set, admin password set, but no admin users, etc.)
Currently, admin users are even more insecure than otherwise
with DummyAuthenticator - anyone who knows the username of the admin
can get in if they also know the password.
This PR adds an additional layer of security - admins *must* login
using a different, more secure (longer, per NIST guidelines) password.
If they login using the regular password, no admin status for them.
This is mildly helpful in local testing and improves overall security
posture. Where it really shines, though, is in 'workshop' hubs. I've
been running those for years now, both at UC Berkeley and now at 2i2c
(with NASA Openscapes in particular). This was the use case DummyAuth
was written for :D It allows an instructor to share a single password
with all the users in a secure way (they're all in a physical room,
zoom, etc). The password is then changed after the workshop. However,
admin access was not possible in this use case, as anyone guessing the
admin's username can get in as admin. With this change, admin access
is possible.
- when adding trailing slash, do so inside url_path_join, not with `+ '/'`
- don't use url_path_join to build url for handler _outside_ prefix (AddSlash on `/hub`)
the functions we use haven't changed in almost 10 years,
and are only a few lines
we should probably lose them eventually, but it's easier to vendor them first
wait for networkidle isn't enough for debounced name filter
clock.run_for doesn't seem to work, either, unclear why
instead, make sure the first page reflects the filtered view before clicking 'next'
- allow cancellation of outdated updates
- trigger offset changes with setOffset instead of on reply
- render pagination footer with `user_page.offset` instead of state.offset which only represents the _requested_ offset, not current view
- show login form for trying again, just like a password failure
- nicer, but more vague "try again" error for expired xsrf (original error still logged)
because users logging in don't need to know or understand xsrf stuff
- set fresh xsrf cookie when login page loads, to maximize time until expiration
- `{name}_input` for overriding full input
- `{name}_input_attrs` for overriding input element attributes (not including id).
Use `super()` to extend.
- For all `name` in username, password, otp
With this change, if we set
```
{% block username_input_attribs %}pattern="[a-z0-9]+"
placeholder="do not use email address, use your username"{% endblock username_input_attribs %}
```
We will get the following generated code:
```
<input
id="username_input"
type="text"
autocapitalize="off"
autocorrect="off"
autocomplete="username"
class="form-control"
name="username"
value=""
tabindex="1"
autofocus="autofocus"
pattern="[a-z0-9]+"
placeholder="do not use email address, use your username"
/>
```
This allows updating the intersphinx urls in a single location when
those move, and makes it a tiny bit easier to add existing packages than
having to figure out where their docs are.
- defer jupyter_core import that caused earlier, less informative ImportError
- point to `pip install jupyterhub[singleuser]` in the error
- use `raise from` so original import error is still reported
For security reasons, only allow-listed env vars in the parent
JupyterHub process are passed to the single-user server Python process.
This allow-list is controlled by `Spawner.env_keep`, which by default
includes common env vars that (a) are necessary for the single-user
server process to work, and (b) don't contain credentials or sensitive
information that shouldn't be revealed to users of the Notebook.
However, this allow-list was missing the `LD_LIBRARY_PATH` env var,
which causes shared library errors when using a relocated Python that
has been compiled in shared mode (`--enable-shared`). This prevents
JupyterHub from working out of the box on platforms like Heroku.
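Before this change, a deployment could work around the error by extending the allow-list itself; a minimal sketch in `jupyterhub_config.py`:

```python
# jupyterhub_config.py -- manual workaround (no longer needed once
# LD_LIBRARY_PATH is part of the default allow-list)
c.Spawner.env_keep.append("LD_LIBRARY_PATH")
```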
Fixes #4903.
instead of initialize, which should only create objects
improves symmetry with stop, should remove some warnings about unfinished coroutines in some tests
* The singleuser mixin is attempting to bypass jupyter_server's
interactive prompt on shutdown by stopping the IO loop.
* This does disable the interactive prompt, but it also causes SIGINT
to be ignored, so SIGTERM is issued after the timeout is hit.
* Closing the IO loop also prevents the server from closing async resources.
* This change allows jupyter_server to run its cleanup logic as
intended.
Secure contexts are a more robust way of checking that a browsing context
is authenticated and confidential. Compared to checking the scheme alone, this also
covers cases where the connection is encrypted but uses a broken algorithm.
Notably, localhost is considered a secure context, even over HTTP.
For more detail on secure contexts, see:
https://developer.mozilla.org/en-US/docs/Web/Security/Secure_Contexts
cancels start rather than waiting for it to finish or timeout
also fixes cancellation when start_timeout is reached, which was previously left running forever
While doing https://github.com/jupyterhub/jupyterhub/pull/2726,
I realized we don't have a consistent way to format references
inside the docs. I now format them to match the name
of the file, but using `:` to separate components instead of `/` or `-`.
`/` makes it ambiguous when used with markdown link syntax, as
it could be a reference or a file. And using `-` is ambiguous, as
that can be the name of the file itself.
This PR does about half, I can do the other half later (unless
someone else does).
Allows limiting max expiration of tokens created via the API
Only affects the POST /api/tokens endpoint, not tokens issued by other means or created prior to config
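A sketch of the corresponding configuration, assuming the limit is exposed as a `JupyterHub.token_expires_in_max_seconds` traitlet (verify the exact option name introduced by this change):

```python
# jupyterhub_config.py -- assumed option name, shown for illustration
c.JupyterHub.token_expires_in_max_seconds = 30 * 24 * 3600  # cap API-created tokens at 30 days
```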
Similar to 'kubespawner_override' in KubeSpawner, this allows
admins to selectively override spawner configuration based on the
groups a user belongs to. This allows for low-maintenance but
extremely powerful customization based on group membership.
This is particularly powerful when combined with
https://github.com/jupyterhub/oauthenticator/pull/735
## Dictionary vs List
Ordering is important here, but I still chose to implement this
configuration as a dictionary of dictionaries rather than a list. This is
primarily to allow for easy overriding in z2jh (and similar places),
where lists are just really hard to override. Ordering is provided
by lexicographically sorting the keys, similar to how we do it in z2jh.
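A hedged sketch of what this could look like in `jupyterhub_config.py` (the `groups` / `spawner_override` keys mirror the KubeSpawner-style structure described here; names and values are illustrative):

```python
# jupyterhub_config.py -- illustrative only; keys are applied in lexicographic
# order, so later keys win when overrides conflict
c.Spawner.group_overrides = {
    "01-students-small": {
        "groups": ["students"],
        "spawner_override": {"mem_limit": "1G"},
    },
    "02-researchers-large": {
        "groups": ["researchers"],
        "spawner_override": {"mem_limit": "4G", "cpu_limit": 2},
    },
}
```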
## Merging config
The merging code is literally copied from KubeSpawner, and provides
the exact same behavior. Documentation of how it acts is also copied.
- Missing `form-control` on a textbox gave it weird padding; this fixes it.
- Add new server is set up as a [button addon](https://getbootstrap.com/docs/5.3/forms/input-group/#button-addons)
- Add a little right margin to the username in the navbar,
just before the logout button. Otherwise they were 'stuck'
to each other
set offset -> request page -> response sets offset is a recipe for races
instead, send request with new offset and only update offset state
made easier by consolidating page update requests into single loadPageData
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
If you believe you’ve found a security vulnerability in a Jupyter
project, please report it!
See the [security documentation](https://jupyterhub.readthedocs.org/en/latest/contributing/security.html) for how.
```{note}
Our community is distributed across the world in various timezones, so please be patient if you do not get a response immediately!
```
We use different channels of communication for different purposes. Whichever one you use will depend on what kind of communication you want to engage in.
## Discourse (recommended)
```{note}
[Discourse] is open source.
```
We use [Jupyter instance of Discourse] for online discussions and support questions.
You can ask questions at [Jupyter instance of Discourse] if you are a first-time contributor to the JupyterHub project.
Everyone is welcome to bring ideas and questions at [Jupyter instance of Discourse].
We recommend that you first use [Jupyter instance of Discourse] as all past and current discussions on it are archived and searchable. Thus, all discussions remain useful and accessible to the whole community.
## Zulip
```{note}
[Zulip] is open source.
```
We use [Jupyter instance of Zulip] for online, real-time text chat; a place for more ephemeral discussions. When you're not on [Jupyter instance of Discourse], you can stop at [Jupyter instance of Zulip] to have other discussions on the fly.
## GitHub Issues
- If you are using a specific JupyterHub distribution (such as [Zero to JupyterHub on Kubernetes](https://github.com/jupyterhub/zero-to-jupyterhub-k8s) or [The Littlest JupyterHub](https://github.com/jupyterhub/the-littlest-jupyterhub/)), you should open issues directly in their repository.
- If you cannot find a repository to open your issue in, do not worry! Open the issue in the [main JupyterHub repository](https://github.com/jupyterhub/jupyterhub/) and our community will help you figure it out.
```{note}
Our community is distributed across the world in various timezones, so please be patient if you do not get a response immediately!
```
[Discourse]: https://www.discourse.org/
[Jupyter instance of Discourse]: https://discourse.jupyter.org
[Jupyter instance of Zulip]: https://jupyter.zulipchat.com/
[Zulip]: https://zulip.com/
Documentation is often more important than code. This page helps
you get set up on how to contribute to JupyterHub's documentation.
We use [Sphinx](https://www.sphinx-doc.org) to build our documentation. It takes
our documentation source files (written in [Markedly Structured Text (MyST)](https://mystmd.org/) and
stored under the `docs/source` directory) and converts it into various
formats for people to read.
## Building documentation locally
To make sure the documentation you write or
change renders correctly, it is good practice to test it locally.
```{note}
You will need Python and Git installed. Installation details are available at {ref}`contributing:setup`.
```
1. Install the packages required to build the docs.
```bash
python3 -m pip install -r docs/requirements.txt
python3 -m pip install sphinx-autobuild
```
2. Build the HTML version of the docs. This is the most commonly used
output format, so verifying it renders correctly is usually good
enough.
```bash
sphinx-autobuild docs/source/ docs/_build/html
```
This step will display any syntax or formatting errors in the documentation,
along with the filename / line number in which they occurred. Fix them,
and the HTML will be re-rendered automatically.
3. View the rendered documentation by opening <http://127.0.0.1:8000> in
a web browser.
(contributing-docs-conventions)=
## Documentation conventions
```bash
python3 -m pip
```
This invokes `pip` explicitly using the `python3` binary that you are
currently using. This is the **recommended way** to invoke pip
in our documentation, since it is least likely to cause problems
with `python3` and `pip` being from different environments.
For more information on how to invoke `pip` commands, see
JupyterHub's continuous integration runs on [Ubuntu LTS](https://ubuntu.com/).
While JupyterHub is only tested on one [Linux distribution](https://en.wikipedia.org/wiki/Linux_distribution),
it should be fairly insensitive to variations between common [POSIX](https://en.wikipedia.org/wiki/POSIX) implementations,
though we don't have the bandwidth to verify this automatically and continuously.
Feel free to try it on your platform, and be sure to {ref}`let us know <contributing:community>` about any issues you encounter.
## System requirements
Your system **must** be able to run
- Python
- NodeJS
- Git
In our small team's experience, JupyterHub works well on macOS and Linux operating systems.
```{admonition} What about Windows?
Some users have reported that JupyterHub runs successfully on [Windows Subsystem for Linux (WSL)](https://learn.microsoft.com/en-us/windows/wsl/). We have no plans to support Windows outside of the WSL.
```
```{admonition} What about virtualization?
Using any form of virtualization (for example, [VirtualBox](https://www.virtualbox.org/), [Docker](https://www.docker.com/), [Podman](https://podman.io/), [WSL](https://learn.microsoft.com/en-us/windows/wsl/)) is a good way to get up and running quickly, though properly configuring the networking settings can be a bit tricky.
```
### Install Python
JupyterHub is written in the [Python](https://www.python.org) programming language and
requires you have at least version {{python_min}} installed locally. If you haven’t
installed Python before, the recommended way to install it is to use
JupyterHub uses [Git](https://git-scm.com) and [GitHub](https://github.com)
for development and collaboration. You need to [install Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) to work on
JupyterHub. We also recommend getting a free account on GitHub.
## Install JupyterHub for development
When developing JupyterHub, you need to make changes and be able to instantly view the results. To achieve that, a developer install is required.
1. Clone the [JupyterHub Git repository](https://github.com/jupyterhub/jupyterhub)
to your computer.
```bash
git clone https://github.com/jupyterhub/jupyterhub
```

2. Verify that `npm` is installed and check its version:

```bash
npm -v
```
This should return a version number greater than or equal to {{node_min}}.
3. Install `configurable-http-proxy` (required to run and test the default JupyterHub configuration):
4. Install an editable version of JupyterHub and its requirements for
development and testing. This lets you edit JupyterHub code in a text editor
and restart the JupyterHub process to see your code changes immediately.
```bash
python3 -m pip install --editable ".[test]"
```
Happy developing!
## Using DummyAuthenticator and SimpleLocalProcessSpawner
To simplify testing of JupyterHub, it is helpful to use
{class}`~jupyterhub.auth.DummyAuthenticator` instead of the default JupyterHub authenticator, and SimpleLocalProcessSpawner instead of the default spawner.
The test configuration enables a few things to make testing easier:
- disable caching of static files
The default JupyterHub [authenticator](PAMAuthenticator)
and [spawner](LocalProcessSpawner)
require your system to have user accounts for each user you want to log in to
JupyterHub as.
DummyAuthenticator allows you to log in with any username and password,
while SimpleLocalProcessSpawner allows you to start servers without having to
create a Unix user for each JupyterHub user. Together, these make it
much easier to test JupyterHub.
Tip: If you are working on parts of JupyterHub that are common to all
authenticators and spawners, we recommend using both DummyAuthenticator and
SimpleLocalProcessSpawner. If you are working on just authenticator-related
parts, use only SimpleLocalProcessSpawner. Similarly, if you are working on
just spawner-related parts, use only DummyAuthenticator.
The tests live in `jupyterhub/tests` and are organized roughly into:
1. `test_api.py`: tests the REST API
2. `test_pages.py`: tests loading the HTML pages
and other collections of tests for different components.
When writing a new test, there should usually be a test of
### All the tests are failing
Make sure you have completed all the steps in {ref}`contributing:setup` successfully, and are able to access JupyterHub from your browser at <http://localhost:8000> after starting `jupyterhub` in your command line.
```{note}
This page could be missing cross-links to other parts of
the documentation. You can help by adding them!
```
JupyterHub is not what you think it is. Most things you think are
part of JupyterHub are actually handled by some other component, for
example the spawner or notebook server itself, and it's not always
obvious how the parts relate. The knowledge contained here hasn't
been assembled in one place before, and is essential to understand
when setting up a sufficiently complex Jupyter(Hub) setup.
This document was originally written to assist in debugging: very
often, the actual problem is not where one thinks it is and thus
people can't easily debug. In order to tell this story, we start at
JupyterHub and go all the way down to the fundamental components of
Jupyter.
In this document, we occasionally leave things out or bend the truth
where it helps in explanation, and give our explanations in terms of
Python even though Jupyter itself is language-neutral. The "(&)"
symbol highlights important points where this page leaves out or bends
the truth for simplification of explanation, but there is more if you
dig deeper.
This guide is long, but after reading it you will know all the major
components in the Jupyter ecosystem and everything else you read
should make sense.
## What is Jupyter?
Before we get too far, let's remember what our end goal is. A
**Jupyter Notebook** is nothing more than a Python(&) process
which is getting commands from a web browser and displaying the output
via that browser. What the process actually sees is roughly like
getting commands on standard input(&) and writing to standard
output(&). There is nothing intrinsically special about this process
- it can do anything a normal Python process can do, and nothing more.
The **Jupyter kernel** handles capturing output and converting things
such as graphics to a form usable by the browser.
Everything we explain below is building up to this, going through many
different layers which give you many ways of customizing how this
process runs.
## JupyterHub
**JupyterHub** is the central piece that provides multi-user
login capabilities. Despite this, the end user only briefly interacts with
JupyterHub and most of the actual Jupyter session does not relate to
the hub at all: the hub mainly handles authentication and creating (JupyterHub calls it "spawning") the
single-user server. In short, anything which is related to _starting_
the user's workspace/environment is about JupyterHub, anything about
_running_ usually isn't.
If you have problems connecting the authentication, spawning, and the
proxy (explained below), the issue is usually with JupyterHub. To
debug, JupyterHub has extensive logs which get printed to its console
and can be used to discover most problems.
The main pieces of JupyterHub are:
### Authenticator
JupyterHub itself doesn't actually manage your users. It has a
database of users, but it is usually connected with some other system
that manages the usernames and passwords. When someone tries to log
in to JupyterHub, it asks the
**authenticator**([basics](authenticators),
[reference](../reference/authenticators)) if the
username/password is valid(&). The authenticator returns a username(&),
which is passed on to the spawner, which has to use it to start that
user's environment. The authenticator can also return user
groups and admin status of users, so that JupyterHub can do some
higher-level management.
The following authenticators are included with JupyterHub:
- **PAMAuthenticator** uses the standard Unix/Linux operating system
functions to check users. Roughly, if someone already has access to
the machine (they can log in by ssh), they will be able to log in to
JupyterHub without any other setup. Thus, JupyterHub fills the role
of an ssh server, but provides a web-browser based way to access the
machine.
There are [plenty of others to choose from](authenticators-reference).
You can connect to almost any other existing service to manage your
users. You either use all users from this other service (e.g. your
company), or enable only the allowed users (e.g. your group's
GitHub usernames). Some other popular authenticators include:
- **OAuthenticator** uses the standard OAuth protocol to verify users.
For example, you can easily use GitHub to authenticate your users -
people have a "click to login with GitHub" button. This is often
done with an allowlist to only allow certain users.
- **NativeAuthenticator** actually stores and validates its own
usernames and passwords, unlike most other authenticators. Thus,
you can manage all your users within JupyterHub only.
- There are authenticators for LTI (learning management systems),
Shibboleth, Kerberos - and so on.
The authenticator is configured with the
`c.JupyterHub.authenticator_class` configuration option in the
`jupyterhub_config.py` file.
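For example, a minimal sketch of `jupyterhub_config.py` selecting the built-in PAM authenticator by its short entry point name (the fully qualified class path works as well):

```python
# jupyterhub_config.py
c.JupyterHub.authenticator_class = "pam"  # the default PAMAuthenticator
```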
The authenticator runs internally to the Hub process but communicates
with outside services.
If you have trouble logging in, this is usually a problem of the
authenticator. The authenticator logs are part of the JupyterHub
logs, but there may also be relevant information in whatever external
services you are using.
### Spawner
The **spawner** ([basics](spawners),
[reference](../reference/spawners)) is the real core of
JupyterHub: when someone wants a notebook server, the spawner allocates
resources and starts the server. The notebook server could run on the
same machine as JupyterHub, on another machine, on some cloud service,
or more. Administrators can limit resources (CPU, memory) or isolate users
from each other - if the spawner supports it. They can also do no
limiting at all and allow any user to access any other user's files if
the spawner is not configured properly.
Some basic spawners included in JupyterHub are:
- **LocalProcessSpawner** is built into JupyterHub. Upon launch it tries
to switch users to the given username (`su` (&)) and start the
notebook server. It requires that the hub be run as root (because
only root has permission to start processes as other user IDs).
LocalProcessSpawner is no different than a user logging in with
something like `ssh` and running `jupyter notebook`. PAMAuthenticator and
LocalProcessSpawner together are the most basic way of using JupyterHub (and
what it does out of the box), and they make the hub not too dissimilar to
an advanced ssh server.
There are [many more advanced spawners](/reference/spawners), and to
show the diversity of spawning strategies, some are listed below:
- **SudoSpawner** is like LocalProcessSpawner but lets you run
JupyterHub without root. `sudo` has to be configured to allow the
hub's user to run processes under other user IDs.
- **SystemdSpawner** uses Systemd to start other processes. It can
isolate users from each other and provide resource limiting.
- **DockerSpawner** runs stuff in Docker, a containerization system.
This lets you fully isolate users, limit CPU, memory, and provide
other container images to fully customize the environment.
- **KubeSpawner** runs on Kubernetes, a cloud orchestration
system. The spawner can easily limit users and provide cloud
scaling - but the spawner doesn't actually do that, Kubernetes
does. The spawner just tells Kubernetes what to do. If you want to
get KubeSpawner to do something, first you would figure out how to
do it in Kubernetes, then figure out how to tell KubeSpawner to tell
Kubernetes that. Actually... this is true for most spawners.
- **BatchSpawner** runs on computer clusters with batch job scheduling
systems (e.g. Slurm, HTCondor, PBS). The user processes are run
as batch jobs, having access to all the data and software that the
users normally would.
In short, spawners are the interface to the rest of the operating
system, and to configure them right you need to know a bit about how
the corresponding operating system service works.
The spawner is responsible for the environment of the single-user
notebook servers (described in the next section). In the end, it just
makes a choice about how to start these processes: for example, the
Docker spawner starts a normal Docker container and runs the right
command inside of it. Thus, the spawner is responsible for setting
what kind of software and data is available to the user.
The spawner runs internally to the Hub process but communicates with
outside services. It is configured by `c.JupyterHub.spawner_class` in
`jupyterhub_config.py`.
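For example, a minimal sketch of `jupyterhub_config.py` selecting the built-in SimpleLocalProcessSpawner by its short entry point name:

```python
# jupyterhub_config.py
c.JupyterHub.spawner_class = "simple"  # SimpleLocalProcessSpawner: testing only, no user isolation
```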
If a user tries to launch a notebook server and it doesn't work, the
error is usually with the spawner or the notebook server (as described
in the next section). Each spawner outputs some logs to the main
JupyterHub logs, but may also have logs in other places depending on
what services it interacts with (for example, the Docker spawner
puts logs in the Docker system services, while KubeSpawner's logs are
accessible through the `kubectl` API).
### Proxy
The JupyterHub **proxy** relays connections between the users
and their single-user notebook servers. What this basically means is
that the hub itself can shut down and the proxy can continue to
allow users to communicate with their notebook servers. (This
further emphasizes that the hub is responsible for starting, not
running, the notebooks). By default, the hub starts the proxy
automatically
and stops the proxy when the hub stops (so that connections get
interrupted). But when you [configure the proxy to run
separately](howto:separate-proxy),
users' connections will continue to work even without the hub.
The default proxy is **ConfigurableHttpProxy** which is simple but
effective. A more advanced option is the [**Traefik Proxy**](https://blog.jupyter.org/introducing-traefikproxy-a-new-jupyterhub-proxy-based-on-traefik-4839e972faf6),
which gives you redundancy and high-availability.
When users "connect to JupyterHub", they _always_ first connect to the
proxy and the proxy relays the connection to the hub. Thus, the proxy
is responsible for SSL and accepting connections from the rest of the
internet. The user uses the hub to authenticate and start the server,
and then the hub connects back to the proxy to adjust the proxy routes
for the user's server (e.g. the web path `/user/someone` redirects to
the server of someone at a certain internal address). The proxy has
to be able to internally connect to both the hub and all the
single-user servers.
The proxy always runs as a separate process to JupyterHub (even though
JupyterHub can start it for you). JupyterHub has one set of
configuration options for the proxy addresses (`bind_url`) and one for
the hub (`hub_bind_url`). If `bind_url` is given, it is just passed to
the automatic proxy to tell it what to do.
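A rough sketch of these two settings in `jupyterhub_config.py` (the addresses shown are essentially the defaults):

```python
# jupyterhub_config.py
c.JupyterHub.bind_url = "http://0.0.0.0:8000"        # public-facing address served by the proxy
c.JupyterHub.hub_bind_url = "http://127.0.0.1:8081"  # internal hub address the proxy and servers talk to
```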
If you have problems after users are redirected to their single-user
notebook servers, or making the first connection to the hub, it is
usually caused by the proxy. The ConfigurableHttpProxy's logs are
mixed with JupyterHub's logs if it's started through the hub (the
default case), otherwise from whatever system runs the proxy (if you
do configure it, you'll know).
### Services
JupyterHub has the concept of **services** ([basics](tutorial:services),
[reference](services-reference)), which are other web services
started by the hub, but otherwise are not necessarily related to the
hub itself. They are often used to do things related to Jupyter
(things that users interact with, usually not the hub), but could
always be run some other way. Running from the hub provides an easy
way to get Hub API tokens and authenticate users against the hub. It
can also automatically add a proxy route to forward web requests to that service.
### Default backend: SQLite
The default database backend for JupyterHub is [SQLite](https://sqlite.org).
We have chosen SQLite as JupyterHub's default because it's simple (the 'database' is a single file), ubiquitous (it is in the Python standard library), and it does not require maintaining a separate database server.
The main disadvantage of SQLite is that it does not support remote backup tools or replication.
You should backup your database by taking snapshots of the file (`jupyterhub.sqlite`).
SQLite is ideal for testing, small deployments, workshops, and production servers where you do not require remote backup or replication.
The sqlite documentation provides a helpful page about [when to use SQLite and
where traditional RDBMS may be a better choice](https://sqlite.org/whentouse.html).
### Picking your database backend (PostgreSQL, MySQL)
When running a long term deployment or a production system, we recommend using a full-fledged relational database, such as [PostgreSQL](https://www.postgresql.org) or [MySQL](https://www.mysql.com), that supports the SQL `ALTER TABLE` statement, which is used in some database upgrade steps.
In general, you select your database backend with [](JupyterHub.db_url), and can further configure it (usually not necessary) with [](JupyterHub.db_kwargs).
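For example, a sketch pointing JupyterHub at a PostgreSQL server (host, credentials, and database name are placeholders):

```python
# jupyterhub_config.py
c.JupyterHub.db_url = "postgresql://jupyterhub:mypassword@db.example.com:5432/jupyterhub"
```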
## Notes and Tips
### Upgrading the JupyterHub database
[Upgrading JupyterHub to a new major release](howto:upgrading-jupyterhub) often requires an upgrade to the database schema.
- `jupyterhub upgrade-db` will execute a schema upgrade. You should backup your database before running this.
- `jupyterhub downgrade-db` may be able to revert a schema upgrade on PostgreSQL and MySQL, but this is not guaranteed to work, and is not supported.
### SQLite
The SQLite database should not be used on NFS. SQLite uses reader/writer locks
_Explanation_ documentation provides big-picture descriptions of how JupyterHub works. This section is meant to build your understanding of particular topics.
to retrieve information about the owner of the token (the user).
This is the step where behavior diverges for different OAuth providers.
Up to this point, all OAuth providers are the same, following the OAuth specification.
However, OAuth does not define a standard for issuing tokens in exchange for information about their owner or permissions ([OpenID Connect](https://openid.net/developers/how-connect-works/) does that),
so this step may be different for each OAuth provider.
- Finally, the OAuth client stores its own record that the user is authorized in a cookie.
This could be the token itself, or any other appropriate representation of successful authentication.
Implementation-wise, JupyterHub single-user servers are a special-case of {ref}`services-reference`
and as such use the same (OAuth) authentication mechanism (more on OAuth in JupyterHub at [](oauth)).
This is primarily implemented in the {class}`~.HubOAuth` class.
But technically, all JupyterHub cares about is that it is:
1. an http server at the prescribed URL, accessible from the Hub and proxy, and
2. authenticated via [OAuth](oauth) with the Hub (it doesn't even have to do this, if you want to do your own authentication, as is done in BinderHub)
which means that you can customize JupyterHub to launch _any_ web application that meets these criteria, by following the specifications in {ref}`services-reference`.
Most of the time, though, it's easier to use [jupyter-server-proxy](https://jupyter-server-proxy.readthedocs.io) if you want to launch additional web applications in JupyterHub.
Unfortunately, for many institutional domains, wildcard DNS and SSL may not be available.
We also **strongly encourage** serving JupyterHub and user content on a domain that is _not_ a subdomain of any sensitive content.
For reasoning, see [GitHub's discussion of moving user content to github.io from \*.github.com](https://github.blog/engineering/yummy-cookies-across-domains/).
**If you do plan to serve untrusted users, enabling subdomains is highly encouraged**,
as it resolves many security issues which are difficult or impossible to avoid when JupyterHub is on a single domain.
- `Content-Security-Policy` header must prohibit popups and iframes from the same origin.
The following Content-Security-Policy rules are _insecure_ and readily enable users to access each others' servers:
### Launching Jupyter Notebooks to run as an externally managed JupyterHub service with the `jupyterhub-singleuser` command returns a `JUPYTERHUB_API_TOKEN` error
{ref}`services-reference` allow processes to interact with JupyterHub's REST API. Example use-cases include:
- **Secure Testing**: provide a canonical Jupyter Notebook for testing production data to reduce the number of entry points into production systems.
- **Grading Assignments**: provide access to shared Jupyter Notebooks that may be used for management tasks such as grading assignments.
[This example](https://github.com/jupyterhub/jupyterhub/tree/HEAD/examples/service-notebook/external) demonstrates how to combine the use of the `jupyterhub-singleuser` environment variables when launching a Notebook as an externally managed service.
### Jupyter Notebook/Lab can be launched, but notebooks seem to hang when trying to execute a cell
This often occurs when your browser is unable to open a websocket connection to a Jupyter kernel.
#### Diagnose
Open your browser console, e.g. [Chrome](https://developer.chrome.com/docs/devtools/console), [Firefox](https://firefox-source-docs.mozilla.org/devtools-user/web_console/).
If you see errors related to opening websockets this is likely to be the problem.
#### Solutions
This could be caused by anything related to the network between your computer/browser and the server running JupyterHub, such as:
- reverse proxies (see {ref}`howto:config:reverse-proxy` for example configurations)
- anti-virus or firewalls running on your computer or JupyterHub server
- transparent proxies running on your network
## How do I...?
### Use a chained SSL certificate
With the following configuration, users can access the entire filesystem, and the default view is set to the user's home directory:

```python
c.Spawner.notebook_dir = '/'
c.Spawner.default_url = '/home/%U'  # %U will be replaced with the username
```
### How do I increase the number of pySpark executors on YARN?
From the command line, pySpark executors can be configured using a command
[Cloudera documentation for configuring spark on YARN applications](https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_running_spark_on_yarn.html#spark_on_yarn_config_apps)
provides additional information. The [pySpark configuration documentation](https://spark.apache.org/docs/0.9.0/configuration.html)
is also helpful for programmatic configuration examples.
### How do I use JupyterLab's pre-release version with JupyterHub?
While JupyterLab is still under active development, we have had users
Users will need a GitHub account to log in and be authenticated by the Hub.
### I'm seeing "403 Forbidden XSRF cookie does not match POST" when users try to login
During login, JupyterHub takes the request IP into account for CSRF protection.
If proxies are not configured to properly set forwarded ips,
JupyterHub will see all requests as coming from an internal ip,
likely the ip of the proxy itself.
You can see this in the JupyterHub logs, which log the ip address of requests.
If most requests look like they are coming from a small number of `10.0.x.x` or `172.16.x.x` ips, the proxy is not forwarding the true request ip properly.
If the proxy has multiple replicas,
then it is likely the ip may change from one request to the next,
leading to this error during login:
> 403 Forbidden XSRF cookie does not match POST argument
The best way to fix this is to ensure your proxies set the forwarded headers, e.g. for nginx:
Sometimes, JupyterHub is integrated into an existing application that has already handled user login, etc.
It is often preferable in these applications to be able to link users to their running JupyterHub server without _prompting_ the user to login again with the Hub when the Hub should really be an implementation detail,
and not part of the user experience.
One way to do this has been to use [API only mode](#howto:api-only), issue tokens for users, and redirect users to a URL like `/user/name/?token=abc123`.
This is [disabled by default](#HubAuth.allow_token_in_url) in JupyterHub 5, because it presents a vulnerability for users to craft links that let _other_ users login as them, which can lead to inter-user attacks.
But that leaves the question: how do I as an _application developer_ embedding JupyterHub link users to their own running server without triggering another login prompt?
The problem with `?token=...` in the URL is specifically that _users_ can get and create these tokens, and share URLs.
This wouldn't be an issue if only authorized applications could issue tokens that behave this way.
The single-user server doesn't exactly have the hooks to manage this easily, but the [Authenticator](#Authenticator) API does.
## Problem statement
We want our external application to be able to:
1. authenticate users
2. (maybe) create JupyterHub users
3. start JupyterHub servers
4. redirect users into running servers _without_ any login prompts/loading pages from JupyterHub, and without any prior JupyterHub credentials
Step 1 is up to the application and not JupyterHub's problem.
Steps 2 and 3 use the JupyterHub [REST API](#jupyterhub-rest-API).
The service would need the scopes:
```
admin:users # creating users
servers # start/stop servers
```
That leaves the last step: sending users to their running server with credentials, without prompting login.
This is where things can get tricky!
### Ideal case: oauth
_Ideally_, the best way to set this up is with the external service as an OAuth provider,
though in some cases it works best to use proxy-based authentication like Shibboleth / [REMOTE_USER](https://github.com/cwaldbieser/jhub_remote_user_authenticator).
The main things to know are:
- Links to `/hub/user-redirect/some/path` will ultimately land users at `/user/theirserver/some/path` after completing login, ensuring the server is running, etc.
- Setting `Authenticator.auto_login = True` allows beginning the login process without JupyterHub's "Login with..." prompt
_If_ your OAuth provider allows logging in to external services without prompting, this is enough.
Not all do, though.
If you've already ensured the server is running, this will _appear_ to the user as if they are being sent directly to their running server.
But what _actually_ happens is quite a series of redirects, state checks, and cookie-setting:
1. visiting `/hub/user-redirect/some/path` checks if the user is logged in
1. if not, begin the login process (`/hub/login?next=/hub/user-redirect/...`)
2. redirects to your oauth provider to authenticate the user
3. redirects back to `/hub/oauth_callback` to complete login
4. redirects back to `/hub/user-redirect/...`
2. once authenticated, checks that the user's server is running
1. if not running, begins launch of the server
2. redirects to `/hub/spawn-pending/?next=...`
3. once the server is running, redirects to the actual user server `/user/username/some/path`
Now we're done, right? Actually, no, because the browser doesn't have credentials for their user server!
This sequence of redirects happens all the time in JupyterHub launch, and is usually totally transparent.
4. at the user server, check for a token in cookie
1. if not present or not valid, begin oauth with the Hub (redirect to `/hub/api/oauth2/authorize/...`)
2. hub redirects back to `/user/username/oauth_callback` to complete oauth
3. redirect again to the URL that started this internal oauth
5. finally, arrive at `/user/username/some/path`, the ultimate destination, with valid JupyterHub credentials
The steps that will show users something other than the page you want them to are:
- Step 1.1 will be a prompt e.g. with "Login with..." unless you set `c.Authenticator.auto_login = True`
- Step 1.2 _may_ be a prompt from your oauth provider. This isn't controlled by JupyterHub, and may not be avoidable.
- Step 2.2 will show the spawn pending page only if the server is not already running
Otherwise, this is all transparent redirects to the final destination.
#### Using an authentication proxy (REMOTE_USER)
If you use an Authentication proxy like Shibboleth that sets e.g. the REMOTE_USER header,
you can use an Authenticator like [RemoteUserAuthenticator](https://github.com/cwaldbieser/jhub_remote_user_authenticator) to automatically login users based on headers in the request.
The same process will work, but instead of step 1.1 redirecting to the oauth provider, it logs in immediately.
If you do support an auth proxy, you also need to be extremely sure that requests only come from the auth proxy, and don't accept any requests setting the REMOTE_USER header coming from other sources.
### Custom case
But let's say you can't use OAuth or REMOTE_USER, and you still want to hide JupyterHub implementation details.
All you really want is a way to write a URL that will take users to their servers without any login prompts.
You can do this if you create an Authenticator with `auto_login=True` that logs users in based on something in the _request_, e.g. a query parameter.
We have an _example_ in the JupyterHub repo in `examples/forced-login` that does this.
It is a sample 'external service' where you type in a username and a destination path.
When you 'login' with this username:
1. a token is issued
2. the token is stored and associated with the username
3. redirect to `/hub/login?login_token=...&next=/hub/user-redirect/destination/path`
Then on the JupyterHub side, there is the `ForcedLoginAuthenticator`.
This class implements `authenticate` (a rough sketch follows the list below), which:
1. has `auto_login = True` so visiting `/hub/login` calls `authenticate()` directly instead of serving a page
2. gets the token from the `login_token` URL parameter
3. makes a POST request to the external application with the token, requesting a username
4. the external application returns the username and deletes the token, so it cannot be re-used
5. Authenticator returns the username
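A hypothetical sketch of an authenticator along these lines (class layout, trait, and the token-exchange endpoint are illustrative, not the actual code in `examples/forced-login`):

```python
import json

from jupyterhub.auth import Authenticator
from tornado.httpclient import AsyncHTTPClient, HTTPRequest
from traitlets import Unicode


class ForcedLoginAuthenticator(Authenticator):
    # skip the "Login with..." page; visiting /hub/login calls authenticate() directly
    auto_login = True

    token_exchange_url = Unicode(
        "http://external-app.internal/exchange-token",
        help="Endpoint of the external app that swaps a login token for a username",
    ).tag(config=True)

    async def authenticate(self, handler, data=None):
        token = handler.get_argument("login_token", None)
        if not token:
            return None  # no token, no login
        # Ask the external application who this token belongs to.
        # The app deletes the token after answering, so it cannot be replayed.
        req = HTTPRequest(
            self.token_exchange_url,
            method="POST",
            headers={"Content-Type": "application/json"},
            body=json.dumps({"token": token}),
        )
        resp = await AsyncHTTPClient().fetch(req)
        reply = json.loads(resp.body.decode("utf8"))
        return reply.get("username") or None
```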
This doesn't _bypass_ JupyterHub authentication, as some deployments have done, but it does _hide_ it.
If your service launches servers via the API, you could run this in [API only mode](#howto:api-only) by adding `/hub/login` as well:
JupyterHub 0.8 introduced the ability to write a custom implementation of the proxy.
A list of the proxies that are currently available for JupyterHub (that we know about):
1. [`jupyterhub/configurable-http-proxy`](https://github.com/jupyterhub/configurable-http-proxy) The default proxy which uses node-http-proxy
2. [`jupyterhub/traefik-proxy`](https://github.com/jupyterhub/traefik-proxy) The proxy which configures traefik proxy server for jupyterhub
3. [`AbdealiJK/configurable-http-proxy`](https://github.com/corridor/configurable-http-proxy) A pure python implementation of the configurable-http-proxy
It has two main distributions which are developed to serve different needs:
1. [The Littlest JupyterHub](https://github.com/jupyterhub/the-littlest-jupyterhub) distribution is suitable if you need a small number of users (1-100) and a single server with a simple environment.
2. [Zero to JupyterHub with Kubernetes](https://github.com/jupyterhub/zero-to-jupyterhub-k8s) allows you to deploy dynamic servers on the cloud if you need even more users.
This distribution runs JupyterHub on top of [Kubernetes](https://kubernetes.io/).
```{note}
It is important to evaluate these distributions before you continue.
```
| `(no_scope)` | Identify the owner of the requesting entity. |
| `self` | The user’s own resources _(metascope for users, resolves to (no_scope) for services)_ |
| `inherit` | Everything that the token-owning entity can access _(metascope for tokens)_ |
| `admin-ui` | Access the admin page. Permission to take actions via the admin page granted separately. |
| `admin:users` | Read, modify, create, and delete users and their authentication state, not including their servers or tokens. This is an extremely privileged scope and should be considered tantamount to superuser. |
| `admin:auth_state` | Read a user’s authentication state. |
| `users` | Read and write permissions to user models (excluding servers, tokens and authentication state). |
| `read:users` | Read user models (including the URL of the default server if it is running). |
| `read:users:name` | Read names of users. |
| `read:users:groups` | Read users’ group membership. |
| `read:users:activity` | Read time of last user activity. |
| `list:users` | List users, including at least their names. |
| `read:users:name` | Read names of users. |
| `users:activity` | Update time of last user activity. |
| `read:users:activity` | Read time of last user activity. |
| `read:roles:users` | Read user role assignments. |
| `read:roles:users` | Read user role assignments. |
| `read:roles:services` | Read service role assignments. |
| `read:roles:groups` | Read group role assignments. |
| `admin:servers` | Read, start, stop, create and delete user servers and their state. |
| `admin:server_state` | Read and write users’ server state. |
| `servers` | Start and stop user servers. |
| `read:servers` | Read users’ names and their server models (excluding the server state). |
| `read:users:name` | Read names of users. |
| `delete:servers` | Stop and delete users' servers. |
| `tokens` | Read, write, create and delete user tokens. |
| `read:tokens` | Read user tokens. |
| `admin:groups` | Read and write group information, create and delete groups. |
| `groups` | Read and write group information, including adding/removing any users to/from groups. Note: adding users to groups may affect permissions. |
| `read:groups` | Read group models. |
| `read:groups:name` | Read group names. |
| `list:groups` | List groups, including at least their names. |
| `read:groups:name` | Read group names. |
| `read:roles:groups` | Read group role assignments. |
An **access scope** is used to govern _access_ to a JupyterHub service or a user's server.
This means making API requests, or visiting via a browser using OAuth.
Without the appropriate access scope, a user or token should not be permitted to make requests of the service.
When you attempt to access a service or server authenticated with JupyterHub, it will begin the [oauth flow](explanation:hub-oauth) for issuing a token that can be used to access the service.
If the user does not have the access scope for the relevant service or server, JupyterHub will not permit the oauth process to complete.
If oauth completes, the token will have at least the access scope for the service.
For minimal permissions, this is the _only_ scope granted to tokens issued during oauth by default,
but can be expanded via {attr}`.Spawner.oauth_client_allowed_scopes` or a service's [`oauth_client_allowed_scopes`](service-credentials) configuration.
:::{seealso}
[Further explanation of OAuth in JupyterHub](explanation:hub-oauth)
:::
If a given service or single-user server can be governed by a single boolean "yes, you can use this service" or "no, you can't," or limiting via other existing scopes, access scopes are enough to manage access to the service.
access:servers!server=username/
: access to only `username`'s _default_ server.
(granting-scopes)=
### Considerations when allowing users to grant permissions via the `groups` scope
In general, permissions are fixed by role assignments in configuration (or via [Authenticator-managed roles](#authenticator-roles) in JupyterHub 5) and can only be modified by administrators who can modify the Hub configuration.
There is only one scope that allows users to modify permissions of themselves or others at runtime instead of via configuration:
the `groups` scope, which allows adding and removing users from one or more groups.
With the `groups` scope, a user can add or remove any users to/from any group.
With the `groups!group=name` filtered scope, a user can add or remove any users to/from a specific group.
There are two ways in which adding a user to a group may affect their permissions:
- if the group is assigned one or more roles, adding a user to the group may increase their permissions (this is usually the point!)
- if the group is the _target_ of a filter on this or another group, such as `access:servers!group=students`, adding a user to the group can grant _other_ users elevated access to that user's resources.
With these in mind, when designing your roles, do not grant users the `groups` scope for any groups which:
- have roles the user should not have authority over, or
- would grant them access they shouldn't have for _any_ user (e.g. don't grant `teachers` both `access:servers!group=students` and `groups!group=students` which is tantamount to the unrestricted `access:servers` because they control which users the `group=students` filter applies to).
If a group does not have role assignments and the group is not present in any `!group=` filter, there should be no permissions-related consequences for adding users to groups.
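As a sketch, a role assignment consistent with these guidelines might look like the following in `jupyterhub_config.py` (role, group, and scope choices are illustrative):

```python
# jupyterhub_config.py -- teachers may access and list student servers,
# but are deliberately NOT granted groups!group=students
c.JupyterHub.load_roles = [
    {
        "name": "teacher",
        "groups": ["teachers"],
        "scopes": [
            "access:servers!group=students",
            "list:users!group=students",
            "admin-ui",
        ],
    },
]
```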
:::{note}
The legacy `admin` property of users, which grants extreme superuser permissions and is generally discouraged in favor of more specific roles and scopes, may be modified only by other users with the `admin` property (e.g. added via `admin_users`).
@@ -84,7 +84,6 @@ The passed scopes are compared to the scopes required to access the API as follo
- if the API scopes are present within the set of passed scopes, the access is granted and the API returns its "full" response
- if that is not the case, another check is utilized to determine if subscopes of the required API scopes can be found in the passed scope set:
- if found, the RBAC framework employs the {ref}`filtering <vertical-filtering-target>` procedures to refine the API response to access only resource attributes corresponding to the passed scopes. For example, providing a scope `read:users:activity!group=class-C` for the `GET /users` API will return a list of user models from group `class-C` containing only the `last_activity` attribute for each user model
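A minimal sketch of such a request, assuming a Hub reachable at `http://127.0.0.1:8000` and a token carrying only that scope:

```python
import requests

token = "abc123"  # hypothetical token with only read:users:activity!group=class-C
r = requests.get(
    "http://127.0.0.1:8000/hub/api/users",
    headers={"Authorization": f"token {token}"},
)
# the response is filtered: only users in group class-C are listed,
# with only the attributes permitted by the token's scopes
print(r.json())
```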
@@ -11,7 +11,7 @@ No other database records are affected.
## Upgrade steps
1. All running **servers must be stopped** before proceeding with the upgrade.
2. To upgrade the Hub, follow the [Upgrading JupyterHub](howto:upgrading-jupyterhub) instructions.
```{attention}
We advise against defining any new roles in the `jupyterhub_config.py` file immediately after the upgrade is completed and JupyterHub is restarted for the first time. This preserves the 'current' state of the Hub. You can define and assign new roles on any subsequent startup.
@@ -36,16 +36,56 @@ A [generic implementation](https://github.com/jupyterhub/oauthenticator/blob/mas
## The Dummy Authenticator
When testing, it may be helpful to use the {class}`~.jupyterhub.auth.DummyAuthenticator`:
```python
c.JupyterHub.authenticator_class = "dummy"
# always a good idea to limit to localhost when testing with an insecure config
c.JupyterHub.ip = "127.0.0.1"
```
This allows for any username and password to login, and is _wildly_ insecure.
:::{versionadded} 5.0
The DummyAuthenticator's default `allow_all` is True,
unlike most other Authenticators.
:::
:::{deprecated} 5.3
Setting a password on DummyAuthenticator is deprecated.
Use the new {class}`~.jupyterhub.authenticators.shared.SharedPasswordAuthenticator`
if you want to set a shared password for users.
:::
## Shared Password Authenticator
:::{versionadded} 5.3
{class}`~.jupyterhub.authenticators.shared.SharedPasswordAuthenticator` is added and [DummyAuthenticator.password](#DummyAuthenticator.password) is deprecated.
:::
For short-term deployments like workshops where there is no real user data to protect and you trust users to not abuse the system or each other,
{class}`~.jupyterhub.authenticators.shared.SharedPasswordAuthenticator` can be used.
Set a [user password](#SharedPasswordAuthenticator.user_password) for users to login:
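A minimal sketch (the password value is a placeholder, and the `"shared-password"` shorthand is assumed to be the authenticator's registered name):

```python
c.JupyterHub.authenticator_class = "shared-password"  # assumed entry point name
c.SharedPasswordAuthenticator.user_password = "my-workshop-2042"  # placeholder
```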
You can also grant admin users access by adding them to `admin_users` and setting a separate [admin password](#SharedPasswordAuthenticator.admin_password):
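For example (usernames and passwords are placeholders; the admin password is expected to be longer and never shared with regular users):

```python
c.Authenticator.admin_users = {"instructor"}  # hypothetical admin user
# a separate, longer passphrase used only for admin logins (placeholder value)
c.SharedPasswordAuthenticator.admin_password = "an-extra-long-admin-passphrase-for-workshops"
```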
@@ -469,8 +509,19 @@ which is a list of group names the user should be a member of:
- If `None` is returned, no changes are made to the user's group membership
If authenticator-managed groups are enabled,
all group-management via the API is disabled,
groups cannot be specified with `load_groups` traitlet.
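As a sketch, an Authenticator using managed groups returns the user's groups from `authenticate` (the class and group names here are hypothetical):

```python
from jupyterhub.auth import Authenticator


class GroupManagingAuthenticator(Authenticator):  # hypothetical example class
    async def authenticate(self, handler, data):
        username = data["username"]
        # ... verify the credentials against your identity source here ...
        return {
            "name": username,
            # replaces the user's Hub group membership on every login
            "groups": ["students"],
        }


c.JupyterHub.authenticator_class = GroupManagingAuthenticator
c.Authenticator.manage_groups = True
```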
:::{warning}
When `manage_groups` is True,
managing groups via the API is still permitted via the `admin:groups` scope (starting with 5.3),
but any time a user logs in their group membership is completely reset via the login process.
So it only really makes sense to make manual changes via the API that reflect upstream changes which are not automatically propagated, such as group deletion.
:::
:::{versionchanged} 5.3
Prior to JupyterHub 5.3, all group management via the API was disabled if `Authenticator.manage_groups` is True.
- [Press release on Jupyter and Cori](https://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2016/jupyter-notebooks-will-open-up-new-possibilities-on-nerscs-cori-supercomputer/)
- [Moving and sharing data](https://www.nersc.gov/assets/Uploads/03-MovingAndSharingData-Cholia.pdf)
- [Research IT](https://research-it.berkeley.edu)
- [JupyterHub server supports campus research computation](https://research-it.berkeley.edu/news/free-fully-loaded-jupyterhub-server-supports-campus-research-computation)
### University of California Davis
@@ -82,20 +78,11 @@ Within CERN, there are two noteworthy JupyterHub deployments in operation:
- Advanced Computing
- [Palmetto cluster and JupyterHub](https://citi.sites.clemson.edu/2016/08/18/JupyterHub-for-Palmetto-Cluster.html)
### University of Colorado Boulder
- CU Research Computing (CURC)
- [JupyterHub User Guide](https://curc.readthedocs.io/en/latest/gateways/jupyterhub.html)
- Slurm job dispatched on Crestone compute cluster
- log troubleshooting
- Profiles in IPython Clusters tab
### ETH Zurich
[ETH Zurich](https://ethz.ch/en.html) (Federal Institute of Technology Zurich) is a public research university in Zürich, Switzerland, with a focus on science, technology, engineering, and mathematics, although its 16 departments span a variety of disciplines and subjects.
The [Educational Development and Technology](https://ethz.ch/en/the-eth-zurich/organisation/departments/teaching-and-learning.html) unit provides JupyterHub exclusively for teaching and learning, integrated in the learning management system [Moodle](https://ethz.ch/staffnet/en/teaching/academic-support/it-services-teaching/teaching-applications/moodle-service.html). Each course gets its own individually configured JupyterHub environment, deployed on an on-premises Kubernetes cluster.
- [ETH JupyterHub](https://ethz.ch/staffnet/en/teaching/academic-support/it-services-teaching/teaching-applications/jupyterhub.html) for teaching and learning
@@ -134,16 +121,15 @@ The [Educational Development and Technology](https://ethz.ch/en/the-eth-zurich/o
- [JavaOnlineExercises](https://github.com/dice-group/JavaOnlineExercises): Use JupyterHub + nbgrader + iJava kernel for online Java exercises. Used in lecture Statistical Natural Language Processing.
### Penn State University
- [Press release](https://www.psu.edu/news/academics/story/new-open-source-web-apps-available-students-and-faculty): "New open-source web apps available for students and faculty"
### University of California San Diego
- San Diego Supercomputer Center - Andrea Zonca
- [Deploy JupyterHub on a Supercomputer with SSH](https://zonca.github.io/2017/05/jupyterhub-hpc-batchspawner-ssh.html)
- [Run Jupyterhub on a Supercomputer](https://zonca.github.io/2015/04/jupyterhub-hpc.html)
- [Deploy JupyterHub on a VM for a Workshop](https://zonca.github.io/2016/04/jupyterhub-sdsc-cloud.html)
@@ -163,7 +149,7 @@ The [Educational Development and Technology](https://ethz.ch/en/the-eth-zurich/o
### Elucidata
- What's new in Jupyter Notebooks @[Elucidata](https://www.elucidata.io/):
- [Using Jupyter Notebooks with Jupyterhub on GCP, managed by GKE](https://medium.com/elucidata/why-you-should-be-using-a-jupyter-notebook-8385a4ccd93d)
## Service Providers
@@ -183,7 +169,7 @@ The [Educational Development and Technology](https://ethz.ch/en/the-eth-zurich/o
### Microsoft Azure
- [Azure Data Science Virtual Machine release notes](https://learn.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-linux-dsvm-intro)
### Rackspace Carina
@@ -211,5 +197,5 @@ The [Educational Development and Technology](https://ethz.ch/en/the-eth-zurich/o
- [Spark Cluster on OpenStack with Multi-User Jupyter Notebook](https://arnesund.com/2015/09/21/spark-cluster-on-openstack-with-multi-user-jupyter-notebook/)
would result in the metric `jupyterhub_prod_active_users`, etc.
(monitoring_bucket_sizes)=
## Customizing bucket sizes
As of JupyterHub 5.3, the following environment variables in the Hub's environment can be overridden to support custom bucket sizes - below are the defaults:
A [Spawner](#Spawner) starts each single-user notebook server.
The Spawner represents an abstract interface to a process,
and a custom Spawner needs to be able to take three actions:
@@ -37,7 +37,7 @@ Some examples include:
### Spawner.start
[](#Spawner.start) should start a single-user server for a single user.
Information about the user can be retrieved from `self.user`,
an object encapsulating the user's name, authentication, and server info.
@@ -68,11 +68,11 @@ async def start(self):
When `Spawner.start` returns, the single-user server process should actually be running,
not just requested. JupyterHub can handle `Spawner.start` being very slow
(such as PBS-style batch queues, or instantiating whole AWS instances)
via relaxing the [](#Spawner.start_timeout) config value.
#### Note on IPs and ports
[](#Spawner.ip) and [](#Spawner.port) attributes set the _bind_ URL,
which the single-user server should listen on
(passed to the single-user process via the `JUPYTERHUB_SERVICE_URL` environment variable).
The _return_ value is the IP and port (or full URL) the Hub should _connect to_.
@@ -124,7 +124,7 @@ If both attributes are not present, the Exception will be shown to the user as u
### Spawner.poll
[](#Spawner.poll) checks if the spawner is still running.
It should return `None` if it is still running,
and an integer exit status, otherwise.
@@ -133,7 +133,7 @@ to check if the local process is still running. On Windows, it uses `psutil.pid_
### Spawner.stop
[](#Spawner.stop) should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
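Putting the three actions together, a minimal sketch of a Spawner that launches a local process might look like this (error handling, state persistence, and `now=True` fast shutdown are omitted; not intended for production use):

```python
import asyncio

from jupyterhub.spawner import Spawner
from jupyterhub.utils import random_port


class SimpleProcessSpawner(Spawner):
    """Minimal sketch of the start/poll/stop interface (not for production)."""

    proc = None  # handle to the single-user server process

    async def start(self):
        self.port = random_port()
        env = self.get_env()  # includes JUPYTERHUB_SERVICE_URL, tokens, etc.
        cmd = self.cmd + self.get_args()
        self.proc = await asyncio.create_subprocess_exec(*cmd, env=env)
        # return the URL the Hub should connect to
        return f"http://127.0.0.1:{self.port}"

    async def poll(self):
        if self.proc is None:
            return 0
        # None while the process is running, an exit status otherwise
        return self.proc.returncode

    async def stop(self, now=False):
        if self.proc is not None:
            self.proc.terminate()
            await self.proc.wait()
```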
## Spawner state
@@ -166,17 +166,18 @@ def clear_state(self):
self.pid = 0
```
(spawner_user_options)=
## Spawner options form
(new in 0.4)
Some deployments may want to offer options to users to influence how their servers are started.
This may include cluster-based deployments, where users specify what memory or cpu resources should be available,
or container-based deployments where users can select from a list of base images,
or more complex configurations where users select a "profile" representing a bundle of settings to be applied together.
This feature is enabled by setting [](#Spawner.options_form), which is an HTML form snippet
inserted unmodified into the spawn form.
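For instance, a hypothetical form covering the field types used in the examples that follow could be configured like this:

```python
# hypothetical form snippet; the field names match the examples below
c.Spawner.options_form = """
<label for="integer">Workers</label>
<input name="integer" type="number" value="5" />

<label for="text">Comment</label>
<input name="text" type="text" value="some text" />

<label for="checkbox">Enable extra feature</label>
<input name="checkbox" type="checkbox" checked />

<label for="select">Select options</label>
<select name="select" multiple>
  <option value="a" selected>a</option>
  <option value="b" selected>b</option>
</select>
"""
```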
If the `Spawner.options_form` is defined, when a user tries to start their server they will be directed to a form page, like this:

@@ -186,28 +187,40 @@ See [this example](https://github.com/jupyterhub/jupyterhub/blob/HEAD/examples/s
### `Spawner.options_from_form`
Inputs from an HTML form always arrive as a dictionary of lists of strings, e.g.:
```python
formdata = {
    'integer': ['5'],
    'checkbox': ['on'],
    'text': ['some text'],
    'select': ['a', 'b'],
}
```
When `formdata` arrives, it is passed through [](#Spawner.options_from_form):
[](#Spawner.options_from_form) is a configurable function to turn the HTTP form data into the correct structure for [](#Spawner.user_options).
`options_from_form` must return a dictionary, _may_ be async, and is meant to interpret the lists-of-strings a web form produces into the correct types.
For example, the `options_from_form` for the above form might look like:
```python
def options_from_form(formdata, spawner=None):
    options = {}
    options['integer'] = int(formdata['integer'][0])  # single integer value
    options['checkbox'] = formdata['checkbox'] == ['on']
    options['text'] = formdata['text'][0]  # single string value
    options['select'] = formdata['select']  # list already correct
    options['notinform'] = 'extra info'  # not in the form at all
    return options


c.Spawner.options_from_form = options_from_form
```
which would return:
@@ -215,15 +228,115 @@ which would return:
```python
{
    'integer': 5,
    'checkbox': True,
    'text': 'some text',
    'select': ['a', 'b'],
    'notinform': 'extra info',
}
```
When `Spawner.start` is called, this dictionary is accessible as `self.user_options`.
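For example, a custom Spawner (a hypothetical subclass of the built-in `LocalProcessSpawner`) might consume these options in `start`:

```python
from jupyterhub.spawner import LocalProcessSpawner


class OptionsAwareSpawner(LocalProcessSpawner):  # hypothetical example class
    async def start(self):
        # values produced by options_from_form above
        workers = self.user_options.get("integer", 1)
        # pass the choice along however your deployment needs; an env var is shown here
        self.environment["NUM_WORKERS"] = str(workers)
        self.log.info("starting %s with %s workers", self.user.name, workers)
        return await super().start()
```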
The default Authenticator uses [PAM][] (Pluggable Authentication Module) to authenticate users already defined on the system with their usernames and passwords.
With the default Authenticator,
any user with an account and password on the system will be able to login.
But that does not mean they will be **allowed** to access JupyterHub.
:::{important}
Only _explicitly allowed_ users can login to JupyterHub
(a user who can login but is not allowed will see a permission error after successful login).
:::
## Deciding who is allowed
@@ -93,6 +99,25 @@ A set of initial admin users, `admin_users` can be configured as follows:
c.Authenticator.admin_users = {'mal', 'zoe'}
```
:::{warning}
`admin_users` config can only be used to _grant_ admin permissions.
Removing users from this set **does not** remove their admin permissions,
which must be done via the admin page or API.
Role assignments via `load_roles` are the only way to _revoke_ past permissions from configuration:
```python
c.JupyterHub.load_roles = [
{
"name": "admin",
"users": ["admin1", "..."],
}
]
```
or, better yet, [specify your own roles](define-role-target) with only the permissions your admins actually need.
:::
Users in the admin set are automatically added to the user `allowed_users` set,
@@ -11,7 +11,6 @@ Before installing JupyterHub, you will need:
installing Python packages is helpful.
- [Node.js {{node_min}}](https://www.npmjs.com/) or greater, along with npm. [Install Node.js/npm](https://docs.npmjs.com/getting-started/installing-node),
using your operating system's package manager.
- If you are using **`conda`**, the nodejs and npm dependencies will be installed for
you by conda.
@@ -72,6 +71,35 @@ jupyterhub -h
configurable-http-proxy -h
```
## Configuration
At this point, we could start jupyterhub, but nobody would be able to use it!
Only users who are explicitly **allowed** can use JupyterHub.
To allow users, we need to create a configuration file.
JupyterHub uses a configuration file called `jupyterhub_config.py`,
which is a regular Python script with one function `get_config()` pre-defined, returning the "config object".
Assigning attributes to this object is how we configure JupyterHub.
At this point, we have two choices:
1. allow any user who can successfully login with our Authenticator (often a good choice for local machines with PAM)
2. allow one or more users by name.
We'll start with the first one.
Create the file `jupyterhub_config.py` with the content:
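A minimal sketch of that file, assuming JupyterHub 5's `Authenticator.allow_all` flag:

```python
# jupyterhub_config.py
c = get_config()  # noqa: get_config() is pre-defined when JupyterHub loads this file

# option 1: allow any user who can successfully authenticate
c.Authenticator.allow_all = True

# option 2 (instead of option 1): allow specific users by name
# c.Authenticator.allowed_users = {"mal", "zoe"}
```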
Sometimes, when working with applications such as [BinderHub](https://binderhub.readthedocs.io), it may be necessary to launch Jupyter-based services on behalf of your users.
Doing so can be achieved through JupyterHub's [REST API](howto:rest-api), which allows one to launch and manage servers on behalf of users through API calls instead of the JupyterHub UI.
This way, you can take advantage of other user/launch/lifecycle patterns that are not natively supported by the JupyterHub UI, all without the need to develop the server management features of JupyterHub Spawners and/or Authenticators.
This tutorial goes through working with the JupyterHub API to manage servers for users.
The nice thing about this approach is that only users who already have those permissions will get a token which can take these actions.
The downside is that the browser token is only accessible to the javascript (e.g. JupyterLab) and/or jupyter-server request handlers, but not notebooks or terminals.
The second way, which is less secure, but perhaps more convenient for demonstration purposes,
is to grant the _server itself_ permission to grant access to itself.
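A sketch of this second approach, assuming the `Spawner.server_token_scopes` configuration is used to expand the permissions of the server's own token (verify the option name and availability for your JupyterHub version):

```python
# assumption: Spawner.server_token_scopes controls the scopes of the server's own API token
c.Spawner.server_token_scopes = [
    "servers!user",         # start/stop the owner's servers
    "access:servers!user",  # access the owner's servers
]
```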
@@ -159,11 +165,14 @@ which will have a JSON response:
Example for forcing user login via URL without disabling token-in-url protection.
An external application issues tokens associated with usernames.
A JupyterHub Authenticator only allows login via these tokens in a URL parameter (`/hub/login?login_token=....`),
each of which is exchanged for a username that is then used to log the user in.
Each token can be used for login only once, and must be used within 30 seconds of issue.
To run:
in one shell:
```
python3 external_app.py
```
in another:
```
jupyterhub
```
Then visit http://127.0.0.1:9000
Sometimes, JupyterHub is integrated into an existing application,
which has already handled login, etc.
It is often preferable in these applications to be able to link users to their running JupyterHub server without _prompting_ them to log in to the Hub, since the Hub should really be an implementation detail.
One way to do this has been to use "API only mode", issue tokens for users, and redirect users to a URL like `/users/name/?token=abc123`.
This is [disabled by default]() in JupyterHub 5, because it presents a vulnerability for users to craft links that let _other_ users login as them, which can lead to inter-user attacks.
But that leaves the question: how do I as an _application developer_ generate a link that can login a user?
_Ideally_, the best way to set this up is with the external service as an OAuth provider,
though in some cases it works best to use proxy-based authentication like Shibboleth / [REMOTE_USER]().
If your service is an OAuth provider, sharing links to `/hub/user-redirect/lab/tree/path/to/notebook...` should work just fine.
JupyterHub will:
1. authenticate the user
2. redirect to your identity provider via oauth (you can set `Authenticator.auto_login = True` if you want to skip prompting the user)
3. complete oauth
4. start their single-user server if it's not running (show the launch progress page while it's waiting)
5. redirect to their server once it's up
6. oauth (again), this time between the single-user server and the Hub
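For example, an external application can construct such a `user-redirect` link with nothing more than the Hub's public URL and the target path (both placeholders here):

```python
from urllib.parse import quote

hub_url = "https://hub.example.org"  # placeholder public Hub URL
notebook_path = "path/to/notebook.ipynb"  # placeholder path inside the user's server
link = f"{hub_url}/hub/user-redirect/lab/tree/{quote(notebook_path)}"
print(link)
```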
If your application chooses to launch the server and wait for it to be ready before redirecting
2. Visit http://127.0.0.1:8000/services/fastapi/docs. When going through the OAuth flow or getting a token from the control panel, you can log in with 'test-user' and any password.
let errorDialog = screen.getByText("Failed to delete group.");