Issue #4833 proposes allowing configuration of buckets for server spawn
duration. It was resolved with PR #4967.
This follows a similar pattern to support the same kind of configuration
for server stop duration.
- add docs, tests
- deprecate DummyAuthenticator.password, pointing to new class
- accept no password as valid config (no login possible)
- log warnings for suspicious config (e.g. passwords not set, admin password set, but no admin users, etc.)
Currently, admin users are even more insecure than otherwise
with dummyauthenticator - anyone who knows the username of the admin
can get in if they also know the password.
This PR adds an additional layer of security - admins *must* login
using a different, more secure (longer, per NIST guidelines) password.
If they login using the regular password, no admin status for them.
This is mildly helpful in local testing and improves overall security
posture. Where it really shines, though, is in 'workshop' hubs. I've
been running those for years now, both at UC Berkeley and now at 2i2c
(with NASA Openscapes in particular). This was the use case DummyAuth
was written for :D It allows an instructor to share a single password
with all the users in a secure way (they're all in a physical room,
zoom, etc). The password is then changed after the workshop. However,
admin access was not possible in this use case, as anyone guessing the
admin's username can get in as admin. With this change, admin access
is possible.
- when adding trailing slash, do so inside url_path_join, not with `+ '/'`
- don't use url_path_join to build url for handler _outside_ prefix (AddSlash on `/hub`)
the functions we use haven't changed in almost 10 years,
and are only a few lines
we should probably lose them eventually, but easier to vendor them first
wait for networkidle isn't enough for debounced name filter
clock.run_for doesn't seem to work, either, unclear why
instead, make sure the first page reflects the filtered view before clicking 'next'
- allow cancellation of outdated updates
- trigger offset changes with setOffset instead of on reply
- render pagination footer with `user_page.offset` instead of state.offset which only represents the _requested_ offset, not current view
- show login form for trying again, just like a password failure
- nicer, but more vague "try again" error for expired xsrf (original error still logged)
because users logging in don't need to know or understand xsrf stuff
- set fresh xsrf cookie when login page loads, to maximize time until expiration
- `{name}_input` for overriding full input
- `{name}_input_attrs` for overriding input element attributes (not including id).
Use `super()` to extend.
- For all `name` in username, password, otp
With this change, if we set
```
{% block username_input_attribs %}pattern="[a-z0-9]+"
placeholder="do not use email address, use your username"{% endblock username_input_attribs %}
```
We will get the following generated code
```
<input
id="username_input"
type="text"
autocapitalize="off"
autocorrect="off"
autocomplete="username"
class="form-control"
name="username"
val=""
tabindex="1"
autofocus="autofocus"
pattern="[a-z0-9]+"
placeholder="do not use email address, use your username"
/>
```
This allows updating the intersphinx URL in a single location when
those move, and makes it a tiny bit easier to add existing packages than
having to figure out where their docs are.
- defer jupyter_core import that caused earlier, less informative ImportError
- point to `pip install jupyterhub[singleuser]` in the error
- use `raise from` so original import error is still reported
For security reasons, only allow-listed env vars in the parent
JupyterHub process are passed to the single-user server Python process.
This allow-list is controlled by `Spawner.env_keep`, which by default
includes common env vars that both (a) are necessary for the single-user
server process to work and (b) don't contain credentials or sensitive
information that shouldn't be revealed to users of the Notebook.
However, this allow-list was missing the `LD_LIBRARY_PATH` env var,
which causes shared library errors when using a relocated Python that
has been compiled in shared mode (`--enable-shared`). This prevents
JupyterHub from working out of the box on platforms like Heroku.
Fixes #4903.
instead of initialize, which should only create objects
improves symmetry with stop, should remove some warnings about unfinished coroutines in some tests
* The singleuser mixin is attempting to bypass jupyter_server's
interactive prompt on shutdown by stopping the IO loop.
* This does disable the interactive prompt, but also causes SIGINT
to be ignored, so SIGTERM is issued after the timeout is hit.
* Closing the IO loop also prevents the server from closing async resources.
* This change allows jupyter_server to run its cleanup logic as
intended.
Secure contexts are a more robust way of checking that a browsing context
is authenticated and confidential. Compared to checking the scheme, this
covers cases where the connection is encrypted, but using a broken algorithm.
Notably, localhost is considered a secure context, even over HTTP.
For more detail on secure contexts, see:
https://developer.mozilla.org/en-US/docs/Web/Security/Secure_Contexts
cancels start rather than waiting for it to finish or timeout
also fixes cancellation when start_timeout is reached, which was previously left running forever
While doing https://github.com/jupyterhub/jupyterhub/pull/2726,
I realized we don't have a consistent way to format references
inside the docs. I now have them be formatted to match the name
of the file, but using `:` to separate them instead of `/` or `-`.
`/` makes it ambiguous when using with markdown link syntax, as
it could be a reference or a file. And using `-` is ambiguous, as
that can be the name of the file itself.
This PR does about half, I can do the other half later (unless
someone else does).
Allows limiting max expiration of tokens created via the API
Only affects the POST /api/tokens endpoint, not tokens issued by other means or created prior to config
Similar to 'kubespawner_override' in KubeSpawner, this allows
admins to selectively override spawner configuration based on
groups a user belongs to. This allows for low maintenance but
extremely powerful customization based on group membership.
This is particularly powerful when combined with
https://github.com/jupyterhub/oauthenticator/pull/735
## Dictionary vs List
Ordering is important here, but I still chose to implement this
configuration as a dictionary of dictionaries rather than a list. This is
primarily to allow for easy overriding in z2jh (and similar places),
where Lists are just really hard to override. Ordering is provided
by lexicographically sorting the keys, similar to how we do it in z2jh.
## Merging config
The merging code is literally copied from KubeSpawner, and provides
the exact same behavior. Documentation of how it acts is also copied.
- Missing `form-control` on a textbox gave it weird padding,
this fixes it.
- Add new server is set up as a [button addon](https://getbootstrap.com/docs/5.3/forms/input-group/#button-addons)
- Add a little right margin to the username in the navbar,
just before the logout button. Otherwise they were 'stuck'
to each other
set offset -> request page -> response sets offset is a recipe for races
instead, send request with new offset and only update offset state
made easier by consolidating page update requests into single loadPageData
- the asynccontextmanager object is available in the standard contextlib
module since Python 3.7
- the aclosing object is available in the standard contextlib module
since Python 3.10
- JupyterHub currently requires Python 3.8 or newer
we had some joins to trigger eager loading,
but then `query.count()` returns the count of (user, group) and (user, spawner) pairs, not the count of users
but the `joinedload` options added later are the right way to do that,
so these joins are unnecessary
GitHub Actions starts a log expansion group when it sees the string `[group]`,
which happens when that is a parametrize argument
which results in collapsing all subsequent test outputs
pytest.param lets us assign an id that is used in output, but not the value itself
Four tests were not using a mock authenticator:
- two get `reset_managed_roles_on_startup` toggled
- two get a custom implementation of `load_managed_roles`
'No template for 404' looks like something's wrong, when all it means to convey is that it doesn't get _special_ treatment
and the default error page is enough.
services/auth prevents calling check_xsrf_cookie,
but if the Handler itself called it the newly strict check would still be applied
this ensures the check is actually allowed for navigate GET requests
rather than attempting to clear multiple tokens (too complicated, breaks named servers)
look for and accept first valid token
have to do our own cookie parsing because existing cookie implementations only return a single value for each key
and default to selecting the _least_ likely to be correct, according to RFCs.
set updated xsrf cookie on login to avoid needing two requests to get the right cookie
# Conflicts:
# jupyterhub/tests/test_services_auth.py
need to inject our override into the base class,
rather than at the instance level,
to avoid clobbering any overrides in extensions like jupyter-server-proxy
almost every time installing docs/requirements.txt happens, JupyterHub is already installed
adding an `--editable` here ensures a full rebuild happens every time, which is very slow
- `expected_refresh_groups` was not used in this test case,
my guess is that it was accidentally copied over from
`test_auth_managed_groups`
- `expected_authenticated_groups` was defined only to define
the unused `expected_refresh_groups`
- `getRoleNames` was not following snake_case
and test coverage for allow_all and allow_existing_users interactions
PAMAuthenticator.allowed_groups is no longer mutually exclusive with allowed_users
Configuring `Authenticator.allowed_users` to a truthy value also allows
access for other existing users in JupyterHub's database, which could
come as a surprise. This new config is meant to help avoid such surprise.
With this new config, a JupyterHub admin is able to directly declare
whether the existing users in JupyterHub's database are to be granted
access or not.
If `allow_existing_users` isn't explicitly set, the default value will
be computed to True or False depending on whether `allowed_users` is truthy,
which makes the introduction of this config a non-breaking change.
This configuration was initially introduced in jupyterhub/oauthenticator
via https://github.com/jupyterhub/oauthenticator/pull/631, and is with
this PR being upstreamed to the base Authenticator class.
next to name filter, so it's not in the table headings
merges Running & Actions columns,
since it's really just Actions now (server actions & user actions)
removes in-page sort, which removes sort by server name, sort by running
Running column switches from sort to filter, matching the `?state` query parameter in the API
needs some CSS on the column widths to avoid jumps when toggling active servers
- persist offset, limit, name_filter in URL parameters,
so they are stable across page reload
- add UI element to specify items per page
This allows specifying a URL, which will show a specific view of a page of users
- accept 0 meaning no expiration, since folks have tried to use it that way
- clear error message for invalid (e.g. negative) values
- specify example in rest api doc so it doesn't default to invalid `0`
- better error if orm token fails to be retrieved
In my testing, Flask 3.0.0 doesn't accept returning only an integer
(as an error code) in a handler. A (content, status) tuple does
work. I don't know if this is a recent change, or if this has always
been broken, but the tuple return should be good for older Flask
versions as well.
For ordinary users to access the service, they need an appropriate
scope added to the user role. This adds that role in the
jupyterhub_config.py, as well as a note about this in the README.
It also updates the output that comes from the whoami service.
$host is the hostname, $http_host is `hostname[:port]`, which is what's needed here
$host works fine in the example because it uses the default port 80, but if it's on a different port
it will differ from the http Host header, resulting in cross-origin check errors.
I just went through these with @jmunroe, and found the
db step a little confusing - there is no action to really be
taken here, as pretty much everyone just uses sqlite for
development (and even production). So I've just removed that
step, as python almost always ships with sqlite built into it.
JupyterHub uses semantic versioning and has been >1.0.0 for a long time. It should be fine for the hub and singleuser versions to differ in their minor component.
only use a random token as the actual oauth state,
and use a local cache dict to store the extra info like cookie_name, next_url
this avoids the state field getting too big and passing local browser-server info to anyone else
In the unusual situation of:
1. both sides having a filter on `servers`
2. those filters being different
3. _and_ some permissions are granted via higher-level group or user membership
Setting the full bind_url seems increasingly cumbersome,
as often one only wants to change the url prefix or the ip,
rather than setting the whole thing.
Add debug messages and timers for start and end waiting for servers
and improve logic for awaiting proxy endpoints using concurrency primitives instead of a for-loop
- Roles need to be explicitly granted, otherwise you get a
403. This example predates roles.
- Explicitly set bind_url - without this, JupyterHub itself doesn't
seem to bind anywhere, and so you just get a 404 when you visit
whatever port configurable-http-proxy lands on. This is probably
a separate bug to be investigated, but in the meantime copying
this from testing/jupyterhub_config.py makes this example actually
work
- Set DummyAuthenticator as the default, so users can get started
with this example
While it seems trivial, this can be a bit convoluted to debug on macOS
because some of the services might not be visible to the user logged in.
The solution is simple; however, knowing why it is needed is a good thing.
add eager loading of several relationships that are ~always used when the given objects are requested
add specific eager loading of spawners to the users query
- roles, groups (always needed to resolve permissions)
- APIToken.user, service
query on users filtered by spawner, with joinedload for relationships that will be checked
gets the same results, but in a single query with more efficient lookups
- avoid lookup-by-name of user and admin roles when assigning them
- filter users-to-update to only those that need updating, which should usually be empty
No change in behavior
rather than using multi-level subdomains, which are nicer,
use `--user` and `--service` so it's only one DNS level below hub.
This is not as nice, but is compatible with wildcard SSL which only allows one level of separation.
- "Missing or invalid credentials" if no credentials at all
- fix HTTP method name on actual xsrf check failures
- show scopes if authenticated but not authorized (no change, but now tested)
adding users via config anywhere makes them allowed
previously, this was _required_, so that it was always true for working config,
but config which allowed some users but declared others in groups or roles was forbidden.
Now, declaring a user anywhere _ensures_ the user is allowed rather than _enforcing_ it.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
document.body.innerText="Rendered API specification doesn't work with file: protocol. Use sphinx-autobuild to do local builds of the docs, served over HTTP."
We use different channels of communication for different purposes. Whichever one you use will depend on what kind of communication you want to engage in.
@@ -11,7 +11,7 @@ can find them under the [jupyterhub/tests](https://github.com/jupyterhub/jupyter
## Running the tests
1. Make sure you have completed {ref}`contributing:setup`.
Once you are done, you would be able to run `jupyterhub` from the command line and access it from your web browser.
This ensures that the dev environment is properly set up for tests to run.
@@ -126,7 +126,7 @@ For more information on asyncio and event-loops, here are some resources:
### All the tests are failing
Make sure you have completed all the steps in {ref}`contributing:setup` successfully, and are able to access JupyterHub from your browser at http://localhost:8000 after starting `jupyterhub` in your command line.
This page could be missing cross-links to other parts of
the documentation. You can help by adding them!
JupyterHub is not what you think it is. Most things you think are
part of JupyterHub are actually handled by some other component, for
example the spawner or notebook server itself, and it's not always
obvious how the parts relate. The knowledge contained here hasn't
been assembled in one place before, and is essential to understand
when setting up a sufficiently complex Jupyter(Hub) setup.
This document was originally written to assist in debugging: very
often, the actual problem is not where one thinks it is and thus
people can't easily debug. In order to tell this story, we start at
JupyterHub and go all the way down to the fundamental components of
Jupyter.
In this document, we occasionally leave things out or bend the truth
where it helps in explanation, and give our explanations in terms of
Python even though Jupyter itself is language-neutral. The "(&)"
symbol highlights important points where this page leaves out or bends
the truth for simplification of explanation, but there is more if you
dig deeper.
This guide is long, but after reading it you will know of all major
components in the Jupyter ecosystem and everything else you read
should make sense.
## What is Jupyter?
Before we get too far, let's remember what our end goal is. A
**Jupyter Notebook** is nothing more than a Python(&) process
which is getting commands from a web browser and displaying the output
via that browser. What the process actually sees is roughly like
getting commands on standard input(&) and writing to standard
output(&). There is nothing intrinsically special about this
process - it can do anything a normal Python process can do, and nothing more.
The **Jupyter kernel** handles capturing output and converting things
such as graphics to a form usable by the browser.
Everything we explain below is building up to this, going through many
different layers which give you many ways of customizing how this
process runs.
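
The idea of a process that receives commands and hands back their output can be sketched in a few lines. This is only a toy model: the function name `run_cell` is made up for illustration, and a real Jupyter kernel communicates over a messaging protocol rather than through a direct function call.

```python
import contextlib
import io


def run_cell(source, namespace):
    """Execute one 'cell' of code and return whatever it printed."""
    buffer = io.StringIO()
    # Capture standard output while the code runs, roughly as a kernel
    # would before forwarding it to the browser.
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue()


# State persists between cells because the same namespace is reused.
namespace = {}
run_cell("x = 20 + 1", namespace)
output = run_cell("print(x * 2)", namespace)
print(output, end="")
```

The second cell can see `x` from the first, which is the same behaviour a notebook user relies on when running cells one after another.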
## JupyterHub
**JupyterHub** is the central piece that provides multi-user
login capabilities. Despite this, the end user only briefly interacts with
JupyterHub and most of the actual Jupyter session does not relate to
the hub at all: the hub mainly handles authentication and creating (JupyterHub calls it "spawning") the
single-user server. In short, anything which is related to _starting_
the user's workspace/environment is about JupyterHub, anything about
_running_ usually isn't.
If you have problems connecting the authentication, spawning, and the
proxy (explained below), the issue is usually with JupyterHub. To
debug, JupyterHub has extensive logs which get printed to its console
and can be used to discover most problems.
The main pieces of JupyterHub are:
### Authenticator
JupyterHub itself doesn't actually manage your users. It has a
database of users, but it is usually connected with some other system
that manages the usernames and passwords. When someone tries to log
in to JupyterHub, it asks the
**authenticator**([basics](authenticators),
[reference](../reference/authenticators)) if the
username/password is valid(&). The authenticator returns a username(&),
which is passed on to the spawner, which has to use it to start that
user's environment. The authenticator can also return user
groups and admin status of users, so that JupyterHub can do some
higher-level management.
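
As a rough illustration of that contract, the sketch below models an authenticator as an object with an `authenticate(handler, data)` coroutine that returns user information on success and `None` on failure. The class `SharedPasswordAuthenticator` and its constructor arguments are hypothetical and do not come from JupyterHub; a real authenticator would subclass JupyterHub's `Authenticator` base class instead.

```python
import asyncio
import hmac


class SharedPasswordAuthenticator:
    """Toy stand-in for an authenticator; not a JupyterHub class."""

    def __init__(self, password, allowed_users, admin_users=()):
        self.password = password
        self.allowed_users = set(allowed_users)
        self.admin_users = set(admin_users)

    async def authenticate(self, handler, data):
        """Return user info for a valid login, or None to reject it."""
        username = data.get("username", "")
        supplied = data.get("password", "")
        if username not in self.allowed_users:
            return None
        # Constant-time comparison avoids leaking how much of the password matched.
        if not hmac.compare_digest(supplied.encode(), self.password.encode()):
            return None
        return {"name": username, "admin": username in self.admin_users}


auth = SharedPasswordAuthenticator("workshop-secret", ["alice", "bob"], ["alice"])
ok = asyncio.run(
    auth.authenticate(None, {"username": "alice", "password": "workshop-secret"})
)
rejected = asyncio.run(
    auth.authenticate(None, {"username": "eve", "password": "workshop-secret"})
)
print(ok, rejected)
```

The returned name is what gets handed on to the spawner; the `admin` flag is one example of the extra information an authenticator may supply.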
The following authenticators are included with JupyterHub:
- **PAMAuthenticator** uses the standard Unix/Linux operating system
functions to check users. Roughly, if someone already has access to
the machine (they can log in by ssh), they will be able to log in to
JupyterHub without any other setup. Thus, JupyterHub fills the role
of an ssh server, but provides a web-browser based way to access the
machine.
There are [plenty of others to choose from](authenticators-reference).
You can connect to almost any other existing service to manage your
users. You either use all users from this other service (e.g. your
company), or enable only the allowed users (e.g. your group's
GitHub usernames). Some other popular authenticators include:
- **OAuthenticator** uses the standard OAuth protocol to verify users.
For example, you can easily use GitHub to authenticate your users -
people have a "click to login with GitHub" button. This is often
done with an allowlist to only allow certain users.
- **NativeAuthenticator** actually stores and validates its own
usernames and passwords, unlike most other authenticators. Thus,
you can manage all your users within JupyterHub only.
- There are authenticators for LTI (learning management systems),
Shibboleth, Kerberos - and so on.
The authenticator is configured with the
`c.JupyterHub.authenticator_class` configuration option in the
`jupyterhub_config.py` file.
The authenticator runs internally to the Hub process but communicates
with outside services.
If you have trouble logging in, this is usually a problem of the
authenticator. The authenticator logs are part of the JupyterHub
logs, but there may also be relevant information in whatever external
services you are using.
### Spawner
The **spawner** ([basics](spawners),
[reference](../reference/spawners)) is the real core of
JupyterHub: when someone wants a notebook server, the spawner allocates
resources and starts the server. The notebook server could run on the
same machine as JupyterHub, on another machine, on some cloud service,
or more. Administrators can limit resources (CPU, memory) or isolate users
from each other - if the spawner supports it. They can also do no
limiting and allow any user to access any other user's files if they
are not configured properly.
Some basic spawners included in JupyterHub are:
- **LocalProcessSpawner** is built into JupyterHub. Upon launch it tries
to switch users to the given username (`su` (&)) and start the
notebook server. It requires that the hub be run as root (because
only root has permission to start processes as other user IDs).
LocalProcessSpawner is no different than a user logging in with
something like `ssh` and running `jupyter notebook`. PAMAuthenticator and
LocalProcessSpawner together are the most basic way of using JupyterHub (and
what it does out of the box) and make the hub not too dissimilar to
an advanced ssh server.
There are [many more advanced spawners](/reference/spawners), and to
show the diversity of spawning strategies some are listed below:
- **SudoSpawner** is like LocalProcessSpawner but lets you run
JupyterHub without root. `sudo` has to be configured to allow the
hub's user to run processes under other user IDs.
- **SystemdSpawner** uses Systemd to start other processes. It can
isolate users from each other and provide resource limiting.
- **DockerSpawner** runs stuff in Docker, a containerization system.
This lets you fully isolate users, limit CPU, memory, and provide
other container images to fully customize the environment.
- **KubeSpawner** runs on Kubernetes, a cloud orchestration
system. The spawner can easily limit users and provide cloud
scaling - but the spawner doesn't actually do that, Kubernetes
does. The spawner just tells Kubernetes what to do. If you want to
get KubeSpawner to do something, first you would figure out how to
do it in Kubernetes, then figure out how to tell KubeSpawner to tell
Kubernetes that. Actually... this is true for most spawners.
- **BatchSpawner** runs on computer clusters with batch job scheduling
systems (e.g. Slurm, HTCondor, PBS, etc). The user processes are run
as batch jobs, having access to all the data and software that the
users normally will.
In short, spawners are the interface to the rest of the operating
system, and to configure them right you need to know a bit about how
the corresponding operating system service works.
The spawner is responsible for the environment of the single-user
notebook servers (described in the next section). In the end, it just
makes a choice about how to start these processes: for example, the
Docker spawner starts a normal Docker container and runs the right
command inside of it. Thus, the spawner is responsible for setting
what kind of software and data is available to the user.
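
The start, check, and stop responsibilities described above can be sketched with a toy class. `ToyProcessSpawner` is hypothetical and does not inherit from anything in JupyterHub; real spawners implement `start`, `poll`, and `stop` as coroutines, and a sleeping Python process stands in here for the single-user server.

```python
import subprocess
import sys


class ToyProcessSpawner:
    """Toy stand-in for a spawner; not a JupyterHub class."""

    def __init__(self, username):
        # A real spawner would use the username to pick the system user,
        # container, or batch job that the server runs as.
        self.username = username
        self.proc = None

    def start(self):
        # Launch a placeholder process instead of a single-user server.
        self.proc = subprocess.Popen(
            [sys.executable, "-c", "import time; time.sleep(60)"]
        )
        return self.proc.pid

    def poll(self):
        """Return None while the process runs, else its exit status."""
        return self.proc.poll()

    def stop(self):
        self.proc.terminate()
        self.proc.wait(timeout=10)


spawner = ToyProcessSpawner("someone")
spawner.start()
running = spawner.poll()
spawner.stop()
stopped = spawner.poll()
print(running, stopped is not None)
```

Swapping the body of `start` for a container launch or a batch job submission is, in spirit, the difference between one spawner and another.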
The spawner runs internally to the Hub process but communicates with
outside services. It is configured by `c.JupyterHub.spawner_class` in
`jupyterhub_config.py`.
If a user tries to launch a notebook server and it doesn't work, the
error is usually with the spawner or the notebook server (as described
in the next section). Each spawner outputs some logs to the main
JupyterHub logs, but may also have logs in other places depending on
what services it interacts with (for example, the Docker spawner's
containers log through Docker itself, and KubeSpawner's pods log through
Kubernetes, where they can be read with `kubectl`).
### Proxy
The JupyterHub **proxy** relays connections between the users
and their single-user notebook servers. What this basically means is
that the hub itself can shut down and the proxy can continue to
allow users to communicate with their notebook servers. (This
further emphasizes that the hub is responsible for starting, not
running, the notebooks). By default, the hub starts the proxy
automatically and stops the proxy when the hub stops (so that
connections get interrupted). But when you [configure the proxy to run
separately](howto:separate-proxy),
users' connections will continue to work even without the hub.
The default proxy is **ConfigurableHttpProxy** which is simple but
effective. A more advanced option is the [**Traefik Proxy**](https://blog.jupyter.org/introducing-traefikproxy-a-new-jupyterhub-proxy-based-on-traefik-4839e972faf6),
which gives you redundancy and high-availability.
When users "connect to JupyterHub", they _always_ first connect to the
proxy and the proxy relays the connection to the hub. Thus, the proxy
is responsible for SSL and accepting connections from the rest of the
internet. The user uses the hub to authenticate and start the server,
and then the hub connects back to the proxy to adjust the proxy routes
for the user's server (e.g. the web path `/user/someone` redirects to
the server of someone at a certain internal address). The proxy has
to be able to internally connect to both the hub and all the
single-user servers.
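
The routing described above can be pictured as a table of path prefixes in which the most specific match wins. The sketch below is a simplified model, not the proxy's actual implementation; the function `find_route` and the addresses are made up for illustration.

```python
def find_route(routes, path):
    """Return the target for the longest route prefix matching path."""
    best = None
    for prefix in routes:
        # Prefixes end in "/", so /user/someone/ does not match
        # /user/someone-else/.
        if path == prefix.rstrip("/") or path.startswith(prefix):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None


routes = {
    "/": "http://127.0.0.1:8081",  # default route: everything goes to the hub
}
# After a server starts, the hub asks the proxy to add a route for it.
routes["/user/someone/"] = "http://10.0.0.5:8888"

hub_target = find_route(routes, "/hub/login")
user_target = find_route(routes, "/user/someone/lab")
other_target = find_route(routes, "/user/someone-else/lab")
print(hub_target, user_target, other_target)
```

Requests for a user without a running server fall through to the default route, which is how they end up back at the hub.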
The proxy always runs as a separate process to JupyterHub (even though
JupyterHub can start it for you). JupyterHub has one set of
configuration options for the proxy addresses (`bind_url`) and one for
the hub (`hub_bind_url`). If `bind_url` is given, it is just passed to
the automatic proxy to tell it what to do.
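
A single URL of this kind carries the protocol, the IP address, the port, and the URL prefix at once. As a small illustration using only the standard library, and with a made-up value for `bind_url`, the pieces can be pulled apart like this:

```python
from urllib.parse import urlparse

bind_url = "https://0.0.0.0:8443/jupyter/"  # hypothetical value
parts = urlparse(bind_url)

# Protocol, listening address, port, and URL prefix, in that order.
print(parts.scheme, parts.hostname, parts.port, parts.path)
```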
If you have problems after users are redirected to their single-user
notebook servers, or making the first connection to the hub, it is
usually caused by the proxy. The ConfigurableHttpProxy's logs are
mixed with JupyterHub's logs if it's started through the hub (the
default case), otherwise from whatever system runs the proxy (if you
do configure it, you'll know).
### Services
JupyterHub has the concept of **services** ([basics](tutorial:services),
[reference](services-reference)), which are other web services
started by the hub, but otherwise are not necessarily related to the
hub itself. They are often used to do things related to Jupyter
(things that users interact with, usually not the hub), but could
always be run some other way. Running from the hub provides an easy
way to get Hub API tokens and authenticate users against the hub. It
can also automatically add a proxy route to forward web requests to
@@ -82,7 +82,7 @@ Additionally, there is usually _very_ little load on the database itself.
By far the most taxing activity on the database is the 'list all users' endpoint, primarily used by the [idle-culling service](https://github.com/jupyterhub/jupyterhub-idle-culler).
Database-based optimizations have been added to make even these operations feasible for large numbers of users:
1. State filtering on [GET /hub/api/users?state=active](rest-api-get-users),
which limits the number of results in the query to only the relevant subset (added in JupyterHub 1.3), rather than all users.
2. [Pagination](api-pagination) of all list endpoints, allowing the request of a large number of resources to be more fairly balanced with other Hub activities across multiple requests (added in 2.0).
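
As a sketch of how a client might combine both optimizations, the helper below builds the URL for one page of active users. The helper name `users_page_url` and the Hub address are hypothetical, and the `offset` and `limit` parameter names are an assumption that should be checked against the REST API reference.

```python
from urllib.parse import urlencode


def users_page_url(hub_api_url, offset, limit, state=None):
    """Build the URL for one page of the user list."""
    params = {"offset": offset, "limit": limit}
    if state is not None:
        # Ask the Hub to filter on the server side instead of returning all users.
        params["state"] = state
    return f"{hub_api_url}/users?{urlencode(params)}"


url = users_page_url("http://127.0.0.1:8081/hub/api", offset=0, limit=50, state="active")
print(url)
```

A client would request successive pages by increasing `offset` until a page comes back with fewer than `limit` entries.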
@@ -143,14 +143,14 @@ We recommend using PostgreSQL for production if you are unsure whether to use
MySQL or PostgreSQL or if you do not have a strong preference.
MySQL or PostgreSQL or if you do not have a strong preference.
There is additional configuration required for MySQL that is not needed for PostgreSQL.
There is additional configuration required for MySQL that is not needed for PostgreSQL.
For example, to connect to a postgres database with psycopg2:
For example, to connect to a PostgreSQL database with psycopg2:
1. install psycopg2: `pip instal psycopg2` (or `psycopg2-binary` to avoid compilation, which is [not recommended for production][psycopg2-binary])
1. install psycopg2: `pip install psycopg2` (or `psycopg2-binary` to avoid compilation, which is [not recommended for production][psycopg2-binary])
2. set authentication via environment variables `PGUSER` and `PGPASSWORD`
2. set authentication via environment variables `PGUSER` and `PGPASSWORD`
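
With the credentials supplied through `PGUSER` and `PGPASSWORD`, the remaining connection details go in the database URL. A minimal sketch for `jupyterhub_config.py`, where the host, port, and database name are placeholders rather than values from this document:

```python
# `c` is the config object JupyterHub provides when loading jupyterhub_config.py.
# Hypothetical host and database name; the user and password are read from
# PGUSER and PGPASSWORD in the Hub's environment, so they are not repeated here.
c.JupyterHub.db_url = "postgresql+psycopg2://db.example.org:5432/jupyterhub"
```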
_Explanation_ documentation provides big-picture descriptions of how JupyterHub works. This section is meant to build your understanding of particular topics.
_Explanation_ documentation provides big-picture descriptions of how JupyterHub works. This section is meant to build your understanding of particular topics.
@@ -5,6 +7,7 @@ _Explanation_ documentation provide big-picture descriptions of how JupyterHub w
Implementation-wise, JupyterHub single-user servers are a special-case of {ref}`services`
Implementation-wise, JupyterHub single-user servers are a special-case of {ref}`services-reference`
and as such use the same (OAuth) authentication mechanism (more on OAuth in JupyterHub at [](oauth)).
and as such use the same (OAuth) authentication mechanism (more on OAuth in JupyterHub at [](oauth)).
This is primarily implemented in the {class}`~.HubOAuth` class.
This is primarily implemented in the {class}`~.HubOAuth` class.
@@ -104,6 +104,6 @@ But technically, all JupyterHub cares about is that it is:
1. an http server at the prescribed URL, accessible from the Hub and proxy, and
1. an http server at the prescribed URL, accessible from the Hub and proxy, and
2. authenticated via [OAuth](oauth) with the Hub (it doesn't even have to do this, if you want to do your own authentication, as is done in BinderHub)
2. authenticated via [OAuth](oauth) with the Hub (it doesn't even have to do this, if you want to do your own authentication, as is done in BinderHub)
which means that you can customize JupyterHub to launch _any_ web application that meets these criteria, by following the specifications in {ref}`services`.
which means that you can customize JupyterHub to launch _any_ web application that meets these criteria, by following the specifications in {ref}`services-reference`.
Most of the time, though, it's easier to use [jupyter-server-proxy](https://jupyter-server-proxy.readthedocs.io) if you want to launch additional web applications in JupyterHub.
Most of the time, though, it's easier to use [jupyter-server-proxy](https://jupyter-server-proxy.readthedocs.io) if you want to launch additional web applications in JupyterHub.
JupyterHub is designed to be a _simple multi-user server for modestly sized
JupyterHub is designed to be a _simple multi-user server for modestly sized
groups_ of **semi-trusted** users. While the design reflects serving
groups_ of **semi-trusted** users. While the design reflects serving
semi-trusted users, JupyterHub can also be suitable for serving **untrusted** users.
semi-trusted users, JupyterHub can also be suitable for serving **untrusted** users,
but **is not suitable for untrusted users** in its default configuration.
As a result, using JupyterHub with **untrusted** users means more work by the
As a result, using JupyterHub with **untrusted** users means more work by the
administrator, since much care is required to secure a Hub, with extra caution on
administrator, since much care is required to secure a Hub, with extra caution on
@@ -52,33 +53,69 @@ ensure that:
their single-user server;
their single-user server;
- the modification of the configuration of the notebook server
- the modification of the configuration of the notebook server
(the `~/.jupyter` or `JUPYTER_CONFIG_DIR` directory).
(the `~/.jupyter` or `JUPYTER_CONFIG_DIR` directory).
- unrestricted selection of the base environment (e.g. the image used in container-based Spawners)
If any additional services are run on the same domain as the Hub, the services
If any additional services are run on the same domain as the Hub, the services
**must never** display user-authored HTML that is neither _sanitized_ nor _sandboxed_
**must never** display user-authored HTML that is neither _sanitized_ nor _sandboxed_
(e.g. IFramed) to any user that lacks authentication as the author of a file.
to any user that lacks authentication as the author of a file.
### Sharing access to servers
Because sharing access to servers (via `access:servers` scopes or the sharing feature in JupyterHub 5) by definition means users can serve each other files, enabling sharing is not suitable for untrusted users without also enabling per-user domains.
JupyterHub does not enable any sharing by default.
## Mitigate security issues
## Mitigate security issues
The several approaches to mitigating security issues with configuration
The several approaches to mitigating security issues with configuration
options provided by JupyterHub include:
options provided by JupyterHub include:
### Enable subdomains
(subdomains)=
### Enable user subdomains
JupyterHub provides the ability to run single-user servers on their own
JupyterHub provides the ability to run single-user servers on their own
subdomains. This means the cross-origin protections between servers has the
domains. This means the cross-origin protections between servers have the
desired effect, and user servers and the Hub are protected from each other. A
desired effect, and user servers and the Hub are protected from each other.
user's single-user server will be at `username.jupyter.mydomain.com`. This also
requires all user subdomains to point to the same address, which is most easily
**Subdomains are the only way to reliably isolate user servers from each other.**
accomplished with wildcard DNS. Since this spreads the service across multiple
domains, you will need wildcard SSL as well. Unfortunately, for many
To enable subdomains, set:
institutional domains, wildcard DNS and SSL are not available. **If you do plan
to serve untrusted users, enabling subdomains is highly encouraged**, as it
When subdomains are enabled, each user's single-user server will be at e.g. `https://username.jupyter.example.org`.
This also requires all user subdomains to point to the same address,
which is most easily accomplished with wildcard DNS, where a single A record points to your server and a wildcard CNAME record points to your A record:
```
A jupyter.example.org 192.168.1.123
CNAME *.jupyter.example.org jupyter.example.org
```
Since this spreads the service across multiple domains, you will likely need wildcard SSL as well,
matching `*.jupyter.example.org`.
Unfortunately, for many institutional domains, wildcard DNS and SSL may not be available.
We also **strongly encourage** serving JupyterHub and user content on a domain that is _not_ a subdomain of any sensitive content.
For reasoning, see [GitHub's discussion of moving user content to github.io from \*.github.com](https://github.blog/2013-04-09-yummy-cookies-across-domains/).
**If you do plan to serve untrusted users, enabling subdomains is highly encouraged**,
as it resolves many security issues that range from difficult to impossible to avoid when JupyterHub is on a single domain.
:::{important}
JupyterHub makes no guarantees about protecting users from each other unless subdomains are enabled.
If you want to protect users from each other, you **_must_** enable per-user domains.
:::
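
A minimal sketch of the corresponding configuration, assuming the `JupyterHub.subdomain_host` option and reusing the `jupyter.example.org` domain from the DNS example above:

```python
# `c` is the config object JupyterHub provides when loading jupyterhub_config.py.
# Serve the Hub on this host and each user server on its own subdomain of it.
c.JupyterHub.subdomain_host = "https://jupyter.example.org"
```

The wildcard DNS record and a certificate matching `*.jupyter.example.org` still have to be set up outside of JupyterHub.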
### Disable user config
### Disable user config
If subdomains are unavailable or undesirable, JupyterHub provides a
If subdomains are unavailable or undesirable, JupyterHub provides a
configuration option `Spawner.disable_user_config`, which can be set to prevent
configuration option `Spawner.disable_user_config = True`, which can be set to prevent
the user-owned configuration files from being loaded. After implementing this
the user-owned configuration files from being loaded. After implementing this
option, `PATH`s and package installation are the other things that the
option, `PATH`s and package installation are the other things that the
admin must enforce.
admin must enforce.
@@ -88,21 +125,24 @@ admin must enforce.
For most Spawners, `PATH` is not something users can influence, but it's important that
For most Spawners, `PATH` is not something users can influence, but it's important that
the Spawner should _not_ evaluate shell configuration files prior to launching the server.
the Spawner should _not_ evaluate shell configuration files prior to launching the server.
### Isolate packages using virtualenv
### Isolate packages in a read-only environment
Package isolation is most easily handled by running the single-userserver in
The user must not have permission to install packages into the environment where the single-user server runs.
a virtualenv with disabled system-site-packages. The user should not have
On a shared system, package isolation is most easily handled by running the single-user server in
permission to install packages into this environment.
a root-owned virtualenv with disabled system-site-packages.
The user must not have permission to install packages into this environment.
The same principle extends to the images used by container-based deployments.
If users can select the images in which their servers run, they can disable all security for their own servers.
It is important to note that the control over the environment only affects the
It is important to note that the control over the environment is only required for the
single-user server, and not the environment(s) in which the user's kernel(s)
single-user server, and not the environment(s) in which the users' kernel(s)
may run. Installing additional packages in the kernel environment does not
may run. Installing additional packages in the kernel environment does not
pose additional risk to the web application's security.
pose additional risk to the web application's security.
### Encrypt internal connections with SSL/TLS
### Encrypt internal connections with SSL/TLS
By default, all communications on the server, between the proxy, hub, and single
By default, all communications within JupyterHub—between the proxy, hub, and single
-user notebooksare performed unencrypted. Setting the `internal_ssl` flag in
-user notebooks—are performed unencrypted. Setting the `internal_ssl` flag in
`jupyterhub_config.py` secures the aforementioned routes. Turning this
`jupyterhub_config.py` secures the aforementioned routes. Turning this
feature on does require that the enabled `Spawner` can use the certificates
feature on does require that the enabled `Spawner` can use the certificates
generated by the `Hub` (the default `LocalProcessSpawner` can, for instance).
generated by the `Hub` (the default `LocalProcessSpawner` can, for instance).
@@ -116,6 +156,104 @@ Unix permissions to the communication sockets thereby restricting
communication to the socket owner. The `internal_ssl` option will eventually
communication to the socket owner. The `internal_ssl` option will eventually
extend to securing the `tcp` sockets as well.
extend to securing the `tcp` sockets as well.
### Mitigating same-origin deployments
While per-user domains are **required** for robust protection of users from each other,
you can mitigate many (but not all) cross-user issues.
First, it is critical that users cannot modify their server environments, as described above.
Second, it is important that users do not have `access:servers` permission to any server other than their own.
If users can access each others' servers, additional security measures must be enabled, some of which come with distinct user-experience costs.
Without the [Same-Origin Policy] (SOP) protecting user servers from each other,
each user server is considered a trusted origin for requests to each other user server (and the Hub itself).
Servers _cannot_ meaningfully distinguish requests originating from other user servers,
because SOP implies a great deal of trust, losing many restrictions applied to cross-origin requests.
That means pages served from each user server can:
1. arbitrarily modify the path in the Referer
2. make fully authorized requests with cookies
3. access full page contents served from the hub or other servers via popups
JupyterHub uses distinct XSRF tokens stored in cookies on each server path to attempt to limit requests across servers.
This has limitations because not all requests are protected by these XSRF tokens,
and unless additional measures are taken, the XSRF tokens from other user prefixes may be retrieved.
`allow-popups` is not disabled by default because disabling it breaks legitimate functionality, like "Open this in a new tab", and the "JupyterHub Control Panel" menu item.
To reiterate, the right way to avoid these issues is to enable per-user domains, where none of these concerns come up.
Note: even this level of protection requires administrators maintaining full control over the user server environment.
If users can modify their server environment, these methods are ineffective, as users can readily disable them.
### Cookie tossing
Cookie tossing is a technique where another server on a subdomain or peer subdomain can set a cookie
which will be read on another domain.
This is not relevant unless there are other user-controlled servers on a peer domain.
"Domain-locked" cookies avoid this issue, but have their own restrictions:
- JupyterHub must be served over HTTPS
- All secure cookies must be set on `/`, not on sub-paths, which means they are shared by all JupyterHub components in a single-domain deployment.
As a result, this option is only recommended when per-user subdomains are enabled,
to prevent sending all jupyterhub cookies to all user servers.
To enable domain-locked cookies, set:
```python
c.JupyterHub.cookie_host_prefix_enabled = True
```
```{versionadded} 4.1
```
### Forced-login
Jupyter servers can share links with `?token=...`.
JupyterHub prior to 5.0 will accept this request and persist the token for future requests.
This is useful for enabling admins to create 'fully authenticated' links bypassing login.
However, it also means users can share their own links that will log other users into their own servers,
enabling them to serve each other notebooks and other arbitrary HTML, depending on server configuration.
```{versionadded} 4.1
Setting environment variable `JUPYTERHUB_ALLOW_TOKEN_IN_URL=0` in the single-user environment can opt out of accepting token auth in URL parameters.
```
```{versionadded} 5.0
Accepting tokens in URLs is disabled by default, and `JUPYTERHUB_ALLOW_TOKEN_IN_URL=1` environment variable must be set to _allow_ token auth in URL parameters.
```
## Security audits
## Security audits
We recommend that you do periodic reviews of your deployment's security. It's
We recommend that you do periodic reviews of your deployment's security. It's
### How can I kill ports from JupyterHub-managed services that have been orphaned?
### How can I kill ports from JupyterHub-managed services that have been orphaned?
@@ -167,7 +167,7 @@ When your whole JupyterHub sits behind an organization proxy (_not_ a reverse pr
### Launching Jupyter Notebooks to run as an externally managed JupyterHub service with the `jupyterhub-singleuser` command returns a `JUPYTERHUB_API_TOKEN` error
### Launching Jupyter Notebooks to run as an externally managed JupyterHub service with the `jupyterhub-singleuser` command returns a `JUPYTERHUB_API_TOKEN` error
{ref}`services` allow processes to interact with JupyterHub's REST API. Example use-cases include:
{ref}`services-reference` allow processes to interact with JupyterHub's REST API. Example use-cases include:
- **Secure Testing**: provide a canonical Jupyter Notebook for testing production data to reduce the number of entry points into production systems.
- **Secure Testing**: provide a canonical Jupyter Notebook for testing production data to reduce the number of entry points into production systems.
- **Grading Assignments**: provide access to shared Jupyter Notebooks that may be used for management tasks such as grading assignments.
- **Grading Assignments**: provide access to shared Jupyter Notebooks that may be used for management tasks such as grading assignments.
@@ -198,6 +198,23 @@ With a docker container, pass in the environment variable with the run command:
[This example](https://github.com/jupyterhub/jupyterhub/tree/HEAD/examples/service-notebook/external) demonstrates how to combine the use of the `jupyterhub-singleuser` environment variables when launching a Notebook as an externally managed service.
[This example](https://github.com/jupyterhub/jupyterhub/tree/HEAD/examples/service-notebook/external) demonstrates how to combine the use of the `jupyterhub-singleuser` environment variables when launching a Notebook as an externally managed service.
### Jupyter Notebook/Lab can be launched, but notebooks seem to hang when trying to execute a cell
This often occurs when your browser is unable to open a websocket connection to a Jupyter kernel.
#### Diagnose
Open your browser console, e.g. [Chrome](https://developer.chrome.com/docs/devtools/console), [Firefox](https://firefox-source-docs.mozilla.org/devtools-user/web_console/).
If you see errors related to opening websockets this is likely to be the problem.
#### Solutions
This could be caused by anything related to the network between your computer/browser and the server running JupyterHub, such as:
- reverse proxies (see {ref}`howto:config:reverse-proxy` for example configurations)
- anti-virus or firewalls running on your computer or JupyterHub server
- transparent proxies running on your network
## How do I...?
## How do I...?
### Use a chained SSL certificate
### Use a chained SSL certificate
@@ -259,17 +276,6 @@ the entire filesystem and set the default to the user's home directory.
c.Spawner.notebook_dir = '/'
c.Spawner.notebook_dir = '/'
c.Spawner.default_url = '/home/%U' # %U will be replaced with the username
c.Spawner.default_url = '/home/%U' # %U will be replaced with the username
### How do I increase the number of pySpark executors on YARN?
From the command line, pySpark executors can be configured using a command
[Cloudera documentation for configuring spark on YARN applications](https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_running_spark_on_yarn.html#spark_on_yarn_config_apps)
provides additional information. The [pySpark configuration documentation](https://spark.apache.org/docs/0.9.0/configuration.html)
is also helpful for programmatic configuration examples.
### How do I use JupyterLab's pre-release version with JupyterHub?
### How do I use JupyterLab's pre-release version with JupyterHub?
While JupyterLab is still under active development, we have had users
While JupyterLab is still under active development, we have had users
@@ -300,6 +306,52 @@ notebook servers to default to JupyterLab:
Users will need a GitHub account to log in and be authenticated by the Hub.
Users will need a GitHub account to log in and be authenticated by the Hub.
### I'm seeing "403 Forbidden XSRF cookie does not match POST" when users try to login
During login, JupyterHub takes the request IP into account for CSRF protection.
If proxies are not configured to properly set forwarded ips,
JupyterHub will see all requests as coming from an internal ip,
likely the ip of the proxy itself.
You can see this in the JupyterHub logs, which log the ip address of requests.
If most requests look like they are coming from a small number of `10.0.x.x` or `172.16.x.x` ips, the proxy is not forwarding the true request ip properly.
If the proxy has multiple replicas,
then it is likely the ip may change from one request to the next,
leading to this error during login:
> 403 Forbidden XSRF cookie does not match POST argument
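
To check whether the logged ips match this pattern, a quick summary along these lines can help. This is only a sketch: `summarize_request_ips` is a hypothetical helper, and extracting the ips from your JupyterHub logs is left out because the log format depends on your deployment.

```python
import ipaddress
from collections import Counter


def summarize_request_ips(ips):
    """Report how many distinct ips were seen and the share in private ranges."""
    counts = Counter(ips)
    private = sum(
        n for ip, n in counts.items() if ipaddress.ip_address(ip).is_private
    )
    return {
        "distinct_ips": len(counts),
        "private_fraction": private / len(ips) if ips else 0.0,
    }
```

A handful of distinct ips with a private fraction close to 1 suggests the Hub is seeing the proxy's address rather than the client's.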
The best way to fix this is to ensure your proxies set the forwarded headers, e.g. for nginx:
If you were using subdomains before, some user servers and all services will be on different hosts in the default configuration.
JupyterHub 5 allows complete customization of the subdomain scheme via the new {attr}`.JupyterHub.subdomain_hook`,
and changes the default subdomain scheme.
You can provide a completely custom subdomain scheme, or select one of two default implementations by name: `idna` or `legacy`. `idna` is the default.
The new default behavior can be selected explicitly via:
```python
c.JupyterHub.subdomain_hook = "idna"
```
Or to delay any changes to URLs for your users, you can opt-in to the pre-5.0 behavior with:
```python
c.JupyterHub.subdomain_hook = "legacy"
```
The key differences of the new `idna` scheme:
- It should always produce valid domains, regardless of username (not true for the legacy scheme when using characters that might need escaping or usernames that are long)
- each Service gets its own subdomain on `service--` rather than sharing `services.`
Below is a table of examples of users and services with their domains under the old and new schemes, assuming subdomains are served under `jupyter.example.org`:

| kind | name | legacy | idna |
| ------- | ------------------ | ------------------------------ | ------------------------------ |
| user | laudna | `laudna.jupyter.example.org` | `laudna.jupyter.example.org` |
| service | bells | `services.jupyter.example.org` | `bells--service.jupyter.example.org` |
| user | jester@mighty.nein | `jester_40mighty.nein.jupyter.example.org` (may not work!) | `u-jestermi--8037680.jupyter.example.org` (not as pretty, but guaranteed to be valid and not collide) |
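
To see why the legacy domain for `jester@mighty.nein` "may not work", it helps to check each label against the usual hostname rules (letters, digits, and hyphens only, no leading or trailing hyphen, at most 63 characters). The helper below is an illustrative sketch of that check and is not part of JupyterHub:

```python
import re

# One hostname label: letters, digits, hyphens; no leading or trailing hyphen;
# between 1 and 63 characters.
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)", re.IGNORECASE)


def invalid_labels(hostname):
    """Return the dot-separated labels of `hostname` that break the rules above."""
    return [label for label in hostname.split(".") if not _LABEL.fullmatch(label)]
```

Applied to the table above, the legacy name fails because `_` is not allowed in a hostname label, while the `idna` name passes.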
## Tokens in URLs
JupyterHub 5 does not accept `?token=...` URLs by default in single-user servers.
These URLs allow one user to force another to login as them,
which can be the start of an inter-user attack.
There is a valid use case for producing links which allow starting a fully authenticated session,
so you may still opt in to this behavior by setting the environment variable `JUPYTERHUB_ALLOW_TOKEN_IN_URL=1` in the single-user environment,
if you are not concerned about protecting your users from each other.
If you have subdomains enabled, the threat is substantially reduced.
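
One way to set that variable for every single-user server is through the Spawner's environment. This is a sketch that assumes the `Spawner.environment` option is how your deployment passes variables to servers:

```python
# `c` is the config object JupyterHub provides when loading jupyterhub_config.py.
# Opt back in to accepting ?token=... URLs in single-user servers.
# Note: this assignment replaces any existing `environment` dict, so merge it
# with your other variables if you already set some.
c.Spawner.environment = {"JUPYTERHUB_ALLOW_TOKEN_IN_URL": "1"}
```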
## Sharing
The big new feature in JupyterHub 5.0 is sharing.
Check it out in [the sharing docs](sharing-tutorial).
## Authenticator.allow_all and allow_existing_users
Prior to JupyterHub 5, JupyterHub Authenticators had the _implicit_ default behavior to allow any user who successfully authenticates to login **if no users are explicitly allowed** (i.e. `allowed_users` is empty on the base class).
This behavior was considered a too-permissive default in Authenticators that source large user pools like OAuthenticator, which would accept e.g. all users with a Google account by default.
As a result, OAuthenticator 16 introduced two configuration options: `allow_all` and `allow_existing_users`.
JupyterHub 5 adopts these options for all Authenticators:
1. `Authenticator.allow_all` (default: False)
2. `Authenticator.allow_existing_users` (default: True if `allowed_users` is non-empty, False otherwise)
having the effect that _some_ allow configuration is required for anyone to be able to login.
If you want to preserve the pre-5.0 behavior with no explicit `allow` configuration, set:
```python
c.Authenticator.allow_all = True
```
`allow_existing_users` defaults are meant to be backward-compatible, but you can now _explicitly_ allow or not based on presence in the database by setting `Authenticator.allow_existing_users` to True or False.
:::{seealso}
[Authenticator config docs](authenticators) for details on these and other Authenticator options.
:::
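
As a contrast to `allow_all`, here is a sketch of a more restrictive setup that allows only a named set of users and does not implicitly allow users already in the database. The usernames are placeholders:

```python
# `c` is the config object JupyterHub provides when loading jupyterhub_config.py.
# Hypothetical usernames: only these users may log in.
c.Authenticator.allowed_users = {"laudna", "jester"}
# Do not additionally allow users just because they already exist in the database.
c.Authenticator.allow_existing_users = False
```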
## Bootstrap 5
JupyterHub uses the CSS framework [bootstrap](https://getbootstrap.com), which is upgraded from 3.4 to 5.3.
If you don't have any custom HTML templates, you are likely to only see relatively minor aesthetic changes.
If you have custom HTML templates or spawner options forms, they may need some updating to look right.
See the bootstrap documentation. Since we upgraded two major versions, you might need to look at both v4 and v5 documentation for what has changed since 3.x:
- [migrating to v4](https://getbootstrap.com/docs/4.6/migration/)
- [migrating to v5](https://getbootstrap.com/docs/5.3/migration/)
If you customized the JupyterHub CSS by recompiling from LESS files, bootstrap migrated to SCSS.
You can start by autoconverting your LESS to SCSS (it's not that different) with [less2sass](https://github.com/ekryski/less2sass):
```bash
npm install --global less2scss
# converts less/foo.less to scss/foo.scss
less2scss --src ./less --dst ./scss
```
Bootstrap also allows configuring things with [CSS variables](https://getbootstrap.com/docs/5.3/customize/css-variables/), so depending on what you have customized, you may be able to get away with just adding a CSS file defining variables without rebuilding the whole SCSS.
## groups required with Authenticator.manage_groups
Setting `Authenticator.manage_groups = True` allows the Authenticator to manage group membership by returning `groups` from the authentication model.
However, this option is available even on Authenticators that do not support it, which led to confusion.
Starting with JupyterHub 5, if `manage_groups` is True, `authenticate` _must_ return a `groups` field, otherwise an error is raised.
This prevents confusion when users enable group management on an Authenticator that does not implement it.
If an Authenticator _does_ support managing groups but was not providing a `groups` field in order to leave membership unmodified, it must specify `"groups": None` to make this explicit instead of implicit (this is backward-compatible).
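
The difference between the two cases can be sketched with a small helper that builds the dict an Authenticator's `authenticate` method would return. `build_auth_model` is a hypothetical name used for illustration; only the `name` and `groups` keys come from the text above.

```python
def build_auth_model(username, groups=None):
    """Build an auth model dict that always carries an explicit `groups` key.

    A collection of group names sets the user's groups;
    None leaves the existing membership unmodified.
    """
    return {
        "name": username,
        "groups": sorted(groups) if groups is not None else None,
    }
```

Returning `build_auth_model("laudna")` keeps the membership decision explicit, instead of relying on a missing key.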
| Scope | Grants |
| ----- | ------ |
| `(no_scope)` | Identify the owner of the requesting entity. |
| `self` | The user’s own resources _(metascope for users, resolves to (no_scope) for services)_ |
| `inherit` | Everything that the token-owning entity can access _(metascope for tokens)_ |
| `admin-ui` | Access the admin page. Permission to take actions via the admin page granted separately. |
| `admin:users` | Read, modify, create, and delete users and their authentication state, not including their servers or tokens. This is an extremely privileged scope and should be considered tantamount to superuser. |
| `admin:auth_state` | Read a user’s authentication state. |
| `users` | Read and write permissions to user models (excluding servers, tokens and authentication state). |
| `read:users` | Read user models (including the URL of the default server if it is running). |
| `read:users:name` | Read names of users. |
| `read:users:groups` | Read users’ group membership. |
| `read:users:activity` | Read time of last user activity. |
| `list:users` | List users, including at least their names. |
| `read:users:name` | Read names of users. |
| `users:activity` | Update time of last user activity. |
| `read:users:activity` | Read time of last user activity. |
| `read:roles:users` | Read user role assignments. |
| `read:roles:users` | Read user role assignments. |
| `read:roles:services` | Read service role assignments. |
| `read:roles:groups` | Read group role assignments. |
| `admin:servers` | Read, start, stop, create and delete user servers and their state. |
| `admin:server_state` | Read and write users’ server state. |
| `servers` | Start and stop user servers. |
| `read:servers` | Read users’ names and their server models (excluding the server state). |
| `read:users:name` | Read names of users. |
| `delete:servers` | Stop and delete users' servers. |
| `tokens` | Read, write, create and delete user tokens. |
| `read:tokens` | Read user tokens. |
| `admin:groups` | Read and write group information, create and delete groups. |
| `groups` | Read and write group information, including adding/removing any users to/from groups. Note: adding users to groups may affect permissions. |
| `read:groups` | Read group models. |
| `read:groups:name` | Read group names. |
| `list:groups` | List groups, including at least their names. |
| `read:groups:name` | Read group names. |
| `read:roles:groups` | Read group role assignments. |
@@ -178,6 +178,83 @@ Note that only the {ref}`horizontal filtering <horizontal-filtering-target>` can
Metascopes `self` and `all`, `<resource>`, `<resource>:<subresource>`, `read:<resource>`, `admin:<resource>`, and `access:<resource>` scopes are predefined and cannot be changed otherwise.
Metascopes `self` and `all`, `<resource>`, `<resource>:<subresource>`, `read:<resource>`, `admin:<resource>`, and `access:<resource>` scopes are predefined and cannot be changed otherwise.
```
```
(access-scopes)=
### Access scopes
An **access scope** is used to govern _access_ to a JupyterHub service or a user's single-user server.
This means making API requests, or visiting via a browser using OAuth.
Without the appropriate access scope, a user or token should not be permitted to make requests of the service.
When you attempt to access a service or server authenticated with JupyterHub, it will begin the [oauth flow](explanation:hub-oauth) for issuing a token that can be used to access the service.
If the user does not have the access scope for the relevant service or server, JupyterHub will not permit the oauth process to complete.
If oauth completes, the token will have at least the access scope for the service.
For minimal permissions, this is the _only_ scope granted to tokens issued during oauth by default,
but can be expanded via {attr}`.Spawner.oauth_client_allowed_scopes` or a service's [`oauth_client_allowed_scopes`](service-credentials) configuration.
:::{seealso}
[Further explanation of OAuth in JupyterHub](explanation:hub-oauth)
:::
If a given service or single-user server can be governed by a single boolean "yes, you can use this service" or "no, you can't," or limiting via other existing scopes, access scopes are enough to manage access to the service.
But you can also further control granular access to servers or services with [custom scopes](custom-scopes), to limit access to particular APIs within the service, e.g. read-only access.
#### Example access scopes
Some example access scopes for services:
access:services
: access to all services
access:services!service=somename
: access to the service named `somename`
and for user servers:
access:servers
: access to all user servers
access:servers!user
: access to all of a user's _own_ servers (never in _resolved_ scopes, but may be used in configuration)
access:servers!user=name
: access to all of `name`'s servers
access:servers!group=groupname
: access to all servers owned by a user in the group `groupname`
access:servers!server
: access to only the issuing server (only relevant when applied to oauth tokens associated with a particular server, e.g. via the {attr}`Spawner.oauth_client_allowed_scopes` configuration).
access:servers!server=username/
: access to only `username`'s _default_ server.
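
Access scopes like these are typically granted through roles. A minimal sketch using `JupyterHub.load_roles`, where the role names, group, and user are placeholders:

```python
# `c` is the config object JupyterHub provides when loading jupyterhub_config.py.
c.JupyterHub.load_roles = [
    {
        # Hypothetical role: members of the `students` group may use the
        # service named `somename`.
        "name": "somename-access",
        "scopes": ["access:services!service=somename"],
        "groups": ["students"],
    },
    {
        # Hypothetical role: one user may access the servers of users
        # in the `students` group.
        "name": "student-server-access",
        "scopes": ["access:servers!group=students"],
        "users": ["instructor"],
    },
]
```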
(granting-scopes)=
### Considerations when allowing users to grant permissions via the `groups` scope
In general, permissions are fixed by role assignments in configuration (or via [Authenticator-managed roles](#authenticator-roles) in JupyterHub 5) and can only be modified by administrators who can modify the Hub configuration.
There is only one scope that allows users to modify permissions of themselves or others at runtime instead of via configuration:
the `groups` scope, which allows adding and removing users from one or more groups.
With the `groups` scope, a user can add or remove any users to/from any group.
With the `groups!group=name` filtered scope, a user can add or remove any users to/from a specific group.
There are two ways in which adding a user to a group may affect their permissions:
- if the group is assigned one or more roles, adding a user to the group may increase their permissions (this is usually the point!)
- if the group is the _target_ of a filter on this or another group, such as `access:servers!group=students`, adding a user to the group can grant _other_ users elevated access to that user's resources.
With these in mind, when designing your roles, do not grant users the `groups` scope for any groups which:
- have roles the user should not have authority over, or
- would grant them access they shouldn't have for _any_ user (e.g. don't grant `teachers` both `access:servers!group=students` and `groups!group=students` which is tantamount to the unrestricted `access:servers` because they control which users the `group=students` filter applies to).
If a group does not have role assignments and the group is not present in any `!group=` filter, there should be no permissions-related consequences for adding users to groups.
:::{note}
The legacy `admin` property of users, which grants extreme superuser permissions and is generally discouraged in favor of more specific roles and scopes, may be modified only by other users with the `admin` property (e.g. added via `admin_users`).
:::
(custom-scopes)=
### Custom scopes
@@ -298,8 +375,24 @@ class MyHandler(HubOAuthenticated, BaseHandler):
Existing scope filters (`!user=`, etc.) may be applied to custom scopes.
Custom scope _filters_ are NOT supported.
:::{warning}
JupyterHub allows you to define custom scopes,
but it does not enforce that your services apply them.
For example, if you enable read-only access to servers via custom JupyterHub scopes
(as seen in the `read-only` example),
it is the administrator's responsibility to enforce that they are applied.
If you allow users to launch servers without that custom Authorizer,
read-only permissions will not be enforced, and the default behavior of unrestricted access via the `access:servers` scope will be applied.
:::
### Scopes and APIs
The scopes are also listed in the [](jupyterhub-rest-API) documentation.
Each API endpoint has a list of scopes which can be used to access the API;
if no scopes are listed, the API is not authenticated and can be accessed without any permissions (i.e., no scopes).
The scopes listed for each API endpoint reflect the "lowest" permissions required to gain any access to the corresponding API.
For example, posting a user's activity (_POST /users/:name/activity_) requires the `users:activity` scope.
If the request holds the `users` scope, access is granted because the required scope is a subscope of the `users` scope.
If, on the other hand, `read:users:activity` is the only scope held, the request is denied.
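The `users:activity` example can be traced with a small sketch. This is not JupyterHub's scope resolution code: `SUBSCOPES`, `expand`, and `is_permitted` are hypothetical names, and the hierarchy is an illustrative excerpt that models only the relationship stated in the text (`users` contains `users:activity`) plus one assumed child of `users:activity`.

```python
# Illustrative excerpt of a scope hierarchy: scope -> direct subscopes.
# Only `users` -> `users:activity` comes from the text; the second entry is an assumption.
SUBSCOPES = {
    "users": {"users:activity"},
    "users:activity": {"read:users:activity"},
}


def expand(scopes):
    """Return the given scopes plus every subscope reachable from them."""
    expanded = set()
    stack = list(scopes)
    while stack:
        scope = stack.pop()
        if scope not in expanded:
            expanded.add(scope)
            stack.extend(SUBSCOPES.get(scope, ()))
    return expanded


def is_permitted(held_scopes, required_scope):
    """A request is permitted if the required scope is among its expanded scopes."""
    return required_scope in expand(held_scopes)
```

Holding a broader scope implies its subscopes, but holding only a narrower scope such as `read:users:activity` never implies the scope above it.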
@@ -11,7 +11,7 @@ No other database records are affected.
## Upgrade steps
1. All running **servers must be stopped** before proceeding with the upgrade.
2. To upgrade the Hub, follow the [Upgrading JupyterHub](howto:upgrading-jupyterhub) instructions.
```{attention}
We advise against defining any new roles in the `jupyterhub_config.py` file right after the upgrade is completed and JupyterHub restarted for the first time. This preserves the 'current' state of the Hub. You can define and assign new roles on any other following startup.
A [generic implementation](https://github.com/jupyterhub/oauthenticator/blob/master/oauthenticator/generic.py), which you can use for OAuth authentication with any provider, is also available.
## The Dummy Authenticator
When testing, it may be helpful to use the {class}`~.jupyterhub.auth.DummyAuthenticator`:
```python
c.JupyterHub.authenticator_class = "dummy"
# always a good idea to limit to localhost when testing with an insecure config
c.JupyterHub.ip = "127.0.0.1"
```
This allows any username and password to log in, and is _wildly_ insecure.
If a global password has been set, any username is still accepted, but the correct password must be provided.
To use, specify
```python
c.JupyterHub.authenticator_class = "dummy"
```
:::{versionadded} 5.0
The DummyAuthenticator's default `allow_all` is True,
unlike most other Authenticators.
:::
:::{deprecated} 5.3
Setting a password on DummyAuthenticator is deprecated.
Use the new {class}`~.jupyterhub.authenticators.shared.SharedPasswordAuthenticator`
if you want to set a shared password for users.
:::
## Shared Password Authenticator
:::{versionadded} 5.3
{class}`~.jupyterhub.authenticators.shared.SharedPasswordAuthenticator` is added and [DummyAuthenticator.password](#DummyAuthenticator.password) is deprecated.
:::
For short-term deployments like workshops where there is no real user data to protect and you trust users to not abuse the system or each other,
{class}`~.jupyterhub.authenticators.shared.SharedPasswordAuthenticator` can be used.
Set a [user password](#SharedPasswordAuthenticator.user_password) for users to login:
You can also grant admin users access by adding them to `admin_users` and setting a separate [admin password](#SharedPasswordAuthenticator.admin_password):
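A minimal configuration sketch covering both settings might look like the following. The trait names `user_password` and `admin_password` and the class path come from the references above; the password values and the `instructor` username are placeholders, and selecting the authenticator by its full import path is one option assumed here.

```python
# Select the authenticator by import path (placeholder values throughout)
c.JupyterHub.authenticator_class = (
    "jupyterhub.authenticators.shared.SharedPasswordAuthenticator"
)

# Password shared with all workshop participants
c.SharedPasswordAuthenticator.user_password = "replace-with-workshop-password"

# Optional: admins log in with a separate, longer password.
# "instructor" is a hypothetical username.
c.Authenticator.admin_users = {"instructor"}
c.SharedPasswordAuthenticator.admin_password = (
    "replace-with-a-much-longer-admin-only-password"
)
```

A user listed in `admin_users` who logs in with the regular user password is not granted admin status, so the admin password should be kept separate from the one shared with participants.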
### Registering custom Authenticators via entry points
As of JupyterHub 1.0, custom authenticators can register themselves via
@@ -183,6 +235,168 @@ Additionally, configurable attributes for your authenticator will
appear in jupyterhub help output and auto-generated configuration files
via `jupyterhub --generate-config`.
(authenticator-allow)=
### Allowing access
When dealing with logging in, there are generally two _separate_ steps:
authentication
: identifying who is trying to log in, and
authorization
: deciding whether an authenticated user is allowed to access your JupyterHub
{meth}`Authenticator.authenticate` is responsible for authenticating users.
It is perfectly fine in the simplest cases for `Authenticator.authenticate` to be responsible for authentication _and_ authorization,
in which case `authenticate` may return `None` if the user is not authorized.
However, Authenticators also have two methods, {meth}`~.Authenticator.check_allowed` and {meth}`~.Authenticator.check_blocked_users`, which are called after successful authentication to further check if the user is allowed.
If `check_blocked_users()` returns False, authorization stops and the user is not allowed.
If `Authenticator.allow_all` is True OR `check_allowed()` returns True, authorization proceeds.
:::{versionadded} 5.0
{attr}`.Authenticator.allow_all` and {attr}`.Authenticator.allow_existing_users` are new in JupyterHub 5.0.
By default, `allow_all` is False,
which is a change from pre-5.0, where `allow_all` was implicitly True if `allowed_users` was empty.
:::
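The order of these checks can be summarized in a short sketch. It illustrates the decision logic described in this section and is not JupyterHub's implementation: `is_authorized` and its arguments are hypothetical names, and the set-membership tests stand in for `check_blocked_users()` and the base `check_allowed()`.

```python
def is_authorized(username, *, blocked_users=(), allowed_users=(), allow_all=False):
    """Sketch of the checks that run after successful authentication."""
    # Stand-in for check_blocked_users(): a blocked user is rejected outright.
    if username in blocked_users:
        return False
    # allow_all is consulted next; check_allowed() is not called when it is True.
    if allow_all:
        return True
    # Stand-in for the base check_allowed(): membership in allowed_users.
    return username in allowed_users
```

Blocking is checked first, so `allow_all` cannot re-admit a blocked user, and with the 5.0 default of `allow_all=False` an empty `allowed_users` admits nobody.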
### Overriding `check_allowed`
:::{versionchanged} 5.0
`check_allowed()` is **not called** if `allow_all` is True.
:::
:::{versionchanged} 5.0
Starting with 5.0, `check_allowed()` should **NOT** return True if no allow config
is specified (`allow_all` should be used instead).
:::
The base implementation of {meth}`~.Authenticator.check_allowed` checks:
- if username is in the `allowed_users` set, return True
- else return False
:::{versionchanged} 5.0
Prior to 5.0, this would also return True if `allowed_users` was empty.
For clarity, this is no longer the case. A new `allow_all` property (default False) has been added which is checked _before_ calling `check_allowed`.
If `allow_all` is True, this takes priority over `check_allowed`, which will be ignored.
If your Authenticator subclass similarly returns True when no allow config is defined,
this is fully backward compatible for your users, but means `allow_all = False` has no real effect.
You can make your Authenticator forward-compatible with JupyterHub 5 by defining `allow_all` as a boolean config trait on your class:
```python
from jupyterhub.auth import Authenticator
from traitlets import Bool


class MyAuthenticator(Authenticator):
    # backport allow_all from JupyterHub 5
    allow_all = Bool(False, config=True)

    def check_allowed(self, username, authentication):
        if self.allow_all:
            # replaces previous "if no auth config"
            return True
        ...
```
:::
If an Authenticator defines additional sources of `allow` configuration,
such as membership in a group or other information,
it should override `check_allowed` to account for this.
:::{note}
`allow_` configuration should generally be _additive_,
i.e. if access is granted by _any_ allow configuration,
a user should be authorized.
JupyterHub recommends that Authenticators applying _restrictive_ configuration should use names like `block_` or `require_`,
and check this during `check_blocked_users` or `authenticate`, not `check_allowed`.
:::
In general, an Authenticator's skeleton should look like:
```python
class MyAuthenticator(Authenticator):
    # backport allow_all for compatibility with JupyterHub < 5
The default is False for Authenticators that ship with JupyterHub,
but may be True for custom Authenticators.
Check your Authenticator's documentation for `manage_groups` support.
If True, {meth}`.Authenticator.authenticate` and {meth}`.Authenticator.refresh_user` may include a field `groups`
which is a list of group names the user should be a member of:
@@ -295,7 +509,62 @@ which is a list of group names the user should be a member of:
- If `None` is returned, no changes are made to the user's group membership
If authenticator-managed groups are enabled,
groups cannot be specified with the `load_groups` traitlet.
:::{warning}
When `manage_groups` is True,
managing groups via the API is still permitted via the `admin:groups` scope (starting with 5.3),
but any time a user logs in their group membership is completely reset via the login process.
So it only really makes sense to make manual changes via the API that reflect upstream changes which are not automatically propagated, such as group deletion.
:::
:::{versionchanged} 5.3
Prior to JupyterHub 5.3, all group management via the API was disabled if `Authenticator.manage_groups` is True.
:::
(authenticator-roles)=
## Authenticator-managed roles
:::{versionadded} 5.0
:::
Some identity providers may have their own concept of role membership that you would like to preserve in JupyterHub.
This is now possible with {attr}`.Authenticator.manage_roles`.
You can set the config:
```python
c.Authenticator.manage_roles = True
```
to enable this behavior.
The default is False for Authenticators that ship with JupyterHub,
but may be True for custom Authenticators.
Check your Authenticator's documentation for `manage_roles` support.
If True, {meth}`.Authenticator.authenticate` and {meth}`.Authenticator.refresh_user` may include a field `roles`
which is a list of roles that user should be assigned to:
- User will be assigned each role in the list
- User will be revoked roles not in the list (but they may still retain the role privileges if they inherit the role from their group)
- Any roles not already present in the database will be created
- Attributes of the roles (`description`, `scopes`, `groups`, `users`, and `services`) will be updated if given
- If `None` is returned, no changes are made to the user's roles
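The assignment rules in this list can be sketched as a small reconciliation step. This is an illustration, not JupyterHub's code: `reconcile_roles` is a hypothetical helper, and it assumes each entry of the `roles` field is a dict with at least a `name` key.

```python
def reconcile_roles(current_role_names, returned_roles):
    """Return (names_to_assign, names_to_revoke) for one user.

    `returned_roles` stands in for the `roles` field; None means "no change".
    """
    if returned_roles is None:
        return set(), set()
    current = set(current_role_names)
    # Assumption: each returned role is a dict with a "name" key.
    wanted = {role["name"] for role in returned_roles}
    # Assign roles in the list the user lacks; revoke roles absent from the list.
    return wanted - current, current - wanted
```

A revoked role may still apply in practice if the user inherits it through a group, as noted in the list.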
If authenticator-managed roles are enabled,
all role-management via the API is disabled,
and roles cannot be assigned to groups nor users via `load_roles` traitlet
(roles can still be created via `load_roles` or assigned to services).
When an authenticator manages roles, the initial roles and role assignments
can be loaded from role specifications returned by the {meth}`.Authenticator.load_managed_roles()` method.
The authenticator-managed roles and role assignments will be deleted after restart if:
- {attr}`.Authenticator.reset_managed_roles_on_startup` is set to `True`, and
- the roles and role assignments are not included in the initial set of roles returned by the {meth}`.Authenticator.load_managed_roles()` method.
JupyterHub can be configured to record structured events from a running server using Jupyter's [Events System]. The types of events that JupyterHub emits are defined by [JSON schemas] listed at the bottom of this page.
## How to emit events
Event logging is handled by its `EventLogger` object. This leverages Python's standard [logging] library to emit, filter, and collect event data.
To begin recording events, you'll need to set at least one configuration option:
> `EventLogger.handlers`: tells the EventLogger _where_ to route your events. This trait is a list of Python logging handlers that route events to e.g. an event log file.
Here's a basic example:
```python
import logging

c.EventLogger.handlers = [
    logging.FileHandler('event.log'),
]
```
The output is a file, `"event.log"`, with events recorded as JSON data.
@@ -37,6 +32,15 @@ The output is a file, `"event.log"`, with events recorded as JSON data.
server-actions
```
:::{versionchanged} 5.0
JupyterHub 5.0 changes from the deprecated jupyter-telemetry to jupyter-events.
The main changes are:
- `EventLog` configuration is now called `EventLogger`
- The `hub.jupyter.org/server-action` schema is now called `https://schema.jupyter.org/jupyterhub/events/server-action`
@@ -84,14 +82,13 @@ Within CERN, there are two noteworthy JupyterHub deployments in operation:
- Advanced Computing
- [Palmetto cluster and JupyterHub](https://citi.sites.clemson.edu/2016/08/18/JupyterHub-for-Palmetto-Cluster.html)
### University of Colorado Boulder
- (CU Research Computing) CURC
- [JupyterHub User Guide](https://curc.readthedocs.io/en/latest/gateways/jupyterhub.html)
- Slurm job dispatched on Crestone compute cluster
- log troubleshooting
- Profiles in IPython Clusters tab
### ETH Zurich
[ETH Zurich](https://ethz.ch/en.html) (Federal Institute of Technology Zurich) is a public research university in Zürich, Switzerland, with a focus on science, technology, engineering, and mathematics, although its 16 departments span a variety of disciplines and subjects.
The [Educational Development and Technology](https://ethz.ch/en/the-eth-zurich/organisation/departments/educational-development-and-technology.html) unit provides JupyterHub exclusively for teaching and learning, integrated in the learning management system [Moodle](https://ethz.ch/staffnet/en/teaching/academic-support/it-services-teaching/teaching-applications/moodle-service.html). Each course gets its individually configured JupyterHub environment deployed on an on-premise Kubernetes cluster.
- [ETH JupyterHub](https://ethz.ch/staffnet/en/teaching/academic-support/it-services-teaching/teaching-applications/jupyterhub.html) for teaching and learning
### George Washington University
@@ -188,6 +185,12 @@ Within CERN, there are two noteworthy JupyterHub deployments in operation:
- [Deploying JupyterHub on Hadoop](https://jupyterhub-on-hadoop.readthedocs.io)
### Sirepo
- Sirepo is an online Computer-Aided Engineering gateway that contains a JupyterHub instance. Sirepo is provided at no cost for community use, but users must request login access.
@@ -18,3 +18,42 @@ tool like [Grafana](https://grafana.com).
/reference/metrics
```
## Customizing the metrics prefix
JupyterHub metrics all have a `jupyterhub_` prefix.
As of JupyterHub 5.0, this can be overridden with `$JUPYTERHUB_METRICS_PREFIX` environment variable
in the Hub's environment.
For example,
```bash
export JUPYTERHUB_METRICS_PREFIX=jupyterhub_prod
```
would result in the metric `jupyterhub_prod_active_users`, etc.
(monitoring_bucket_sizes)=
## Customizing bucket sizes
As of JupyterHub 5.3, the following environment variables can be set in the Hub's environment to customize bucket sizes (the defaults are shown below):
Required if the redirect URI differs from the default or the service is not to be added to the proxy at `/services/:name`
(i.e. `url` is not set, but there is still a public web service using OAuth).
If a service is also to be managed by the Hub, it has a few extra options:
@@ -55,19 +62,19 @@ If a service is also to be managed by the Hub, it has a few extra options:
externally. - If a command is specified for launching the Service, the Service will
be started and managed by the Hub.
- `environment: dict` - additional environment variables for the Service.
- `user: str` - the name of a system user to manage the Service.
  If unspecified, run as the same user as the Hub.
## Hub-Managed Services
A **Hub-Managed Service** is started by the Hub, and the Hub is responsible
for the Service's operation. A Hub-Managed Service can only be a local
subprocess of the Hub. The Hub will take care of starting the process and
restarting the service if the service stops.
While Hub-Managed Services share some similarities with single-user server Spawners,
there are no plans for Hub-Managed Services to support the same spawning
abstractions as a Spawner.
If you wish to run a Service in a Docker container or other deployment
environments, the Service can be registered as an
@@ -80,7 +87,7 @@ the Service. For example, a 'cull idle' notebook server task configured as a
Hub-Managed Service would include:
- the Service name,
- permissions to see when users are active and to stop servers, and
- the `command` to launch the Service which will cull idle servers after a
  timeout interval
@@ -131,6 +138,14 @@ JUPYTERHUB_OAUTH_SCOPES: JSON-serialized list of scopes to use for allowing ac
(deprecated in 3.0, use JUPYTERHUB_OAUTH_ACCESS_SCOPES).
JUPYTERHUB_OAUTH_ACCESS_SCOPES: JSON-serialized list of scopes to use for allowing access to the service (new in 3.0).
JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES: JSON-serialized list of scopes that can be requested by the oauth client on behalf of users (new in 3.0).
JUPYTERHUB_PUBLIC_URL: the public URL of the service,
e.g. `https://jupyterhub.example.org/services/name/`.
Empty if no public URL is specified (default).
Will be available if subdomains are configured.
JUPYTERHUB_PUBLIC_HUB_URL: the public URL of JupyterHub as a whole,
e.g. `https://jupyterhub.example.org/`.
Empty if no public URL is specified (default).
Will be available if subdomains are configured.
```
For the previous 'cull idle' Service example, these environment variables
@@ -156,8 +171,8 @@ to perform its API requests. Each Externally-Managed Service will need a
unique API token, because the Hub authenticates each API request and the API
token is used to identify the originating Service or user.
A configuration example of an Externally-Managed Service running its own web
server is:
```python
c.JupyterHub.services = [
@@ -174,6 +189,149 @@ c.JupyterHub.services = [
In this case, the `url` field will be passed along to the Service as
`JUPYTERHUB_SERVICE_URL`.
(service-credentials)=
## Service credentials
A service has direct access to the Hub API via its `api_token`.
Exactly what actions the service can take are governed by the service's [role assignments](define-role-target):
```python
c.JupyterHub.services = [
    {
        "name": "user-lister",
        "command": ["python3", "/path/to/user-lister"],
    }
]

c.JupyterHub.load_roles = [
    {
        "name": "list-users",
        "scopes": ["list:users", "read:users"],
        "services": ["user-lister"],
    }
]
```
When a service has a configured URL or explicit `oauth_client_id` or `oauth_redirect_uri`, it can operate as an [OAuth client](explanation:hub-oauth).
When a user visits an oauth-authenticated service,
completion of authentication results in issuing an oauth token.
This token is:
- owned by the authenticated user
- associated with the oauth client of the service
- governed by the service's `oauth_client_allowed_scopes` configuration
This token enables the service to act _on behalf of_ the user.
When an OAuth-authenticated service makes a request to the Hub (or other Hub-authenticated service), it has two credentials available to authenticate the request:
- the service's own `api_token`, which acts _as_ the service,
and is governed by the service's own role assignments.
- the user's oauth token issued to the service during the oauth flow,
which acts _as_ the user.
Choosing which one to use depends on "who" should be considered taking the action represented by the request.
A service's own permissions govern how it can act without any involvement of a user.
The service's `oauth_client_allowed_scopes` configuration allows individual users to _delegate_ permission for the service to act on their behalf.
This allows services to have little to no permissions of their own,
but allow users to take actions _via_ the service,
using their own credentials.
An example of such a service would be a web application for instructors,
presenting a dashboard of actions which can be taken for students in their courses.
The service would need no permission to do anything with the JupyterHub API on its own,
but it could employ the user's oauth credentials to list users.
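The configuration for such a dashboard is not reproduced here, so the following is a sketch consistent with the description that follows. The service name `grader-dashboard` and the scopes are taken from that description; the command, URL, role name `grader`, and group name `graders` are hypothetical placeholders.

```python
c.JupyterHub.services = [
    {
        "name": "grader-dashboard",
        # hypothetical command and URL
        "command": ["python3", "/path/to/grader-dashboard"],
        "url": "http://127.0.0.1:12345",
        # no role is assigned to the service itself;
        # users may delegate only these scopes to it via oauth
        "oauth_client_allowed_scopes": [
            "list:users",
            "read:users",
        ],
    }
]

c.JupyterHub.load_roles = [
    {
        # hypothetical role and group names
        "name": "grader",
        "scopes": [
            "list:users!group=class-a",
            "read:users!group=class-a",
            "servers!group=class-a",
            "access:servers!group=class-a",
            "access:services",
        ],
        "groups": ["graders"],
    }
]
```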
In this example, the `grader-dashboard` service does not have permission to take any actions with the Hub API on its own because it has not been assigned any role.
But when a grader accesses the service,
the dashboard will have a token with permission to list and read information about any users that the grader can access.
The dashboard will _not_ have permission to do additional things as the grader.
The dashboard will be able to:
- list users in class A (`list:users!group=class-a`)
- read information about users in class A (`read:users!group=class-a`)
The dashboard will _not_ be able to:
- start, stop, or access user servers (`servers`, `access:servers`), even though the grader has this permission (it's not in `oauth_client_allowed_scopes`)
- take any action without the grader granting permission via oauth
## Adding or removing services at runtime
Only externally-managed services can be added at runtime by using JupyterHub’s REST API.
### Add a new service
To add a new service, send a POST request to this endpoint
```
POST /hub/api/services/:servicename
```
**Required scope: `admin:services`**
**Payload**: The payload should contain the definition of the service to be created. The endpoint supports the same properties as externally-managed services defined in the config file.
**Possible responses**
- `201 Created`: The service and related objects were created (and started, in the case of a Hub-managed one) successfully.
- `400 Bad Request`: The payload is invalid or JupyterHub cannot create the service.
- `409 Conflict`: A service with the same name already exists.
### Remove an existing service
To remove an existing service, send a DELETE request to this endpoint
```
DELETE /hub/api/services/:servicename
```
**Required scope: `admin:services`**
**Payload**: `None`
**Possible responses**
- `200 OK`: The service and related objects were removed (and stopped, in the case of a Hub-managed one) successfully.
- `400 Bad Request`: JupyterHub cannot remove the service.
- `404 Not Found`: The requested service does not exist.
- `405 Not Allowed`: The requested service was created from the config file and cannot be removed at runtime.
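As a rough sketch, both requests can be built with Python's standard library. The Hub address, token, service name, and the `url` property in the payload are placeholder assumptions; the paths and HTTP methods are the ones listed in this section, and the `token` authorization scheme is assumed to match how your deployment authenticates API requests.

```python
import json
from urllib.request import Request

# Hypothetical values for illustration only.
HUB_API_URL = "http://127.0.0.1:8081/hub/api"
API_TOKEN = "replace-with-a-token-holding-admin:services"


def service_request(method, name, definition=None):
    """Build (but do not send) a request to /hub/api/services/:servicename."""
    body = None if definition is None else json.dumps(definition).encode("utf-8")
    headers = {"Authorization": f"token {API_TOKEN}"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return Request(
        f"{HUB_API_URL}/services/{name}", data=body, headers=headers, method=method
    )


# Add a service, then remove it again.
add_request = service_request("POST", "my-service", {"url": "http://127.0.0.1:10101"})
remove_request = service_request("DELETE", "my-service")
```

Passing either object to `urllib.request.urlopen()` sends it; a non-2xx status such as `409 Conflict` or `404 Not Found` is raised as `urllib.error.HTTPError`.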
## Writing your own Services
When writing your own services, you have a few decisions to make (in addition
@@ -237,16 +395,14 @@ There are two levels of authentication with the Hub:
This should be used for any service that serves pages that should be visited with a browser.
To use HubAuth, you must set the `.api_token` instance variable. This can be
done via the HubAuth constructor, direct assignment to a HubAuth object, or via the
`JUPYTERHUB_API_TOKEN` environment variable. A number of the examples in the
root of the jupyterhub git repository set the `JUPYTERHUB_API_TOKEN` variable
so consider having a look at those for further reading
In order to make use of features like JupyterLab's real-time collaboration (RTC), multiple users must have access to a single server.
There are a few ways to do this, but ultimately both users must have the appropriate `access:servers` scope.
Prior to JupyterHub 5.0, this could only be granted via static role assignments in JupyterHub configuration.
JupyterHub 5.0 adds the concept of a 'share', allowing _users_ to grant each other limited access to their servers.
:::{seealso}
Documentation on [roles and scopes](rbac) for more details on how permissions work in JupyterHub, and in particular [access scopes](access-scopes).
:::
In JupyterHub, shares:
1. are 'granted' to a user or group
2. grant only limited permissions (e.g. only 'access' or access and start/stop)
3. may be revoked by anyone with the `shares` permissions
4. may always be revoked by the shared-with user or group
Additionally a "share code" is a random string, which has all the same properties as a Share aside from the user or group.
The code can be exchanged for actual sharing permission, to enable the pattern of sharing permissions without needing to know the username(s) of who you'd like to share with (e.g. email a link).
There is not yet _UI_ to create shares, but they can be managed via JupyterHub's [REST API](jupyterhub-rest-api).
In general, with shares you can:
1. access other users' servers
2. grant access to your servers
3. see servers shared with you
4. review and revoke permissions for servers you manage
## Enable sharing
For safety, users do not have permission to share access to their servers by default.
To grant this permission, a user must have the `shares` scope for their servers.
To grant all users permission to share access to their servers:
```python
c.JupyterHub.load_roles = [
    {
        "name": "user",
        "scopes": ["self", "shares!user"],
    },
]
```
With this, only the sharing via invitation code described below will be available.
Additionally, to share access with a **specific user or group** (more below),
a user must have permission to read that user or group's name.
Note that this exposes the ability for all users to _discover_ existing user and group names,
which is part of why we have the share-by-code pattern,
so users don't need this ability to share with each other.
## Share or revoke access to a server
To modify who has access to a server, you need the permission `shares` with the appropriate _server_ filter,
and access to read the name of the target user or group (`read:users:name` or `read:groups:name`).
You can only modify access to one server at a time.
### Granting access to a server
To grant access to a particular user, in addition to `shares`, the granter must have at least `read:users:name` permission for the target user (or `read:groups:name` if it's a group).
Send a POST request to `/api/shares/:username/:servername` to grant permissions.
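A sketch of building such a request with the standard library follows. The Hub address and token are placeholders, `share_request` is a hypothetical helper, and the JSON body (a `user` or `group` name) is an assumption modelled on the `user` and `group` fields of the Share model; check the REST API reference for the exact request schema.

```python
import json
from urllib.request import Request

HUB_API_URL = "http://127.0.0.1:8081/hub/api"  # hypothetical Hub API address


def share_request(token, owner, server_name, *, user=None, group=None):
    """Build (but do not send) a POST to /api/shares/:username/:servername."""
    if (user is None) == (group is None):
        raise ValueError("specify exactly one of user or group")
    # Assumed payload shape: the name of the user or group to share with.
    payload = {"user": user} if user is not None else {"group": group}
    return Request(
        f"{HUB_API_URL}/shares/{owner}/{server_name}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = share_request("placeholder-token", "sharer", "notebook", user="shared-with")
```

The token must belong to someone holding `shares` for the server in question and permission to read the target's name.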
This is a paginated endpoint, so responses have `items` as a list of Share models, and `_pagination` with information for retrieving all shares if there are many:
```python
{
    "items": [
        {
            "server": {...},
            "scopes": ["access:servers!server=sharer/"],
            "user": {
                "name": "shared-with",
            },
            "group": None,  # or {"name": "groupname"},
            ...
        },
        ...
    ],
    "_pagination": {
        "total": 5,
        "limit": 50,
        "offset": 0,
        "next": None,
    },
}
```
See the [rest-api](rest-api-get-shares-server) for full details of the response models.
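When there are more shares than fit on one page, the `_pagination` field can be followed to collect them all. The sketch below assumes `next` is `None` on the last page and otherwise a dictionary containing a `url` key; `fetch_page` is a hypothetical callable that takes a URL and returns the decoded JSON page.

```python
def collect_items(fetch_page, first_url):
    """Follow `_pagination.next` and return the items from every page."""
    items = []
    url = first_url
    while url is not None:
        page = fetch_page(url)
        items.extend(page["items"])
        next_info = page["_pagination"]["next"]
        # Assumption: `next` is None on the last page, else a dict with "url".
        url = next_info["url"] if next_info else None
    return items
```

Taking the fetch function as an argument keeps the loop independent of any particular HTTP client.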
### View servers shared with user or group
To review servers shared with a given user or group, you need the permission `read:users:shares` or `read:groups:shares` with the appropriate _user_ or _group_ filter.
See the [rest-api](rest-api) for full details of the response models.
### Share code model
<!-- refresh from examples/user-sharing/rest-api.ipynb -->
A Share Code returned in the REST API has most of the same fields as a Share, but lacks the association with a user or group, and adds information about exchanges of the share code,
A [Spawner][] starts each single-user notebook server.
A [Spawner](#Spawner) starts each single-user notebook server.
The Spawner represents an abstract interface to a process,
The Spawner represents an abstract interface to a process,
and a custom Spawner needs to be able to take three actions:
and a custom Spawner needs to be able to take three actions:
@@ -12,7 +12,7 @@ and a custom Spawner needs to be able to take three actions:
## Examples
## Examples
Custom Spawners for JupyterHub can be found on the [JupyterHub wiki](https://github.com/jupyterhub/jupyterhub/wiki/Spawners).
Additional Spawners can be installed from separate packages.
Some examples include:
Some examples include:
- [DockerSpawner](https://github.com/jupyterhub/dockerspawner) for spawning user servers in Docker containers
- [DockerSpawner](https://github.com/jupyterhub/dockerspawner) for spawning user servers in Docker containers
@@ -31,12 +31,13 @@ Some examples include:
- [SSHSpawner](https://github.com/NERSC/sshspawner) to spawn notebooks
- [SSHSpawner](https://github.com/NERSC/sshspawner) to spawn notebooks
on a remote server using SSH
on a remote server using SSH
- [KubeSpawner](https://github.com/jupyterhub/kubespawner) to spawn notebook servers on kubernetes cluster.
- [KubeSpawner](https://github.com/jupyterhub/kubespawner) to spawn notebook servers on kubernetes cluster.
- [NomadSpawner](https://github.com/mxab/jupyterhub-nomad-spawner) to spawn a notebook server as a Nomad job inside HashiCorp's Nomad cluster
## Spawner control methods
## Spawner control methods
### Spawner.start
### Spawner.start
`Spawner.start` should start a single-user server for a single user.
[](#Spawner.start) should start a single-user server for a single user.
Information about the user can be retrieved from `self.user`,
Information about the user can be retrieved from `self.user`,
an object encapsulating the user's name, authentication, and server info.
an object encapsulating the user's name, authentication, and server info.
@@ -67,11 +68,11 @@ async def start(self):
When `Spawner.start` returns, the single-user server process should actually be running,
When `Spawner.start` returns, the single-user server process should actually be running,
not just requested. JupyterHub can handle `Spawner.start` being very slow
not just requested. JupyterHub can handle `Spawner.start` being very slow
(such as PBS-style batch queues, or instantiating whole AWS instances)
(such as PBS-style batch queues, or instantiating whole AWS instances)
via relaxing the `Spawner.start_timeout` config value.
via relaxing the [](#Spawner.start_timeout) config value.
#### Note on IPs and ports
#### Note on IPs and ports
`Spawner.ip` and `Spawner.port` attributes set the _bind_ URL,
[](#Spawner.ip) and [](#Spawner.port) attributes set the _bind_ URL,
which the single-user server should listen on
which the single-user server should listen on
(passed to the single-user process via the `JUPYTERHUB_SERVICE_URL` environment variable).
(passed to the single-user process via the `JUPYTERHUB_SERVICE_URL` environment variable).
The _return_ value is the IP and port (or full URL) the Hub should _connect to_.
The _return_ value is the IP and port (or full URL) the Hub should _connect to_.
@@ -123,7 +124,7 @@ If both attributes are not present, the Exception will be shown to the user as u
### Spawner.poll
### Spawner.poll
`Spawner.poll` checks if the spawner is still running.
[](#Spawner.poll) checks if the spawner is still running.
It should return `None` if it is still running,
It should return `None` if it is still running,
and an integer exit status, otherwise.
and an integer exit status, otherwise.
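The `poll` contract can be sketched with a plain subprocess. `SketchSpawner` below is a hypothetical stand-in rather than a JupyterHub class, and returning `0` for a server that was never started is an assumption made for this example.

```python
import asyncio
import subprocess
import sys


class SketchSpawner:
    """Hypothetical stand-in that only illustrates the poll contract."""

    def __init__(self):
        self.proc = None

    async def start(self, argv):
        # Launch the process; a real Spawner would return where to connect.
        self.proc = subprocess.Popen(argv)

    async def poll(self):
        if self.proc is None:
            # Assumption: a server that was never started counts as exited.
            return 0
        # Popen.poll() is None while running, else the integer exit status.
        return self.proc.poll()
```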
@@ -132,7 +133,7 @@ to check if the local process is still running. On Windows, it uses `psutil.pid_
### Spawner.stop
### Spawner.stop
`Spawner.stop` should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
[](#Spawner.stop) should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
## Spawner state
## Spawner state
@@ -165,17 +166,18 @@ def clear_state(self):
self.pid=0
self.pid=0
```
```
(spawner_user_options)=
## Spawner options form
## Spawner options form
(new in 0.4)
Some deployments may want to offer options to users to influence how their servers are started.
Some deployments may want to offer options to users to influence how their servers are started.
This may include cluster-based deployments, where users specify what resources should be available,
This may include cluster-based deployments, where users specify what memory or cpu resources should be available,
or docker-based deployments where users can select from a list of base images.
or container-based deployments where users can select from a list of base images,
or more complex configurations where users select a "profile" representing a bundle of settings to be applied together.
This feature is enabled by setting `Spawner.options_form`, which is an HTML form snippet
This feature is enabled by setting [](#Spawner.options_form), which is an HTML form snippet
inserted unmodified into the spawn form.
inserted unmodified into the spawn form.
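A minimal sketch of such a snippet follows. The field names are hypothetical and chosen to match the `formdata` keys used in this section; only the inputs are given, since the snippet is inserted into the existing spawn form.

```python
# Hypothetical form snippet: inputs only, no surrounding <form> element.
options_form = """
<label for="integer">Integer</label>
<input id="integer" name="integer" type="number" value="5">
<label for="checkbox">Checkbox</label>
<input id="checkbox" name="checkbox" type="checkbox">
<label for="text">Text</label>
<input id="text" name="text" type="text">
<label for="select">Select</label>
<select id="select" name="select" multiple>
  <option value="a">a</option>
  <option value="b">b</option>
</select>
"""

# In jupyterhub_config.py this would be applied with:
# c.Spawner.options_form = options_form
```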
If the `Spawner.options_form` is defined, when a user tries to start their server, they will be directed to a form page, like this:
If the `Spawner.options_form` is defined, when a user tries to start their server they will be directed to a form page, like this:


@@ -185,28 +187,40 @@ See [this example](https://github.com/jupyterhub/jupyterhub/blob/HEAD/examples/s
### `Spawner.options_from_form`
### `Spawner.options_from_form`
Options from this form will always be a dictionary of lists of strings, e.g.:
Inputs from an HTML form always arrive as a dictionary of lists of strings, e.g.:
```python
formdata = {
    'integer': ['5'],
    'checkbox': ['on'],
    'text': ['some text'],
    'select': ['a', 'b'],
}
```
When `formdata` arrives, it is passed through `Spawner.options_from_form(formdata)`,
When `formdata` arrives, it is passed through [](#Spawner.options_from_form):
which is a method to turn the form data into the correct structure.
This method must return a dictionary, and is meant to interpret the lists-of-strings into the correct types. For example, the `options_from_form` for the above form would look like:
[](#Spawner.options_from_form) is a configurable function to turn the HTTP form data into the correct structure for [](#Spawner.user_options).
`options_from_form` must return a dictionary, _may_ be async, and is meant to interpret the lists-of-strings a web form produces into the correct types.
For example, the `options_from_form` for the above form might look like:
```python
def options_from_form(formdata, spawner=None):
    options = {}
    options['integer'] = int(formdata['integer'][0])  # single integer value
    # browsers omit unchecked checkboxes, so use .get() to avoid a KeyError
    options['checkbox'] = formdata.get('checkbox') == ['on']
    options['text'] = formdata['text'][0]  # single string value
    options['select'] = formdata['select']  # list already correct
    options['notinform'] = 'extra info'  # not in the form at all
    return options


c.Spawner.options_from_form = options_from_form
```
which would return:
which would return:
@@ -214,15 +228,115 @@ which would return:
```python
{
    'integer': 5,
    'checkbox': True,
    'text': 'some text',
    'select': ['a', 'b'],
    'notinform': 'extra info',
}
```
When `Spawner.start` is called, this dictionary is accessible as `self.user_options`.
It is recommended to use at least JupyterLab 3.6 with JupyterHub >= 3.1.1 for this.
It is recommended to use at least JupyterLab 3.6 with JupyterHub >= 3.1.1 for this.
:::
:::
:::{note}
Starting with JupyterLab >=4.0, installing the [jupyter-collaboration](https://github.com/jupyterlab/jupyter-collaboration) package in your single-user environment enables collaborative mode, instead of passing the `--collaborative` flag at runtime.
:::
JupyterLab has support for real-time collaboration (RTC), where multiple users are working with the same Jupyter server and see each other's edits.
JupyterLab has support for real-time collaboration (RTC), where multiple users are working with the same Jupyter server and see each other's edits.
Beyond other collaborative-editing environments, Jupyter includes _execution_.
Beyond other collaborative-editing environments, Jupyter includes _execution_.
So granting someone access to your server also means granting them access to **run code as you**.
So granting someone access to your server also means granting them access to **run code as you**.
@@ -74,7 +78,7 @@ c.JupyterHub.load_roles = []
c.JupyterHub.load_groups={
c.JupyterHub.load_groups={
# collaborative accounts get added to this group
# collaborative accounts get added to this group
# so it's easy to see which accounts are collaboration accounts
# so it's easy to see which accounts are collaboration accounts
"collaborative":[],
"collaborative":{"users":[]},
}
}
```
```
@@ -98,12 +102,12 @@ for project_name, project in project_config["projects"].items():
members=project.get("members",[])
members=project.get("members",[])
print(f"Adding project {project_name} with members {members}")
print(f"Adding project {project_name} with members {members}")
A [generic implementation](https://oauthenticator.readthedocs.io/en/latest/reference/api/gen/oauthenticator.generic.html), which you can use for OAuth authentication
A [generic implementation](https://oauthenticator.readthedocs.io/en/latest/reference/api/gen/oauthenticator.generic.html), which you can use for OAuth authentication
The JupyterHub [docker image](https://hub.docker.com/r/jupyterhub/jupyterhub/) is the fastest way to set up Jupyterhub in your local development environment.
The JupyterHub [docker image](https://quay.io/repository/jupyterhub/jupyterhub) is the fastest way to set up Jupyterhub in your local development environment.
:::{note}
:::{note}
This `jupyterhub/jupyterhub` docker image is only an image for running
This `quay.io/jupyterhub/jupyterhub` docker image is only an image for running
the Hub service itself. It does not provide the other Jupyter components,
the Hub service itself. It does not provide the other Jupyter components,
such as Notebook installation, which are needed by the single-user servers.
such as Notebook installation, which are needed by the single-user servers.
To run the single-user servers, which may be on the same system as the Hub or
To run the single-user servers, which may be on the same system as the Hub or
@@ -24,7 +24,7 @@ You should have [Docker] installed on a Linux/Unix based system.
To pull the latest JupyterHub image and start the `jupyterhub` container, run this command in your terminal.
To pull the latest JupyterHub image and start the `jupyterhub` container, run this command in your terminal.
```
```
docker run -d -p 8000:8000 --name jupyterhub jupyterhub/jupyterhub jupyterhub
docker run -d -p 8000:8000 --name jupyterhub quay.io/jupyterhub/jupyterhub jupyterhub
```
```
This command exposes the Jupyter container on port:8000. Navigate to `http://localhost:8000` in a web browser to access the JupyterHub console.
This command exposes the Jupyter container on port:8000. Navigate to `http://localhost:8000` in a web browser to access the JupyterHub console.
- [Node.js {{node_min}}](https://www.npmjs.com/) or greater, along with npm. [Install Node.js/npm](https://docs.npmjs.com/getting-started/installing-node),
using your operating system's package manager.
using your operating system's package manager.
- If you are using **`conda`**, the nodejs and npm dependencies will be installed for
- If you are using **`conda`**, the nodejs and npm dependencies will be installed for
@@ -24,7 +24,7 @@ Before installing JupyterHub, you will need:
```
```
[nodesource][] is a great resource to get more recent versions of the nodejs runtime,
[nodesource][] is a great resource to get more recent versions of the nodejs runtime,
if your system package manager only has an old version of Node.js (e.g. 10 or older).
if your system package manager only has an old version of Node.js.
- A [pluggable authentication module (PAM)](https://en.wikipedia.org/wiki/Pluggable_authentication_module)
- A [pluggable authentication module (PAM)](https://en.wikipedia.org/wiki/Pluggable_authentication_module)
to use the [default Authenticator](authenticators).
to use the [default Authenticator](authenticators).
@@ -90,6 +90,6 @@ To **allow multiple users to sign in** to the Hub server, you must start
sudo jupyterhub
sudo jupyterhub
```
```
The [wiki](https://github.com/jupyterhub/jupyterhub/wiki/Using-sudo-to-run-JupyterHub-without-root-privileges)
[](howto:config:no-sudo)
describes how to run the server as a _less privileged user_. This requires
describes how to run the server as a _less privileged user_. This requires
Sometimes, when working with applications such as [BinderHub](https://binderhub.readthedocs.io), it may be necessary to launch Jupyter-based services on behalf of your users.
Sometimes, when working with applications such as [BinderHub](https://binderhub.readthedocs.io), it may be necessary to launch Jupyter-based services on behalf of your users.
Doing so can be achieved through JupyterHub's [REST API](using-jupyterhub-rest-api), which allows one to launch and manage servers on behalf of users through API calls instead of the JupyterHub UI.
Doing so can be achieved through JupyterHub's [REST API](howto:rest-api), which allows one to launch and manage servers on behalf of users through API calls instead of the JupyterHub UI.
This way, you can take advantage of other user/launch/lifecycle patterns that are not natively supported by the JupyterHub UI, all without the need to develop the server management features of JupyterHub Spawners and/or Authenticators.
This way, you can take advantage of other user/launch/lifecycle patterns that are not natively supported by the JupyterHub UI, all without the need to develop the server management features of JupyterHub Spawners and/or Authenticators.
This tutorial goes through working with the JupyterHub API to manage servers for users.
This tutorial goes through working with the JupyterHub API to manage servers for users.
JupyterHub's `user-sharing` example does it this way.
The nice thing about this approach is that only users who already have those permissions will get a token which can take these actions.
The downside (in terms of convenience) is that the browser token is only accessible to the javascript (e.g. JupyterLab) and/or jupyter-server request handlers,
but not notebooks or terminals.
The second way, which is less secure, but perhaps more convenient for demonstration purposes,
is to grant the _server itself_ permission to grant access to itself.
```python
c.Spawner.server_token_scopes = [
    "users:activity!user",
    "shares!server",
]
```
The security downside of this approach is that anyone who can access the server generally can assume the permissions of the server token.
Effectively, this means anyone who the server is shared _with_ will gain permission to further share the server with others.
This is not the case for the first approach, but this token is accessible to terminals and notebook kernels, making it easier to illustrate.
## Get a token
Now, assuming the _user_ has permission to share their server (step 0), we need a token to make the API requests in this tutorial.
You can do this at the token page, or inherit it from the single-user server environment if one of the above configurations has been selected by admins.
To request a token with only the permissions required (`shares!user`) on the token page:

This token will be in the `Authorization` header.
To create a {py:class}`requests.Session` that will send this header on every request:
would be a link that can be shared with any JupyterHub user that will take them directly to the file `mynotebook.ipynb` in JupyterLab on barb's server after granting them access to the server.
## Reviewing shared access
When you have shared access to your server, it's a good idea to check out who has access.
You can see who has access with:
```python
session.get()
```
which produces a paginated list of who has shared access: