- add docs, tests
- deprecate DummyAuthenticator.password, pointing to new class
- accept no password as valid config (no login possible)
- log warnings for suspicious config (e.g. passwords not set, admin password set, but no admin users, etc.)
Currently, admin users are even more insecure than otherwise
with dummyauthenticator - anyone who knows the username of the admin
can get in if they also know the password.
This PR adds an additional layer of security - admins *must* login
using a different, more secure (longer, per NIST guidelines) password.
If they login using the regular password, no admin status for them.
This mildly helpful in local testing and improves overall security
posture. Where it really shines though, is in 'workshop' hubs. I've
been running those for years now, both at UC Berkeley and now at 2i2c
(with NASA Openscapes in particular). This was the usecase DummyAuth
was written for :D It allows an instructor to share a single password
with all the users in a secure way (they're all in a physical room,
zoom, etc). The password is then changed after the workshop. However,
admin access was not possible in this use case, as anyone guessing the
admin's username can get in as admin. With this change, admin access
is possible.
- when adding trailing slash, do so inside url_path_join, not with `+ '/'`
- don't use url_path_join to build url for handler _outside_ prefix (AddSlash on `/hub`)
the functions we use haven't changed in almost 10 years,
and are only a few lines
we should probably lose them eventually, but easier to vendor them first
wait for networkidle isn't enough for debounced name filter
clock.run_for doesn't seem to work, either, unclear why
instead, make sure the first page reflects the filtered view before clicking 'next'
- allow cancellation of outdated updates
- trigger offset changes with setOffset instead of on reply
- render pagination footer with `user_page.offset` instead of state.offset which only represents the _requested_ offset, not current view
- show login form for trying again, just like a password failure
- nicer, but more vague "try again" error for expired xsrf (original error still logged)
because users logging in don't need to know or understand xsrf stuff
- set fresh xsrf cookie when login page loads, to maximize time until expiration
- `{name}_input` for overriding full input
- `{name}_input_attrs` for overriding input element attributes (not including id).
Use `super()` to extend.
- For all `name` in username, password, otp
With this change, if we set
```
{% block username_input_attribs %}pattern="[a-z0-9]+"
placeholder="do not use email address, use your username"{% endblock username_input_attribs %}
```
We will get the following generated code
```
<input
id="username_input"
type="text"
autocapitalize="off"
autocorrect="off"
autocomplete="username"
class="form-control"
name="username"
val=""
tabindex="1"
autofocus="autofocus"
pattern="[a-z0-9]+"
placeholder="do not use email address, use your username"
/>
This allows to update the intersphinx url in a single location when
those move, an make it a tiny-bit easier to add existing packages than
having to figure out where their docs are.
- defer jupyter_core import that caused earlier, less informative ImportError
- point to `pip install jupyterhub[singleuser]` in the error
- use `raise from` so original import error is still reported
For security reasons, only allow-listed env vars in the parent
JupyterHub process are passed to the single-user server Python process.
This allow-list is controlled by `Spawner.env_keep`, which by default
includes common env vars that are (a) both necessary for the single-user
server process to work, (b) don't contain credentials or sensitive
information that shouldn't be revealed to users of the Notebook.
However, this allow-list was missing the `LD_LIBRARY_PATH` env var,
which causes shared library errors when using a relocated Python that
has been compiled in shared mode (`--enable-shared`). This prevents
JupyterHub from working out of the box on platforms like Heroku.
Fixes#4903.
instead of initialize, which should only create objects
improves symmetry with stop, should remove some warnings about unfinished coroutines in some tests
* The singleuser mixin is attempting to bypass jupyter_server's
interactive prompt on shutdown by stopping the IO loop.
* This does disable the interactive prompt, but also causes SIGINT
to be ignored causing SIGTERM to be issued after the timeout is hit.
* Closing the IO loop also prevents the server from closing async resources.
* This change allows jupyter_server to run its cleanup logic as
intended.
Secure contexts are a more robust way of checking that a browsing context
is authenticated and confidential. Compared to comparing the scheme this
covers cases where the connection is encrypted, but using a broken algorithm.
Notably, localhost is considered a secure context, even over HTTP.
For more detail on secure contexts, see:
https://developer.mozilla.org/en-US/docs/Web/Security/Secure_Contexts
cancels start rather than waiting for it to finish or timeout
also fixes cancellation when start_timeout is reached, which was previously left running forever
While doing https://github.com/jupyterhub/jupyterhub/pull/2726,
I realized we don't have a consistent way to format references
inside the docs. I now have them be formatted to match the name
of the file, but using `:` to separate them instead of `/` or `-`.
`/` makes it ambiguous when using with markdown link syntax, as
it could be a reference or a file. And using `-` is ambiguous, as
that can be the name of the file itself.
This PR does about half, I can do the other half later (unless
someone else does).
Allows limiting max expiration of tokens created via the API
Only affects the POST /api/tokens endpoint, not tokens issued by other means or created prior to config
Similar to 'kubespawner_override' in KubeSpawner, this allows
admins to selectivel override spawner configuration based on
groups a user belongs to. This allows for low maintenance but
extremely powerful customization based on group membership.
This is particularly powerful when combined with
https://github.com/jupyterhub/oauthenticator/pull/735
\#\# Dictionary vs List
Ordering is important here, but still I choose to implement this
configuration as a dictionary of dictionaries vs a list. This is
primarily to allow for easy overriding in z2jh (and similar places),
where Lists are just really hard to override. Ordering is provided
by lexicographically sorting the keys, similar to how we do it in z2jh.
\#\# Merging config
The merging code is literally copied from KubeSpawner, and provides
the exact same behavior. Documentation of how it acts is also copied.
- Missing `form-control` on a textbox gave it weird padding,
this fixes it.
- Add new server is set up as a [button addon](https://getbootstrap.com/docs/5.3/forms/input-group/#button-addons)
- Add a little right margin to the username in the navbar,
just before the logout button. Otherwise they were 'stuck'
to each other
set offset -> request page -> response sets offset is a recipe for races
instead, send request with new offset and only update offset state
made easier by consolidating page update requests into single loadPageData
- the asynccontextmanager object is available in the standard contextlib
module since Pyhton 3.7
- the aclosing object is available in the standard contextlib module
since Pyhton 3.10
- JupyterHub currently requires Python 3.8 or newer
we had some joins to trigger eager loading,
but then `query.count()` returns the count of (user, group) and (user, spawner) pairs, not the count of users
but the `joinedload` options added later are the right way to do that,
so these joins are unnecessary
GitHub Actions starts a log expansion group when it sees the string `[group]`,
which happens when that is a parametrize argument
which results in collapsing all subsequent test outputs
pytest.param lets us assign an id that is used in output, but not the value itself
Four tests were not using a mock authenticator:
- two get `reset_managed_roles_on_startup` toggled
- two get a custom implementation of `load_managed_roles`
'No template for 404' looks like something's wrong, when all it means to convey is that it doesn't get _special_ treatment
and the default error page is enough.
sevices/auth prevents calling check_xsrf_cookie,
but if the Handler itself called it the newly strict check would still be applied
this ensures the check is actually allowed for navigate GET requests
rather than attempting to clear multiple tokens (too complicated, breaks named servers)
look for and accept first valid token
have to do our own cookie parsing because existing cookie implementations only return a single value for each key
and default to selecting the _least_ likely to be correct, according to RFCs.
set updated xsrf cookie on login to avoid needing two requests to get the right cookie
# Conflicts:
# jupyterhub/tests/test_services_auth.py
need to inject our override into the base class,
rather than at the instance level,
to avoid clobbering any overrides in extensions like jupyter-server-proxy
almost every time installing docs/requirements.txt happens, JupyterHub is already installed
adding an `--editable` here ensures a full rebuild happens every time, which is very slow
- `expected_refresh_groups` was not used in this test case,
my guess is that it was accidentally copied over from
`test_auth_managed_groups`
- `expected_authenticated_groups` was defined only to define
the unused `expected_refresh_groups`
- `getRoleNames` was not following snake_case
and test coverage for allow_all and allow_existing_users interactions
PAMAuthenticator.allowed_groups is no longer mutually exclusive with allowed_users
Configuring `Authenticator.allowed_users` truthy makes other existing
users in JupyterHub's database be allowed access, this could come as a
surprise. This new config is meant to help avoid such surprise. With
this new config, a JupyterHub admin is able to directly declare if the
existing users in JupyterHub's database is to be granted access or not.
If `allow_existing_users` isn't explicity set, the default value will
be computed to True or False depending on if `allowed_users` is Truthy,
which makes the introduction of this config a non-breaking change.
This configuration was initially introduced in jupyterhub/oauthenticator
via https://github.com/jupyterhub/oauthenticator/pull/631, and is with
this PR being upstreamed to the base Authenticator class.
next to name filter, so it's not in the table headings
merges Running & Actions columns,
since it's really just Actions now (server actions & user actions)
removes in-page sort, which removes sort by server name, sort by running
Running column switches from sort to filter, matching the `?state` query parameter in the API
needs some CSS on the column widths to avoid jumps when toggling active servers
- persist offset, limit, name_filter in URL parameters,
so they are stable across page reload
- add UI element to specify items per page
This allows specifying a URL, which will show a specific view of a page of users
- accept 0 meaning no expiration, since folks have tried to use it that way
- clear error message for invalid (e.g. negative) values
- specify example in rest api doc so it doesn't default to invalid `0`
- better error if orm token fails to be retrieved
In my testing, Flask 3.0.0 doesn't accept returning only an integer
(as an error code) in a handler. A (content, status) tuple does
work. I don't know if this is a recent change, or if this has always
been broken, but the tuple return should be good for older Flask
versions as well.
For ordinary users to access the service, they need an appropriate
scope added to the user role. This adds that role in the
jupyterhub_config.py, as well as a note about this in the README.
It also updates the ouptut that comes form the whoami service.
$host is the hostname, $http_host is `hostname[:port]`, which is what's needed here
$host works fine in the example because it uses the default port 80, but if it's on a different port
it will differ from the http Host header, resulting in cross-origin check errors.
I just went through these with @jmunroe, and found the
db step a little confusing - there is no action to really be
taken here, as pretty much everyone just uses sqlite for
development (and even production). So I've just removed that
step, as python almost always ships with sqlite built into it.
JupyterHub uses semantic versioning and has been >1.0.0 for a long time. It should be fine for the hub and singleuser versions to differ in their minor component.
only use a random token as the actual oauth state,
and use a local cache dict to store the extra info like cookie_name, next_url
this avoids the state field getting too big and passing local browser-server info to anyone else
In the unusual situation of:
1. both sides having a filter on `servers`
2. those filters being different
3. _and_ some permissions are granted via higher-level group or user membership
Setting the full bind_url seems increasingly cumbersome,
as often one only wants to change the url prefix or the ip,
rather than setting the whole thing.
Add debug messages and timers for start and end waiting for servers
and improve logic for awaiting proxy endpoints using concurrency primitives instead of a for-loop
- Roles need to be explicitly granted, otherwise you get a
403. This example predates roles.
- Explicitly set bind_url - without this, JupyterHub itself doesn't
seem to bind anywhere, and so you just get a 404 when you visit
whatever port configurable-http-proxy lands on. This is probably
a separate bug to be investigated, but in the meantime copying
this from testing/jupyterhub_config.py makes this example actually
work
- Set DummyAuthenticator as the default, so users can get started
with this example
While it seems trivial, this can be a bit convoluted to debug on macOS
because some of the services might not be visible to the user logged in.
The solution is simple however knowing why it is needed is a good thing.
add eager loading of several relationships that are ~always used when the given objects are requested
add specific eager loading of spawners to the users query
- roles, groups (always needed to resolve permissions)
- APIToken.user, service
query on users filtered by spawner, with joinedload for relationships that will be checked
gets the same results, but in a single query with more efficient lookups
- avoid lookup-by-name of user and admin roles when assigning them
- filter users-to-update to only those that need updating, which should usually be empty
No change in behavior
rather than using multi-level subdomains, which are nicer,
use `--user` and `--service` so it's only one DNS level below hub.
This is not as nice, but is compatible with wildcard SSL which only allows one level of separation.
- "Missing or invalid credentials" if no credentials at all
- fix HTTP method name on actual xsrf check failures
- show scopes if authenticated but not authorized (no change, but now tested)
adding users via config anywhere makes them allowed
previously, this was _required_, so that it was always true for working config,
but config which allowed some users but declared others in groups or roles was forbidden.
Now, declaring a user anywhere _ensures_ the user is allowed rather than _enforcing_ it.
- suggest token page first
- remove caveat about JupyterHub 0.8, which can be assumed now
- undocument `jupyterhub token`
- refresh token page screenshots, and remove duplicate screenshot of the token page
- minor improvements to language in token page
`Configuration Reference` sounds like it's the place to go to see the full list of JupyterHub config options.
However it's not very readable as it's a plain-text dump of the output of `jupyterhub --generate-config`.
This links to some of the API doc pages instead, which present most of the information in an easier to read format. Unfortunately it also includes a lot of non-traitlets documentation.
Added Playwright tests which are covered Login, Spawning, Home and Token pages
Removed Selenium cases which are covered Login, Spawning, Home and Token pages
consistent behavior with oauth_client_allowed_scopes,
where the _intersection_ of requested and owner-held permissions is granted,
instead of failing
Enables different users to have different permissions in $JUPTYERHUB_API_TOKEN,
either via callables or via requesting as much as you may want and only granting the subset.
Additionally, the !server filter can now be correctly applied to the server token
default behavior is unchanged
According to the review updated cases covered the Admin UI:
- rename the function,
- add docstring,
- add parametrization,
- use the fixture admin_user instead of the function
mostly a copy (fork) of singleuser app
using public APIs instead of lots of patching.
opt-in via `JUPYTERHUB_SINGLEUSER_EXTENSION=1`
related changes:
- stop running a test single-user server in a thread. It's complicated and fragile.
Instead, run it normally, and get the info we need from a custom handler registered via an extension
via the `full_spawn` fixture
use async fixtures for simpler event-loop integration
several of these fixtures were written before fixtures themselves could be async,
but now they can, which means we can use async/await instead of run_sync.
removes some workarounds needed for sqlalchemy 1.1 + 2.0 support
1.4 backports most 2.0 behavior, keeping it off-by-default for an easier opt-in transition
opt-in with `session.future = True` flag
- avoid backref warnings by adding objects to session explicitly before creating any relationships
- remove unnecessary `[]` around scalar query
- use `text()` wrapper on connection.execute
- engine.execute is removed
- update import of declarative_base
- ensure RemovedIn20Warning is available for warnings filters on sqlalchemy < 1.4 (needs editable install to avoid pytest path mismatch)
- explicitly relay password in engine.url to alembic
Removes all Referer checks, which have proven unreliable and have never been particularly strong
We can use XSRF on paths for more robust inter-path protections.
- `_xsrf` is added for forms via hidden input
- xsrf check is additionally applied to GET requests on API endpoints
- jupyter.chameleoncloud SSL is failing (I can reproduce with conda curl, but not /usr/bin/curl, so seems to be a CA issue)
- remove dead arnesund tag link (keep single article link)
- Add html language attribute
- Rename logo's alt text so it clearly states the image's purpose
- Fix missing first level heading for Login, Home and Token page
- Fix missing header level 1 of Login page
- Fix low contrast issue of navbar
Co-authored-by: Min RK <benjaminrk@gmail.com>
Added table-as-dict function instead of few functions for working with the tokens table
replaced static value from locators.py by locator itself in test_browser
simplified menu-bar case
allows us to wait for the javascript to finish loading,
since clicking buttons won't do anything if we click before the js has registered click handlers
- several http->https
- a few page moves
- miniconda->miniforge
- remove rochester from gallery, which doesn't apepar to be publicly documented (may be accessible internally, but that's not for a public gallery)
adding checks token from db, adding some explanations to async sleep(still wip), renamed few cases (spawn_panding, token), adding "cleanup_after" fixture into def browser()
Our goal is to sustain a positive experience for both users and developers. We use GitHub issues for specific discussions related to changing a repository's content, and let the forum be where we can more generally help and inspire each other.
Thanks you for being an active member of our community! :heart:
Thank you for being an active member of our community! :heart:
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Please note that this repository is participating in a study into the sustainability of open source projects. Data will be gathered about this repository for approximately the next 12 months, starting from 2021-06-11.
Data collected will include the number of contributors, number of PRs, time taken to close/merge these PRs, and issues closed.
For more information, please visit
[our informational page](https://sustainable-open-science-and-software.github.io/) or download our [participant information sheet](https://sustainable-open-science-and-software.github.io/assets/PIS_sustainable_software.pdf).
[](https://github.com/jupyterhub/jupyterhub/actions)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
document.body.innerText="Rendered API specification doesn't work with file: protocol. Use sphinx-autobuild to do local builds of the docs, served over HTTP."
We use different channels of communication for different purposes. Whichever one you use will depend on what kind of communication you want to engage in.
@@ -19,7 +21,7 @@ We use [our Gitter channel](https://gitter.im/jupyterhub/jupyterhub) for online,
[Github issues](https://docs.github.com/en/issues/tracking-your-work-with-issues/about-issues) are used for most long-form project discussions, bug reports and feature requests.
- Issues related to a specific authenticator or spawner should be opened in the appropriate repository for the authenticator or spawner.
- If you are using a specific JupyterHub distribution (such as [Zero to JupyterHub on Kubernetes](http://github.com/jupyterhub/zero-to-jupyterhub-k8s) or [The Littlest JupyterHub](http://github.com/jupyterhub/the-littlest-jupyterhub/)), you should open issues directly in their repository.
- If you are using a specific JupyterHub distribution (such as [Zero to JupyterHub on Kubernetes](https://github.com/jupyterhub/zero-to-jupyterhub-k8s) or [The Littlest JupyterHub](https://github.com/jupyterhub/the-littlest-jupyterhub/)), you should open issues directly in their repository.
- If you cannot find a repository to open your issue in, do not worry! Open the issue in the [main JupyterHub repository](https://github.com/jupyterhub/jupyterhub/) and our community will help you figure it out.
Documentation is often more important than code. This page helps
you get set up on how to contribute to JupyterHub's documentation.
## Building documentation locally
We use [sphinx](https://www.sphinx-doc.org) to build our documentation. It takes
our documentation source files (written in [markdown](https://daringfireball.net/projects/markdown/) or [reStructuredText](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html) &
stored under the `docs/source` directory) and converts it into various
formats for people to read. To make sure the documentation you write or
change renders correctly, it is good practice to test it locally.
1. Make sure you have successfully completed {ref}`contributing:setup`.
2. Install the packages required to build the docs.
```bash
python3 -m pip install -r docs/requirements.txt
```
3. Build the html version of the docs. This is the most commonly used
output format, so verifying it renders correctly is usually good
enough.
```bash
cd docs
make html
```
This step will display any syntax or formatting errors in the documentation,
along with the filename / line number in which they occurred. Fix them,
and re-run the `make html` command to re-render the documentation.
4. View the rendered documentation by opening `_build/html/index.html` in
a web browser.
:::{tip}
**On Windows**, you can open a file from the terminal with `start <path-to-file>`.
**On macOS**, you can do the same with `open <path-to-file>`.
**On Linux**, you can do the same with `xdg-open <path-to-file>`.
After opening index.html in your browser you can just refresh the page whenever
you rebuild the docs via `make html`
:::
(contributing-docs-conventions)=
## Documentation conventions
This section lists various conventions we use in our documentation. This is a
living document that grows over time, so feel free to add to it / change it!
Our entire documentation does not yet fully conform to these conventions yet,
so help in making it so would be appreciated!
### `pip` invocation
There are many ways to invoke a `pip` command, we recommend the following
approach:
```bash
python3 -m pip
```
This invokes pip explicitly using the python3 binary that you are
currently using. This is the **recommended way** to invoke pip
in our documentation, since it is least likely to cause problems
with python3 and pip being from different environments.
For more information on how to invoke `pip` commands, see
We want you to contribute to JupyterHub in ways that are most exciting
and useful to you. We value documentation, testing, bug reporting & code equally,
and are glad to have your contributions in whatever form you wish.
Be sure to first check our [Code of Conduct](https://github.com/jupyter/governance/blob/HEAD/conduct/code_of_conduct.md)
([reporting guidelines](https://github.com/jupyter/governance/blob/HEAD/conduct/reporting_online.md)), which help keep our community welcoming to as many people as possible.
This section covers information about our community, as well as ways that you can connect and get involved.
for development & collaboration. You need to [install git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) to work on
JupyterHub. We also recommend getting a free account on GitHub.com.
## Setting up a development install
When developing JupyterHub, you would need to make changes and be able to instantly view the results of the changes. To achieve that, a developer install is required.
:::{note}
This guide does not attempt to dictate _how_ development
environments should be isolated since that is a personal preference and can
be achieved in many ways, for example, `tox`, `conda`, `docker`, etc. See this
[forum thread](https://discourse.jupyter.org/t/thoughts-on-using-tox/3497) for
a more detailed discussion.
:::
1. Clone the [JupyterHub git repository](https://github.com/jupyterhub/jupyterhub)
for development & collaboration. You need to `install git
<https://git-scm.com/book/en/v2/Getting-Started-Installing-Git>`_ to work on
JupyterHub. We also recommend getting a free account on GitHub.com.
Setting up a development install
================================
When developing JupyterHub, you would need to make changes and be able to instantly view the results of the changes. To achieve that, a developer install is required.
..note:: This guide does not attempt to dictate *how* development
environments should be isolated since that is a personal preference and can
be achieved in many ways, for example, `tox`, `conda`, `docker`, etc. See this
`forum thread <https://discourse.jupyter.org/t/thoughts-on-using-tox/3497>`_ for
a more detailed discussion.
1. Clone the `JupyterHub git repository <https://github.com/jupyterhub/jupyterhub>`_
2. Make sure the ``python`` you installed and the ``npm`` you installed
are available to you on the command line.
..code::bash
python -V
This should return a version number greater than or equal to 3.6.
..code::bash
npm -v
This should return a version number greater than or equal to 5.0.
3. Install ``configurable-http-proxy`` (required to run and test the default JupyterHub configuration) and ``yarn`` (required to build some components):
.. code:: bash
npm install -g configurable-http-proxy yarn
If you get an error that says ``Error: EACCES: permission denied``, you might need to prefix the command with ``sudo``.
``sudo`` may be required to perform a system-wide install.
If you do not have access to sudo, you may instead run the following commands:
..code::bash
npm install configurable-http-proxy yarn
exportPATH=$PATH:$(pwd)/node_modules/.bin
The second line needs to be run every time you open a new terminal.
If you are using conda you can instead run:
..code::bash
conda install configurable-http-proxy yarn
4. Install an editable version of JupyterHub and its requirements for
development and testing. This lets you edit JupyterHub code in a text editor
& restart the JupyterHub process to see your code changes immediately.
..code::bash
python3 -m pip install --editable ".[test]"
5. Set up a database.
The default database engine is ``sqlite`` so if you are just trying
to get up and running quickly for local development that should be
available via `Python <https://docs.python.org/3.5/library/sqlite3.html>`__.
See :doc:`/reference/database` for details on other supported databases.
6. You are now ready to start JupyterHub!
.. code:: bash
jupyterhub
7. You can access JupyterHub from your browser at
``http://localhost:8000`` now.
Happy developing!
Using DummyAuthenticator & SimpleLocalProcessSpawner
- `admin_user`: creates a new temporary admin user
- single user servers
\- `cleanup_after`: allows cleanup of single user servers between tests
- mocked service
\- `MockServiceSpawner`: a spawner that mocks services for testing with a short poll interval
\- `` mockservice` ``: mocked service with no external service url
\- `mockservice_url`: mocked service with a url to test external services
And fixtures to add functionality or spawning behavior:
- `admin_access`: grants admin access
- `` no_patience` ``: sets slow-spawning timeouts to zero
- `slow_spawn`: enables the SlowSpawner (a spawner that takes a few seconds to start)
- `never_spawn`: enables the NeverSpawner (a spawner that will never start)
- `bad_spawn`: enables the BadSpawner (a spawner that fails immediately)
- `slow_bad_spawn`: enables the SlowBadSpawner (a spawner that fails after a short delay)
Refer to the [pytest fixtures documentation](https://pytest.readthedocs.io/en/latest/fixture.html) to learn how to use fixtures that exists already and to create new ones.
### The Pytest-Asyncio Plugin
When testing the various JupyterHub components and their various implementations, it sometimes becomes necessary to have a running instance of JupyterHub to test against.
The [`app`](https://github.com/jupyterhub/jupyterhub/blob/270b61992143b29af8c2fab90c4ed32f2f6fe209/jupyterhub/tests/conftest.py#L60) fixture mocks a JupyterHub application for use in testing by:
- enabling ssl if internal certificates are available
- creating an instance of [MockHub](https://github.com/jupyterhub/jupyterhub/blob/270b61992143b29af8c2fab90c4ed32f2f6fe209/jupyterhub/tests/mocking.py#L221) using any provided configurations as arguments
- initializing the mocked instance
- starting the mocked instance
- finally, a registered finalizer function performs a cleanup and stops the mocked instance
The JupyterHub test suite uses the [pytest-asyncio plugin](https://pytest-asyncio.readthedocs.io/en/latest/) that handles [event-loop](https://docs.python.org/3/library/asyncio-eventloop.html) integration in [Tornado](https://www.tornadoweb.org/en/stable/) applications. This allows for the use of top-level awaits when calling async functions or [fixtures](https://docs.pytest.org/en/6.2.x/fixture.html#what-fixtures-are) during testing. All test functions and fixtures labelled as `async` will run on the same event loop.
```{note}
With the introduction of [top-level awaits](https://piccolo-orm.com/blog/top-level-await-in-python/), the use of the `io_loop` fixture of the [pytest-tornado plugin](https://www.tornadoweb.org/en/stable/ioloop.html) is no longer necessary. It was initially used to call coroutines. With the upgrades made to `pytest-asyncio`, this usage is now deprecated. It is now, only utilized within the JupyterHub test suite to ensure complete cleanup of resources used during testing such as open file descriptors. This is demonstrated in this [pull request](https://github.com/jupyterhub/jupyterhub/pull/4332).
More information is provided below.
```
One of the general goals of the [JupyterHub Pytest Plugin project](https://github.com/jupyterhub/pytest-jupyterhub) is to ensure the MockHub cleanup fully closes and stops all utilized resources during testing so the use of the `io_loop` fixture for teardown is not necessary. This was highlighted in this [issue](https://github.com/jupyterhub/pytest-jupyterhub/issues/30)
For more information on asyncio and event-loops, here are some resources:
- **Read**: [Introduction to the Python event loop](https://www.pythontutorial.net/python-concurrency/python-event-loop)
- **Read**: [Overview of Async IO in Python 3.7](https://stackabuse.com/overview-of-async-io-in-python-3-7)
- **Watch**: [Asyncio: Understanding Async / Await in Python](https://www.youtube.com/watch?v=bs9tlDFWWdQ)
- **Watch**: [Learn Python's AsyncIO #2 - The Event Loop](https://www.youtube.com/watch?v=E7Yn5biBZ58)
## Troubleshooting Test Failures
### All the tests are failing
Make sure you have completed all the steps in {ref}`contributing:setup` successfully, and are able to access JupyterHub from your browser at http://localhost:8000 after starting `jupyterhub` in your command line.
## Code formatting and linting
JupyterHub automatically enforces code formatting. This means that pull requests
with changes breaking this formatting will receive a commit from pre-commit.ci
automatically.
To automatically format code locally, you can install pre-commit and register a
_git hook_ to automatically check with pre-commit before you make a commit if
the formatting is okay.
```bash
pip install pre-commit
pre-commit install --install-hooks
```
To run pre-commit manually you would do:
```bash
# check for changes to code not yet committed
pre-commit run
# check for changes also in already committed code
pre-commit run --all-files
```
You may also install [black integration](https://github.com/psf/black#editor-integration)
into your text editor to format code automatically.
-``admin_user``: creates a new temporary admin user
- single user servers
- ``cleanup_after``: allows cleanup of single user servers between tests
- mocked service
- ``MockServiceSpawner``: a spawner that mocks services for testing with a short poll interval
- ``mockservice```: mocked service with no external service url
- ``mockservice_url``: mocked service with a url to test external services
And fixtures to add functionality or spawning behavior:
-``admin_access``: grants admin access
-``no_patience```: sets slow-spawning timeouts to zero
-``slow_spawn``: enables the SlowSpawner (a spawner that takes a few seconds to start)
-``never_spawn``: enables the NeverSpawner (a spawner that will never start)
-``bad_spawn``: enables the BadSpawner (a spawner that fails immediately)
-``slow_bad_spawn``: enables the SlowBadSpawner (a spawner that fails after a short delay)
Refer to the `pytest fixtures documentation <https://pytest.readthedocs.io/en/latest/fixture.html>`_ to learn how to use fixtures that exists already and to create new ones.
Troubleshooting Test Failures
=============================
All the tests are failing
-------------------------
Make sure you have completed all the steps in :ref:`contributing/setup` successfully, and are able to access JupyterHub from your browser at http://localhost:8000 after starting ``jupyterhub`` in your command line.
Code formatting and linting
===========================
JupyterHub automatically enforces code formatting. This means that pull requests
with changes breaking this formatting will receive a commit from pre-commit.ci
automatically.
To automatically format code locally, you can install pre-commit and register a
*git hook* to automatically check with pre-commit before you make a commit if
the formatting is okay.
..code::bash
pip install pre-commit
pre-commit install --install-hooks
To run pre-commit manually you would do:
..code::bash
# check for changes to code not yet committed
pre-commit run
# check for changes also in already committed code
pre-commit run --all-files
You may also install `black integration <https://github.com/psf/black#editor-integration>`_
into your text editor to format code automatically.
JupyterHub can be configured to record structured events from a running server using Jupyter's `Telemetry System`_. The types of events that JupyterHub emits are defined by `JSON schemas`_ listed at the bottom of this page_.
Event logging is handled by its ``Eventlog`` object. This leverages Python's standing logging_ library to emit, filter, and collect event data.
To begin recording events, you'll need to set two configurations:
1.``handlers``: tells the EventLog *where* to route your events. This trait is a list of Python logging handlers that route events to the event log file.
2.``allows_schemas``: tells the EventLog *which* events should be recorded. No events are emitted by default; all recorded events must be listed here.
Here's a basic example:
..code-block::
import logging
c.EventLog.handlers = [
logging.FileHandler('event.log'),
]
c.EventLog.allowed_schemas = [
'hub.jupyter.org/server-action'
]
The output is a file, ``"event.log"``, with events recorded as JSON data.
This page could be missing cross-links to other parts of
the documentation. You can help by adding them!
```
JupyterHub is not what you think it is. Most things you think are
part of JupyterHub are actually handled by some other component, for
example the spawner or notebook server itself, and it's not always
obvious how the parts relate. The knowledge contained here hasn't
been assembled in one place before, and is essential to understand
when setting up a sufficiently complex Jupyter(Hub) setup.
This document was originally written to assist in debugging: very
often, the actual problem is not where one thinks it is and thus
people can't easily debug. In order to tell this story, we start at
JupyterHub and go all the way down to the fundamental components of
Jupyter.
In this document, we occasionally leave things out or bend the truth
where it helps in explanation, and give our explanations in terms of
Python even though Jupyter itself is language-neutral. The "(&)"
symbol highlights important points where this page leaves out or bends
the truth for simplification of explanation, but there is more if you
dig deeper.
This guide is long, but after reading it you will be know of all major
components in the Jupyter ecosystem and everything else you read
should make sense.
## What is Jupyter?
Before we get too far, let's remember what our end goal is. A
**Jupyter Notebook** is nothing more than a Python(&) process
which is getting commands from a web browser and displaying the output
via that browser. What the process actually sees is roughly like
getting commands on standard input(&) and writing to standard
output(&). There is nothing intrinsically special about this process
- it can do anything a normal Python process can do, and nothing more.
The **Jupyter kernel** handles capturing output and converting things
such as graphics to a form usable by the browser.
Everything we explain below is building up to this, going through many
different layers which give you many ways of customizing how this
process runs.
## JupyterHub
**JupyterHub** is the central piece that provides multi-user
login capabilities. Despite this, the end user only briefly interacts with
JupyterHub and most of the actual Jupyter session does not relate to
the hub at all: the hub mainly handles authentication and creating (JupyterHub calls it "spawning") the
single-user server. In short, anything which is related to _starting_
the user's workspace/environment is about JupyterHub, anything about
_running_ usually isn't.
If you have problems connecting the authentication, spawning, and the
proxy (explained below), the issue is usually with JupyterHub. To
debug, JupyterHub has extensive logs which get printed to its console
and can be used to discover most problems.
The main pieces of JupyterHub are:
### Authenticator
JupyterHub itself doesn't actually manage your users. It has a
database of users, but it is usually connected with some other system
that manages the usernames and passwords. When someone tries to log
in to JupyteHub, it asks the
**authenticator**([basics](authenticators),
[reference](../reference/authenticators)) if the
username/password is valid(&). The authenticator returns a username(&),
which is passed on to the spawner, which has to use it to start that
user's environment. The authenticator can also return user
groups and admin status of users, so that JupyterHub can do some
higher-level management.
The following authenticators are included with JupyterHub:
- **PAMAuthenticator** uses the standard Unix/Linux operating system
functions to check users. Roughly, if someone already has access to
the machine (they can log in by ssh), they will be able to log in to
JupyterHub without any other setup. Thus, JupyterHub fills the role
of a ssh server, but providing a web-browser based way to access the
machine.
There are [plenty of others to choose from](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).
You can connect to almost any other existing service to manage your
users. You either use all users from this other service (e.g. your
company), or enable only the allowed users (e.g. your group's
Github usernames). Some other popular authenticators include:
- **OAuthenticator** uses the standard OAuth protocol to verify users.
For example, you can easily use Github to authenticate your users -
people have a "click to login with Github" button. This is often
done with a allowlist to only allow certain users.
- **NativeAuthenticator** actually stores and validates its own
usernames and passwords, unlike most other authenticators. Thus,
you can manage all your users within JupyterHub only.
- There are authenticators for LTI (learning management systems),
Shibboleth, Kerberos - and so on.
The authenticator is configured with the
`c.JupyterHub.authenticator_class` configuration option in the
`jupyterhub_config.py` file.
The authenticator runs internally to the Hub process but communicates
with outside services.
If you have trouble logging in, this is usually a problem of the
authenticator. The authenticator logs are part of the the JupyterHub
logs, but there may also be relevant information in whatever external
services you are using.
### Spawner
The **spawner** ([basics](spawners),
[reference](../reference/spawners)) is the real core of
JupyterHub: when someone wants a notebook server, the spawner allocates
resources and starts the server. The notebook server could run on the
same machine as JupyterHub, on another machine, on some cloud service,
or more. Administrators can limit resources (CPU, memory) or isolate users
from each other - if the spawner supports it. They can also do no
limiting and allow any user to access any other user's files if they
are not configured properly.
Some basic spawners included in JupyterHub are:
- **LocalProcessSpawner** is built into JupyterHub. Upon launch it tries
to switch users to the given username (`su` (&)) and start the
notebook server. It requires that the hub be run as root (because
only root has permission to start processes as other user IDs).
LocalProcessSpawner is no different than a user logging in with
something like `ssh` and running `jupyter notebook`. PAMAuthenticator and
LocalProcessSpawner is the most basic way of using JupyterHub (and
what it does out of the box) and makes the hub not too dissimilar to
an advanced ssh server.
There are [many more advanced spawners](/reference/spawners), and to
show the diversity of spawning strategys some are listed below:
- **SudoSpawner** is like LocalProcessSpawner but lets you run
JupyterHub without root. `sudo` has to be configured to allow the
hub's user to run processes under other user IDs.
- **SystemdSpawner** uses Systemd to start other processes. It can
isolate users from each other and provide resource limiting.
- **DockerSpawner** runs stuff in Docker, a containerization system.
This lets you fully isolate users, limit CPU, memory, and provide
other container images to fully customize the environment.
- **KubeSpawner** runs on the Kubernetes, a cloud orchestration
system. The spawner can easily limit users and provide cloud
scaling - but the spawner doesn't actually do that, Kubernetes
does. The spawner just tells Kubernetes what to do. If you want to
get KubeSpawner to do something, first you would figure out how to
do it in Kubernetes, then figure out how to tell KubeSpawner to tell
Kubernetes that. Actually... this is true for most spawners.
- **BatchSpawner** runs on computer clusters with batch job scheduling
systems (e.g Slurm, HTCondor, PBS, etc). The user processes are run
as batch jobs, having access to all the data and software that the
users normally will.
In short, spawners are the interface to the rest of the operating
system, and to configure them right you need to know a bit about how
the corresponding operating system service works.
The spawner is responsible for the environment of the single-user
notebook servers (described in the next section). In the end, it just
makes a choice about how to start these processes: for example, the
Docker spawner starts a normal Docker container and runs the right
command inside of it. Thus, the spawner is responsible for setting
what kind of software and data is available to the user.
The spawner runs internally to the Hub process but communicates with
outside services. It is configured by `c.JupyterHub.spawner_class` in
`jupyterhub_config.py`.
If a user tries to launch a notebook server and it doesn't work, the
error is usually with the spawner or the notebook server (as described
in the next section). Each spawner outputs some logs to the main
JupyterHub logs, but may also have logs in other places depending on
what services it interacts with (for example, the Docker spawner
somehow puts logs in the Docker system services, Kubernetes through
the `kubectl` API).
### Proxy
The JupyterHub **proxy** relays connections between the users
and their single-user notebook servers. What this basically means is
that the hub itself can shut down and the proxy can continue to
allow users to communicate with their notebook servers. (This
further emphasizes that the hub is responsible for starting, not
running, the notebooks). By default, the hub starts the proxy
automatically
and stops the proxy when the hub stops (so that connections get
interrupted). But when you [configure the proxy to run
separately](howto:separate-proxy),
user's connections will continue to work even without the hub.
The default proxy is **ConfigurableHttpProxy** which is simple but
effective. A more advanced option is the [**Traefik Proxy**](https://blog.jupyter.org/introducing-traefikproxy-a-new-jupyterhub-proxy-based-on-traefik-4839e972faf6),
which gives you redundancy and high-availability.
When users "connect to JupyterHub", they _always_ first connect to the
proxy and the proxy relays the connection to the hub. Thus, the proxy
is responsible for SSL and accepting connections from the rest of the
internet. The user uses the hub to authenticate and start the server,
and then the hub connects back to the proxy to adjust the proxy routes
for the user's server (e.g. the web path `/user/someone` redirects to
the server of someone at a certain internal address). The proxy has
to be able to internally connect to both the hub and all the
single-user servers.
The proxy always runs as a separate process to JupyterHub (even though
JupyterHub can start it for you). JupyterHub has one set of
configuration options for the proxy addresses (`bind_url`) and one for
the hub (`hub_bind_url`). If `bind_url` is given, it is just passed to
the automatic proxy to tell it what to do.
If you have problems after users are redirected to their single-user
notebook servers, or making the first connection to the hub, it is
usually caused by the proxy. The ConfigurableHttpProxy's logs are
mixed with JupyterHub's logs if it's started through the hub (the
default case), otherwise from whatever system runs the proxy (if you
do configure it, you'll know).
### Services
JupyterHub has the concept of **services** ([basics](tutorial:services),
[reference](services-reference)), which are other web services
started by the hub, but otherwise are not necessarily related to the
hub itself. They are often used to do things related to Jupyter
(things that user interacts with, usually not the hub), but could
always be run some other way. Running from the hub provides an easy
way to get Hub API tokens and authenticate users against the hub. It
can also automatically add a proxy route to forward web requests to
JupyterHub uses a database to store information about users, services, and other data needed for operating the Hub.
This is the **state** of the Hub.
## Why does JupyterHub have a database?
JupyterHub is a **stateful** application (more on that 'state' later).
Updating JupyterHub's configuration or upgrading the version of JupyterHub requires restarting the JupyterHub process to apply the changes.
We want to minimize the disruption caused by restarting the Hub process, so it can be a mundane, frequent, routine activity.
Storing state information outside the process for later retrieval is necessary for this, and one of the main thing databases are for.
A lot of the operations in JupyterHub are also **relationships**, which is exactly what SQL databases are great at.
For example:
- Given an API token, what user is making the request?
- Which users don't have running servers?
- Which servers belong to user X?
- Which users have not been active in the last 24 hours?
Finally, a database allows us to have more information stored without needing it all loaded in memory,
e.g. supporting a large number (several thousands) of inactive users.
## What's in the database?
The short answer of what's in the JupyterHub database is "everything."
JupyterHub's **state** lives in the database.
That is, everything JupyterHub needs to be aware of to function that _doesn't_ come from the configuration files, such as
- users, roles, role assignments
- state, urls of running servers
- Hashed API tokens
- Short-lived state related to OAuth flow
- Timestamps for when users, tokens, and servers were last used
### What's _not_ in the database
Not _quite_ all of JupyterHub's state is in the database.
This mostly involves transient state, such as the 'pending' transitions of Spawners (starting, stopping, etc.).
Anything not in the database must be reconstructed on Hub restart, and the only sources of information to do that are the database and JupyterHub configuration file(s).
## How does JupyterHub use the database?
JupyterHub makes some _unusual_ choices in how it connects to the database.
These choices represent trade-offs favoring single-process simplicity and performance at the expense of horizontal scalability (multiple Hub instances).
We often say that the Hub 'owns' the database.
This ownership means that we assume the Hub is the only process that will talk to the database.
This assumption enables us to make several caching optimizations that dramatically improve JupyterHub's performance (i.e. data written recently to the database can be read from memory instead of fetched again from the database) that would not work if multiple processes could be interacting with the database at the same time.
Database operations are also synchronous, so while JupyterHub is waiting on a database operation, it cannot respond to other requests.
This allows us to avoid complex locking mechanisms, because transaction races can only occur during an `await`, so we only need to make sure we've completed any given transaction before the next `await` in a given request.
:::{note}
We are slowly working to remove these assumptions, and moving to a more traditional db session per-request pattern.
This will enable multiple Hub instances and enable scaling JupyterHub, but will significantly reduce the number of active users a single Hub instance can serve.
:::
### Database performance in a typical request
Most authenticated requests to JupyterHub involve a few database transactions:
1. look up the authenticated user (e.g. look up token by hash, then resolve owner and permissions)
2. record activity
3. perform any relevant changes involved in processing the request (e.g. create the records for a running server when starting one)
This means that the database is involved in almost every request, but only in quite small, simple queries, e.g.:
- lookup one token by hash
- lookup one user by name
- list tokens or servers for one user (typically 1-10)
- etc.
### The database as a limiting factor
As a result of the above transactions in most requests, database performance is the _leading_ factor in JupyterHub's baseline requests-per-second performance, but that cost does not scale significantly with the number of users, active or otherwise.
However, the database is _rarely_ a limiting factor in JupyterHub performance in a practical sense, because the main thing JupyterHub does is start, stop, and monitor whole servers, which take far more time than any small database transaction, no matter how many records you have or how slow your database is (within reason).
Additionally, there is usually _very_ little load on the database itself.
By far the most taxing activity on the database is the 'list all users' endpoint, primarily used by the [idle-culling service](https://github.com/jupyterhub/jupyterhub-idle-culler).
Database-based optimizations have been added to make even these operations feasible for large numbers of users:
1. State filtering on [GET /hub/api/users?state=active](rest-api-get-users),
which limits the number of results in the query to only the relevant subset (added in JupyterHub 1.3), rather than all users.
2. [Pagination](api-pagination) of all list endpoints, allowing the request of a large number of resources to be more fairly balanced with other Hub activities across multiple requests (added in 2.0).
:::{note}
It's important to note when discussing performance and limiting factors and that all of this only applies to requests to `/hub/...`.
The Hub and its database are not involved in most requests to single-user servers (`/user/...`), which is by design, and largely motivated by the fact that the Hub itself doesn't _need_ to be fast because its operations are infrequent and large.
:::
## Database backends
JupyterHub supports a variety of database backends via [SQLAlchemy][].
The default is sqlite, which works great for many cases, but you should be able to use many backends supported by SQLAlchemy.
Usually, this will mean PostgreSQL or MySQL, both of which are officially supported and well tested with JupyterHub, but others may work as well.
See [SQLAlchemy's docs][sqlalchemy-dialect] for how to connect to different database backends.
Doing so generally involves:
1. installing a Python package that provides a client implementation, and
2. setting [](JupyterHub.db_url) to connect to your database with the specified implementation
The default database backend for JupyterHub is [SQLite](https://sqlite.org).
We have chosen SQLite as JupyterHub's default because it's simple (the 'database' is a single file) and ubiquitous (it is in the Python standard library).
It works very well for testing, small deployments, and workshops.
For production systems, SQLite has some disadvantages when used with JupyterHub:
-`upgrade-db` may not always work, and you may need to start with a fresh database
-`downgrade-db`**will not** work if you want to rollback to an earlier
version, so backup the `jupyterhub.sqlite` file before upgrading (JupyterHub automatically creates a date-stamped backup file when upgrading sqlite)
The sqlite documentation provides a helpful page about [when to use SQLite and
where traditional RDBMS may be a better choice](https://sqlite.org/whentouse.html).
### Picking your database backend (PostgreSQL, MySQL)
When running a long term deployment or a production system, we recommend using a full-fledged relational database, such as [PostgreSQL](https://www.postgresql.org) or [MySQL](https://www.mysql.com), that supports the SQL `ALTER TABLE` statement, which is used in some database upgrade steps.
In general, you select your database backend with [](JupyterHub.db_url), and can further configure it (usually not necessary) with [](JupyterHub.db_kwargs).
## Notes and Tips
### SQLite
The SQLite database should not be used on NFS. SQLite uses reader/writer locks
to control access to the database. This locking mechanism might not work
correctly if the database file is kept on an NFS filesystem. This is because
`fcntl()` file locking is broken on many NFS implementations. Therefore, you
should avoid putting SQLite database files on NFS since it will not handle well
multiple processes which might try to access the file at the same time.
### PostgreSQL
We recommend using PostgreSQL for production if you are unsure whether to use
MySQL or PostgreSQL or if you do not have a strong preference.
There is additional configuration required for MySQL that is not needed for PostgreSQL.
For example, to connect to a PostgreSQL database with psycopg2:
1. install psycopg2: `pip install psycopg2` (or `psycopg2-binary` to avoid compilation, which is [not recommended for production][psycopg2-binary])
2. set authentication via environment variables `PGUSER` and `PGPASSWORD`
- You should probably use the `pymysql` or `mysqlclient` sqlalchemy provider, or another backend [recommended by sqlalchemy](https://docs.sqlalchemy.org/en/20/dialects/mysql.html#dialect-mysql)
- You also need to set `pool_recycle` to some value (typically 60 - 300, JupyterHub will default to 60)
which depends on your MySQL setup. This is necessary since MySQL kills
connections serverside if they've been idle for a while, and the connection
from the hub will be idle for longer than most connections. This behavior
will lead to frustrating 'the connection has gone away' errors from
sqlalchemy if `pool_recycle` is not set.
- If you use `utf8mb4` collation with MySQL earlier than 5.7.7 or MariaDB
earlier than 10.2.1 you may get an `1709, Index column size too large` error.
To fix this you need to set `innodb_large_prefix` to enabled and
`innodb_file_format` to `Barracuda` to allow for the index sizes jupyterhub
uses. `row_format` will be set to `DYNAMIC` as long as those options are set
correctly. Later versions of MariaDB and MySQL should set these values by
default, as well as have a default `DYNAMIC` `row_format` and pose no trouble
to users.
For example, to connect to a mysql database with mysqlclient:
_Explanation_ documentation provide big-picture descriptions of how JupyterHub works. This section is meant to build your understanding of particular topics.
When a user logs into JupyterHub, they get a 'server', which we usually call the **single-user server**, because it's a server that's meant for a single JupyterHub user.
Each JupyterHub user gets a different one (or more than one!).
A single-user server is a process running somewhere that is:
1. accessible over http[s],
2. authenticated via JupyterHub using OAuth 2.0,
3. started by a [Spawner](spawners), and
4. 'owned' by a single JupyterHub user
## The single-user server command
The Spawner's default single-user server startup command, `jupyterhub-singleuser`, launches `jupyter-server`, the same program used when you run `jupyter lab` on your laptop.
(_It can also launch the legacy `jupyter-notebook` server_).
That's why JupyterHub looks familiar to folks who are already using Jupyter at home or elsewhere.
It's the same!
`jupyterhub-singleuser`_customizes_ that program to change (approximately) one thing: **authenticate requests with JupyterHub**.
(singleuser-auth)=
## Single-user server authentication
Implementation-wise, JupyterHub single-user servers are a special-case of {ref}`services-reference`
and as such use the same (OAuth) authentication mechanism (more on OAuth in JupyterHub at [](oauth)).
This is primarily implemented in the {class}`~.HubOAuth` class.
This code resides in `jupyterhub.singleuser` subpackage of JupyterHub.
The main task of this code is to:
1. resolve a JupyterHub token to a JupyterHub user (authenticate)
2. check permissions (`access:servers`) for the token to make sure the request should be allowed (authorize)
3. if not authorized, begin the OAuth process with a redirect to the Hub
4. after login, store OAuth tokens in a cookie only used by this single-user server
5. implement logout to clear the cookie
Most of this is implemented in the {class}`~.HubOAuth` class. `jupyterhub.singleuser` is responsible for _adapting_ the base Jupyter Server to use HubOAuth for these tasks.
### JupyterHub authentication extension
By default, `jupyter-server` uses its own cookie to authenticate.
If that cookie is not present, the server redirects you a login page and asks you to enter a password or token.
Jupyter Server 2.0 introduces two new _APIs_ for customizing authentication: the [IdentityProvider](inv:jupyter-server#jupyter_server.auth.IdentityProvider) and the [Authorizer](inv:jupyter-server#jupyter_server.auth.Authorizer).
More information can be found in the [Jupyter Server documentation](https://jupyter-server.readthedocs.io).
JupyterHub implements these APIs in `jupyterhub.singleuser.extension`.
The IdentityProvider is responsible for _authenticating_ requests.
In JupyterHub, that means extracting OAuth tokens from the request and resolving them to a JupyterHub user.
The Authorizer is a _separate_ API for _authorizing_ actions on particular resources.
Because the JupyterHub IdentityProvider only allows _authenticating_ users who already have the necessary `access:servers` permission to access the server, the default Authorizer only contains a redundant check for this same permission, and ignores the resource inputs.
However, specifying a _custom_ Authorizer allows for granular permissions, such as read-only access to subsets of a shared server.
### JupyterHub authentication via subclass
Prior to Jupyter Server 2 (i.e. Jupyter Server 1.x or the legacy `jupyter-notebook` server), JupyterHub authentication is applied via _subclass_.
Originally a subclass of `NotebookApp`,
this approach works with both `jupyter-server` and `jupyter-notebook`.
Instead of using the extension mechanisms above,
the server application is _subclassed_. This worked well in the `jupyter-notebook` days,
but doesn't fit well with Jupyter Server's extension-based architecture.
Using the JupyterHub singleuser-server extension is the default behavior of JupyterHub 4 and Jupyter Server 2, otherwise the subclass approach is taken.
You can opt-out of the extension by setting the environment variable `JUPYTERHUB_SINGLEUSER_EXTENSION=0`:
```python
c.Spawner.environment.update(
{
"JUPYTERHUB_SINGLEUSER_EXTENSION":"0",
}
)
```
The subclass approach will also be taken if you've opted to use the classic notebook server with:
```
JUPYTERHUB_SINGLEUSER_APP=notebook
```
which was introduced in JupyterHub 2.
## Other customizations
`jupyterhub-singleuser` makes other small customizations to how the single-user server behaves:
1. logs activity on the single-user server, used in [idle-culling](https://github.com/jupyterhub/jupyterhub-idle-culler).
2. disables some features that don't make sense in JupyterHub (trash, retrying ports)
3. loading options such as URLs and SSL configuration from the environment
4. customize logging for consistency with JupyterHub logs
## Running a single-user server that's not `jupyterhub-singleuser`
By default, `jupyterhub-singleuser` is the same `jupyter-server` used by JupyterLab, Jupyter notebook (>= 7), etc.
But technically, all JupyterHub cares about is that it is:
1. an http server at the prescribed URL, accessible from the Hub and proxy, and
2. authenticated via [OAuth](oauth) with the Hub (it doesn't even have to do this, if you want to do your own authentication, as is done in BinderHub)
which means that you can customize JupyterHub to launch _any_ web application that meets these criteria, by following the specifications in {ref}`services-reference`.
Most of the time, though, it's easier to use [jupyter-server-proxy](https://jupyter-server-proxy.readthedocs.io) if you want to launch additional web applications in JupyterHub.
The **Security Overview** section helps you learn about:
- the design of JupyterHub with respect to web security
- the semi-trusted user
- the available mitigations to protect untrusted users from each other
- the value of periodic security audits
This overview also helps you obtain a deeper understanding of how JupyterHub
works.
## Semi-trusted and untrusted users
JupyterHub is designed to be a _simple multi-user server for modestly sized
groups_ of **semi-trusted** users. While the design reflects serving
semi-trusted users, JupyterHub can also be suitable for serving **untrusted** users,
but **is not suitable for untrusted users** in its default configuration.
As a result, using JupyterHub with **untrusted** users means more work by the
administrator, since much care is required to secure a Hub, with extra caution on
protecting users from each other.
One aspect of JupyterHub's _design simplicity_ for **semi-trusted** users is that
the Hub and single-user servers are placed in a _single domain_, behind a
[_proxy_][configurable-http-proxy]. If the Hub is serving untrusted
users, many of the web's cross-site protections are not applied between
single-user servers and the Hub, or between single-user servers and each
other, since browsers see the whole thing (proxy, Hub, and single user
servers) as a single website (i.e. single domain).
## Protect users from each other
To protect users from each other, a user must **never** be able to write arbitrary
HTML and serve it to another user on the Hub's domain. This is prevented by JupyterHub's
authentication setup because only the owner of a given single-user notebook server is
allowed to view user-authored pages served by the given single-user notebook
server.
To protect all users from each other, JupyterHub administrators must
ensure that:
- A user **does not have permission** to modify their single-user notebook server,
including:
- the installation of new packages in the Python environment that runs
their single-user server;
- the creation of new files in any `PATH` directory that precedes the
directory containing `jupyterhub-singleuser` (if the `PATH` is used
to resolve the single-user executable instead of using an absolute path);
- the modification of environment variables (e.g. PATH, PYTHONPATH) for
their single-user server;
- the modification of the configuration of the notebook server
(the `~/.jupyter` or `JUPYTER_CONFIG_DIR` directory).
- unrestricted selection of the base environment (e.g. the image used in container-based Spawners)
If any additional services are run on the same domain as the Hub, the services
**must never** display user-authored HTML that is neither _sanitized_ nor _sandboxed_
to any user that lacks authentication as the author of a file.
### Sharing access to servers
Because sharing access to servers (via `access:servers` scopes or the sharing feature in JupyterHub 5) by definition means users can serve each other files, enabling sharing is not suitable for untrusted users without also enabling per-user domains.
JupyterHub does not enable any sharing by default.
## Mitigate security issues
The several approaches to mitigating security issues with configuration
options provided by JupyterHub include:
(subdomains)=
### Enable user subdomains
JupyterHub provides the ability to run single-user servers on their own
domains. This means the cross-origin protections between servers has the
desired effect, and user servers and the Hub are protected from each other.
**Subdomains are the only way to reliably isolate user servers from each other.**
When subdomains are enabled, each user's single-user server will be at e.g. `https://username.jupyter.example.org`.
This also requires all user subdomains to point to the same address,
which is most easily accomplished with wildcard DNS, where a single A record points to your server and a wildcard CNAME record points to your A record:
```
A jupyter.example.org 192.168.1.123
CNAME *.jupyter.example.org jupyter.example.org
```
Since this spreads the service across multiple domains, you will likely need wildcard SSL as well,
matching `*.jupyter.example.org`.
Unfortunately, for many institutional domains, wildcard DNS and SSL may not be available.
We also **strongly encourage** serving JupyterHub and user content on a domain that is _not_ a subdomain of any sensitive content.
For reasoning, see [GitHub's discussion of moving user content to github.io from \*.github.com](https://github.blog/2013-04-09-yummy-cookies-across-domains/).
**If you do plan to serve untrusted users, enabling subdomains is highly encouraged**,
as it resolves many security issues, which are difficult to unavoidable when JupyterHub is on a single-domain.
:::{important}
JupyterHub makes no guarantees about protecting users from each other unless subdomains are enabled.
If you want to protect users from each other, you **_must_** enable per-user domains.
:::
### Disable user config
If subdomains are unavailable or undesirable, JupyterHub provides a
configuration option `Spawner.disable_user_config = True`, which can be set to prevent
the user-owned configuration files from being loaded. After implementing this
option, `PATH`s and package installation are the other things that the
admin must enforce.
### Prevent spawners from evaluating shell configuration files
For most Spawners, `PATH` is not something users can influence, but it's important that
the Spawner should _not_ evaluate shell configuration files prior to launching the server.
### Isolate packages in a read-only environment
The user must not have permission to install packages into the environment where the singleuser-server runs.
On a shared system, package isolation is most easily handled by running the single-user server in
a root-owned virtualenv with disabled system-site-packages.
The user must not have permission to install packages into this environment.
The same principle extends to the images used by container-based deployments.
If users can select the images in which their servers run, they can disable all security for their own servers.
It is important to note that the control over the environment is only required for the
single-user server, and not the environment(s) in which the users' kernel(s)
may run. Installing additional packages in the kernel environment does not
pose additional risk to the web application's security.
### Encrypt internal connections with SSL/TLS
By default, all communications within JupyterHub—between the proxy, hub, and single
-user notebooks—are performed unencrypted. Setting the `internal_ssl` flag in
`jupyterhub_config.py` secures the aforementioned routes. Turning this
feature on does require that the enabled `Spawner` can use the certificates
generated by the `Hub` (the default `LocalProcessSpawner` can, for instance).
It is also important to note that this encryption **does not** cover the
`zmq tcp` sockets between the Notebook client and kernel yet. While users cannot
submit arbitrary commands to another user's kernel, they can bind to these
sockets and listen. When serving untrusted users, this eavesdropping can be
mitigated by setting `KernelManager.transport` to `ipc`. This applies standard
Unix permissions to the communication sockets thereby restricting
communication to the socket owner. The `internal_ssl` option will eventually
extend to securing the `tcp` sockets as well.
### Mitigating same-origin deployments
While per-user domains are **required** for robust protection of users from each other,
you can mitigate many (but not all) cross-user issues.
First, it is critical that users cannot modify their server environments, as described above.
Second, it is important that users do not have `access:servers` permission to any server other than their own.
If users can access each others' servers, additional security measures must be enabled, some of which come with distinct user-experience costs.
Without the [Same-Origin Policy] (SOP) protecting user servers from each other,
each user server is considered a trusted origin for requests to each other user server (and the Hub itself).
Servers _cannot_ meaningfully distinguish requests originating from other user servers,
because SOP implies a great deal of trust, losing many restrictions applied to cross-origin requests.
That means pages served from each user server can:
1. arbitrarily modify the path in the Referer
2. make fully authorized requests with cookies
3. access full page contents served from the hub or other servers via popups
JupyterHub uses distinct xsrf tokens stored in cookies on each server path to attempt to limit requests across.
This has limitations because not all requests are protected by these XSRF tokens,
and unless additional measures are taken, the XSRF tokens from other user prefixes may be retrieved.
`allow-popups` is not disabled by default because disabling it breaks legitimate functionality, like "Open this in a new tab", and the "JupyterHub Control Panel" menu item.
To reiterate, the right way to avoid these issues is to enable per-user domains, where none of these concerns come up.
Note: even this level of protection requires administrators maintaining full control over the user server environment.
If users can modify their server environment, these methods are ineffective, as users can readily disable them.
### Cookie tossing
Cookie tossing is a technique where another server on a subdomain or peer subdomain can set a cookie
which will be read on another domain.
This is not relevant unless there are other user-controlled servers on a peer domain.
"Domain-locked" cookies avoid this issue, but have their own restrictions:
- JupyterHub must be served over HTTPS
- All secure cookies must be set on `/`, not on sub-paths, which means they are shared by all JupyterHub components in a single-domain deployment.
As a result, this option is only recommended when per-user subdomains are enabled,
to prevent sending all jupyterhub cookies to all user servers.
To enable domain-locked cookies, set:
```python
c.JupyterHub.cookie_host_prefix_enabled=True
```
```{versionadded} 4.1
```
### Forced-login
Jupyter servers can share links with `?token=...`.
JupyterHub prior to 5.0 will accept this request and persist the token for future requests.
This is useful for enabling admins to create 'fully authenticated' links bypassing login.
However, it also means users can share their own links that will log other users into their own servers,
enabling them to serve each other notebooks and other arbitrary HTML, depending on server configuration.
```{versionadded} 4.1
Setting environment variable `JUPYTERHUB_ALLOW_TOKEN_IN_URL=0` in the single-user environment can opt out of accepting token auth in URL parameters.
```
```{versionadded} 5.0
Accepting tokens in URLs is disabled by default, and `JUPYTERHUB_ALLOW_TOKEN_IN_URL=1` environment variable must be set to _allow_ token auth in URL parameters.
```
## Security audits
We recommend that you do periodic reviews of your deployment's security. It's
good practice to keep [JupyterHub](https://readthedocs.org/projects/jupyterhub/), [configurable-http-proxy][], and [nodejs
versions](https://github.com/nodejs/Release) up to date.
and can look different depending on what you mean by 'share.'
Your first instinct might be to copy the URL you see in the browser,
e.g. `jupyterhub.example/user/yourname/notebooks/coolthing.ipynb`,
but this usually won't work, depending on the permissions of the person you share the link with.
Unfortunately, 'share' means at least a few things to people in a JupyterHub context.
We'll cover 3 common cases here, when they are applicable, and what assumptions they make:
1. sharing links that will open the same file on the visitor's own server
2. sharing links that will bring the visitor to _your_ server (e.g. for real-time collaboration, or RTC)
3. publishing notebooks and sharing links that will download the notebook into the user's server
### link to the same file on the visitor's server
This is for the case where you have JupyterHub on a shared (or sufficiently similar) filesystem, where you want to share a link that will cause users to login and start their _own_ server, to view or edit the file.
**Assumption:** the same path on someone else's server is valid and points to the same file
This is useful in e.g. classes where you know students have certain files in certain locations, or collaborations where you know you have a shared filesystem where everyone has access to the same files.
A link should look like `https://jupyterhub.example/hub/user-redirect/lab/tree/foo.ipynb`.
You can hand-craft these URLs from the URL you are looking at, where you see `/user/name/lab/tree/foo.ipynb` use `/hub/user-redirect/lab/tree/foo.ipynb` (replace `/user/name/` with `/hub/user-redirect/`).
Or you can use JupyterLab's "copy shareable link" in the context menu in the file browser:

which will produce a correct URL with `/hub/user-redirect/` in it.
### link to the file on your server
This is for the case where you want to both be using _your_ server, e.g. for real-time collaboration (RTC).
**Assumption:** the user has (or should have) access to your server.
**Assumption:** your server is running _or_ the user has permission to start it.
By default, JupyterHub users don't have access to each other's servers, but JupyterHub 2.0 administrators can grant users limited access permissions to each other's servers.
If the visitor doesn't have access to the server, these links will result in a 403 Permission Denied error.
In many cases, for this situation you can copy the link in your URL bar (`/user/yourname/lab`), or you can add `/tree/path/to/specific/notebook.ipynb` to open a specific file.
The [jupyterlab-link-share] JupyterLab extension generates these links, and even can _grant_ other users access to your server.
### How can I kill ports from JupyterHub-managed services that have been orphaned?
@@ -165,7 +167,7 @@ When your whole JupyterHub sits behind an organization proxy (_not_ a reverse pr
### Launching Jupyter Notebooks to run as an externally managed JupyterHub service with the `jupyterhub-singleuser` command returns a `JUPYTERHUB_API_TOKEN` error
[JupyterHub services](https://jupyterhub.readthedocs.io/en/stable/reference/services.html) allow processes to interact with JupyterHub's REST API. Example use-cases include:
{ref}`services-reference` allow processes to interact with JupyterHub's REST API. Example use-cases include:
- **Secure Testing**: provide a canonical Jupyter Notebook for testing production data to reduce the number of entry points into production systems.
- **Grading Assignments**: provide access to shared Jupyter Notebooks that may be used for management tasks such as grading assignments.
@@ -196,6 +198,23 @@ With a docker container, pass in the environment variable with the run command:
[This example](https://github.com/jupyterhub/jupyterhub/tree/HEAD/examples/service-notebook/external) demonstrates how to combine the use of the `jupyterhub-singleuser` environment variables when launching a Notebook as an externally managed service.
### Jupyter Notebook/Lab can be launched, but notebooks seem to hang when trying to execute a cell
This often occurs when your browser is unable to open a websocket connection to a Jupyter kernel.
#### Diagnose
Open your browser console, e.g. [Chrome](https://developer.chrome.com/docs/devtools/console), [Firefox](https://firefox-source-docs.mozilla.org/devtools-user/web_console/).
If you see errors related to opening websockets this is likely to be the problem.
#### Solutions
This could be caused by anything related to the network between your computer/browser and the server running JupyterHub, such as:
- reverse proxies (see {ref}`howto:config:reverse-proxy` for example configurations)
- anti-virus or firewalls running on your computer or JupyterHub server
- transparent proxies running on your network
## How do I...?
### Use a chained SSL certificate
@@ -257,17 +276,6 @@ the entire filesystem and set the default to the user's home directory.
c.Spawner.notebook_dir = '/'
c.Spawner.default_url = '/home/%U' # %U will be replaced with the username
### How do I increase the number of pySpark executors on YARN?
From the command line, pySpark executors can be configured using a command
[Cloudera documentation for configuring spark on YARN applications](https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_running_spark_on_yarn.html#spark_on_yarn_config_apps)
provides additional information. The [pySpark configuration documentation](https://spark.apache.org/docs/0.9.0/configuration.html)
is also helpful for programmatic configuration examples.
### How do I use JupyterLab's pre-release version with JupyterHub?
While JupyterLab is still under active development, we have had users
@@ -298,6 +306,52 @@ notebook servers to default to JupyterLab:
Users will need a GitHub account to log in and be authenticated by the Hub.
### I'm seeing "403 Forbidden XSRF cookie does not match POST" when users try to login
During login, JupyterHub takes the request IP into account for CSRF protection.
If proxies are not configured to properly set forwarded ips,
JupyterHub will see all requests as coming from an internal ip,
likely the ip of the proxy itself.
You can see this in the JupyterHub logs, which log the ip address of requests.
If most requests look like they are coming from a small number `10.0.x.x` or `172.16.x.x` ips, the proxy is not forwarding the true request ip properly.
If the proxy has multiple replicas,
then it is likely the ip may change from one request to the next,
leading to this error during login:
> 403 Forbidden XSRF cookie does not match POST argument
The best way to fix this is to ensure your proxies set the forwarded headers, e.g. for nginx:
A [generic implementation](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.generic.html#module-oauthenticator.generic), which you can use for OAuth authentication
with any provider, is also available.
## Use DummyAuthenticator for testing
The `DummyAuthenticator` is a simple Authenticator that
allows for any username or password unless a global password has been set. If
set, it will allow for any username as long as the correct password is provided.
To set a global password, add this to the config file:
In short, where you see `/user/name/notebooks/foo.ipynb` use `/hub/user-redirect/notebooks/foo.ipynb` (replace `/user/name` with `/hub/user-redirect`).
Sharing links to notebooks is a common activity,
and can look different based on what you mean.
Your first instinct might be to copy the URL you see in the browser,
e.g. `hub.jupyter.org/user/yourname/notebooks/coolthing.ipynb`.
However, let's break down what this URL means:
`hub.jupyter.org/user/yourname/` is the URL prefix handled by _your server_,
which means that sharing this URL is asking the person you share the link with
to come to _your server_ and look at the exact same file.
In most circumstances, this is forbidden by permissions because the person you share with does not have access to your server.
What actually happens when someone visits this URL will depend on whether your server is running and other factors.
**But what is our actual goal?**
A typical situation is that you have some shared or common filesystem,
such that the same path corresponds to the same document
(either the exact same document or another copy of it).
Typically, what folks want when they do sharing like this
is for each visitor to open the same file _on their own server_,
so Breq would open `/user/breq/notebooks/foo.ipynb` and
Seivarden would open `/user/seivarden/notebooks/foo.ipynb`, etc.
JupyterHub has a special URL that does exactly this!
It's called `/hub/user-redirect/...`.
So if you replace `/user/yourname` in your URL bar
with `/hub/user-redirect` any visitor should get the same
URL on their own server, rather than visiting yours.
In JupyterLab 2.0, this should also be the result of the "Copy Shareable Link"
# Run JupyterHub without root privileges using `sudo`
**Note:** Setting up `sudo` permissions involves many pieces of system
@@ -6,14 +8,14 @@ Only do this if you are very sure you must.
## Overview
There are many [Authenticators](../getting-started/authenticators-users-basics) and [Spawners](../getting-started/spawners-basics) available for JupyterHub. Some, such
There are many [Authenticators](authenticators) and [Spawners](spawners) available for JupyterHub. Some, such
as [DockerSpawner](https://github.com/jupyterhub/dockerspawner) or [OAuthenticator](https://github.com/jupyterhub/oauthenticator), do not need any elevated permissions. This
document describes how to get the full default behavior of JupyterHub while
running notebook servers as real system users on a shared system, without
running the Hub itself as root.
Since JupyterHub needs to spawn processes as other users, the simplest way
is to run it as root, spawning user servers with [setuid](http://linux.die.net/man/2/setuid).
is to run it as root, spawning user servers with [setuid](https://linux.die.net/man/2/setuid).
But this isn't especially safe, because you have a process running on the
public web as root.
@@ -114,7 +116,7 @@ sudo: a password is required
## Enable PAM for non-root
By default, [PAM authentication](http://en.wikipedia.org/wiki/Pluggable_authentication_module)
By default, [PAM authentication](https://en.wikipedia.org/wiki/Pluggable_authentication_module)
is used by JupyterHub. To use PAM, the process may need to be able to read
and [IPython](https://ipython.readthedocs.io/en/stable/development/config.html)
have their own configuration systems.
@@ -57,6 +59,24 @@ The typical locations for these config files are:
- **system-wide** in `/etc/{jupyter|ipython}`
- **env-wide** (environment wide) in `{sys.prefix}/etc/{jupyter|ipython}`.
### Jupyter environment configuration priority
When Jupyter runs in an environment (conda or virtualenv), it prefers to load configuration from the environment over each user's own configuration (e.g. in `~/.jupyter`).
This may cause issues if you use a _shared_ conda environment or virtualenv for users, because e.g. jupyterlab may try to write information like workspaces or settings to the environment instead of the user's own directory.
This could fail with something like `Permission denied: $PREFIX/etc/jupyter/lab`.
To avoid this issue, set `JUPYTER_PREFER_ENV_PATH=0` in the user environment:
```python
c.Spawner.environment.update(
{
"JUPYTER_PREFER_ENV_PATH": "0",
}
)
```
which tells Jupyter to prefer _user_ configuration paths (e.g. in `~/.jupyter`) to configuration set in the environment.
### Example: Enable an extension system-wide
For example, to enable the `cython` IPython extension for all of your users, create the file `/etc/ipython/ipython_config.py`:
Named servers were implemented in the REST API in JupyterHub 0.8,
and JupyterHub 1.0 introduces UI for managing named servers via the user home page:


as well as the admin page:


Named servers can be accessed, created, started, stopped, and deleted
from these pages. Activity tracking is now per server as well.
@@ -184,6 +204,8 @@ This can be useful for quota service implementations. The example above limits t
If `named_server_limit_per_user` is set to `0`, no limit is enforced.
When using named servers, Spawners may need additional configuration to take the `servername` into account. Whilst `KubeSpawner` takes the `servername` into account by default in [`pod_name_template`](https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html#kubespawner.KubeSpawner.pod_name_template), other Spawners may not. Check the documentation for the specific Spawner to see how singleuser servers are named, for example in `DockerSpawner` this involves modifying the [`name_template`](https://jupyterhub-dockerspawner.readthedocs.io/en/latest/api/index.html) setting to include `servername`, eg. `"{prefix}-{username}-{servername}"`.
(classic-notebook-ui)=
## Switching back to the classic notebook
@@ -192,13 +214,31 @@ By default, the single-user server launches JupyterLab,
which is based on [Jupyter Server][].
This is the default server when running JupyterHub ≥ 2.0.
To switch to using the legacy Jupyter Notebook server, you can set the `JUPYTERHUB_SINGLEUSER_APP` environmentvariable
To switch to using the legacy Jupyter Notebook server (notebook <7.0),youcanset the`JUPYTERHUB_SINGLEUSER_APP`environment variable
is only valid for notebook <7.notebookv7isbasedonjupyter-server,
and the default jupyter-server application must be used.
Selecting the new notebook UI is no longer a matter of selecting the server app to launch,
but only the default URL for users to visit.
To use notebook v7 with JupyterHub, leave the default singleuser app config alone (or specify `JUPYTERHUB_SINGLEUSER_APP=jupyter-server`) and set the default _URL_ for user servers:
The _How-to_ guides provide practical step-by-step details to help you achieve a particular goal. They are useful when you are trying to get something done but require you to understand and adapt the steps to your specific usecase.
Use the following guides when:
```{toctree}
:maxdepth: 1
api-only
proxy
rest
separate-proxy
templates
upgrading
log-messages
```
(config-examples)=
## Configuration
The following guides provide examples, including configuration files and tips, for the
If you were using subdomains before, some user servers and all services will be on different hosts in the default configuration.
JupyterHub 5 allows complete customization of the subdomain scheme via the new {attr}`.JupyterHub.subdomain_hook`,
and changes the default subdomain scheme.
.
You can provide a completely custom subdomain scheme, or select one of two default implementations by name: `idna` or `legacy`. `idna` is the default.
The new default behavior can be selected explicitly via:
```python
c.JupyterHub.subdomain_hook="idna"
```
Or to delay any changes to URLs for your users, you can opt-in to the pre-5.0 behavior with:
```python
c.JupyterHub.subdomain_hook="legacy"
```
The key differences of the new `idna` scheme:
- It should always produce valid domains, regardless of username (not true for the legacy scheme when using characters that might need escaping or usernames that are long)
- each Service gets its own subdomain on `service--` rather than sharing `services.`
Below is a table of examples of users and services with their domains with the old and new scheme, assuming the configuration:
| user | laudna | `laudna.jupyter.example.org` | `laudna.jupyter.example.org` |
| service | bells | `services.jupyter.example.org` | `bells--service.jupyter.example.org` |
| user | jester@mighty.nein | `jester_40mighty.nein.jupyter.example.org` (may not work!) | `u-jestermi--8037680.jupyter.example.org` (not as pretty, but guaranteed to be valid and not collide) |
## Tokens in URLs
JupyterHub 5 does not accept `?token=...` URLs by default in single-user servers.
These URLs allow one user to force another to login as them,
which can be the start of an inter-user attack.
There is a valid use case for producing links which allow starting a fully authenticated session,
so you may still opt in to this behavior by setting:
if you are not concerned about protecting your users from each other.
If you have subdomains enabled, the threat is substantially reduced.
## Sharing
The big new feature in JupyterHub 5.0 is sharing.
Check it out in [the sharing docs](sharing-tutorial).
## Authenticator.allow_all and allow_existing_users
Prior to JupyterHub 5, JupyterHub Authenticators had the _implicit_ default behavior to allow any user who successfully authenticates to login **if no users are explicitly allowed** (i.e. `allowed_users` is empty on the base class).
This behavior was considered a too-permissive default in Authenticators that source large user pools like OAuthenticator, which would accept e.g. all users with a Google account by default.
As a result, OAuthenticator 16 introduced two configuration options: `allow_all` and `allow_existing_users`.
JupyterHub 5 adopts these options for all Authenticators:
1.`Authenticator.allow_all` (default: False)
2.`Authenticator.allow_existing_users` (default: True if allowed_users is non-empty, False otherwise)
having the effect that _some_ allow configuration is required for anyone to be able to login.
If you want to preserve the pre-5.0 behavior with no explicit `allow` configuration, set:
```python
c.Authenticator.allow_all=True
```
`allow_existing_users` defaults are meant to be backward-compatible, but you can now _explicitly_ allow or not based on presence in the database by setting `Authenticator.allow_existing_users` to True or False.
:::{seealso}
[Authenticator config docs](authenticators) for details on these and other Authenticator options.
:::
## Bootstrap 5
JupyterHub uses the CSS framework [bootstrap](https://getbootstrap.com), which is upgraded from 3.4 to 5.3.
If you don't have any custom HTML templates, you are likely to only see relatively minor aesthetic changes.
If you have custom HTML templates or spawner options forms, they may need some updating to look right.
See the bootstrap documentation. Since we upgraded two major versions, you might need to look at both v4 and v5 documentation for what has changed since 3.x:
- [migrating to v4](https://getbootstrap.com/docs/4.6/migration/)
- [migrating to v5](https://getbootstrap.com/docs/5.3/migration/)
If you customized the JupyterHub CSS by recompiling from LESS files, bootstrap migrated to SCSS.
You can start by autoconverting your LESS to SCSS (it's not that different) with [less2sass](https://github.com/ekryski/less2sass):
```bash
npm install --global less2scss
# converts less/foo.less to scss/foo.scss
less2scss --src ./less --dst ./scss
```
Bootstrap also allows configuring things with [CSS variables](https://getbootstrap.com/docs/5.3/customize/css-variables/), so depending on what you have customized, you may be able to get away with just adding a CSS file defining variables without rebuilding the whole SCSS.
## groups required with Authenticator.manage_groups
Setting `Authenticator.manage_groups = True` allows the Authenticator to manage group membership by returning `groups` from the authentication model.
However, this option is available even on Authenticators that do not support it, which led to confusion.
Starting with JupyterHub 5, if `manage_groups` is True `authenticate`_must_ return a groups field, otherwise an error is raised.
This prevents confusion when users enable managed groups that is not implemented.
If an Authenticator _does_ support managing groups but was not providing a `groups` field in order to leave membership unmodified, it must specify `"groups": None` to make this explicit instead of implicit (this is backward-compatible).
[JupyterHub] is the best way to serve [Jupyter notebook] for multiple users.
Because JupyterHub manages a separate Jupyter environment for each user,
it can be used in a class of students, a corporate data science group, or a scientific
research group. It is a multi-user **Hub** that spawns, manages, and proxies multiple
instances of the single-user [Jupyter notebook] server.
(index/distributions)=
## Distributions
JupyterHub can be used in a collaborative environment by both small (0-100 users) and
large teams (more than 100 users) such as a class of students, corporate data science group
or scientific research group.
It has two main distributions which are developed to serve the needs of each of these teams respectively.
1. [The Littlest JupyterHub](https://github.com/jupyterhub/the-littlest-jupyterhub) distribution is suitable if you need a small number of users (1-100) and a single server with a simple environment.
2. [Zero to JupyterHub with Kubernetes](https://github.com/jupyterhub/zero-to-jupyterhub-k8s) allows you to deploy dynamic servers on the cloud if you need even more users.
This distribution runs JupyterHub on top of [Kubernetes](https://k8s.io).
```{note}
It is important to evaluate these distributions before you can continue with the
configuration of JupyterHub.
```
## Subsystems
JupyterHub is made up of four subsystems:
- a **Hub** (tornado process) that is the heart of JupyterHub
- a **configurable http proxy** (node-http-proxy) that receives the requests from the client's browser
- multiple **single-user Jupyter notebook servers** (Python/IPython/tornado) that are monitored by Spawners
- an **authentication class** that manages how users can access the system
Additionally, optional configurations can be added through a `config.py` file and manage users
kernels on an admin panel. A simplification of the whole system is displayed in the figure below:
```{image} images/jhub-fluxogram.jpeg
:align: center
:alt: JupyterHub subsystems
:width: 80%
```
JupyterHub performs the following functions:
- The Hub launches a proxy
- The proxy forwards all requests to the Hub by default
- The Hub handles user login and spawns single-user servers on demand
- The Hub configures the proxy to forward URL prefixes to the single-user
notebook servers
For convenient administration of the Hub, its users, and services,
JupyterHub also provides a {doc}`REST API <reference/rest-api>`.
The JupyterHub team and Project Jupyter value our community, and JupyterHub
follows the Jupyter [Community Guides](https://jupyter.readthedocs.io/en/latest/community/content-community.html).
---
## Documentation structure
### Tutorials
This section of the documentation contains step-by-step tutorials that help outline the capabilities of JupyterHub and how you can achieve specific aims, such as installing it. The tutorials are recommended if you do not have much experience with JupyterHub.
```{toctree}
:maxdepth: 2
tutorial/index.md
```
### How-to guides
The _How-to_ guides provide more in-depth details than the tutorials. They are recommended for those already familiar with JupyterHub and have a specific goal. The guides help answer the question _"How do I ...?"_ based on a particular topic.
```{toctree}
:maxdepth: 2
howto/index.md
```
### Explanation
The _Explanation_ section provides further details that can be used to better understand JupyterHub, such as how it can be used and configured. They are intended for those seeking to expand their knowledge of JupyterHub.
```{toctree}
:maxdepth: 2
explanation/index.md
```
### Reference
The _Reference_ section provides technical information about JupyterHub, such as monitoring the state of your installation and working with JupyterHub's API modules and classes.
```{toctree}
:maxdepth: 2
reference/index.md
```
### Frequently asked questions
Find answers to the most frequently asked questions about JupyterHub such as how to troubleshoot an issue.
```{toctree}
:maxdepth: 2
faq/index.md
```
### Contributing
JupyterHub welcomes all contributors, whether you are new to the project or know your way around. The _Contributing_ section provides information on how you can make your contributions.
```{toctree}
:maxdepth: 2
contributing/index
```
---
## Indices and tables
- {ref}`genindex`
- {ref}`modindex`
## Questions? Suggestions?
All questions and suggestions are welcome. Please feel free to use our [Jupyter Discourse Forum](https://discourse.jupyter.org/) to contact our team.
`JupyterHub`_ is the best way to serve `Jupyter notebook`_ for multiple users.
Because JupyterHub manages a separate Jupyter environment for each user,
it can be used in a class of students, a corporate data science group, or a scientific
research group. It is a multi-user **Hub** that spawns, manages, and proxies multiple
instances of the single-user `Jupyter notebook`_ server.
JupyterHub offers distributions for different use cases. As of now, you can find two main cases:
1.`The Littlest JupyterHub <https://github.com/jupyterhub/the-littlest-jupyterhub>`__ distribution is suitable if you need a small number of users (1-100) and a single server with a simple environment.
2.`Zero to JupyterHub with Kubernetes <https://github.com/jupyterhub/zero-to-jupyterhub-k8s>`__ allows you to deploy dynamic servers on the cloud if you need even more users.
JupyterHub can be used in a collaborative environment by both both small (0-100 users) and
large teams (more than 100 users) such as a class of students, corporate data science group
or scientific research group. It has distributions which are developed to serve the needs of
each of these teams respectively.
JupyterHub is made up of four subsystems:
* a **Hub** (tornado process) that is the heart of JupyterHub
* a **configurable http proxy** (node-http-proxy) that receives the requests from the client's browser
* multiple **single-user Jupyter notebook servers** (Python/IPython/tornado) that are monitored by Spawners
* an **authentication class** that manages how users can access the system
Additionally, optional configurations can be added through a `config.py` file and manage users
kernels on an admin panel. A simplification of the whole system is displayed in the figure below:
..image:: images/jhub-fluxogram.jpeg
:alt:JupyterHub subsystems
:width:80%
:align:center
JupyterHub performs the following functions:
- The Hub launches a proxy
- The proxy forwards all requests to the Hub by default
- The Hub handles user login and spawns single-user servers on demand
- The Hub configures the proxy to forward URL prefixes to the single-user
notebook servers
For convenient administration of the Hub, its users, and services,
JupyterHub also provides a :doc:`REST API <reference/rest-api>`.
The JupyterHub team and Project Jupyter value our community, and JupyterHub
follows the Jupyter `Community Guides <https://jupyter.readthedocs.io/en/latest/community/content-community.html>`_.
Contents
========
.._index/distributions:
Distributions
-------------
A JupyterHub **distribution** is tailored
towards a particular set of use cases. These are generally easier
to set up than setting up JupyterHub from scratch, assuming they fit your use case.
Today, you can find two main use cases:
1. If you need a simple case for a small amount of users (0-100) and single server
2. If you need to allow for a larger number of machines and users,
a dynamic amount of servers can be used on a cloud,
take a look at the `Zero to JupyterHub with Kubernetes <https://github.com/jupyterhub/zero-to-jupyterhub-k8s>`__ distribution.
This distribution runs JupyterHub on top of `Kubernetes <https://k8s.io>`_.
*It is important to evaluate these distributions before you can continue with the
configuration of JupyterHub*.
Installation Guide
------------------
..toctree::
:maxdepth:2
installation-guide
Getting Started
---------------
..toctree::
:maxdepth:2
getting-started/index
Technical Reference
-------------------
..toctree::
:maxdepth:2
reference/index
Administrators guide
--------------------
..toctree::
:maxdepth:2
index-admin
API Reference
-------------
..toctree::
:maxdepth:2
api/index
RBAC Reference
--------------
..toctree::
:maxdepth:2
rbac/index
Contributing
------------
We welcome you to contribute to JupyterHub in ways that are most exciting
& useful to you. We value documentation, testing, bug reporting & code equally
and are glad to have your contributions in whatever form you wish :)
Our `Code of Conduct <https://github.com/jupyter/governance/blob/HEAD/conduct/code_of_conduct.md>`_ and `reporting guidelines <https://github.com/jupyter/governance/blob/HEAD/conduct/reporting_online.md>`_
help keep our community welcoming to as many people as possible.
..toctree::
:maxdepth:2
contributing/index
About JupyterHub
----------------
..toctree::
:maxdepth:2
index-about
Indices and tables
==================
*:ref:`genindex`
*:ref:`modindex`
Questions? Suggestions?
=======================
All questions and suggestions are welcome. Please feel free to use our `Jupyter Discourse Forum <https://discourse.jupyter.org/>`_ to contact our team.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.