sevices/auth prevents calling check_xsrf_cookie,
but if the Handler itself called it the newly strict check would still be applied
this ensures the check is actually allowed for navigate GET requests
rather than attempting to clear multiple tokens (too complicated, breaks named servers)
look for and accept first valid token
have to do our own cookie parsing because existing cookie implementations only return a single value for each key
and default to selecting the _least_ likely to be correct, according to RFCs.
set updated xsrf cookie on login to avoid needing two requests to get the right cookie
need to inject our override into the base class,
rather than at the instance level,
to avoid clobbering any overrides in extensions like jupyter-server-proxy
- suggest token page first
- remove caveat about JupyterHub 0.8, which can be assumed now
- undocument `jupyterhub token`
- refresh token page screenshots, and remove duplicate screenshot of the token page
- minor improvements to language in token page
`Configuration Reference` sounds like it's the place to go to see the full list of JupyterHub config options.
However it's not very readable as it's a plain-text dump of the output of `jupyterhub --generate-config`.
This links to some of the API doc pages instead, which present most of the information in an easier to read format. Unfortunately it also includes a lot of non-traitlets documentation.
Added Playwright tests which are covered Login, Spawning, Home and Token pages
Removed Selenium cases which are covered Login, Spawning, Home and Token pages
consistent behavior with oauth_client_allowed_scopes,
where the _intersection_ of requested and owner-held permissions is granted,
instead of failing
Enables different users to have different permissions in $JUPTYERHUB_API_TOKEN,
either via callables or via requesting as much as you may want and only granting the subset.
Additionally, the !server filter can now be correctly applied to the server token
default behavior is unchanged
According to the review updated cases covered the Admin UI:
- rename the function,
- add docstring,
- add parametrization,
- use the fixture admin_user instead of the function
mostly a copy (fork) of singleuser app
using public APIs instead of lots of patching.
opt-in via `JUPYTERHUB_SINGLEUSER_EXTENSION=1`
related changes:
- stop running a test single-user server in a thread. It's complicated and fragile.
Instead, run it normally, and get the info we need from a custom handler registered via an extension
via the `full_spawn` fixture
use async fixtures for simpler event-loop integration
several of these fixtures were written before fixtures themselves could be async,
but now they can, which means we can use async/await instead of run_sync.
removes some workarounds needed for sqlalchemy 1.1 + 2.0 support
1.4 backports most 2.0 behavior, keeping it off-by-default for an easier opt-in transition
opt-in with `session.future = True` flag
- avoid backref warnings by adding objects to session explicitly before creating any relationships
- remove unnecessary `[]` around scalar query
- use `text()` wrapper on connection.execute
- engine.execute is removed
- update import of declarative_base
- ensure RemovedIn20Warning is available for warnings filters on sqlalchemy < 1.4 (needs editable install to avoid pytest path mismatch)
- explicitly relay password in engine.url to alembic
Removes all Referer checks, which have proven unreliable and have never been particularly strong
We can use XSRF on paths for more robust inter-path protections.
- `_xsrf` is added for forms via hidden input
- xsrf check is additionally applied to GET requests on API endpoints
- jupyter.chameleoncloud SSL is failing (I can reproduce with conda curl, but not /usr/bin/curl, so seems to be a CA issue)
- remove dead arnesund tag link (keep single article link)
- Add html language attribute
- Rename logo's alt text so it clearly states the image's purpose
- Fix missing first level heading for Login, Home and Token page
- Fix missing header level 1 of Login page
- Fix low contrast issue of navbar
Co-authored-by: Min RK <benjaminrk@gmail.com>
Added table-as-dict function instead of few functions for working with the tokens table
replaced static value from locators.py by locator itself in test_browser
simplified menu-bar case
allows us to wait for the javascript to finish loading,
since clicking buttons won't do anything if we click before the js has registered click handlers
- several http->https
- a few page moves
- miniconda->miniforge
- remove rochester from gallery, which doesn't apepar to be publicly documented (may be accessible internally, but that's not for a public gallery)
adding checks token from db, adding some explanations to async sleep(still wip), renamed few cases (spawn_panding, token), adding "cleanup_after" fixture into def browser()
- use interval instead of period to match other interval config
- avoid redundant `metric` in both class and method/attr names
- interval is separate per metric group (only one for now)
- Stops ambiguous wording on 'monthly', use more precise '30d'
- Add a 7d active users metric
- Allows us to make this configurable in the future if needed
This extension helps us restructure our documentation without creating
dead links. It requires us to explicitly declare what should be
redirected where though.
It seems better to have it in place ahead of time than to be something
we ask a contributor add just in time when its needed.
- E### warnings were already ignored by E
- Sorting of import warnings won't matter as we have black and isort
- D400: First line should end with a period
This is not flake8 configuration, but related to `pydocstyle`, which
isn't used.
These are *extremely useful* for people advocating for
more resources for their JupyterHubs, but a little difficult
to calculate without a full scale log ingestion and analytics
pipeline (such as ELK or equivalent). However, these are easy
to calculate on the JupyterHub side at any given instance -
these are fairly quick SQL queries. Prometheus can capture and
store this as a timeseries, and provide valuble advocacy data
that is hard to get otherwise.
This turns the metrics on by default, but only updates them every
hour - which seems fine for metrics that don't change that often.
This should reduce performance impact. Admins can also turn this
off if needed.
I've had to implement this in many different ways in many different
contexts, and it's also important to be able to *trust* these. Getting
this from other grafana data can be tricky to validate - we had an
experimental one in our grafana dashboards at some point in the past
that we had to kill due to it being hard to validate
(https://github.com/jupyterhub/grafana-dashboards/pull/45). This metric
will provide an authoritative source of truth for this.
Ref https://github.com/2i2c-org/infrastructure/issues/1888
When I needed to set up Linux on my windows, i found out the easier to do so was through Windows Subsystem for Linux. Hence, I needed to add the guide to the development setup
Hi @minrk, I went through the reference files you sent and made the necessary adjustments. I think everything should be fine now. I also added a new correction. Please review and revert.
Worked on the documentation page (jupyterhub/docs/source/rbac/use-case.md)
Added a wikipedia reference to [RBCA framework] and also emphasized on the solution under the *service to cull idle servers*
Adjusted the Heading sudospawner - Sudospawner to suite sentence case
Under proxy Settings, i adjusted the sentence "a organization to an organization" also singleuser to single-user
under Toree intergration, the sentence is not clear so I adjusted to Toree kernel will raise and issue when running with jupyterHub,
removed time.sleep(), removed inrelevant comments, removed unused packages, added few explanations for some functions (in progress), documenting in progress, added to conftest.py package with Options class
move offset to redux state, rather than independent,
since it can come from two places (user_page and pagination footer). Keeps things in sync.
Adds reducers for setting offset, name filter explicitly.
- sort keys for consistent presentation
- use text list for roles, groups, which aren't well rendered by the table-formatter (number index isn't helpful)
- render timestamps
- leave empty name for default server, instead of '[MAIN]' which isn't terminology used anywhere else
- define our own init_ioloop
- call it ASAP
- put init_httpserver in IOLoop.run_sync because instantiating a server accesses the current loop
- run cleanup via asyncio.run
3.7 adds ~ to the 'unreserved' (always safe) set,
but it's not safe in domain names.
so do it ourselves. Formalize in a `_dns_quote` private function,
with notes about issues.
The only usernames that change in this PR are those containing `_` or `/`,
the latter of which would have failed.
pass ssl.Purpose explicitly, deprecate verify/check_hostname
3.10 disallows 'purpose=SERVER_AUTH' from creating server sockets.
Instead:
- pass purpose directly
- always verify
- no need to set check_hostname, already covered by purpose defaults
Switches requests to tornado AsyncHTTPClient instead of requests
For backward-compatibility, use opt-in `sync=False` arg for all public methods that _may_ be async
When sync=True (default), async functions still used, but blocking via ThreadPool + asyncio run_until_complete
rather than roles, matching tokens
because oauth clients are mostly involved with issuing tokens,
they don't have roles themselves (their owners do).
This deprecates the `oauth_roles` config on Spawners and Services, in favor of `oauth_allowed_scopes`.
The ambiguously named `oauth_scopes` is renamed to `oauth_access_scopes`.
otherwise, index isn't used
note: this means changing the token prefix size requires revoking all tokens,
where before only _increasing_ the token prefix size required doing that.
The `admin_access` variable is still referenced elsewhere and I considered removing it, but I couldn't work out exactly how/if it's used so this is just a docs change for now.
allows oauth clients to issue scopes that only grant access to the issuing service
e.g. access:service!service or access:servers!server
especially useful with custom scopes
This was part of an attempt to get the url from self.server.bind_url that didn't end up getting used
shouldn't mutate db state when getting the environment
we expand/parse the same scopes _a lot_.
We can save time with some caching.
Main change: cached functions must return immutable frozenset instead of mutable set,
to avoid mutating the result of subsequent returns.
Some functions can only be cached _sometimes_ (e.g. group lookups in db cannot be cached),
for which we have a DoNotCache(result) exception
tokens have scopes
instead of roles, which allow tokens to change permissions over time
This is mostly a low-level change,
with little outward-facing effects.
- on upgrade, evaluate all token role assignments to their current scopes,
and store those scopes on the tokens
- assigning roles to tokens still works, but scopes are evaluated and validated immediately,
rather than lazily stored as roles
- no longer need to check for role permission changes on startup, because token permissions aren't affected
- move a few scope utilities from roles to scopes
- oauth allows specifying scopes, not just roles.
But these are still at the level specified in roles,
not fully-resolved scopes.
- more granular APIs for working with scopes and roles
Still to do later:
- expose scopes config for Spawner/service
- compute 'full' intersection of requested scopes, rather than on the 'raw' scope list in roles
removes need for our own implementation of the same behavior
but keep it around while we still support Python 3.6,
since the version (0.17) introducing asyncio_mode drops support for Python 3.6
instead of roles, which allow tokens to change permissions over time
This is mostly a low-level change,
with little outward-facing effects.
- on upgrade, evaluate all token role assignments to their current scopes,
and store those scopes on the tokens
- assigning roles to tokens still works, but scopes are evaluated and validated immediately,
rather than lazily stored as roles
- no longer need to check for role permission changes on startup, because token permissions aren't affected
- move a few scope utilities from roles to scopes
- oauth allows specifying scopes, not just roles.
But these are still at the level specified in roles,
not fully-resolved scopes.
- more granular APIs for working with scopes and roles
- oauth clients can request a list of roles
- authorization will proceed with the _subset_ of those roles held by the user
- in the future, this subsetting will be refined to the scope level
defined with
c.JupyterHub.custom_scopes = {
'custom:scope': {'description': "text shown on oauth confirm"}
}
Allows injecting custom scopes to roles,
allowing extension of granular permissions to service-defined custom scopes.
Custom scopes:
- MUST start with `custom:`
- MUST only contain ascii lowercase, numbers, colon, hyphen, asterisk, underscore
- MUST define a `description`
- MAY also define `subscopes` list(s), each of which must also be explicitly defined
HubAuth can be used to retrieve and check for custom scopes to authorize requests.
When debugging errors and outages, looking at the logs emitted by
JupyterHub is very helpful. This document tries to document some common
log messages, and what they mean.
I currently added just one log message, but we can add more
over time.
Ref https://github.com/2i2c-org/infrastructure/issues/1081
where this would've been useful troubleshooting
Avoids leaving stale state when re-using a spawner that failed the last time it started
we keep failed spawners around to track their errors,
but we don't want to re-use them when it comes time to start a new launch.
adds User.get_spawner(server_name, replace_failed=True) to always get a non-failed Spawner
restores token field useful for javascript-originating API requests,
removed in 1.5 / 2.0 for security reasons because it was the wrong token.
This places the _user's_ token in PageConfig,
so it should have the right permissions.
requires jupyterlab_server 2.9, has no effect on earlier versions.
Useful for backend services that want to use the user's token.
Added `in_cookie` bool argument to exclude cookies (previous behavior),
since notebook servers do some things differently when auth is in query param or header vs cookies
We don't do it correctly, so don't try by default
It does work _sometimes_, but most of the time it does work, it's because it's a no-op.
Turning it off by default makes it more likely folks will see the caveat that it may not work.
When we run the proxy separately, defaults of `hub.bind_url` may be different from proxy's public url. Actually, the hub has no ways to know about which address the proxy is serving at if we do not configure its `bind_url` explicitly.
- tests
- docs
- ensure all group APIs are rejected when auth is in control
- use 'groups' field in return value of authenticate/refresh_user, instead of defining new method
- log group changes in sync_groups
- Added hook function stub to authenticator base class
- Added new config option `manage_groups` to base `Authenticator` class
- Call authenticator hook from `refresh_auth`-function in `Base` handler class
- Added example
Some non-api spawn and redirect checks still had `self or admin`,
when they should have checked directly for the appropriate permissions
This removes the long-deprecated redirect from `/user/other` -> `/user/self` _if_ the other server is not running.
The result is a more consistent behavior whether the requested server is running or not,
and whether the user has _access_ to the running server or not.
wee care about what the browser sees, so trust the outermost entry instead of the innermost
This is not secure _in general_, in that these values can be spoofed by malicious proxies,
but for CORS and cookie purposes, we only care about what the browser sees,
however many hops there may be.
A malicious proxy in the chain here isn't a concern because what matters is the immediate
hop from the _browser_, not the immediate hop from the _server_.
This patch is related to the implementation of the
MultiAuthenticator in jupyterhub/oauthenticator#459
The issue will be triggered when using more than one local provider
or mixing with oauth providers.
With multiple providers the template generates a set of buttons to
choose from to continue the login process.
For OAuth, the user will be sent to the provider login page and
the redirect at the end will continue nicely the process.
Now for the tricky part: using a local provider (e.g. PAM), the
user will be redirected to the "same page" thus the same template
will be rendered but this time to show the username/password dialog.
This will trip the workflow because of the action URL coming from
the settings and not from the authenticator. Therefore when the button
is clicked, the user will come back to the original multiple choice page
rather than continue the login.
- update models with 2.0.0
- different scopes for oauth, api
shows model depends on permissions
- update text with more details about scopes
- fix outdated reference to local-system credentials
I got confused with a variable called `rolename` that was actually an orm.Role
casting types in a signature is confusing,
but now `role` input can be Role or name,
and in the body it will always be a Role that exists
Behavior is unchanged
This isn't the only or even main thing likely to raise here,
so don't blame it, which is confusing, especially in a message shown to users.
Log the full exception, and show a more opaque message to the user to avoid confusion
Cover different combinations of:
- existing assignments in db
- additive allowed_users/admin_users config
- strict users membership assignment in load_roles
If the user role was defined but did not specify a user membership list,
users granted access by the Authenticator would lose their status
Instead, do nothing on an undefined user membership list,
leaving any users with their existing default role assignment
favor HubOAuth, as that should really be the default for most services
- Remove some outdated 'new in' text
- Remove docs for some deprecated features (hub_users, hub_groups)
- more detail on what's required
do not allow token-based access to pages
Tokens are only accepted via Authorization header, which doesn't make sense to pass to pages,
so disallow it explicitly to avoid surprises
- move spec to _static/rest-api.yml, since the original yaml must be served
- copy javascript rendering code from FastAPI (uses swagger-ui)
- remove link to pet store, since there isn't a big enough difference to duplicate it
- remove bootprint rendering with node
since it means 'inheriting' the owner's permissions
'all' prompted the question 'all of what, exactly?'
Additionally, fix some NameErrors that should have been KeyErrors
some things raise standard TimeoutError, others may raise tornado gen.TimeoutError (gen.with_timeout)
For consistency, add AnyTimeoutError tuple to allow catching any timeout, no matter what kind
Where we were raising `TimeoutError`,
we should have been raising `asyncio.TimeoutError`.
The base TimeoutError is an OSError for ETIMEO, which is for system calls
404 is also used to identify that a particular resource
(like a kernel or terminal) is not present, maybe because
it is deleted. That comes from the notebook server, while
here we are responding from JupyterHub. Saying that the
user server they are trying to request the resource (kernel, etc)
from does not exist seems right.
Non-running user servers making requests is a fairly
common occurance - user servers get culled while their
browser tabs are left open. So we now have a background level
of 503s responses on the hub *all* the time, making it
very difficult to detect *real* 503s, which should ideally
be closely monitored and alerted on.
I *think* 404 is a more appropriate response, as the resource
(API) being requested is no longer present.
I closed this issue because it was labelled as a support question.
Please help us organize discussion by posting this on the http://discourse.jupyter.org/ forum.
Our goal is to sustain a positive experience for both users and developers. We use GitHub issues for specific discussions related to changing a repository's content, and let the forum be where we can more generally help and inspire each other.
Thank you for being an active member of our community! :heart:
Please note that this repository is participating in a study into the sustainability of open source projects. Data will be gathered about this repository for approximately the next 12 months, starting from 2021-06-11.
Data collected will include the number of contributors, number of PRs, time taken to close/merge these PRs, and issues closed.
For more information, please visit
[our informational page](https://sustainable-open-science-and-software.github.io/) or download our [participant information sheet](https://sustainable-open-science-and-software.github.io/assets/PIS_sustainable_software.pdf).
[](https://github.com/jupyterhub/jupyterhub/actions)
If you believe you’ve found a security vulnerability in a Jupyter
project, please report it to security@ipython.org. If you prefer to
encrypt your security reports, you can use [this PGP public key](https://jupyter-notebook.readthedocs.io/en/stable/_downloads/1d303a645f2505a8fd283826fafc9908/ipython_security.asc).
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
JupyterHub also provides a REST API for administration of the Hub and users.
The documentation on `Using JupyterHub's REST API <../reference/rest.html>`_ provides
information on:
- what you can do with the API
- creating an API token
- adding API tokens to the config files
- making an API request programmatically using the requests library
- learning more about JupyterHub's API
The same JupyterHub API spec, as found here, is available in an interactive form
`here (on swagger's petstore) <https://petstore3.swagger.io/?url=https://raw.githubusercontent.com/jupyterhub/jupyterhub/HEAD/docs/rest-api.yml#!/default>`__.
The `OpenAPI Initiative`_ (fka Swagger™) is a project used to describe
We use different channels of communication for different purposes. Whichever one you use will depend on what kind of communication you want to engage in.
## Discourse (recommended)
We use [Discourse](https://discourse.jupyter.org) for online discussions and support questions.
You can ask questions here if you are a first-time contributor to the JupyterHub project.
Everyone in the Jupyter community is welcome to bring ideas and questions there.
We recommend that you first use our Discourse as all past and current discussions on it are archived and searchable. Thus, all discussions remain useful and accessible to the whole community.
## Gitter
We use [our Gitter channel](https://gitter.im/jupyterhub/jupyterhub) for online, real-time text chat; a place for more ephemeral discussions. When you're not on Discourse, you can stop here to have other discussions on the fly.
## Github Issues
[Github issues](https://docs.github.com/en/issues/tracking-your-work-with-issues/about-issues) are used for most long-form project discussions, bug reports and feature requests.
- Issues related to a specific authenticator or spawner should be opened in the appropriate repository for the authenticator or spawner.
- If you are using a specific JupyterHub distribution (such as [Zero to JupyterHub on Kubernetes](https://github.com/jupyterhub/zero-to-jupyterhub-k8s) or [The Littlest JupyterHub](https://github.com/jupyterhub/the-littlest-jupyterhub/)), you should open issues directly in their repository.
- If you cannot find a repository to open your issue in, do not worry! Open the issue in the [main JupyterHub repository](https://github.com/jupyterhub/jupyterhub/) and our community will help you figure it out.
```{note}
Our community is distributed across the world in various timezones, so please be patient if you do not get a response immediately!
Documentation is often more important than code. This page helps
you get set up on how to contribute to JupyterHub's documentation.
## Building documentation locally
We use [sphinx](https://www.sphinx-doc.org) to build our documentation. It takes
our documentation source files (written in [markdown](https://daringfireball.net/projects/markdown/) or [reStructuredText](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html) &
stored under the `docs/source` directory) and converts it into various
formats for people to read. To make sure the documentation you write or
change renders correctly, it is good practice to test it locally.
1. Make sure you have successfully completed {ref}`contributing/setup`.
2. Install the packages required to build the docs.
```bash
python3 -m pip install -r docs/requirements.txt
```
3. Build the html version of the docs. This is the most commonly used
output format, so verifying it renders correctly is usually good
enough.
```bash
cd docs
make html
```
This step will display any syntax or formatting errors in the documentation,
along with the filename / line number in which they occurred. Fix them,
and re-run the `make html` command to re-render the documentation.
4. View the rendered documentation by opening `_build/html/index.html` in
a web browser.
:::{tip}
**On Windows**, you can open a file from the terminal with `start <path-to-file>`.
**On macOS**, you can do the same with `open <path-to-file>`.
**On Linux**, you can do the same with `xdg-open <path-to-file>`.
After opening index.html in your browser you can just refresh the page whenever
you rebuild the docs via `make html`
:::
(contributing-docs-conventions)=
## Documentation conventions
This section lists various conventions we use in our documentation. This is a
living document that grows over time, so feel free to add to it / change it!
Our entire documentation does not yet fully conform to these conventions yet,
so help in making it so would be appreciated!
### `pip` invocation
There are many ways to invoke a `pip` command, we recommend the following
approach:
```bash
python3 -m pip
```
This invokes pip explicitly using the python3 binary that you are
currently using. This is the **recommended way** to invoke pip
in our documentation, since it is least likely to cause problems
with python3 and pip being from different environments.
For more information on how to invoke `pip` commands, see
We want you to contribute to JupyterHub in ways that are most exciting
and useful to you. We value documentation, testing, bug reporting & code equally,
and are glad to have your contributions in whatever form you wish.
Be sure to first check our [Code of Conduct](https://github.com/jupyter/governance/blob/HEAD/conduct/code_of_conduct.md)
([reporting guidelines](https://github.com/jupyter/governance/blob/HEAD/conduct/reporting_online.md)), which help keep our community welcoming to as many people as possible.
This section covers information about our community, as well as ways that you can connect and get involved.
for development & collaboration. You need to [install git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) to work on
JupyterHub. We also recommend getting a free account on GitHub.com.
## Setting up a development install
When developing JupyterHub, you would need to make changes and be able to instantly view the results of the changes. To achieve that, a developer install is required.
:::{note}
This guide does not attempt to dictate _how_ development
environments should be isolated since that is a personal preference and can
be achieved in many ways, for example, `tox`, `conda`, `docker`, etc. See this
[forum thread](https://discourse.jupyter.org/t/thoughts-on-using-tox/3497) for
a more detailed discussion.
:::
1. Clone the [JupyterHub git repository](https://github.com/jupyterhub/jupyterhub)
- `admin_user`: creates a new temporary admin user
- single user servers
\- `cleanup_after`: allows cleanup of single user servers between tests
- mocked service
\- `MockServiceSpawner`: a spawner that mocks services for testing with a short poll interval
\- `` mockservice` ``: mocked service with no external service url
\- `mockservice_url`: mocked service with a url to test external services
And fixtures to add functionality or spawning behavior:
- `admin_access`: grants admin access
- `` no_patience` ``: sets slow-spawning timeouts to zero
- `slow_spawn`: enables the SlowSpawner (a spawner that takes a few seconds to start)
- `never_spawn`: enables the NeverSpawner (a spawner that will never start)
- `bad_spawn`: enables the BadSpawner (a spawner that fails immediately)
- `slow_bad_spawn`: enables the SlowBadSpawner (a spawner that fails after a short delay)
Refer to the [pytest fixtures documentation](https://pytest.readthedocs.io/en/latest/fixture.html) to learn how to use fixtures that exists already and to create new ones.
### The Pytest-Asyncio Plugin
When testing the various JupyterHub components and their various implementations, it sometimes becomes necessary to have a running instance of JupyterHub to test against.
The [`app`](https://github.com/jupyterhub/jupyterhub/blob/270b61992143b29af8c2fab90c4ed32f2f6fe209/jupyterhub/tests/conftest.py#L60) fixture mocks a JupyterHub application for use in testing by:
- enabling ssl if internal certificates are available
- creating an instance of [MockHub](https://github.com/jupyterhub/jupyterhub/blob/270b61992143b29af8c2fab90c4ed32f2f6fe209/jupyterhub/tests/mocking.py#L221) using any provided configurations as arguments
- initializing the mocked instance
- starting the mocked instance
- finally, a registered finalizer function performs a cleanup and stops the mocked instance
The JupyterHub test suite uses the [pytest-asyncio plugin](https://pytest-asyncio.readthedocs.io/en/latest/) that handles [event-loop](https://docs.python.org/3/library/asyncio-eventloop.html) integration in [Tornado](https://www.tornadoweb.org/en/stable/) applications. This allows for the use of top-level awaits when calling async functions or [fixtures](https://docs.pytest.org/en/6.2.x/fixture.html#what-fixtures-are) during testing. All test functions and fixtures labelled as `async` will run on the same event loop.
```{note}
With the introduction of [top-level awaits](https://piccolo-orm.com/blog/top-level-await-in-python/), the use of the `io_loop` fixture of the [pytest-tornado plugin](https://www.tornadoweb.org/en/stable/ioloop.html) is no longer necessary. It was initially used to call coroutines. With the upgrades made to `pytest-asyncio`, this usage is now deprecated. It is now, only utilized within the JupyterHub test suite to ensure complete cleanup of resources used during testing such as open file descriptors. This is demonstrated in this [pull request](https://github.com/jupyterhub/jupyterhub/pull/4332).
More information is provided below.
```
One of the general goals of the [JupyterHub Pytest Plugin project](https://github.com/jupyterhub/pytest-jupyterhub) is to ensure the MockHub cleanup fully closes and stops all utilized resources during testing so the use of the `io_loop` fixture for teardown is not necessary. This was highlighted in this [issue](https://github.com/jupyterhub/pytest-jupyterhub/issues/30)
For more information on asyncio and event-loops, here are some resources:
- **Read**: [Introduction to the Python event loop](https://www.pythontutorial.net/python-concurrency/python-event-loop)
- **Read**: [Overview of Async IO in Python 3.7](https://stackabuse.com/overview-of-async-io-in-python-3-7)
- **Watch**: [Asyncio: Understanding Async / Await in Python](https://www.youtube.com/watch?v=bs9tlDFWWdQ)
- **Watch**: [Learn Python's AsyncIO #2 - The Event Loop](https://www.youtube.com/watch?v=E7Yn5biBZ58)
## Troubleshooting Test Failures
### All the tests are failing
Make sure you have completed all the steps in {ref}`contributing/setup` successfully, and are able to access JupyterHub from your browser at http://localhost:8000 after starting `jupyterhub` in your command line.
## Code formatting and linting
JupyterHub automatically enforces code formatting. This means that pull requests
with changes breaking this formatting will receive a commit from pre-commit.ci
automatically.
To automatically format code locally, you can install pre-commit and register a
_git hook_ to automatically check with pre-commit before you make a commit if
the formatting is okay.
```bash
pip install pre-commit
pre-commit install --install-hooks
```
To run pre-commit manually you would do:
```bash
# check for changes to code not yet committed
pre-commit run
# check for changes also in already committed code
pre-commit run --all-files
```
You may also install [black integration](https://github.com/psf/black#editor-integration)
into your text editor to format code automatically.
JupyterHub can be configured to record structured events from a running server using Jupyter's `Telemetry System`_. The types of events that JupyterHub emits are defined by `JSON schemas`_ listed at the bottom of this page_.
| Requests too high | Unnecessarily high cost and/or low capacity. |
| CPU limit too low | Poor performance experienced by users |
| CPU oversubscribed (too-low request + too-high limit) | Poor performance across the system; may crash, if severe |
| Memory limit too low | Servers killed by Out-of-Memory Killer (OOM); lost work for users |
| Memory oversubscribed (too-low request + too-high limit) | System memory exhaustion - all kinds of hangs and crashes and weird errors. Very bad. |
Note that the 'oversubscribed' problem case is where the request is lower than _typical_ usage,
meaning that the total reserved resources isn't enough for the total _actual_ consumption.
This doesn't mean that _all_ your users exceed the request,
just that the _limit_ gives enough room for the _average_ user to exceed the request.
All of these considerations are important _per node_.
Larger nodes means more users per node, and therefore more users to average over.
It also means more chances for multiple outliers on the same node.
### Example case for oversubscribing memory
Take for example, this system and sampling of user behavior:
- System memory = 8G
- memory request = 1G, limit = 3G
- typical 'heavy' user: 2G
- typical 'light' user: 0.5G
This will assign 8 users to those 8G of RAM (remember: only requests are used for deciding when a machine is 'full').
As long as the total of 8 users _actual_ usage is under 8G, everything is fine.
But the _limit_ allows a total of 24G to be used,
which would be a mess if everyone used their full limit.
But _not_ everyone uses the full limit, which is the point!
This pattern is fine if 1/8 of your users are 'heavy' because _typical_ usage will be ~0.7G,
and your total usage will be ~5G (`1 × 2 + 7 × 0.5 = 5.5`).
But if _50%_ of your users are 'heavy' you have a problem because that means your users will be trying to use 10G (`4 × 2 + 4 × 0.5 = 10`),
which you don't have.
You can make guesses at these numbers, but the only _real_ way to get them is to measure (see [](measuring)).
### CPU:memory ratio
Most of the time, you'll find that only one resource is the limiting factor for your users.
Most often it's memory, but for certain tasks, it could be CPU (or even GPUs).
Many cloud deployments have just one or a few fixed ratios of cpu to memory
(e.g. 'general purpose', 'high memory', and 'high cpu').
Setting your secondary resource allocation according to this ratio
after selecting the more important limit results in a balanced resource allocation.
For instance, some of Google Cloud's ratios are:
| node type | GB RAM / CPU core |
| ----------- | ----------------- |
| n2-highmem | 8 |
| n2-standard | 4 |
| n2-highcpu | 1 |
(idleness)=
### Idleness
Jupyter being an interactive tool means people tend to spend a lot more time reading and thinking than actually running resource-intensive code.
This significantly affects how much _cpu_ resources a typical active user needs,
but often does not significantly affect the _memory_.
Ways to think about this:
- More idle users means unused CPU.
This generally means setting your CPU _limit_ higher than your CPU _request_.
- What do your users do when they _are_ running code?
Is it typically single-threaded local computation in a notebook?
If so, there's little reason to set a limit higher than 1 CPU core.
- Do typical computations take a long time, or just a few seconds?
Longer typical computations means it's more likely for users to be trying to use the CPU at the same moment,
suggesting a higher _request_.
- Even with idle users, parallel computation adds up quickly - one user fully loading 4 cores and 3 using almost nothing still averages to more than a full CPU core per user.
Again, using mybinder.org as an example—we run around 100 users on 8-core nodes,
and still see fairly _low_ overall CPU usage on each user node.
The limit here is actually Kubernetes' pods per node, not memory _or_ CPU.
This is likely a extreme case, as many Binder users come from clicking links on webpages
without any actual intention of running code.
```{figure} /images/mybinder-load5.png
mybinder.org node CPU usage is low with 50-150 users sharing just 8 cores
```
### Concurrent users and culling idle servers
Related to [][idleness], all of these resource consumptions and limits are calculated based on **concurrently active users**,
not total users.
You might have 10,000 users of your JupyterHub deployment, but only 100 of them running at any given time.
That 100 is the main number you need to use for your capacity planning.
JupyterHub costs scale very little based on the number of _total_ users,
up to a point.
There are two important definitions for **active user**:
- Are they _actually_ there (i.e. a human interacting with Jupyter, or running code that might be )
- Is their server running (this is where resource reservations and limits are actually applied)
Connecting those two definitions (how long are servers running if their humans aren't using them) is an important area of deployment configuration, usually implemented via the [JupyterHub idle culler service][idle-culler].
There are a lot of considerations when it comes to culling idle users that will depend:
- How much does it save me to shut down user servers? (e.g. keeping an elastic cluster small, or keeping a fixed-size deployment available to active users)
- How much does it cost my users to have their servers shut down? (e.g. lost work if shutdown prematurely)
- How easy do I want it to be for users to keep their servers running? (e.g. Do they want to run unattended simulations overnight? Do you want them to?)
Like many other things in this guide, there are many correct answers leading to different configuration choices.
For more detail on culling configuration and considerations, consult the [JupyterHub idle culler documentation][idle-culler].
## More tips
### Start strict and generous, then measure
A good tip, in general, is to give your users as much resources as you can afford that you think they _might_ use.
Then, use resource usage metrics like prometheus to analyze what your users _actually_ need,
and tune accordingly.
Remember: **Limits affect your user experience and stability. Requests mostly affect your costs**.
For example, a sensible starting point (lacking any other information) might be:
```yaml
request:
cpu: 0.5
mem: 2G
limit:
cpu: 1
mem: 2G
```
(more memory if significant computations are likely - machine learning models, data analysis, etc.)
Some actions
- If you see out-of-memory killer events, increase the limit (or talk to your users!)
- If you see typical memory well below your limit, reduce the request (but not the limit)
- If _nobody_ uses that much memory, reduce your limit
- If CPU is your limiting scheduling factor and your CPUs are mostly idle,
reduce the cpu request (maybe even to 0!).
- If CPU usage continues to be low, increase the limit to 2 or 4 to allow bursts of parallel execution.
(measuring)=
### Measuring user resource consumption
It is _highly_ recommended to deploy monitoring services such as [Prometheus][]
and [Grafana][] to get a view of your users' resource usage.
This is the only way to truly know what your users need.
JupyterHub has some experimental [grafana dashboards][] you can use as a starting point,
to keep an eye on your resource usage.
Here are some sample charts from (again from mybinder.org),
showing >90% of users using less than 10% CPU and 200MB,
but a few outliers near the limit of 1 CPU and 2GB of RAM.
This is the kind of information you can use to tune your requests and limits.

JupyterHub uses a database to store information about users, services, and other data needed for operating the Hub.
This is the **state** of the Hub.
## Why does JupyterHub have a database?
JupyterHub is a **stateful** application (more on that 'state' later).
Updating JupyterHub's configuration or upgrading the version of JupyterHub requires restarting the JupyterHub process to apply the changes.
We want to minimize the disruption caused by restarting the Hub process, so it can be a mundane, frequent, routine activity.
Storing state information outside the process for later retrieval is necessary for this, and one of the main thing databases are for.
A lot of the operations in JupyterHub are also **relationships**, which is exactly what SQL databases are great at.
For example:
- Given an API token, what user is making the request?
- Which users don't have running servers?
- Which servers belong to user X?
- Which users have not been active in the last 24 hours?
Finally, a database allows us to have more information stored without needing it all loaded in memory,
e.g. supporting a large number (several thousands) of inactive users.
## What's in the database?
The short answer of what's in the JupyterHub database is "everything."
JupyterHub's **state** lives in the database.
That is, everything JupyterHub needs to be aware of to function that _doesn't_ come from the configuration files, such as
- users, roles, role assignments
- state, urls of running servers
- Hashed API tokens
- Short-lived state related to OAuth flow
- Timestamps for when users, tokens, and servers were last used
### What's _not_ in the database
Not _quite_ all of JupyterHub's state is in the database.
This mostly involves transient state, such as the 'pending' transitions of Spawners (starting, stopping, etc.).
Anything not in the database must be reconstructed on Hub restart, and the only sources of information to do that are the database and JupyterHub configuration file(s).
## How does JupyterHub use the database?
JupyterHub makes some _unusual_ choices in how it connects to the database.
These choices represent trade-offs favoring single-process simplicity and performance at the expense of horizontal scalability (multiple Hub instances).
We often say that the Hub 'owns' the database.
This ownership means that we assume the Hub is the only process that will talk to the database.
This assumption enables us to make several caching optimizations that dramatically improve JupyterHub's performance (i.e. data written recently to the database can be read from memory instead of fetched again from the database) that would not work if multiple processes could be interacting with the database at the same time.
Database operations are also synchronous, so while JupyterHub is waiting on a database operation, it cannot respond to other requests.
This allows us to avoid complex locking mechanisms, because transaction races can only occur during an `await`, so we only need to make sure we've completed any given transaction before the next `await` in a given request.
:::{note}
We are slowly working to remove these assumptions, and moving to a more traditional db session per-request pattern.
This will enable multiple Hub instances and enable scaling JupyterHub, but will significantly reduce the number of active users a single Hub instance can serve.
:::
### Database performance in a typical request
Most authenticated requests to JupyterHub involve a few database transactions:
1. look up the authenticated user (e.g. look up token by hash, then resolve owner and permissions)
2. record activity
3. perform any relevant changes involved in processing the request (e.g. create the records for a running server when starting one)
This means that the database is involved in almost every request, but only in quite small, simple queries, e.g.:
- lookup one token by hash
- lookup one user by name
- list tokens or servers for one user (typically 1-10)
- etc.
### The database as a limiting factor
As a result of the above transactions in most requests, database performance is the _leading_ factor in JupyterHub's baseline requests-per-second performance, but that cost does not scale significantly with the number of users, active or otherwise.
However, the database is _rarely_ a limiting factor in JupyterHub performance in a practical sense, because the main thing JupyterHub does is start, stop, and monitor whole servers, which take far more time than any small database transaction, no matter how many records you have or how slow your database is (within reason).
Additionally, there is usually _very_ little load on the database itself.
By far the most taxing activity on the database is the 'list all users' endpoint, primarily used by the [idle-culling service](https://github.com/jupyterhub/jupyterhub-idle-culler).
Database-based optimizations have been added to make even these operations feasible for large numbers of users:
1. State filtering on [GET /hub/api/users?state=active](../reference/rest-api.html#/default/get_users){.external},
which limits the number of results in the query to only the relevant subset (added in JupyterHub 1.3), rather than all users.
2. [Pagination](api-pagination) of all list endpoints, allowing the request of a large number of resources to be more fairly balanced with other Hub activities across multiple requests (added in 2.0).
:::{note}
It's important to note when discussing performance and limiting factors and that all of this only applies to requests to `/hub/...`.
The Hub and its database are not involved in most requests to single-user servers (`/user/...`), which is by design, and largely motivated by the fact that the Hub itself doesn't _need_ to be fast because its operations are infrequent and large.
:::
## Database backends
JupyterHub supports a variety of database backends via [SQLAlchemy][].
The default is sqlite, which works great for many cases, but you should be able to use many backends supported by SQLAlchemy.
Usually, this will mean PostgreSQL or MySQL, both of which are officially supported and well tested with JupyterHub, but others may work as well.
See [SQLAlchemy's docs][sqlalchemy-dialect] for how to connect to different database backends.
Doing so generally involves:
1. installing a Python package that provides a client implementation, and
2. setting [](JupyterHub.db_url) to connect to your database with the specified implementation
The default database backend for JupyterHub is [SQLite](https://sqlite.org).
We have chosen SQLite as JupyterHub's default because it's simple (the 'database' is a single file) and ubiquitous (it is in the Python standard library).
It works very well for testing, small deployments, and workshops.
For production systems, SQLite has some disadvantages when used with JupyterHub:
-`upgrade-db` may not always work, and you may need to start with a fresh database
-`downgrade-db`**will not** work if you want to rollback to an earlier
version, so backup the `jupyterhub.sqlite` file before upgrading (JupyterHub automatically creates a date-stamped backup file when upgrading sqlite)
The sqlite documentation provides a helpful page about [when to use SQLite and
where traditional RDBMS may be a better choice](https://sqlite.org/whentouse.html).
### Picking your database backend (PostgreSQL, MySQL)
When running a long term deployment or a production system, we recommend using a full-fledged relational database, such as [PostgreSQL](https://www.postgresql.org) or [MySQL](https://www.mysql.com), that supports the SQL `ALTER TABLE` statement, which is used in some database upgrade steps.
In general, you select your database backend with [](JupyterHub.db_url), and can further configure it (usually not necessary) with [](JupyterHub.db_kwargs).
## Notes and Tips
### SQLite
The SQLite database should not be used on NFS. SQLite uses reader/writer locks
to control access to the database. This locking mechanism might not work
correctly if the database file is kept on an NFS filesystem. This is because
`fcntl()` file locking is broken on many NFS implementations. Therefore, you
should avoid putting SQLite database files on NFS since it will not handle well
multiple processes which might try to access the file at the same time.
### PostgreSQL
We recommend using PostgreSQL for production if you are unsure whether to use
MySQL or PostgreSQL or if you do not have a strong preference.
There is additional configuration required for MySQL that is not needed for PostgreSQL.
For example, to connect to a postgres database with psycopg2:
1. install psycopg2: `pip instal psycopg2` (or `psycopg2-binary` to avoid compilation, which is [not recommended for production][psycopg2-binary])
2. set authentication via environment variables `PGUSER` and `PGPASSWORD`
- You should probably use the `pymysql` or `mysqlclient` sqlalchemy provider, or another backend [recommended by sqlalchemy](https://docs.sqlalchemy.org/en/20/dialects/mysql.html#dialect-mysql)
- You also need to set `pool_recycle` to some value (typically 60 - 300, JupyterHub will default to 60)
which depends on your MySQL setup. This is necessary since MySQL kills
connections serverside if they've been idle for a while, and the connection
from the hub will be idle for longer than most connections. This behavior
will lead to frustrating 'the connection has gone away' errors from
sqlalchemy if `pool_recycle` is not set.
- If you use `utf8mb4` collation with MySQL earlier than 5.7.7 or MariaDB
earlier than 10.2.1 you may get an `1709, Index column size too large` error.
To fix this you need to set `innodb_large_prefix` to enabled and
`innodb_file_format` to `Barracuda` to allow for the index sizes jupyterhub
uses. `row_format` will be set to `DYNAMIC` as long as those options are set
correctly. Later versions of MariaDB and MySQL should set these values by
default, as well as have a default `DYNAMIC` `row_format` and pose no trouble
to users.
For example, to connect to a mysql database with mysqlclient:
_Explanation_ documentation provide big-picture descriptions of how JupyterHub works. This section is meant to build your understanding of particular topics.
JupyterHub uses OAuth 2 internally as a mechanism for authenticating users.
JupyterHub uses [OAuth 2](https://oauth.net/2/) as an internal mechanism for authenticating users.
As such, JupyterHub itself always functions as an OAuth **provider**.
More on what that means [below](oauth-terms).
You can find out more about what that means [below](oauth-terms).
Additionally, JupyterHub is _often_ deployed with [oauthenticator](https://oauthenticator.readthedocs.io),
Additionally, JupyterHub is _often_ deployed with [OAuthenticator](https://oauthenticator.readthedocs.io),
where an external identity provider, such as GitHub or KeyCloak, is used to authenticate users.
When this is the case, there are _two_ nested oauth flows:
an _internal_oauth flow where JupyterHub is the **provider**,
and and_external_oauth flow, where JupyterHub is a**client**.
When this is the case, there are _two_ nested OAuth flows:
an _internal_OAuth flow where JupyterHub is the **provider**,
and an _external_OAuth flow, where JupyterHub is the**client**.
This means that when you are using JupyterHub, there is always _at least one_ and often two layers of OAuth involved in a user logging in and accessing their server.
Some relevant points:
The following points are noteworthy:
- Single-user servers _never_ need to communicate with or be aware of the upstream provider configured in your Authenticator.
As far as they are concerned, only JupyterHub is an OAuth provider,
As far as the servers are concerned, only JupyterHub is an OAuth provider,
and how users authenticate with the Hub itself is irrelevant.
- When talking to a single-user server,
- When interacting with a single-user server,
there are ~always two tokens:
a token issued to the server itself to communicate with the Hub API,
and a second per-user token in the browser to represent the completed login process and authorized permissions.
first, a token issued to the server itself to communicate with the Hub API,
and second, a per-user token in the browser to represent the completed login process and authorized permissions.
More on this [later](two-tokens).
(oauth-terms)=
@@ -28,66 +28,66 @@ Some relevant points:
## Key OAuth terms
Here are some key definitions to keep in mind when we are talking about OAuth.
You can also read more detail [here](https://www.oauth.com/oauth2-servers/definitions/).
You can also read more in detail [here](https://www.oauth.com/oauth2-servers/definitions/).
- **provider** the entity responsible for managing identity and authorization,
- **provider**: The entity responsible for managing identity and authorization;
always a web server.
JupyterHub is _always_ an oauth provider for JupyterHub's components.
When OAuthenticator is used, an external service, such as GitHub or KeyCloak, is also an oauth provider.
- **client** An entity that requests OAuth **tokens** on a user's behalf,
JupyterHub is _always_ an OAuth provider for JupyterHub's components.
When OAuthenticator is used, an external service, such as GitHub or KeyCloak, is also an OAuth provider.
- **client**: An entity that requests OAuth **tokens** on a user's behalf;
generally a web server of some kind.
OAuth **clients** are services that _delegate_ authentication and/or authorization
to an OAuth **provider**.
JupyterHub _services_ or single-user _servers_ are OAuth **clients** of the JupyterHub **provider**.
When OAuthenticator is used, JupyterHub is itself _also_ an OAuth **client** for the external oauth **provider**, e.g. GitHub.
- **browser** A user's web browser, which makes requests and stores things like cookies
- **token** The secret value used to represent a user's authorization. This is the final product of the OAuth process.
- **code** A short-lived temporary secret that the **client** exchanges
for a **token** at the conclusion of oauth,
in what's generally called the "oauth callback handler."
When OAuthenticator is used, JupyterHub is itself _also_ an OAuth **client** for the external OAuth **provider**, e.g. GitHub.
- **browser**: A user's web browser, which makes requests and stores things like cookies.
- **token**: The secret value used to represent a user's authorization. This is the final product of the OAuth process.
- **code**: A short-lived temporary secret that the **client** exchanges
for a **token** at the conclusion of OAuth,
in what's generally called the "OAuth callback handler."
## One oauth flow
OAuth **flow** is what we call the sequence of HTTP requests involved in authenticating a user and issuing a token, ultimately used for authorized access to a service or single-user server.
OAuth **flow** is what we call the sequence of HTTP requests involved in authenticating a user and issuing a token, ultimately used for authorizing access to a service or single-user server.
A single oauth flow generally goes like this:
A single OAuth flow typically goes like this:
### OAuth request and redirect
1. A **browser** makes an HTTP request to an oauth **client**.
2. There are no credentials, so the client _redirects_ the browser to an "authorize" page on the oauth **provider** with some extra information:
- the oauth **client id** of the client itself
- the **redirect uri** to be redirected back to after completion
1. A **browser** makes an HTTP request to an OAuth **client**.
2. There are no credentials, so the client _redirects_ the browser to an "authorize" page on the OAuth **provider** with some extra information:
- the OAuth **client ID** of the client itself.
- the **redirect URI** to be redirected back to after completion.
- the **scopes** requested, which the user should be presented with to confirm.
This is the "X would like to be able to Y on your behalf. Allow this?" page you see on all the "Login with ..." pages around the Internet.
3. During this authorize step,
the browser must be _authenticated_ with the provider.
This is often already stored in a cookie,
but if not the provider webapp must begin its _own_ authentication process before serving the authorization page.
This _may_ even begin another oauth flow!
This _may_ even begin another OAuth flow!
4. After the user tells the provider that they want to proceed with the authorization,
the provider records this authorization in a short-lived record called an **oauth code**.
5. Finally, the oauth provider redirects the browser _back_ to the oauth client's "redirect uri"
(or "oauth callback uri"),
with the oauth code in a url parameter.
the provider records this authorization in a short-lived record called an **OAuth code**.
5. Finally, the oauth provider redirects the browser _back_ to the oauth client's "redirect URI"
(or "OAuth callback URI"),
with the OAuth code in a URL parameter.
That's the end of the requests made between the **browser** and the **provider**.
That marks the end of the requests made between the **browser** and the **provider**.
### State after redirect
At this point:
- The browser is authenticated with the _provider_
- The user's authorized permissions are recorded in an _oauth code_
- The _provider_ knows that the given oauth client's requested permissions have been granted, but the client doesn't know this yet.
- All requests so far have been made directly by the browser.
No requests have originated at the client or provider.
- The browser is authenticated with the _provider_.
- The user's authorized permissions are recorded in an _OAuth code_.
- The _provider_ knows that the permissions requested by the OAuth client have been granted, but the client doesn't know this yet.
- All the requests so far have been made directly by the browser.
No requests have originated from the client or provider.
### OAuth Client Handles Callback Request
Now we get to finish the OAuth process.
Let's dig into what the oauth client does when it handles
the oauth callback request with the
At this stage, we get to finish the OAuth process.
Let's dig into what the OAuth client does when it handles
the OAuth callback request.
- The OAuth client receives the _code_ and makes an API request to the _provider_ to exchange the code for a real _token_.
This is the first direct request between the OAuth _client_ and the _provider_.
@@ -95,12 +95,12 @@ the oauth callback request with the
makes a second API request to the _provider_
to retrieve information about the owner of the token (the user).
This is the step where behavior diverges for different OAuth providers.
Up to this point, all oauth providers are the same, following the oauth specification.
However, oauth does not define a standard for exchanging tokens for information about their owner or permissions ([OpenID Connect](https://openid.net/connect/) does that),
Up to this point, all OAuth providers are the same, following the OAuth specification.
However, OAuth does not define a standard for issuing tokens in exchange for information about their owner or permissions ([OpenID Connect](https://openid.net/connect/) does that),
so this step may be different for each OAuth provider.
- Finally, the oauth client stores its own record that the user is authorized in a cookie.
- Finally, the OAuth client stores its own record that the user is authorized in a cookie.
This could be the token itself, or any other appropriate representation of successful authentication.
- Last of all, now that credentials have been established,
- Now that credentials have been established,
the browser can be redirected to the _original_ URL where it started,
to try the request again.
If the client wasn't able to keep track of the original URL all this time
@@ -113,24 +113,24 @@ So that's _one_ OAuth process.
## Full sequence of OAuth in JupyterHub
Let's go through the above oauth process in JupyterHub,
with specific examples of each HTTP request and what information is contained.
For bonus points, we are using the double-oauth example of JupyterHub configured with GitHubOAuthenticator.
Let's go through the above OAuth process in JupyterHub,
with specific examples of each HTTP request and what information it contains.
For bonus points, we are using the double-OAuth example of JupyterHub configured with GitHubOAuthenticator.
To disambiguate, we will call the OAuth process where JupyterHub is the **provider** "internal oauth,"
and the one with JupyterHub as a **client** "external oauth."
To disambiguate, we will call the OAuth process where JupyterHub is the **provider** "internal OAuth,"
and the one with JupyterHub as a **client** "external OAuth."
Our starting point:
- a user's single-user server is running. Let's call them `danez`
- jupyterhub is running with GitHub as an oauth provider (this means two full instances of oauth),
- Danez has a fresh browser session with no cookies yet
- Jupyterhub is running with GitHub as an OAuth provider (this means two full instances of OAuth),
- Danez has a fresh browser session with no cookies yet.
First request:
- browser->single-user server running JupyterLab or Jupyter Classic
- `GET /user/danez/notebooks/mynotebook.ipynb`
- no credentials, so single-user server (as an oauth **client**) starts internal oauth process with JupyterHub (the **provider**)
- no credentials, so single-user server (as an OAuth **client**) starts internal OAuth process with JupyterHub (the **provider**)
When a user logs into JupyterHub, they get a 'server', which we usually call the **single-user server**, because it's a server that's meant for a single JupyterHub user.
Each JupyterHub user gets a different one (or more than one!).
A single-user server is a process running somewhere that is:
1. accessible over http[s],
2. authenticated via JupyterHub using OAuth 2.0,
3. started by a [Spawner](spawners), and
4. 'owned' by a single JupyterHub user
## The single-user server command
The Spawner's default single-user server startup command, `jupyterhub-singleuser`, launches `jupyter-server`, the same program used when you run `jupyter lab` on your laptop.
(_It can also launch the legacy `jupyter-notebook` server_).
That's why JupyterHub looks familiar to folks who are already using Jupyter at home or elsewhere.
It's the same!
`jupyterhub-singleuser`_customizes_ that program to change (approximately) one thing: **authenticate requests with JupyterHub**.
(singleuser-auth)=
## Single-user server authentication
Implementation-wise, JupyterHub single-user servers are a special-case of {ref}`services`
and as such use the same (OAuth) authentication mechanism (more on OAuth in JupyterHub at [](oauth)).
This is primarily implemented in the {class}`~.HubOAuth` class.
This code resides in `jupyterhub.singleuser` subpackage of JupyterHub.
The main task of this code is to:
1. resolve a JupyterHub token to a JupyterHub user (authenticate)
2. check permissions (`access:servers`) for the token to make sure the request should be allowed (authorize)
3. if not authorized, begin the OAuth process with a redirect to the Hub
4. after login, store OAuth tokens in a cookie only used by this single-user server
5. implement logout to clear the cookie
Most of this is implemented in the {class}`~.HubOAuth` class. `jupyterhub.singleuser` is responsible for _adapting_ the base Jupyter Server to use HubOAuth for these tasks.
### JupyterHub authentication extension
By default, `jupyter-server` uses its own cookie to authenticate.
If that cookie is not present, the server redirects you a login page and asks you to enter a password or token.
Jupyter Server 2.0 introduces two new _APIs_ for customizing authentication: the [IdentityProvider](inv:jupyter-server#jupyter_server.auth.IdentityProvider) and the [Authorizer](inv:jupyter-server#jupyter_server.auth.Authorizer).
More information can be found in the [Jupyter Server documentation](https://jupyter-server.readthedocs.io).
JupyterHub implements these APIs in `jupyterhub.singleuser.extension`.
The IdentityProvider is responsible for _authenticating_ requests.
In JupyterHub, that means extracting OAuth tokens from the request and resolving them to a JupyterHub user.
The Authorizer is a _separate_ API for _authorizing_ actions on particular resources.
Because the JupyterHub IdentityProvider only allows _authenticating_ users who already have the necessary `access:servers` permission to access the server, the default Authorizer only contains a redundant check for this same permission, and ignores the resource inputs.
However, specifying a _custom_ Authorizer allows for granular permissions, such as read-only access to subsets of a shared server.
### JupyterHub authentication via subclass
Prior to Jupyter Server 2 (i.e. Jupyter Server 1.x or the legacy `jupyter-notebook` server), JupyterHub authentication is applied via _subclass_.
Originally a subclass of `NotebookApp`,
this approach works with both `jupyter-server` and `jupyter-notebook`.
Instead of using the extension mechanisms above,
the server application is _subclassed_. This worked well in the `jupyter-notebook` days,
but doesn't fit well with Jupyter Server's extension-based architecture.
Using the JupyterHub singleuser-server extension is the default behavior of JupyterHub 4 and Jupyter Server 2, otherwise the subclass approach is taken.
You can opt-out of the extension by setting the environment variable `JUPYTERHUB_SINGLEUSER_EXTENSION=0`:
```python
c.Spawner.environment.update(
{
"JUPYTERHUB_SINGLEUSER_EXTENSION":"0",
}
)
```
The subclass approach will also be taken if you've opted to use the classic notebook server with:
```
JUPYTERHUB_SINGLEUSER_APP=notebook
```
which was introduced in JupyterHub 2.
## Other customizations
`jupyterhub-singleuser` makes other small customizations to how the single-user server behaves:
1. logs activity on the single-user server, used in [idle-culling](https://github.com/jupyterhub/jupyterhub-idle-culler).
2. disables some features that don't make sense in JupyterHub (trash, retrying ports)
3. loading options such as URLs and SSL configuration from the environment
4. customize logging for consistency with JupyterHub logs
## Running a single-user server that's not `jupyterhub-singleuser`
By default, `jupyterhub-singleuser` is the same `jupyter-server` used by JupyterLab, Jupyter notebook (>= 7), etc.
But technically, all JupyterHub cares about is that it is:
1. an http server at the prescribed URL, accessible from the Hub and proxy, and
2. authenticated via [OAuth](oauth) with the Hub (it doesn't even have to do this, if you want to do your own authentication, as is done in BinderHub)
which means that you can customize JupyterHub to launch _any_ web application that meets these criteria, by following the specifications in {ref}`services`.
Most of the time, though, it's easier to use [jupyter-server-proxy](https://jupyter-server-proxy.readthedocs.io) if you want to launch additional web applications in JupyterHub.
The **Security Overview** section helps you learn about:
- the design of JupyterHub with respect to web security
- the semi-trusted user
- the available mitigations to protect untrusted users from each other
- the value of periodic security audits
This overview also helps you obtain a deeper understanding of how JupyterHub
works.
## Semi-trusted and untrusted users
JupyterHub is designed to be a _simple multi-user server for modestly sized
groups_ of **semi-trusted** users. While the design reflects serving
semi-trusted users, JupyterHub can also be suitable for serving **untrusted** users,
but **is not suitable for untrusted users** in its default configuration.
As a result, using JupyterHub with **untrusted** users means more work by the
administrator, since much care is required to secure a Hub, with extra caution on
protecting users from each other.
One aspect of JupyterHub's _design simplicity_ for **semi-trusted** users is that
the Hub and single-user servers are placed in a _single domain_, behind a
[_proxy_][configurable-http-proxy]. If the Hub is serving untrusted
users, many of the web's cross-site protections are not applied between
single-user servers and the Hub, or between single-user servers and each
other, since browsers see the whole thing (proxy, Hub, and single user
servers) as a single website (i.e. single domain).
## Protect users from each other
To protect users from each other, a user must **never** be able to write arbitrary
HTML and serve it to another user on the Hub's domain. This is prevented by JupyterHub's
authentication setup because only the owner of a given single-user notebook server is
allowed to view user-authored pages served by the given single-user notebook
server.
To protect all users from each other, JupyterHub administrators must
ensure that:
- A user **does not have permission** to modify their single-user notebook server,
including:
- the installation of new packages in the Python environment that runs
their single-user server;
- the creation of new files in any `PATH` directory that precedes the
directory containing `jupyterhub-singleuser` (if the `PATH` is used
to resolve the single-user executable instead of using an absolute path);
- the modification of environment variables (e.g. PATH, PYTHONPATH) for
their single-user server;
- the modification of the configuration of the notebook server
(the `~/.jupyter` or `JUPYTER_CONFIG_DIR` directory).
- unrestricted selection of the base environment (e.g. the image used in container-based Spawners)
If any additional services are run on the same domain as the Hub, the services
**must never** display user-authored HTML that is neither _sanitized_ nor _sandboxed_
to any user that lacks authentication as the author of a file.
### Sharing access to servers
Because sharing access to servers (via `access:servers` scopes or the sharing feature in JupyterHub 5) by definition means users can serve each other files, enabling sharing is not suitable for untrusted users without also enabling per-user domains.
JupyterHub does not enable any sharing by default.
## Mitigate security issues
The several approaches to mitigating security issues with configuration
options provided by JupyterHub include:
### Enable user subdomains
JupyterHub provides the ability to run single-user servers on their own
domains. This means the cross-origin protections between servers has the
desired effect, and user servers and the Hub are protected from each other.
**Subdomains are the only way to reliably isolate user servers from each other.**
When subdomains are enabled, each user's single-user server will be at e.g. `https://username.jupyter.example.org`.
This also requires all user subdomains to point to the same address,
which is most easily accomplished with wildcard DNS, where a single A record points to your server and a wildcard CNAME record points to your A record:
```
A jupyter.example.org 192.168.1.123
CNAME *.jupyter.example.org jupyter.example.org
```
Since this spreads the service across multiple domains, you will likely need wildcard SSL as well,
matching `*.jupyter.example.org`.
Unfortunately, for many institutional domains, wildcard DNS and SSL may not be available.
We also **strongly encourage** serving JupyterHub and user content on a domain that is _not_ a subdomain of any sensitive content.
For reasoning, see [GitHub's discussion of moving user content to github.io from \*.github.com](https://github.blog/2013-04-09-yummy-cookies-across-domains/).
**If you do plan to serve untrusted users, enabling subdomains is highly encouraged**,
as it resolves many security issues, which are difficult to unavoidable when JupyterHub is on a single-domain.
:::{important}
JupyterHub makes no guarantees about protecting users from each other unless subdomains are enabled.
If you want to protect users from each other, you **_must_** enable per-user domains.
:::
### Disable user config
If subdomains are unavailable or undesirable, JupyterHub provides a
configuration option `Spawner.disable_user_config = True`, which can be set to prevent
the user-owned configuration files from being loaded. After implementing this
option, `PATH`s and package installation are the other things that the
admin must enforce.
### Prevent spawners from evaluating shell configuration files
For most Spawners, `PATH` is not something users can influence, but it's important that
the Spawner should _not_ evaluate shell configuration files prior to launching the server.
### Isolate packages in a read-only environment
The user must not have permission to install packages into the environment where the singleuser-server runs.
On a shared system, package isolation is most easily handled by running the single-user server in
a root-owned virtualenv with disabled system-site-packages.
The user must not have permission to install packages into this environment.
The same principle extends to the images used by container-based deployments.
If users can select the images in which their servers run, they can disable all security for their own servers.
It is important to note that the control over the environment is only required for the
single-user server, and not the environment(s) in which the users' kernel(s)
may run. Installing additional packages in the kernel environment does not
pose additional risk to the web application's security.
### Encrypt internal connections with SSL/TLS
By default, all communications within JupyterHub—between the proxy, hub, and single
-user notebooks—are performed unencrypted. Setting the `internal_ssl` flag in
`jupyterhub_config.py` secures the aforementioned routes. Turning this
feature on does require that the enabled `Spawner` can use the certificates
generated by the `Hub` (the default `LocalProcessSpawner` can, for instance).
It is also important to note that this encryption **does not** cover the
`zmq tcp` sockets between the Notebook client and kernel yet. While users cannot
submit arbitrary commands to another user's kernel, they can bind to these
sockets and listen. When serving untrusted users, this eavesdropping can be
mitigated by setting `KernelManager.transport` to `ipc`. This applies standard
Unix permissions to the communication sockets thereby restricting
communication to the socket owner. The `internal_ssl` option will eventually
extend to securing the `tcp` sockets as well.
### Mitigating same-origin deployments
While per-user domains are **required** for robust protection of users from each other,
you can mitigate many (but not all) cross-user issues.
First, it is critical that users cannot modify their server environments, as described above.
Second, it is important that users do not have `access:servers` permission to any server other than their own.
If users can access each others' servers, additional security measures must be enabled, some of which come with distinct user-experience costs.
Without the [Same-Origin Policy] (SOP) protecting user servers from each other,
each user server is considered a trusted origin for requests to each other user server (and the Hub itself).
Servers _cannot_ meaningfully distinguish requests originating from other user servers,
because SOP implies a great deal of trust, losing many restrictions applied to cross-origin requests.
That means pages served from each user server can:
1. arbitrarily modify the path in the Referer
2. make fully authorized requests with cookies
3. access full page contents served from the hub or other servers via popups
JupyterHub uses distinct xsrf tokens stored in cookies on each server path to attempt to limit requests across.
This has limitations because not all requests are protected by these XSRF tokens,
and unless additional measures are taken, the XSRF tokens from other user prefixes may be retrieved.
`allow-popups` is not disabled by default because disabling it breaks legitimate functionality, like "Open this in a new tab", and the "JupyterHub Control Panel" menu item.
To reiterate, the right way to avoid these issues is to enable per-user domains, where none of these concerns come up.
Note: even this level of protection requires administrators maintaining full control over the user server environment.
If users can modify their server environment, these methods are ineffective, as users can readily disable them.
### Cookie tossing
Cookie tossing is a technique where another server on a subdomain or peer subdomain can set a cookie
which will be read on another domain.
This is not relevant unless there are other user-controlled servers on a peer domain.
"Domain-locked" cookies avoid this issue, but have their own restrictions:
- JupyterHub must be served over HTTPS
- All secure cookies must be set on `/`, not on sub-paths, which means they are shared by all JupyterHub components in a single-domain deployment.
As a result, this option is only recommended when per-user subdomains are enabled,
to prevent sending all jupyterhub cookies to all user servers.
To enable domain-locked cookies, set:
```python
c.JupyterHub.cookie_host_prefix_enabled=True
```
```{versionadded} 4.1
```
### Forced-login
Jupyter servers can share links with `?token=...`.
JupyterHub prior to 5.0 will accept this request and persist the token for future requests.
This is useful for enabling admins to create 'fully authenticated' links bypassing login.
However, it also means users can share their own links that will log other users into their own servers,
enabling them to serve each other notebooks and other arbitrary HTML, depending on server configuration.
```{versionadded} 4.1
Setting environment variable `JUPYTERHUB_ALLOW_TOKEN_IN_URL=0` in the single-user environment can opt out of accepting token auth in URL parameters.
```
```{versionadded} 5.0
Accepting tokens in URLs is disabled by default, and `JUPYTERHUB_ALLOW_TOKEN_IN_URL=1` environment variable must be set to _allow_ token auth in URL parameters.
```
## Security audits
We recommend that you do periodic reviews of your deployment's security. It's
good practice to keep [JupyterHub](https://readthedocs.org/projects/jupyterhub/), [configurable-http-proxy][], and [nodejs
versions](https://github.com/nodejs/Release) up to date.
and can look different depending on what you mean by 'share.'
Your first instinct might be to copy the URL you see in the browser,
e.g. `jupyterhub.example/user/yourname/notebooks/coolthing.ipynb`,
but this usually won't work, depending on the permissions of the person you share the link with.
Unfortunately, 'share' means at least a few things to people in a JupyterHub context.
We'll cover 3 common cases here, when they are applicable, and what assumptions they make:
1. sharing links that will open the same file on the visitor's own server
2. sharing links that will bring the visitor to _your_ server (e.g. for real-time collaboration, or RTC)
3. publishing notebooks and sharing links that will download the notebook into the user's server
### link to the same file on the visitor's server
This is for the case where you have JupyterHub on a shared (or sufficiently similar) filesystem, where you want to share a link that will cause users to login and start their _own_ server, to view or edit the file.
**Assumption:** the same path on someone else's server is valid and points to the same file
This is useful in e.g. classes where you know students have certain files in certain locations, or collaborations where you know you have a shared filesystem where everyone has access to the same files.
A link should look like `https://jupyterhub.example/hub/user-redirect/lab/tree/foo.ipynb`.
You can hand-craft these URLs from the URL you are looking at, where you see `/user/name/lab/tree/foo.ipynb` use `/hub/user-redirect/lab/tree/foo.ipynb` (replace `/user/name/` with `/hub/user-redirect/`).
Or you can use JupyterLab's "copy shareable link" in the context menu in the file browser:

which will produce a correct URL with `/hub/user-redirect/` in it.
### link to the file on your server
This is for the case where you want to both be using _your_ server, e.g. for real-time collaboration (RTC).
**Assumption:** the user has (or should have) access to your server.
**Assumption:** your server is running _or_ the user has permission to start it.
By default, JupyterHub users don't have access to each other's servers, but JupyterHub 2.0 administrators can grant users limited access permissions to each other's servers.
If the visitor doesn't have access to the server, these links will result in a 403 Permission Denied error.
In many cases, for this situation you can copy the link in your URL bar (`/user/yourname/lab`), or you can add `/tree/path/to/specific/notebook.ipynb` to open a specific file.
The [jupyterlab-link-share] JupyterLab extension generates these links, and even can _grant_ other users access to your server.
### How can I kill ports from JupyterHubmanaged services that have been orphaned?
### How can I kill ports from JupyterHub-managed services that have been orphaned?
I started JupyterHub + nbgrader on the same host without containers. When I try to restart JupyterHub + nbgrader with this configuration, errors appear that the service accounts cannot start because the ports are being used.
@@ -92,12 +68,12 @@ Where `<service_port>` is the port used by the nbgrader course service. This con
### Why am I getting a Spawn failed error message?
After successfully logging in to JupyterHub with a compatible authenticators, I get a 'Spawn failed' error message in the browser. The JupyterHub logs have `jupyterhub KeyError: "getpwnam(): name not found: <my_user_name>`.
After successfully logging in to JupyterHub with a compatible authenticator, I get a 'Spawn failed' error message in the browser. The JupyterHub logs have `jupyterhub KeyError: "getpwnam(): name not found: <my_user_name>`.
This issue occurs when the authenticator requires a local system user to exist. In these cases, you need to use a spawner
that does not require an existing system user account, such as `DockerSpawner` or `KubeSpawner`.
### How can I run JupyterHub with sudo but use my current env vars and virtualenv location?
### How can I run JupyterHub with sudo but use my current environment variables and virtualenv location?
When launching JupyterHub with `sudo jupyterhub` I get import errors and my environment variables don't work.
@@ -109,25 +85,11 @@ sudo MY_ENV=abc123 \
/srv/jupyterhub/jupyterhub
```
### How can I view the logs for JupyterHub or the user's Notebook servers when using the DockerSpawner?
Use `docker logs <container>` where `<container>` is the container name defined within `docker-compose.yml`. For example, to view the logs of the JupyterHub container use:
docker logs hub
By default, the user's notebook server is named `jupyter-<username>` where `username` is the user's username within JupyterHub's db. So if you wanted to see the logs for user `foo` you would use:
docker logs jupyter-foo
You can also tail logs to view them in real time using the `-f` option:
docker logs -f hub
## Errors
### 500 error after spawning my single-user server
### Error 500 after spawning my single-user server
You receive a 500 error when accessing the URL `/user/<your_name>/...`.
You receive a 500 error while accessing the URL `/user/<your_name>/...`.
This is often seen when your single-user server cannot verify your user cookie
with the Hub.
@@ -153,9 +115,9 @@ If everything is working, the response logged will be similar to this:
You should see a similar 200 message, as above, in the Hub log when you first
visit your single-user notebook server. If you don't see this message in the log, it
may mean that your single-user notebook server isn't connecting to your Hub.
may mean that your single-user notebook server is not connecting to your Hub.
If you see 403 (forbidden) like this, it's likely a token problem:
If you see 403 (forbidden) like this, it is likely a token problem:
```
403 GET /hub/api/authorizations/cookie/jupyterhub-token-name/[secret] (@10.0.1.4) 4.14ms
@@ -185,10 +147,10 @@ If you receive a 403 error, the API token for the single-user server is likely
invalid. Commonly, the 403 error is caused by resetting the JupyterHub
database (either removing jupyterhub.sqlite or some other action) while
leaving single-user servers running. This happens most frequently when using
DockerSpawner, because Docker's default behavior is to stop/start containers
which resets the JupyterHub database, rather than destroying and recreating
DockerSpawner because Docker's default behavior is to stop/start containers
that reset the JupyterHub database, rather than destroying and recreating
the container every time. This means that the same API token is used by the
server for its whole life, until the container is rebuilt.
server for its whole life until the container is rebuilt.
The fix for this Docker case is to remove any Docker containers seeing this
issue (typically all containers created before a certain point in time):
@@ -201,28 +163,28 @@ your server again.
##### Proxy settings (403 GET)
When your whole JupyterHub sits behind a organization proxy (_not_ a reverse proxy like NGINX as part of your setup and _not_ the configurable-http-proxy) the environment variables `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy` and `https_proxy` might be set. This confuses the jupyterhub-singleuser servers: When connecting to the Hub for authorization they connect via the proxy instead of directly connecting to the Hub on localhost. The proxy might deny the request (403 GET). This results in the singleuser server thinking it has a wrong auth token. To circumvent this you should add `<hub_url>,<hub_ip>,localhost,127.0.0.1` to the environment variables `NO_PROXY` and `no_proxy`.
When your whole JupyterHub sits behind an organization proxy (_not_ a reverse proxy like NGINX as part of your setup and _not_ the configurable-http-proxy) the environment variables `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy`, and `https_proxy` might be set. This confuses the JupyterHubsingle-user servers: When connecting to the Hub for authorization they connect via the proxy instead of directly connecting to the Hub on localhost. The proxy might deny the request (403 GET). This results in the single-user server thinking it has the wrong auth token. To circumvent this you should add `<hub_url>,<hub_ip>,localhost,127.0.0.1` to the environment variables `NO_PROXY` and `no_proxy`.
### Launching Jupyter Notebooks to run as an externally managed JupyterHub service with the `jupyterhub-singleuser` command returns a `JUPYTERHUB_API_TOKEN` error
[JupyterHub services](https://jupyterhub.readthedocs.io/en/stable/reference/services.html) allow processes to interact with JupyterHub's REST API. Example use-cases include:
{ref}`services` allow processes to interact with JupyterHub's REST API. Example use-cases include:
- **Secure Testing**: provide a canonical Jupyter Notebook for testing production data to reduce the number of entry points into production systems.
- **Grading Assignments**: provide access to shared Jupyter Notebooks that may be used for management tasks such grading assignments.
- **Grading Assignments**: provide access to shared Jupyter Notebooks that may be used for management tasks such as grading assignments.
- **Private Dashboards**: share dashboards with certain group members.
If possible, try to run the Jupyter Notebook as an externally managed service with one of the provided [jupyter/docker-stacks](https://github.com/jupyter/docker-stacks).
Standard JupyterHub installations include a [jupyterhub-singleuser](https://github.com/jupyterhub/jupyterhub/blob/9fdab027daa32c9017845572ad9d5ba1722dbc53/setup.py#L116) command which is built from the `jupyterhub.singleuser:main` method. The `jupyterhub-singleuser` command is the default command when JupyterHub launches single-user Jupyter Notebooks. One of the goals of this command is to make sure the version of JupyterHub installed within the Jupyter Notebook coincides with the version of the JupyterHub server itself.
If you launch a Jupyter Notebook with the `jupyterhub-singleuser` command directly from the command line the Jupyter Notebook won't have access to the `JUPYTERHUB_API_TOKEN` and will return:
If you launch a Jupyter Notebook with the `jupyterhub-singleuser` command directly from the command line, the Jupyter Notebook won't have access to the `JUPYTERHUB_API_TOKEN` and will return:
```
JUPYTERHUB_API_TOKEN env is required to run jupyterhub-singleuser.
Did you launch it manually?
```
If you plan on testing `jupyterhub-singleuser` independently from JupyterHub, then you can set the api token environment variable. For example, if were to run the single-user Jupyter Notebook on the host, then:
If you plan on testing `jupyterhub-singleuser` independently from JupyterHub, then you can set the API token environment variable. For example, if you were to run the single-user Jupyter Notebook on the host, then:
export JUPYTERHUB_API_TOKEN=my_secret_token
jupyterhub-singleuser
@@ -243,7 +205,7 @@ With a docker container, pass in the environment variable with the run command:
Some certificate providers, i.e. Entrust, may provide you with a chained
certificate that contains multiple files. If you are using a chained
certificate you will need to concatenate the individual files by appending the
The important thing is that jupyterlab is installed and enabled in the
The important thing is that JupyterLab is installed and enabled in the
single-user notebook server environment. For system users, this means
system-wide, as indicated above. For Docker containers, it means inside
the single-user docker image, etc.
@@ -334,14 +296,14 @@ notebook servers to default to JupyterLab:
### How do I set up JupyterHub for a workshop (when users are not known ahead of time)?
1. Set up JupyterHub using OAuthenticator for GitHub authentication
2. Configure admin list to have workshop leaders be listed with administrator privileges.
2. Configure the admin list to have workshop leaders listed with administrator privileges.
Users will need a GitHub account to login and be authenticated by the Hub.
Users will need a GitHub account to login and be authenticated by the Hub.
### How do I set up rotating daily logs?
You can do this with [logrotate](https://linux.die.net/man/8/logrotate),
or pipe to `logger` to use syslog instead of directly to a file.
or pipe to `logger` to use Syslog instead of directly to a file.
For example, with this logrotate config file:
@@ -362,34 +324,9 @@ Or use syslog:
jupyterhub | logger -t jupyterhub
## Troubleshooting commands
The following commands provide additional detail about installed packages,
versions, and system information that may be helpful when troubleshooting
a JupyterHub deployment. The commands are:
- System and deployment information
```bash
jupyter troubleshooting
```
- Kernel information
```bash
jupyter kernelspec list
```
- Debug logs when running JupyterHub
```bash
jupyterhub --debug
```
### Toree integration with HDFS rack awareness script
The Apache Toree kernel will an issue, when running with JupyterHub, if the standard HDFS
rack awareness script is used. This will materialize in the logs as a repeated WARN:
The Apache Toree kernel will have an issue when running with JupyterHub if the standard HDFS rack awareness script is used. This will materialize in the logs as a repeated WARN:
@@ -410,10 +347,49 @@ In order to resolve this issue, there are two potential options.
### Where do I find Docker images and Dockerfiles related to JupyterHub?
Docker images can be found at the [JupyterHub organization on DockerHub](https://hub.docker.com/u/jupyterhub/).
The Docker image [jupyterhub/singleuser](https://hub.docker.com/r/jupyterhub/singleuser/)
provides an example singleuser notebook server for use with DockerSpawner.
Docker images can be found at the [JupyterHub organization on Quay.io](https://quay.io/organization/jupyterhub).
The Docker image [jupyterhub/singleuser](https://quay.io/repository/jupyterhub/singleuser)
provides an example single-user notebook server for use with DockerSpawner.
Additional singleuser notebook server images can be found at the [Jupyter
organization on DockerHub](https://hub.docker.com/r/jupyter/) and information
Additional single-user notebook server images can be found at the [Jupyter
organization on Quay.io](https://quay.io/organization/jupyter) and information
about each image at the [jupyter/docker-stacks repo](https://github.com/jupyter/docker-stacks).
### How can I view the logs for JupyterHub or the user's Notebook servers when using the DockerSpawner?
Use `docker logs <container>` where `<container>` is the container name defined within `docker-compose.yml`. For example, to view the logs of the JupyterHub container use:
docker logs hub
By default, the user's notebook server is named `jupyter-<username>` where `username` is the user's username within JupyterHub's database.
So if you wanted to see the logs for user `foo` you would use:
docker logs jupyter-foo
You can also tail logs to view them in real-time using the `-f` option:
docker logs -f hub
## Troubleshooting commands
The following commands provide additional detail about installed packages,
versions, and system information that may be helpful when troubleshooting
In short, where you see `/user/name/notebooks/foo.ipynb` use `/hub/user-redirect/notebooks/foo.ipynb` (replace `/user/name` with `/hub/user-redirect`).
Sharing links to notebooks is a common activity,
and can look different based on what you mean.
Your first instinct might be to copy the URL you see in the browser,
e.g. `hub.jupyter.org/user/yourname/notebooks/coolthing.ipynb`.
However, let's break down what this URL means:
`hub.jupyter.org/user/yourname/` is the URL prefix handled by _your server_,
which means that sharing this URL is asking the person you share the link with
to come to _your server_ and look at the exact same file.
In most circumstances, this is forbidden by permissions because the person you share with does not have access to your server.
What actually happens when someone visits this URL will depend on whether your server is running and other factors.
But what is our actual goal?
A typical situation is that you have some shared or common filesystem,
such that the same path corresponds to the same document
(either the exact same document or another copy of it).
Typically, what folks want when they do sharing like this
is for each visitor to open the same file _on their own server_,
so Breq would open `/user/breq/notebooks/foo.ipynb` and
Seivarden would open `/user/seivarden/notebooks/foo.ipynb`, etc.
JupyterHub has a special URL that does exactly this!
It's called `/hub/user-redirect/...`.
So if you replace `/user/yourname` in your URL bar
with `/hub/user-redirect` any visitor should get the same
URL on their own server, rather than visiting yours.
In JupyterLab 2.0, this should also be the result of the "Copy Shareable Link"
Visit the [Github OAuthenticator reference](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.github.html) to see the full list of options for configuring Github OAuth with JupyterHub.
@@ -6,14 +6,14 @@ Only do this if you are very sure you must.
## Overview
There are many Authenticators and Spawners available for JupyterHub. Some, such
as DockerSpawner or OAuthenticator, do not need any elevated permissions. This
There are many [Authenticators](authenticators) and [Spawners](spawners) available for JupyterHub. Some, such
as [DockerSpawner](https://github.com/jupyterhub/dockerspawner) or [OAuthenticator](https://github.com/jupyterhub/oauthenticator), do not need any elevated permissions. This
document describes how to get the full default behavior of JupyterHub while
running notebook servers as real system users on a shared system without
running notebook servers as real system users on a shared system, without
running the Hub itself as root.
Since JupyterHub needs to spawn processes as other users, the simplest way
is to run it as root, spawning user servers with [setuid](http://linux.die.net/man/2/setuid).
is to run it as root, spawning user servers with [setuid](https://linux.die.net/man/2/setuid).
But this isn't especially safe, because you have a process running on the
To deploy JupyterHub means you are providing Jupyter notebook environments for
multiple users. Often, this includes a desire to configure the user
environment in a custom way.
Since the `jupyterhub-singleuser` server extends the standard Jupyter notebook
server, most configuration and documentation that applies to Jupyter Notebook
applies to the single-user environments. Configuration of user environments
typically does not occur through JupyterHub itself, but rather through system-wide
configuration of Jupyter, which is inherited by `jupyterhub-singleuser`.
**Tip:** When searching for configuration tips for JupyterHub user environments, you might want to remove JupyterHub from your search because there are a lot more people out there configuring Jupyter than JupyterHub and the configuration is the same.
This section will focus on user environments, which includes the following:
- [Installing packages](#installing-packages)
- [Configuring Jupyter and IPython](#configuring-jupyter-and-ipython)
and [IPython](https://ipython.readthedocs.io/en/stable/development/config.html)
have their own configuration systems.
As a JupyterHub administrator, you will typically want to install and configure environments for all JupyterHub users. For example, let's say you wish for each student in a class to have the same user environment configuration.
Jupyter and IPython support **"system-wide"** locations for configuration, which is the logical place to put global configuration that you want to affect all users. It's generally more efficient to configure user environments "system-wide", and it's a good practice to avoid creating files in the users' home directories.
The typical locations for these config files are:
- **system-wide** in `/etc/{jupyter|ipython}`
- **env-wide** (environment wide) in `{sys.prefix}/etc/{jupyter|ipython}`.
### Jupyter environment configuration priority
When Jupyter runs in an environment (conda or virtualenv), it prefers to load configuration from the environment over each user's own configuration (e.g. in `~/.jupyter`).
This may cause issues if you use a _shared_ conda environment or virtualenv for users, because e.g. jupyterlab may try to write information like workspaces or settings to the environment instead of the user's own directory.
This could fail with something like `Permission denied: $PREFIX/etc/jupyter/lab`.
To avoid this issue, set `JUPYTER_PREFER_ENV_PATH=0` in the user environment:
```python
c.Spawner.environment.update(
{
"JUPYTER_PREFER_ENV_PATH":"0",
}
)
```
which tells Jupyter to prefer _user_ configuration paths (e.g. in `~/.jupyter`) to configuration set in the environment.
### Example: Enable an extension system-wide
For example, to enable the `cython` IPython extension for all of your users, create the file `/etc/ipython/ipython_config.py`:
```python
c.InteractiveShellApp.extensions.append("cython")
```
### Example: Enable a Jupyter notebook configuration setting for all users
:::{note}
These examples configure the Jupyter ServerApp, which is used by JupyterLab, the default in JupyterHub 2.0.
If you are using the classing Jupyter Notebook server,
the same things should work,
with the following substitutions:
- Search for `jupyter_server_config`, and replace with `jupyter_notebook_config`
- Search for `NotebookApp`, and replace with `ServerApp`
:::
To enable Jupyter notebook's internal idle-shutdown behavior (requires notebook ≥ 5.4), set the following in the `/etc/jupyter/jupyter_server_config.py` file:
```python
# shutdown the server after no activity for an hour
c.ServerApp.shutdown_no_activity_timeout=60*60
# shutdown kernels after no activity for 20 minutes
c.MappingKernelManager.cull_idle_timeout=20*60
# check for idle kernels every two minutes
c.MappingKernelManager.cull_interval=2*60
```
## Installing kernelspecs
You may have multiple Jupyter kernels installed and want to make sure that they are available to all of your users. This means installing kernelspecs either system-wide (e.g. in /usr/local/) or in the `sys.prefix` of JupyterHub
itself.
Jupyter kernelspec installation is system-wide by default, but some kernels
may default to installing kernelspecs in your home directory. These will need
to be moved system-wide to ensure that they are accessible.
To see where your kernelspecs are, you can use the following command:
```bash
jupyter kernelspec list
```
### Example: Installing kernels system-wide
Let's assume that I have a Python 2 and Python 3 environment that I want to make sure are available, I can install their specs **system-wide** (in /usr/local) using the following command:
There are two broad categories of user environments that depend on what
Spawner you choose:
- Multi-user hosts (shared system)
- Container-based
How you configure user environments for each category can differ a bit
depending on what Spawner you are using.
The first category is a **shared system (multi-user host)** where
each user has a JupyterHub account, a home directory as well as being
a real system user. In this example, shared configuration and installation
must be in a 'system-wide' location, such as `/etc/`, or `/usr/local`
or a custom prefix such as `/opt/conda`.
When JupyterHub uses **container-based** Spawners (e.g. KubeSpawner or
DockerSpawner), the 'system-wide' environment is really the container image used for users.
In both cases, you want to _avoid putting configuration in user home
directories_ because users can change those configuration settings. Also, home directories typically persist once they are created, thereby making it difficult for admins to update later.
## Named servers
By default, in a JupyterHub deployment, each user has one server only.
JupyterHub can, however, have multiple servers per user.
This is mostly useful in deployments where users can configure the environment in which their server will start (e.g. resource requests on an HPC cluster), so that a given user can have multiple configurations running at the same time, without having to stop and restart their own server.
To allow named servers, include this code snippet in your config file:
```python
c.JupyterHub.allow_named_servers=True
```
Named servers were implemented in the REST API in JupyterHub 0.8,
and JupyterHub 1.0 introduces UI for managing named servers via the user home page:

as well as the admin page:

Named servers can be accessed, created, started, stopped, and deleted
from these pages. Activity tracking is now per server as well.
To limit the number of **named server** per user by setting a constant value, include this code snippet in your config file:
```python
c.JupyterHub.named_server_limit_per_user=5
```
Alternatively, to use a callable/awaitable based on the handler object, include this code snippet in your config file:
This can be useful for quota service implementations. The example above limits the number of named servers for non-admin users only.
If `named_server_limit_per_user` is set to `0`, no limit is enforced.
When using named servers, Spawners may need additional configuration to take the `servername` into account. Whilst `KubeSpawner` takes the `servername` into account by default in [`pod_name_template`](https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html#kubespawner.KubeSpawner.pod_name_template), other Spawners may not. Check the documentation for the specific Spawner to see how singleuser servers are named, for example in `DockerSpawner` this involves modifying the [`name_template`](https://jupyterhub-dockerspawner.readthedocs.io/en/latest/api/index.html) setting to include `servername`, eg. `"{prefix}-{username}-{servername}"`.
(classic-notebook-ui)=
## Switching back to the classic notebook
By default, the single-user server launches JupyterLab,
which is based on [Jupyter Server][].
This is the default server when running JupyterHub ≥ 2.0.
To switch to using the legacy Jupyter Notebook server (notebook <7.0),youcansetthe`JUPYTERHUB_SINGLEUSER_APP`environmentvariable
The _How-to_ guides provide practical step-by-step details to help you achieve a particular goal. They are useful when you are trying to get something done but require you to understand and adapt the steps to your specific usecase.
Use the following guides when:
```{toctree}
:maxdepth: 1
api-only
proxy
rest
separate-proxy
templates
upgrading
log-messages
```
(config-examples)=
## Configuration
The following guides provide examples, including configuration files and tips, for the
- make an API request programmatically using the requests library
- learn more about JupyterHub's API
- What you can do with the API
- How to create an API token
- Assigning permissions to a token
- Updating to admin services
- Making an API request programmatically using the requests library
- Paginating API requests
- Enabling users to spawn multiple named-servers via the API
- Learn more about JupyterHub's API
## What you can do with the API
Using the [JupyterHub REST API][], you can perform actions on the Hub,
such as:
- checking which users are active
- adding or removing users
- stopping or starting single user notebook servers
- authenticating services
- communicating with an individual Jupyter server's REST API
A [REST](https://en.wikipedia.org/wiki/Representational_state_transfer)
Before we discuss about JupyterHub's REST API, you can learn about [REST APIs here](https://en.wikipedia.org/wiki/Representational_state_transfer). A REST
API provides a standard way for users to get and send information to the
Hub.
## What you can do with the API
Using the [JupyterHub REST API](jupyterhub-rest-API), you can perform actions on the Hub,
such as:
- Checking which users are active
- Adding or removing users
- Stopping or starting single user notebook servers
- Authenticating services
- Communicating with an individual Jupyter server's REST API
## Create an API token
To send requests using JupyterHub API, you must pass an API token with
To send requests using the JupyterHub API, you must pass an API token with
the request.
The preferred way of generating an API token is:
While JupyterHub is running, any JupyterHub user can request a token via the `token` page.
This is accessible via a `token` link in the top nav bar from the JupyterHub home page,
and that page has some docs. If you are using a different proxy, such
as Traefik, these instructions are probably not relevant to you.
[configurable-http-proxy](https://github.com/jupyterhub/configurable-http-proxy). If you are using a different proxy, such
as [Traefik](https://github.com/traefik/traefik), these instructions are probably not relevant to you.
## Configuration options
@@ -40,9 +41,14 @@ set to the URL which the hub uses to connect _to the proxy's API_.
## Proxy configuration
You need to configure a service to start the proxy. An example
command line for this is `configurable-http-proxy --ip=127.0.0.1 --port=8000 --api-ip=127.0.0.1 --api-port=8001 --default-target=http://localhost:8081 --error-target=http://localhost:8081/hub/error`. (Details for how to
do this is out of scope for this tutorial - for example it might be a
systemd service on within another docker cotainer). The proxy has no
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.