- move spec to _static/rest-api.yml, since the original yaml must be served
- copy javascript rendering code from FastAPI (uses swagger-ui)
- remove link to pet store, since there isn't a big enough difference to duplicate it
- remove bootprint rendering with node
since it means 'inheriting' the owner's permissions
'all' prompted the question 'all of what, exactly?'
Additionally, fix some NameErrors that should have been KeyErrors
some things raise standard TimeoutError, others may raise tornado gen.TimeoutError (gen.with_timeout)
For consistency, add AnyTimeoutError tuple to allow catching any timeout, no matter what kind
Where we were raising `TimeoutError`,
we should have been raising `asyncio.TimeoutError`.
The base TimeoutError is an OSError for ETIMEO, which is for system calls
404 is also used to identify that a particular resource
(like a kernel or terminal) is not present, maybe because
it is deleted. That comes from the notebook server, while
here we are responding from JupyterHub. Saying that the
user server they are trying to request the resource (kernel, etc)
from does not exist seems right.
Non-running user servers making requests is a fairly
common occurance - user servers get culled while their
browser tabs are left open. So we now have a background level
of 503s responses on the hub *all* the time, making it
very difficult to detect *real* 503s, which should ideally
be closely monitored and alerted on.
I *think* 404 is a more appropriate response, as the resource
(API) being requested is no longer present.
swaps from default nbclassic and opt-in to lab, to now default to lab and opt-in to nbclassic
defaults to jupyterlab *if* lab 3.1 is available,
so should still work without configuration if lab is unavailable (or too old)
- remove mention of outdated nodejs-legacy
- mention nodesource for more recent node
- mention jupyterlab
- initial localhost request will be on http, not https
Of 18355 lines of logs in a 5day old hub instance,
8228 are just this message. That's 44% of the logs! We now
have prometheus metrics to monitor performance of this if
needed, and people can always turn on debug logging.
- circle CI no longer used
- ubuntu/debian nodejs may be too old (12.0+ required)
- remove mention of mailing list
- Python 3.6 required
- Emphasise JupyterLab over notebook
so it can be applied to all cookie-authenticated POST requests
also parse the content-type header to handle e.g. `Content-Type: application/json; charset`
The content-type of Hub API requests used for user management, specifically for creating a user
is not validated and so the ‘text/plain’ type is accepted, where it must be ‘application/json’.
This commit adds validation for `Content-type` header for the /hub/api/users endpoint to only
allow requests with content-type as `application/json`
improves backward compatibility for clients that haven't implemented pagination
by requesting the max page size by default instead of the new default page size
use `Accept: application/jupyterhub-pagination+json` to opt-in to the new response format
With a paginated API, we need to return pagination info (next page arguments, whether a next page exists, etc.),
but a simple list response doesn't give a good way to do that.
We can follow precedents and use a dict with an `items` field for the actual items,
and a `_pagination` field for info about pagination, including offset, limit, url for the next request
and govern GET /users|groups|services endpoints with these
Greatly simplifies filtering and pagination,
because these filters can be expressed in db filters,
unlike the potentially complex `read:users`.
Now the query itself will never return a model that should be excluded.
While writing the tests, I added more cleanup between tests.
We now ensure cleanup of all users and groups after each test,
which required updating some group tests which relied on this state leaking
distutils is slated for deprecation in the stdlib
we can use packaging for version parsing and setuptools in setup.py
packaging is technically an extra dependency, but rarely missing because it's so widespread
- ensure create_certs is called for managed services
- wait for services with http, which checks ssl connections (without http, only tcp was checked, which doesn't verify it works!)
adding `read:` to everything isn't right because not everything has a `read:` counterpart and not every `read:` has a write counterpart
includes a test verifying that every scope has a definition
- access:services for services
- access:users:servers for servers
- tokens automatically have access to their issuing client (if their owner does, too)
- Check access scope in HubAuth integration
check_db_locks checks for db lock state after the end of a function,
but wasn't properly waiting when it wrapped an async function,
meaning it would run the check while the async function was still outstanding,
causing possible spurious failures
and fix warning condition for intersection overlap
- only warn when there's a group only on one side and a user or server only on the other,
otherwise there is no lost information to warn about (group and/or defined on both sides)
- correctly resolve servers as sub-scopes of user
- update references to default branch name in docs, workflows
- use HEAD in github urls, which always works regardless of default branch name
- fix petstore URLs since the old petstore links seem to have stopped working
should never occur in real applications where only one loop is run,
but may occur in tests if the Proxy object lives longer than the loop in which it runs
I suspect this is the source of our intermittent test failures with
> got Future <Future pending> attached to a different loop
- remove long-deprecated `POST /api/authorizations/token` for creating tokens
- deprecate but do not remove `GET /api/authorizations/token/:token` in favor of GET /api/user
- remove shared-cookie auth for services from HubAuth, rely on OAuth for browser-auth instead
- use `/hub/api/user` to resolve user instead of `/authorizations/token` which is now deprecated
instead of on the test class
and fix the logic for when it is called a bit:
- call on *all* Spawners, not just the default
- call on named server deletion when remove=True
and clarify warning when a base handler isn't patched
- reorganize patch steps into functions for easier re-use
- patch notebook and jupyter_server handlers if they are already imported
- run patch after initialize to ensure extensions have done their importing before we check
- Attach role limit to OAuthClient
- Attach authorized roles to OAuthCode
- pass roles from code to API token on completion
standard 'scopes' in oauth process are matched against our 'roles' instead of our low-level scopes
These only affected servers upgrading directly from 0.8 or earlier with still-running servers
0.8 was a long time ago, it's okay to require restarting servers for an upgrade that long
- 3-255 characters
- ascii lowercase, numbers, -
- must start with letter
- must not end with -
this lets us avoid url escaping issues in e.g. oauth params
When the hub is running in API-only mode, it's
very useful to have the proxy know where to send
URLs that would normally be serviced by the hub.
For example, / might go to a service that renders
a home page, while `/user` might go to a service that
tells the user their server is dead.
Right now, this happens 'out of band', with a process
that has to talk to the proxy directly. This is a
bit messy - the routes need to be re-added when the
proxy restarts, the hub might try to remove them, etc.
By adding support for this in the hub itself, all
this complexity is now removed and the hub continues
to own all the routes in the proxy
This allows for more flexible customization of the login page,
since it allows to re-use the login form in an extending template
by reusing the new block.
This was not cleanly possible before since the main container
was part of the very same block as the form code.
fixes#3414
- merge oauth token fields into APITokens
- create oauth client 'jupyterhub' which owns current API tokens
- db upgrade is currently to drop both token tables, and force recreation on next start
so changing cookie age changes oauth token expiry,
since these are what are stored in those cookies anyway,
it makes sense for them to expire at the same time
When an oauth client changes, we delete all the tokens
associated with that client. This invalidates all user sessions
for that oauth client, and the oauth client's users will need to
go through the OAuth workflow again after the cache period (specified
by cache_max_age in HubAuth, 5min by default). This is fine in theory,
since oauth client information doesn't change frequently.
However, we were deleting and re-adding all oauth clients each time
the hub started! This was unnecessary, since the data was going to
be the same 99% of the time. Rest of the time, we should just update,
preventing unnecessary churn.
This PR does that.
Ref https://github.com/yuvipanda/jupyterhub-configurator/issues/2
Ref https://github.com/berkeley-dsep-infra/datahub/issues/2284
get_current_user returns a User model instead of a dict.
using cookies for Hub auth is deprecated, so removed
that option and refactored get_current_user
makes testing a PR even easier since we build an sdist and wheel for every PR and push
since artifacts are double-archived, it's not quite as simple as giving a URL to install from,
but this at least makes it available. To use:
- download and unpack zip
- `pip install path/to/whl`
apply patch directly to BaseHandler instead of each handler instance
so that overrides can still take effect (i.e. APIHandler raising 403 instead of redirecting)
Merge request #3257fixed#3256 only on getting-started/services-basics.md
There is still a reference to jupyterhub example cull-idle in reference/services.md
setting per_page in constructor resolves before max_per_page limit is updated from config,
preventing max_per_page from being increased beyond the default limit
we already loaded these values anyway in the first instance,
so remove the redundant Pagination object
Running the Curl as is return a 500 with ```json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
``` Converting the payload to a proper Json
The feature is disabled by default.
If enabled (by setting `login_term_url`), user will have to check the
checkbox to accept the terms and conditions in order to login.
jinja2's async support requires Python 3.6+. That should
be an implementation detail - so we render it in the main
thread (current behavior) but pretend we did not
write_error is a synchronous method called by an async
method from inside the event loop. This means we can't just
schedule an async render_templates in the same loop and wait
for it - that would deadlock.
jinja2 compiled your code differently based on wether you
enable async support or not. Templates compiled with async
support can't be used in cases like ours, where we already
have an event loop running and calling a sync function. So
we maintain two almost identical jinja2 environments
testing other branches is useful, and there's little cost to removing the conditions:
- we don't run PRs from our repo, so test runs aren't duplicated on the repo
- testing on a fork without opening a PR is still useful (I use this often)
- if we push a branch, it should probably be tested (e.g. backport branch), and filters make this extra work
- the cost of running a few extra tests is low, especially given actions' current quotas and parallelism
The jupyterhub/tests/test_spawner.py::test_spawner_routing[has~x] test
failed in py37+ but not in py36, and I think it is foundational to the
socket library of Python that has changed.
This is a stacktrace from Python/3.7.9/x64/lib/python3.7/site-packages/urllib3/util/connection.py:61
```
> for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
E socket.gaierror: [Errno -2] Name or service not known
```
Here is relevant documentation about socket.getaddrinfo.
https://docs.python.org/3.7/library/socket.html#socket.getaddrinfo
- Fixes typo (eolving -> evolving)
- re-use the word current instead of momentary for comprehensibility
- references JupyterHubs current state with its instead of the for comprehensibility
Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
allows selecting users based on the 'ready' 'active' or 'inactive' states of their servers
- ready: users who have any servers in the 'ready' state
- active: users who have any servers in the 'active' state (i.e. ready OR pending)
- inactive: users who have *no* servers in the 'active' state (inactive + active = all users, no overlap)
Does not change the user model, so a user with *any* ready servers will still return all their servers
Fixes issues with OAuth flows when internal_ssl is enabled.
When internal_ssl was enabled requests to non-internal endpoints failed
because the system CAs were not being loaded.
This caused failures with public OAuth providers with public CAs since
they would fail to validate.
We are users of the napoleon sphinx extension, which helps us parse our
Google Style Python Docstrings, and its syntax suggest we should use
indentation when we use more then one string for an entry in an
Arguments: or Returns: list.
For more details, see: https://github.com/jupyterhub/jupyterhub/pull/3151#issuecomment-676186565
- Explicitly mention min-8-char constraint
- Connect the api_token in the configuration with the one mentioned in auth requests
Co-authored-by: Mike Situ <msitu@ceresimaging.net>
for easier reuse with jupyter_server
mixins have a lot of assumptions about the NotebookApp structure.
Need to make sure these are met by jupyter_server (that's what tests are for!)
When using the `KubeSpawner` it is typical to disable the
`slow_spawn_timeout` by setting it to 0. `zero-to-jupyterhub-k8s`
does this by default [1]. However, this causes an immediate `TimeoutError`
which gets logged as a warning like this:
>User hub-stress-test-123 is slow to start (timeout=0)
This avoids the warning by checking the value and if disabled simply
returns without logging the warning.
[1] https://github.com/jupyterhub/zero-to-jupyterhub-k8s/commit/b4738edc5Closes#3126
- Related issue: #3120. Closes: #3120.
- I realized that spawner.clear_state() is called before
spawner.post_stop_hook(). This caused was a bit surprising to me,
and caused some issues.
- I tried the naive strategy of moving clear_state to later and
setting the orm_state to `{}` at the point where it used to be
clear.
- This tries to maintain the exception behavior of clear_state and
post_stop_hook, but is exactly identical.
- To review:
- I'm not sure this is a good idea!
- Carefully consider the implications of this. I am not at all sure
about unintended side-effects or what intended semantics are.
This was added in PR #2721 and by default results in just printing
out "10" without any context when starting the hub service. This
simply removes the orphan print statement.
I'm open to changing this to a debug log statement with context if
someone finds that useful, e.g.:
`self.log.debug('Effective init_spawners_timeout: %s', init_spawners_timeout)`
Bug #2852 describes an issue where templates cannot be found by
JupyterHub when using the Docker images built out of this repo. The
issue turned out to be due to missing node_modules at the time of build.
There is a hook in the `package.json` that causes node_modules to be
copied to the static/components directory post-install. If this is not
run, those components are not in the static directory and thus are not
included in the wheel when it is built.
Fix#2905 fixed one problem--the `bower-lite` hook script wasn't copied
to the Docker image, and so the hook couldn't run, but the other issue
is that the client dependencies are never explicitly built. They must be
built prior to the wheel build, and the hook script must have run so
they are copied to the ./static folder, which is included in the wheel
build thanks to [MANIFEST.in][1]
.. note::
This removes the verbose flag from the wheel build command. The
reason is that it generates a lot of writes to stdout. It seems that
wheel can (or always) is switching to non-blocking mode, which can cause
EAGAIN to be raised, which leads to fun errors like:
BlockingIOError(.., 'write could not complete without blocking', ..)
The wheels fail to build if this error is raised. Removing the verbosity
flag is a quick solution (it drastically reduces writes to STDOUT), but
comes at the cost of more trouble debugging a failed wheel build. Adding
the "-v" back in the Dockerfile when debugging a build failure is still
possible. [Credit: @vbraun][2]
.. note::
This commit also removes some extraneous COPY operations during the
Docker build, in particular the /src/jupyterhub/share directory is
not used unless users have explicitly override their
jupyterhub_config.py to include it somehow. If the default
data_files_path behavior is used, JupyterHub should find the proper
static directory when the application loads.
Fixes: #2852
[1]: https://packaging.python.org/guides/using-manifest-in/
[2]:
https://github.com/travis-ci/travis-ci/issues/4704#issuecomment-348435959
- base Expiring class
- ensures expiring values (OAuthCode, OAuthAccessToken, APIToken) are not returned from `find`
- all expire appropriately via purge_expired
behaves more like one would expect (same as try get-key, except: return default)
without relying on cache presence or underlying key type (integer only)
This does some of the test with the latest traitlets.
We are looking into making a 5.0 release and would like to have some
confidence that it does not break too many things.
They are less relevant than other request and could very well end up
cluttering the logs. It is not uncomming for these requests to be made
every second or every other second.
In case there are multiple singleuser notebooks at different
versions we want to log each of those mismatches as a warning
so this changes the global _version_mismatch_warning_logged flag
from a bool to a dict keyed by the hub/singleuser version mismatch
combination. A test wrinkle is added for that scenario.
Part of #2970
As a new contributor to jupyterhub it took awhile to get
up and running locally mainly because I didn't have sqlite
installed but also because I was flipping between README,
CONTRIBUTING and the actual contributing docs which are all
a little bit different.
This does a few things:
- Updates the contributor sphinx docs to mention that how
one chooses to isolate their development environment is
up to them with a link to the detailed forum thread on
that topic.
- Updates the contributor sphinx docs to mention sqlite and
database setup in general. While in here some trailing
whitespaces are cleaned up.
- Leave a comment in CONTRIBUTING.md about the redundant
information in the docs on getting a development environment
setup. Long-term we should really get those merged so there
is a single authoritative document on how to get a dev env
setup for contributing to jupyterhub.
- Link to the jupyterhub gitter channel for asking questions.
If your jupyterhub and jupyterhub-singleuser instances
are running at different minor or greater versions a
warning gets logged per active server which can be a lot
when you have hundreds of active servers.
This adds a flag to that version mismatch logging logic
such that the warning is only logged once per restart
of the hub server.
Closes issue #2970
APIHandler.server_model unconditionally returns the Spawner's
user_options dict but it wasn't mentioned in the API reference
so it's added here. The description is taken from the docstring
on Spawner.user_options.
Closes issue #2965
Authorization header has the form "<type> <credentials>"
rather than checking for "token" only, preserve type value, which could be Bearer, Basic, etc.
query on Server objects instead of User objects
avoids lots of ORM work on startup since there are typically a small number of running servers
relative to the total number of users
this also means that the users dict is not fully populated. Is that okay? I hope so.
Not exactly all though as some will be ignored by the .dockerignore
file. This change ensures we don't get future issues caused by a failure
to update what needs to be copied to the build stage and not like we've
had recently.
This fixes#2852 by adding a script part of package.json. But is this
enough? Should we perhaps look in MANIFEST.in and copy some more files
listed there?
This is all thanks to people coming together and helping out figuring
out the issue in https://github.com/jupyterhub/jupyterhub/issues/2852.
Thank you @shingo78 for spotting that we missed bower-lite and its role
and all others who reported and helped debug this!
Updated capitalisation of names. Addressed revisions.
Fleshed out the prerequists and explanation of access control.
Added part of configuration section to set JupyterLab as the default interface.
corrected need for sudo
Added warning to reverse-proxy section to recommend use of HTTPS and firewall.
- A trivial bug caused by my last change to #2397 - made possible by
the fact we didn't have a way to reliable test PAM stuff.
- Thanks to @narnish for noticing.
- Closes: #2875
- We now default to ubuntu bionic (18.04) and try once with ubuntu xenial
(16.04).
- We now always test Python 3.8 but allow it to fail, as compared to not
allowing it to fail and only testing it on tagged commits. This is a
bugfix I'd say.
- We now no longer test Python 3.5 and Python 3.6 dedicatedly without
any custom configuration like usage of subdomain, which allows us to
reduce the number of build jobs in a way I think makes a great sense to
compromise.
Some notes:
- Added a conda-forge and DockerHub badge
- Added logo's and made us conform with the team-compass badges section
as can be found here:
https://jupyterhub-team-compass.readthedocs.io/en/latest/building-blocks/readme-badges.html
- Concluded that our CircleCI badge is good because it let's us overview
the repo's build systems, but that it is bad because it is only is about
documentation preview in PRs which isn't useful in a README's header in
a way.
- Noted there was a CircleCI token in the badge, that I believe is meant
to be used with private repo access rather than public repo access. I'm
not sure we need that but I made it a markdown/html comment for now.
- Decided to not manually add a line break between badges. I figured it
could make sense to break manually before the social badges instead of
automatically letting it wrap at some point, but we don't really know
the size of the window viewing so it felt like a bad idea to hardcode
that.
- When the Dockerfile was turned into a multi-stage build, it seems
the share/ directory was not copied to the final image. This
resulted in certain components (static/components/, static/css/)
being missing, which resulted in the JupyterHub share directory not
being findable (in jupyterhub/_data.py). This led to all kinds of
weird havoc, like templates not being findable (#2852).
- I am still unsure if this is the right fix, please check this well.
- Closes: #2852
- While debugging another problem, I noticed some failures to build
the C extensions in the logs. Adding build-essential should fix
that (also as mentioned in the logs themselves).
- Extensions failed for tornado, sqlalchemy, and pyrsistent(pvectorc)
and can be found by searching the previous output for "fail".
Closes#2819 by exiting JupyterHub directly with an error if a config
file has been specified for the config_file traitlet, for example
through the -f or --config flag, but isn't available on the file
system.
- In the cull script, the max_age and inactive_limit are used from the
outer scope. In the case that you add extra logic, one may want to
modify these values.
- In that case, you either have to rename them locally, or access the
outer scope with "nonlocal", the first of which is too much work,
the second of which has a high chance of introducing bugs (as it did
for me).
- This change introduces a fix for everyone. It doesn't change basic
functionality, but makes local modifications simpler.
- Pass in user object & request object only explicitly.
Much better interface that is harder to break by internal
refactoring. We can always add more parameters if needed?
/user-redirect/ is used to help link to a particular url
in the logged in user's authenticated notebook. For example,
if I'm logged in as user 'yuvipanda' and hit the URL
/hub/user-redirect/git-pull, it'll redirect me to
/user/yuvipanda/git-pull. This is extremely useful in
connecting hub links to notebook server extensions, such
as nbgitpuller.
Admins might want to customize how this redirection is done -
for example, redirect users to different running servers
based on the nbgitpuller repository they are linking from.
Adding a hook here helps accomplish that.
allows services to be explicitly blessed to skip the extra oauth confirmation page
added in 1.0
This confirmation page is unhelpful for many admin-managed services,
and is mainly intended for cross-user access.
The default behavior is unchanged, but services can now opt-out of confirmation
(as is done already for the user's own servers).
Use with caution, as this eliminates users' ability to confirm that a service
should be able to authenticate them.
- API requests to non-running servers are not uncommon when you cull
servers and people leave tabs open and active. It returns with 503
and logs all headers, which can take up half of our total log lines
- This avoids logging headers for all 502 and 503 return statuses.
#2747 presented an alternative (more complex) implementation, but this
turned out to be appropriate.
- Closes: #2747
In current versions of MySQL and MariaDB `innodb_file_format`
and `innodb_large_prefix` have been removed. This allows them to not
exist and makes sure the format for the rows are `Dynamic` (default
for current versions).
If init_spawners takes too long (default: 10 seconds) to complete,
app start will be allowed to continue while finishing in the background.
Adds new `check` pending state for the initial check.
Checking lots of spawners can take a long time,
so allowing this to be async limits the impact on startup time
at the expense of starting the Hub in a not-quite-fully-ready state.
- Introduce the EventLog class from BinderHub for emitting
structured event data
- Instrument server starts and stops to emit events
- Defaults to not saving any events anywhere
The flask example in the documentation was still using the
input argument `cookie_cache_max_age` when instantiating
`HubAuth` object. `cookie_cache_max_age` is deprecated since
JupyterHub 0.8 and should be replaced by `cache_max_age`.
- Install pip in the docs conda env (or conda complains).
- Do not override page.html, the next/previous buttons are now handled by
alabaster_jupyterhub (this actually remove the duplicated next/prev
buttons)
- use alabaster_jupyterhub when building locally, this make it easy for
new contributor to get the _exact_ same appearance than on
readthedocs.
- cull_idle_servers.py gets the full server state, so is capable of
doing any kind of arbitrary logic on the profile in order to be more
flexible in culling.
- This patch does not change anything, but gives an embedded
(commented out) example of how you can easily add custom logic to
the script.
- This was added as a tempate/demo for #2598.
* Add missing responses (doesn't include all possible responses yet)
* Refactor invalid multi in body parameters into a single parameter
* Change form type into valid formData
* Fix use of required fields
* Apply a few other minor fixes
Fixes https://github.com/jupyterhub/jupyterhub/issues/2566 to some
degree by making the announcement stand out using twitter-bootstrap
classes `alert` and `alert-warning`. Perhaps we could theme twitter
bootstrap or this alert specifically with jupyter related colors as well
though?
Windows doesn't have support for signal handling so it can't use the
signal handling capabilities of asyncio. Use the previous atexit
strategy on the Windows case instead.
Signed-off-by: Alejandro Del Castillo <alejandro.delcastillo@ni.com>
Big thanks to Erik, Tim, and Min for the great comments!
Change names to be more clear, add function doc comments,
change scoping on some functions, add handle_logout to let
people take custom logout actions, extract
render_logout_page from get method, add TODO.
AS A developer of a Logout handler
I WANT to be able to call a function to kill spawners and
do other backend logout stuff and a separate function to
forward the user along the logout chain.
I believe this PR adds (moderately private) methods to the
Logout Handler to do just that.
If you are reporting an issue with JupyterHub, please use the [GitHub issue](https://github.com/jupyterhub/jupyterhub/issues) search feature to check if your issue has been asked already. If it has, please add your comments to the existing issue.
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
**Additional context**
Add any other context about the problem here.
- Running `jupyter troubleshoot` from the command line, if possible, and posting
its output would also be helpful.
- Running in `--debug` mode can also be helpful for troubleshooting.
I closed this issue because it was labelled as a support question.
Please help us organize discussion by posting this on the http://discourse.jupyter.org/ forum.
Our goal is to sustain a positive experience for both users and developers. We use GitHub issues for specific discussions related to changing a repository's content, and let the forum be where we can more generally help and inspire each other.
Thanks you for being an active member of our community! :heart:
Please note that this repository is participating in a study into the sustainability of open source projects. Data will be gathered about this repository for approximately the next 12 months, starting from 2021-06-11.
Data collected will include the number of contributors, number of PRs, time taken to close/merge these PRs, and issues closed.
For more information, please visit
[our informational page](https://sustainable-open-science-and-software.github.io/) or download our [participant information sheet](https://sustainable-open-science-and-software.github.io/assets/PIS_sustainable_software.pdf).
[](https://github.com/jupyterhub/jupyterhub/actions)
If you believe you’ve found a security vulnerability in a Jupyter
project, please report it to security@ipython.org. If you prefer to
encrypt your security reports, you can use [this PGP public key](https://jupyter-notebook.readthedocs.io/en/stable/_downloads/1d303a645f2505a8fd283826fafc9908/ipython_security.asc).
- making an API request programmatically using the requests library
- learning more about JupyterHub's API
The same JupyterHub API spec, as found here, is available in an interactive form
`here (on swagger's petstore) <http://petstore.swagger.io/?url=https://raw.githubusercontent.com/jupyterhub/jupyterhub/master/docs/rest-api.yml#!/default>`__.
The `OpenAPI Initiative`_ (fka Swagger™) is a project used to describe
JupyterHub can be configured to record structured events from a running server using Jupyter's `Telemetry System`_. The types of events that JupyterHub emits are defined by `JSON schemas`_ listed at the bottom of this page_.
- [Press release on Jupyter and Cori](http://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2016/jupyter-notebooks-will-open-up-new-possibilities-on-nerscs-cori-supercomputer/)
- [Moving and sharing data](https://www.nersc.gov/assets/Uploads/03-MovingAndSharingData-Cholia.pdf)
@@ -28,7 +30,7 @@ Please submit pull requests to update information or to add new institutions or
### University of California Davis
- [Spinning up multiple Jupyter Notebooks on AWS for a tutorial](https://github.com/mblmicdiv/course2017/blob/master/exercises/sourmash-setup.md)
- [Spinning up multiple Jupyter Notebooks on AWS for a tutorial](https://github.com/mblmicdiv/course2017/blob/HEAD/exercises/sourmash-setup.md)
Although not technically a JupyterHub deployment, this tutorial setup
may be helpful to others in the Jupyter community.
@@ -59,6 +61,13 @@ easy to do with RStudio too.
- [jupyterhub-deploy-teaching](https://github.com/jupyterhub/jupyterhub-deploy-teaching) based on work by Brian Granger for Cal Poly's Data Science 301 Course
### Chameleon
[Chameleon](https://www.chameleoncloud.org) is a NSF-funded configurable experimental environment for large-scale computer science systems research with [bare metal reconfigurability](https://chameleoncloud.readthedocs.io/en/latest/technical/baremetal.html). Chameleon users utilize JupyterHub to document and reproduce their complex CISE and networking experiments.
- [Shared JupyterHub](https://jupyter.chameleoncloud.org): provides a common "workbench" environment for any Chameleon user.
- [Trovi](https://www.chameleoncloud.org/experiment/share): a sharing portal of experiments, tutorials, and examples, which users can launch as a dedicated isolated environments on Chameleon's JupyterHub.
### Clemson University
- Advanced Computing
@@ -67,6 +76,7 @@ easy to do with RStudio too.
### University of Colorado Boulder
- (CU Research Computing) CURC
- [JupyterHub User Guide](https://www.rc.colorado.edu/support/user-guide/jupyterhub.html)
- Slurm job dispatched on Crestone compute cluster
- log troubleshooting
@@ -77,13 +87,17 @@ easy to do with RStudio too.
- Earth Lab at CU
- [Tutorial on Parallel R on JupyterHub](https://earthdatascience.org/tutorials/parallel-r-on-jupyterhub/)
### George Washington University
- [Jupyter Hub](http://go.gwu.edu/jupyter) with university single-sign-on. Deployed early 2017.
### HTCondor
- [HTCondor Python Bindings Tutorial from HTCondor Week 2017 includes information on their JupyterHub tutorials](https://research.cs.wisc.edu/htcondor/HTCondorWeek2017/presentations/TueBockelman_Python.pdf)
- [nbgraderutils](https://github.com/dice-group/nbgraderutils): Use JupyterHub + nbgrader + iJava kernel for online Java exercises. Used in lecture Statistical Natural Language Processing.
### Penn State University
- [Press release](https://news.psu.edu/story/523093/2018/05/24/new-open-source-web-apps-available-students-and-faculty): "New open-source web apps available for students and faculty" (but Hub is currently down; checked 04/26/19)
- [Deploy JupyterHub on a Supercomputer with SSH](https://zonca.github.io/2017/05/jupyterhub-hpc-batchspawner-ssh.html)
- [Run Jupyterhub on a Supercomputer](https://zonca.github.io/2015/04/jupyterhub-hpc.html)
- [Deploy JupyterHub on a VM for a Workshop](https://zonca.github.io/2016/04/jupyterhub-sdsc-cloud.html)
@@ -134,7 +153,10 @@ easy to do with RStudio too.
- Kristen Thyng - Oceanography
- [Teaching with JupyterHub and nbgrader](http://kristenthyng.com/blog/2016/09/07/jupyterhub+nbgrader/)
### Elucidata
- What's new in Jupyter Notebooks @[Elucidata](https://elucidata.io/):
- Using Jupyter Notebooks with Jupyterhub on GCP, managed by GKE - https://medium.com/elucidata/why-you-should-be-using-a-jupyter-notebook-8385a4ccd93d
In short, where you see `/user/name/notebooks/foo.ipynb` use `/hub/user-redirect/notebooks/foo.ipynb` (replace `/user/name` with `/hub/user-redirect`).
Sharing links to notebooks is a common activity,
and can look different based on what you mean.
Your first instinct might be to copy the URL you see in the browser,
e.g. `hub.jupyter.org/user/yourname/notebooks/coolthing.ipynb`.
However, let's break down what this URL means:
`hub.jupyter.org/user/yourname/` is the URL prefix handled by _your server_,
which means that sharing this URL is asking the person you share the link with
to come to _your server_ and look at the exact same file.
In most circumstances, this is forbidden by permissions because the person you share with does not have access to your server.
What actually happens when someone visits this URL will depend on whether your server is running and other factors.
But what is our actual goal?
A typical situation is that you have some shared or common filesystem,
such that the same path corresponds to the same document
(either the exact same document or another copy of it).
Typically, what folks want when they do sharing like this
is for each visitor to open the same file _on their own server_,
so Breq would open `/user/breq/notebooks/foo.ipynb` and
Seivarden would open `/user/seivarden/notebooks/foo.ipynb`, etc.
JupyterHub has a special URL that does exactly this!
It's called `/hub/user-redirect/...`.
So if you replace `/user/yourname` in your URL bar
with `/hub/user-redirect` any visitor should get the same
URL on their own server, rather than visiting yours.
In JupyterLab 2.0, this should also be the result of the "Copy Shareable Link"
2. If you need to allow for even more users, a dynamic amount of servers can be used on a cloud,
take a look at the `Zero to JupyterHub with Kubernetes <https://github.com/jupyterhub/zero-to-jupyterhub-k8s>`__ .
Four subsystems make up JupyterHub:
* a **Hub** (tornado process) that is the heart of JupyterHub
* a **configurable http proxy** (node-http-proxy) that receives the requests from the client's browser
* multiple **single-user Jupyter notebook servers** (Python/IPython/tornado) that are monitored by Spawners
* an **authentication class** that manages how users can access the system
Besides these central pieces, you can add optional configurations through a `config.py` file and manage users kernels on an admin panel. A simplification of the whole system can be seen in the figure below:
Role Based Access Control (RBAC) in JupyterHub serves to provide fine grained control of access to Jupyterhub's API resources.
RBAC is new in JupyterHub 2.0.
## Motivation
The JupyterHub API requires authorization to access its APIs.
This ensures that an arbitrary user, or even an unauthenticated third party, are not allowed to perform such actions.
For instance, the behaviour prior to adoption of RBAC is that creating or deleting users requires _admin rights_.
The prior system is functional, but lacks flexibility. If your Hub serves a number of users in different groups, you might want to delegate permissions to other users or automate certain processes.
Prior to RBAC, appointing a 'group-only admin' or a bot that culls idle servers, requires granting full admin rights to all actions. This poses a risk of the user or service intentionally or unintentionally accessing and modifying any data within the Hub and violates the [principle of least privilege](https://en.wikipedia.org/wiki/Principle_of_least_privilege).
To remedy situations like this, JupyterHub is transitioning to an RBAC system. By equipping users, groups and services with _roles_ that supply them with a collection of permissions (_scopes_), administrators are able to fine-tune which parties are granted access to which resources.
## Definitions
**Scopes** are specific permissions used to evaluate API requests. For example: the API endpoint `users/servers`, which enables starting or stopping user servers, is guarded by the scope `servers`.
Scopes are not directly assigned to requesters. Rather, when a client performs an API call, their access will be evaluated based on their assigned roles.
**Roles** are collections of scopes that specify the level of what a client is allowed to do. For example, a group administrator may be granted permission to control the servers of group members, but not to create, modify or delete group members themselves.
Within the RBAC framework, this is achieved by assigning a role to the administrator that covers exactly those privileges.
JupyterHub provides four roles that are available by default:
```{admonition} **Default roles**
- `user` role provides a {ref}`default user scope <default-user-scope-target>` `self` that grants access to the user's own resources.
- `admin` role contains all available scopes and grants full rights to all actions. This role **cannot be edited**.
- `token` role provides a {ref}`default token scope <default-token-scope-target>` `all` that resolves to the same permissions as the owner of the token has.
- `server` role allows for posting activity of "itself" only.
**These roles cannot be deleted.**
```
These default roles have a default collection of scopes,
but you can define the scopes associated with each role (excluding admin) to suit your needs,
as seen [below](overriding-default-roles).
The `user`, `admin`, and `token` roles by default all preserve the permissions prior to RBAC.
Only the `server` role is changed from pre-2.0, to reduce its permissions to activity-only
instead of the default of a full access token.
Additional custom roles can also be defined (see {ref}`define-role-target`).
Roles can be assigned to the following entities:
- Users
- Services
- Groups
- Tokens
An entity can have zero, one, or multiple roles, and there are no restrictions on which roles can be assigned to which entity. Roles can be added to or removed from entities at any time.
**Users** \
When a new user gets created, they are assigned their default role `user`. Additionaly, if the user is created with admin privileges (via `c.Authenticator.admin_users` in `jupyterhub_config.py` or `admin: true` via API), they will be also granted `admin` role. If existing user's admin status changes via API or `jupyterhub_config.py`, their default role will be updated accordingly (after next startup for the latter).
**Services** \
Services do not have a default role. Services without roles have no access to the guarded API end-points, so most services will require assignment of a role in order to function.
**Groups** \
A group does not require any role, and has no roles by default. If a user is a member of a group, they automatically inherit any of the group's permissions (see {ref}`resolving-roles-scopes-target` for more details). This is useful for assigning a set of common permissions to several users.
**Tokens** \
A token’s permissions are evaluated based on their owning entity. Since a token is always issued for a user or service, it can never have more permissions than its owner. If no specific role is requested for a new token, the token is assigned the `token` role.
(define-role-target)=
## Defining Roles
Roles can be defined or modified in the configuration file as a list of dictionaries. An example:
% TODO: think about loading users into roles if membership has been changed via API.
% What should be the result?
```python
# in jupyterhub_config.py
c.JupyterHub.load_roles = [
{
'name': 'server-rights',
'description': 'Allows parties to start and stop user servers',
'scopes': ['servers'],
'users': ['alice', 'bob'],
'services': ['idle-culler'],
'groups': ['admin-group'],
}
]
```
The role `server-rights` now allows the starting and stopping of servers by any of the following:
- users `alice` and `bob`
- the service `idle-culler`
- any member of the `admin-group`.
```{attention}
Tokens cannot be assigned roles through role definition but may be assigned specific roles when requested via API (see {ref}`requesting-api-token-target`).
```
Another example:
```python
# in jupyterhub_config.py
c.JupyterHub.load_roles = [
{
'description': 'Read-only user models',
'name': 'reader',
'scopes': ['read:users'],
'services': ['external'],
'users': ['maria', 'joe']
}
]
```
The role `reader` allows users `maria` and `joe` and service `external` to read (but not modify) any user’s model.
```{admonition} Requirements
:class: warning
In a role definition, the `name` field is required, while all other fields are optional.\
**Role names must:**
- be 3 - 255 characters
- use ascii lowercase, numbers, 'unreserved' URL punctuation `-_.~`
- start with a letter
- end with letter or number.
`users`, `services`, and `groups` only accept objects that already exist in the database or are defined previously in the file.
It is not possible to implicitly add a new user to the database by defining a new role.
```
If no scopes are defined for _new role_, JupyterHub will raise a warning. Providing non-existing scopes will result in an error.
In case the role with a certain name already exists in the database, its definition and scopes will be overwritten. This holds true for all roles except the `admin` role, which cannot be overwritten; an error will be raised if trying to do so. All the role bearers permissions present in the definition will change accordingly.
(overriding-default-roles)=
### Overriding default roles
Role definitions can include those of the "default" roles listed above (admin excluded),
if the default scopes associated with those roles do not suit your deployment.
For example, to specify what permissions the $JUPYTERHUB_API_TOKEN issued to all single-user servers
has,
define the `server` role.
To restore the JupyterHub 1.x behavior of servers being able to do anything their owners can do,
use the scope `inherit` (for 'inheriting' the owner's permissions):
```python
c.JupyterHub.load_roles = [
{
'name': 'server',
'scopes': ['inherit'],
}
]
```
or, better yet, identify the specific [scopes][] you want server environments to have access to.
[scopes]: available-scopes-target
If you don't want to get too detailed,
one option is the `self` scope,
which will have no effect on non-admin users,
but will restrict the token issued to admin user servers to only have access to their own resources,
instead of being able to take actions on behalf of all other users.
```python
c.JupyterHub.load_roles = [
{
'name': 'server',
'scopes': ['self'],
}
]
```
(removing-roles-target)=
## Removing roles
Only the entities present in the role definition in the `jupyterhub_config.py` remain the role bearers. If a user, service or group is removed from the role definition, they will lose the role on the next startup.
Once a role is loaded, it remains in the database until removing it from the `jupyterhub_config.py` and restarting the Hub. All previously defined role bearers will lose the role and associated permissions. Default roles, even if previously redefined through the config file and removed, will not be deleted from the database.
A scope has a syntax-based design that reveals which resources it provides access to. Resources are objects with a type, associated data, relationships to other resources, and a set of methods that operate on them (see [RESTful API](https://restful-api-design.readthedocs.io/en/latest/resources.html) documentation for more information).
`<resource>` in the RBAC scope design refers to the resource name in the [JupyterHub's API](../reference/rest-api.rst) endpoints in most cases. For instance, `<resource>` equal to `users` corresponds to JupyterHub's API endpoints beginning with _/users_.
(scope-conventions-target)=
## Scope conventions
-`<resource>` \
The top-level `<resource>` scopes, such as `users` or `groups`, grant read, write, and list permissions to the resource itself as well as its sub-resources. For example, the scope `users:activity` is included in the scope `users`.
-`read:<resource>` \
Limits permissions to read-only operations on single resources.
-`list:<resource>` \
Read-only access to listing endpoints.
Use `read:<resource>:<subresource>` to control what fields are returned.
-`admin:<resource>` \
Grants additional permissions such as create/delete on the corresponding resource in addition to read and write permissions.
-`access:<resource>` \
Grants access permissions to the `<resource>` via API or browser.
-`<resource>:<subresource>` \
The {ref}`vertically filtered <vertical-filtering-target>` scopes provide access to a subset of the information granted by the `<resource>` scope. E.g., the scope `users:activity` only provides permission to post user activity.
-`<resource>!<object>=<objectname>` \
{ref}`horizontal-filtering-target` is implemented by the `!<object>=<objectname>`scope structure. A resource (or sub-resource) can be filtered based on `user`, `server`, `group` or `service` name. For instance, `<resource>!user=charlie` limits access to only return resources of user `charlie`. \
Only one filter per scope is allowed, but filters for the same scope have an additive effect; a larger filter can be used by supplying the scope multiple times with different filters.
By adding a scope to an existing role, all role bearers will gain the associated permissions.
## Metascopes
Metascopes do not follow the general scope syntax. Instead, a metascope resolves to a set of scopes, which can refer to different resources, based on their owning entity. In JupyterHub, there are currently two metascopes:
1. default user scope `self`, and
2. default token scope `all`.
(default-user-scope-target)=
### Default user scope
Access to the user's own resources and subresources is covered by metascope `self`. This metascope includes the user's model, activity, servers and tokens. For example, `self` for a user named "gerard" includes:
-`users!user=gerard` where the `users` scope provides access to the full user model and activity. The filter restricts this access to the user's own resources.
-`servers!user=gerard` which grants the user access to their own servers without being able to create/delete any.
-`tokens!user=gerard` which allows the user to access, request and delete their own tokens.
-`access:servers!user=gerard` which allows the user to access their own servers via API or browser.
The `self` scope is only valid for user entities. In other cases (e.g., for services) it resolves to an empty set of scopes.
(default-token-scope-target)=
### Default token scope
The token metascope `all` covers the same scopes as the token owner's scopes during requests. For example, if a token owner has roles containing the scopes `read:groups` and `read:users`, the `all` scope resolves to the set of scopes `{read:groups, read:users}`.
If the token owner has default `user` role, the `all` scope resolves to `self`, which will subsequently be expanded to include all the user-specific scopes (or empty set in the case of services).
If the token owner is a member of any group with roles, the group scopes will also be included in resolving the `all` scope.
(horizontal-filtering-target)=
## Horizontal filtering
Horizontal filtering, also called _resource filtering_, is the concept of reducing the payload of an API call to cover only the subset of the _resources_ that the scopes of the client provides them access to.
Requested resources are filtered based on the filter of the corresponding scope. For instance, if a service requests a user list (guarded with scope `read:users`) with a role that only contains scopes `read:users!user=hannah` and `read:users!user=ivan`, the returned list of user models will be an intersection of all users and the collection `{hannah, ivan}`. In case this intersection is empty, the API call returns an HTTP 404 error, regardless if any users exist outside of the clients scope filter collection.
In case a user resource is being accessed, any scopes with _group_ filters will be expanded to filters for each _user_ in those groups.
### `!user` filter
The `!user` filter is a special horizontal filter that strictly refers to the **"owner only"** scopes, where _owner_ is a user entity. The filter resolves internally into `!user=<ownerusername>` ensuring that only the owner's resources may be accessed through the associated scopes.
For example, the `server` role assigned by default to server tokens contains `access:servers!user` and `users:activity!user` scopes. This allows the token to access and post activity of only the servers owned by the token owner.
The filter can be applied to any scope.
(vertical-filtering-target)=
## Vertical filtering
Vertical filtering, also called _attribute filtering_, is the concept of reducing the payload of an API call to cover only the _attributes_ of the resources that the scopes of the client provides them access to. This occurs when the client scopes are subscopes of the API endpoint that is called.
For instance, if a client requests a user list with the only scope being `read:users:groups`, the returned list of user models will contain only a list of groups per user.
In case the client has multiple subscopes, the call returns the union of the data the client has access to.
The payload of an API call can be filtered both horizontally and vertically simultaneously. For instance, performing an API call to the endpoint `/users/` with the scope `users:name!user=juliette` returns a payload of `[{name: 'juliette'}]` (provided that this name is present in the database).
(available-scopes-target)=
## Available scopes
Table below lists all available scopes and illustrates their hierarchy. Indented scopes indicate subscopes of the scope(s) above them.
There are four exceptions to the general {ref}`scope conventions <scope-conventions-target>`:
-`read:users:name` is a subscope of both `read:users` and `read:servers`. \
The `read:servers` scope requires access to the user name (server owner) due to named servers distinguished internally in the form `!server=username/servername`.
-`read:users:activity` is a subscope of both `read:users` and `users:activity`. \
Posting activity via the `users:activity`, which is not included in `users` scope, needs to check the last valid activity of the user.
-`read:roles:users` is a subscope of both `read:roles` and `admin:users`. \
Admin privileges to the _users_ resource include the information about user roles.
-`read:roles:groups` is a subscope of both `read:roles` and `admin:groups`. \
Similar to the `read:roles:users` above.
```{include} scope-table.md
```
```{Caution}
Note that only the {ref}`horizontal filtering <horizontal-filtering-target>` can be added to scopes to customize them. \
Metascopes `self` and `all`, `<resource>`, `<resource>:<subresource>`, `read:<resource>`, `admin:<resource>`, and `access:<resource>` scopes are predefined and cannot be changed otherwise.
```
### Scopes and APIs
The scopes are also listed in the [](../reference/rest-api.rst) documentation. Each API endpoint has a list of scopes which can be used to access the API; if no scopes are listed, the API is not authenticated and can be accessed without any permissions (i.e., no scopes).
Listed scopes by each API endpoint reflect the "lowest" permissions required to gain any access to the corresponding API. For example, posting user's activity (_POST /users/:name/activity_) needs `users:activity` scope. If scope `users` is passed during the request, the access will be granted as the required scope is a subscope of the `users` scope. If, on the other hand, `read:users:activity` scope is passed, the access will be denied.
Roles are stored in the database, where they are associated with users, services, etc., and can be added or modified as explained in {ref}`define-role-target` section. Users, services, groups, and tokens can gain, change, and lose roles. This is currently achieved via `jupyterhub_config.py` (see {ref}`define-role-target`) and will be made available via API in future. The latter will allow for changing a token's role, and thereby its permissions, without the need to issue a new token.
Roles and scopes utilities can be found in `roles.py` and `scopes.py` modules. Scope variables take on five different formats which is reflected throughout the utilities via specific nomenclature:
```{admonition} **Scope variable nomenclature**
:class: tip
- _scopes_ \
List of scopes with abbreviations (used in role definitions). E.g., `["users:activity!user"]`.
- _expanded scopes_ \
Set of expanded scopes without abbreviations (i.e., resolved metascopes, filters and subscopes). E.g., `{"users:activity!user=charlie", "read:users:activity!user=charlie"}`.
- _parsed scopes_ \
Dictionary JSON like format of expanded scopes. E.g., `{"users:activity": {"user": ["charlie"]}, "read:users:activity": {"users": ["charlie"]}}`.
- _intersection_ \
Set of expanded scopes as intersection of 2 expanded scope sets.
- _identify scopes_ \
Set of expanded scopes needed for identify (whoami) endpoints.
```
(resolving-roles-scopes-target)=
## Resolving roles and scopes
**Resolving roles** refers to determining which roles a user, service, token, or group has, extracting the list of scopes from each role and combining them into a single set of scopes.
**Resolving scopes** involves expanding scopes into all their possible subscopes (_expanded scopes_), parsing them into format used for access evaluation (_parsed scopes_) and, if applicable, comparing two sets of scopes (_intersection_). All procedures take into account the scope hierarchy, {ref}`vertical <vertical-filtering-target>` and {ref}`horizontal filtering <horizontal-filtering-target>`, limiting or elevated permissions (`read:<resource>` or `admin:<resource>`, respectively), and metascopes.
Roles and scopes are resolved on several occasions, for example when requesting an API token with specific roles or making an API request. The following sections provide more details.
(requesting-api-token-target)=
### Requesting API token with specific roles
API tokens grant access to JupyterHub's APIs. The RBAC framework allows for requesting tokens with specific existing roles. To date, it is only possible to add roles to a token through the _POST /users/:name/tokens_ API where the roles can be specified in the token parameters body (see [](../reference/rest-api.rst)).
RBAC adds several steps into the token issue flow.
If no roles are requested, the token is issued with the default `token` role (providing the requester is allowed to create the token).
If the token is requested with any roles, the permissions of requesting entity are checked against the requested permissions to ensure the token would not grant its owner additional privileges.
If, due to modifications of roles or entities, at API request time a token has any scopes that its owner does not, those scopes are removed. The API request is resolved without additional errors using the scopes _intersection_, but the Hub logs a warning (see {ref}`Figure 2 <api-request-chart>`).
Resolving a token's roles (yellow box in {ref}`Figure 1 <token-request-chart>`) corresponds to resolving all the token's owner roles (including the roles associated with their groups) and the token's requested roles into a set of scopes. The two sets are compared (Resolve the scopes box in orange in {ref}`Figure 1 <token-request-chart>`), taking into account the scope hierarchy but, solely for role assignment, omitting any {ref}`horizontal filter <horizontal-filtering-target>` comparison. If the token's scopes are a subset of the token owner's scopes, the token is issued with the requested roles; if not, JupyterHub will raise an error.
{ref}`Figure 1 <token-request-chart>` below illustrates the steps involved. The orange rectangles highlight where in the process the roles and scopes are resolved.
Figure 1. Resolving roles and scopes during API token request
```
### Making an API request
With the RBAC framework each authenticated JupyterHub API request is guarded by a scope decorator that specifies which scopes are required to gain the access to the API.
When an API request is performed, the requesting API token's roles are again resolved (yellow box in {ref}`Figure 2 <api-request-chart>`) to ensure the token does not grant more permissions than its owner has at the request time (e.g., due to changing/losing roles).
If the owner's roles do not include some scopes of the token's scopes, only the _intersection_ of the token's and owner's scopes will be used. For example, using a token with scope `users` whose owner's role scope is `read:users:name` will result in only the `read:users:name` scope being passed on. In the case of no _intersection_, an empty set of scopes will be used.
The passed scopes are compared to the scopes required to access the API as follows:
- if the API scopes are present within the set of passed scopes, the access is granted and the API returns its "full" response
- if that is not the case, another check is utilized to determine if subscopes of the required API scopes can be found in the passed scope set:
- if found, the RBAC framework employs the {ref}`filtering <vertical-filtering-target>` procedures to refine the API response to access only resource attributes corresponding to the passed scopes. For example, providing a scope `read:users:activity!group=class-C` for the _GET /users_ API will return a list of user models from group `class-C` containing only the `last_activity` attribute for each user model
- if not found, the access to API is denied
{ref}`Figure 2 <api-request-chart>` illustrates this process highlighting the steps where the role and scope resolutions as well as filtering occur in orange.
```{figure} ../images/rbac-api-request-chart.png
:align: center
:name: api-request-chart
Figure 2. Resolving roles and scopes when an API request is made
RBAC framework requires different database setup than any previous JupyterHub versions due to eliminating the distinction between OAuth and API tokens (see {ref}`oauth-vs-api-tokens-target` for more details). This requires merging the previously two different database tables into one. By doing so, all existing tokens created before the upgrade no longer comply with the new database version and must be replaced.
This is achieved by the Hub deleting all existing tokens during the database upgrade and recreating the tokens loaded via the `jupyterhub_config.py` file with updated structure. However, any manually issued or stored tokens are not recreated automatically and must be manually re-issued after the upgrade.
No other database records are affected.
(rbac-upgrade-steps-target)=
## Upgrade steps
1. All running **servers must be stopped** before proceeding with the upgrade.
2. To upgrade the Hub, follow the [Upgrading JupyterHub](../admin/upgrading.rst) instructions.
```{attention}
We advise against defining any new roles in the `jupyterhub.config.py` file right after the upgrade is completed and JupyterHub restarted for the first time. This preserves the 'current' state of the Hub. You can define and assign new roles on any other following startup.
```
3. After restarting the Hub **re-issue all tokens that were previously issued manually** (i.e., not through the `jupyterhub_config.py` file).
When the JupyterHub is restarted for the first time after the upgrade, all users, services and tokens stored in the database or re-loaded through the configuration file will be assigned their default role. Any newly added entities after that will be assigned their default role only if no other specific role is requested for them.
## Changing the permissions after the upgrade
Once all the {ref}`upgrade steps <rbac-upgrade-steps-target>` above are completed, the RBAC framework will be available for utilization. You can define new roles, modify default roles (apart from `admin`) and assign them to entities as described in the {ref}`define-role-target` section.
We recommended the following procedure to start with RBAC:
1. Identify which admin users and services you would like to grant only the permissions they need through the new roles.
2. Strip these users and services of their admin status via API or UI. This will change their roles from `admin` to `user`.
```{note}
Stripping entities of their roles is currently available only via `jupyterhub_config.py` (see {ref}`removing-roles-target`).
```
3. Define new roles that you would like to start using with appropriate scopes and assign them to these entities in `jupyterhub_config.py`.
4. Restart the JupyterHub for the new roles to take effect.
(oauth-vs-api-tokens-target)=
## OAuth vs API tokens
### Before RBAC
Previous JupyterHub versions utilize two types of tokens, OAuth token and API token.
OAuth token is issued by the Hub to a single-user server when the user logs in. The token is stored in the browser cookie and is used to identify the user who owns the server during the OAuth flow. This token by default expires when the cookie reaches its expiry time of 2 weeks (or after 1 hour in JupyterHub versions < 1.3.0).
API token is issued by the Hub to a single-user server when launched and is used to communicate with the Hub's APIs such as posting activity or completing the OAuth flow. This token has no expiry by default.
API tokens can also be issued to users via API ([_/hub/token_](../reference/urls.md) or [_POST /users/:username/tokens_](../reference/rest-api.rst)) and services via `jupyterhub_config.py` to perform API requests.
### With RBAC
The RBAC framework allows for granting tokens different levels of permissions via scopes attached to roles. The 'only identify' purpose of the separate OAuth tokens is no longer required. API tokens can be used used for every action, including the login and authentication, for which an API token with no role (i.e., no scope in {ref}`available-scopes-target`) is used.
OAuth tokens are therefore dropped from the Hub upgraded with the RBAC framework.
Note that in the RBAC system the `admin` field in the `idle-culler` service definition is omitted. Instead, the `idle-culler` role provides the service with only the permissions it needs.
If the optional actions of deleting the idle servers and/or removing inactive users are desired, **change the following scopes** in the `idle-culler` role definition:
- `servers` to `admin:servers` for deleting servers
- `read:users:name`, `read:users:activity` to `admin:users` for deleting users.
```
3. Restart JupyterHub to complete the process.
## API launcher
A service capable of creating/removing users and launching multiple servers should have access to:
1. _POST_ and _DELETE /users_
2. _POST_ and _DELETE /users/:name/server_ or _/users/:name/servers/:server_name_
3. Creating/deleting servers
The scopes required to access the API enpoints:
1. `admin:users`
2. `servers`
3. `admin:servers`
From the above, the role definition is:
```python
# in jupyterhub_config.py
c.JupyterHub.load_roles = [
{
"name": "api-launcher",
"description": "Manages servers",
"scopes": ["admin:users", "admin:servers"],
"services": [<service_name>]
}
]
```
If needed, the scopes can be modified to limit the permissions to e.g. a particular group with `!group=groupname` filter.
## Group admin roles
Roles can be used to specify different group member privileges.
For example, a group of students `class-A` may have a role allowing all group members to access information about their group. Teacher `johan`, who is a student of `class-A` but a teacher of another group of students `class-B`, can have additional role permitting him to access information about `class-B` students as well as start/stop their servers.
The roles can then be defined as follows:
```python
# in jupyterhub_config.py
c.JupyterHub.load_groups = {
'class-A': ['johan', 'student1', 'student2'],
'class-B': ['student3', 'student4']
}
c.JupyterHub.load_roles = [
{
'name': 'class-A-student',
'description': 'Grants access to information about the group',
'scopes': ['read:groups!group=class-A'],
'groups': ['class-A']
},
{
'name': 'class-B-student',
'description': 'Grants access to information about the group',
'scopes': ['read:groups!group=class-B'],
'groups': ['class-B']
},
{
'name': 'teacher',
'description': 'Allows for accessing information about teacher group members and starting/stopping their servers',
In the above example, `johan` has privileges inherited from `class-A-student` role and the `teacher` role on top of those.
```{note}
The scope filters (`!group=`) limit the privileges only to the particular groups. `johan` can access the servers and information of `class-B` group members only.
JupyterHub uses OAuth 2 internally as a mechanism for authenticating users.
As such, JupyterHub itself always functions as an OAuth **provider**.
More on what that means [below](oauth-terms).
Additionally, JupyterHub is _often_ deployed with [oauthenticator](https://oauthenticator.readthedocs.io),
where an external identity provider, such as GitHub or KeyCloak, is used to authenticate users.
When this is the case, there are _two_ nested oauth flows:
an _internal_ oauth flow where JupyterHub is the **provider**,
and and _external_ oauth flow, where JupyterHub is a **client**.
This means that when you are using JupyterHub, there is always _at least one_ and often two layers of OAuth involved in a user logging in and accessing their server.
Some relevant points:
- Single-user servers _never_ need to communicate with or be aware of the upstream provider configured in your Authenticator.
As far as they are concerned, only JupyterHub is an OAuth provider,
and how users authenticate with the Hub itself is irrelevant.
- When talking to a single-user server,
there are ~always two tokens:
a token issued to the server itself to communicate with the Hub API,
and a second per-user token in the browser to represent the completed login process and authorized permissions.
More on this [later](two-tokens).
(oauth-terms)=
## Key OAuth terms
Here are some key definitions to keep in mind when we are talking about OAuth.
You can also read more detail [here](https://www.oauth.com/oauth2-servers/definitions/).
- **provider** the entity responsible for managing identity and authorization,
always a web server.
JupyterHub is _always_ an oauth provider for JupyterHub's components.
When OAuthenticator is used, an external service, such as GitHub or KeyCloak, is also an oauth provider.
- **client** An entity that requests OAuth **tokens** on a user's behalf,
generally a web server of some kind.
OAuth **clients** are services that _delegate_ authentication and/or authorization
to an OAuth **provider**.
JupyterHub _services_ or single-user _servers_ are OAuth **clients** of the JupyterHub **provider**.
When OAuthenticator is used, JupyterHub is itself _also_ an OAuth **client** for the external oauth **provider**, e.g. GitHub.
- **browser** A user's web browser, which makes requests and stores things like cookies
- **token** The secret value used to represent a user's authorization. This is the final product of the OAuth process.
- **code** A short-lived temporary secret that the **client** exchanges
for a **token** at the conclusion of oauth,
in what's generally called the "oauth callback handler."
## One oauth flow
OAuth **flow** is what we call the sequence of HTTP requests involved in authenticating a user and issuing a token, ultimately used for authorized access to a service or single-user server.
A single oauth flow generally goes like this:
### OAuth request and redirect
1. A **browser** makes an HTTP request to an oauth **client**.
2. There are no credentials, so the client _redirects_ the browser to an "authorize" page on the oauth **provider** with some extra information:
- the oauth **client id** of the client itself
- the **redirect uri** to be redirected back to after completion
- the **scopes** requested, which the user should be presented with to confirm.
This is the "X would like to be able to Y on your behalf. Allow this?" page you see on all the "Login with ..." pages around the Internet.
3. During this authorize step,
the browser must be _authenticated_ with the provider.
This is often already stored in a cookie,
but if not the provider webapp must begin its _own_ authentication process before serving the authorization page.
This _may_ even begin another oauth flow!
4. After the user tells the provider that they want to proceed with the authorization,
the provider records this authorization in a short-lived record called an **oauth code**.
5. Finally, the oauth provider redirects the browser _back_ to the oauth client's "redirect uri"
(or "oauth callback uri"),
with the oauth code in a url parameter.
That's the end of the requests made between the **browser** and the **provider**.
### State after redirect
At this point:
- The browser is authenticated with the _provider_
- The user's authorized permissions are recorded in an _oauth code_
- The _provider_ knows that the given oauth client's requested permissions have been granted, but the client doesn't know this yet.
- All requests so far have been made directly by the browser.
No requests have originated at the client or provider.
### OAuth Client Handles Callback Request
Now we get to finish the OAuth process.
Let's dig into what the oauth client does when it handles
the oauth callback request with the
- The OAuth client receives the _code_ and makes an API request to the _provider_ to exchange the code for a real _token_.
This is the first direct request between the OAuth _client_ and the _provider_.
- Once the token is retrieved, the client _usually_
makes a second API request to the _provider_
to retrieve information about the owner of the token (the user).
This is the step where behavior diverges for different OAuth providers.
Up to this point, all oauth providers are the same, following the oauth specification.
However, oauth does not define a standard for exchanging tokens for information about their owner or permissions ([OpenID Connect](https://openid.net/connect/) does that),
so this step may be different for each OAuth provider.
- Finally, the oauth client stores its own record that the user is authorized in a cookie.
This could be the token itself, or any other appropriate representation of successful authentication.
- Last of all, now that credentials have been established,
the browser can be redirected to the _original_ URL where it started,
to try the request again.
If the client wasn't able to keep track of the original URL all this time
(not always easy!),
you might end up back at a default landing page instead of where you started the login process. This is frustrating!
😮💨 _phew_.
So that's _one_ OAuth process.
## Full sequence of OAuth in JupyterHub
Let's go through the above oauth process in JupyterHub,
with specific examples of each HTTP request and what information is contained.
For bonus points, we are using the double-oauth example of JupyterHub configured with GitHubOAuthenticator.
To disambiguate, we will call the OAuth process where JupyterHub is the **provider** "internal oauth,"
and the one with JupyterHub as a **client** "external oauth."
Our starting point:
- a user's single-user server is running. Let's call them `danez`
- jupyterhub is running with GitHub as an oauth provider (this means two full instances of oauth),
- Danez has a fresh browser session with no cookies yet
First request:
- browser->single-user server running JupyterLab or Jupyter Classic
-`GET /user/danez/notebooks/mynotebook.ipynb`
- no credentials, so single-user server (as an oauth **client**) starts internal oauth process with JupyterHub (the **provider**)
For retrieval, you only *need* to implement a single method that retrieves all
For retrieval, you only _need_ to implement a single method that retrieves all
routes. The return value for this function should be a dictionary, keyed by
`routespect`, of dicts whose keys are the same three arguments passed to
`add_route` (`routespec`, `target`, `data`)
@@ -204,7 +204,7 @@ setup(
```
If you have added this metadata to your package,
users can select your authenticator with the configuration:
users can select your proxy with the configuration:
```python
c.JupyterHub.proxy_class='mything'
@@ -220,3 +220,11 @@ previously required.
Additionally, configurable attributes for your proxy will
appear in jupyterhub help output and auto-generated configuration files
via `jupyterhub --generate-config`.
### Index of proxies
A list of the proxies that are currently available for JupyterHub (that we know about).
1. [`jupyterhub/configurable-http-proxy`](https://github.com/jupyterhub/configurable-http-proxy) The default proxy which uses node-http-proxy
2. [`jupyterhub/traefik-proxy`](https://github.com/jupyterhub/traefik-proxy) The proxy which configures traefik proxy server for jupyterhub
3. [`AbdealiJK/configurable-http-proxy`](https://github.com/AbdealiJK/configurable-http-proxy) A pure python implementation of the configurable-http-proxy
The thing which users directly connect to is the proxy, by default
@@ -22,7 +21,6 @@ The default JupyterHub proxy is
and that page has some docs. If you are using a different proxy, such
as Traefik, these instructions are probably not relevant to you.
## Configuration options
`c.JupyterHub.cleanup_servers = False` should be set, which tells the
@@ -37,16 +35,12 @@ it yourself).
token for authenticating communication with the proxy.
`c.ConfigurableHTTPProxy.api_url = 'http://localhost:8001'` should be
set to the URL which the hub uses to connect *to the proxy's API*.
set to the URL which the hub uses to connect _to the proxy's API_.
## Proxy configuration
You need to configure a service to start the proxy. An example
command line for this is `configurable-http-proxy --ip=127.0.0.1
--port=8000 --api-ip=127.0.0.1 --api-port=8001
--default-target=http://localhost:8081
--error-target=http://localhost:8081/hub/error`. (Details for how to
command line for this is `configurable-http-proxy --ip=127.0.0.1 --port=8000 --api-ip=127.0.0.1 --api-port=8001 --default-target=http://localhost:8081 --error-target=http://localhost:8081/hub/error`. (Details for how to
do this is out of scope for this tutorial - for example it might be a
systemd service on within another docker cotainer). The proxy has no
configuration files, all configuration is via the command line and
@@ -54,7 +48,7 @@ environment variables.
`--api-ip` and `--api-port` (which tells the proxy where to listen) should match the hub's `ConfigurableHTTPProxy.api_url`.
`--ip`, `-port`, and other options configure the *user* connections to the proxy.
`--ip`, `-port`, and other options configure the _user_ connections to the proxy.
`--default-target` and `--error-target` should point to the hub, and used when users navigate to the proxy originally.
@@ -67,14 +61,12 @@ what other options are needed, for example SSL options. Note that
these are configured in the hub if the hub is starting the proxy - you
need to move the options to here.
## Docker image
You can use [jupyterhub configurable-http-proxy docker
@@ -50,11 +50,8 @@ A Service may have the following properties:
If a service is also to be managed by the Hub, it has a few extra options:
-`command: (str/Popen list`) - Command for JupyterHub to spawn the service.
- Only use this if the service should be a subprocess.
- If command is not specified, the Service is assumed to be managed
externally.
- If a command is specified for launching the Service, the Service will
-`command: (str/Popen list)` - Command for JupyterHub to spawn the service. - Only use this if the service should be a subprocess. - If command is not specified, the Service is assumed to be managed
externally. - If a command is specified for launching the Service, the Service will
be started and managed by the Hub.
-`environment: dict` - additional environment variables for the Service.
-`user: str` - the name of a system user to manage the Service. If
@@ -89,11 +86,20 @@ Hub-Managed Service would include:
This example would be configured as follows in `jupyterhub_config.py`:
```python
c.JupyterHub.load_roles=[
{
"name":"idle-culler",
"scopes":[
"read:users:activity",# read user last_activity
"servers",# start and stop servers
# 'admin:users' # needed if culling idle users as well
We recommend looking at the [`HubOAuth`][huboauth] class implementation for reference,
and taking note of the following process:
1. retrieve the cookie `jupyterhub-services` from the request.
2. Make an API request `GET /hub/api/authorizations/cookie/jupyterhub-services/cookie-value`,
where cookie-value is the url-encoded value of the `jupyterhub-services` cookie.
This request must be authenticated with a Hub API token in the `Authorization` header.
1. retrieve the token from the request.
2. Make an API request `GET /hub/api/user`,
with the token in the `Authorization` header.
For example, with [requests][]:
```python
r = requests.get(
'/'.join((["http://127.0.0.1:8081/hub/api",
"authorizations/cookie/jupyterhub-services",
quote(encrypted_cookie, safe=''),
]),
"http://127.0.0.1:8081/hub/api/user",
headers = {
'Authorization' : 'token %s' % api_token,
'Authorization' : f'token {api_token}',
},
)
r.raise_for_status()
@@ -350,25 +325,39 @@ and taking note of the following process:
3. On success, the reply will be a JSON model describing the user:
```json
```python
{
"name": "inara",
# groups may be omitted, depending on permissions
"groups": ["serenity", "guild"],
# scopes is new in JupyterHub 2.0
"scopes": [
"access:services",
"read:users:name",
"read:users!user=inara",
"..."
]
}
```
The `scopes` field can be used to manage access.
Note: a user will have access to a service to complete oauth access to the service for the first time.
Individual permissions may be revoked at any later point without revoking the token,
in which case the `scopes` field in this model should be checked on each access.
The default required scopes for access are available from `hub_auth.oauth_scopes` or `$JUPYTERHUB_OAUTH_SCOPES`.
An example of using an Externally-Managed Service and authentication is
in [nbviewer README][nbviewer example] section on securing the notebook viewer,
and an example of its configuration is found [here](https://github.com/jupyter/nbviewer/blob/master/nbviewer/providers/base.py#L94).
and an example of its configuration is found [here](https://github.com/jupyter/nbviewer/blob/ed942b10a52b6259099e2dd687930871dc8aac22/nbviewer/providers/base.py#L95).
nbviewer can also be run as a Hub-Managed Service as described [nbviewer README][nbviewer example]
`Spawner.poll` should check if the spawner is still running.
@@ -72,13 +115,12 @@ It should return `None` if it is still running,
and an integer exit status, otherwise.
For the local process case, `Spawner.poll` uses `os.kill(PID, 0)`
to check if the local process is still running.
to check if the local process is still running. On Windows, it uses `psutil.pid_exists`.
### Spawner.stop
`Spawner.stop` should stop the process. It must be a tornado coroutine, which should return when the process has finished exiting.
## Spawner state
JupyterHub should be able to stop and restart without tearing down
@@ -110,7 +152,6 @@ def clear_state(self):
self.pid=0
```
## Spawner options form
(new in 0.4)
@@ -127,7 +168,7 @@ If the `Spawner.options_form` is defined, when a user tries to start their serve
If `Spawner.options_form` is undefined, the user's server is spawned directly, and no spawn page is rendered.
See [this example](https://github.com/jupyterhub/jupyterhub/blob/master/examples/spawn-form/jupyterhub_config.py) for a form that allows custom CLI args for the local spawner.
See [this example](https://github.com/jupyterhub/jupyterhub/blob/HEAD/examples/spawn-form/jupyterhub_config.py) for a form that allows custom CLI args for the local spawner.
### `Spawner.options_from_form`
@@ -168,8 +209,7 @@ which would return:
When `Spawner.start` is called, this dictionary is accessible as `self.user_options`.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.