services/auth prevents calling check_xsrf_cookie,
but if the Handler itself called it, the newly strict check would still be applied.
This ensures navigate GET requests are actually allowed.
rather than attempting to clear multiple tokens (too complicated, breaks named servers)
look for and accept the first valid token
we have to do our own cookie parsing, because existing cookie implementations only return a single value for each key,
and default to selecting the _least_ likely to be correct, according to RFCs.
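For illustration, a minimal sketch of the idea (the helper and validator names here are hypothetical, not the actual implementation):

```python
def get_all_cookie_values(cookie_header, name):
    """Collect *all* values for a cookie name.

    http.cookies.SimpleCookie keeps only one value per key,
    so split the raw Cookie header ourselves.
    """
    values = []
    for part in cookie_header.split(";"):
        key, _, value = part.strip().partition("=")
        if key == name:
            values.append(value)
    return values

# accept the first candidate that passes validation (is_valid is hypothetical):
# token = next((v for v in get_all_cookie_values(header, "_xsrf") if is_valid(v)), None)
```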
set updated xsrf cookie on login to avoid needing two requests to get the right cookie
need to inject our override into the base class,
rather than at the instance level,
to avoid clobbering any overrides in extensions like jupyter-server-proxy
- suggest token page first
- remove caveat about JupyterHub 0.8, which can be assumed now
- undocument `jupyterhub token`
- refresh token page screenshots, and remove duplicate screenshot of the token page
- minor improvements to language in token page
`Configuration Reference` sounds like it's the place to go to see the full list of JupyterHub config options.
However it's not very readable as it's a plain-text dump of the output of `jupyterhub --generate-config`.
This links to some of the API doc pages instead, which present most of the information in an easier to read format. Unfortunately it also includes a lot of non-traitlets documentation.
Added Playwright tests covering the Login, Spawning, Home and Token pages
Removed the Selenium cases covering the Login, Spawning, Home and Token pages
consistent behavior with oauth_client_allowed_scopes,
where the _intersection_ of requested and owner-held permissions is granted,
instead of failing
Enables different users to have different permissions in $JUPYTERHUB_API_TOKEN,
either via callables or via requesting as much as you may want and only granting the subset.
Additionally, the !server filter can now be correctly applied to the server token
default behavior is unchanged
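A config sketch of what this enables (the policy and scope choices here are illustrative, not defaults):

```python
# jupyterhub_config.py
def server_token_scopes(spawner):
    # hypothetical policy: admins' server tokens may also read the user list
    scopes = ["access:servers!server", "users:activity!user"]
    if spawner.user.admin:
        scopes.append("read:users")
    return scopes

c.Spawner.server_token_scopes = server_token_scopes
```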
Per the review, updated the cases covering the Admin UI:
- rename the function,
- add docstring,
- add parametrization,
- use the fixture admin_user instead of the function
mostly a copy (fork) of singleuser app
using public APIs instead of lots of patching.
opt-in via `JUPYTERHUB_SINGLEUSER_EXTENSION=1`
related changes:
- stop running a test single-user server in a thread. It's complicated and fragile.
Instead, run it normally, and get the info we need from a custom handler registered via an extension
via the `full_spawn` fixture
use async fixtures for simpler event-loop integration
several of these fixtures were written before fixtures themselves could be async,
but now they can, which means we can use async/await instead of run_sync.
removes some workarounds needed for sqlalchemy 1.1 + 2.0 support
1.4 backports most 2.0 behavior, keeping it off-by-default for an easier opt-in transition
opt-in with `session.future = True` flag
- avoid backref warnings by adding objects to session explicitly before creating any relationships
- remove unnecessary `[]` around scalar query
- use `text()` wrapper on connection.execute
- engine.execute is removed
- update import of declarative_base
- ensure RemovedIn20Warning is available for warnings filters on sqlalchemy < 1.4 (needs editable install to avoid pytest path mismatch)
- explicitly relay password in engine.url to alembic
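Taken together, the 1.4 opt-in style looks roughly like this (a sketch of the pattern, not the actual diff):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, declarative_base  # new import location

Base = declarative_base()
# future=True opts in to 2.0-style engine/session behavior on 1.4
engine = create_engine("sqlite://", future=True)

with Session(engine, future=True) as session:
    # engine.execute() is removed in 2.0; execute text() via a session/connection
    result = session.execute(text("SELECT 1"))
    print(result.scalar())
```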
Removes all Referer checks, which have proven unreliable and have never been particularly strong
We can use XSRF on paths for more robust inter-path protections.
- `_xsrf` is added for forms via hidden input
- xsrf check is additionally applied to GET requests on API endpoints
- jupyter.chameleoncloud SSL is failing (I can reproduce with conda curl, but not /usr/bin/curl, so seems to be a CA issue)
- remove dead arnesund tag link (keep single article link)
- Add html language attribute
- Rename logo's alt text so it clearly states the image's purpose
- Fix missing first level heading for Login, Home and Token page
- Fix missing header level 1 of Login page
- Fix low contrast issue of navbar
Co-authored-by: Min RK <benjaminrk@gmail.com>
Added a table-as-dict function instead of several separate functions for working with the tokens table
replaced static values from locators.py with the locators themselves in test_browser
simplified the menu-bar case
allows us to wait for the javascript to finish loading,
since clicking buttons won't do anything if we click before the js has registered click handlers
- several http->https
- a few page moves
- miniconda->miniforge
- remove rochester from gallery, which doesn't appear to be publicly documented (may be accessible internally, but that's not for a public gallery)
added token checks against the db, added some explanations for the async sleeps (still WIP), renamed a few cases (spawn_pending, token), added the "cleanup_after" fixture to the browser() fixture
- use interval instead of period to match other interval config
- avoid redundant `metric` in both class and method/attr names
- interval is separate per metric group (only one for now)
- Stop the ambiguous 'monthly' wording; use the more precise '30d'
- Add a 7d active users metric
- Allows us to make this configurable in the future if needed
This extension helps us restructure our documentation without creating
dead links. It requires us to explicitly declare what should be
redirected where though.
It seems better to have it in place ahead of time than to be something
we ask a contributor to add just in time when it's needed.
- E### warnings were already ignored by E
- Sorting of import warnings won't matter as we have black and isort
- D400: First line should end with a period
This is not flake8 configuration, but related to `pydocstyle`, which
isn't used.
These are *extremely useful* for people advocating for
more resources for their JupyterHubs, but a little difficult
to calculate without a full scale log ingestion and analytics
pipeline (such as ELK or equivalent). However, these are easy
to calculate on the JupyterHub side at any given instance -
these are fairly quick SQL queries. Prometheus can capture and
store this as a timeseries, and provide valuable advocacy data
that is hard to get otherwise.
This turns the metrics on by default, but only updates them every
hour - which seems fine for metrics that don't change that often.
This should reduce performance impact. Admins can also turn this
off if needed.
I've had to implement this in many different ways in many different
contexts, and it's also important to be able to *trust* these. Getting
this from other grafana data can be tricky to validate - we had an
experimental one in our grafana dashboards at some point in the past
that we had to kill due to it being hard to validate
(https://github.com/jupyterhub/grafana-dashboards/pull/45). This metric
will provide an authoritative source of truth for this.
Ref https://github.com/2i2c-org/infrastructure/issues/1888
When I needed to set up Linux on my Windows machine, I found the easiest way to do so was through Windows Subsystem for Linux. Hence, I added the guide to the development setup docs.
Hi @minrk, I went through the reference files you sent and made the necessary adjustments. I think everything should be fine now. I also added a new correction. Please review and reply.
Worked on the documentation page (jupyterhub/docs/source/rbac/use-case.md)
Added a Wikipedia reference to the [RBAC framework] and also emphasized the solution under the *service to cull idle servers*
Adjusted the heading sudospawner -> Sudospawner to suit sentence case
Under Proxy Settings, I adjusted "a organization" to "an organization", and singleuser to single-user
Under Toree integration, the sentence was unclear, so I adjusted it to say the Toree kernel will raise an issue when running with JupyterHub.
removed time.sleep(), removed irrelevant comments, removed unused packages, added a few explanations for some functions (in progress), documentation in progress, added an Options class to the conftest.py package
move offset to redux state, rather than independent,
since it can come from two places (user_page and pagination footer). Keeps things in sync.
Adds reducers for setting offset, name filter explicitly.
- sort keys for consistent presentation
- use text list for roles, groups, which aren't well rendered by the table-formatter (number index isn't helpful)
- render timestamps
- leave empty name for default server, instead of '[MAIN]' which isn't terminology used anywhere else
- define our own init_ioloop
- call it ASAP
- put init_httpserver in IOLoop.run_sync because instantiating a server accesses the current loop
- run cleanup via asyncio.run
3.7 adds ~ to the 'unreserved' (always safe) set,
but it's not safe in domain names.
so do it ourselves. Formalize in a `_dns_quote` private function,
with notes about issues.
The only usernames that change in this PR are those containing `_` or `/`,
the latter of which would have failed.
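A minimal sketch of the idea (the real `_dns_quote` carries more notes and edge cases than this):

```python
from urllib.parse import quote

def _dns_quote(name):
    """URL-escape a username for use in a DNS label (sketch).

    quote() never escapes the 'unreserved' set (letters, digits, _.-~),
    but '~' and '_' are not safe in hostnames, so escape them ourselves.
    """
    quoted = quote(name, safe="")
    return quoted.replace("~", "%7E").replace("_", "%5F")

print(_dns_quote("user_name"))  # -> user%5Fname
print(_dns_quote("a/b"))        # -> a%2Fb  (previously this would have failed)
```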
pass ssl.Purpose explicitly, deprecate verify/check_hostname
3.10 disallows 'purpose=SERVER_AUTH' from creating server sockets.
Instead:
- pass purpose directly
- always verify
- no need to set check_hostname, already covered by purpose defaults
Switches HTTP requests to tornado's AsyncHTTPClient instead of requests
For backward-compatibility, use opt-in `sync=False` arg for all public methods that _may_ be async
When sync=True (the default), the async implementations are still used, but block via a ThreadPool + asyncio run_until_complete
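Roughly this pattern (a simplified sketch; the names are invented):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(1)

def _run_sync(coro):
    """Block on a coroutine from sync code, using a fresh loop in a worker thread."""
    def _in_thread():
        loop = asyncio.new_event_loop()
        try:
            return loop.run_until_complete(coro)
        finally:
            loop.close()
    return _pool.submit(_in_thread).result()

async def _fetch(url):  # stands in for the AsyncHTTPClient request
    await asyncio.sleep(0)
    return f"response from {url}"

def api_request(url, sync=True):
    """Public method: returns an awaitable with sync=False, blocks otherwise."""
    coro = _fetch(url)
    return coro if not sync else _run_sync(coro)

print(api_request("http://hub/api/user"))  # blocking call
```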
rather than roles, matching tokens
because oauth clients are mostly involved with issuing tokens,
they don't have roles themselves (their owners do).
This deprecates the `oauth_roles` config on Spawners and Services, in favor of `oauth_allowed_scopes`.
The ambiguously named `oauth_scopes` is renamed to `oauth_access_scopes`.
otherwise, index isn't used
note: this means changing the token prefix size requires revoking all tokens,
where before only _increasing_ the token prefix size required doing that.
The `admin_access` variable is still referenced elsewhere and I considered removing it, but I couldn't work out exactly how/if it's used so this is just a docs change for now.
allows oauth clients to issue scopes that only grant access to the issuing service
e.g. access:service!service or access:servers!server
especially useful with custom scopes
This was part of an attempt to get the url from self.server.bind_url that didn't end up getting used
shouldn't mutate db state when getting the environment
we expand/parse the same scopes _a lot_.
We can save time with some caching.
Main change: cached functions must return immutable frozenset instead of mutable set,
to avoid mutating the result of subsequent returns.
Some functions can only be cached _sometimes_ (e.g. group lookups in db cannot be cached),
for which we have a DoNotCache(result) exception
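A simplified sketch of the pattern (using an exception for DoNotCache, as described above):

```python
from functools import lru_cache, wraps

class DoNotCache(Exception):
    """Raise to return a result without caching it (e.g. db-dependent lookups)."""
    def __init__(self, result):
        self.result = result

def cacheable(func):
    cached = lru_cache(maxsize=1024)(func)

    @wraps(func)
    def wrapper(*args):
        try:
            # lru_cache does not cache calls that raise,
            # so DoNotCache results are recomputed every time
            return cached(*args)
        except DoNotCache as e:
            return e.result
    return wrapper

@cacheable
def expand_scope(scope):
    # return an *immutable* frozenset so callers can't mutate
    # the object shared with later cache hits
    return frozenset({scope, f"read:{scope}"})
```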
tokens have scopes
instead of roles, which allow tokens to change permissions over time
This is mostly a low-level change,
with little outward-facing effects.
- on upgrade, evaluate all token role assignments to their current scopes,
and store those scopes on the tokens
- assigning roles to tokens still works, but scopes are evaluated and validated immediately,
rather than lazily stored as roles
- no longer need to check for role permission changes on startup, because token permissions aren't affected
- move a few scope utilities from roles to scopes
- oauth allows specifying scopes, not just roles.
But these are still at the level specified in roles,
not fully-resolved scopes.
- more granular APIs for working with scopes and roles
Still to do later:
- expose scopes config for Spawner/service
- compute 'full' intersection of requested scopes, rather than on the 'raw' scope list in roles
removes need for our own implementation of the same behavior
but keep it around while we still support Python 3.6,
since the version (0.17) introducing asyncio_mode drops support for Python 3.6
- oauth clients can request a list of roles
- authorization will proceed with the _subset_ of those roles held by the user
- in the future, this subsetting will be refined to the scope level
defined with:

```python
c.JupyterHub.custom_scopes = {
    'custom:scope': {'description': "text shown on oauth confirm"}
}
```
Allows injecting custom scopes to roles,
allowing extension of granular permissions to service-defined custom scopes.
Custom scopes:
- MUST start with `custom:`
- MUST only contain ascii lowercase, numbers, colon, hyphen, asterisk, underscore
- MUST define a `description`
- MAY also define `subscopes` list(s), each of which must also be explicitly defined
HubAuth can be used to retrieve and check for custom scopes to authorize requests.
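For example, a fuller config sketch (the grades scopes are invented for illustration; the structure mirrors the rules above):

```python
c.JupyterHub.custom_scopes = {
    "custom:grades:read": {
        "description": "read-only access to grades",
    },
    "custom:grades:write": {
        "description": "modify grades",
        # subscopes must themselves be explicitly defined custom scopes
        "subscopes": ["custom:grades:read"],
    },
}

# custom scopes are granted by assigning them to roles as usual
c.JupyterHub.load_roles = [
    {
        "name": "grader",
        "scopes": ["custom:grades:write", "access:services!service=grades"],
        "groups": ["graders"],
    }
]
```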
When debugging errors and outages, looking at the logs emitted by
JupyterHub is very helpful. This document tries to explain some common
log messages, and what they mean.
I currently added just one log message, but we can add more
over time.
Ref https://github.com/2i2c-org/infrastructure/issues/1081
where this would've been useful troubleshooting
Avoids leaving stale state when re-using a spawner that failed the last time it started
we keep failed spawners around to track their errors,
but we don't want to re-use them when it comes time to start a new launch.
adds User.get_spawner(server_name, replace_failed=True) to always get a non-failed Spawner
restores token field useful for javascript-originating API requests,
removed in 1.5 / 2.0 for security reasons because it was the wrong token.
This places the _user's_ token in PageConfig,
so it should have the right permissions.
requires jupyterlab_server 2.9, has no effect on earlier versions.
Useful for backend services that want to use the user's token.
Added `in_cookie` bool argument to exclude cookies (previous behavior),
since notebook servers do some things differently when auth is in query param or header vs cookies
We don't do it correctly, so don't try by default
It does work _sometimes_, but most of the time that it appears to work, it's because it's a no-op.
Turning it off by default makes it more likely folks will see the caveat that it may not work.
When we run the proxy separately, the default `hub.bind_url` may differ from the proxy's public URL. The hub has no way to know which address the proxy is serving at unless we configure its `bind_url` explicitly.
- tests
- docs
- ensure all group APIs are rejected when auth is in control
- use 'groups' field in return value of authenticate/refresh_user, instead of defining new method
- log group changes in sync_groups
- Added hook function stub to authenticator base class
- Added new config option `manage_groups` to base `Authenticator` class
- Call the authenticator hook from the `refresh_auth` function in the `Base` handler class
- Added example
Some non-api spawn and redirect checks still had `self or admin`,
when they should have checked directly for the appropriate permissions
This removes the long-deprecated redirect from `/user/other` -> `/user/self` _if_ the other server is not running.
The result is a more consistent behavior whether the requested server is running or not,
and whether the user has _access_ to the running server or not.
we care about what the browser sees, so trust the outermost entry instead of the innermost
This is not secure _in general_, in that these values can be spoofed by malicious proxies,
but for CORS and cookie purposes, we only care about what the browser sees,
however many hops there may be.
A malicious proxy in the chain here isn't a concern because what matters is the immediate
hop from the _browser_, not the immediate hop from the _server_.
This patch is related to the implementation of the
MultiAuthenticator in jupyterhub/oauthenticator#459
The issue will be triggered when using more than one local provider
or mixing with oauth providers.
With multiple providers the template generates a set of buttons to
choose from to continue the login process.
For OAuth, the user will be sent to the provider login page and
the redirect at the end will continue nicely the process.
Now for the tricky part: using a local provider (e.g. PAM), the
user will be redirected to the "same page" thus the same template
will be rendered but this time to show the username/password dialog.
This will trip the workflow because of the action URL coming from
the settings and not from the authenticator. Therefore when the button
is clicked, the user will come back to the original multiple choice page
rather than continue the login.
- update models with 2.0.0
- different scopes for oauth, api
shows model depends on permissions
- update text with more details about scopes
- fix outdated reference to local-system credentials
I got confused with a variable called `rolename` that was actually an orm.Role
casting types in a signature is confusing,
but now `role` input can be Role or name,
and in the body it will always be a Role that exists
Behavior is unchanged
This isn't the only or even main thing likely to raise here,
so don't blame it, which is confusing, especially in a message shown to users.
Log the full exception, and show a more opaque message to the user to avoid confusion
Cover different combinations of:
- existing assignments in db
- additive allowed_users/admin_users config
- strict users membership assignment in load_roles
If the user role was defined but did not specify a user membership list,
users granted access by the Authenticator would lose their status
Instead, do nothing on an undefined user membership list,
leaving any users with their existing default role assignment
favor HubOAuth, as that should really be the default for most services
- Remove some outdated 'new in' text
- Remove docs for some deprecated features (hub_users, hub_groups)
- more detail on what's required
do not allow token-based access to pages
Tokens are only accepted via Authorization header, which doesn't make sense to pass to pages,
so disallow it explicitly to avoid surprises
- move spec to _static/rest-api.yml, since the original yaml must be served
- copy javascript rendering code from FastAPI (uses swagger-ui)
- remove link to pet store, since there isn't a big enough difference to duplicate it
- remove bootprint rendering with node
since it means 'inheriting' the owner's permissions
'all' prompted the question 'all of what, exactly?'
Additionally, fix some NameErrors that should have been KeyErrors
some things raise standard TimeoutError, others may raise tornado gen.TimeoutError (gen.with_timeout)
For consistency, add AnyTimeoutError tuple to allow catching any timeout, no matter what kind
Where we were raising `TimeoutError`,
we should have been raising `asyncio.TimeoutError`.
The base TimeoutError is an OSError for ETIMEDOUT, which is for system calls
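The resulting convention, sketched (this mirrors the tuple described above):

```python
import asyncio
from tornado import gen

# one tuple to catch any flavor of timeout
AnyTimeoutError = (gen.TimeoutError, asyncio.TimeoutError, TimeoutError)

async def wait_or_none(fut, timeout=1.0):
    try:
        return await asyncio.wait_for(fut, timeout=timeout)
    except AnyTimeoutError:
        return None
```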
404 is also used to identify that a particular resource
(like a kernel or terminal) is not present, maybe because
it is deleted. That comes from the notebook server, while
here we are responding from JupyterHub. Saying that the
user server they are trying to request the resource (kernel, etc)
from does not exist seems right.
Non-running user servers making requests is a fairly
common occurrence - user servers get culled while their
browser tabs are left open. So we now have a background level
of 503 responses on the hub *all* the time, making it
very difficult to detect *real* 503s, which should ideally
be closely monitored and alerted on.
I *think* 404 is a more appropriate response, as the resource
(API) being requested is no longer present.
swaps from default nbclassic and opt-in to lab, to now default to lab and opt-in to nbclassic
defaults to jupyterlab *if* lab 3.1 is available,
so should still work without configuration if lab is unavailable (or too old)
- remove mention of outdated nodejs-legacy
- mention nodesource for more recent node
- mention jupyterlab
- initial localhost request will be on http, not https
Of 18355 lines of logs in a 5-day-old hub instance,
8228 are just this message. That's 44% of the logs! We now
have prometheus metrics to monitor performance of this if
needed, and people can always turn on debug logging.
- circle CI no longer used
- ubuntu/debian nodejs may be too old (12.0+ required)
- remove mention of mailing list
- Python 3.6 required
- Emphasise JupyterLab over notebook
so it can be applied to all cookie-authenticated POST requests
also parse the content-type header to handle e.g. `Content-Type: application/json; charset`
The content-type of Hub API requests used for user management, specifically for creating a user,
is not validated, so 'text/plain' is accepted where it must be 'application/json'.
This commit adds validation for `Content-type` header for the /hub/api/users endpoint to only
allow requests with content-type as `application/json`
improves backward compatibility for clients that haven't implemented pagination
by requesting the max page size by default instead of the new default page size
use `Accept: application/jupyterhub-pagination+json` to opt-in to the new response format
With a paginated API, we need to return pagination info (next page arguments, whether a next page exists, etc.),
but a simple list response doesn't give a good way to do that.
We can follow precedents and use a dict with an `items` field for the actual items,
and a `_pagination` field for info about pagination, including offset, limit, url for the next request
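The resulting shape looks roughly like this (values are illustrative):

```python
{
    "items": [{"name": "user-0"}, {"name": "user-1"}],
    "_pagination": {
        "offset": 0,
        "limit": 50,
        "total": 123,
        "next": {
            "offset": 50,
            "limit": 50,
            "url": "https://hub.example.org/hub/api/users?offset=50&limit=50",
        },
    },
}
```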
and govern GET /users|groups|services endpoints with these
Greatly simplifies filtering and pagination,
because these filters can be expressed in db filters,
unlike the potentially complex `read:users`.
Now the query itself will never return a model that should be excluded.
While writing the tests, I added more cleanup between tests.
We now ensure cleanup of all users and groups after each test,
which required updating some group tests which relied on this state leaking
distutils is slated for deprecation in the stdlib
we can use packaging for version parsing and setuptools in setup.py
packaging is technically an extra dependency, but rarely missing because it's so widespread
- ensure create_certs is called for managed services
- wait for services with http, which checks ssl connections (without http, only tcp was checked, which doesn't verify it works!)
adding `read:` to everything isn't right because not everything has a `read:` counterpart and not every `read:` has a write counterpart
includes a test verifying that every scope has a definition
- access:services for services
- access:users:servers for servers
- tokens automatically have access to their issuing client (if their owner does, too)
- Check access scope in HubAuth integration
check_db_locks checks for db lock state after the end of a function,
but wasn't properly waiting when it wrapped an async function,
meaning it would run the check while the async function was still outstanding,
causing possible spurious failures
and fix warning condition for intersection overlap
- only warn when there's a group only on one side and a user or server only on the other,
otherwise there is no lost information to warn about (group and/or user defined on both sides)
- correctly resolve servers as sub-scopes of user
- update references to default branch name in docs, workflows
- use HEAD in github urls, which always works regardless of default branch name
- fix petstore URLs since the old petstore links seem to have stopped working
should never occur in real applications where only one loop is run,
but may occur in tests if the Proxy object lives longer than the loop in which it runs
I suspect this is the source of our intermittent test failures with
> got Future <Future pending> attached to a different loop
- remove long-deprecated `POST /api/authorizations/token` for creating tokens
- deprecate but do not remove `GET /api/authorizations/token/:token` in favor of GET /api/user
- remove shared-cookie auth for services from HubAuth, rely on OAuth for browser-auth instead
- use `/hub/api/user` to resolve user instead of `/authorizations/token` which is now deprecated
instead of on the test class
and slightly fix the logic for when it is called:
- call on *all* Spawners, not just the default
- call on named server deletion when remove=True
and clarify warning when a base handler isn't patched
- reorganize patch steps into functions for easier re-use
- patch notebook and jupyter_server handlers if they are already imported
- run patch after initialize to ensure extensions have done their importing before we check
- Attach role limit to OAuthClient
- Attach authorized roles to OAuthCode
- pass roles from code to API token on completion
standard 'scopes' in oauth process are matched against our 'roles' instead of our low-level scopes
These only affected servers upgrading directly from 0.8 or earlier with still-running servers
0.8 was a long time ago, it's okay to require restarting servers for an upgrade that long
- 3-255 characters
- ascii lowercase, numbers, -
- must start with letter
- must not end with -
this lets us avoid url escaping issues in e.g. oauth params
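These rules can be captured in a single regex, e.g. (a sketch, not the exact pattern used):

```python
import re

# 3-255 chars, lowercase ascii/digits/hyphen,
# starts with a letter, doesn't end with '-'
_name_pattern = re.compile(r"^[a-z][a-z0-9-]{1,253}[a-z0-9]$")

assert _name_pattern.match("my-hub-2")
assert not _name_pattern.match("2hub")   # must start with a letter
assert not _name_pattern.match("hub-")   # must not end with '-'
```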
When the hub is running in API-only mode, it's
very useful to have the proxy know where to send
URLs that would normally be serviced by the hub.
For example, / might go to a service that renders
a home page, while `/user` might go to a service that
tells the user their server is dead.
Right now, this happens 'out of band', with a process
that has to talk to the proxy directly. This is a
bit messy - the routes need to be re-added when the
proxy restarts, the hub might try to remove them, etc.
By adding support for this in the hub itself, all
this complexity is now removed and the hub continues
to own all the routes in the proxy
This allows for more flexible customization of the login page,
since it allows to re-use the login form in an extending template
by reusing the new block.
This was not cleanly possible before since the main container
was part of the very same block as the form code.
fixes #3414
- merge oauth token fields into APITokens
- create oauth client 'jupyterhub' which owns current API tokens
- db upgrade is currently to drop both token tables, and force recreation on next start
so changing cookie age changes oauth token expiry,
since these are what are stored in those cookies anyway,
it makes sense for them to expire at the same time
When an oauth client changes, we delete all the tokens
associated with that client. This invalidates all user sessions
for that oauth client, and the oauth client's users will need to
go through the OAuth workflow again after the cache period (specified
by cache_max_age in HubAuth, 5min by default). This is fine in theory,
since oauth client information doesn't change frequently.
However, we were deleting and re-adding all oauth clients each time
the hub started! This was unnecessary, since the data was going to
be the same 99% of the time. Rest of the time, we should just update,
preventing unnecessary churn.
This PR does that.
Ref https://github.com/yuvipanda/jupyterhub-configurator/issues/2
Ref https://github.com/berkeley-dsep-infra/datahub/issues/2284
get_current_user returns a User model instead of a dict.
using cookies for Hub auth is deprecated, so removed
that option and refactored get_current_user
makes testing a PR even easier since we build an sdist and wheel for every PR and push
since artifacts are double-archived, it's not quite as simple as giving a URL to install from,
but this at least makes it available. To use:
- download and unpack zip
- `pip install path/to/whl`
apply patch directly to BaseHandler instead of each handler instance
so that overrides can still take effect (i.e. APIHandler raising 403 instead of redirecting)
Merge request #3257 fixed #3256 only on getting-started/services-basics.md
There is still a reference to jupyterhub example cull-idle in reference/services.md
setting per_page in constructor resolves before max_per_page limit is updated from config,
preventing max_per_page from being increased beyond the default limit
we already loaded these values anyway in the first instance,
so remove the redundant Pagination object
Running the curl command as-is returns a 500 with ```json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)```. Converted the payload to proper JSON.
The feature is disabled by default.
If enabled (by setting `login_term_url`), the user will have to check the
checkbox to accept the terms and conditions in order to log in.
jinja2's async support requires Python 3.6+. That should
be an implementation detail - so we render it in the main
thread (current behavior) but pretend we did not
write_error is a synchronous method called by an async
method from inside the event loop. This means we can't just
schedule an async render_templates in the same loop and wait
for it - that would deadlock.
jinja2 compiles your code differently based on whether you
enable async support or not. Templates compiled with async
support can't be used in cases like ours, where we already
have an event loop running and calling a sync function. So
we maintain two almost identical jinja2 environments
testing other branches is useful, and there's little cost to removing the conditions:
- we don't run PRs from our repo, so test runs aren't duplicated on the repo
- testing on a fork without opening a PR is still useful (I use this often)
- if we push a branch, it should probably be tested (e.g. backport branch), and filters make this extra work
- the cost of running a few extra tests is low, especially given actions' current quotas and parallelism
The jupyterhub/tests/test_spawner.py::test_spawner_routing[has~x] test
failed in py37+ but not in py36, and I think it is due to a change in
the foundational socket library of Python.
This is a stacktrace from Python/3.7.9/x64/lib/python3.7/site-packages/urllib3/util/connection.py:61
```
> for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
E socket.gaierror: [Errno -2] Name or service not known
```
Here is relevant documentation about socket.getaddrinfo.
https://docs.python.org/3.7/library/socket.html#socket.getaddrinfo
- Fixes typo (eolving -> evolving)
- re-use the word "current" instead of "momentary" for comprehensibility
- reference JupyterHub's current state with "its" instead of "the" for comprehensibility
Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
allows selecting users based on the 'ready', 'active', or 'inactive' states of their servers
- ready: users who have any servers in the 'ready' state
- active: users who have any servers in the 'active' state (i.e. ready OR pending)
- inactive: users who have *no* servers in the 'active' state (inactive + active = all users, no overlap)
Does not change the user model, so a user with *any* ready servers will still return all their servers
Fixes issues with OAuth flows when internal_ssl is enabled.
When internal_ssl was enabled requests to non-internal endpoints failed
because the system CAs were not being loaded.
This caused failures with public OAuth providers with public CAs since
they would fail to validate.
We are users of the napoleon sphinx extension, which helps us parse our
Google Style Python Docstrings, and its syntax suggests we should use
indentation when we use more than one line for an entry in an
Arguments: or Returns: list.
For more details, see: https://github.com/jupyterhub/jupyterhub/pull/3151#issuecomment-676186565
- Explicitly mention min-8-char constraint
- Connect the api_token in the configuration with the one mentioned in auth requests
Co-authored-by: Mike Situ <msitu@ceresimaging.net>
for easier reuse with jupyter_server
mixins have a lot of assumptions about the NotebookApp structure.
Need to make sure these are met by jupyter_server (that's what tests are for!)
When using the `KubeSpawner` it is typical to disable the
`slow_spawn_timeout` by setting it to 0. `zero-to-jupyterhub-k8s`
does this by default [1]. However, this causes an immediate `TimeoutError`
which gets logged as a warning like this:
>User hub-stress-test-123 is slow to start (timeout=0)
This avoids the warning by checking the value and, if disabled,
simply returning without logging the warning.
[1] https://github.com/jupyterhub/zero-to-jupyterhub-k8s/commit/b4738edc5
Closes #3126
- Related issue: #3120. Closes: #3120.
- I realized that spawner.clear_state() is called before
spawner.post_stop_hook(). This was a bit surprising to me,
and caused some issues.
- I tried the naive strategy of moving clear_state to later and
setting the orm_state to `{}` at the point where it used to be
clear.
- This tries to maintain the exception behavior of clear_state and
post_stop_hook, but isn't exactly identical.
- To review:
- I'm not sure this is a good idea!
- Carefully consider the implications of this. I am not at all sure
about unintended side-effects or what intended semantics are.
This was added in PR #2721 and by default results in just printing
out "10" without any context when starting the hub service. This
simply removes the orphan print statement.
I'm open to changing this to a debug log statement with context if
someone finds that useful, e.g.:
`self.log.debug('Effective init_spawners_timeout: %s', init_spawners_timeout)`
Bug #2852 describes an issue where templates cannot be found by
JupyterHub when using the Docker images built out of this repo. The
issue turned out to be due to missing node_modules at the time of build.
There is a hook in the `package.json` that causes node_modules to be
copied to the static/components directory post-install. If this is not
run, those components are not in the static directory and thus are not
included in the wheel when it is built.
Fix #2905 fixed one problem: the `bower-lite` hook script wasn't copied
to the Docker image, and so the hook couldn't run, but the other issue
is that the client dependencies are never explicitly built. They must be
built prior to the wheel build, and the hook script must have run so
they are copied to the ./static folder, which is included in the wheel
build thanks to [MANIFEST.in][1]
.. note::
This removes the verbose flag from the wheel build command. The
reason is that it generates a lot of writes to stdout. It seems that
wheel sometimes (or always) switches stdout to non-blocking mode, which can cause
EAGAIN to be raised, which leads to fun errors like:
BlockingIOError(.., 'write could not complete without blocking', ..)
The wheels fail to build if this error is raised. Removing the verbosity
flag is a quick solution (it drastically reduces writes to STDOUT), but
comes at the cost of more trouble debugging a failed wheel build. Adding
the "-v" back in the Dockerfile when debugging a build failure is still
possible. [Credit: @vbraun][2]
.. note::
This commit also removes some extraneous COPY operations during the
Docker build, in particular the /src/jupyterhub/share directory is
not used unless users have explicitly overridden their
jupyterhub_config.py to include it somehow. If the default
data_files_path behavior is used, JupyterHub should find the proper
static directory when the application loads.
Fixes: #2852
[1]: https://packaging.python.org/guides/using-manifest-in/
[2]:
https://github.com/travis-ci/travis-ci/issues/4704#issuecomment-348435959
- base Expiring class
- ensures expiring values (OAuthCode, OAuthAccessToken, APIToken) are not returned from `find`
- all expire appropriately via purge_expired
behaves more like one would expect (same as try get-key, except: return default)
without relying on cache presence or underlying key type (integer only)
This runs some of the tests with the latest traitlets.
We are looking into making a 5.0 release and would like to have some
confidence that it does not break too many things.
They are less relevant than other requests and could very well end up
cluttering the logs. It is not uncommon for these requests to be made
every second or every other second.
In case there are multiple singleuser notebooks at different
versions we want to log each of those mismatches as a warning
so this changes the global _version_mismatch_warning_logged flag
from a bool to a dict keyed by the hub/singleuser version mismatch
combination. A test wrinkle is added for that scenario.
Part of #2970
As a new contributor to jupyterhub it took a while to get
up and running locally mainly because I didn't have sqlite
installed but also because I was flipping between README,
CONTRIBUTING and the actual contributing docs which are all
a little bit different.
This does a few things:
- Updates the contributor sphinx docs to mention that how
one chooses to isolate their development environment is
up to them with a link to the detailed forum thread on
that topic.
- Updates the contributor sphinx docs to mention sqlite and
database setup in general. While in here some trailing
whitespaces are cleaned up.
- Leave a comment in CONTRIBUTING.md about the redundant
information in the docs on getting a development environment
setup. Long-term we should really get those merged so there
is a single authoritative document on how to get a dev env
setup for contributing to jupyterhub.
- Link to the jupyterhub gitter channel for asking questions.
If your jupyterhub and jupyterhub-singleuser instances
are running at different minor or greater versions a
warning gets logged per active server which can be a lot
when you have hundreds of active servers.
This adds a flag to that version mismatch logging logic
such that the warning is only logged once per restart
of the hub server.
Closes issue #2970
APIHandler.server_model unconditionally returns the Spawner's
user_options dict but it wasn't mentioned in the API reference
so it's added here. The description is taken from the docstring
on Spawner.user_options.
Closes issue #2965
Authorization header has the form "<type> <credentials>"
rather than checking for "token" only, preserve type value, which could be Bearer, Basic, etc.
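i.e. roughly (a sketch):

```python
def parse_authorization(header):
    """Split 'Authorization: <type> <credentials>', preserving the type."""
    auth_type, _, credentials = header.partition(" ")
    return auth_type, credentials

print(parse_authorization("Bearer abc123"))  # ('Bearer', 'abc123')
print(parse_authorization("token abc123"))   # ('token', 'abc123')
```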
query on Server objects instead of User objects
avoids lots of ORM work on startup since there are typically a small number of running servers
relative to the total number of users
this also means that the users dict is not fully populated. Is that okay? I hope so.
Not exactly all though as some will be ignored by the .dockerignore
file. This change ensures we don't get future issues, like the ones
we've had recently, caused by a failure to update what needs to be
copied to the build stage.
This fixes #2852 by adding a script to package.json. But is this
enough? Should we perhaps look in MANIFEST.in and copy some more files
listed there?
This is all thanks to people coming together and helping out figuring
out the issue in https://github.com/jupyterhub/jupyterhub/issues/2852.
Thank you @shingo78 for spotting that we missed bower-lite and its role
and all others who reported and helped debug this!
Updated capitalisation of names. Addressed revisions.
Fleshed out the prerequisites and explanation of access control.
Added part of configuration section to set JupyterLab as the default interface.
corrected need for sudo
Added warning to reverse-proxy section to recommend use of HTTPS and firewall.
- A trivial bug caused by my last change to #2397 - made possible by
the fact we didn't have a way to reliably test PAM stuff.
- Thanks to @narnish for noticing.
- Closes: #2875
- We now default to ubuntu bionic (18.04) and try once with ubuntu xenial
(16.04).
- We now always test Python 3.8 but allow it to fail, as compared to not
allowing it to fail and only testing it on tagged commits. This is a
bugfix I'd say.
- We now no longer test Python 3.5 and Python 3.6 dedicatedly without
any custom configuration like usage of subdomain, which allows us to
reduce the number of build jobs in a compromise that I think makes
great sense.
Some notes:
- Added a conda-forge and DockerHub badge
- Added logos and made us conform with the team-compass badges section
as can be found here:
https://jupyterhub-team-compass.readthedocs.io/en/latest/building-blocks/readme-badges.html
- Concluded that our CircleCI badge is good because it lets us overview
the repo's build systems, but bad because it is only about
documentation previews in PRs, which isn't that useful in a README's
header.
- Noted there was a CircleCI token in the badge, that I believe is meant
to be used with private repo access rather than public repo access. I'm
not sure we need that but I made it a markdown/html comment for now.
- Decided to not manually add a line break between badges. I figured it
could make sense to break manually before the social badges instead of
automatically letting it wrap at some point, but we don't really know
the size of the window viewing so it felt like a bad idea to hardcode
that.
- When the Dockerfile was turned into a multi-stage build, it seems
the share/ directory was not copied to the final image. This
resulted in certain components (static/components/, static/css/)
being missing, which resulted in the JupyterHub share directory not
being findable (in jupyterhub/_data.py). This led to all kinds of
weird havoc, like templates not being findable (#2852).
- I am still unsure if this is the right fix, please check this well.
- Closes: #2852
- While debugging another problem, I noticed some failures to build
the C extensions in the logs. Adding build-essential should fix
that (also as mentioned in the logs themselves).
- Extensions failed for tornado, sqlalchemy, and pyrsistent(pvectorc)
and can be found by searching the previous output for "fail".
Closes #2819 by exiting JupyterHub directly with an error if a config
file has been specified for the config_file traitlet, for example
through the -f or --config flag, but isn't available on the file
system.
- In the cull script, the max_age and inactive_limit are used from the
outer scope. In the case that you add extra logic, one may want to
modify these values.
- In that case, you either have to rename them locally, or access the
outer scope with "nonlocal", the first of which is too much work,
the second of which has a high chance of introducing bugs (as it did
for me).
- This change introduces a fix for everyone. It doesn't change basic
functionality, but makes local modifications simpler.
- Pass in user object & request object only explicitly.
Much better interface that is harder to break by internal
refactoring. We can always add more parameters if needed?
/user-redirect/ is used to help link to a particular url
in the logged in user's authenticated notebook. For example,
if I'm logged in as user 'yuvipanda' and hit the URL
/hub/user-redirect/git-pull, it'll redirect me to
/user/yuvipanda/git-pull. This is extremely useful in
connecting hub links to notebook server extensions, such
as nbgitpuller.
Admins might want to customize how this redirection is done -
for example, redirect users to different running servers
based on the nbgitpuller repository they are linking from.
Adding a hook here helps accomplish that.
allows services to be explicitly blessed to skip the extra oauth confirmation page
added in 1.0
This confirmation page is unhelpful for many admin-managed services,
and is mainly intended for cross-user access.
The default behavior is unchanged, but services can now opt-out of confirmation
(as is done already for the user's own servers).
Use with caution, as this eliminates users' ability to confirm that a service
should be able to authenticate them.
- API requests to non-running servers are not uncommon when you cull
servers and people leave tabs open and active. It returns with 503
and logs all headers, which can take up half of our total log lines
- This avoids logging headers for all 502 and 503 return statuses.
#2747 presented an alternative (more complex) implementation, but this
turned out to be appropriate.
- Closes: #2747
In current versions of MySQL and MariaDB `innodb_file_format`
and `innodb_large_prefix` have been removed. This allows them to not
exist and makes sure the format for the rows are `Dynamic` (default
for current versions).
If init_spawners takes too long (default: 10 seconds) to complete,
app start will be allowed to continue while finishing in the background.
Adds new `check` pending state for the initial check.
Checking lots of spawners can take a long time,
so allowing this to be async limits the impact on startup time
at the expense of starting the Hub in a not-quite-fully-ready state.
- Introduce the EventLog class from BinderHub for emitting
structured event data
- Instrument server starts and stops to emit events
- Defaults to not saving any events anywhere
The flask example in the documentation was still using the
input argument `cookie_cache_max_age` when instantiating
the `HubAuth` object. `cookie_cache_max_age` is deprecated since
JupyterHub 0.8 and should be replaced by `cache_max_age`.
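The corrected instantiation looks something like this (a sketch of the usual flask-service setup):

```python
import os
from jupyterhub.services.auth import HubAuth

auth = HubAuth(
    api_token=os.environ["JUPYTERHUB_API_TOKEN"],
    cache_max_age=60,  # was: cookie_cache_max_age=60
)
```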
- Install pip in the docs conda env (or conda complains).
- Do not override page.html, the next/previous buttons are now handled by
alabaster_jupyterhub (this actually removes the duplicated next/prev
buttons)
- use alabaster_jupyterhub when building locally; this makes it easy for
new contributors to get the _exact_ same appearance as on
readthedocs.
- cull_idle_servers.py gets the full server state, so is capable of
doing any kind of arbitrary logic on the profile in order to be more
flexible in culling.
- This patch does not change anything, but gives an embedded
(commented out) example of how you can easily add custom logic to
the script.
- This was added as a template/demo for #2598.
* Add missing responses (doesn't include all possible responses yet)
* Refactor invalid multi in body parameters into a single parameter
* Change form type into valid formData
* Fix use of required fields
* Apply a few other minor fixes
Fixes https://github.com/jupyterhub/jupyterhub/issues/2566 to some
degree by making the announcement stand out using twitter-bootstrap
classes `alert` and `alert-warning`. Perhaps we could theme twitter
bootstrap or this alert specifically with jupyter related colors as well
though?
Windows doesn't have support for signal handling so it can't use the
signal handling capabilities of asyncio. Use the previous atexit
strategy in the Windows case instead.
Signed-off-by: Alejandro Del Castillo <alejandro.delcastillo@ni.com>
Big thanks to Erik, Tim, and Min for the great comments!
Change names to be more clear, add function doc comments,
change scoping on some functions, add handle_logout to let
people take custom logout actions, extract
render_logout_page from get method, add TODO.
AS A developer of a Logout handler
I WANT to be able to call a function to kill spawners and
do other backend logout stuff and a separate function to
forward the user along the logout chain.
I believe this PR adds (moderately private) methods to the
Logout Handler to do just that.
update several links (html targets don't work anymore)
had to add rest-api redirect so link would resolve,
since there isn't a ref for files in _static
- /user/:name no longer triggers implicit spawn at any point
- add /spawn-pending/:user/:server handler for pending page. This page has no side effects.
- spawn links point to /spawn/:user/:server to finish hooking up links for named servers and options_form handling
- It took me a bit longer than I would have liked for me to figure out
how to run the proxy separate from the hub. When I had to do this a
second time for a different hub, it also took me too long.
- This adds a page dedicated to running the proxy separate from the
hub, since it is relatively easy and has a high usability
improvement.
- Currently work in progress.
TEXT is wrong on Oracle, LargeBinary is wrong everywhere else.
Text seems to be the high-level type that maps to the right thing both places.
This results in no change on supported implementations, as Text == TEXT there.
- We don't need the extra normalization of that function.
- Also add in username_map support here. It probably isn't needed
most of the time with PAM, but it keeps things consistent and is
easier than documenting an exception.
Traitlets require quotes around literals, to avoid interpreting them
as datatypes other than string. However, quotes are problematic on the
notebook_dir case. On Windows, Popen will mis-interpret the quotes and
escape them, which trips the process spawn. To avoid problems, only
quote if necessary.
Signed-off-by: Alejandro del Castillo <alejandro.delcastillo@ni.com>
adds Authenticator.auth_refresh_age and Authenticator.refresh_pre_spawn config
- auth_refresh_age allows auth to expire (default: 5 minutes) before calling Authenticator.refresh_user.
- refresh_pre_spawn forces refresh prior to spawn (in case of auth tokens, etc.)
this introduces a race between the early RuntimeError being tested
and the no_patience causing handlers to return early if async start isn’t complete.
With tornado coroutines, an early RuntimeError could be guaranteed to resolve promptly, but asyncio isn’t as consistent,
possibly causing some of the recent flaky tests.
Windows doesn't have a pwd module. To avoid an import error on Windows,
move import statement inside functions that use pwd.
Signed-off-by: Alejandro del Castillo <alejandro.delcastillo@ni.com>
The current request handler might be needed to determine if the auth
data needs to be refreshed.
Signed-off-by: Alejandro del Castillo <alejandro.delcastillo@ni.com>
Use setuptools console_scripts functionality to create top level jupyterhub
& jupyterhub-singleuser entry point scripts on *nix, and executables on
Windows.
Signed-off-by: Alejandro del Castillo <alejandro.delcastillo@ni.com>
when a token doesn't identify a user, the response is None.
These results are cached, but the cache lookup treated a stored `None` as a miss,
causing failed-auth responses to effectively not be cached.
Hoist admin status determination from authentication to a secondary function called by get_authenticated_user
Create mock objects for struct_group and struct_passwd, migrate existing mock group objects to it
Remove old admin mock stuff for authenticate
- trust subdomain_host by default
- JupyterHub.trusted_alt_names is inherited by Spawners by default. Do we need Spawner.ssl_alt_names to be separately configurable?
One of the example was using quotes instead of backticks.
Backticks are the "older" way of doing things, which has a number of
disadvantages:
http://mywiki.wooledge.org/BashFAQ/082
Here I'm more worried about readability: depending on the font and "smart"
editor helpers on the web, many people may confuse ` with ', and it could
end up modifying formatting on markdown-powered websites... etc...
jupyterhub.authenticators for authenticators, jupyterhub.spawners for spawners
This has the effect that authenticators and spawners can be selected by name instead of full import string (e.g. 'github' or 'dummy' or 'kubernetes')
and, perhaps more importantly, the autogenerated configuration file will include a section for each installed and registered class.
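For example, a package could register its classes like this (a setup.py sketch with invented names):

```python
from setuptools import setup

setup(
    name="my-jupyterhub-plugins",
    # ...
    entry_points={
        "jupyterhub.authenticators": [
            "my-auth = my_plugins:MyAuthenticator",
        ],
        "jupyterhub.spawners": [
            "my-spawner = my_plugins:MySpawner",
        ],
    },
)
```

After that, config like `c.JupyterHub.spawner_class = "my-spawner"` should resolve by name.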
- Expands the previous documentation on upgrading JupyterHub
to include more information.
- Remove specific documentation on 0.7 -> 0.8 upgrade, since
this seems to be a straight copy of the markdown version of
upgrading docs. The important thing about the 0.7 -> 0.8 upgrade
(requiring versions of JupyterHub to match) is now in the
main document.
- Move from markdown to rst
Info on upgrading is important & relevant. This consolidates
the index to be a bit better. Next step is to consolidate the
documentation into one page.
Removes the 'tutorials' index page as well, since that only
had a reference to z2jh (which is now referenced from the
'distribution' section). The distribution section has
better visibility too
Currently, the sections in index.rst are using ** for bold,
rather than true section headers. This prevents them from being
linkable. Since we'd like to link to the 'contributing' section
from CONTRIBUTING.md, we change this by moving everything to
section headers. We also move to the toctree directive, since
it keeps the bullets aligned properly (they were hanging if
we used simple * markers)
This also replaces CONTRIBUTING.md content with a link to
the docs.
- Move from CONTRIBUTING.md to a subdirectory in docs, so
we can expand and add more documentation.
- Move from markdown to reStructuredText
- Add a direct blurb in the JupyterHub docs index page on
how to contribute.
- More prominent link to the Code of Conduct
- Add section on getting in touch with the JupyterHub community
define some pending/ready helpers as static constants on orm.Spawner
allows treating orm.Spawner the same as Spawner wrappers,
as long as `.active` etc. checks are performed first
and generate no events if not pending
Reason: race condition is unavoidable between first pending check and check inside _generate_progress.
In this event, return immediately.
The current list in the docs is out of date. The list
in the wiki is more up-to-date, and easier for folks
to change over time. In the long run, we should decide
where lists like this belong.
- delete oauth clients for servers when they shutdown
- avoid deleting oauth clients for servers still running across an 0.8 -> 0.9 upgrade, when the oauth client ids changed from `user-NAME` to `jupyterhub-user-NAME`
- refresh_user may return True in the common case, identifying that everything is up-to-date
- return False for "needs login"
- return auth_data dict when an update can be performed without logging in again
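A sketch of an Authenticator honoring this contract (the upstream call is hypothetical):

```python
from jupyterhub.auth import Authenticator

class MyAuthenticator(Authenticator):
    async def _renew_upstream_token(self, username):
        # hypothetical call to an external identity provider
        return "unchanged"

    async def refresh_user(self, user, handler=None):
        token = await self._renew_upstream_token(user.name)
        if token is None:
            return False  # refresh failed: user must log in again
        if token == "unchanged":
            return True  # auth is still up-to-date
        # updated auth model, applied without requiring a new login
        return {"name": user.name, "auth_state": {"access_token": token}}
```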
- `.get_current_user` is called in the `prepare` stage for all handlers
- use `.current_user` to access current user in methods
- adds Authenticator.refresh_user for refreshing user auth (unused at this point)
With changes to CHP requiring a second, different
authority, the complexity of managing trust within
JupyterHub has risen. To solve this, Certipy now
has a feature to specify what components should
trust what and builds trust bundles accordingly.
Mainly small fixes, but the token page could be completely broken
This release will include the spawner.handler addition,
but not the oauthlib change currently in master
To better accommodate external certificate management
as well as building of trust, Certipy was refactored.
This included general improvements to file and
record handling. In the process, some of Certipy's
APIs changed slightly, but should be more stable now
going forward.
Setup general ssl request, not just to api
Basic tests comprised of non-ssl test copies
Create the context only when request is http
Refactor ssl key, cert, ca names
Configure the AsyncHTTPClient at app start
Change tests to import existing ones with ssl on
Override __new__ in MockHub to turn on SSL
Add Localhost to trusted alt names
Update to match refactored certipy names
Add the FQDN to cert alt names for hub
Ensure notebooks do not trust each other
Drop certs in user's home directory
Refactor cert creation and movement
Make alt names configurable
Make attaching alt names more generic
Setup ssl_context for the singleuser hub check
If you are reporting an issue with JupyterHub, please use the [GitHub issue](https://github.com/jupyterhub/jupyterhub/issues) search feature to check if your issue has been asked already. If it has, please add your comments to the existing issue.
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
**Additional context**
Add any other context about the problem here.
- Running `jupyter troubleshoot` from the command line, if possible, and posting
its output would also be helpful.
- Running in `--debug` mode can also be helpful for troubleshooting.
I closed this issue because it was labelled as a support question.
Please help us organize discussion by posting this on the http://discourse.jupyter.org/ forum.
Our goal is to sustain a positive experience for both users and developers. We use GitHub issues for specific discussions related to changing a repository's content, and let the forum be where we can more generally help and inspire each other.
Thank you for being an active member of our community! :heart:
Welcome! As a [Jupyter](https://jupyter.org) project, we follow the [Jupyter contributor guide](https://jupyter.readthedocs.io/en/latest/contributor/content-contributor.html).
Make sure to also follow [Project Jupyter's Code of Conduct](https://github.com/jupyter/governance/blob/HEAD/conduct/code_of_conduct.md)
for a friendly and welcoming collaborative environment.
## Set up your development system

Please see our documentation on:

- [Setting up a development install](https://jupyterhub.readthedocs.io/en/latest/contributing/setup.html)
- [Testing JupyterHub and linting code](https://jupyterhub.readthedocs.io/en/latest/contributing/tests.html)

For a development install, clone the [repository](https://github.com/jupyterhub/jupyterhub)
and then install from source:

```bash
python3 -m pip install --editable .
```

If you need some help, feel free to ask on [Gitter](https://gitter.im/jupyterhub/jupyterhub) or [Discourse](https://discourse.jupyter.org/).
If you believe you’ve found a security vulnerability in a Jupyter
project, please report it to security@ipython.org. If you prefer to
encrypt your security reports, you can use [this PGP public key](https://jupyter-notebook.readthedocs.io/en/stable/_downloads/1d303a645f2505a8fd283826fafc9908/ipython_security.asc).
    description: Used by single-user notebook servers to hand off cookie authentication to the Hub
    parameters:
      - name: cookie_name
        in: path
        required: true
        type: string
      - name: cookie_value
        in: path
        required: true
        type: string
    responses:
      '200':
        description: The user identified by the cookie
        schema:
          $ref: '#/definitions/User'
      '404':
        description: The user was not found.
/oauth2/authorize:
  get:
    summary: 'OAuth 2.0 authorize endpoint'
    description: |
      Redirect users to this URL to begin the OAuth process.
      It is not an API endpoint.
    parameters:
      - name: client_id
        description: The client id
        in: query
        required: true
        type: string
      - name: response_type
        description: The response type (always 'code')
        in: query
        required: true
        type: string
      - name: state
        description: A state string
        in: query
        required: false
        type: string
      - name: redirect_uri
        description: The redirect url
        in: query
        required: true
        type: string
/oauth2/token:
  post:
    summary: Request an OAuth2 token
    description: |
      Request an OAuth2 token from an authorization code.
      This request completes the OAuth process.
    consumes:
      - application/x-www-form-urlencoded
    parameters:
      - name: client_id
        description: The client id
        in: form
        required: true
        type: string
      - name: client_secret
        description: The client secret
        in: form
        required: true
        type: string
      - name: grant_type
        description: The grant type (always 'authorization_code')
        in: form
        required: true
        type: string
      - name: code
        description: The code provided by the authorization redirect
        in: form
        required: true
        type: string
      - name: redirect_uri
        description: The redirect url
        in: form
        required: true
        type: string
    responses:
      '200':
        description: JSON response including the token
        schema:
          type: object
          properties:
            access_token:
              type: string
              description: The new API token for the user
            token_type:
              type: string
              description: Will always be 'Bearer'
/shutdown:
  post:
    summary: Shut down the Hub
    parameters:
      - name: proxy
        in: body
        type: boolean
        description: Whether the proxy should be shut down as well (default from Hub config)
      - name: servers
        in: body
        type: boolean
        description: Whether users' notebook servers should be shut down as well (default from Hub config)
definitions:
  User:
    type: object
    properties:
      name:
        type: string
        description: The user's name
      admin:
        type: boolean
        description: Whether the user is an admin
      groups:
        type: array
        description: The names of groups where this user is a member
        items:
          type: string
      server:
        type: string
        description: The user's notebook server's base URL, if running; null if not.
      pending:
        type: string
        enum: ["spawn", "stop", null]
        description: The currently pending action, if any
      last_activity:
        type: string
        format: date-time
        description: Timestamp of last-seen activity from the user
      servers:
        type: object
        description: The active servers for this user.
        items:
          schema:
            $ref: '#/definitions/Server'
  Server:
    type: object
    properties:
      name:
        type: string
        description: The server's name. The user's default server has an empty name ('')
      ready:
        type: boolean
        description: |
          Whether the server is ready for traffic.
          Will always be false when any transition is pending.
      pending:
        type: string
        enum: ["spawn", "stop", null]
        description: |
          The currently pending action, if any.
          A server is not ready if an action is pending.
      url:
        type: string
        description: |
          The URL where the server can be accessed
          (typically /user/:name/:server.name/).
      progress_url:
        type: string
        description: |
          The URL for an event-stream to retrieve events during a spawn.
      started:
        type: string
        format: date-time
        description: UTC timestamp when the server was last started.
      last_activity:
        type: string
        format: date-time
        description: UTC timestamp of last-seen activity on this server.
      state:
        type: object
        description: Arbitrary internal state from this server's spawner. Only available on the Hub's users list or get-user-by-name method, and only if the requester is a Hub admin. None otherwise.
  Group:
    type: object
    properties:
      name:
        type: string
        description: The group's name
      users:
        type: array
        description: The names of users who are members of this group
        items:
          type: string
  Service:
    type: object
    properties:
      name:
        type: string
        description: The service's name
      admin:
        type: boolean
        description: Whether the service is an admin
      url:
        type: string
        description: The internal url where the service is running
      prefix:
        type: string
        description: The proxied URL prefix to the service's url
      pid:
        type: number
        description: The PID of the service process (if managed)
      command:
        type: array
        description: The command used to start the service (if managed)
        items:
          type: string
      info:
        type: object
        description: |
          Additional information a deployment can attach to a service.
          JupyterHub does not use this field.
  Token:
    type: object
    properties:
      token:
        type: string
        description: The token itself. Only present in responses to requests for a new token.
      id:
        type: string
        description: The id of the API token. Used for modifying or deleting the token.
      user:
        type: string
        description: The user that owns a token (undefined if owned by a service)
      service:
        type: string
        description: The service that owns the token (undefined if owned by a user)
      note:
        type: string
        description: A note about the token, typically describing what it was created for.
      created:
        type: string
        format: date-time
        description: Timestamp when this token was created
      expires_at:
        type: string
        format: date-time
        description: Timestamp when this token expires. Null if there is no expiry.
JupyterHub also provides a REST API for administration of the Hub and users.
The documentation on `Using JupyterHub's REST API <../reference/rest.html>`_ provides
information on:
- what you can do with the API
- creating an API token
- adding API tokens to the config files
- making an API request programmatically using the requests library
- learning more about JupyterHub's API
The same JupyterHub API spec, as found here, is available in an interactive form
`here (on swagger's petstore) <http://petstore.swagger.io/?url=https://raw.githubusercontent.com/jupyterhub/jupyterhub/master/docs/rest-api.yml#!/default>`__.
The `OpenAPI Initiative`_ (fka Swagger™) is a project used to describe and document RESTful APIs.
For detailed changes from the prior release, click on the version number, and
its link will bring up a GitHub listing of changes. Use `git log` on the
command line for details.
## [Unreleased]
## 0.9
### [0.9.2] 2018-08-10
JupyterHub 0.9.2 contains small bugfixes and improvements.
- Documentation and example improvements
- Add `Spawner.consecutive_failure_limit` config for aborting the Hub if too many spawns fail in a row.
- Fix for handling SIGTERM when run with asyncio (tornado 5)
- Windows compatibility fixes
### [0.9.1] 2018-07-04
JupyterHub 0.9.1 contains a number of small bugfixes on top of 0.9.
- Use a PID file for the proxy to decrease the likelihood that a leftover proxy process will prevent JupyterHub from restarting
- `c.LocalProcessSpawner.shell_cmd` is now configurable
- API requests to stopped servers (requests to the hub for `/user/:name/api/...`) fail with 404 rather than triggering a restart of the server
- Compatibility fix for notebook 5.6.0 which will introduce further
security checks for local connections
- Managed services always use localhost to talk to the Hub if the Hub is listening on all interfaces
- When using a URL prefix, the Hub route will be `JupyterHub.base_url` instead of unconditionally `/`
- additional fixes and improvements
### [0.9.0] 2018-06-15
JupyterHub 0.9 is a major upgrade of JupyterHub.
There are several changes to the database schema,
so make sure to backup your database and run:
jupyterhub upgrade-db
after upgrading jupyterhub.
The biggest change for 0.9 is the switch to asyncio coroutines everywhere
instead of tornado coroutines. Custom Spawners and Authenticators are still
free to use tornado coroutines for async methods, as they will continue to
work. As part of this upgrade, JupyterHub 0.9 drops support for Python < 3.5
and tornado < 5.0.
#### Changed
- Require Python >= 3.5
- Require tornado >= 5.0
- Use asyncio coroutines throughout
- Set status 409 for conflicting actions instead of 400,
e.g. creating users or groups that already exist.
- timestamps in REST API continue to be UTC, but now include 'Z' suffix
to identify them as such.
- REST API User model always includes `servers` dict,
not just when named servers are enabled.
- `server` info is no longer available to oauth identification endpoints,
only user info and group membership.
- `User.last_activity` may be None if a user has not been seen,
rather than starting with the user creation time
which is now separately stored as `User.created`.
- static resources are now found in `$PREFIX/share/jupyterhub` instead of `share/jupyter/hub` for improved consistency.
- Deprecate `.extra_log_file` config. Use pipe redirection instead:
jupyterhub &>> /var/log/jupyterhub.log
- Add `JupyterHub.bind_url` config for setting the full bind URL of the proxy.
Sets ip, port, base_url all at once.
- Add `JupyterHub.hub_bind_url` for setting the full host+port of the Hub.
`hub_bind_url` supports unix domain sockets, e.g.
`unix+http://%2Fsrv%2Fjupyterhub.sock`
- Deprecate `JupyterHub.hub_connect_port` config in favor of `JupyterHub.hub_connect_url`. `hub_connect_ip` is not deprecated
and can still be used in the common case where only the ip address of the hub differs from the bind ip.
#### Added
- Spawners can define a `.progress` method which should be an async generator.
The generator should yield events of the form:
```python
{
"message": "some-state-message",
"progress": 50,
}
```
These messages will be shown with a progress bar on the spawn-pending page.
The `async_generator` package can be used to make async generators
compatible with Python 3.5. A sketch of such a `progress` method appears after this list.
- track activity of individual API tokens
- new REST API for managing API tokens at `/hub/api/user/tokens[/token-id]`
- allow viewing/revoking tokens via token page
- User creation time is available in the REST API as `User.created`
- Server start time is stored as `Server.started`
- `Spawner.start` may return a URL for connecting to a notebook instead of `(ip, port)`. This enables Spawners to launch servers that setup their own HTTPS.
- Optimize database performance by disabling sqlalchemy expire_on_commit by default.
- Add `python -m jupyterhub.dbutil shell` entrypoint for quickly
launching an IPython session connected to your JupyterHub database.
- Include `User.auth_state` in user model on single-user REST endpoints for admins only.
- Include `Server.state` in server model on REST endpoints for admins only.
- Add `Authenticator.blacklist` for blacklisting users instead of whitelisting.
- Pass `c.JupyterHub.tornado_settings['cookie_options']` down to Spawners
so that cookie options (e.g. `expires_days`) can be set globally for the whole application.
- SIGINFO (`ctrl-t`) handler showing the current status of all running threads,
coroutines, and CPU/memory/FD consumption.
- Add async `Spawner.get_options_form` alternative to `.options_form`, so it can be a coroutine.
- Add `JupyterHub.redirect_to_server` config to govern whether
users should be sent to their server on login or the JupyterHub home page.
- html page templates can be more easily customized and extended.
- Allow registering external OAuth clients for using the Hub as an OAuth provider.
- Add basic prometheus metrics at `/hub/metrics` endpoint.
- Add session-id cookie, enabling immediate revocation of login tokens.
- Authenticators may specify that users are admins by specifying the `admin` key when returning the user model as a dict.
- Added "Start All" button to admin page for launching all user servers at once.
- Services have an `info` field which is a dictionary.
This is accessible via the REST API.
- `JupyterHub.extra_handlers` allows defining additional tornado RequestHandlers attached to the Hub.
- API tokens may now expire.
Expiry is available in the REST model as `expires_at`,
and settable when creating API tokens by specifying `expires_in`.
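As a sketch of the `.progress` API described at the top of this list (the spawner base class and step timing here are only illustrative):

```python
import asyncio

from jupyterhub.spawner import LocalProcessSpawner


class MySpawner(LocalProcessSpawner):
    async def progress(self):
        # yield events while the spawn is pending;
        # each event is rendered as a progress bar update on the spawn-pending page
        for i in range(1, 11):
            yield {
                "progress": i * 10,  # percent complete
                "message": "Spawn step {}/10".format(i),
            }
            await asyncio.sleep(1)
```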
#### Fixed
- Remove green from theme to improve accessibility
- Fix error when proxy deletion fails due to route already being deleted
- clear `?redirects` from URL on successful launch
- disable send2trash by default, which is rarely desirable for jupyterhub
- Put PAM calls in a thread so they don't block the main application
in cases where PAM is slow (e.g. LDAP).
- Remove implicit spawn from login handler,
instead relying on subsequent request for `/user/:name` to trigger spawn.
- Fixed several inconsistencies for initial redirects,
depending on whether server is running or not and whether the user is logged in or not.
- Admin requests for `/user/:name` (when admin-access is enabled) launch the right server if it's not running instead of redirecting to their own.
- Major performance improvement starting up JupyterHub with many users,
especially when most are inactive.
- Various fixes in race conditions and performance improvements with the default proxy.
- Fixes for CORS headers
- Stop setting `.form-control` on spawner form inputs unconditionally.
- Better recovery from database errors and database connection issues
without having to restart the Hub.
- Fix handling of `~` character in usernames.
- Fix jupyterhub startup when `getpass.getuser()` would fail,
e.g. due to missing entry in passwd file in containers.
## 0.8
### [0.8.1] 2017-11-07
JupyterHub 0.8.1 is a collection of bugfixes and small improvements on 0.8.
#### Added
- Run tornado with AsyncIO by default
- Add `jupyterhub --upgrade-db` flag for automatically upgrading the database as part of startup.
This is useful for cases where manually running `jupyterhub upgrade-db`
as a separate step is unwieldy.
- Avoid creating backups of the database when no changes are to be made by
`jupyterhub upgrade-db`.
#### Fixed
- Add some further validation to usernames - `/` is not allowed in usernames.
- Fix empty logout page when using auto_login
- Fix autofill of username field in default login form.
- Fix listing of users on the admin page who have not yet started their server.
- Fix ever-growing traceback when re-raising Exceptions from spawn failures.
- Remove use of deprecated `bower` for javascript client dependencies.
### [0.8.0] 2017-10-03
JupyterHub 0.8 is a big release!
Perhaps the biggest change is the use of OAuth to negotiate authentication
between the Hub and single-user services.
Due to this change, it is important that the single-user server
and Hub are both running the same version of JupyterHub.
If you are using containers (e.g. via DockerSpawner or KubeSpawner),
this means upgrading jupyterhub in your user images at the same time as the Hub.
In most cases, a
pip install jupyterhub==version
in your Dockerfile is sufficient.
#### Added
- JupyterHub now defines a `Proxy` API for custom
proxy implementations other than the default.
The defaults are unchanged,
but configuration of the proxy is now done on the `ConfigurableHTTPProxy` class instead of the top-level JupyterHub.
TODO: docs for writing a custom proxy.
- Single-user servers and services
(anything that uses HubAuth)
can now accept token-authenticated requests via the Authorization header.
- Authenticators can now store state in the Hub's database.
To do so, the `authenticate` method should return a dict of the form
```python
{
'username': 'name',
'state': {}
}
```
This data will be encrypted and requires `JUPYTERHUB_CRYPT_KEY` environment variable to be set
and the `Authenticator.enable_auth_state` flag to be True.
If these are not set, auth_state returned by the Authenticator will not be stored. A configuration sketch appears after this list.
- There is preliminary support for multiple (named) servers per user in the REST API.
Named servers can be created via API requests, but there is currently no UI for managing them.
- Add `LocalProcessSpawner.popen_kwargs` and `LocalProcessSpawner.shell_cmd`
for customizing how user server processes are launched.
- Add `Authenticator.auto_login` flag for skipping the "Login with..." page explicitly.
- Add `JupyterHub.hub_connect_ip` configuration
for the ip that should be used when connecting to the Hub.
This promotes `DockerSpawner.hub_ip_connect` (now deprecated)
to an option available to all Spawners.
- Add `Spawner.pre_spawn_hook(spawner)` hook for customizing
pre-spawn events.
- Add `JupyterHub.active_server_limit` and `JupyterHub.concurrent_spawn_limit`
for limiting the total number of running user servers and the number of pending spawns, respectively.
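A minimal configuration sketch for the auth state feature described in this list (the `openssl` invocation is one common way to generate a suitable key):

```python
# jupyterhub_config.py
# assumes JUPYTERHUB_CRYPT_KEY is set in the environment, e.g.:
#   export JUPYTERHUB_CRYPT_KEY=$(openssl rand -hex 32)
c.Authenticator.enable_auth_state = True
```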
#### Changed
- more arguments to spawners are now passed via environment variables (`.get_env()`)
rather than CLI arguments (`.get_args()`)
- internally generated tokens no longer get extra hash rounds,
significantly speeding up authentication.
The hash rounds were deemed unnecessary because the tokens were already
generated with high entropy.
- `JUPYTERHUB_API_TOKEN` env is available at all times,
rather than being removed during single-user start.
The token is now accessible to kernel processes,
enabling user kernels to make authenticated API requests to Hub-authenticated services.
- Cookie secrets should be 32B hex instead of large base64 secrets.
- pycurl is used by default, if available.
#### Fixed
So many things fixed!
- Collisions are checked when users are renamed
- Fix bug where OAuth authenticators could not logout users
due to being redirected right back through the login process.
- If there are errors loading your config files,
JupyterHub will refuse to start with an informative error.
Previously, the bad config would be ignored and JupyterHub would launch with default configuration.
- Raise 403 error on unauthorized user rather than redirect to login,
which could cause redirect loop.
- Set `httponly` on cookies because it's prudent.
- Improve support for MySQL as the database backend
- Many race conditions and performance problems under heavy load have been fixed.
- Fix alembic tagging of database schema versions.
#### Removed
- End support for Python 3.3
## 0.7
### [0.7.2] - 2017-01-09
#### Added
- Support service environment variables and defaults in `jupyterhub-singleuser`
for easier deployment of notebook servers as a Service.
- Add `--group` parameter for deploying `jupyterhub-singleuser` as a Service with group authentication.
- Include URL parameters when redirecting through `/user-redirect/`
#### Fixed
- Fix group authentication for HubAuthenticated services
### [0.7.1] - 2017-01-02
#### Added
- `Spawner.will_resume` for signaling that a single-user server is paused instead of stopped.
This is needed for cases like `DockerSpawner.remove_containers = False`,
where the first API token is re-used for subsequent spawns.
- Warning on startup about single-character usernames,
caused by common `set('string')` typo in config.
#### Fixed
- Removed spurious warning about empty `next_url`, which is AOK.
### [0.7.0] - 2016-12-02
#### Added
- Implement Services API [\#705](https://github.com/jupyterhub/jupyterhub/pull/705)
- Add `/api/` and `/api/info` endpoints [\#675](https://github.com/jupyterhub/jupyterhub/pull/675)
- Add documentation for JupyterLab, pySpark configuration, troubleshooting,
and more.
- Add logging of error if adding users already in database. [\#689](https://github.com/jupyterhub/jupyterhub/pull/689)
- Add HubAuth class for authenticating with JupyterHub. This class can
be used by any application, even outside tornado.
- Add user groups.
- Add `/hub/user-redirect/...` URL for redirecting users to a file on their own server.
#### Changed
- Always install with setuptools but not eggs (effectively require `pip install .`)
- statsd is an optional dependency, only needed if in use
- Notice more quickly when servers have crashed
- Better error pages for proxy errors
- Add Stop All button to admin panel for stopping all servers at once
### [0.6.0] - 2016-04-25
- JupyterHub has moved to a new `jupyterhub` namespace on GitHub and Docker. What was `jupyter/jupyterhub` is now `jupyterhub/jupyterhub`, etc.
- `jupyterhub/jupyterhub` image on DockerHub no longer loads the jupyterhub_config.py in an ONBUILD step. A new `jupyterhub/jupyterhub-onbuild` image does this
- Add statsd support, via `c.JupyterHub.statsd_{host,port,prefix}`
- Update to traitlets 4.1 `@default`, `@observe` APIs for traits
- Allow disabling PAM sessions via `c.PAMAuthenticator.open_sessions = False`. This may be needed on SELinux-enabled systems, where our PAM session logic often does not work properly
- Add `Spawner.environment` configurable, for defining extra environment variables to load for single-user servers
- JupyterHub API tokens can be pregenerated and loaded via `JupyterHub.api_tokens`, a dict of `token: username`.
- JupyterHub API tokens can be requested via the REST API, with a POST request to `/api/authorizations/token`.
This can only be used if the Authenticator has a username and password.
- Various fixes for user URLs and redirects
## [0.5] - 2016-03-07
- Single-user server must be run with Jupyter Notebook ≥ 4.0
- Require `--no-ssl` confirmation to allow the Hub to be run without SSL (e.g. behind SSL termination in nginx)
- Add lengths to text fields for MySQL support
- Add `Spawner.disable_user_config` for preventing user-owned configuration from modifying single-user servers.
- Fixes for MySQL support.
- Add ability to run each user's server on its own subdomain. Requires wildcard DNS and wildcard SSL to be feasible. Enable subdomains by setting `JupyterHub.subdomain_host = 'https://jupyterhub.domain.tld[:port]'`.
- Use `127.0.0.1` for local communication instead of `localhost`, avoiding issues with DNS on some systems.
- Fix race that could add users to proxy prematurely if spawning is slow.
## 0.4
### [0.4.1] - 2016-02-03
Fix removal of `/login` page in 0.4.0, breaking some OAuth providers.
### [0.4.0] - 2016-02-01
- Add `Spawner.user_options_form` for specifying an HTML form to present to users,
allowing users to influence the spawning of their own servers.
- Add `Authenticator.pre_spawn_start` and `Authenticator.post_spawn_stop` hooks,
so that Authenticators can do setup or teardown (e.g. passing credentials to Spawner,
mounting data sources, etc.).
These methods are typically used with custom Authenticator+Spawner pairs.
- 0.4 will be the last JupyterHub release where single-user servers running IPython 3 is supported instead of Notebook ≥ 4.0.
## [0.3] - 2015-11-04
- No longer make the user starting the Hub an admin
- start PAM sessions on login
- hooks for Authenticators to fire before spawners start and after they stop,
allowing deeper interaction between Spawner/Authenticator pairs.
- login redirect fixes
## [0.2] - 2015-07-12
- Based on standalone traitlets instead of IPython.utils.traitlets
We use different channels of communication for different purposes. Whichever one you use will depend on what kind of communication you want to engage in.
## Discourse (recommended)
We use [Discourse](https://discourse.jupyter.org) for online discussions and support questions.
You can ask questions here if you are a first-time contributor to the JupyterHub project.
Everyone in the Jupyter community is welcome to bring ideas and questions there.
We recommend that you first use our Discourse as all past and current discussions on it are archived and searchable. Thus, all discussions remain useful and accessible to the whole community.
## Gitter
We use [our Gitter channel](https://gitter.im/jupyterhub/jupyterhub) for online, real-time text chat; a place for more ephemeral discussions. When you're not on Discourse, you can stop here to have other discussions on the fly.
## GitHub Issues
[GitHub issues](https://docs.github.com/en/issues/tracking-your-work-with-issues/about-issues) are used for most long-form project discussions, bug reports and feature requests.
- Issues related to a specific authenticator or spawner should be opened in the appropriate repository for the authenticator or spawner.
- If you are using a specific JupyterHub distribution (such as [Zero to JupyterHub on Kubernetes](https://github.com/jupyterhub/zero-to-jupyterhub-k8s) or [The Littlest JupyterHub](https://github.com/jupyterhub/the-littlest-jupyterhub/)), you should open issues directly in their repository.
- If you cannot find a repository to open your issue in, do not worry! Open the issue in the [main JupyterHub repository](https://github.com/jupyterhub/jupyterhub/) and our community will help you figure it out.
```{note}
Our community is distributed across the world in various timezones, so please be patient if you do not get a response immediately!
```
Documentation is often more important than code. This page helps
you get set up to contribute to JupyterHub's documentation.
## Building documentation locally
We use [sphinx](https://www.sphinx-doc.org) to build our documentation. It takes
our documentation source files (written in [markdown](https://daringfireball.net/projects/markdown/) or [reStructuredText](https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html) &
stored under the `docs/source` directory) and converts it into various
formats for people to read. To make sure the documentation you write or
change renders correctly, it is good practice to test it locally.
1. Make sure you have successfully completed {ref}`contributing/setup`.
2. Install the packages required to build the docs.
```bash
python3 -m pip install -r docs/requirements.txt
```
3. Build the html version of the docs. This is the most commonly used
output format, so verifying it renders correctly is usually good
enough.
```bash
cd docs
make html
```
This step will display any syntax or formatting errors in the documentation,
along with the filename / line number in which they occurred. Fix them,
and re-run the `make html` command to re-render the documentation.
4. View the rendered documentation by opening `_build/html/index.html` in
a web browser.
:::{tip}
**On Windows**, you can open a file from the terminal with `start <path-to-file>`.
**On macOS**, you can do the same with `open <path-to-file>`.
**On Linux**, you can do the same with `xdg-open <path-to-file>`.
After opening index.html in your browser, you can simply refresh the page whenever
you rebuild the docs via `make html`.
:::
(contributing-docs-conventions)=
## Documentation conventions
This section lists various conventions we use in our documentation. This is a
living document that grows over time, so feel free to add to it / change it!
Our documentation does not yet fully conform to these conventions,
so help in making it so would be appreciated!
### `pip` invocation
There are many ways to invoke a `pip` command; we recommend the following
approach:
```bash
python3 -m pip
```
This invokes pip explicitly using the python3 binary that you are
currently using. This is the **recommended way** to invoke pip
in our documentation, since it is least likely to cause problems
with python3 and pip being from different environments.
For more information on how to invoke `pip` commands, see [the pip documentation](https://pip.pypa.io/en/stable/).
We want you to contribute to JupyterHub in ways that are most exciting
and useful to you. We value documentation, testing, bug reporting & code equally,
and are glad to have your contributions in whatever form you wish.
Be sure to first check our [Code of Conduct](https://github.com/jupyter/governance/blob/HEAD/conduct/code_of_conduct.md)
([reporting guidelines](https://github.com/jupyter/governance/blob/HEAD/conduct/reporting_online.md)), which help keep our community welcoming to as many people as possible.
This section covers information about our community, as well as ways that you can connect and get involved.
JupyterHub uses git & GitHub for development & collaboration. You need to [install git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) to work on
JupyterHub. We also recommend getting a free account on GitHub.com.
## Setting up a development install
When developing JupyterHub, you need to make changes and instantly view the results. To achieve that, a developer install is required.
:::{note}
This guide does not attempt to dictate _how_ development
environments should be isolated since that is a personal preference and can
be achieved in many ways, for example, `tox`, `conda`, `docker`, etc. See this
[forum thread](https://discourse.jupyter.org/t/thoughts-on-using-tox/3497) for
a more detailed discussion.
:::
1. Clone the [JupyterHub git repository](https://github.com/jupyterhub/jupyterhub)
- `admin_user`: creates a new temporary admin user
- single user servers
  - `cleanup_after`: allows cleanup of single user servers between tests
- mocked service
  - `MockServiceSpawner`: a spawner that mocks services for testing with a short poll interval
  - `mockservice`: mocked service with no external service url
  - `mockservice_url`: mocked service with a url to test external services
And fixtures to add functionality or spawning behavior:
- `admin_access`: grants admin access
- `no_patience`: sets slow-spawning timeouts to zero
- `slow_spawn`: enables the SlowSpawner (a spawner that takes a few seconds to start)
- `never_spawn`: enables the NeverSpawner (a spawner that will never start)
- `bad_spawn`: enables the BadSpawner (a spawner that fails immediately)
- `slow_bad_spawn`: enables the SlowBadSpawner (a spawner that fails after a short delay)
Refer to the [pytest fixtures documentation](https://pytest.readthedocs.io/en/latest/fixture.html) to learn how to use fixtures that exist already and to create new ones.
### The Pytest-Asyncio Plugin
When testing the various JupyterHub components and their various implementations, it sometimes becomes necessary to have a running instance of JupyterHub to test against.
The [`app`](https://github.com/jupyterhub/jupyterhub/blob/270b61992143b29af8c2fab90c4ed32f2f6fe209/jupyterhub/tests/conftest.py#L60) fixture mocks a JupyterHub application for use in testing by:
- enabling ssl if internal certificates are available
- creating an instance of [MockHub](https://github.com/jupyterhub/jupyterhub/blob/270b61992143b29af8c2fab90c4ed32f2f6fe209/jupyterhub/tests/mocking.py#L221) using any provided configurations as arguments
- initializing the mocked instance
- starting the mocked instance
- finally, a registered finalizer function performs a cleanup and stops the mocked instance
The JupyterHub test suite uses the [pytest-asyncio plugin](https://pytest-asyncio.readthedocs.io/en/latest/) that handles [event-loop](https://docs.python.org/3/library/asyncio-eventloop.html) integration in [Tornado](https://www.tornadoweb.org/en/stable/) applications. This allows for the use of top-level awaits when calling async functions or [fixtures](https://docs.pytest.org/en/6.2.x/fixture.html#what-fixtures-are) during testing. All test functions and fixtures labelled as `async` will run on the same event loop.
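For illustration, a minimal sketch of an async fixture and test, assuming pytest-asyncio is configured so that `async` tests and fixtures are collected automatically (as in the JupyterHub suite):

```python
import asyncio

import pytest


@pytest.fixture
async def started_service():
    # async setup runs on the shared event loop
    await asyncio.sleep(0)  # stand-in for real async setup work
    yield "service"
    # async teardown awaits on the same loop
    await asyncio.sleep(0)


async def test_uses_async_fixture(started_service):
    assert started_service == "service"
```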
```{note}
With the introduction of [top-level awaits](https://piccolo-orm.com/blog/top-level-await-in-python/), the use of the `io_loop` fixture of the [pytest-tornado plugin](https://www.tornadoweb.org/en/stable/ioloop.html) is no longer necessary. It was initially used to call coroutines. With the upgrades made to `pytest-asyncio`, this usage is now deprecated. It is now only utilized within the JupyterHub test suite to ensure complete cleanup of resources used during testing, such as open file descriptors. This is demonstrated in this [pull request](https://github.com/jupyterhub/jupyterhub/pull/4332).
More information is provided below.
```
One of the general goals of the [JupyterHub Pytest Plugin project](https://github.com/jupyterhub/pytest-jupyterhub) is to ensure the MockHub cleanup fully closes and stops all utilized resources during testing, so the use of the `io_loop` fixture for teardown is not necessary. This was highlighted in this [issue](https://github.com/jupyterhub/pytest-jupyterhub/issues/30).
For more information on asyncio and event-loops, here are some resources:
- **Read**: [Introduction to the Python event loop](https://www.pythontutorial.net/python-concurrency/python-event-loop)
- **Read**: [Overview of Async IO in Python 3.7](https://stackabuse.com/overview-of-async-io-in-python-3-7)
- **Watch**: [Asyncio: Understanding Async / Await in Python](https://www.youtube.com/watch?v=bs9tlDFWWdQ)
- **Watch**: [Learn Python's AsyncIO #2 - The Event Loop](https://www.youtube.com/watch?v=E7Yn5biBZ58)
## Troubleshooting Test Failures
### All the tests are failing
Make sure you have completed all the steps in {ref}`contributing/setup` successfully, and are able to access JupyterHub from your browser at http://localhost:8000 after starting `jupyterhub` in your command line.
## Code formatting and linting
JupyterHub automatically enforces code formatting. This means that pull requests
with changes breaking this formatting will receive a commit from pre-commit.ci
automatically.
To automatically format code locally, you can install pre-commit and register a
_git hook_ so that pre-commit checks your formatting automatically before each
commit.
```bash
pip install pre-commit
pre-commit install --install-hooks
```
To run pre-commit manually:
```bash
# check for changes to code not yet committed
pre-commit run
# check for changes also in already committed code
pre-commit run --all-files
```
You may also install [black integration](https://github.com/psf/black#editor-integration)
into your text editor to format code automatically.
| Problem | Consequence |
| ------- | ----------- |
| Requests too high | Unnecessarily high cost and/or low capacity. |
| CPU limit too low | Poor performance experienced by users |
| CPU oversubscribed (too-low request + too-high limit) | Poor performance across the system; may crash, if severe |
| Memory limit too low | Servers killed by Out-of-Memory Killer (OOM); lost work for users |
| Memory oversubscribed (too-low request + too-high limit) | System memory exhaustion - all kinds of hangs and crashes and weird errors. Very bad. |
Note that the 'oversubscribed' problem case is where the request is lower than _typical_ usage,
meaning that the total reserved resources isn't enough for the total _actual_ consumption.
This doesn't mean that _all_ your users exceed the request,
just that the _limit_ gives enough room for the _average_ user to exceed the request.
All of these considerations are important _per node_.
Larger nodes mean more users per node, and therefore more users to average over.
It also means more chances for multiple outliers on the same node.
### Example case for oversubscribing memory
Take for example, this system and sampling of user behavior:
- System memory = 8G
- memory request = 1G, limit = 3G
- typical 'heavy' user: 2G
- typical 'light' user: 0.5G
This will assign 8 users to those 8G of RAM (remember: only requests are used for deciding when a machine is 'full').
As long as the total of 8 users _actual_ usage is under 8G, everything is fine.
But the _limit_ allows a total of 24G to be used,
which would be a mess if everyone used their full limit.
But _not_ everyone uses the full limit, which is the point!
This pattern is fine if 1/8 of your users are 'heavy', because _typical_ usage will be ~0.7G per user,
and your total usage will be ~5.5G (`1 × 2 + 7 × 0.5 = 5.5`).
But if _50%_ of your users are 'heavy', you have a problem, because that means your users will be trying to use 10G (`4 × 2 + 4 × 0.5 = 10`),
which you don't have.
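A quick back-of-envelope check of those numbers (a sketch; substitute your own measurements):

```python
node_memory_gb = 8
request_gb = 1
# the scheduler packs users by *request*, so 8 users land on this node
users_per_node = node_memory_gb // request_gb


def expected_usage_gb(heavy_fraction, heavy_gb=2, light_gb=0.5, users=users_per_node):
    heavy = round(users * heavy_fraction)
    return heavy * heavy_gb + (users - heavy) * light_gb


print(expected_usage_gb(1 / 8))  # 5.5  -> fits in 8G
print(expected_usage_gb(0.5))  # 10.0 -> exceeds the node: OOM risk
```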
You can make guesses at these numbers, but the only _real_ way to get them is to measure (see [](measuring)).
### CPU:memory ratio
Most of the time, you'll find that only one resource is the limiting factor for your users.
Most often it's memory, but for certain tasks, it could be CPU (or even GPUs).
Many cloud deployments have just one or a few fixed ratios of cpu to memory
(e.g. 'general purpose', 'high memory', and 'high cpu').
Setting your secondary resource allocation according to this ratio
after selecting the more important limit results in a balanced resource allocation.
For instance, some of Google Cloud's ratios are:
| node type | GB RAM / CPU core |
| ----------- | ----------------- |
| n2-highmem | 8 |
| n2-standard | 4 |
| n2-highcpu | 1 |
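For example, if memory is your primary constraint, the node ratio gives you the matching CPU request (a sketch using the table above):

```python
mem_request_gb = 2  # primary allocation, chosen from your measurements
node_gb_per_core = 4  # n2-standard, from the table above
cpu_request = mem_request_gb / node_gb_per_core
print(cpu_request)  # 0.5 cores keeps cpu:memory balanced on this node type
```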
(idleness)=
### Idleness
Jupyter being an interactive tool means people tend to spend a lot more time reading and thinking than actually running resource-intensive code.
This significantly affects how much _cpu_ resources a typical active user needs,
but often does not significantly affect the _memory_.
Ways to think about this:
- More idle users means unused CPU.
This generally means setting your CPU _limit_ higher than your CPU _request_.
- What do your users do when they _are_ running code?
Is it typically single-threaded local computation in a notebook?
If so, there's little reason to set a limit higher than 1 CPU core.
- Do typical computations take a long time, or just a few seconds?
Longer typical computations means it's more likely for users to be trying to use the CPU at the same moment,
suggesting a higher _request_.
- Even with idle users, parallel computation adds up quickly - one user fully loading 4 cores and 3 using almost nothing still averages to more than a full CPU core per user.
Again, using mybinder.org as an example—we run around 100 users on 8-core nodes,
and still see fairly _low_ overall CPU usage on each user node.
The limit here is actually Kubernetes' pods per node, not memory _or_ CPU.
This is likely an extreme case, as many Binder users come from clicking links on webpages
without any actual intention of running code.
```{figure} /images/mybinder-load5.png
mybinder.org node CPU usage is low with 50-150 users sharing just 8 cores
```
### Concurrent users and culling idle servers
Related to [](idleness), all of these resource consumptions and limits are calculated based on **concurrently active users**,
not total users.
You might have 10,000 users of your JupyterHub deployment, but only 100 of them running at any given time.
That 100 is the main number you need to use for your capacity planning.
JupyterHub costs scale very little based on the number of _total_ users,
up to a point.
There are two important definitions for **active user**:
- Are they _actually_ there (i.e. a human interacting with Jupyter, or running code that might be left running unattended)
- Is their server running (this is where resource reservations and limits are actually applied)
Connecting those two definitions (how long are servers running if their humans aren't using them) is an important area of deployment configuration, usually implemented via the [JupyterHub idle culler service][idle-culler].
There are a lot of considerations when it comes to culling idle users, and the right choices depend on your deployment:
- How much does it save me to shut down user servers? (e.g. keeping an elastic cluster small, or keeping a fixed-size deployment available to active users)
- How much does it cost my users to have their servers shut down? (e.g. lost work if shutdown prematurely)
- How easy do I want it to be for users to keep their servers running? (e.g. Do they want to run unattended simulations overnight? Do you want them to?)
Like many other things in this guide, there are many correct answers leading to different configuration choices.
For more detail on culling configuration and considerations, consult the [JupyterHub idle culler documentation][idle-culler].
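For reference, the idle culler is typically registered as a Hub-managed service in `jupyterhub_config.py`; a minimal sketch (the timeout is illustrative, and the service also needs a role granting permission to list and stop servers, as described in the idle-culler docs):

```python
c.JupyterHub.services = [
    {
        "name": "jupyterhub-idle-culler-service",
        # cull servers idle for more than an hour (illustrative value)
        "command": ["python3", "-m", "jupyterhub_idle_culler", "--timeout=3600"],
    }
]
```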
## More tips
### Start strict and generous, then measure
A good tip, in general, is to give your users as many resources as you can afford that you think they _might_ use.
Then, use resource usage metrics like prometheus to analyze what your users _actually_ need,
and tune accordingly.
Remember: **Limits affect your user experience and stability. Requests mostly affect your costs**.
For example, a sensible starting point (lacking any other information) might be:
```yaml
request:
cpu: 0.5
mem: 2G
limit:
cpu: 1
mem: 2G
```
(more memory if significant computations are likely - machine learning models, data analysis, etc.)
Some actions to consider based on what you measure:
- If you see out-of-memory killer events, increase the limit (or talk to your users!)
- If you see typical memory well below your limit, reduce the request (but not the limit)
- If _nobody_ uses that much memory, reduce your limit
- If CPU is your limiting scheduling factor and your CPUs are mostly idle,
reduce the cpu request (maybe even to 0!).
- If CPU usage continues to be low, increase the limit to 2 or 4 to allow bursts of parallel execution.
(measuring)=
### Measuring user resource consumption
It is _highly_ recommended to deploy monitoring services such as [Prometheus][]
and [Grafana][] to get a view of your users' resource usage.
This is the only way to truly know what your users need.
JupyterHub has some experimental [grafana dashboards][] you can use as a starting point,
to keep an eye on your resource usage.
Here are some sample charts (again from mybinder.org),
showing >90% of users using less than 10% CPU and 200MB,
but a few outliers near the limit of 1 CPU and 2GB of RAM.
This is the kind of information you can use to tune your requests and limits.

JupyterHub uses a database to store information about users, services, and other data needed for operating the Hub.
This is the **state** of the Hub.
## Why does JupyterHub have a database?
JupyterHub is a **stateful** application (more on that 'state' later).
Updating JupyterHub's configuration or upgrading the version of JupyterHub requires restarting the JupyterHub process to apply the changes.
We want to minimize the disruption caused by restarting the Hub process, so it can be a mundane, frequent, routine activity.
Storing state information outside the process for later retrieval is necessary for this, and one of the main things databases are for.
A lot of the operations in JupyterHub are also **relationships**, which is exactly what SQL databases are great at.
For example:
- Given an API token, what user is making the request?
- Which users don't have running servers?
- Which servers belong to user X?
- Which users have not been active in the last 24 hours?
Finally, a database allows us to have more information stored without needing it all loaded in memory,
e.g. supporting a large number (several thousands) of inactive users.
## What's in the database?
The short answer of what's in the JupyterHub database is "everything."
JupyterHub's **state** lives in the database.
That is, everything JupyterHub needs to be aware of to function that _doesn't_ come from the configuration files, such as
- users, roles, role assignments
- state and URLs of running servers
- Hashed API tokens
- Short-lived state related to OAuth flow
- Timestamps for when users, tokens, and servers were last used
### What's _not_ in the database
Not _quite_ all of JupyterHub's state is in the database.
This mostly involves transient state, such as the 'pending' transitions of Spawners (starting, stopping, etc.).
Anything not in the database must be reconstructed on Hub restart, and the only sources of information to do that are the database and JupyterHub configuration file(s).
## How does JupyterHub use the database?
JupyterHub makes some _unusual_ choices in how it connects to the database.
These choices represent trade-offs favoring single-process simplicity and performance at the expense of horizontal scalability (multiple Hub instances).
We often say that the Hub 'owns' the database.
This ownership means that we assume the Hub is the only process that will talk to the database.
This assumption enables us to make several caching optimizations that dramatically improve JupyterHub's performance (i.e. data written recently to the database can be read from memory instead of fetched again from the database) that would not work if multiple processes could be interacting with the database at the same time.
Database operations are also synchronous, so while JupyterHub is waiting on a database operation, it cannot respond to other requests.
This allows us to avoid complex locking mechanisms, because transaction races can only occur during an `await`, so we only need to make sure we've completed any given transaction before the next `await` in a given request.
:::{note}
We are slowly working to remove these assumptions, and moving to a more traditional db session per-request pattern.
This will enable multiple Hub instances and enable scaling JupyterHub, but will significantly reduce the number of active users a single Hub instance can serve.
:::
### Database performance in a typical request
Most authenticated requests to JupyterHub involve a few database transactions:
1. look up the authenticated user (e.g. look up token by hash, then resolve owner and permissions)
2. record activity
3. perform any relevant changes involved in processing the request (e.g. create the records for a running server when starting one)
This means that the database is involved in almost every request, but only in quite small, simple queries, e.g.:
- lookup one token by hash
- lookup one user by name
- list tokens or servers for one user (typically 1-10)
- etc.
### The database as a limiting factor
As a result of the above transactions in most requests, database performance is the _leading_ factor in JupyterHub's baseline requests-per-second performance, but that cost does not scale significantly with the number of users, active or otherwise.
However, the database is _rarely_ a limiting factor in JupyterHub performance in a practical sense, because the main thing JupyterHub does is start, stop, and monitor whole servers, which take far more time than any small database transaction, no matter how many records you have or how slow your database is (within reason).
Additionally, there is usually _very_ little load on the database itself.
By far the most taxing activity on the database is the 'list all users' endpoint, primarily used by the [idle-culling service](https://github.com/jupyterhub/jupyterhub-idle-culler).
Database-based optimizations have been added to make even these operations feasible for large numbers of users:
1. State filtering on [GET /hub/api/users?state=active](../reference/rest-api.html#/default/get_users){.external},
which limits the number of results in the query to only the relevant subset (added in JupyterHub 1.3), rather than all users.
2. [Pagination](api-pagination) of all list endpoints, allowing the request of a large number of resources to be more fairly balanced with other Hub activities across multiple requests (added in 2.0).
:::{note}
It's important to note, when discussing performance and limiting factors, that all of this only applies to requests to `/hub/...`.
The Hub and its database are not involved in most requests to single-user servers (`/user/...`), which is by design, and largely motivated by the fact that the Hub itself doesn't _need_ to be fast because its operations are infrequent and large.
:::
## Database backends
JupyterHub supports a variety of database backends via [SQLAlchemy][].
The default is sqlite, which works great for many cases, but you should be able to use many backends supported by SQLAlchemy.
Usually, this will mean PostgreSQL or MySQL, both of which are officially supported and well tested with JupyterHub, but others may work as well.
See [SQLAlchemy's docs][sqlalchemy-dialect] for how to connect to different database backends.
Doing so generally involves:
1. installing a Python package that provides a client implementation, and
2. setting [](JupyterHub.db_url) to connect to your database with the specified implementation
The default database backend for JupyterHub is [SQLite](https://sqlite.org).
We have chosen SQLite as JupyterHub's default because it's simple (the 'database' is a single file) and ubiquitous (it is in the Python standard library).
It works very well for testing, small deployments, and workshops.
For production systems, SQLite has some disadvantages when used with JupyterHub:
- `upgrade-db` may not always work, and you may need to start with a fresh database
- `downgrade-db` **will not** work if you want to roll back to an earlier
  version, so backup the `jupyterhub.sqlite` file before upgrading (JupyterHub automatically creates a date-stamped backup file when upgrading sqlite)
The sqlite documentation provides a helpful page about [when to use SQLite and
where traditional RDBMS may be a better choice](https://sqlite.org/whentouse.html).
### Picking your database backend (PostgreSQL, MySQL)
When running a long term deployment or a production system, we recommend using a full-fledged relational database, such as [PostgreSQL](https://www.postgresql.org) or [MySQL](https://www.mysql.com), that supports the SQL `ALTER TABLE` statement, which is used in some database upgrade steps.
In general, you select your database backend with [](JupyterHub.db_url), and can further configure it (usually not necessary) with [](JupyterHub.db_kwargs).
## Notes and Tips
### SQLite
The SQLite database should not be used on NFS. SQLite uses reader/writer locks
to control access to the database. This locking mechanism might not work
correctly if the database file is kept on an NFS filesystem. This is because
`fcntl()` file locking is broken on many NFS implementations. Therefore, you
should avoid putting SQLite database files on NFS, since concurrent access
from multiple processes will not be handled well.
### PostgreSQL
We recommend using PostgreSQL for production if you are unsure whether to use
MySQL or PostgreSQL or if you do not have a strong preference.
There is additional configuration required for MySQL that is not needed for PostgreSQL.
For example, to connect to a postgres database with psycopg2:
1. install psycopg2: `pip install psycopg2` (or `psycopg2-binary` to avoid compilation, which is [not recommended for production][psycopg2-binary])
2. set authentication via environment variables `PGUSER` and `PGPASSWORD`
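A sketch of the corresponding `jupyterhub_config.py` entry (hostname and database name are placeholders; the user and password come from `PGUSER`/`PGPASSWORD`):

```python
c.JupyterHub.db_url = "postgresql+psycopg2://db.example.com:5432/jupyterhub"
```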
### MySQL / MariaDB

- You should probably use the `pymysql` or `mysqlclient` sqlalchemy provider, or another backend [recommended by sqlalchemy](https://docs.sqlalchemy.org/en/20/dialects/mysql.html#dialect-mysql)
- You also need to set `pool_recycle` to some value (typically 60 - 300, JupyterHub will default to 60)
which depends on your MySQL setup. This is necessary since MySQL kills
connections serverside if they've been idle for a while, and the connection
from the hub will be idle for longer than most connections. This behavior
will lead to frustrating 'the connection has gone away' errors from
sqlalchemy if `pool_recycle` is not set.
- If you use `utf8mb4` collation with MySQL earlier than 5.7.7 or MariaDB
earlier than 10.2.1 you may get an `1709, Index column size too large` error.
To fix this you need to set `innodb_large_prefix` to enabled and
`innodb_file_format` to `Barracuda` to allow for the index sizes jupyterhub
uses. `row_format` will be set to `DYNAMIC` as long as those options are set
correctly. Later versions of MariaDB and MySQL should set these values by
default, as well as have a default `DYNAMIC` `row_format` and pose no trouble
to users.
For example, to connect to a mysql database with mysqlclient:
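A sketch (host, credentials, and database name are placeholders):

```python
c.JupyterHub.db_url = "mysql+mysqldb://jupyterhub:PASSWORD@db.example.com:3306/jupyterhub"
# MySQL closes idle connections server-side; see the pool_recycle note above
c.JupyterHub.db_kwargs = {"pool_recycle": 60}
```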
_Explanation_ documentation provides big-picture descriptions of how JupyterHub works. This section is meant to build your understanding of particular topics.
JupyterHub uses [OAuth 2](https://oauth.net/2/) as an internal mechanism for authenticating users.
As such, JupyterHub itself always functions as an OAuth **provider**.
You can find out more about what that means [below](oauth-terms).
Additionally, JupyterHub is _often_ deployed with [OAuthenticator](https://oauthenticator.readthedocs.io),
where an external identity provider, such as GitHub or KeyCloak, is used to authenticate users.
When this is the case, there are _two_ nested OAuth flows:
an _internal_ OAuth flow where JupyterHub is the **provider**,
and an _external_ OAuth flow, where JupyterHub is the **client**.
This means that when you are using JupyterHub, there is always _at least one_ and often two layers of OAuth involved in a user logging in and accessing their server.
The following points are noteworthy:
- Single-user servers _never_ need to communicate with or be aware of the upstream provider configured in your Authenticator.
As far as the servers are concerned, only JupyterHub is an OAuth provider,
and how users authenticate with the Hub itself is irrelevant.
- When interacting with a single-user server,
there are ~always two tokens:
first, a token issued to the server itself to communicate with the Hub API,
and second, a per-user token in the browser to represent the completed login process and authorized permissions.
More on this [later](two-tokens).
(oauth-terms)=
## Key OAuth terms
Here are some key definitions to keep in mind when we are talking about OAuth.
You can also read more in detail [here](https://www.oauth.com/oauth2-servers/definitions/).
- **provider**: The entity responsible for managing identity and authorization;
always a web server.
JupyterHub is _always_ an OAuth provider for JupyterHub's components.
When OAuthenticator is used, an external service, such as GitHub or KeyCloak, is also an OAuth provider.
- **client**: An entity that requests OAuth **tokens** on a user's behalf;
generally a web server of some kind.
OAuth **clients** are services that _delegate_ authentication and/or authorization
to an OAuth **provider**.
JupyterHub _services_ or single-user _servers_ are OAuth **clients** of the JupyterHub **provider**.
When OAuthenticator is used, JupyterHub is itself _also_ an OAuth **client** for the external OAuth **provider**, e.g. GitHub.
- **browser**: A user's web browser, which makes requests and stores things like cookies.
- **token**: The secret value used to represent a user's authorization. This is the final product of the OAuth process.
- **code**: A short-lived temporary secret that the **client** exchanges
for a **token** at the conclusion of OAuth,
in what's generally called the "OAuth callback handler."
## One OAuth flow
OAuth **flow** is what we call the sequence of HTTP requests involved in authenticating a user and issuing a token, ultimately used for authorizing access to a service or single-user server.
A single OAuth flow typically goes like this:
### OAuth request and redirect
1. A **browser** makes an HTTP request to an OAuth **client**.
2. There are no credentials, so the client _redirects_ the browser to an "authorize" page on the OAuth **provider** with some extra information:
- the OAuth **client ID** of the client itself.
- the **redirect URI** to be redirected back to after completion.
- the **scopes** requested, which the user should be presented with to confirm.
This is the "X would like to be able to Y on your behalf. Allow this?" page you see on all the "Login with ..." pages around the Internet.
3. During this authorize step,
the browser must be _authenticated_ with the provider.
This is often already stored in a cookie,
but if not the provider webapp must begin its _own_ authentication process before serving the authorization page.
This _may_ even begin another OAuth flow!
4. After the user tells the provider that they want to proceed with the authorization,
the provider records this authorization in a short-lived record called an **OAuth code**.
5. Finally, the oauth provider redirects the browser _back_ to the oauth client's "redirect URI"
(or "OAuth callback URI"),
with the OAuth code in a URL parameter.
That marks the end of the requests made between the **browser** and the **provider**.
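For illustration, that final redirect looks something like this (the hostname, path, and code value are all made up):

```
HTTP/1.1 302 Found
Location: https://client.example/oauth_callback?code=abc123
```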
### State after redirect
At this point:
- The browser is authenticated with the _provider_.
- The user's authorized permissions are recorded in an _OAuth code_.
- The _provider_ knows that the permissions requested by the OAuth client have been granted, but the client doesn't know this yet.
- All the requests so far have been made directly by the browser.
No requests have originated from the client or provider.
### OAuth client handles callback request
At this stage, we get to finish the OAuth process.
Let's dig into what the OAuth client does when it handles
the OAuth callback request.
- The OAuth client receives the _code_ and makes an API request to the _provider_ to exchange the code for a real _token_.
This is the first direct request between the OAuth _client_ and the _provider_.
- Once the token is retrieved, the client _usually_
makes a second API request to the _provider_
to retrieve information about the owner of the token (the user).
This is the step where behavior diverges for different OAuth providers.
Up to this point, all OAuth providers are the same, following the OAuth specification.
However, OAuth does not define a standard for issuing tokens in exchange for information about their owner or permissions ([OpenID Connect](https://openid.net/connect/) does that),
so this step may be different for each OAuth provider.
- Finally, the OAuth client stores its own record that the user is authorized in a cookie.
This could be the token itself, or any other appropriate representation of successful authentication.
- Now that credentials have been established,
the browser can be redirected to the _original_ URL where it started,
to try the request again.
If the client wasn't able to keep track of the original URL all this time
(not always easy!),
you might end up back at a default landing page instead of where you started the login process. This is frustrating!
😮💨 _phew_.
So that's _one_ OAuth process.
## Full sequence of OAuth in JupyterHub
Let's go through the above OAuth process in JupyterHub,
with specific examples of each HTTP request and what information it contains.
For bonus points, we are using the double-OAuth example of JupyterHub configured with GitHubOAuthenticator.
To disambiguate, we will call the OAuth process where JupyterHub is the **provider** "internal OAuth,"
and the one with JupyterHub as a **client** "external OAuth."
Our starting point:
- a user's single-user server is running. Let's call the user `danez`.
- JupyterHub is running with GitHub as an OAuth provider (this means two full instances of OAuth).
- Danez has a fresh browser session with no cookies yet.
First request:
- browser->single-user server running JupyterLab or Jupyter Classic
- `GET /user/danez/notebooks/mynotebook.ipynb`
- no credentials, so single-user server (as an OAuth **client**) starts internal OAuth process with JupyterHub (the **provider**)
When a user logs into JupyterHub, they get a 'server', which we usually call the **single-user server**, because it's a server that's meant for a single JupyterHub user.
Each JupyterHub user gets a different one (or more than one!).
A single-user server is a process running somewhere that is:
1. accessible over http[s],
2. authenticated via JupyterHub using OAuth 2.0,
3. started by a [Spawner](spawners), and
4. 'owned' by a single JupyterHub user
## The single-user server command
The Spawner's default single-user server startup command, `jupyterhub-singleuser`, launches `jupyter-server`, the same program used when you run `jupyter lab` on your laptop.
(_It can also launch the legacy `jupyter-notebook` server_).
That's why JupyterHub looks familiar to folks who are already using Jupyter at home or elsewhere.
It's the same!
`jupyterhub-singleuser` _customizes_ that program to change (approximately) one thing: **authenticate requests with JupyterHub**.
(singleuser-auth)=
## Single-user server authentication
Implementation-wise, JupyterHub single-user servers are a special-case of {ref}`services`
and as such use the same (OAuth) authentication mechanism (more on OAuth in JupyterHub at [](oauth)).
This code resides in the `jupyterhub.singleuser` subpackage of JupyterHub.
The main task of this code is to:
1. resolve a JupyterHub token to a JupyterHub user (authenticate)
2. check permissions (`access:servers`) for the token to make sure the request should be allowed (authorize)
3. if not authorized, begin the OAuth process with a redirect to the Hub
4. after login, store OAuth tokens in a cookie only used by this single-user server
5. implement logout to clear the cookie
Most of this is implemented in the {class}`~.HubOAuth` class. `jupyterhub.singleuser` is responsible for _adapting_ the base Jupyter Server to use HubOAuth for these tasks.
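As a rough sketch of the first two tasks, a service or server can resolve a token with `HubOAuth` directly (the token value below is a made-up placeholder; `JUPYTERHUB_API_TOKEN` and `JUPYTERHUB_API_URL` are normally set in the environment by the Spawner):

```python
from jupyterhub.services.auth import HubOAuth

# HubOAuth reads JUPYTERHUB_API_TOKEN and JUPYTERHUB_API_URL
# from the environment by default
auth = HubOAuth(cache_max_age=60)

# resolve a token to a JupyterHub user model (or None if invalid)
user = auth.user_for_token("abc123-placeholder")
if user is not None:
    print(user["name"], user.get("scopes", []))
```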
### JupyterHub authentication extension
By default, `jupyter-server` uses its own cookie to authenticate.
If that cookie is not present, the server redirects you to a login page and asks you to enter a password or token.
Jupyter Server 2.0 introduces two new _APIs_ for customizing authentication: the [IdentityProvider](inv:jupyter-server#jupyter_server.auth.IdentityProvider) and the [Authorizer](inv:jupyter-server#jupyter_server.auth.Authorizer).
More information can be found in the [Jupyter Server documentation](https://jupyter-server.readthedocs.io).
JupyterHub implements these APIs in `jupyterhub.singleuser.extension`.
The IdentityProvider is responsible for _authenticating_ requests.
In JupyterHub, that means extracting OAuth tokens from the request and resolving them to a JupyterHub user.
The Authorizer is a _separate_ API for _authorizing_ actions on particular resources.
Because the JupyterHub IdentityProvider only allows _authenticating_ users who already have the necessary `access:servers` permission to access the server, the default Authorizer only contains a redundant check for this same permission, and ignores the resource inputs.
However, specifying a _custom_ Authorizer allows for granular permissions, such as read-only access to subsets of a shared server.
### JupyterHub authentication via subclass
Prior to Jupyter Server 2 (i.e. Jupyter Server 1.x or the legacy `jupyter-notebook` server), JupyterHub authentication is applied via _subclass_.
Originally a subclass of `NotebookApp`,
this approach works with both `jupyter-server` and `jupyter-notebook`.
Instead of using the extension mechanisms above,
the server application is _subclassed_. This worked well in the `jupyter-notebook` days,
but doesn't fit well with Jupyter Server's extension-based architecture.
Using the JupyterHub singleuser-server extension is the default behavior with JupyterHub 4 and Jupyter Server 2; otherwise, the subclass approach is taken.
You can opt-out of the extension by setting the environment variable `JUPYTERHUB_SINGLEUSER_EXTENSION=0`:
```python
c.Spawner.environment.update(
    {
        "JUPYTERHUB_SINGLEUSER_EXTENSION": "0",
    }
)
```
The subclass approach will also be taken if you've opted to use the classic notebook server with:
```
JUPYTERHUB_SINGLEUSER_APP=notebook
```
which was introduced in JupyterHub 2.
## Other customizations
`jupyterhub-singleuser` makes other small customizations to how the single-user server behaves:
1. logs activity on the single-user server, used in [idle-culling](https://github.com/jupyterhub/jupyterhub-idle-culler).
2. disables some features that don't make sense in JupyterHub (trash, retrying ports)
3. loads options such as URLs and SSL configuration from the environment
4. customizes logging for consistency with JupyterHub logs
## Running a single-user server that's not `jupyterhub-singleuser`
By default, `jupyterhub-singleuser` is the same `jupyter-server` used by JupyterLab, Jupyter notebook (>= 7), etc.
But technically, all JupyterHub cares about is that it is:
1. an http server at the prescribed URL, accessible from the Hub and proxy, and
2. authenticated via [OAuth](oauth) with the Hub (it doesn't even have to do this, if you want to do your own authentication, as is done in BinderHub)
which means that you can customize JupyterHub to launch _any_ web application that meets these criteria, by following the specifications in {ref}`services`.
Most of the time, though, it's easier to use [jupyter-server-proxy](https://jupyter-server-proxy.readthedocs.io) if you want to launch additional web applications in JupyterHub.
The **Security Overview** section helps you learn about:
- the design of JupyterHub with respect to web security
- the semi-trusted user
- the available mitigations to protect untrusted users from each other
- the value of periodic security audits
This overview also helps you obtain a deeper understanding of how JupyterHub
works.
## Semi-trusted and untrusted users
JupyterHub is designed to be a _simple multi-user server for modestly sized
groups_ of **semi-trusted** users. While the design reflects serving
semi-trusted users, JupyterHub can also be suitable for serving **untrusted** users,
but **is not suitable for untrusted users** in its default configuration.
As a result, using JupyterHub with **untrusted** users means more work by the
administrator, since much care is required to secure a Hub, with extra caution on
protecting users from each other.
One aspect of JupyterHub's _design simplicity_ for **semi-trusted** users is that
the Hub and single-user servers are placed in a _single domain_, behind a
[_proxy_][configurable-http-proxy]. If the Hub is serving untrusted
users, many of the web's cross-site protections are not applied between
single-user servers and the Hub, or between single-user servers and each
other, since browsers see the whole thing (proxy, Hub, and single user
servers) as a single website (i.e. single domain).
## Protect users from each other
To protect users from each other, a user must **never** be able to write arbitrary
HTML and serve it to another user on the Hub's domain. This is prevented by JupyterHub's
authentication setup because only the owner of a given single-user notebook server is
allowed to view user-authored pages served by the given single-user notebook
server.
To protect all users from each other, JupyterHub administrators must
ensure that:
- A user **does not have permission** to modify their single-user notebook server,
including:
- the installation of new packages in the Python environment that runs
their single-user server;
- the creation of new files in any `PATH` directory that precedes the
directory containing `jupyterhub-singleuser` (if the `PATH` is used
to resolve the single-user executable instead of using an absolute path);
- the modification of environment variables (e.g. PATH, PYTHONPATH) for
their single-user server;
- the modification of the configuration of the notebook server
(the `~/.jupyter` or `JUPYTER_CONFIG_DIR` directory).
- unrestricted selection of the base environment (e.g. the image used in container-based Spawners)
If any additional services are run on the same domain as the Hub, the services
**must never** display user-authored HTML that is neither _sanitized_ nor _sandboxed_
to any user that lacks authentication as the author of a file.
### Sharing access to servers
Because sharing access to servers (via `access:servers` scopes or the sharing feature in JupyterHub 5) by definition means users can serve each other files, enabling sharing is not suitable for untrusted users without also enabling per-user domains.
JupyterHub does not enable any sharing by default.
## Mitigate security issues
Several approaches to mitigating security issues with configuration
options provided by JupyterHub include:
### Enable user subdomains
JupyterHub provides the ability to run single-user servers on their own
domains. This means the cross-origin protections between servers have the
desired effect, and user servers and the Hub are protected from each other.
**Subdomains are the only way to reliably isolate user servers from each other.**
When subdomains are enabled, each user's single-user server will be at e.g. `https://username.jupyter.example.org`.
This also requires all user subdomains to point to the same address,
which is most easily accomplished with wildcard DNS, where a single A record points to your server and a wildcard CNAME record points to your A record:
```
A jupyter.example.org 192.168.1.123
CNAME *.jupyter.example.org jupyter.example.org
```
Since this spreads the service across multiple domains, you will likely need wildcard SSL as well,
matching `*.jupyter.example.org`.
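With DNS and SSL in place, subdomains are enabled with a single setting (a sketch using the example domain above):

```python
c.JupyterHub.subdomain_host = "https://jupyter.example.org"
```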
Unfortunately, for many institutional domains, wildcard DNS and SSL may not be available.
We also **strongly encourage** serving JupyterHub and user content on a domain that is _not_ a subdomain of any sensitive content.
For reasoning, see [GitHub's discussion of moving user content to github.io from \*.github.com](https://github.blog/2013-04-09-yummy-cookies-across-domains/).
**If you do plan to serve untrusted users, enabling subdomains is highly encouraged**,
as it resolves many security issues that are difficult or impossible to avoid when JupyterHub is on a single domain.
:::{important}
JupyterHub makes no guarantees about protecting users from each other unless subdomains are enabled.
If you want to protect users from each other, you **_must_** enable per-user domains.
:::
### Disable user config
If subdomains are unavailable or undesirable, JupyterHub provides the
configuration option `Spawner.disable_user_config = True`, which prevents
user-owned configuration files from being loaded. With this option set,
the `PATH` and package installation are the remaining things the
admin must control.
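For example, in `jupyterhub_config.py`:

```python
c.Spawner.disable_user_config = True
```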
### Prevent spawners from evaluating shell configuration files
For most Spawners, `PATH` is not something users can influence, but it's important that
the Spawner should _not_ evaluate shell configuration files prior to launching the server.
### Isolate packages in a read-only environment
The user must not have permission to install packages into the environment where the singleuser-server runs.
On a shared system, package isolation is most easily handled by running the single-user server in
a root-owned virtualenv with disabled system-site-packages.
The same principle extends to the images used by container-based deployments.
If users can select the images in which their servers run, they can disable all security for their own servers.
It is important to note that the control over the environment is only required for the
single-user server, and not the environment(s) in which the users' kernel(s)
may run. Installing additional packages in the kernel environment does not
pose additional risk to the web application's security.
### Encrypt internal connections with SSL/TLS
By default, all communications within JupyterHub (between the proxy, hub, and
single-user notebooks) are performed unencrypted. Setting the `internal_ssl` flag in
`jupyterhub_config.py` secures the aforementioned routes. Turning this
feature on does require that the enabled `Spawner` can use the certificates
generated by the `Hub` (the default `LocalProcessSpawner` can, for instance).
It is also important to note that this encryption **does not** cover the
`zmq tcp` sockets between the Notebook client and kernel yet. While users cannot
submit arbitrary commands to another user's kernel, they can bind to these
sockets and listen. When serving untrusted users, this eavesdropping can be
mitigated by setting `KernelManager.transport` to `ipc`. This applies standard
Unix permissions to the communication sockets thereby restricting
communication to the socket owner. The `internal_ssl` option will eventually
extend to securing the `tcp` sockets as well.
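Both mitigations are small config changes, though they live in different files (a sketch; the single-user config path shown is one conventional location):

```python
# jupyterhub_config.py
c.JupyterHub.internal_ssl = True
```

```python
# e.g. /etc/jupyter/jupyter_server_config.py, in the single-user environment
c.KernelManager.transport = "ipc"
```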
### Mitigating same-origin deployments
While per-user domains are **required** for robust protection of users from each other,
you can mitigate many (but not all) cross-user issues.
First, it is critical that users cannot modify their server environments, as described above.
Second, it is important that users do not have `access:servers` permission to any server other than their own.
If users can access each others' servers, additional security measures must be enabled, some of which come with distinct user-experience costs.
Without the [Same-Origin Policy] (SOP) protecting user servers from each other,
each user server is considered a trusted origin for requests to each other user server (and the Hub itself).
Servers _cannot_ meaningfully distinguish requests originating from other user servers,
because being same-origin implies a great deal of trust: many of the restrictions applied to cross-origin requests are lost.
That means pages served from each user server can:
1. arbitrarily modify the path in the Referer
2. make fully authorized requests with cookies
3. access full page contents served from the hub or other servers via popups
JupyterHub uses distinct XSRF tokens stored in cookies on each server path to attempt to limit cross-server requests.
This has limitations because not all requests are protected by these XSRF tokens,
and unless additional measures are taken, the XSRF tokens from other user prefixes may be retrieved.
`allow-popups` is not disabled by default because disabling it breaks legitimate functionality, like "Open this in a new tab", and the "JupyterHub Control Panel" menu item.
To reiterate, the right way to avoid these issues is to enable per-user domains, where none of these concerns come up.
Note: even this level of protection requires administrators maintaining full control over the user server environment.
If users can modify their server environment, these methods are ineffective, as users can readily disable them.
### Cookie tossing
Cookie tossing is a technique where another server on a subdomain or peer subdomain can set a cookie
which will be read on another domain.
This is not relevant unless there are other user-controlled servers on a peer domain.
"Domain-locked" cookies avoid this issue, but have their own restrictions:
- JupyterHub must be served over HTTPS
- All secure cookies must be set on `/`, not on sub-paths, which means they are shared by all JupyterHub components in a single-domain deployment.
As a result, this option is only recommended when per-user subdomains are enabled,
to prevent sending all jupyterhub cookies to all user servers.
To enable domain-locked cookies, set:
```python
c.JupyterHub.cookie_host_prefix_enabled = True
```
```{versionadded} 4.1
```
### Forced-login
Jupyter servers can share links with `?token=...`.
JupyterHub prior to 5.0 will accept this request and persist the token for future requests.
This is useful for enabling admins to create 'fully authenticated' links bypassing login.
However, it also means users can share their own links that will log other users into their own servers,
enabling them to serve each other notebooks and other arbitrary HTML, depending on server configuration.
```{versionadded} 4.1
Setting environment variable `JUPYTERHUB_ALLOW_TOKEN_IN_URL=0` in the single-user environment can opt out of accepting token auth in URL parameters.
```
```{versionadded} 5.0
Accepting tokens in URLs is disabled by default, and `JUPYTERHUB_ALLOW_TOKEN_IN_URL=1` environment variable must be set to _allow_ token auth in URL parameters.
```
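For example, on JupyterHub 4.1 you can opt all users out of URL token auth via the Spawner environment (mirroring the earlier extension example):

```python
c.Spawner.environment.update(
    {
        "JUPYTERHUB_ALLOW_TOKEN_IN_URL": "0",
    }
)
```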
## Security audits
We recommend that you do periodic reviews of your deployment's security. It's
good practice to keep [JupyterHub](https://readthedocs.org/projects/jupyterhub/), [configurable-http-proxy][], and [nodejs
versions](https://github.com/nodejs/Release) up to date.
Sharing links to notebooks is a common activity, and it can look different depending on what you mean by 'share.'
Your first instinct might be to copy the URL you see in the browser,
e.g. `jupyterhub.example/user/yourname/notebooks/coolthing.ipynb`,
but this usually won't work, depending on the permissions of the person you share the link with.
Unfortunately, 'share' means at least a few things to people in a JupyterHub context.
We'll cover 3 common cases here, when they are applicable, and what assumptions they make:
1. sharing links that will open the same file on the visitor's own server
2. sharing links that will bring the visitor to _your_ server (e.g. for real-time collaboration, or RTC)
3. publishing notebooks and sharing links that will download the notebook into the user's server
### link to the same file on the visitor's server
This is for the case where you have JupyterHub on a shared (or sufficiently similar) filesystem, where you want to share a link that will cause users to login and start their _own_ server, to view or edit the file.
**Assumption:** the same path on someone else's server is valid and points to the same file
This is useful in e.g. classes where you know students have certain files in certain locations, or collaborations where you know you have a shared filesystem where everyone has access to the same files.
A link should look like `https://jupyterhub.example/hub/user-redirect/lab/tree/foo.ipynb`.
You can hand-craft these URLs from the URL you are looking at, where you see `/user/name/lab/tree/foo.ipynb` use `/hub/user-redirect/lab/tree/foo.ipynb` (replace `/user/name/` with `/hub/user-redirect/`).
Or you can use JupyterLab's "copy shareable link" in the context menu in the file browser:

which will produce a correct URL with `/hub/user-redirect/` in it.
### link to the file on your server
This is for the case where you both want to be using _your_ server, e.g. for real-time collaboration (RTC).
**Assumption:** the user has (or should have) access to your server.
**Assumption:** your server is running _or_ the user has permission to start it.
By default, JupyterHub users don't have access to each other's servers, but JupyterHub 2.0 administrators can grant users limited access permissions to each other's servers.
If the visitor doesn't have access to the server, these links will result in a 403 Permission Denied error.
In many cases, for this situation you can copy the link in your URL bar (`/user/yourname/lab`), or you can add `/tree/path/to/specific/notebook.ipynb` to open a specific file.
The [jupyterlab-link-share] JupyterLab extension generates these links, and can even _grant_ other users access to your server.
### How can I kill ports from JupyterHub-managed services that have been orphaned?
I started JupyterHub + nbgrader on the same host without containers. When I try to restart JupyterHub + nbgrader with this configuration, errors appear that the service accounts cannot start because the ports are being used.
How can I kill the processes that are using these ports?
Run the following command:

```bash
sudo kill -9 $(sudo lsof -t -i:<service_port>)
```

Where `<service_port>` is the port used by the nbgrader course service. This configuration is specified in `jupyterhub_config.py`.
### Why am I getting a Spawn failed error message?
After successfully logging in to JupyterHub with a compatible authenticator, I get a 'Spawn failed' error message in the browser. The JupyterHub logs have `jupyterhub KeyError: "getpwnam(): name not found: <my_user_name>"`.
This issue occurs when the authenticator requires a local system user to exist. In these cases, you need to use a spawner
that does not require an existing system user account, such as `DockerSpawner` or `KubeSpawner`.
### How can I run JupyterHub with sudo but use my current environment variables and virtualenv location?
When launching JupyterHub with `sudo jupyterhub` I get import errors and my environment variables don't work.
When launching services with `sudo ...` the shell won't have the same environment variables or `PATH`s in place. The most direct way to solve this issue is to use the full path to your python environment and add environment variables. For example:
```bash
sudo MY_ENV=abc123 \
/home/foo/venv/bin/python3 \
/srv/jupyterhub/jupyterhub
```
## Errors

### Error 500 after spawning my single-user server

You receive a 500 error while accessing the URL `/user/<your_name>/...`.
This is often seen when your single-user server cannot verify your user cookie
with the Hub.
There are two likely reasons for this:

1. The single-user server cannot connect to the Hub's API (networking
   configuration problems)
2. The single-user server cannot _authenticate_ its requests (invalid token)

#### Symptoms

The main symptom is a failure to load _any_ page served by the single-user
server, met with a 500 error. This is typically the first page at `/user/<your_name>`
after logging in or clicking "Start my server". When a single-user notebook server
receives a request, the notebook server makes an API request to the Hub to
verify the user cookie.

If everything is working, you should see a 200 message for this request in the Hub log when you first
visit your single-user notebook server. If you don't see this message in the log, it
may mean that your single-user notebook server is not connecting to your Hub.

If you see 403 (forbidden) like this, it is likely a token problem:

```
403 GET /hub/api/authorizations/cookie/jupyterhub-token-name/[secret] (@10.0.1.4) 4.14ms
```

If you receive a 403 error, the API token for the single-user server is likely
invalid. Commonly, the 403 error is caused by resetting the JupyterHub
database (either removing jupyterhub.sqlite or some other action) while
leaving single-user servers running. This happens most frequently when using
DockerSpawner because Docker's default behavior is to stop/start containers
that reset the JupyterHub database, rather than destroying and recreating
the container every time. This means that the same API token is used by the
server for its whole life until the container is rebuilt.

The fix for this Docker case is to remove any Docker containers seeing this
issue (typically all containers created before a certain point in time).
After this, when you start your server via JupyterHub, it will build a
new container. If this was the underlying cause of the issue, you should see
your server again.
##### Proxy settings (403 GET)
When your whole JupyterHub sits behind an organization proxy (_not_ a reverse proxy like NGINX as part of your setup and _not_ the configurable-http-proxy) the environment variables `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy`, and `https_proxy` might be set. This confuses the JupyterHub single-user servers: When connecting to the Hub for authorization they connect via the proxy instead of directly connecting to the Hub on localhost. The proxy might deny the request (403 GET). This results in the single-user server thinking it has the wrong auth token. To circumvent this you should add `<hub_url>,<hub_ip>,localhost,127.0.0.1` to the environment variables `NO_PROXY` and `no_proxy`.
### Launching Jupyter Notebooks to run as an externally managed JupyterHub service with the `jupyterhub-singleuser` command returns a `JUPYTERHUB_API_TOKEN` error
{ref}`services` allow processes to interact with JupyterHub's REST API. Example use-cases include:
- **Secure Testing**: provide a canonical Jupyter Notebook for testing production data to reduce the number of entry points into production systems.
- **Grading Assignments**: provide access to shared Jupyter Notebooks that may be used for management tasks such as grading assignments.
- **Private Dashboards**: share dashboards with certain group members.
If possible, try to run the Jupyter Notebook as an externally managed service with one of the provided [jupyter/docker-stacks](https://github.com/jupyter/docker-stacks).
Standard JupyterHub installations include a [jupyterhub-singleuser](https://github.com/jupyterhub/jupyterhub/blob/9fdab027daa32c9017845572ad9d5ba1722dbc53/setup.py#L116) command which is built from the `jupyterhub.singleuser:main` method. The `jupyterhub-singleuser` command is the default command when JupyterHub launches single-user Jupyter Notebooks. One of the goals of this command is to make sure the version of JupyterHub installed within the Jupyter Notebook coincides with the version of the JupyterHub server itself.
If you launch a Jupyter Notebook with the `jupyterhub-singleuser` command directly from the command line, the Jupyter Notebook won't have access to the `JUPYTERHUB_API_TOKEN` and will return:
```
JUPYTERHUB_API_TOKEN env is required to run jupyterhub-singleuser.
Did you launch it manually?
```
If you plan on testing `jupyterhub-singleuser` independently from JupyterHub, then you can set the API token environment variable. For example, if you were to run the single-user Jupyter Notebook on the host, then:

```bash
export JUPYTERHUB_API_TOKEN=my_secret_token
jupyterhub-singleuser
```

With a docker container, pass in the environment variable with the run command:

```bash
docker run -d \
  -p 8888:8888 \
  -e JUPYTERHUB_API_TOKEN=my_secret_token \
  jupyter/datascience-notebook:latest
```
[This example](https://github.com/jupyterhub/jupyterhub/tree/HEAD/examples/service-notebook/external) demonstrates how to combine the use of the `jupyterhub-singleuser` environment variables when launching a Notebook as an externally managed service.
## How do I...?
Some certificate providers, e.g. Entrust, may provide you with a chained
certificate that contains multiple files. If you are using a chained
certificate you will need to concatenate the individual files.
The important thing is that JupyterLab is installed and enabled in the
single-user notebook server environment. For system users, this means
system-wide, as indicated above. For Docker containers, it means inside
the single-user docker image, etc.
### How do I set up JupyterHub for a workshop (when users are not known ahead of time)?

1. Set up JupyterHub using OAuthenticator for GitHub authentication
2. Configure the admin list to have workshop leaders listed with administrator privileges.

Users will need a GitHub account to log in and be authenticated by the Hub.
### How do I set up rotating daily logs?

You can do this with [logrotate](https://linux.die.net/man/8/logrotate),
or pipe to `logger` to use Syslog instead of directly to a file.

For example, with this logrotate config file:
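(A minimal sketch; the log path and rotation schedule are assumptions.)

```
/var/log/jupyterhub.log {
    copytruncate
    daily
}
```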
Or use Syslog:

```bash
jupyterhub | logger -t jupyterhub
```
## Troubleshooting commands
The following commands provide additional detail about installed packages,
versions, and system information that may be helpful when troubleshooting
a JupyterHub deployment. The commands are:
- System and deployment information
```bash
jupyter troubleshooting
```
- Kernel information
```bash
jupyter kernelspec list
```
- Debug logs when running JupyterHub
```bash
jupyterhub --debug
```
### Toree integration with HDFS rack awareness script

The Apache Toree kernel will have an issue when running with JupyterHub if the standard HDFS rack awareness script is used. This will materialize in the logs as a repeated WARN, with a Python error ending in:

```
SyntaxError: Missing parentheses in call to 'print'
```
In order to resolve this issue, there are two potential options.

1. Update HDFS core-site.xml, so the parameter "net.topology.script.file.name" points to a custom
   script (e.g. /etc/hadoop/conf/custom_topology_script.py). Copy the original script and change the first line to point
   to a Python 2 installation (e.g. /usr/bin/python).
2. In spark-env.sh add a Python 2 installation to your path (e.g. export PATH=/opt/anaconda2/bin:$PATH).
### Where do I find Docker images and Dockerfiles related to JupyterHub?

Docker images can be found at the [JupyterHub organization on Quay.io](https://quay.io/organization/jupyterhub).
The Docker image [jupyterhub/singleuser](https://quay.io/repository/jupyterhub/singleuser)
provides an example single-user notebook server for use with DockerSpawner.

Additional single-user notebook server images can be found at the [Jupyter
organization on Quay.io](https://quay.io/organization/jupyter) and information
about each image at the [jupyter/docker-stacks repo](https://github.com/jupyter/docker-stacks).
### How can I view the logs for JupyterHub or the user's Notebook servers when using the DockerSpawner?
Use `docker logs <container>` where `<container>` is the container name defined within `docker-compose.yml`. For example, to view the logs of the JupyterHub container use:

```bash
docker logs hub
```

By default, the user's notebook server is named `jupyter-<username>` where `username` is the user's username within JupyterHub's database.
So if you wanted to see the logs for user `foo` you would use:

```bash
docker logs jupyter-foo
```

You can also tail logs to view them in real-time using the `-f` option:

```bash
docker logs -f hub
```
- [Press release on Jupyter and Cori](http://www.nersc.gov/news-publications/nersc-news/nersc-center-news/2016/jupyter-notebooks-will-open-up-new-possibilities-on-nerscs-cori-supercomputer/)
- [Moving and sharing data](https://www.nersc.gov/assets/Uploads/03-MovingAndSharingData-Cholia.pdf)
- [Research IT](http://research-it.berkeley.edu)
- [JupyterHub server supports campus research computation](http://research-it.berkeley.edu/blog/17/01/24/free-fully-loaded-jupyterhub-server-supports-campus-research-computation)
### University of California Davis
- [Spinning up multiple Jupyter Notebooks on AWS for a tutorial](https://github.com/mblmicdiv/course2017/blob/master/exercises/sourmash-setup.md)
Although not technically a JupyterHub deployment, this tutorial setup
may be helpful to others in the Jupyter community.
Thank you C. Titus Brown for sharing this with the Software Carpentry
mailing list.
```
* I started a big Amazon machine;
* I installed Docker and built a custom image containing my software of
interest;
* I ran multiple containers, one connected to port 8000, one on 8001,
etc. and gave each student a different port;
* students could connect in and use the Terminal program in Jupyter to
execute commands, and could upload/download files via the Jupyter
console interface;
* in theory I could have used notebooks too, but for this I didn’t have
need.
I am aware that JupyterHub can probably do all of this including manage
the containers, but I’m still a bit shy of diving into that; this was
fairly straightforward, gave me disposable containers that were isolated
for each individual student, and worked almost flawlessly. Should be
easy to do with RStudio too.
```
### Cal Poly San Luis Obispo
- [jupyterhub-deploy-teaching](https://github.com/jupyterhub/jupyterhub-deploy-teaching) based on work by Brian Granger for Cal Poly's Data Science 301 Course
### Clemson University
- Advanced Computing
- [Palmetto cluster and JupyterHub](http://citi.sites.clemson.edu/2016/08/18/JupyterHub-for-Palmetto-Cluster.html)
### University of Colorado Boulder
- (CU Research Computing) CURC
- [JupyterHub User Guide](https://www.rc.colorado.edu/support/user-guide/jupyterhub.html)
- Slurm job dispatched on Crestone compute cluster
- log troubleshooting
- Profiles in IPython Clusters tab
- [Parallel Processing with JupyterHub tutorial](https://www.rc.colorado.edu/support/examples-and-tutorials/parallel-processing-with-jupyterhub.html)
- [Parallel Programming with JupyterHub document](https://www.rc.colorado.edu/book/export/html/833)
- Earth Lab at CU
- [Tutorial on Parallel R on JupyterHub](https://earthdatascience.org/tutorials/parallel-r-on-jupyterhub/)
### HTCondor
- [HTCondor Python Bindings Tutorial from HTCondor Week 2017 includes information on their JupyterHub tutorials](https://research.cs.wisc.edu/htcondor/HTCondorWeek2017/presentations/TueBockelman_Python.pdf)
### University of Illinois
- https://datascience.business.illinois.edu
### MIT and Lincoln Labs
### Michigan State University
- [Setting up JupyterHub](https://mediaspace.msu.edu/media/Setting+Up+Your+JupyterHub+Password/1_hgv13aag/11980471)
- [Using Tensorflow and JupyterHub in Classrooms](https://cloud.google.com/solutions/using-tensorflow-jupyterhub-classrooms)
- [using-tensorflow-and-jupyterhub blog post](https://opensource.googleblog.com/2016/10/using-tensorflow-and-jupyterhub.html)
### Everware
[Everware](https://github.com/everware) Reproducible and reusable science powered by jupyterhub and docker. Like nbviewer, but executable. CERN, Geneva [website](http://everware.xyz/)
Visit the [GitHub OAuthenticator reference](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.github.html) to see the full list of options for configuring GitHub OAuth with JupyterHub.
Only do this if you are very sure you must.
## Overview

There are many [Authenticators](authenticators) and [Spawners](spawners) available for JupyterHub. Some, such
as [DockerSpawner](https://github.com/jupyterhub/dockerspawner) or [OAuthenticator](https://github.com/jupyterhub/oauthenticator), do not need any elevated permissions. This
document describes how to get the full default behavior of JupyterHub while
running notebook servers as real system users on a shared system, without
running the Hub itself as root.

Since JupyterHub needs to spawn processes as other users, the simplest way
is to run it as root, spawning user servers with [setuid](https://linux.die.net/man/2/setuid).
But this isn't especially safe, because you have a process running on the
public web as root.
Next, you will need [sudospawner](https://github.com/jupyter/sudospawner)
to enable monitoring the single-user servers with sudo:

```bash
sudo python3 -m pip install sudospawner
```
Now we have to configure sudo to allow the Hub user (`rhea`) to launch
single-user servers on behalf of our users.

To do this we add to `/etc/sudoers` (use `visudo` for safe editing of sudoers):

- give `rhea` permission to run `JUPYTER_CMD` on behalf of `JUPYTER_USERS`
  without entering a password
For example:

```bash
# comma-separated list of users that can spawn single-user servers
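# (a sketch: the user names and sudospawner path below are examples)
Runas_Alias JUPYTER_USERS = rhea, zoe, wash

# the command the Hub can run on behalf of the above users
Cmnd_Alias JUPYTER_CMD = /usr/local/bin/sudospawner

# give rhea permission to run JUPYTER_CMD as JUPYTER_USERS without a password
rhea ALL=(JUPYTER_USERS) NOPASSWD:JUPYTER_CMD
```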
To deploy JupyterHub means you are providing Jupyter notebook environments for
multiple users. Often, this includes a desire to configure the user
environment in a custom way.
Since the `jupyterhub-singleuser` server extends the standard Jupyter notebook
server, most configuration and documentation that applies to Jupyter Notebook
applies to the single-user environments. Configuration of user environments
typically does not occur through JupyterHub itself, but rather through system-wide
configuration of Jupyter, which is inherited by `jupyterhub-singleuser`.
**Tip:** When searching for configuration tips for JupyterHub user environments, you might want to remove JupyterHub from your search because there are a lot more people out there configuring Jupyter than JupyterHub and the configuration is the same.
This section will focus on user environments, which includes the following:
- [Installing packages](#installing-packages)
- [Configuring Jupyter and IPython](#configuring-jupyter-and-ipython)
### Configuring Jupyter and IPython

Jupyter and [IPython](https://ipython.readthedocs.io/en/stable/development/config.html)
have their own configuration systems.
As a JupyterHub administrator, you will typically want to install and configure environments for all JupyterHub users. For example, let's say you wish for each student in a class to have the same user environment configuration.
Jupyter and IPython support **"system-wide"** locations for configuration, which is the logical place to put global configuration that you want to affect all users. It's generally more efficient to configure user environments "system-wide", and it's a good practice to avoid creating files in the users' home directories.
The typical locations for these config files are:
- **system-wide** in `/etc/{jupyter|ipython}`
- **env-wide** (environment wide) in `{sys.prefix}/etc/{jupyter|ipython}`.
### Jupyter environment configuration priority
When Jupyter runs in an environment (conda or virtualenv), it prefers to load configuration from the environment over each user's own configuration (e.g. in `~/.jupyter`).
This may cause issues if you use a _shared_ conda environment or virtualenv for users, because e.g. JupyterLab may try to write information like workspaces or settings to the environment instead of the user's own directory.
This could fail with something like `Permission denied: $PREFIX/etc/jupyter/lab`.
To avoid this issue, set `JUPYTER_PREFER_ENV_PATH=0` in the user environment:
```python
c.Spawner.environment.update(
    {
        "JUPYTER_PREFER_ENV_PATH": "0",
    }
)
```
which tells Jupyter to prefer _user_ configuration paths (e.g. in `~/.jupyter`) to configuration set in the environment.
### Example: Enable an extension system-wide
For example, to enable the `cython` IPython extension for all of your users, create the file `/etc/ipython/ipython_config.py`:
```python
c.InteractiveShellApp.extensions.append("cython")
```
### Example: Enable a Jupyter notebook configuration setting for all users
:::{note}
These examples configure the Jupyter ServerApp, which is used by JupyterLab, the default in JupyterHub 2.0.
If you are using the classic Jupyter Notebook server,
the same things should work,
with the following substitutions:

- Search for `jupyter_server_config`, and replace with `jupyter_notebook_config`
- Search for `ServerApp`, and replace with `NotebookApp`
:::
To enable Jupyter notebook's internal idle-shutdown behavior (requires notebook ≥ 5.4), set the following in the `/etc/jupyter/jupyter_server_config.py` file:
```python
# shutdown the server after no activity for an hour
c.ServerApp.shutdown_no_activity_timeout = 60 * 60
# shutdown kernels after no activity for 20 minutes
c.MappingKernelManager.cull_idle_timeout = 20 * 60
# check for idle kernels every two minutes
c.MappingKernelManager.cull_interval = 2 * 60
```
## Installing kernelspecs
You may have multiple Jupyter kernels installed and want to make sure that they are available to all of your users. This means installing kernelspecs either system-wide (e.g. in /usr/local/) or in the `sys.prefix` of JupyterHub
itself.
Jupyter kernelspec installation is system-wide by default, but some kernels
may default to installing kernelspecs in your home directory. These will need
to be moved system-wide to ensure that they are accessible.
To see where your kernelspecs are, you can use the following command:
```bash
jupyter kernelspec list
```
### Example: Installing kernels system-wide

Let's assume that I have a Python 2 and a Python 3 environment that I want to make sure are available; I can install their specs **system-wide** (in `/usr/local`) using the following command:
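(A sketch with `ipykernel`; the interpreter paths are placeholders for your two environments.)

```bash
/path/to/python3 -m ipykernel install --prefix=/usr/local
/path/to/python2 -m ipykernel install --prefix=/usr/local
```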
There are two broad categories of user environments that depend on what
Spawner you choose:
- Multi-user hosts (shared system)
- Container-based
How you configure user environments for each category can differ a bit
depending on what Spawner you are using.
The first category is a **shared system (multi-user host)** where
each user has a JupyterHub account, a home directory as well as being
a real system user. In this example, shared configuration and installation
must be in a 'system-wide' location, such as `/etc/`, or `/usr/local`
or a custom prefix such as `/opt/conda`.
When JupyterHub uses **container-based** Spawners (e.g. KubeSpawner or
DockerSpawner), the 'system-wide' environment is really the container image used for users.
In both cases, you want to _avoid putting configuration in user home
directories_ because users can change those configuration settings. Also, home directories typically persist once they are created, thereby making it difficult for admins to update later.
## Named servers
By default, in a JupyterHub deployment, each user has one server only.
JupyterHub can, however, have multiple servers per user.
This is mostly useful in deployments where users can configure the environment in which their server will start (e.g. resource requests on an HPC cluster), so that a given user can have multiple configurations running at the same time, without having to stop and restart their own server.
To allow named servers, include this code snippet in your config file:
```python
c.JupyterHub.allow_named_servers = True
```
Named servers were implemented in the REST API in JupyterHub 0.8,
and JupyterHub 1.0 introduces UI for managing named servers via the user home page:

as well as the admin page:

Named servers can be accessed, created, started, stopped, and deleted
from these pages. Activity tracking is now per server as well.
To limit the number of **named servers** per user to a constant value, include this code snippet in your config file:
```python
c.JupyterHub.named_server_limit_per_user = 5
```
Alternatively, to use a callable/awaitable based on the handler object, include this code snippet in your config file:
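(A sketch; the limit of 5 is an example value.)

```python
def named_server_limit_per_user(handler):
    # no limit (0) for admins, up to 5 named servers for everyone else
    user = handler.current_user
    if user and user.admin:
        return 0
    return 5

c.JupyterHub.named_server_limit_per_user = named_server_limit_per_user
```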
This can be useful for quota service implementations. The example above limits the number of named servers for non-admin users only.
If `named_server_limit_per_user` is set to `0`, no limit is enforced.
When using named servers, Spawners may need additional configuration to take the `servername` into account. Whilst `KubeSpawner` takes the `servername` into account by default in [`pod_name_template`](https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html#kubespawner.KubeSpawner.pod_name_template), other Spawners may not. Check the documentation for the specific Spawner to see how singleuser servers are named; for example, in `DockerSpawner` this involves modifying the [`name_template`](https://jupyterhub-dockerspawner.readthedocs.io/en/latest/api/index.html) setting to include `servername`, e.g. `"{prefix}-{username}-{servername}"`.
(classic-notebook-ui)=
## Switching back to the classic notebook
By default, the single-user server launches JupyterLab,
which is based on [Jupyter Server][].
This is the default server when running JupyterHub ≥ 2.0.
To switch to using the legacy Jupyter Notebook server (notebook < 7.0), you can set the `JUPYTERHUB_SINGLEUSER_APP` environment variable.
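For example, via the Spawner environment (mirroring the earlier examples; `notebook` selects the legacy server):

```python
c.Spawner.environment.update(
    {
        "JUPYTERHUB_SINGLEUSER_APP": "notebook",
    }
)
```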
The _How-to_ guides provide practical step-by-step details to help you achieve a particular goal. They are useful when you are trying to get something done but require you to understand and adapt the steps to your specific use case.
Use the following guides when:
```{toctree}
:maxdepth: 1
api-only
proxy
rest
separate-proxy
templates
upgrading
log-messages
```
(config-examples)=
## Configuration
The following guides provide examples, including configuration files and tips, for the
`delete_route` should still succeed, but a warning may be issued.

```python
async def delete_route(self, routespec):
    """Delete the route"""
```

### Retrieving routes

For retrieval, you only _need_ to implement a single method that retrieves all
routes. The return value for this function should be a dictionary, keyed by
`routespec`, of dicts whose keys are the same three arguments passed to
`add_route` (`routespec`, `target`, `data`)

```python
async def get_all_routes(self):
    """Return all routes, keyed by routespec"""
```
Otherwise, activity will not be
tracked, and services such as cull-idle will not work.

Now that `notebook-5.0` tracks activity internally, we can retrieve activity
information from the single-user servers instead, removing the need to track
activity in the proxy. But this is not yet implemented in JupyterHub 0.8.0.
### Registering custom Proxies via entry points
As of JupyterHub 1.0, custom proxy implementations can register themselves via
the `jupyterhub.proxies` entry point metadata.
To do this, in your `setup.py` add:
```python
setup(
...
entry_points={
'jupyterhub.proxies': [
'mything = mypackage:MyProxy',
],
},
)
```
If you have added this metadata to your package,
admins can select your proxy with the configuration:
```python
c.JupyterHub.proxy_class = 'mything'
```
instead of the full
```python
c.JupyterHub.proxy_class = 'mypackage:MyProxy'
```
as previously required.
Additionally, configurable attributes for your proxy will
appear in jupyterhub help output and auto-generated configuration files
via `jupyterhub --generate-config`.
### Index of proxies
A list of the proxies that are currently available for JupyterHub (that we know about).
1. [`jupyterhub/configurable-http-proxy`](https://github.com/jupyterhub/configurable-http-proxy) The default proxy which uses node-http-proxy
2. [`jupyterhub/traefik-proxy`](https://github.com/jupyterhub/traefik-proxy) The proxy which configures traefik proxy server for jupyterhub
3. [`AbdealiJK/configurable-http-proxy`](https://github.com/AbdealiJK/configurable-http-proxy) A pure python implementation of the configurable-http-proxy
- Making an API request programmatically using the requests library
- Paginating API requests
- Enabling users to spawn multiple named-servers via the API
- Learn more about JupyterHub's API
Before we discuss JupyterHub's REST API, you can learn about [REST APIs here](https://en.wikipedia.org/wiki/Representational_state_transfer). A REST
API provides a standard way for users to get and send information to the
Hub.
## What you can do with the API
Using the [JupyterHub REST API](jupyterhub-rest-api), you can perform actions on the Hub,
such as:
- Checking which users are active
- Adding or removing users
- Stopping or starting single user notebook servers
- Authenticating services
- Communicating with an individual Jupyter server's REST API
## Create an API token
To send requests using the JupyterHub API, you must pass an API token with
the request.
While JupyterHub is running, any JupyterHub user can request a token via the `token` page.
This is accessible via a `token` link in the top nav bar from the JupyterHub home page.
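For example, a minimal authenticated request with the `requests` library (the Hub URL and token are placeholders):

```python
import requests

api_url = "https://jupyterhub.example/hub/api"  # placeholder Hub API URL
token = "abc123"  # placeholder API token from the token page

# list the Hub's users (requires appropriate permissions)
r = requests.get(
    f"{api_url}/users",
    headers={"Authorization": f"token {token}"},
)
r.raise_for_status()
print(r.json())
```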