Merge branch 'main' into copyediting

This commit is contained in:
Min RK
2022-11-24 09:40:21 +01:00
committed by GitHub
328 changed files with 35662 additions and 7475 deletions

View File

@@ -1,40 +1,51 @@
# Authentication and User Basics
The default Authenticator uses [PAM][] to authenticate system users with
The default Authenticator uses [PAM][] (Pluggable Authentication Module) to authenticate system users with
their username and password. With the default Authenticator, any user
with an account and password on the system will be allowed to login.
## Create a whitelist of users
You can restrict which users are allowed to login with a whitelist,
`Authenticator.whitelist`:
## Create a set of allowed users (`allowed_users`)
You can restrict which users are allowed to login with a set,
`Authenticator.allowed_users`:
```python
c.Authenticator.whitelist = {'mal', 'zoe', 'inara', 'kaylee'}
c.Authenticator.allowed_users = {'mal', 'zoe', 'inara', 'kaylee'}
```
Users in the whitelist are added to the Hub database when the Hub is
Users in the `allowed_users` set are added to the Hub database when the Hub is
started.
```{warning}
If this configuration value is not set, then **all authenticated users will be allowed into your hub**.
```
## Configure admins (`admin_users`)
```{note}
As of JupyterHub 2.0, the full permissions of `admin_users`
should not be required.
Instead, you can assign [roles](define-role-target) to users or groups
with only the scopes they require.
```
Admin users of JupyterHub, `admin_users`, can add and remove users from
the user `whitelist`. `admin_users` can take actions on other users'
the user `allowed_users` set. `admin_users` can take actions on other users'
behalf, such as stopping and restarting their servers.
A set of initial admin users, `admin_users` can configured be as follows:
A set of initial admin users, `admin_users` can be configured as follows:
```python
c.Authenticator.admin_users = {'mal', 'zoe'}
```
Users in the admin list are automatically added to the user `whitelist`,
Users in the admin set are automatically added to the user `allowed_users` set,
if they are not already present.
Each authenticator may have different ways of determining whether a user is an
administrator. By default JupyterHub use the PAMAuthenticator which provide the
`admin_groups` option and can determine administrator status base on a user
groups. For example we can let any users in the `wheel` group be admin:
Each Authenticator may have different ways of determining whether a user is an
administrator. By default, JupyterHub uses the PAMAuthenticator which provides the
`admin_groups` option and can set administrator status based on a user
group. For example, we can let any user in the `wheel` group be an admin:
```python
c.PAMAuthenticator.admin_groups = {'wheel'}
@@ -42,35 +53,35 @@ c.PAMAuthenticator.admin_groups = {'wheel'}
## Give admin access to other users' notebook servers (`admin_access`)
Since the default `JupyterHub.admin_access` setting is False, the admins
Since the default `JupyterHub.admin_access` setting is `False`, the admins
do not have permission to log in to the single user notebook servers
owned by *other users*. If `JupyterHub.admin_access` is set to True,
then admins have permission to log in *as other users* on their
respective machines, for debugging. **As a courtesy, you should make
owned by _other users_. If `JupyterHub.admin_access` is set to `True`,
then admins have permission to log in _as other users_ on their
respective machines for debugging. **As a courtesy, you should make
sure your users know if admin_access is enabled.**
## Add or remove users from the Hub
Users can be added to and removed from the Hub via either the admin
Users can be added to and removed from the Hub via the admin
panel or the REST API. When a user is **added**, the user will be
automatically added to the whitelist and database. Restarting the Hub
will not require manually updating the whitelist in your config file,
automatically added to the `allowed_users` set and database. Restarting the Hub
will not require manually updating the `allowed_users` set in your config file,
as the users will be loaded from the database.
After starting the Hub once, it is not sufficient to **remove** a user
from the whitelist in your config file. You must also remove the user
from the allowed users set in your config file. You must also remove the user
from the Hub's database, either by deleting the user from JupyterHub's
admin page, or you can clear the `jupyterhub.sqlite` database and start
fresh.
## Use LocalAuthenticator to create system users
The `LocalAuthenticator` is a special kind of authenticator that has
The `LocalAuthenticator` is a special kind of Authenticator that has
the ability to manage users on the local system. When you try to add a
new user to the Hub, a `LocalAuthenticator` will check if the user
already exists. If you set the configuration value, `create_system_users`,
to `True` in the configuration file, the `LocalAuthenticator` has
the privileges to add users to the system. The setting in the config
the ability to add users to the system. The setting in the config
file is:
```python
@@ -80,7 +91,7 @@ c.LocalAuthenticator.create_system_users = True
Adding a user to the Hub that doesn't already exist on the system will
result in the Hub creating that user via the system `adduser` command
line tool. This option is typically used on hosted deployments of
JupyterHub, to avoid the need to manually create all your users before
JupyterHub to avoid the need to manually create all your users before
launching the service. This approach is not recommended when running
JupyterHub in situations where JupyterHub users map directly onto the
system's UNIX users.
@@ -90,27 +101,25 @@ system's UNIX users.
JupyterHub's [OAuthenticator][] currently supports the following
popular services:
- Auth0
- Bitbucket
- CILogon
- GitHub
- GitLab
- Globus
- Google
- MediaWiki
- Okpy
- OpenShift
- [Auth0](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.auth0.html#module-oauthenticator.auth0)
- [Azure AD](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.azuread.html#module-oauthenticator.azuread)
- [Bitbucket](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.bitbucket.html#module-oauthenticator.bitbucket)
- [CILogon](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.cilogon.html#module-oauthenticator.cilogon)
- [GitHub](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.github.html#module-oauthenticator.github)
- [GitLab](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.gitlab.html#module-oauthenticator.gitlab)
- [Globus](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.globus.html#module-oauthenticator.globus)
- [Google](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.google.html#module-oauthenticator.google)
- [MediaWiki](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.mediawiki.html#module-oauthenticator.mediawiki)
- [Okpy](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.okpy.html#module-oauthenticator.okpy)
- [OpenShift](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.openshift.html#module-oauthenticator.openshift)
NOTE: Open issue asking for more details on this generic implementation.
It's not clear if this is a different implementation or if the JupyterHub OAuth
_is_ the generic implementation.
A generic implementation, which you can use for OAuth authentication
A [generic implementation](https://oauthenticator.readthedocs.io/en/latest/api/gen/oauthenticator.generic.html#module-oauthenticator.generic), which you can use for OAuth authentication
with any provider, is also available.
## Use DummyAuthenticator for testing
The :class:`~jupyterhub.auth.DummyAuthenticator` is a simple authenticator that
allows for any username/password unless if a global password has been set. If
The `DummyAuthenticator` is a simple Authenticator that
allows for any username or password unless a global password has been set. If
set, it will allow for any username as long as the correct password is provided.
To set a global password, add this to the config file:
@@ -118,5 +127,5 @@ To set a global password, add this to the config file:
c.DummyAuthenticator.password = "some_password"
```
[PAM]: https://en.wikipedia.org/wiki/Pluggable_authentication_module
[OAuthenticator]: https://github.com/jupyterhub/oauthenticator
[pam]: https://en.wikipedia.org/wiki/Pluggable_authentication_module
[oauthenticator]: https://github.com/jupyterhub/oauthenticator

View File

@@ -1,6 +1,6 @@
# Configuration Basics
The section contains basic information about configuring settings for a JupyterHub
This section contains basic information about configuring settings for a JupyterHub
deployment. The [Technical Reference](../reference/index)
documentation provides additional details.
@@ -44,30 +44,30 @@ jupyterhub -f /etc/jupyterhub/jupyterhub_config.py
```
The IPython documentation provides additional information on the
[config system](http://ipython.readthedocs.io/en/stable/development/config)
[config system](http://ipython.readthedocs.io/en/stable/development/config.html)
that Jupyter uses.
## Configure using command line options
To display all command line options that are available for configuration:
To display all command line options that are available for configuration run the following command:
```bash
jupyterhub --help-all
```
Configuration using the command line options is done when launching JupyterHub.
For example, to start JupyterHub on ``10.0.1.2:443`` with https, you
For example, to start JupyterHub on `10.0.1.2:443` with https, you
would enter:
```bash
jupyterhub --ip 10.0.1.2 --port 443 --ssl-key my_ssl.key --ssl-cert my_ssl.cert
```
All configurable options may technically be set on the command-line,
All configurable options may technically be set on the command line,
though some are inconvenient to type. To set a particular configuration
parameter, `c.Class.trait`, you would use the command line option,
`--Class.trait`, when starting JupyterHub. For example, to configure the
`c.Spawner.notebook_dir` trait from the command-line, use the
`c.Spawner.notebook_dir` trait from the command line, use the
`--Spawner.notebook_dir` option:
```bash
@@ -77,24 +77,24 @@ jupyterhub --Spawner.notebook_dir='~/assignments'
## Configure for various deployment environments
The default authentication and process spawning mechanisms can be replaced, and
specific [authenticators](./authenticators-users-basics) and
[spawners](./spawners-basics) can be set in the configuration file.
specific [authenticators](authenticators-users-basics) and
[spawners](spawners-basics) can be set in the configuration file.
This enables JupyterHub to be used with a variety of authentication methods or
process control and deployment environments. [Some examples](../reference/config-examples),
meant as illustration, are:
meant as illustrations, are:
- Using GitHub OAuth instead of PAM with [OAuthenticator](https://github.com/jupyterhub/oauthenticator)
- Spawning single-user servers with Docker, using the [DockerSpawner](https://github.com/jupyterhub/dockerspawner)
## Run the proxy separately
This is *not* strictly necessary, but useful in many cases. If you
use a custom proxy (e.g. Traefik), this also not needed.
This is _not_ strictly necessary, but useful in many cases. If you
use a custom proxy (e.g. Traefik), this is also not needed.
Connections to user servers go through the proxy, and *not* the hub
itself. If the proxy stays running when the hub restarts (for
maintenance, re-configuration, etc.), then user connections are not
interrupted. For simplicity, by default the hub starts the proxy
automatically, so if the hub restarts, the proxy restarts, and user
connections are interrupted. It is easy to run the proxy separately,
connections are interrupted. It is easy to run the proxy separately,
for information see [the separate proxy page](../reference/separate-proxy).

View File

@@ -0,0 +1,35 @@
# Frequently asked questions
## How do I share links to notebooks?
In short, where you see `/user/name/notebooks/foo.ipynb` use `/hub/user-redirect/notebooks/foo.ipynb` (replace `/user/name` with `/hub/user-redirect`).
Sharing links to notebooks is a common activity,
and can look different based on what you mean.
Your first instinct might be to copy the URL you see in the browser,
e.g. `hub.jupyter.org/user/yourname/notebooks/coolthing.ipynb`.
However, let's break down what this URL means:
`hub.jupyter.org/user/yourname/` is the URL prefix handled by _your server_,
which means that sharing this URL is asking the person you share the link with
to come to _your server_ and look at the exact same file.
In most circumstances, this is forbidden by permissions because the person you share with does not have access to your server.
What actually happens when someone visits this URL will depend on whether your server is running and other factors.
But what is our actual goal?
A typical situation is that you have some shared or common filesystem,
such that the same path corresponds to the same document
(either the exact same document or another copy of it).
Typically, what folks want when they do sharing like this
is for each visitor to open the same file _on their own server_,
so Breq would open `/user/breq/notebooks/foo.ipynb` and
Seivarden would open `/user/seivarden/notebooks/foo.ipynb`, etc.
JupyterHub has a special URL that does exactly this!
It's called `/hub/user-redirect/...`.
So if you replace `/user/yourname` in your URL bar
with `/hub/user-redirect` any visitor should get the same
URL on their own server, rather than visiting yours.
In JupyterLab 2.0, this should also be the result of the "Copy Shareable Link"
action in the file browser.

View File

@@ -1,5 +1,10 @@
Getting Started
===============
Get Started
===========
This section covers how to configure and customize JupyterHub for your
needs. It contains information about authentication, networking, security, and
other topics that are relevant to individuals or organizations deploying their
own JupyterHub.
.. toctree::
:maxdepth: 2
@@ -10,3 +15,5 @@ Getting Started
authenticators-users-basics
spawners-basics
services-basics
faq
institutional-faq

View File

@@ -0,0 +1,260 @@
# Institutional FAQ
This page contains common questions from users of JupyterHub,
broken down by their roles within organizations.
## For all
### Is it appropriate for adoption within a larger institutional context?
Yes! JupyterHub has been used at-scale for large pools of users, as well
as complex and high-performance computing. For example, UC Berkeley uses
JupyterHub for its Data Science Education Program courses (serving over
3,000 students). The Pangeo project uses JupyterHub to provide access
to scalable cloud computing with Dask. JupyterHub is stable and customizable
to the use-cases of large organizations.
### I keep hearing about Jupyter Notebook, JupyterLab, and now JupyterHub. Whats the difference?
Here is a quick breakdown of these three tools:
- **The Jupyter Notebook** is a document specification (the `.ipynb`) file that interweaves
narrative text with code cells and their outputs. It is also a graphical interface
that allows users to edit these documents. There are also several other graphical interfaces
that allow users to edit the `.ipynb` format (nteract, Jupyter Lab, Google Colab, Kaggle, etc).
- **JupyterLab** is a flexible and extendible user interface for interactive computing. It
has several extensions that are tailored for using Jupyter Notebooks, as well as extensions
for other parts of the data science stack.
- **JupyterHub** is an application that manages interactive computing sessions for **multiple users**.
It also connects them with infrastructure those users wish to access. It can provide
remote access to Jupyter Notebooks and JupyterLab for many people.
## For management
### Briefly, what problem does JupyterHub solve for us?
JupyterHub provides a shared platform for data science and collaboration.
It allows users to utilize familiar data science workflows (such as the scientific Python stack,
the R tidyverse, and Jupyter Notebooks) on institutional infrastructure. It also allows administrators
some control over access to resources, security, environments, and authentication.
### Is JupyterHub mature? Why should we trust it?
Yes - the core JupyterHub application recently
reached 1.0 status, and is considered stable and performant for most institutions.
JupyterHub has also been deployed (along with other tools) to work on
scalable infrastructure, large datasets, and high-performance computing.
### Who else uses JupyterHub?
JupyterHub is used at a variety of institutions in academia,
industry, and government research labs. It is most-commonly used by two kinds of groups:
- Small teams (e.g., data science teams, research labs, or collaborative projects) to provide a
shared resource for interactive computing, collaboration, and analytics.
- Large teams (e.g., a department, a large class, or a large group of remote users) to provide
access to organizational hardware, data, and analytics environments at scale.
Here is a sample of organizations that use JupyterHub:
- **Universities and colleges**: UC Berkeley, UC San Diego, Cal Poly SLO, Harvard University, University of Chicago,
University of Oslo, University of Sheffield, Université Paris Sud, University of Versailles
- **Research laboratories**: NASA, NCAR, NOAA, the Large Synoptic Survey Telescope, Brookhaven National Lab,
Minnesota Supercomputing Institute, ALCF, CERN, Lawrence Livermore National Laboratory
- **Online communities**: Pangeo, Quantopian, mybinder.org, MathHub, Open Humans
- **Computing infrastructure providers**: NERSC, San Diego Supercomputing Center, Compute Canada
- **Companies**: Capital One, SANDVIK code, Globus
See the [Gallery of JupyterHub deployments](../gallery-jhub-deployments.md) for
a more complete list of JupyterHub deployments at institutions.
### How does JupyterHub compare with hosted products, like Google Colaboratory, RStudio.cloud, or Anaconda Enterprise?
JupyterHub puts you in control of your data, infrastructure, and coding environment.
In addition, it is vendor neutral, which reduces lock-in to a particular vendor or service.
JupyterHub provides access to interactive computing environments in the cloud (similar to each of these services).
Compared with the tools above, it is more flexible, more customizable, free, and
gives administrators more control over their setup and hardware.
Because JupyterHub is an open-source, community-driven tool, it can be extended and
modified to fit an institution's needs. It plays nicely with the open source data science
stack, and can serve a variety of computing environments, user interfaces, and
computational hardware. It can also be deployed anywhere - on enterprise cloud infrastructure, on
High-Performance-Computing machines, on local hardware, or even on a single laptop, which
is not possible with most other tools for shared interactive computing.
## For IT
### How would I set up JupyterHub on institutional hardware?
That depends on what kind of hardware you've got. JupyterHub is flexible enough to be deployed
on a variety of hardware, including in-room hardware, on-prem clusters, cloud infrastructure,
etc.
The most common way to set up a JupyterHub is to use a JupyterHub distribution, these are pre-configured
and opinionated ways to set up a JupyterHub on particular kinds of infrastructure. The two distributions
that we currently suggest are:
- [Zero to JupyterHub for Kubernetes](https://z2jh.jupyter.org) is a scalable JupyterHub deployment and
guide that runs on Kubernetes. Better for larger or dynamic user groups (50-10,000) or more complex
compute/data needs.
- [The Littlest JupyterHub](https://tljh.jupyter.org) is a lightweight JupyterHub that runs on a single
single machine (in the cloud or under your desk). Better for smaller user groups (4-80) or more
lightweight computational resources.
### Does JupyterHub run well in the cloud?
Yes - most deployments of JupyterHub are run via cloud infrastructure and on a variety of cloud providers.
Depending on the distribution of JupyterHub that you'd like to use, you can also connect your JupyterHub
deployment with a number of other cloud-native services so that users have access to other resources from
their interactive computing sessions.
For example, if you use the [Zero to JupyterHub for Kubernetes](https://z2jh.jupyter.org) distribution,
you'll be able to utilize container-based workflows of other technologies such as the [dask-kubernetes](https://kubernetes.dask.org/en/latest/)
project for distributed computing.
The Z2JH Helm Chart also has some functionality built in for auto-scaling your cluster up and down
as more resources are needed - allowing you to utilize the benefits of a flexible cloud-based deployment.
### Is JupyterHub secure?
The short answer: yes. JupyterHub as a standalone application has been battle-tested at an institutional
level for several years, and makes a number of "default" security decisions that are reasonable for most
users.
- For security considerations in the base JupyterHub application,
[see the JupyterHub security page](https://jupyterhub.readthedocs.io/en/stable/reference/websecurity.html).
- For security considerations when deploying JupyterHub on Kubernetes, see the
[JupyterHub on Kubernetes security page](https://zero-to-jupyterhub.readthedocs.io/en/latest/security.html).
The longer answer: it depends on your deployment. Because JupyterHub is very flexible, it can be used
in a variety of deployment setups. This often entails connecting your JupyterHub to **other** infrastructure
(such as a [Dask Gateway service](https://gateway.dask.org/)). There are many security decisions to be made
in these cases, and the security of your JupyterHub deployment will often depend on these decisions.
If you are worried about security, don't hesitate to reach out to the JupyterHub community in the
[Jupyter Community Forum](https://discourse.jupyter.org/c/jupyterhub). This community of practice has many
individuals with experience running secure JupyterHub deployments.
### Does JupyterHub provide computing or data infrastructure?
No - JupyterHub manages user sessions and can _control_ computing infrastructure, but it does not provide these
things itself. You are expected to run JupyterHub on your own infrastructure (local or in the cloud). Moreover,
JupyterHub has no internal concept of "data", but is designed to be able to communicate with data repositories
(again, either locally or remotely) for use within interactive computing sessions.
### How do I manage users?
JupyterHub offers a few options for managing your users. Upon setting up a JupyterHub, you can choose what
kind of **authentication** you'd like to use. For example, you can have users sign up with an institutional
email address, or choose a username / password when they first log-in, or offload authentication onto
another service such as an organization's OAuth.
The users of a JupyterHub are stored locally, and can be modified manually by an administrator of the JupyterHub.
Moreover, the _active_ users on a JupyterHub can be found on the administrator's page. This page
gives you the abiltiy to stop or restart kernels, inspect user filesystems, and even take over user
sessions to assist them with debugging.
### How do I manage software environments?
A key benefit of JupyterHub is the ability for an administrator to define the environment(s) that users
have access to. There are many ways to do this, depending on what kind of infrastructure you're using for
your JupyterHub.
For example, **The Littlest JupyterHub** runs on a single VM. In this case, the administrator defines
an environment by installing packages to a shared folder that exists on the path of all users. The
**JupyterHub for Kubernetes** deployment uses Docker images to define environments. You can create your
own list of Docker images that users can select from, and can also control things like the amount of
RAM available to users, or the types of machines that their sessions will use in the cloud.
### How does JupyterHub manage computational resources?
For interactive computing sessions, JupyterHub controls computational resources via a **spawner**.
Spawners define how a new user session is created, and are customized for particular kinds of
infrastructure. For example, the KubeSpawner knows how to control a Kubernetes deployment
to create new pods when users log in.
For more sophisticated computational resources (like distributed computing), JupyterHub can
connect with other infrastructure tools (like Dask or Spark). This allows users to control
scalable or high-performance resources from within their JupyterHub sessions. The logic of
how those resources are controlled is taken care of by the non-JupyterHub application.
### Can JupyterHub be used with my high-performance computing resources?
Yes - JupyterHub can provide access to many kinds of computing infrastructure.
Especially when combined with other open-source schedulers such as Dask, you can manage fairly
complex computing infrastructures from the interactive sessions of a JupyterHub. For example
[see the Dask HPC page](https://docs.dask.org/en/latest/setup/hpc.html).
### How much resources do user sessions take?
This is highly configurable by the administrator. If you wish for your users to have simple
data analytics environments for prototyping and light data exploring, you can restrict their
memory and CPU based on the resources that you have available. If you'd like your JupyterHub
to serve as a gateway to high-performance compute or data resources, you may increase the
resources available on user machines, or connect them with computing infrastructures elsewhere.
### Can I customize the look and feel of a JupyterHub?
JupyterHub provides some customization of the graphics displayed to users. The most common
modification is to add custom branding to the JupyterHub login page, loading pages, and
various elements that persist across all pages (such as headers).
## For Technical Leads
### Will JupyterHub “just work” with our team's interactive computing setup?
Depending on the complexity of your setup, you'll have different experiences with "out of the box"
distributions of JupyterHub. If all of the resources you need will fit on a single VM, then
[The Littlest JupyterHub](https://tljh.jupyter.org) should get you up-and-running within
a half day or so. For more complex setups, such as scalable Kubernetes clusters or access
to high-performance computing and data, it will require more time and expertise with
the technologies your JupyterHub will use (e.g., dev-ops knowledge with cloud computing).
In general, the base JupyterHub deployment is not the bottleneck for setup, it is connecting
your JupyterHub with the various services and tools that you wish to provide to your users.
### How well does JupyterHub scale? What are JupyterHub's limitations?
JupyterHub works well at both a small scale (e.g., a single VM or machine) as well as a
high scale (e.g., a scalable Kubernetes cluster). It can be used for teams as small as 2, and
for user bases as large as 10,000. The scalability of JupyterHub largely depends on the
infrastructure on which it is deployed. JupyterHub has been designed to be lightweight and
flexible, so you can tailor your JupyterHub deployment to your needs.
### Is JupyterHub resilient? What happens when a machine goes down?
For JupyterHubs that are deployed in a containerized environment (e.g., Kubernetes), it is
possible to configure the JupyterHub to be fairly resistant to failures in the system.
For example, if JupyterHub fails, then user sessions will not be affected (though new
users will not be able to log in). When a JupyterHub process is restarted, it should
seamlessly connect with the user database and the system will return to normal.
Again, the details of your JupyterHub deployment (e.g., whether it's deployed on a scalable cluster)
will affect the resiliency of the deployment.
### What interfaces does JupyterHub support?
Out of the box, JupyterHub supports a variety of popular data science interfaces for user sessions,
such as JupyterLab, Jupyter Notebooks, and RStudio. Any interface that can be served
via a web address can be served with a JupyterHub (with the right setup).
### Does JupyterHub make it easier for our team to collaborate?
JupyterHub provides a standardized environment and access to shared resources for your teams.
This greatly reduces the cost associated with sharing analyses and content with other team
members, and makes it easier to collaborate and build off of one another's ideas. Combined with
access to high-performance computing and data, JupyterHub provides a common resource to
amplify your team's ability to prototype their analyses, scale them to larger data, and then
share their results with one another.
JupyterHub also provides a computational framework to share computational narratives between
different levels of an organization. For example, data scientists can share Jupyter Notebooks
rendered as [Voilà dashboards](https://voila.readthedocs.io/en/stable/) with those who are not
familiar with programming, or create publicly-available interactive analyses to allow others to
interact with your work.
### Can I use JupyterHub with R/RStudio or other languages and environments?
Yes, Jupyter is a polyglot project, and there are over 40 community-provided kernels for a variety
of languages (the most common being Python, Julia, and R). You can also use a JupyterHub to provide
access to other interfaces, such as RStudio, that provide their own access to a language kernel.

View File

@@ -11,8 +11,8 @@ This section will help you with basic proxy and network configuration to:
The Proxy's main IP address setting determines where JupyterHub is available to users.
By default, JupyterHub is configured to be available on all network interfaces
(`''`) on port 8000. *Note*: Use of `'*'` is discouraged for IP configuration;
instead, use of `'0.0.0.0'` is preferred.
(`''`) on port 8000. _Note_: Use of `'*'` is discouraged for IP configuration;
instead, use of `'0.0.0.0'` is preferred.
Changing the Proxy's main IP address and port can be done with the following
JupyterHub **command line options**:
@@ -74,7 +74,7 @@ The Hub service listens only on `localhost` (port 8081) by default.
The Hub needs to be accessible from both the proxy and all Spawners.
When spawning local servers, an IP address setting of `localhost` is fine.
If *either* the Proxy *or* (more likely) the Spawners will be remote or
If _either_ the Proxy _or_ (more likely) the Spawners will be remote or
isolated in containers, the Hub must listen on an IP that is accessible.
```python
@@ -82,20 +82,20 @@ c.JupyterHub.hub_ip = '10.0.1.4'
c.JupyterHub.hub_port = 54321
```
**Added in 0.8:** The `c.JupyterHub.hub_connect_ip` setting is the ip address or
**Added in 0.8:** The `c.JupyterHub.hub_connect_ip` setting is the IP address or
hostname that other services should use to connect to the Hub. A common
configuration for, e.g. docker, is:
```python
c.JupyterHub.hub_ip = '0.0.0.0' # listen on all interfaces
c.JupyterHub.hub_connect_ip = '10.0.1.4' # ip as seen on the docker network. Can also be a hostname.
c.JupyterHub.hub_connect_ip = '10.0.1.4' # IP as seen on the docker network. Can also be a hostname.
```
## Adjusting the hub's URL
The hub will most commonly be running on a hostname of its own. If it
The hub will most commonly be running on a hostname of its own. If it
is not for example, if the hub is being reverse-proxied and being
exposed at a URL such as `https://proxy.example.org/jupyter/` then
you will need to tell JupyterHub the base URL of the service. In such
you will need to tell JupyterHub the base URL of the service. In such
a case, it is both necessary and sufficient to set
`c.JupyterHub.base_url = '/jupyter/'` in the configuration.

View File

@@ -5,17 +5,17 @@ Security settings
You should not run JupyterHub without SSL encryption on a public network.
Security is the most important aspect of configuring Jupyter. Three
configuration settings are the main aspects of security configuration:
Security is the most important aspect of configuring Jupyter.
Three (3) configuration settings are the main aspects of security configuration:
1. :ref:`SSL encryption <ssl-encryption>` (to enable HTTPS)
2. :ref:`Cookie secret <cookie-secret>` (a key for encrypting browser cookies)
3. Proxy :ref:`authentication token <authentication-token>` (used for the Hub and
other services to authenticate to the Proxy)
The Hub hashes all secrets (e.g., auth tokens) before storing them in its
The Hub hashes all secrets (e.g. auth tokens) before storing them in its
database. A loss of control over read-access to the database should have
minimal impact on your deployment; if your database has been compromised, it
minimal impact on your deployment. If your database has been compromised, it
is still a good idea to revoke existing tokens.
.. _ssl-encryption:
@@ -31,7 +31,7 @@ Using an SSL certificate
This will require you to obtain an official, trusted SSL certificate or create a
self-signed certificate. Once you have obtained and installed a key and
certificate you need to specify their locations in the ``jupyterhub_config.py``
certificate, you need to specify their locations in the ``jupyterhub_config.py``
configuration file as follows:
.. code-block:: python
@@ -72,7 +72,7 @@ would be the needed configuration:
If SSL termination happens outside of the Hub
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In certain cases, for example if the hub is running behind a reverse proxy, and
In certain cases, for example, if the hub is running behind a reverse proxy, and
`SSL termination is being provided by NGINX <https://www.nginx.com/resources/admin-guide/nginx-ssl-termination/>`_,
it is reasonable to run the hub without SSL.
@@ -80,12 +80,55 @@ To achieve this, remove ``c.JupyterHub.ssl_key`` and ``c.JupyterHub.ssl_cert``
from your configuration (setting them to ``None`` or an empty string does not
have the same effect, and will result in an error).
.. _authentication-token:
Proxy authentication token
--------------------------
The Hub authenticates its requests to the Proxy using a secret token that
the Hub and Proxy agree upon. Note that this applies to the default
``ConfigurableHTTPProxy`` implementation. Not all proxy implementations
use an auth token.
The value of this token should be a random string (for example, generated by
``openssl rand -hex 32``). You can store it in the configuration file or an
environment variable.
Generating and storing token in the configuration file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can set the value in the configuration file, ``jupyterhub_config.py``:
.. code-block:: python
c.ConfigurableHTTPProxy.api_token = 'abc123...' # any random string
Generating and storing as an environment variable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can pass this value of the proxy authentication token to the Hub and Proxy
using the ``CONFIGPROXY_AUTH_TOKEN`` environment variable:
.. code-block:: bash
export CONFIGPROXY_AUTH_TOKEN=$(openssl rand -hex 32)
This environment variable needs to be visible to the Hub and Proxy.
Default if token is not set
~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you do not set the Proxy authentication token, the Hub will generate a random
key itself. This means that any time you restart the Hub, you **must also
restart the Proxy**. If the proxy is a subprocess of the Hub, this should happen
automatically (this is the default configuration).
.. _cookie-secret:
Cookie secret
-------------
The cookie secret is an encryption key, used to encrypt the browser cookies
The cookie secret is an encryption key, used to encrypt the browser cookies,
which are used for authentication. Three common methods are described for
generating and configuring the cookie secret.
@@ -93,8 +136,8 @@ Generating and storing as a cookie secret file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The cookie secret should be 32 random bytes, encoded as hex, and is typically
stored in a ``jupyterhub_cookie_secret`` file. An example command to generate the
``jupyterhub_cookie_secret`` file is:
stored in a ``jupyterhub_cookie_secret`` file. Below, is an example command to generate the
``jupyterhub_cookie_secret`` file:
.. code-block:: bash
@@ -112,7 +155,7 @@ The location of the ``jupyterhub_cookie_secret`` file can be specified in the
If the cookie secret file doesn't exist when the Hub starts, a new cookie
secret is generated and stored in the file. The file must not be readable by
``group`` or ``other`` or the server won't start. The recommended permissions
``group`` or ``other``, otherwise the server won't start. The recommended permissions
for the cookie secret file are ``600`` (owner-only rw).
Generating and storing as an environment variable
@@ -133,54 +176,79 @@ the Hub starts.
Generating and storing as a binary string
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can also set the cookie secret in the configuration file
itself, ``jupyterhub_config.py``, as a binary string:
You can also set the cookie secret, as a binary string,
in the configuration file (``jupyterhub_config.py``) itself:
.. code-block:: python
c.JupyterHub.cookie_secret = bytes.fromhex('64 CHAR HEX STRING')
.. _cookies:
.. important::
Cookies used by JupyterHub authentication
-----------------------------------------
If the cookie secret value changes for the Hub, all single-user notebook
servers must also be restarted.
The following cookies are used by the Hub for handling user authentication.
This section was created based on this post_ from Discourse.
.. _authentication-token:
.. _post: https://discourse.jupyter.org/t/how-to-force-re-login-for-users/1998/6
Proxy authentication token
--------------------------
jupyterhub-hub-login
~~~~~~~~~~~~~~~~~~~~
The Hub authenticates its requests to the Proxy using a secret token that
the Hub and Proxy agree upon. The value of this string should be a random
string (for example, generated by ``openssl rand -hex 32``).
This is the login token used when visiting Hub-served pages that are
protected by authentication, such as the main home, the spawn form, etc.
If this cookie is set, then the user is logged in.
Generating and storing token in the configuration file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Resetting the Hub cookie secret effectively revokes this cookie.
Or you can set the value in the configuration file, ``jupyterhub_config.py``:
This cookie is restricted to the path ``/hub/``.
.. code-block:: python
jupyterhub-user-<username>
~~~~~~~~~~~~~~~~~~~~~~~~~~
c.JupyterHub.proxy_auth_token = '0bc02bede919e99a26de1e2a7a5aadfaf6228de836ec39a05a6c6942831d8fe5'
This is the cookie used for authenticating with a single-user server.
It is set by the single-user server, after OAuth with the Hub.
Generating and storing as an environment variable
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Effectively the same as ``jupyterhub-hub-login``, but for the
single-user server instead of the Hub. It contains an OAuth access token,
which is checked with the Hub to authenticate the browser.
You can pass this value of the proxy authentication token to the Hub and Proxy
using the ``CONFIGPROXY_AUTH_TOKEN`` environment variable:
Each OAuth access token is associated with a session id (see ``jupyterhub-session-id`` section
below).
.. code-block:: bash
To avoid hitting the Hub on every request, the authentication response is cached.
The cache key is comprised of both the token and session id, to avoid a stale cache.
export CONFIGPROXY_AUTH_TOKEN=$(openssl rand -hex 32)
Resetting the Hub cookie secret effectively revokes this cookie.
This environment variable needs to be visible to the Hub and Proxy.
This cookie is restricted to the path ``/user/<username>``,
to ensure that only the users server receives it.
Default if token is not set
~~~~~~~~~~~~~~~~~~~~~~~~~~~
jupyterhub-session-id
~~~~~~~~~~~~~~~~~~~~~
If you don't set the Proxy authentication token, the Hub will generate a random
key itself, which means that any time you restart the Hub you **must also
restart the Proxy**. If the proxy is a subprocess of the Hub, this should happen
automatically (this is the default configuration).
This is a random string, meaningless in itself, and the only cookie
shared by the Hub and single-user servers.
Its sole purpose is to coordinate logout of the multiple OAuth cookies.
This cookie is set to ``/`` so all endpoints can receive it, clear it, etc.
jupyterhub-user-<username>-oauth-state
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A short-lived cookie, used solely to store and validate OAuth state.
It is only set while OAuth between the single-user server and the Hub
is processing.
If you use your browser development tools, you should see this cookie
for a very brief moment before you are logged in,
with an expiration date shorter than ``jupyterhub-hub-login`` or
``jupyterhub-user-<username>``.
This cookie should not exist after you have successfully logged in.
This cookie is restricted to the path ``/user/<username>``, so that only
the users server receives it.

View File

@@ -2,10 +2,10 @@
When working with JupyterHub, a **Service** is defined as a process
that interacts with the Hub's REST API. A Service may perform a specific
or action or task. For example, shutting down individuals' single user
action or task. For example, shutting down individuals' single user
notebook servers that have been idle for some time is a good example of
a task that could be automated by a Service. Let's look at how the
[cull_idle_servers][] script can be used as a Service.
[jupyterhub_idle_culler][] script can be used as a Service.
## Real-world example to cull idle servers
@@ -15,16 +15,16 @@ document will:
- explain some basic information about API tokens
- clarify that API tokens can be used to authenticate to
single-user servers as of [version 0.8.0](../changelog)
- show how the [cull_idle_servers][] script can be:
- used in a Hub-managed service
- run as a standalone script
- show how the [jupyterhub_idle_culler][] script can be:
- used in a Hub-managed service
- run as a standalone script
Both examples for `cull_idle_servers` will communicate tasks to the
Both examples for `jupyterhub_idle_culler` will communicate tasks to the
Hub via the REST API.
## API Token basics
### Create an API token
### Step 1: Generate an API token
To run such an external service, an API token must be created and
provided to the service.
@@ -43,12 +43,12 @@ generating an API token is available from the JupyterHub user interface:
![API TOKEN success page](../images/token-request-success.png)
### Pass environment variable with token to the Hub
### Step 2: Pass environment variable with token to the Hub
In the case of `cull_idle_servers`, it is passed as the environment
variable called `JUPYTERHUB_API_TOKEN`.
### Use API tokens for services and tasks that require external access
### Step 3: Use API tokens for services and tasks that require external access
While API tokens are often associated with a specific user, API tokens
can be used by services that require external access for activities
@@ -62,7 +62,7 @@ c.JupyterHub.services = [
]
```
### Restart JupyterHub
### Step 4: Restart JupyterHub
Upon restarting JupyterHub, you should see a message like below in the
logs:
@@ -78,44 +78,72 @@ single-user servers, and only cookies can be used for authentication.
0.8 supports using JupyterHub API tokens to authenticate to single-user
servers.
## Configure `cull-idle` to run as a Hub-Managed Service
## How to configure the idle culler to run as a Hub-Managed Service
In `jupyterhub_config.py`, add the following dictionary for the
`cull-idle` Service to the `c.JupyterHub.services` list:
### Step 1: Install the idle culler:
```
pip install jupyterhub-idle-culler
```
### Step 2: In `jupyterhub_config.py`, add the following dictionary for the `idle-culler` Service to the `c.JupyterHub.services` list:
```python
c.JupyterHub.services = [
{
'name': 'cull-idle',
'admin': True,
'command': [sys.executable, 'cull_idle_servers.py', '--timeout=3600'],
'name': 'idle-culler',
'command': [sys.executable, '-m', 'jupyterhub_idle_culler', '--timeout=3600'],
}
]
c.JupyterHub.load_roles = [
{
"name": "list-and-cull", # name the role
"services": [
"idle-culler", # assign the service to this role
],
"scopes": [
# declare what permissions the service should have
"list:users", # list users
"read:users:activity", # read user last-activity
"admin:servers", # start/stop servers
],
}
]
```
where:
- `'admin': True` indicates that the Service has 'admin' permissions, and
- `'command'` indicates that the Service will be launched as a
- `command` indicates that the Service will be launched as a
subprocess, managed by the Hub.
## Run `cull-idle` manually as a standalone script
```{versionchanged} 2.0
Prior to 2.0, the idle-culler required 'admin' permissions.
It now needs the scopes:
Now you can run your script, i.e. `cull_idle_servers`, by providing it
- `list:users` to access the user list endpoint
- `read:users:activity` to read activity info
- `admin:servers` to start/stop servers
```
## How to run `cull-idle` manually as a standalone script
Now you can run your script by providing it
the API token and it will authenticate through the REST API to
interact with it.
This will run `cull-idle` manually. `cull-idle` can be run as a standalone
This will run the idle culler service manually. It can be run as a standalone
script anywhere with access to the Hub, and will periodically check for idle
servers and shut them down via the Hub's REST API. In order to shutdown the
servers, the token given to cull-idle must have admin privileges.
servers, the token given to `cull-idle` must have permission to list users
and admin their servers.
Generate an API token and store it in the `JUPYTERHUB_API_TOKEN` environment
variable. Run `cull_idle_servers.py` manually.
variable. Run `jupyterhub_idle_culler` manually.
```bash
export JUPYTERHUB_API_TOKEN='token'
python3 cull_idle_servers.py [--timeout=900] [--url=http://127.0.0.1:8081/hub/api]
python -m jupyterhub_idle_culler [--timeout=900] [--url=http://127.0.0.1:8081/hub/api]
```
[cull_idle_servers]: https://github.com/jupyterhub/jupyterhub/blob/master/examples/cull-idle/cull_idle_servers.py
[jupyterhub_idle_culler]: https://github.com/jupyterhub/jupyterhub-idle-culler

View File

@@ -1,12 +1,12 @@
# Spawners and single-user notebook servers
Since the single-user server is an instance of `jupyter notebook`, an entire separate
multi-process application, there are many aspect of that server can configure, and a lot of ways
to express that configuration.
A Spawner starts each single-user notebook server. Since the single-user server is an instance of `jupyter notebook`, an entire separate
multi-process application, many aspects of that server can be configured and there are a lot
of ways to express that configuration.
At the JupyterHub level, you can set some values on the Spawner. The simplest of these is
`Spawner.notebook_dir`, which lets you set the root directory for a user's server. This root
notebook directory is the highest level directory users will be able to access in the notebook
notebook directory is the highest-level directory users will be able to access in the notebook
dashboard. In this example, the root notebook directory is set to `~/notebooks`, where `~` is
expanded to the user's home directory.
@@ -14,13 +14,13 @@ expanded to the user's home directory.
c.Spawner.notebook_dir = '~/notebooks'
```
You can also specify extra command-line arguments to the notebook server with:
You can also specify extra command line arguments to the notebook server with:
```python
c.Spawner.args = ['--debug', '--profile=PHYS131']
```
This could be used to set the users default page for the single user server:
This could be used to set the user's default page for the single-user server:
```python
c.Spawner.args = ['--NotebookApp.default_url=/notebooks/Welcome.ipynb']