mirror of
https://github.com/jupyterhub/jupyterhub.git
synced 2025-10-16 14:33:00 +00:00
what-is-jupyterhub: Full revision
@@ -1,16 +1,22 @@
|
|||||||
# What is Jupyter and JupyterHub?
|
# What is Jupyter and JupyterHub?
|
||||||
|
|
||||||
JupyterHub is not what you think it is. Most things you think are
|
JupyterHub is not what you think it is. Most things you think are
|
||||||
part of JupyterHub are actually handled by some other component, and
|
part of JupyterHub are actually handled by some other component, for
|
||||||
it's not always obvious how the parts relate. This document was
|
example the spawner or notebook server itself, and it's not always
|
||||||
originally written to assist in debugging: very often, the actual
|
obvious how the parts relate. The knowledge contained here hasn't
|
||||||
problem is not where one thinks it is and thus people can't easily
|
been assembled in one place before, and is essential to understand
|
||||||
debug. In order to tell this story, we start at JupyterHub and go all
|
when setting up a sufficiently complex Jupyter(Hub) setup.
|
||||||
the way down to the fundamental components of Jupyter.
|
|
||||||
|
|
||||||
We occasionally leave things out or bend the truth where it helps in
|
This document was originally written to assist in debugging: very
|
||||||
explanation, and give our explanations in terms of Python even though
|
often, the actual problem is not where one thinks it is and thus
|
||||||
many other languages can be used instead.
|
people can't easily debug. In order to tell this story, we start at
|
||||||
|
JupyterHub and go all the way down to the fundamental components of
|
||||||
|
Jupyter.
|
||||||
|
|
||||||
|
In this document, we occasionally leave things out or bend the truth
|
||||||
|
where it helps in explanation, and give our explanations in terms of
|
||||||
|
Python even though Jupyter itself is language-neutral. The "(&)"
|
||||||
|
symbol highlights important points where there is more.
|
||||||
|
|
||||||
This guide is long, but after reading it you will know of all major
|
This guide is long, but after reading it you will know of all major
|
||||||
components in the Jupyter ecosystem and everything else you read
|
components in the Jupyter ecosystem and everything else you read
|
||||||
@@ -20,15 +26,15 @@ should make sense.
|
|||||||
|
|
||||||
## Just what is Jupyter?
|
## Just what is Jupyter?
|
||||||
|
|
||||||
Before we get too far, let's remember what our end goal is. A Jupyter
|
Before we get too far, let's remember what our end goal is. A
|
||||||
Notebook is really nothing more than a Python process (or some
|
**Jupyter Notebook** is really nothing more than a Python(&) process
|
||||||
language) which is getting commands from a web browser and displaying
|
which is getting commands from a web browser and displaying the output
|
||||||
the output via a browser. What the process actually sees can roughly
|
via that browser. What the process actually sees is roughly like
|
||||||
be considered getting data on standard input and writing to standard
|
getting commands on standard input(&) and writing to standard
|
||||||
output (*). There is nothing intrinsically special about this process
|
output(&). There is nothing intrinsically special about this process
|
||||||
- it can do anything a normal Python process can do, and nothing more.
|
- it can do anything a normal Python process can do, and nothing more.
|
||||||
The kernel handles capturing output and converting things like
|
The **Jupyter kernel** handles capturing output and converting things
|
||||||
graphics to a form usable by the browser.
|
such as graphics to a form usable by the browser.
|
||||||
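The "commands in, output out" idea can be sketched in a few lines of plain Python. This is only a toy model under stated assumptions: `run_cell` is a hypothetical helper invented for illustration, and a real Jupyter kernel does far more than capture printed text.

```python
import contextlib
import io


def run_cell(source, namespace):
    """Execute one cell of Python source and return whatever it printed."""
    captured = io.StringIO()
    # Redirect standard output so the printed text can be handed back to the caller.
    with contextlib.redirect_stdout(captured):
        exec(source, namespace)
    return captured.getvalue()


# One namespace is shared across cells, so state persists as it does in a notebook.
namespace = {}
run_cell("x = 21", namespace)
output = run_cell("print(x * 2)", namespace)
```

Here `output` holds the text that the second cell printed, which is roughly what ends up being displayed in the browser.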
|
|
||||||
Everything we explain below is building up to this, going through many
|
Everything we explain below is building up to this, going through many
|
||||||
different layers which give you many ways of customizing how this
|
different layers which give you many ways of customizing how this
|
||||||
@@ -39,36 +45,42 @@ process runs. But this process is not *too* special.
|
|||||||
## JupyterHub
|
## JupyterHub
|
||||||
|
|
||||||
**JupyterHub** is the central piece that provides multi-user
|
**JupyterHub** is the central piece that provides multi-user
|
||||||
login. Despite this, the end user only briefly interacts with it and
|
login. Despite this, the end user only briefly interacts with
|
||||||
most of the actual Jupyter session does not relate to the hub at all.
|
JupyterHub and most of the actual Jupyter session does not relate to
|
||||||
In short, anything which is related to *starting* the user's workspace
|
the hub at all: the hub mainly handles authentication and spawning the
|
||||||
is about JupyterHub, anything about *running* usually isn't.
|
single-user server. In short, anything which is related to *starting*
|
||||||
|
the user's workspace/environment is about JupyterHub, anything about
|
||||||
|
*running* usually isn't.
|
||||||
|
|
||||||
If you have problems connecting the authentication, spawning, and the
|
If you have problems connecting the authentication, spawning, and the
|
||||||
proxy (explained below), the issue is usually with JupyterHub. To
|
proxy (explained below), the issue is usually with JupyterHub. To
|
||||||
debug, JupyterHub has extensive logs which get printed to its console
|
debug, JupyterHub has extensive logs which get printed to its console
|
||||||
and can be used to discover most problems.
|
and can be used to discover most problems.
|
||||||
|
|
||||||
JupyterHub consists of the main pieces below:
|
The main pieces of JupyterHub are:
|
||||||
|
|
||||||
### Authenticators
|
### Authenticator
|
||||||
|
|
||||||
JupyterHub itself doesn't actually (necessarily) manage your users.
|
JupyterHub itself doesn't actually manage your users(&). It has a
|
||||||
It has a database of users, but it is usually connected with some
|
database of users, but it is usually connected with some other system
|
||||||
other system that manages the usernames and passwords. When someone
|
that manages the usernames and passwords. When someone tries to log
|
||||||
tries to log in to JupyteHub, it just asks the **authenticator** if
|
in to JupyterHub, it just asks the
|
||||||
the username/password is valid. The authenticator can also return
|
**authenticator** ([basics](authenticators-users-basics.html),
|
||||||
user groups and admin status of users, so that JupyterHub can roughly
|
[reference](../reference/authenticators.html)) if the
|
||||||
manage users to services.
|
username/password is valid(&). The authenticator can also return user
|
||||||
|
groups and admin status of users, so that JupyterHub can do some
|
||||||
|
higher-level management. The authenticator returns a username(&),
|
||||||
|
which is passed on to the spawner, which has to use it to start that
|
||||||
|
user's environment.
|
||||||
|
|
||||||
The following authenticators are included with JupyterHub:
|
The following authenticators are included with JupyterHub:
|
||||||
|
|
||||||
- **PAMAuthenticator** uses the standard Unix/Linux operating system
|
- **PAMAuthenticator** uses the standard Unix/Linux operating system
|
||||||
functions to check users. Roughly, if someone already has access to
|
functions to check users. Roughly, if someone already has access to
|
||||||
the machine (they can log in by ssh or otherwise), they will be able
|
the machine (they can log in by ssh), they will be able to log in to
|
||||||
to log in to JupyterHub automatically. Thus, JupyterHub fills the
|
JupyterHub without any other setup. Thus, JupyterHub fills the role
|
||||||
role of a ssh server, but providing a web-browser based way to
|
of an ssh server, but providing a web-browser based way to access the
|
||||||
access the machine.
|
machine.
|
||||||
|
|
||||||
|
|
||||||
But those are fairly limited, and thus there are [plenty of others to
|
But those are fairly limited, and thus there are [plenty of others to
|
||||||
@@ -77,17 +89,16 @@ from](https://github.com/jupyterhub/jupyterhub/wiki/Authenticators).
|
|||||||
You can connect to almost any other existing service to manage your
|
You can connect to almost any other existing service to manage your
|
||||||
users. You either use all users from this other service (e.g. your
|
users. You either use all users from this other service (e.g. your
|
||||||
company), or whitelist only the allowed users (e.g. your group's
|
company), or whitelist only the allowed users (e.g. your group's
|
||||||
Github users). Some other popular authenticators include:
|
Github usernames). Some other popular authenticators include:
|
||||||
|
|
||||||
- **OAuthenticator** uses the standard OAuth protocol to verify users.
|
- **OAuthenticator** uses the standard OAuth protocol to verify users.
|
||||||
For example, you can easily use Github to authenticate your users -
|
For example, you can easily use Github to authenticate your users -
|
||||||
people have a "click to login with Github" button. This is often
|
people have a "click to login with Github" button. This is often
|
||||||
done with a whitelist to only allow certain users.
|
done with a whitelist to only allow certain users.
|
||||||
|
|
||||||
- **NativeAuthenticator** actually stores its own usernames and
|
- **NativeAuthenticator** actually stores and validates its own
|
||||||
passwords, unlike most other authenticators. Thus, you can manage
|
usernames and passwords, unlike most other authenticators. Thus,
|
||||||
all your users within JupyterHUb only. (include one more example
|
you can manage all your users within JupyterHub only.
|
||||||
here)
|
|
||||||
|
|
||||||
- There are authenticators for LTI (learning management systems),
|
- There are authenticators for LTI (learning management systems),
|
||||||
Shibboleth, Kerberos - and so on.
|
Shibboleth, Kerberos - and so on.
|
||||||
@@ -100,15 +111,17 @@ The authenticator runs internally to the Hub process but communicates
|
|||||||
with outside services.
|
with outside services.
|
||||||
|
|
||||||
If you have trouble logging in, this is usually a problem of the
|
If you have trouble logging in, this is usually a problem of the
|
||||||
authenticator. The authenticator debug information goes to the
|
authenticator. The authenticator logs are part of the JupyterHub
|
||||||
JupyterHub logs, but there may also be hints in whatever external
|
logs, but there may also be relevant information in whatever external
|
||||||
services you are using.
|
services you are using.
|
||||||
|
|
||||||
### Spawners
|
### Spawner
|
||||||
|
|
||||||
The **spawner** is the real core of JupyterHub: when someone wants a
|
The **spawner** ([basics](spawners-basics.html),
|
||||||
notebook server, it finds resources and starts the server. It could
|
[reference](../reference/spawners.html)) is the real core of
|
||||||
run on the current server, on another server, on some cloud service,
|
JupyterHub: when someone wants a notebook server, it allocates
|
||||||
|
resources and starts the server. The notebook server could run on the
|
||||||
|
same server as JupyterHub, on another server, on some cloud service,
|
||||||
or even more. They can limit resources (CPU, memory) or isolate users
|
or even more. They can limit resources (CPU, memory) or isolate users
|
||||||
from each other - if the spawner supports it. They can also do no
|
from each other - if the spawner supports it. They can also do no
|
||||||
limiting and allow any user to access any other user's files if they
|
limiting and allow any user to access any other user's files if they
|
||||||
@@ -116,35 +129,41 @@ are not configured properly.
|
|||||||
|
|
||||||
Some basic spawners included in JupyterHub are:
|
Some basic spawners included in JupyterHub are:
|
||||||
|
|
||||||
**LocalProcessSpawner** is build in to JupyterHub and basically starts
|
- **LocalProcessSpawner** is built into JupyterHub and basically tries
|
||||||
tries to switch user to the given username and start Jupyter. It
|
to switch user to the given username (`su` (&)) and start the
|
||||||
requires that the hub be run as root (because only root has permission
|
notebook server. It requires that the hub be run as root (because
|
||||||
to start processes as other user IDs). LocalProcessSpawner is no
|
only root has permission to start processes as other user IDs).
|
||||||
different than a user logging in with something like `ssh` and running
|
LocalProcessSpawner is no different than a user logging in with
|
||||||
jobs. PAMAuthenticator and LocalProcessSpawner is the most basic way
|
something like `ssh` and running something. PAMAuthenticator and
|
||||||
of using JupyterHub (and what it does out of the box) and makes the
|
LocalProcessSpawner is the most basic way of using JupyterHub (and
|
||||||
hub not too dissimilar to an advanced ssh server.
|
what it does out of the box) and makes the hub not too dissimilar to
|
||||||
|
an advanced ssh server.
|
||||||
|
|
||||||
There are many more advanced fancy spawners:
|
There are many more advanced spawners:
|
||||||
|
|
||||||
- **SudoSpawner** is like LocalProcessSpawner but lets you run
|
- **SudoSpawner** is like LocalProcessSpawner but lets you run
|
||||||
JupyterHub without root. sudo has to be configured to allow the
|
JupyterHub without root. `sudo` has to be configured to allow the
|
||||||
hub's user to run processes under other user IDs.
|
hub's user to run processes under other user IDs.
|
||||||
|
|
||||||
- **SystemdSpawner** uses Systemd to start other processes. It can
|
- **SystemdSpawner** uses Systemd to start other processes. It can
|
||||||
isolate users from each other and provide some limits.
|
isolate users from each other and provide resource limiting.
|
||||||
|
|
||||||
- **DockerSpawner** runs stuff in Docker, a containerization system.
|
- **DockerSpawner** runs stuff in Docker, a containerization system.
|
||||||
This lets you fully isolate users, limit CPU, memory, and provide
|
This lets you fully isolate users, limit CPU, memory, and provide
|
||||||
other operating system images to fully customize the environment.
|
other container images to fully customize the environment.
|
||||||
|
|
||||||
- **KubeSpawner** runs on Kubernetes, a cloud orchestration
|
- **KubeSpawner** runs on Kubernetes, a cloud orchestration
|
||||||
system. The spawner can easily limit users and provide cloud
|
system. The spawner can easily limit users and provide cloud
|
||||||
scaling - but the spawner doesn't actually do that, Kubernetes does.
|
scaling - but the spawner doesn't actually do that, Kubernetes
|
||||||
|
does. The spawner just tells Kubernetes what to do. If you want to
|
||||||
|
get KubeSpawner to do something, first you would figure out how to
|
||||||
|
do it in Kubernetes, then figure out how to tell KubeSpawner to tell
|
||||||
|
Kubernetes that. Actually... this is true for most spawners.
|
||||||
|
|
||||||
- **BatchSpawner** runs on computer clusters with batch queuing
|
- **BatchSpawner** runs on computer clusters with batch job scheduling
|
||||||
systems. The user processes are run as batch jobs, having access to
|
systems (e.g. Slurm, HTCondor, PBS, etc.). The user processes are run
|
||||||
all the data and software that the users normally will.
|
as batch jobs, having access to all the data and software that the
|
||||||
|
users normally will.
|
||||||
|
|
||||||
In short, spawners are the interface to the rest of the operating
|
In short, spawners are the interface to the rest of the operating
|
||||||
system, and to configure them right you need to know a bit about how
|
system, and to configure them right you need to know a bit about how
|
||||||
@@ -166,24 +185,25 @@ error is usually with the spawner or the notebook server (as described
|
|||||||
in the next section). Each spawner outputs some logs to the main
|
in the next section). Each spawner outputs some logs to the main
|
||||||
JupyterHub logs, but may also have logs in other places depending on
|
JupyterHub logs, but may also have logs in other places depending on
|
||||||
what services it interacts with (for example, the Docker spawner
|
what services it interacts with (for example, the Docker spawner
|
||||||
somehow puts logs in the Docker system services).
|
somehow puts logs in the Docker system services, Kubernetes through
|
||||||
|
the `kubectl` command-line tool).
|
||||||
|
|
||||||
|
|
||||||
### Proxy
|
### Proxy
|
||||||
|
|
||||||
Previously, we said that the hub is between the user and the user's
|
Previously, we said that the hub is between the user's web browser and
|
||||||
notebook servers. It actually isn't directly between, because the
|
the user's notebook servers. It actually isn't directly between,
|
||||||
JupyterHub **proxy** relays connections between the users and their
|
because the JupyterHub **proxy** relays connections between the users
|
||||||
single-user notebook servers. What this basically means is that the
|
and their single-user notebook servers. What this basically means is
|
||||||
hub itself can shut down, and if the proxy can continue to allow users
|
that the hub itself can shut down, and the proxy can continue to
|
||||||
to communicate with their notebook servers. (This just further
|
allow users to communicate with their notebook servers. (This just
|
||||||
emphasizes that the hub is responsible for starting, not running, the
|
further emphasizes that the hub is responsible for starting, not
|
||||||
notebooks). By default, the hub starts the proxy automatically (so
|
running, the notebooks). By default, the hub starts the proxy
|
||||||
that you don't realize there is a separate proxy) and stops the proxy
|
automatically (so that you don't realize there is a separate proxy)
|
||||||
when the hub stops (so that connections get interrupted). But when
|
and stops the proxy when the hub stops (so that connections get
|
||||||
you [configure the proxy to run
|
interrupted). But when you [configure the proxy to run
|
||||||
separately](https://jupyterhub.readthedocs.io/en/stable/reference/separate-proxy.html),
|
separately](../reference/separate-proxy.html),
|
||||||
your users connections will stay working even without the hub.
|
users' connections will stay working even without the hub.
|
||||||
|
|
||||||
The default proxy is **ConfigurableHttpProxy** which is simple but
|
The default proxy is **ConfigurableHttpProxy** which is simple but
|
||||||
effective. A more advanced option is the **Traefik Proxy**, which
|
effective. A more advanced option is the **Traefik Proxy**, which
|
||||||
@@ -192,11 +212,11 @@ gives you redundancy and high-availability.
|
|||||||
When users "connect to JupyterHub", they *always* first connect to the
|
When users "connect to JupyterHub", they *always* first connect to the
|
||||||
proxy and the proxy relays the connection to the hub. Thus, the proxy
|
proxy and the proxy relays the connection to the hub. Thus, the proxy
|
||||||
is responsible for SSL and accepting connections from the rest of the
|
is responsible for SSL and accepting connections from the rest of the
|
||||||
internet.
|
internet. The user uses the hub to authenticate and start the server,
|
||||||
|
and then the hub connects back to the proxy to adjust the proxy routes
|
||||||
The hub has to connect to the proxy to adjust the routes (The web path
|
for the user's server (e.g. the web path `/user/someone` redirects to
|
||||||
`/user/someone` goes to the server of someone at a certain address).
|
the server of someone at a certain internal address). The proxy has
|
||||||
The proxy has to be able to connect to both the hub and all the
|
to be able to internally connect to both the hub and all the
|
||||||
single-user servers.
|
single-user servers.
|
||||||
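The routing idea described above can be sketched as a small prefix lookup. This is a toy model, not how ConfigurableHttpProxy is implemented: `resolve_route` is a hypothetical function, and the addresses and ports are illustrative assumptions.

```python
def resolve_route(routes, path):
    """Return the target for the longest route prefix that matches the path."""
    best = None
    for prefix in routes:
        # Match whole path segments so /user/some does not catch /user/someone.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] if best is not None else None


routes = {
    # Default route: everything not claimed by a user goes to the hub.
    "/": "http://127.0.0.1:8081",
    # Added by the hub after spawning; hypothetical internal address.
    "/user/someone": "http://10.0.0.5:8888",
}

target = resolve_route(routes, "/user/someone/lab")
```

In this sketch, a request for `/user/someone/lab` is sent to that user's server, while any path without a more specific route falls back to the hub.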
|
|
||||||
The proxy always runs as a separate process to JupyterHub (even though
|
The proxy always runs as a separate process to JupyterHub (even though
|
||||||
@@ -210,26 +230,43 @@ notebook servers, or making the first connection to the hub, it is
|
|||||||
usually caused by the proxy. The ConfigurableHttpProxy's logs are
|
usually caused by the proxy. The ConfigurableHttpProxy's logs are
|
||||||
mixed with JupyterHub's logs if it's started through the hub (the
|
mixed with JupyterHub's logs if it's started through the hub (the
|
||||||
default case), otherwise from whatever system runs the proxy (if you
|
default case), otherwise from whatever system runs the proxy (if you
|
||||||
do it, you'll know).
|
do configure it, you'll know).
|
||||||
|
|
||||||
### Services
|
### Services
|
||||||
|
|
||||||
JupyterHub has the concept of **services**, which are other web
|
JupyterHub has the concept of **services**
|
||||||
services started by the hub, but otherwise are not really related to
|
([basics](services-basics.html),
|
||||||
the hub itself. They are often used to do things related to Jupyter
|
[reference](../reference/services.html)), which are other web services
|
||||||
|
started by the hub, but otherwise are not necessarily related to the
|
||||||
|
hub itself. They are often used to do things related to Jupyter
|
||||||
(things that user interacts with, usually not the hub), but could
|
(things that user interacts with, usually not the hub), but could
|
||||||
always be run some other way. Running from the hub provides an easy
|
always be run some other way. Running from the hub provides an easy
|
||||||
way to get Hub API tokens and authenticate users against the hub.
|
way to get Hub API tokens and authenticate users against the hub. It
|
||||||
|
can also automatically add a proxy route to forward web requests to
|
||||||
|
that service.
|
||||||
|
|
||||||
The configuration option `c.JupyterHub.services` (??) is used to start
|
A common example of a service is the [cull idle
|
||||||
services from the hub.
|
servers](https://jupyterhub.readthedocs.io/en/stable/getting-started/services-basics.html#real-world-example-to-cull-idle-servers)
|
||||||
|
script. When started by the hub, it automatically gets admin API
|
||||||
|
tokens. It uses the API to list all running servers, compare against
|
||||||
|
activity timeouts, and shut down servers exceeding the limits. Even
|
||||||
|
though this is an intrinsic part of JupyterHub, it is only loosely
|
||||||
|
coupled and running as a service provides convenience of
|
||||||
|
authentication - it could be just as well run some other way, with a
|
||||||
|
manually provided API token.
|
||||||
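The culling decision itself is simple and can be sketched without any Hub API calls. The function name and the sample data below are hypothetical; a real culler would obtain the activity timestamps from the Hub API and then ask the hub to stop each idle server.

```python
from datetime import datetime, timedelta, timezone


def servers_to_cull(last_activity, timeout, now):
    """Return the users whose servers have been idle longer than the timeout."""
    return sorted(
        user for user, seen in last_activity.items() if now - seen > timeout
    )


# Fixed reference time so the example is deterministic.
now = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)
last_activity = {
    "alice": now - timedelta(minutes=5),
    "bob": now - timedelta(hours=3),
}
idle = servers_to_cull(last_activity, timedelta(hours=1), now)
```

With a one-hour timeout, only the server that has been idle for three hours is selected.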
|
|
||||||
Let's use the often-requested question of *sharing files using
|
Another example is the often-requested feature of *sharing files using
|
||||||
hubshare* as an example. Hubshare would work as an external service
|
hubshare*. Hubshare would work as an external service
|
||||||
which user notebooks talk to and use Hub authentication, but otherwise
|
which user notebooks talk to and use Hub authentication, but otherwise
|
||||||
it isn't directly a matter of the hub. You could equally well share
|
it isn't directly a matter of the hub. You could equally well share
|
||||||
files by other extensions to the single-user notebook servers or
|
files by other extensions to the single-user notebook servers or
|
||||||
configuring the spawners to access shared storage spaces.
|
configuring the spawners to access shared storage spaces. In order to
|
||||||
|
use something such as hubshare, the difficulty is not modifying
|
||||||
|
JupyterHub: it is modifying the notebook servers to speak to some
|
||||||
|
service, and making that service.
|
||||||
|
|
||||||
|
The configuration option `c.JupyterHub.services` is used to start
|
||||||
|
services from the hub.
|
||||||
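As a rough sketch, a hub-managed service is described by a small dictionary. The service name, module name, and timeout below are assumptions for illustration, and the permissions the service needs (an admin flag or roles, depending on the JupyterHub version) are deliberately left out.

```python
import sys

# Hypothetical definition of an idle-culling service started by the hub.
idle_culler_service = {
    "name": "idle-culler",
    # Command the hub would launch; assumes the jupyterhub-idle-culler
    # package is installed in the hub's environment.
    "command": [sys.executable, "-m", "jupyterhub_idle_culler", "--timeout=3600"],
}
services = [idle_culler_service]

# In jupyterhub_config.py, where `c` is provided by JupyterHub, this list
# would be assigned as:
# c.JupyterHub.services = services
```

The assignment is commented out so the snippet runs on its own; inside a real configuration file the `c` object already exists.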
|
|
||||||
When a service is started from JupyterHub automatically, its logs are
|
When a service is started from JupyterHub automatically, its logs are
|
||||||
included in the JupyterHub logs.
|
included in the JupyterHub logs.
|
||||||
@@ -243,15 +280,14 @@ running `jupyter notebook` or `jupyter lab` from the command line -
|
|||||||
the actual Jupyter user interface for a single person.
|
the actual Jupyter user interface for a single person.
|
||||||
|
|
||||||
The role of the spawner is to start this server - basically, running
|
The role of the spawner is to start this server - basically, running
|
||||||
the command `jupyter notebook`.
|
the command `jupyter notebook`. Actually it doesn't run that, it runs
|
||||||
Actually it doesn't run that, it runs `jupyterhub-singleuser` which
|
`jupyterhub-singleuser` which first communicates with the hub to say
|
||||||
first communicates with the hub to say "I'm alive" before running a
|
"I'm alive" before running a completely normal Jupyter server. The
|
||||||
completely normal Jupyter server. The single-user server can be
|
single-user server can be JupyterLab or classic notebooks. By this
|
||||||
JupyterLab or classic notebooks. By this point, the hub is almost
|
point, the hub is almost completely out of the picture (the web
|
||||||
completely out of the picture (the web traffic is going through proxy
|
traffic is going through proxy unchanged). Also by this time, the
|
||||||
unchanged). By this time, the spawner has already decided the
|
spawner has already decided the environment which this single-user
|
||||||
environment which this single-user server will have and the
|
server will have and the single-user server has to deal with that.
|
||||||
single-user server has to deal with that.
|
|
||||||
|
|
||||||
The spawner starts the server using `jupyterhub-singleuser` with some
|
The spawner starts the server using `jupyterhub-singleuser` with some
|
||||||
environment variables like `JUPYTERHUB_API_TOKEN` and
|
environment variables like `JUPYTERHUB_API_TOKEN` and
|
||||||
@@ -264,16 +300,23 @@ them, they run through the same backend server process and the web
|
|||||||
frontend is an option when it is starting. The spawner can choose the
|
frontend is an option when it is starting. The spawner can choose the
|
||||||
command line when it starts the single-user server. Extensions are a
|
command line when it starts the single-user server. Extensions are a
|
||||||
property of the single-user server (in two parts: there can be a part
|
property of the single-user server (in two parts: there can be a part
|
||||||
that runs in server process, and parts that run in javascript in lab
|
that runs in the Python server process, and parts that run in
|
||||||
or notebook).
|
javascript in lab or notebook).
|
||||||
|
|
||||||
|
If one wants to install software for users, it is not a matter of
|
||||||
|
"installing it for JupyterHub" - it's a matter of installing it for the
|
||||||
|
single-user server, which might be the same environment as the hub,
|
||||||
|
but not necessarily. (Actually, see below - it's a matter of the
|
||||||
|
kernels!)
|
||||||
|
|
||||||
After the single-user notebook server is started, any errors are only
|
After the single-user notebook server is started, any errors are only
|
||||||
an issue of the single-user notebook server. Sometimes, it seems like
|
an issue of the single-user notebook server. Sometimes, it seems like
|
||||||
the spawner is failing, but really the spawner is working but the
|
the spawner is failing, but really the spawner is working but the
|
||||||
single-user notebook server dies right away (in this case, you need to
|
single-user notebook server dies right away (in this case, you need to
|
||||||
find the problem with the single-user server and adjust the spawner to
|
find the problem with the single-user server and adjust the spawner to
|
||||||
start it correctly). This can happen, for example, if the spawner
|
start it correctly or fix the environment). This can happen, for
|
||||||
doesn't set an environment variable or doesn't provide storage.
|
example, if the spawner doesn't set an environment variable or doesn't
|
||||||
|
provide storage.
|
||||||
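When a single-user server dies right away, one quick check is whether the environment it was started in contains what it expects. The sketch below is a hypothetical debugging helper: `JUPYTERHUB_API_TOKEN` is named in this document, while the other two variable names are assumptions to verify against your JupyterHub version.

```python
import os

# JUPYTERHUB_API_TOKEN comes from the text; the other two names are assumed here.
EXPECTED_VARIABLES = ("JUPYTERHUB_API_TOKEN", "JUPYTERHUB_API_URL", "JUPYTERHUB_USER")


def missing_hub_variables(environ=None):
    """List the expected JupyterHub variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in EXPECTED_VARIABLES if not environ.get(name)]


# Example with an incomplete environment, as a misconfigured spawner might produce.
missing = missing_hub_variables({"JUPYTERHUB_API_TOKEN": "secret"})
```

Running `missing_hub_variables()` with no argument inside the spawned environment (a container, a batch job, a user shell) reports what the spawner failed to pass along.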
|
|
||||||
The single-user server's logs are handled by the spawner, so if you
|
The single-user server's logs are handled by the spawner, so if you
|
||||||
notice problems at this phase you need to check your spawner for
|
notice problems at this phase you need to check your spawner for
|
||||||
@@ -289,21 +332,26 @@ configuration option of the spawner.
|
|||||||
### Notebook
|
### Notebook
|
||||||
|
|
||||||
**(Jupyter) Notebook** is the classic interface, where each notebook
|
**(Jupyter) Notebook** is the classic interface, where each notebook
|
||||||
opens in a separate tab.
|
opens in a separate tab. It is traditionally started by `jupyter
|
||||||
|
notebook`.
|
||||||
|
|
||||||
Does anything need to be said here?
|
Does anything need to be said here?
|
||||||
|
|
||||||
### Lab
|
### Lab
|
||||||
|
|
||||||
**JupyterLab** is the new interface, where multiple notebooks are
|
**JupyterLab** is the new interface, where multiple notebooks are
|
||||||
openable in the same tab in an IDE-like environment. JupyterLab is
|
openable in the same tab in an IDE-like environment. It is
|
||||||
run thorugh the same server file, but at a path `/lab` instead of
|
traditionally started with `jupyter lab`. Both Notebook and Lab use
|
||||||
`/tree`.
|
the same `.ipynb` file format.
|
||||||
|
|
||||||
Both Notebook and Lab use the same `.ipynb` file format.
|
JupyterLab is run through the same server process, but at a path `/lab`
|
||||||
|
instead of `/tree`. Thus, they can be active at the same time in the
|
||||||
|
backend and you can switch between them at runtime by changing your
|
||||||
|
URL path.
|
||||||
|
|
||||||
Does anything need to be said here?
|
Extensions need to be re-written for JupyterLab (if moving from
|
||||||
- how extensions work in lab compared to notebook
|
classic notebooks). But, the server-side of the extensions can be
|
||||||
|
shared by both.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -313,30 +361,40 @@ Normally, our tour of the Jupyter ecosystem would stop here. But,
|
|||||||
since if you've read this far you probably need to know every last
|
since if you've read this far you probably need to know every last
|
||||||
bit, let's go further and talk about the kernels. The commands you
|
bit, let's go further and talk about the kernels. The commands you
|
||||||
run in the notebook session are not executed in the same process as
|
run in the notebook session are not executed in the same process as
|
||||||
the notebook itself, but in a separate **kernel**. There are [many
|
the notebook itself, but in a separate **Jupyter kernel**. There are [many
|
||||||
kernels
|
kernels
|
||||||
available](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels).
|
available](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels).
|
||||||
|
|
||||||
As a basic approximation, a **Jupyter kernel** is a process which
|
As a basic approximation, a **Jupyter kernel** is a process which
|
||||||
accepts commands (cells that are run) and returns the output to
|
accepts commands (cells that are run) and returns the output to
|
||||||
Jupyter to display. One example is the **IPython Jupyter kernel**,
|
Jupyter to display. One example is the **IPython Jupyter kernel**,
|
||||||
which runs Python and adds the IPython magic functions (`%`, `%%`,
|
which runs Python. There is nothing special about it, it can be
|
||||||
`!`, etc. commands). There is nothing special about it, it can be
|
considered a *normal Python process*. The kernel process can be
|
||||||
considered a *normal Python process*. Like we said above, the kernel
|
approximated in UNIX terms as a process that takes commands on stdin
|
||||||
process can be approximated as a process that takes commands on stdin
|
and returns stuff on stdout(&). Obviously, it's more because it has
|
||||||
and returns stuff on stdout. Actually, a kernel is more fancy,
|
to be able to disentangle all the possible outputs, such as figures,
|
||||||
because it can communicate over the network and add in magic commands.
|
and present them to the user in a web browser.
|
||||||
|
|
||||||
Kernel communication is via the ZeroMQ protocol on the local
|
Kernel communication is via the ZeroMQ protocol on the local
|
||||||
computer. Kernels are separate processes from the main single-user
|
computer. Kernels are separate processes from the main single-user
|
||||||
notebook server (and thus obviously, different from the JupyterHub
|
notebook server (and thus obviously, different from the JupyterHub
|
||||||
process and everything else). By default (and unless you do something
|
process and everything else). By default (and unless you do something
|
||||||
special), kernels share the same environment as the notebook server
|
special), kernels share the same environment as the notebook server
|
||||||
(data, resource limits, permissions, user id, etc.). But there are
|
(data, resource limits, permissions, user id, etc.). But they *can*
|
||||||
things like the Jupyter Kernel Gateway / Enterprise Gateway, which
|
run in a separate Python environment from the single-user server
|
||||||
|
(search `--prefix` in the [ipykernel installation
|
||||||
|
instructions](https://ipython.readthedocs.io/en/stable/install/kernel_install.html)).
|
||||||
|
There are also more fancy techniques such as the [Jupyter Kernel
|
||||||
|
Gateway](https://jupyter-kernel-gateway.readthedocs.io/) and [Enterprise
|
||||||
|
Gateway](https://jupyter-enterprise-gateway.readthedocs.io/), which
|
||||||
allow you to run the kernels on a different machine and possibly with
|
allow you to run the kernels on a different machine and possibly with
|
||||||
a different environment.
|
a different environment.
|
||||||
|
|
||||||
|
A kernel doesn't just execute its language - magic commands such as `%`,
|
||||||
|
`%%`, and `!` are a property of the kernel - in particular, these are
|
||||||
|
IPython kernel commands and don't necessarily work in any other
|
||||||
|
kernel unless they specifically support them.
|
||||||
|
|
||||||
What does this mean? There is yet *another* layer of configurability.
|
What does this mean? There is yet *another* layer of configurability.
|
||||||
Each kernel can run a different programming language, with different
|
Each kernel can run a different programming language, with different
|
||||||
software, and so on. By default, they would run in the same
|
software, and so on. By default, they would run in the same
|
||||||
@@ -345,8 +403,8 @@ other way they are configured is by
|
|||||||
running in different Python virtual environments or conda
|
running in different Python virtual environments or conda
|
||||||
environments. They can be started and killed independently (there is
|
environments. They can be started and killed independently (there is
|
||||||
normally one per notebook you have open). The kernels are what use
|
normally one per notebook you have open). The kernels are what use
|
||||||
most of your memory and CPU if you have large amounts of data open or
|
most of your memory and CPU when running Jupyter - the rest of the web
|
||||||
are using a lot of compute power.
|
interface has a small footprint.
|
||||||
|
|
||||||
You can list your installed kernels with `jupyter kernelspec list`.
|
You can list your installed kernels with `jupyter kernelspec list`.
|
||||||
If you look at one of `kernel.json` files in those directories, you
|
If you look at one of `kernel.json` files in those directories, you
|
||||||
@@ -355,43 +413,47 @@ automatically made by the kernels, but can be edited as needed. [The
|
|||||||
spec](https://jupyter-client.readthedocs.io/en/stable/kernels.html)
|
spec](https://jupyter-client.readthedocs.io/en/stable/kernels.html)
|
||||||
tells you even more.
|
tells you even more.
|
||||||
|
|
||||||
The kernel has to be reachable by the single-user notebook server.
|
The kernel normally has to be reachable by the single-user notebook server,
|
||||||
|
but the gateways mentioned above can get around that limitation.
|
||||||
|
|
||||||
If you get problems with "Kernel died" or some other error in a single
|
If you get problems with "Kernel died" or some other error in a single
|
||||||
notebook but the single-user notebook server stays working, it is
|
notebook but the single-user notebook server stays working, it is
|
||||||
usually a problem with the kernel. It could be that you are trying to
|
usually a problem with the kernel. It could be that you are trying to
|
||||||
use more resources than you are allowed and the symptom is the kernel
|
use more resources than you are allowed and the symptom is the kernel
|
||||||
getting killed. It could be that it crashes for some other reason.
|
getting killed. It could be that it crashes for some other reason.
|
||||||
|
In these cases, you need to find the kernel logs and investigate.
|
||||||
|
|
||||||
The debug logs for the kernel are normally mixed in with the
|
The debug logs for the kernel are normally mixed in with the
|
||||||
single-user notebook server logs.
|
single-user notebook server logs.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
### JupyterHub distributions
|
## JupyterHub distributions
|
||||||
|
|
||||||
There are several "distributions" which automatically install all of
|
There are several "distributions" which automatically install all of
|
||||||
the things above and configure them for a certain purpose. They are
|
the things above and configure them for a certain purpose. They are
|
||||||
good ways to get started, but if you are doing very custom things
|
good ways to get started, but if you have custom needs, eventually it
|
||||||
eventually it may become hard to adapt them to your needs.
|
may become hard to adapt them to your requirements.
|
||||||
|
|
||||||
* **Zero to JupyterHub with Kubernetes** installs an entire scaleable
|
* [**Zero to JupyterHub with
|
||||||
system using Kubernetes. Uses KubeSpawner, ....Authenticator, ....
|
Kubernetes**](https://zero-to-jupyterhub.readthedocs.io/) installs
|
||||||
|
an entire scalable system using Kubernetes. Uses KubeSpawner,
|
||||||
|
....Authenticator, ....
|
||||||
|
|
||||||
* **The Littlest JupyterHub** installs JupyterHub on a single system
|
* [**The Littlest JupyterHub**](https://tljh.jupyter.org/) installs JupyterHub on a single system
|
||||||
using SystemdSpawner and NativeAuthenticator (which manages users
|
using SystemdSpawner and NativeAuthenticator (which manages users
|
||||||
itself).
|
itself).
|
||||||
|
|
||||||
* **JupyterHub the hard way** takes you through everything yourself.
|
* [**JupyterHub the hard
|
||||||
It is a natural companion to this guide, since you get to experience
|
way**](https://jupyterhub.readthedocs.io/en/stable/installation-guide-hard.html)
|
||||||
every little bit.
|
takes you through everything yourself. It is a natural companion to
|
||||||
|
this guide, since you get to experience every little bit.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
## I want to...
|
## I want to...
|
||||||
|
|
||||||
**Share files between users**. Spawner to share data, or
|
TODO: answers to common cross-layer questions.
|
||||||
JupyterNotebook/Lab user interface + some service for distributing
|
|
||||||
files.
|
|
||||||
|
|
||||||
|
|
||||||
## What's next?
|
## What's next?
|
||||||
@@ -399,5 +461,5 @@ files.
|
|||||||
Now you know everything. Well, you know how everything relates, but
|
Now you know everything. Well, you know how everything relates, but
|
||||||
there are still plenty of details, implementations, and exceptions.
|
there are still plenty of details, implementations, and exceptions.
|
||||||
When setting up JupyterHub, the first step is to consider the above
|
When setting up JupyterHub, the first step is to consider the above
|
||||||
layers and see what options are suitable for you. Then, put
|
layers, decide the right option for each of them, then begin putting
|
||||||
everything together.
|
everything together.
|
||||||
|
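
As a rough illustration of the `kernel.json` files discussed in the kernels section of this diff, the sketch below parses a kernel spec and expands the command line that would launch the kernel. The JSON shown is typical of what an IPython kernel install writes, but the exact contents on any given system are an assumption, and the `expand_argv` helper and the connection-file path are hypothetical names made up for this example.

```python
import json

# Example kernel.json contents. This is an assumption: it resembles what an
# ipykernel install typically writes, but your own files may differ.
KERNEL_JSON = """
{
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python 3",
  "language": "python"
}
"""


def expand_argv(spec, connection_file):
    """Hypothetical helper: substitute the {connection_file} placeholder.

    The single-user server does a substitution like this before starting
    the kernel process; the file tells the kernel which ports to use.
    """
    return [arg.replace("{connection_file}", connection_file)
            for arg in spec["argv"]]


spec = json.loads(KERNEL_JSON)
# The path below is purely illustrative.
command = expand_argv(spec, "/tmp/kernel-1234.json")
print(spec["display_name"], "->", " ".join(command))
```

Seen this way, running a kernel in a different virtual environment or conda environment largely comes down to the first entry of `argv` pointing at that environment's Python executable instead of a bare `python`.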