transferred docs to FAQ folder

This commit is contained in:
alwasega
2023-01-24 12:13:18 +03:00
parent cb0073e9b8
commit d8a5034b16
7 changed files with 6 additions and 4 deletions

docs/source/faq/faq.md Normal file

@@ -0,0 +1,36 @@
# Frequently asked questions
## How do I share links to notebooks?
In short, where you see `/user/name/notebooks/foo.ipynb` use `/hub/user-redirect/notebooks/foo.ipynb` (replace `/user/name` with `/hub/user-redirect`).
Sharing links to notebooks is a common activity,
and can look different based on what you mean.
Your first instinct might be to copy the URL you see in the browser,
e.g. `hub.jupyter.org/user/yourname/notebooks/coolthing.ipynb`.
However, let's break down what this URL means:
`hub.jupyter.org/user/yourname/` is the URL prefix handled by _your server_,
which means that sharing this URL is asking the person you share the link with
to come to _your server_ and look at the exact same file.
In most circumstances, this is forbidden by permissions because the person you share with does not have access to your server.
What actually happens when someone visits this URL will depend on whether your server is running and other factors.
**But what is our actual goal?**
A typical situation is that you have some shared or common filesystem,
such that the same path corresponds to the same document
(either the exact same document or another copy of it).
Typically, what folks want when they do sharing like this
is for each visitor to open the same file _on their own server_,
so Breq would open `/user/breq/notebooks/foo.ipynb` and
Seivarden would open `/user/seivarden/notebooks/foo.ipynb`, etc.
JupyterHub has a special URL that does exactly this!
It's called `/hub/user-redirect/...`.
So if you replace `/user/yourname` in your URL bar
with `/hub/user-redirect` any visitor should get the same
URL on their own server, rather than visiting yours.
In JupyterLab 2.0, this should also be the result of the "Copy Shareable Link"
action in the file browser.


@@ -5,4 +5,7 @@ Find answers to some of the most frequently-asked questions around JupyterHub an
```{toctree}
:maxdepth: 2
faq
institutional-faq
troubleshooting
```


@@ -0,0 +1,267 @@
# Institutional FAQ
This page contains common questions from users of JupyterHub,
broken down by their roles within organizations.
## For all
### Is it appropriate for adoption within a larger institutional context?
Yes! JupyterHub has been used at scale for large pools of users, as well
as for complex and high-performance computing.
For example,
- UC Berkeley uses
JupyterHub for its Data Science Education Program courses (serving over
3,000 students).
- The Pangeo project uses JupyterHub to provide access
to scalable cloud computing with Dask.
JupyterHub is stable and customizable
to the use-cases of large organizations.
### I keep hearing about Jupyter Notebook, JupyterLab, and now JupyterHub. What's the difference?
Here is a quick breakdown of these three tools:
- **The Jupyter Notebook** is a document specification (the `.ipynb` file) that interweaves
narrative text with code cells and their outputs. It is also a graphical interface
that allows users to edit these documents. There are also several other graphical interfaces
that allow users to edit the `.ipynb` format (nteract, JupyterLab, Google Colab, Kaggle, etc.).
- **JupyterLab** is a flexible and extensible user interface for interactive computing. It
has several extensions that are tailored for using Jupyter Notebooks, as well as extensions
for other parts of the data science stack.
- **JupyterHub** is an application that manages interactive computing sessions for **multiple users**.
It also connects users with infrastructure they wish to access. It can provide
remote access to Jupyter Notebooks and JupyterLab for many people.
## For management
### Briefly, what problem does JupyterHub solve for us?
JupyterHub provides a shared platform for data science and collaboration.
It allows users to utilize familiar data science workflows (such as the scientific Python stack,
the R tidyverse, and Jupyter Notebooks) on institutional infrastructure. It also gives administrators
some control over access to resources, security, environments, and authentication.
### Is JupyterHub mature? Why should we trust it?
Yes - the core JupyterHub application recently
reached 1.0 status, and is considered stable and performant for most institutions.
JupyterHub has also been deployed (along with other tools) to work on
scalable infrastructure, large datasets, and high-performance computing.
### Who else uses JupyterHub?
JupyterHub is used at a variety of institutions in academia,
industry, and government research labs. It is most commonly used by two kinds of groups:
- Small teams (e.g., data science teams, research labs, or collaborative projects) to provide a
shared resource for interactive computing, collaboration, and analytics.
- Large teams (e.g., a department, a large class, or a large group of remote users) to provide
access to organizational hardware, data, and analytics environments at scale.
Here is a sample of organizations that use JupyterHub:
- **Universities and colleges**: UC Berkeley, UC San Diego, Cal Poly SLO, Harvard University, University of Chicago,
University of Oslo, University of Sheffield, Université Paris Sud, University of Versailles
- **Research laboratories**: NASA, NCAR, NOAA, the Large Synoptic Survey Telescope, Brookhaven National Lab,
Minnesota Supercomputing Institute, ALCF, CERN, Lawrence Livermore National Laboratory
- **Online communities**: Pangeo, Quantopian, mybinder.org, MathHub, Open Humans
- **Computing infrastructure providers**: NERSC, San Diego Supercomputing Center, Compute Canada
- **Companies**: Capital One, SANDVIK code, Globus
See the [Gallery of JupyterHub deployments](gallery-of-deployments) for
a more complete list of JupyterHub deployments at institutions.
### How does JupyterHub compare with hosted products, like Google Colaboratory, RStudio.cloud, or Anaconda Enterprise?
JupyterHub puts you in control of your data, infrastructure, and coding environment.
In addition, it is vendor neutral, which reduces lock-in to a particular vendor or service.
JupyterHub provides access to interactive computing environments in the cloud (similar to each of these services).
Compared with the tools above, it is more flexible, more customizable, free, and
gives administrators more control over their setup and hardware.
Because JupyterHub is an open-source, community-driven tool, it can be extended and
modified to fit an institution's needs. It plays nicely with the open source data science
stack, and can serve a variety of computing environments, user interfaces, and
computational hardware. It can also be deployed anywhere - on enterprise cloud infrastructure, on
High-Performance-Computing machines, on local hardware, or even on a single laptop, which
is not possible with most other tools for shared interactive computing.
## For IT
### How would I set up JupyterHub on institutional hardware?
That depends on what kind of hardware you've got. JupyterHub is flexible enough to be deployed
on a variety of hardware, including in-room hardware, on-prem clusters, cloud infrastructure,
etc.
The most common way to set up a JupyterHub is to use a JupyterHub distribution: these are pre-configured,
opinionated ways to set up a JupyterHub on particular kinds of infrastructure. The two distributions
that we currently suggest are:
- [Zero to JupyterHub for Kubernetes](https://z2jh.jupyter.org) is a scalable JupyterHub deployment and
guide that runs on Kubernetes. Better for larger or dynamic user groups (50-10,000) or more complex
compute/data needs.
- [The Littlest JupyterHub](https://tljh.jupyter.org) is a lightweight JupyterHub that runs on a single
machine (in the cloud or under your desk). Better for smaller user groups (4-80) or more
lightweight computational resources.
### Does JupyterHub run well in the cloud?
**Yes** - most deployments of JupyterHub are run via cloud infrastructure and on a variety of cloud providers.
Depending on the distribution of JupyterHub that you'd like to use, you can also connect your JupyterHub
deployment with a number of other cloud-native services so that users have access to other resources from
their interactive computing sessions.
For example, if you use the [Zero to JupyterHub for Kubernetes](https://z2jh.jupyter.org) distribution,
you'll be able to utilize container-based workflows of other technologies such as the [dask-kubernetes](https://kubernetes.dask.org/en/latest/)
project for distributed computing.
The Z2JH Helm Chart also has some functionality built in for auto-scaling your cluster up and down
as more resources are needed - allowing you to utilize the benefits of a flexible cloud-based deployment.
### Is JupyterHub secure?
The short answer: yes.
JupyterHub as a standalone application has been battle-tested at an institutional
level for several years, and makes a number of "default" security decisions that are reasonable for most
users.
- For security considerations in the base JupyterHub application,
[see the JupyterHub security page](https://jupyterhub.readthedocs.io/en/stable/reference/websecurity.html).
- For security considerations when deploying JupyterHub on Kubernetes, see the
[JupyterHub on Kubernetes security page](https://z2jh.jupyter.org/en/latest/security.html).
The longer answer: it depends on your deployment. Because JupyterHub is very flexible, it can be used
in a variety of deployment setups. This often entails connecting your JupyterHub to **other** infrastructure
(such as a [Dask Gateway service](https://gateway.dask.org/)). There are many security decisions to be made
in these cases, and the security of your JupyterHub deployment will often depend on these decisions.
If you are worried about security, don't hesitate to reach out to the JupyterHub community in the
[Jupyter Community Forum](https://discourse.jupyter.org/c/jupyterhub). This community of practice has many
individuals with experience running secure JupyterHub deployments and will be very glad to help you out.
### Does JupyterHub provide computing or data infrastructure?
**No** - JupyterHub manages user sessions and can _control_ computing infrastructure, but it does not provide these
things itself. You are expected to run JupyterHub on your own infrastructure (local or in the cloud). Moreover,
JupyterHub has no internal concept of "data", but is designed to be able to communicate with data repositories
(again, either locally or remotely) for use within interactive computing sessions.
### How do I manage users?
JupyterHub offers a few options for managing your users. Upon setting up a JupyterHub, you can choose what
kind of **authentication** you'd like to use. For example, you can have users sign up with an institutional
email address, or choose a username / password when they first log-in, or offload authentication onto
another service such as an organization's OAuth.
The users of a JupyterHub are stored locally, and can be modified manually by an administrator of the JupyterHub.
Moreover, the _active_ users on a JupyterHub can be found on the administrator's page. This page
gives you the ability to stop or restart kernels, inspect user filesystems, and even take over user
sessions to assist them with debugging.
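As a complement to the admin page, the same user information is available through JupyterHub's REST API. Below is a minimal sketch of listing users that way; the Hub URL and token are placeholders, and you would first create an API token with admin permissions (for example via the Hub's token page).

```python
# Sketch: list JupyterHub users via the REST API; URL and token are placeholders.
import requests

hub_url = "https://hub.example.org"
api_token = "REPLACE_WITH_AN_ADMIN_API_TOKEN"

resp = requests.get(
    f"{hub_url}/hub/api/users",
    headers={"Authorization": f"token {api_token}"},
)
resp.raise_for_status()
for user in resp.json():
    status = "running" if user.get("server") else "stopped"
    print(f"{user['name']:20s} admin={user['admin']} server={status}")
```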
### How do I manage software environments?
A key benefit of JupyterHub is the ability for an administrator to define the environment(s) that users
have access to. There are many ways to do this, depending on what kind of infrastructure you're using for
your JupyterHub.
For example, **The Littlest JupyterHub** runs on a single VM. In this case, the administrator defines
an environment by installing packages to a shared folder that exists on the path of all users. The
**JupyterHub for Kubernetes** deployment uses Docker images to define environments. You can create your
own list of Docker images that users can select from, and can also control things like the amount of
RAM available to users, or the types of machines that their sessions will use in the cloud.
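As a concrete illustration of the Kubernetes case, an administrator might expose a menu of environments through KubeSpawner's `profile_list`. This is only a sketch; the image names and resource values are placeholders for your own registry and hardware.

```python
# jupyterhub_config.py -- sketch only; image names and limits are placeholders
c.KubeSpawner.profile_list = [
    {
        "display_name": "Standard Python environment",
        "default": True,
        "kubespawner_override": {"image": "registry.example.org/standard-notebook:latest"},
    },
    {
        "display_name": "Large-memory data science environment",
        "kubespawner_override": {
            "image": "registry.example.org/datasci-notebook:latest",
            "mem_limit": "16G",
        },
    },
]
```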
### How does JupyterHub manage computational resources?
For interactive computing sessions, JupyterHub controls computational resources via a **spawner**.
Spawners define how a new user session is created, and are customized for particular kinds of
infrastructure. For example, the KubeSpawner knows how to control a Kubernetes deployment
to create new pods when users log in.
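For instance, a Kubernetes-backed deployment selects its spawner in `jupyterhub_config.py` roughly like this (a sketch; the timeout value is just an example):

```python
# jupyterhub_config.py -- sketch; `c` is the config object JupyterHub provides
# Each login creates a pod in the Kubernetes cluster.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

# Options on the base Spawner class apply to whichever spawner you choose.
c.Spawner.start_timeout = 120  # seconds to wait for a user server to come up
```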
For more sophisticated computational resources (like distributed computing), JupyterHub can
connect with other infrastructure tools (like Dask or Spark). This allows users to control
scalable or high-performance resources from within their JupyterHub sessions. The logic of
how those resources are controlled is taken care of by the non-JupyterHub application.
### Can JupyterHub be used with my high-performance computing resources?
Yes - JupyterHub can provide access to many kinds of computing infrastructure.
Especially when combined with other open-source schedulers such as Dask, you can manage fairly
complex computing infrastructures from the interactive sessions of a JupyterHub. For example
[see the Dask HPC page](https://docs.dask.org/en/latest/setup/hpc.html).
### How many resources do user sessions take?
This is highly configurable by the administrator. If you wish for your users to have simple
data analytics environments for prototyping and light data exploration, you can restrict their
memory and CPU based on the resources that you have available. If you'd like your JupyterHub
to serve as a gateway to high-performance computing or data resources, you may increase the
resources available on user machines, or connect them with computing infrastructures elsewhere.
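As a sketch of what that configuration can look like, the base `Spawner` class exposes `mem_limit` and `cpu_limit` options; the values below are arbitrary examples, and whether they are enforced depends on the spawner you use.

```python
# jupyterhub_config.py -- example limits only; tune to your hardware.
# Note: these limits are enforced by spawners that support them
# (e.g. DockerSpawner, KubeSpawner), not by the default local-process spawner.
c.Spawner.mem_limit = "2G"   # cap each user session at 2 GB of RAM
c.Spawner.cpu_limit = 1.0    # and at one CPU core
```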
### Can I customize the look and feel of a JupyterHub?
JupyterHub provides some customization of the graphics displayed to users. The most common
modification is to add custom branding to the JupyterHub login page, loading pages, and
various elements that persist across all pages (such as headers).
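For example, a deployment might swap in its own logo and override a few page templates. A minimal sketch, with placeholder paths:

```python
# jupyterhub_config.py -- branding sketch; the paths are placeholders
c.JupyterHub.logo_file = "/srv/jupyterhub/branding/logo.png"   # logo shown on Hub pages
c.JupyterHub.template_paths = ["/srv/jupyterhub/templates"]    # custom Jinja2 templates searched before the defaults
```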
## For Technical Leads
### Will JupyterHub “just work” with our team's interactive computing setup?
Depending on the complexity of your setup, you'll have different experiences with "out of the box"
distributions of JupyterHub. If all of the resources you need will fit on a single VM, then
[The Littlest JupyterHub](https://tljh.jupyter.org) should get you up-and-running within
a half day or so. For more complex setups, such as scalable Kubernetes clusters or access
to high-performance computing and data, it will require more time and expertise with
the technologies your JupyterHub will use (e.g., dev-ops knowledge with cloud computing).
In general, the base JupyterHub deployment is not the bottleneck for setup; the bottleneck is connecting
your JupyterHub with the various services and tools that you wish to provide to your users.
### How well does JupyterHub scale? What are JupyterHub's limitations?
JupyterHub works well at both a small scale (e.g., a single VM or machine) and a large
scale (e.g., a scalable Kubernetes cluster). It can be used for teams as small as 2, and
for user bases as large as 10,000. The scalability of JupyterHub largely depends on the
infrastructure on which it is deployed. JupyterHub has been designed to be lightweight and
flexible, so you can tailor your JupyterHub deployment to your needs.
### Is JupyterHub resilient? What happens when a machine goes down?
For JupyterHubs that are deployed in a containerized environment (e.g., Kubernetes), it is
possible to configure the JupyterHub to be fairly resistant to failures in the system.
For example, if JupyterHub fails, then user sessions will not be affected (though new
users will not be able to log in). When a JupyterHub process is restarted, it should
seamlessly connect with the user database and the system will return to normal.
Again, the details of your JupyterHub deployment (e.g., whether it's deployed on a scalable cluster)
will affect the resiliency of the deployment.
### What interfaces does JupyterHub support?
Out of the box, JupyterHub supports a variety of popular data science interfaces for user sessions,
such as JupyterLab, Jupyter Notebooks, and RStudio. Any interface that can be served
via a web address can be served with a JupyterHub (with the right setup).
### Does JupyterHub make it easier for our team to collaborate?
JupyterHub provides a standardized environment and access to shared resources for your teams.
This greatly reduces the cost associated with sharing analyses and content with other team
members, and makes it easier to collaborate and build off of one another's ideas. Combined with
access to high-performance computing and data, JupyterHub provides a common resource to
amplify your team's ability to prototype their analyses, scale them to larger data, and then
share their results with one another.
JupyterHub also provides a computational framework to share computational narratives between
different levels of an organization. For example, data scientists can share Jupyter Notebooks
rendered as [Voilà dashboards](https://voila.readthedocs.io/en/stable/) with those who are not
familiar with programming, or create publicly-available interactive analyses to allow others to
interact with your work.
### Can I use JupyterHub with R/RStudio or other languages and environments?
Yes, Jupyter is a polyglot project, and there are over 40 community-provided kernels for a variety
of languages (the most common being Python, Julia, and R). You can also use a JupyterHub to provide
access to other interfaces, such as RStudio, that provide their own access to a language kernel.


@@ -0,0 +1,393 @@
# Troubleshooting
When troubleshooting, you may see unexpected behaviors or receive an error
message. This section provides links for identifying the cause of the
problem and how to resolve it.
## Behavior
### JupyterHub proxy fails to start
If you have tried to start the JupyterHub proxy and it fails to start:
- check if the JupyterHub IP configuration setting is
`c.JupyterHub.ip = '*'`; if it is, try `c.JupyterHub.ip = ''`
- Try starting with `jupyterhub --ip=0.0.0.0`
**Note**: If this occurs on Ubuntu/Debian, check that you are using a
recent version of [Node](https://nodejs.org). Some versions of Ubuntu/Debian come with a very old version
of Node and it is necessary to update Node.
### sudospawner fails to run
If the sudospawner script is not found in the path, sudospawner will not run.
To avoid this, specify sudospawner's absolute path. For example, start
jupyterhub with:

    jupyterhub --SudoSpawner.sudospawner_path='/absolute/path/to/sudospawner'
or add:

    c.SudoSpawner.sudospawner_path = '/absolute/path/to/sudospawner'
to the config file, `jupyterhub_config.py`.
### What is the default behavior when none of the lists (admin, allowed, allowed groups) are set?
When nothing is given for these lists, there will be no admins, and all users
who can authenticate on the system (i.e. all the Unix users on the server with
a password) will be allowed to start a server. The allowed username set lets you limit
this to a particular set of users, and admin_users lets you specify who
among them may use the admin interface (not necessary, unless you need to do
things like inspect other users' servers or modify the user list at runtime).
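To make this concrete, here is a sketch of the two user lists in `jupyterhub_config.py`; the usernames are placeholders, and group-based allow lists are provided by some authenticators rather than by the base class.

```python
# jupyterhub_config.py -- sketch; usernames are placeholders
c.Authenticator.allowed_users = {"breq", "seivarden"}  # only these users may start servers
c.Authenticator.admin_users = {"breq"}                 # subset that may use the admin interface
# Some authenticators (LDAP, OAuth providers, ...) add their own group-based
# allow options; check your authenticator's documentation for the exact name.
```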
### JupyterHub Docker container is not accessible at localhost
Even though the command to start your Docker container exposes port 8000
(`docker run -p 8000:8000 -d --name jupyterhub jupyterhub/jupyterhub jupyterhub`),
it is possible that the IP address itself is not accessible/visible. As a result,
when you try http://localhost:8000 in your browser, you are unable to connect
even though the container is running properly. One workaround is to explicitly
tell JupyterHub to listen on `0.0.0.0`, which is visible to everyone. Try this
command:
`docker run -p 8000:8000 -d --name jupyterhub jupyterhub/jupyterhub jupyterhub --ip 0.0.0.0 --port 8000`
### How can I kill ports from JupyterHub-managed services that have been orphaned?
I started JupyterHub + nbgrader on the same host without containers. When I try to restart JupyterHub + nbgrader with this configuration, errors appear that the service accounts cannot start because the ports are being used.
How can I kill the processes that are using these ports?
Run the following command:

    sudo kill -9 $(sudo lsof -t -i:<service_port>)
Where `<service_port>` is the port used by the nbgrader course service. This configuration is specified in `jupyterhub_config.py`.
### Why am I getting a Spawn failed error message?
After successfully logging in to JupyterHub with a compatible authenticator, I get a 'Spawn failed' error message in the browser. The JupyterHub logs have `jupyterhub KeyError: "getpwnam(): name not found: <my_user_name>"`.
This issue occurs when the authenticator requires a local system user to exist. In these cases, you need to use a spawner
that does not require an existing system user account, such as `DockerSpawner` or `KubeSpawner`.
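Two sketches of how that choice can look in `jupyterhub_config.py`; pick whichever matches your setup, and note that both assume the relevant spawner or authenticator packages are installed.

```python
# Option A: spawn user servers in containers, so no local system account is needed.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"

# Option B: keep a local spawner, but have JupyterHub create missing system users
# on first login (only available with LocalAuthenticator-derived authenticators).
# c.LocalAuthenticator.create_system_users = True
```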
### How can I run JupyterHub with sudo but use my current environment variables and virtualenv location?
When launching JupyterHub with `sudo jupyterhub` I get import errors and my environment variables don't work.
When launching services with `sudo ...` the shell won't have the same environment variables or `PATH`s in place. The most direct way to solve this issue is to use the full path to your python environment and add environment variables. For example:
```bash
sudo MY_ENV=abc123 \
/home/foo/venv/bin/python3 \
/srv/jupyterhub/jupyterhub
```
## Errors
### Error 500 after spawning my single-user server
You receive a 500 error while accessing the URL `/user/<your_name>/...`.
This is often seen when your single-user server cannot verify your user cookie
with the Hub.
There are two likely reasons for this:
1. The single-user server cannot connect to the Hub's API (networking
configuration problems)
2. The single-user server cannot _authenticate_ its requests (invalid token)
#### Symptoms
The main symptom is a failure to load _any_ page served by the single-user
server, met with a 500 error. This is typically the first page at `/user/<your_name>`
after logging in or clicking "Start my server". When a single-user notebook server
receives a request, the notebook server makes an API request to the Hub to
check if the cookie corresponds to the right user. This request is logged.
If everything is working, the response logged will be similar to this:
```
200 GET /hub/api/authorizations/cookie/jupyterhub-token-name/[secret] (@10.0.1.4) 6.10ms
```
You should see a similar 200 message, as above, in the Hub log when you first
visit your single-user notebook server. If you don't see this message in the log, it
may mean that your single-user notebook server is not connecting to your Hub.
If you see 403 (forbidden) like this, it is likely a token problem:
```
403 GET /hub/api/authorizations/cookie/jupyterhub-token-name/[secret] (@10.0.1.4) 4.14ms
```
Check the logs of the single-user notebook server, which may have more detailed
information on the cause.
#### Causes and resolutions
##### No authorization request
If you make an API request and it is not received by the server, you likely
have a network configuration issue. Often, this happens when the Hub is only
listening on 127.0.0.1 (default) and the single-user servers are not on the
same 'machine' (can be physically remote, or in a docker container or VM). The
fix for this case is to make sure that `c.JupyterHub.hub_ip` is an address
that all single-user servers can connect to, e.g.:
```python
c.JupyterHub.hub_ip = '10.0.0.1'
```
##### 403 GET /hub/api/authorizations/cookie
If you receive a 403 error, the API token for the single-user server is likely
invalid. Commonly, the 403 error is caused by resetting the JupyterHub
database (either removing jupyterhub.sqlite or some other action) while
leaving single-user servers running. This happens most frequently when using
DockerSpawner, because Docker's default behavior is to stop and restart existing
containers rather than destroy and recreate them every time. A restarted container
therefore keeps the same API token for its whole life, until the container is rebuilt,
so the token no longer matches the reset database.
The fix for this Docker case is to remove any Docker containers seeing this
issue (typically all containers created before a certain point in time):

    docker rm -f jupyter-name
After this, when you start your server via JupyterHub, it will build a
new container. If this was the underlying cause of the issue, you should see
your server again.
##### Proxy settings (403 GET)
When your whole JupyterHub sits behind an organization proxy (_not_ a reverse proxy like NGINX as part of your setup and _not_ the configurable-http-proxy) the environment variables `HTTP_PROXY`, `HTTPS_PROXY`, `http_proxy`, and `https_proxy` might be set. This confuses the JupyterHub single-user servers: When connecting to the Hub for authorization they connect via the proxy instead of directly connecting to the Hub on localhost. The proxy might deny the request (403 GET). This results in the single-user server thinking it has the wrong auth token. To circumvent this you should add `<hub_url>,<hub_ip>,localhost,127.0.0.1` to the environment variables `NO_PROXY` and `no_proxy`.
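One way to propagate those exclusions to the spawned single-user servers is via `Spawner.environment`. This is a sketch only; the hostname and IP below are placeholders for your Hub's actual address, and the Hub process itself may also need the variables set in its own environment.

```python
# jupyterhub_config.py -- sketch; replace the hostname/IP with your Hub's address.
no_proxy_hosts = "hub.example.org,10.0.0.1,localhost,127.0.0.1"
c.Spawner.environment = {
    "NO_PROXY": no_proxy_hosts,
    "no_proxy": no_proxy_hosts,
}
```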
### Launching Jupyter Notebooks to run as an externally managed JupyterHub service with the `jupyterhub-singleuser` command returns a `JUPYTERHUB_API_TOKEN` error
[JupyterHub services](https://jupyterhub.readthedocs.io/en/stable/reference/services.html) allow processes to interact with JupyterHub's REST API. Example use-cases include:
- **Secure Testing**: provide a canonical Jupyter Notebook for testing production data to reduce the number of entry points into production systems.
- **Grading Assignments**: provide access to shared Jupyter Notebooks that may be used for management tasks such as grading assignments.
- **Private Dashboards**: share dashboards with certain group members.
If possible, try to run the Jupyter Notebook as an externally managed service with one of the provided [jupyter/docker-stacks](https://github.com/jupyter/docker-stacks).
Standard JupyterHub installations include a [jupyterhub-singleuser](https://github.com/jupyterhub/jupyterhub/blob/9fdab027daa32c9017845572ad9d5ba1722dbc53/setup.py#L116) command which is built from the `jupyterhub.singleuser:main` method. The `jupyterhub-singleuser` command is the default command when JupyterHub launches single-user Jupyter Notebooks. One of the goals of this command is to make sure the version of JupyterHub installed within the Jupyter Notebook coincides with the version of the JupyterHub server itself.
If you launch a Jupyter Notebook with the `jupyterhub-singleuser` command directly from the command line, the Jupyter Notebook won't have access to the `JUPYTERHUB_API_TOKEN` and will return:
```
JUPYTERHUB_API_TOKEN env is required to run jupyterhub-singleuser.
Did you launch it manually?
```
If you plan on testing `jupyterhub-singleuser` independently from JupyterHub, then you can set the API token environment variable. For example, if you were to run the single-user Jupyter Notebook on the host, then:

    export JUPYTERHUB_API_TOKEN=my_secret_token
    jupyterhub-singleuser
With a docker container, pass in the environment variable with the run command:

    docker run -d \
      -p 8888:8888 \
      -e JUPYTERHUB_API_TOKEN=my_secret_token \
      jupyter/datascience-notebook:latest
[This example](https://github.com/jupyterhub/jupyterhub/tree/HEAD/examples/service-notebook/external) demonstrates how to combine the use of the `jupyterhub-singleuser` environment variables when launching a Notebook as an externally managed service.
## How do I...?
### Use a chained SSL certificate
Some certificate providers, i.e. Entrust, may provide you with a chained
certificate that contains multiple files. If you are using a chained
certificate you will need to concatenate the individual files by appending the
chained cert and root cert to your host cert:

    cat your_host.crt chain.crt root.crt > your_host-chained.crt
You would then set `ssl_key` and `ssl_cert` in your `jupyterhub_config.py` file as follows:

    c.JupyterHub.ssl_cert = 'your_host-chained.crt'
    c.JupyterHub.ssl_key = 'your_host.key'
#### Example
Your certificate provider gives you the following files: `example_host.crt`,
`Entrust_L1Kroot.txt`, and `Entrust_Root.txt`.
Concatenate the files appending the chain cert and root cert to your host cert:

    cat example_host.crt Entrust_L1Kroot.txt Entrust_Root.txt > example_host-chained.crt
You would then use the `example_host-chained.crt` as the value for
JupyterHub's `ssl_cert`. You may pass this value as a command line option
when starting JupyterHub or more conveniently set the `ssl_cert` variable in
JupyterHub's configuration file, `jupyterhub_config.py`. In `jupyterhub_config.py`,
set:

    c.JupyterHub.ssl_cert = '/path/to/example_host-chained.crt'
    c.JupyterHub.ssl_key = '/path/to/example_host.key'

where `ssl_cert` points to `example_host-chained.crt` and `ssl_key` to your private key.
Then restart JupyterHub.
See also {ref}`ssl-encryption`.
### Install JupyterHub without a network connection
Both conda and pip can be used without a network connection. You can make your
own repository (directory) of conda packages and/or wheels, and then install
from there instead of the internet.
For instance, you can install JupyterHub with pip and configurable-http-proxy
with npmbox:

    python3 -m pip wheel jupyterhub
    npmbox configurable-http-proxy
### I want access to the whole filesystem and still default users to their home directory
Setting the following in `jupyterhub_config.py` will configure access to
the entire filesystem and set the default to the user's home directory.

    c.Spawner.notebook_dir = '/'
    c.Spawner.default_url = '/home/%U'  # %U will be replaced with the username
### How do I increase the number of pySpark executors on YARN?
From the command line, pySpark executors can be configured using a command
similar to this one:

    pyspark --total-executor-cores 2 --executor-memory 1G
[Cloudera documentation for configuring spark on YARN applications](https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_running_spark_on_yarn.html#spark_on_yarn_config_apps)
provides additional information. The [pySpark configuration documentation](https://spark.apache.org/docs/0.9.0/configuration.html)
is also helpful for programmatic configuration examples.
### How do I use JupyterLab's pre-release version with JupyterHub?
While JupyterLab is still under active development, we have had users
ask about how to try out JupyterLab with JupyterHub.
You need to install and enable the JupyterLab extension system-wide,
then you can change the default URL to `/lab`.
For instance:

    python3 -m pip install jupyterlab
    jupyter serverextension enable --py jupyterlab --sys-prefix
The important thing is that JupyterLab is installed and enabled in the
single-user notebook server environment. For system users, this means
system-wide, as indicated above. For Docker containers, it means inside
the single-user docker image, etc.
In `jupyterhub_config.py`, configure the Spawner to tell the single-user
notebook servers to default to JupyterLab:

    c.Spawner.default_url = '/lab'
### How do I set up JupyterHub for a workshop (when users are not known ahead of time)?
1. Set up JupyterHub using OAuthenticator for GitHub authentication
2. Configure the admin list to have workshop leaders listed with administrator privileges.
Users will need a GitHub account to log in and be authenticated by the Hub.
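A sketch of that setup in `jupyterhub_config.py`; the OAuth client id/secret, callback URL, and usernames are placeholders you would replace with your own.

```python
# jupyterhub_config.py -- workshop sketch; all values are placeholders
c.JupyterHub.authenticator_class = "oauthenticator.github.GitHubOAuthenticator"
c.GitHubOAuthenticator.oauth_callback_url = "https://hub.example.org/hub/oauth_callback"
c.GitHubOAuthenticator.client_id = "your-github-oauth-app-client-id"
c.GitHubOAuthenticator.client_secret = "your-github-oauth-app-client-secret"

# Workshop leaders get admin rights; everyone else only needs a GitHub account.
c.Authenticator.admin_users = {"leader-one", "leader-two"}
```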
### How do I set up rotating daily logs?
You can do this with [logrotate](https://linux.die.net/man/8/logrotate),
or pipe to `logger` to use Syslog instead of directly to a file.
For example, with this logrotate config file:
```
/var/log/jupyterhub.log {
copytruncate
daily
}
```
and run this daily by putting a script in `/etc/cron.daily/`:
```bash
logrotate /path/to/above-config
```
Or use syslog:

    jupyterhub | logger -t jupyterhub
### Toree integration with HDFS rack awareness script
The Apache Toree kernel will have an issue when running with JupyterHub if the standard HDFS rack awareness script is used. This will materialize in the logs as a repeated WARN:
```bash
16/11/29 16:24:20 WARN ScriptBasedMapping: Exception running /etc/hadoop/conf/topology_script.py some.ip.address
ExitCodeException exitCode=1: File "/etc/hadoop/conf/topology_script.py", line 63
print rack
^
SyntaxError: Missing parentheses in call to 'print'
at `org.apache.hadoop.util.Shell.runCommand(Shell.java:576)`
```
In order to resolve this issue, there are two potential options.
1. Update HDFS core-site.xml so that the parameter "net.topology.script.file.name" points to a custom
   script (e.g. /etc/hadoop/conf/custom_topology_script.py). Copy the original script and change the first line
   to point to a Python 2 installation (e.g. /usr/bin/python).
2. In spark-env.sh add a Python 2 installation to your path (e.g. export PATH=/opt/anaconda2/bin:$PATH).
### Where do I find Docker images and Dockerfiles related to JupyterHub?
Docker images can be found at the [JupyterHub organization on DockerHub](https://hub.docker.com/u/jupyterhub/).
The Docker image [jupyterhub/singleuser](https://hub.docker.com/r/jupyterhub/singleuser/)
provides an example single-user notebook server for use with DockerSpawner.
Additional single-user notebook server images can be found at the [Jupyter
organization on DockerHub](https://hub.docker.com/r/jupyter/) and information
about each image at the [jupyter/docker-stacks repo](https://github.com/jupyter/docker-stacks).
### How can I view the logs for JupyterHub or the user's Notebook servers when using the DockerSpawner?
Use `docker logs <container>` where `<container>` is the container name defined within `docker-compose.yml`. For example, to view the logs of the JupyterHub container use:

    docker logs hub
By default, the user's notebook server is named `jupyter-<username>` where `username` is the user's username within JupyterHub's database.
So if you wanted to see the logs for user `foo` you would use:

    docker logs jupyter-foo
You can also tail logs to view them in real-time using the `-f` option:

    docker logs -f hub
## Troubleshooting commands
The following commands provide additional detail about installed packages,
versions, and system information that may be helpful when troubleshooting
a JupyterHub deployment. The commands are:
- System and deployment information
```bash
jupyter troubleshoot
```
- Kernel information
```bash
jupyter kernelspec list
```
- Debug logs when running JupyterHub
```bash
jupyterhub --debug
```