mirror of
https://github.com/jupyterhub/jupyterhub.git
synced 2025-10-10 03:23:04 +00:00
transferred docs to FAQ folder
@@ -1,36 +0,0 @@
# Frequently asked questions

## How do I share links to notebooks?

In short, where you see `/user/name/notebooks/foo.ipynb` use `/hub/user-redirect/notebooks/foo.ipynb` (replace `/user/name` with `/hub/user-redirect`).

Sharing links to notebooks is a common activity,
and can look different based on what you mean.
Your first instinct might be to copy the URL you see in the browser,
e.g. `hub.jupyter.org/user/yourname/notebooks/coolthing.ipynb`.
However, let's break down what this URL means:

`hub.jupyter.org/user/yourname/` is the URL prefix handled by _your server_,
which means that sharing this URL is asking the person you share the link with
to come to _your server_ and look at the exact same file.
In most circumstances, this is forbidden by permissions because the person you share with does not have access to your server.
What actually happens when someone visits this URL will depend on whether your server is running and other factors.

**But what is our actual goal?**

A typical situation is that you have some shared or common filesystem,
such that the same path corresponds to the same document
(either the exact same document or another copy of it).
Typically, what folks want when they share a link like this
is for each visitor to open the same file _on their own server_,
so Breq would open `/user/breq/notebooks/foo.ipynb` and
Seivarden would open `/user/seivarden/notebooks/foo.ipynb`, etc.

JupyterHub has a special URL that does exactly this!
It's called `/hub/user-redirect/...`.
So if you replace `/user/yourname` in your URL bar
with `/hub/user-redirect`, any visitor should get the same
URL on their own server, rather than visiting yours.

In JupyterLab 2.0, this should also be the result of the "Copy Shareable Link"
action in the file browser.
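
The replacement described above can also be done programmatically. The following is a minimal sketch, not part of JupyterHub itself: the helper name `to_shareable_url` is hypothetical, and it assumes the Hub is served at the root of the domain (no extra base URL prefix).

```python
from urllib.parse import urlsplit, urlunsplit


def to_shareable_url(url: str) -> str:
    """Rewrite a per-user URL (/user/<name>/...) to its /hub/user-redirect/ form."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # A per-user path splits into ['', 'user', '<name>', ...]
    if len(segments) < 3 or segments[1] != "user" or not segments[2]:
        raise ValueError(f"not a per-user URL: {url!r}")
    # Keep everything after /user/<name>/ and prepend the redirect prefix.
    new_path = "/hub/user-redirect/" + "/".join(segments[3:])
    return urlunsplit(parts._replace(path=new_path))


print(to_shareable_url("https://hub.jupyter.org/user/yourname/notebooks/coolthing.ipynb"))
# prints https://hub.jupyter.org/hub/user-redirect/notebooks/coolthing.ipynb
```

Each visitor who opens the rewritten link is sent to the same path under their own `/user/<name>/` prefix.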
@@ -14,6 +14,4 @@ security-basics
authenticators-users-basics
spawners-basics
services-basics
faq
institutional-faq
```
@@ -1,267 +0,0 @@
# Institutional FAQ

This page contains common questions from users of JupyterHub,
broken down by their roles within organizations.

## For all

### Is JupyterHub appropriate for adoption within a larger institutional context?

Yes! JupyterHub has been used at scale for large pools of users, as well
as for complex and high-performance computing.
For example,

- UC Berkeley uses
  JupyterHub for its Data Science Education Program courses (serving over
  3,000 students).
- The Pangeo project uses JupyterHub to provide access
  to scalable cloud computing with Dask.

JupyterHub is stable and customizable
to the use cases of large organizations.

### I keep hearing about Jupyter Notebook, JupyterLab, and now JupyterHub. What’s the difference?

Here is a quick breakdown of these three tools:

- **The Jupyter Notebook** is a document specification (the `.ipynb` file) that interweaves
  narrative text with code cells and their outputs. It is also a graphical interface
  that allows users to edit these documents. There are also several other graphical interfaces
  that allow users to edit the `.ipynb` format (nteract, JupyterLab, Google Colab, Kaggle, etc).
- **JupyterLab** is a flexible and extensible user interface for interactive computing. It
  has several extensions that are tailored for using Jupyter Notebooks, as well as extensions
  for other parts of the data science stack.
- **JupyterHub** is an application that manages interactive computing sessions for **multiple users**.
  It also connects users with infrastructure they wish to access. It can provide
  remote access to Jupyter Notebooks and JupyterLab for many people.

## For management

### Briefly, what problem does JupyterHub solve for us?

JupyterHub provides a shared platform for data science and collaboration.
It allows users to utilize familiar data science workflows (such as the scientific Python stack,
the R tidyverse, and Jupyter Notebooks) on institutional infrastructure. It also gives administrators
some control over access to resources, security, environments, and authentication.

### Is JupyterHub mature? Why should we trust it?

Yes - the core JupyterHub application has
reached 1.0 status, and is considered stable and performant for most institutions.
JupyterHub has also been deployed (along with other tools) to work on
scalable infrastructure, large datasets, and high-performance computing.

### Who else uses JupyterHub?

JupyterHub is used at a variety of institutions in academia,
industry, and government research labs. It is most commonly used by two kinds of groups:

- Small teams (e.g., data science teams, research labs, or collaborative projects) to provide a
  shared resource for interactive computing, collaboration, and analytics.
- Large teams (e.g., a department, a large class, or a large group of remote users) to provide
  access to organizational hardware, data, and analytics environments at scale.

Here is a sample of organizations that use JupyterHub:

- **Universities and colleges**: UC Berkeley, UC San Diego, Cal Poly SLO, Harvard University, University of Chicago,
  University of Oslo, University of Sheffield, Université Paris Sud, University of Versailles
- **Research laboratories**: NASA, NCAR, NOAA, the Large Synoptic Survey Telescope, Brookhaven National Lab,
  Minnesota Supercomputing Institute, ALCF, CERN, Lawrence Livermore National Laboratory
- **Online communities**: Pangeo, Quantopian, mybinder.org, MathHub, Open Humans
- **Computing infrastructure providers**: NERSC, San Diego Supercomputing Center, Compute Canada
- **Companies**: Capital One, SANDVIK code, Globus

See the [Gallery of JupyterHub deployments](../gallery-jhub-deployments.md) for
a more complete list of JupyterHub deployments at institutions.

### How does JupyterHub compare with hosted products, like Google Colaboratory, RStudio.cloud, or Anaconda Enterprise?

JupyterHub puts you in control of your data, infrastructure, and coding environment.
In addition, it is vendor neutral, which reduces lock-in to a particular vendor or service.
JupyterHub provides access to interactive computing environments in the cloud (similar to each of these services).
Compared with the tools above, it is more flexible, more customizable, free, and
gives administrators more control over their setup and hardware.

Because JupyterHub is an open-source, community-driven tool, it can be extended and
modified to fit an institution's needs. It plays nicely with the open source data science
stack, and can serve a variety of computing environments, user interfaces, and
computational hardware. It can also be deployed anywhere - on enterprise cloud infrastructure, on
high-performance computing machines, on local hardware, or even on a single laptop, which
is not possible with most other tools for shared interactive computing.

## For IT

### How would I set up JupyterHub on institutional hardware?

That depends on what kind of hardware you've got. JupyterHub is flexible enough to be deployed
on a variety of hardware, including in-room hardware, on-prem clusters, cloud infrastructure,
etc.

The most common way to set up a JupyterHub is to use a JupyterHub distribution. These are pre-configured
and opinionated ways to set up a JupyterHub on particular kinds of infrastructure. The two distributions
that we currently suggest are:

- [Zero to JupyterHub for Kubernetes](https://z2jh.jupyter.org) is a scalable JupyterHub deployment and
  guide that runs on Kubernetes. Better for larger or dynamic user groups (50-10,000) or more complex
  compute/data needs.
- [The Littlest JupyterHub](https://tljh.jupyter.org) is a lightweight JupyterHub that runs on a single
  machine (in the cloud or under your desk). Better for smaller user groups (4-80) or more
  lightweight computational resources.

### Does JupyterHub run well in the cloud?

**Yes** - most deployments of JupyterHub are run via cloud infrastructure and on a variety of cloud providers.
Depending on the distribution of JupyterHub that you'd like to use, you can also connect your JupyterHub
deployment with a number of other cloud-native services so that users have access to other resources from
their interactive computing sessions.

For example, if you use the [Zero to JupyterHub for Kubernetes](https://z2jh.jupyter.org) distribution,
you can use the container-based workflows of other technologies, such as the [dask-kubernetes](https://kubernetes.dask.org/en/latest/)
project for distributed computing.

The Z2JH Helm Chart also has some functionality built in for auto-scaling your cluster up and down
as more resources are needed - allowing you to utilize the benefits of a flexible cloud-based deployment.

### Is JupyterHub secure?

The short answer: yes.
JupyterHub as a standalone application has been battle-tested at an institutional
level for several years, and makes a number of "default" security decisions that are reasonable for most
users.

- For security considerations in the base JupyterHub application,
  [see the JupyterHub security page](https://jupyterhub.readthedocs.io/en/stable/reference/websecurity.html).
- For security considerations when deploying JupyterHub on Kubernetes, see the
  [JupyterHub on Kubernetes security page](https://z2jh.jupyter.org/en/latest/security.html).

The longer answer: it depends on your deployment. Because JupyterHub is very flexible, it can be used
in a variety of deployment setups. This often entails connecting your JupyterHub to **other** infrastructure
(such as a [Dask Gateway service](https://gateway.dask.org/)). There are many security decisions to be made
in these cases, and the security of your JupyterHub deployment will often depend on these decisions.

If you are worried about security, don't hesitate to reach out to the JupyterHub community in the
[Jupyter Community Forum](https://discourse.jupyter.org/c/jupyterhub). This community of practice has many
individuals with experience running secure JupyterHub deployments and will be very glad to help you out.

### Does JupyterHub provide computing or data infrastructure?

**No** - JupyterHub manages user sessions and can _control_ computing infrastructure, but it does not provide these
things itself. You are expected to run JupyterHub on your own infrastructure (local or in the cloud). Moreover,
JupyterHub has no internal concept of "data", but is designed to be able to communicate with data repositories
(again, either locally or remotely) for use within interactive computing sessions.

### How do I manage users?

JupyterHub offers a few options for managing your users. Upon setting up a JupyterHub, you can choose what
kind of **authentication** you'd like to use. For example, you can have users sign up with an institutional
email address, or choose a username / password when they first log in, or offload authentication onto
another service such as an organization's OAuth provider.

The users of a JupyterHub are stored locally, and can be modified manually by an administrator of the JupyterHub.
Moreover, the _active_ users on a JupyterHub can be found on the administrator's page. This page
gives you the ability to stop or restart kernels, inspect user filesystems, and even take over user
sessions to assist them with debugging.
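
As a rough illustration of the authentication choices above, here is a minimal sketch of the relevant settings. The wrapper function `configure_auth` is hypothetical and exists only so the snippet can be read on its own; the traits it sets (`JupyterHub.authenticator_class`, `Authenticator.allowed_users`, `Authenticator.admin_users`) are JupyterHub configuration options, but check the documentation for your JupyterHub version, since their exact behavior has changed between releases.

```python
def configure_auth(c, authenticator="pam", allowed=(), admins=()):
    """Apply basic authentication settings to a JupyterHub config object `c`."""
    # Which Authenticator to use. "pam" (local system accounts) is the default;
    # OAuth-based authenticators come from the separate `oauthenticator` package.
    c.JupyterHub.authenticator_class = authenticator
    # Usernames permitted to log in (the meaning of an empty set differs
    # between JupyterHub versions).
    c.Authenticator.allowed_users = set(allowed)
    # Usernames that may open the administrator's page.
    c.Authenticator.admin_users = set(admins)
    return c


# In jupyterhub_config.py, where JupyterHub provides `c`:
# configure_auth(c, allowed={"breq", "seivarden"}, admins={"breq"})
```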

### How do I manage software environments?

A key benefit of JupyterHub is the ability for an administrator to define the environment(s) that users
have access to. There are many ways to do this, depending on what kind of infrastructure you're using for
your JupyterHub.

For example, **The Littlest JupyterHub** runs on a single VM. In this case, the administrator defines
an environment by installing packages to a shared folder that exists on the path of all users. The
**JupyterHub for Kubernetes** deployment uses Docker images to define environments. You can create your
own list of Docker images that users can select from, and can also control things like the amount of
RAM available to users, or the types of machines that their sessions will use in the cloud.
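
On Kubernetes, the list of selectable images is typically expressed through KubeSpawner's `profile_list` option. The sketch below shows one possible shape for it; the helper `make_profile`, the image names, and the memory values are placeholders chosen for illustration, not recommendations.

```python
def make_profile(display_name, image, mem_limit, default=False):
    """Build one entry for KubeSpawner's `profile_list`."""
    return {
        "display_name": display_name,
        "default": default,
        # Settings applied to the user's pod when this profile is chosen.
        "kubespawner_override": {"image": image, "mem_limit": mem_limit},
    }


# Hypothetical images; replace with ones your organization builds.
profiles = [
    make_profile("Python (small)", "example.org/datasci-python:latest", "1G", default=True),
    make_profile("R (large)", "example.org/datasci-r:latest", "4G"),
]

# In jupyterhub_config.py, where JupyterHub provides `c`:
# c.KubeSpawner.profile_list = profiles
```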

### How does JupyterHub manage computational resources?

For interactive computing sessions, JupyterHub controls computational resources via a **spawner**.
Spawners define how a new user session is created, and are customized for particular kinds of
infrastructure. For example, the KubeSpawner knows how to control a Kubernetes deployment
to create new pods when users log in.

For more sophisticated computational resources (like distributed computing), JupyterHub can
connect with other infrastructure tools (like Dask or Spark). This allows users to control
scalable or high-performance resources from within their JupyterHub sessions. The logic of
how those resources are controlled is taken care of by the non-JupyterHub application.
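
Selecting a spawner is a configuration choice. A minimal sketch, assuming the `kubespawner` package is installed alongside JupyterHub; the wrapper `configure_spawner` and the timeout value are illustrative only.

```python
def configure_spawner(c, spawner_class="kubespawner.KubeSpawner", start_timeout=120):
    """Tell JupyterHub which spawner creates user sessions."""
    # Importable class path of the spawner to use.
    c.JupyterHub.spawner_class = spawner_class
    # Seconds to wait for a user's server to start before giving up.
    c.Spawner.start_timeout = start_timeout
    return c


# In jupyterhub_config.py, where JupyterHub provides `c`:
# configure_spawner(c)
```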

### Can JupyterHub be used with my high-performance computing resources?

Yes - JupyterHub can provide access to many kinds of computing infrastructure.
Especially when combined with other open-source schedulers such as Dask, you can manage fairly
complex computing infrastructures from the interactive sessions of a JupyterHub. For an example,
[see the Dask HPC page](https://docs.dask.org/en/latest/setup/hpc.html).

### How many resources do user sessions take?

This is highly configurable by the administrator. If you wish for your users to have simple
data analytics environments for prototyping and light data exploring, you can restrict their
memory and CPU based on the resources that you have available. If you'd like your JupyterHub
to serve as a gateway to high-performance computing or data resources, you may increase the
resources available on user machines, or connect them with computing infrastructures elsewhere.
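
Memory and CPU restrictions are usually set on the spawner. The following sketch uses the `Spawner.mem_limit` and `Spawner.cpu_limit` options; the wrapper `configure_resources` and the default values are assumptions for illustration, and whether the limits are actually enforced depends on the spawner in use.

```python
def configure_resources(c, mem_limit="1G", cpu_limit=1.0):
    """Cap the memory and CPU available to each user session."""
    # Maximum memory per user server, e.g. "512M" or "2G".
    c.Spawner.mem_limit = mem_limit
    # Maximum number of CPU cores per user server (fractions are allowed).
    c.Spawner.cpu_limit = cpu_limit
    return c


# In jupyterhub_config.py, where JupyterHub provides `c`:
# configure_resources(c, mem_limit="2G", cpu_limit=0.5)
```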

### Can I customize the look and feel of a JupyterHub?

JupyterHub provides some customization of the graphics displayed to users. The most common
modification is to add custom branding to the JupyterHub login page, loading pages, and
various elements that persist across all pages (such as headers).
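
Branding is typically applied through a few configuration options. This is a minimal sketch: the wrapper `configure_branding`, the file paths, and the `org_name` template variable are placeholders, while `logo_file`, `template_paths`, and `template_vars` are JupyterHub options.

```python
def configure_branding(c, logo_path, template_dir, org_name):
    """Point JupyterHub at a custom logo and custom page templates."""
    # Image shown in the page header in place of the default logo.
    c.JupyterHub.logo_file = logo_path
    # Directories searched first for overridden Jinja templates (e.g. login.html).
    c.JupyterHub.template_paths = [template_dir]
    # Extra variables made available inside those templates.
    c.JupyterHub.template_vars = {"org_name": org_name}
    return c


# In jupyterhub_config.py, where JupyterHub provides `c` (paths are hypothetical):
# configure_branding(c, "/srv/jupyterhub/logo.png", "/srv/jupyterhub/templates", "Example Lab")
```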

## For Technical Leads

### Will JupyterHub “just work” with our team's interactive computing setup?

Depending on the complexity of your setup, you'll have different experiences with "out of the box"
distributions of JupyterHub. If all of the resources you need will fit on a single VM, then
[The Littlest JupyterHub](https://tljh.jupyter.org) should get you up-and-running within
a half day or so. More complex setups, such as scalable Kubernetes clusters or access
to high-performance computing and data, will require more time and expertise with
the technologies your JupyterHub will use (e.g., dev-ops knowledge with cloud computing).

In general, the base JupyterHub deployment is not the bottleneck for setup; the bottleneck is connecting
your JupyterHub with the various services and tools that you wish to provide to your users.

### How well does JupyterHub scale? What are JupyterHub's limitations?

JupyterHub works well both at a small scale (e.g., a single VM or machine) and at a
large scale (e.g., a scalable Kubernetes cluster). It can be used for teams as small as 2, and
for user bases as large as 10,000. The scalability of JupyterHub largely depends on the
infrastructure on which it is deployed. JupyterHub has been designed to be lightweight and
flexible, so you can tailor your JupyterHub deployment to your needs.

### Is JupyterHub resilient? What happens when a machine goes down?

For JupyterHubs that are deployed in a containerized environment (e.g., Kubernetes), it is
possible to configure the JupyterHub to be fairly resistant to failures in the system.
For example, if JupyterHub fails, then user sessions will not be affected (though new
users will not be able to log in). When a JupyterHub process is restarted, it should
seamlessly connect with the user database and the system will return to normal.
Again, the details of your JupyterHub deployment (e.g., whether it's deployed on a scalable cluster)
will affect the resilience of the deployment.
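
Two settings commonly involved in this behavior are sketched below. The wrapper `configure_resilience` and the database URL are illustrative assumptions; `cleanup_servers` and `db_url` are JupyterHub options. For user traffic to keep flowing while the Hub itself is down, the proxy generally also has to run as a separate process, which this sketch does not cover.

```python
def configure_resilience(c, db_url):
    """Keep user servers running, and Hub state recoverable, across Hub restarts."""
    # Do not shut down single-user servers when the Hub process stops.
    c.JupyterHub.cleanup_servers = False
    # Store Hub state in a database that outlives the Hub process.
    c.JupyterHub.db_url = db_url
    return c


# In jupyterhub_config.py, where JupyterHub provides `c` (URL is hypothetical):
# configure_resilience(c, "sqlite:////srv/jupyterhub/jupyterhub.sqlite")
```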

### What interfaces does JupyterHub support?

Out of the box, JupyterHub supports a variety of popular data science interfaces for user sessions,
such as JupyterLab, Jupyter Notebooks, and RStudio. Any interface that can be served
via a web address can be served with a JupyterHub (with the right setup).
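
Which interface users land in after their server starts is controlled by the `Spawner.default_url` option. A minimal sketch, assuming JupyterLab is installed in the user environment; the wrapper `configure_default_interface` is hypothetical.

```python
def configure_default_interface(c, url="/lab"):
    """Choose the page users see first once their server is running."""
    # "/lab" opens JupyterLab; "/tree" opens the classic notebook file browser.
    c.Spawner.default_url = url
    return c


# In jupyterhub_config.py, where JupyterHub provides `c`:
# configure_default_interface(c, "/lab")
```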

### Does JupyterHub make it easier for our team to collaborate?

JupyterHub provides a standardized environment and access to shared resources for your teams.
This greatly reduces the cost associated with sharing analyses and content with other team
members, and makes it easier to collaborate and build off of one another's ideas. Combined with
access to high-performance computing and data, JupyterHub provides a common resource to
amplify your team's ability to prototype their analyses, scale them to larger data, and then
share their results with one another.

JupyterHub also provides a computational framework to share computational narratives between
different levels of an organization. For example, data scientists can share Jupyter Notebooks
rendered as [Voilà dashboards](https://voila.readthedocs.io/en/stable/) with those who are not
familiar with programming, or create publicly-available interactive analyses to allow others to
interact with their work.

### Can I use JupyterHub with R/RStudio or other languages and environments?

Yes, Jupyter is a polyglot project, and there are over 40 community-provided kernels for a variety
of languages (the most common being Python, Julia, and R). You can also use a JupyterHub to provide
access to other interfaces, such as RStudio, that provide their own access to a language kernel.