First draft configuration page

This commit is contained in:
Peter Parente
2018-03-24 23:24:46 -04:00
parent 51ab78f4a9
commit 08560d2fce
2 changed files with 139 additions and 52 deletions


@@ -1,44 +1,46 @@
# Options and Configuration
When launched as Docker containers, all of the Jupyter Docker Stacks start a Jupyter Notebook server by default. They do so by executing a `start-notebook.sh` script which configures the internal container environment and then runs `jupyter notebook $*`, passing it any command line arguments received.
This page describes the options supported by the startup script as well as how to bypass it to run alternative commands.
## Notebook Options
You can pass [Jupyter command line options](https://jupyter.readthedocs.io/en/latest/projects/jupyter-command.html) to the `start-notebook.sh` script when launching the container. For example, to secure the Notebook server with a custom password hashed using `IPython.lib.passwd()` instead of the default token, you can run the following:
```
docker run -d -p 8888:8888 jupyter/base-notebook start-notebook.sh --NotebookApp.password='sha1:74ba40f8a388:c913541b7ee99d15d5ed31d4226bf7838f83a50e'
```
For example, to set the base URL of the notebook server, you can run the following:
```
docker run -d -p 8888:8888 jupyter/base-notebook start-notebook.sh --NotebookApp.base_url=/some/path
```
For example, to ignore best practice and disable all authentication, you can run the following:
```
docker run -d -p 8888:8888 jupyter/base-notebook start-notebook.sh --NotebookApp.token=''
```
You can sidestep the `start-notebook.sh` script and run your own commands in the container. See the *Alternative Commands* section later in this document for more information.
## Docker Options
You may instruct the `start-notebook.sh` script to customize the container environment before launching the notebook server. You do so by passing arguments to the `docker run` command. A combined example appears after the list below.
* `-e NB_USER=jovyan` - Instructs the startup script to change the default container username from `jovyan` to the provided value. Causes the script to rename the `jovyan` user home folder.
* `-e NB_UID=1000` - Instructs the startup script to switch the numeric user ID of `$NB_USER` to the given value. This feature is useful when mounting host volumes with specific owner permissions. For this option to take effect, you must run the container with `--user root`. (The startup script will `su $NB_USER` after adjusting the user ID.)
* `-e NB_GID=100` - Instructs the startup script to change the numeric group ID of `$NB_USER` to the given value. This feature is useful when mounting host volumes with specific group permissions. For this option to take effect, you must run the container with `--user root`. (The startup script will `su $NB_USER` after adjusting the group ID.)
* `-e CHOWN_HOME=yes` - Instructs the startup script to recursively change the `$NB_USER` home directory owner and group to the current value of `$NB_UID` and `$NB_GID`. This change will take effect even if the user home directory is mounted from the host using `-v` as described below.
* `-e GRANT_SUDO=yes` - Instructs the startup script to grant the `NB_USER` user passwordless `sudo` capability. You do **not** need this option to allow the user to `conda` or `pip` install additional packages. This option is useful, however, when you wish to give `$NB_USER` the ability to install OS packages with `apt` or modify other root-owned files in the container. For this option to take effect, you must run the container with `--user root`. (The `start-notebook.sh` script will `su $NB_USER` after adding `$NB_USER` to sudoers.) **You should only enable `sudo` if you trust the user or if the container is running on an isolated host.**
* `-e GEN_CERT=yes` - Instructs the startup script to generate a self-signed SSL certificate and configure Jupyter Notebook to use it to accept encrypted HTTPS connections.
* `-v /some/host/folder/for/work:/home/jovyan/work` - Mounts a host machine directory as a folder in the container. Useful when you want to preserve notebooks and other work even after the container is destroyed. **You must grant the within-container notebook user or group (`NB_UID` or `NB_GID`) write access to the host directory (e.g., `sudo chown 1000 /some/host/folder/for/work`).**
* `--user 5000 --group-add users` - Launches the container with a specific user ID and adds that user to the `users` group so that it can modify files in the default home directory and `/opt/conda`. You can use these arguments as alternatives to setting `$NB_UID` and `$NB_GID`.
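For example, the following sketch combines several of the options above (the numeric IDs and host path are illustrative, not defaults) to run a container as root so that the startup script can adjust the user and group IDs, fix home directory ownership, grant passwordless `sudo`, and mount a host folder for work:
```
docker run -d -p 8888:8888 \
  --user root \
  -e NB_UID=1100 \
  -e NB_GID=1100 \
  -e CHOWN_HOME=yes \
  -e GRANT_SUDO=yes \
  -v /some/host/folder/for/work:/home/jovyan/work \
  jupyter/base-notebook
```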
## SSL Certificates
You may mount SSL key and certificate files into a container and configure Jupyter Notebook to use them to accept HTTPS connections. For example, to mount a host folder containing a `notebook.key` and `notebook.crt` and use them, you might run the following:
```
docker run -d -p 8888:8888 \
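 -v /some/host/folder:/etc/ssl/notebook \
 jupyter/base-notebook start-notebook.sh \
 --NotebookApp.keyfile=/etc/ssl/notebook/notebook.key \
 --NotebookApp.certfile=/etc/ssl/notebook/notebook.crt
# the lines above sketch the rest of this command; the /etc/ssl/notebook
# mount point and host folder are assumed, illustrative paths
```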
@@ -65,23 +67,11 @@ For additional information about using SSL, see the following:
* The [jupyter_notebook_config.py](jupyter_notebook_config.py) file for how this Docker image generates a self-signed certificate.
* The [Jupyter Notebook documentation](https://jupyter-notebook.readthedocs.io/en/latest/public_server.html#securing-a-notebook-server) for best practices about securing a public notebook server in general.
## Alternative Commands
### start.sh
The `start-notebook.sh` script actually inherits most of its option handling capability from a more generic `start.sh` script. The `start.sh` script supports all of the features described above, but allows you to specify an arbitrary command to execute. For example, to run the text-based `ipython` console in a container, do the following:
```
docker run -it --rm jupyter/base-notebook start.sh ipython
@@ -97,21 +87,29 @@ This script is particularly useful when you derive a new Dockerfile from this im
### Others
You can bypass the provided scripts and specify an arbitrary start command. If you do, keep in mind that features supported by the `start.sh` script and its kin will not function (e.g., `GRANT_SUDO`).
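For example, one possible direct command (a minimal sketch; any executable available in the image works here) launches an interactive shell in place of the notebook server:
```
docker run -it --rm jupyter/base-notebook bash
```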
## Conda Environments
The default Python 3.x [Conda environment](http://conda.pydata.org/docs/using/envs.html) resides in `/opt/conda`. The `/opt/conda/bin` directory is part of the default `jovyan` user's `$PATH`. That directory is also whitelisted for use in `sudo` commands by the `start.sh` script.
The `jovyan` user has full read/write access to the `/opt/conda` directory. You can use either `conda` or `pip` to install new packages without any additional permissions.
```
# install a package into the default (python 3.x) environment
pip install some-package
conda install some-package
```
## Apache Spark
The `jupyter/pyspark-notebook` and `jupyter/all-spark-notebook` images support the use of Apache Spark in Python, R, and Scala notebooks. The following sections provide some examples of how to get started using them.
### Using Spark Local Mode
Spark local mode is useful for experimentation on small data when you do not have a Spark cluster available.
#### In a Python Notebook
```python
import pyspark
@@ -122,20 +120,54 @@ rdd = sc.parallelize(range(1000))
rdd.takeSample(False, 5)
```
#### In an R Notebook
```r
library(SparkR)
sc <- sparkR.session("local[*]")
# do something to prove it works
df <- as.DataFrame(iris)
head(filter(df, df$Petal_Width > 0.2))
```
#### In a Spylon Kernel Scala Notebook
The Spylon kernel instantiates a `SparkContext` for you in the variable `sc` after you configure Spark options in a `%%init_spark` magic cell.
```python
%%init_spark
# Configure Spark to use a local master
launcher.master = "local[*]"
```
```scala
// Now run Scala code that uses the initialized SparkContext in sc
val rdd = sc.parallelize(0 to 999)
rdd.takeSample(false, 5)
```
#### In an Apache Toree Scala Notebook
Apache Toree instantiates a local `SparkContext` for you in variable `sc` when the kernel starts.
```scala
val rdd = sc.parallelize(0 to 999)
rdd.takeSample(false, 5)
```
### Connecting to a Spark Cluster on Mesos
This configuration allows your compute cluster to scale with your data.
0. [Deploy Spark on Mesos](http://spark.apache.org/docs/latest/running-on-mesos.html).
1. Configure each slave with [the `--no-switch_user` flag](https://open.mesosphere.com/reference/mesos-slave/) or create the `$NB_USER` account on every slave node.
2. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
* NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details.
3. Follow the language-specific instructions below.
#### In a Python Notebook
```python
import os
@@ -149,7 +181,7 @@ conf = pyspark.SparkConf()
conf.setMaster("mesos://10.10.10.10:5050")
# point to spark binary package in HDFS or on local filesystem on all slave
# nodes (e.g., file:///opt/spark/spark-2.2.0-bin-hadoop2.7.tgz)
conf.set("spark.executor.uri", "hdfs://10.10.10.10/spark/spark-2.2.0-bin-hadoop2.7.tgz")
# set other options as desired
conf.set("spark.executor.memory", "8g")
conf.set("spark.core.connection.ack.wait.timeout", "1200")
@@ -162,13 +194,68 @@ rdd = sc.parallelize(range(100000000))
rdd.sumApprox(3)
```
#### In an R Notebook
```r
library(SparkR)
# Point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos)
# Point to spark binary package in HDFS or on local filesystem on all slave
# nodes (e.g., file:///opt/spark/spark-2.2.0-bin-hadoop2.7.tgz) in sparkEnvir
# Set other options in sparkEnvir
sc <- sparkR.session("mesos://10.10.10.10:5050", sparkEnvir=list(
spark.executor.uri="hdfs://10.10.10.10/spark/spark-2.2.0-bin-hadoop2.7.tgz",
spark.executor.memory="8g"
)
)
# do something to prove it works
data(iris)
df <- as.DataFrame(iris)
head(filter(df, df$Petal_Width > 0.2))
```
#### In a Spylon Kernel Scala Notebook
```python
%%init_spark
# Configure the location of the mesos master and spark distribution on HDFS
launcher.master = "mesos://10.10.10.10:5050"
launcher.conf.spark.executor.uri = "hdfs://10.10.10.10/spark/spark-2.2.0-bin-hadoop2.7.tgz"
```
```scala
// Now run Scala code that uses the initialized SparkContext in sc
val rdd = sc.parallelize(0 to 999)
rdd.takeSample(false, 5)
```
#### In an Apache Toree Scala Notebook
The Apache Toree kernel automatically creates a `SparkContext` when it starts based on configuration information from its command line arguments and environment variables. You can pass information about your Mesos cluster via the `SPARK_OPTS` environment variable when you spawn a container.
For instance, to pass information about a Mesos master, Spark binary location in HDFS, and executor options, you could start the container like so:
```
docker run -d -p 8888:8888 -e SPARK_OPTS='--master=mesos://10.10.10.10:5050 \
--conf spark.executor.uri=hdfs://10.10.10.10/spark/spark-2.2.0-bin-hadoop2.7.tgz \
--conf spark.executor.memory=8g' jupyter/all-spark-notebook
```
Note that this is the same information expressed in a notebook in the Python case above. Once the kernel spec has your cluster information, you can test your cluster in an Apache Toree notebook like so:
```scala
// should print the value of --master in the kernel spec
println(sc.master)
// do something to prove it works
val rdd = sc.parallelize(0 to 99999999)
rdd.sum()
```
### Connecting to a Spark Cluster in Standalone Mode
Connecting to a Spark cluster in Standalone Mode requires the following set of steps:
0. Verify that the Docker image (check the Dockerfile) and the Spark cluster being deployed run the same version of Spark.
1. [Deploy Spark in Standalone Mode](http://spark.apache.org/docs/latest/spark-standalone.html).
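The remaining steps mirror the Mesos instructions above. For example, a Python notebook might create its context as follows (a sketch assuming a hypothetical standalone master listening at `spark://10.10.10.10:7077`):
```python
import pyspark

# point to the standalone master instead of a Mesos master
conf = pyspark.SparkConf()
conf.setMaster("spark://10.10.10.10:7077")
# set other options as desired
conf.set("spark.executor.memory", "8g")

# create the context for the driver
sc = pyspark.SparkContext(conf=conf)

# do something to prove it works
rdd = sc.parallelize(range(1000))
rdd.takeSample(False, 5)
```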


@@ -22,7 +22,7 @@ The Jupyter team maintains a set of Docker image definitions in the [https://git
`jupyter/base-notebook` is a small image supporting the [options common across all core stacks](configuration.html). It is the basis for all other stacks.
* Minimally-functional Jupyter Notebook server (e.g., no [pandoc](https://pandoc.org/) for saving notebooks as PDFs)
* [Miniconda](https://conda.io/miniconda.html) Python 3.x in `/opt/conda`
* No preinstalled scientific computing packages
* Unprivileged user `jovyan` (`uid=1000`, configurable, see options) in group `users` (`gid=100`) with ownership over the `/home/jovyan` and `/opt/conda` paths
* `tini` as the container entrypoint and a `start-notebook.sh` script as the default command
@@ -251,7 +251,7 @@ notebook
### Using Binder
[Binder](https://mybinder.org/) is a service that allows you to create and share custom computing environments for projects in version control. You can use any of the Jupyter Docker Stacks images as a basis for a Binder-compatible Dockerfile. See the [docker-stacks example](https://mybinder.readthedocs.io/en/latest/sample_repos.html#using-a-docker-image-from-the-jupyter-docker-stacks-repository) and [Using a Dockerfile](https://mybinder.readthedocs.io/en/latest/dockerfile.html) sections in the [Binder documentation](https://mybinder.readthedocs.io/en/latest/index.html) for instructions.
### Using JupyterHub