Outline, include some info from READMEs

This commit is contained in:
Peter Parente
2017-12-30 22:13:32 -05:00
parent 118198ac65
commit f70d52da58
6 changed files with 312 additions and 31 deletions

View File

@@ -1,8 +1,8 @@
# Copyright (c) Jupyter Development Team. # Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License. # Distributed under the terms of the Modified BSD License.
.PHONY: help test .PHONY: docs help test
# Use bash for inline if-statements in test target # Use bash for inline if-statements in arch_patch target
SHELL:=bash SHELL:=bash
OWNER:=jupyter OWNER:=jupyter
ARCH:=$(shell uname -m) ARCH:=$(shell uname -m)
@@ -54,9 +54,12 @@ dev/%: PORT?=8888
dev/%: ## run a foreground container for a stack dev/%: ## run a foreground container for a stack
docker run -it --rm -p $(PORT):8888 $(DARGS) $(OWNER)/$(notdir $@) $(ARGS) docker run -it --rm -p $(PORT):8888 $(DARGS) $(OWNER)/$(notdir $@) $(ARGS)
dev-env: # install libraries required to build docs and run tests dev-env: ## install libraries required to build docs and run tests
conda env create -f environment.yml conda env create -f environment.yml
docs: ## build HTML documentation
make -C docs html
test/%: ## run tests against a stack test/%: ## run tests against a stack
@TEST_IMAGE="$(OWNER)/$(notdir $@)" pytest test @TEST_IMAGE="$(OWNER)/$(notdir $@)" pytest test

View File

@@ -21,17 +21,23 @@
# import sys # import sys
# sys.path.insert(0, os.path.abspath('.')) # sys.path.insert(0, os.path.abspath('.'))
# For conversion from markdown to html
import recommonmark.parser
from recommonmark.transform import AutoStructify
# -- General configuration ------------------------------------------------ # -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here. # If your documentation needs a minimal Sphinx version, state it here.
# #
# needs_sphinx = '1.0' needs_sphinx = '1.4'
# Add any Sphinx extension module names here, as strings. They can be # Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones. # ones.
extensions = [] extensions = [
'jupyter_alabaster_theme'
]
# Add any paths that contain templates here, relative to this directory. # Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates'] templates_path = ['_templates']
@@ -78,12 +84,20 @@ pygments_style = 'sphinx'
todo_include_todos = False todo_include_todos = False
# -- Source -------------------------------------------------------------
source_parsers = {
'.md': 'recommonmark.parser.CommonMarkParser',
}
source_suffix = ['.rst', '.md']
# -- Options for HTML output ---------------------------------------------- # -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for # The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes. # a list of builtin themes.
# #
html_theme = 'alabaster' html_theme = 'jupyter_alabaster_theme'
# Theme options are theme-specific and customize the look and feel of a theme # Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the # further. For a list of options available for each theme, see the
@@ -96,18 +110,6 @@ html_theme = 'alabaster'
# so a file named "default.css" will overwrite the builtin "default.css". # so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static'] html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# This is required for the alabaster theme
# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars
html_sidebars = {
'**': [
'relations.html', # needs 'show_related': True theme option to display
'searchbox.html',
]
}
# -- Options for HTMLHelp output ------------------------------------------ # -- Options for HTMLHelp output ------------------------------------------

177
docs/configuration.md Normal file
View File

@@ -0,0 +1,177 @@
# Options and Configuration
## Notebook Options
The Docker container executes a [`start-notebook.sh` script](./start-notebook.sh) script by default. The `start-notebook.sh` script handles the `NB_UID`, `NB_GID` and `GRANT_SUDO` features documented in the next section, and then executes the `jupyter notebook`.
You can pass [Jupyter command line options](https://jupyter.readthedocs.io/en/latest/projects/jupyter-command.html) through the `start-notebook.sh` script when launching the container. For example, to secure the Notebook server with a custom password hashed using `IPython.lib.passwd()` instead of the default token, run the following:
```
docker run -d -p 8888:8888 jupyter/base-notebook start-notebook.sh --NotebookApp.password='sha1:74ba40f8a388:c913541b7ee99d15d5ed31d4226bf7838f83a50e'
```
For example, to set the base URL of the notebook server, run the following:
```
docker run -d -p 8888:8888 jupyter/base-notebook start-notebook.sh --NotebookApp.base_url=/some/path
```
For example, to disable all authentication mechanisms (which is not a recommended practice):
```
docker run -d -p 8888:8888 jupyter/base-notebook start-notebook.sh --NotebookApp.token=''
```
You can sidestep the `start-notebook.sh` script and run your own commands in the container. See the *Alternative Commands* section later in this document for more information.
## Docker Options
You may customize the execution of the Docker container and the command it is running with the following optional arguments.
* `-e GEN_CERT=yes` - Generates a self-signed SSL certificate and configures Jupyter Notebook to use it to accept encrypted HTTPS connections.
* `-e NB_UID=1000` - Specify the uid of the `jovyan` user. Useful to mount host volumes with specific file ownership. For this option to take effect, you must run the container with `--user root`. (The `start-notebook.sh` script will `su jovyan` after adjusting the user id.)
* `-e NB_GID=100` - Specify the gid of the `jovyan` user. Useful to mount host volumes with specific file ownership. For this option to take effect, you must run the container with `--user root`. (The `start-notebook.sh` script will `su jovyan` after adjusting the group id.)
* `-e GRANT_SUDO=yes` - Gives the `jovyan` user passwordless `sudo` capability. Useful for installing OS packages. For this option to take effect, you must run the container with `--user root`. (The `start-notebook.sh` script will `su jovyan` after adding `jovyan` to sudoers.) **You should only enable `sudo` if you trust the user or if the container is running on an isolated host.**
* `-v /some/host/folder/for/work:/home/jovyan/work` - Mounts a host machine directory as folder in the container. Useful when you want to preserve notebooks and other work even after the container is destroyed. **You must grant the within-container notebook user or group (`NB_UID` or `NB_GID`) write access to the host directory (e.g., `sudo chown 1000 /some/host/folder/for/work`).**
* `--group-add users` - use this argument if you are also specifying
a specific user id to launch the container (`-u 5000`), rather than launching the container as root and relying on *NB_UID* and *NB_GID* to set the user and group.
## SSL Certificates
You may mount SSL key and certificate files into a container and configure Jupyter Notebook to use them to accept HTTPS connections. For example, to mount a host folder containing a `notebook.key` and `notebook.crt`:
```
docker run -d -p 8888:8888 \
-v /some/host/folder:/etc/ssl/notebook \
jupyter/base-notebook start-notebook.sh \
--NotebookApp.keyfile=/etc/ssl/notebook/notebook.key
--NotebookApp.certfile=/etc/ssl/notebook/notebook.crt
```
Alternatively, you may mount a single PEM file containing both the key and certificate. For example:
```
docker run -d -p 8888:8888 \
-v /some/host/folder/notebook.pem:/etc/ssl/notebook.pem \
jupyter/base-notebook start-notebook.sh \
--NotebookApp.certfile=/etc/ssl/notebook.pem
```
In either case, Jupyter Notebook expects the key and certificate to be a base64 encoded text file. The certificate file or PEM may contain one or more certificates (e.g., server, intermediate, and root).
For additional information about using SSL, see the following:
* The [docker-stacks/examples](https://github.com/jupyter/docker-stacks/tree/master/examples) for information about how to use [Let's Encrypt](https://letsencrypt.org/) certificates when you run these stacks on a publicly visible domain.
* The [jupyter_notebook_config.py](jupyter_notebook_config.py) file for how this Docker image generates a self-signed certificate.
* The [Jupyter Notebook documentation](https://jupyter-notebook.readthedocs.io/en/latest/public_server.html#securing-a-notebook-server) for best practices about securing a public notebook server in general.
## Conda Environments
The default Python 3.x [Conda environment](http://conda.pydata.org/docs/using/envs.html) resides in `/opt/conda`. The `/opt/conda/bin` directory is part of the default `jovyan` user's `$PATH`. That directory is also whitelisted for use in `sudo` commands by the `start.sh` script.
The `jovyan` user has full read/write access to the `/opt/conda` directory. You can use either `conda` or `pip` to install new packages without any additional permissions.
```
# install a package into the default (python 3.x) environment
pip install some-package
conda install some-package
```
## Alternative Commands
### start.sh
The `start.sh` script supports the same features as the default `start-notebook.sh` script (e.g., `GRANT_SUDO`), but allows you to specify an arbitrary command to execute. For example, to run the text-based `ipython` console in a container, do the following:
```
docker run -it --rm jupyter/base-notebook start.sh ipython
```
Or, to run JupyterLab instead of the classic notebook, run the following:
```
docker run -it --rm -p 8888:8888 jupyter/base-notebook start.sh jupyter lab
```
This script is particularly useful when you derive a new Dockerfile from this image and install additional Jupyter applications with subcommands like `jupyter console`, `jupyter kernelgateway`, etc.
### Others
You can bypass the provided scripts and specify your an arbitrary start command. If you do, keep in mind that certain features documented above will not function (e.g., `GRANT_SUDO`).
## Image Specifics
## Spark and PySpark
### Using Spark Local Mode
This configuration is nice for using Spark on small, local data.
0. Run the container as shown above.
2. Open a Python 2 or 3 notebook.
3. Create a `SparkContext` configured for local mode.
For example, the first few cells in the notebook might read:
```python
import pyspark
sc = pyspark.SparkContext('local[*]')
# do something to prove it works
rdd = sc.parallelize(range(1000))
rdd.takeSample(False, 5)
```
### Connecting to a Spark Cluster on Mesos
This configuration allows your compute cluster to scale with your data.
0. [Deploy Spark on Mesos](http://spark.apache.org/docs/latest/running-on-mesos.html).
1. Configure each slave with [the `--no-switch_user` flag](https://open.mesosphere.com/reference/mesos-slave/) or create the `jovyan` user on every slave node.
2. Ensure Python 2.x and/or 3.x and any Python libraries you wish to use in your Spark lambda functions are installed on your Spark workers.
3. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
* NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details.
4. Open a Python 2 or 3 notebook.
5. Create a `SparkConf` instance in a new notebook pointing to your Mesos master node (or Zookeeper instance) and Spark binary package location.
6. Create a `SparkContext` using this configuration.
For example, the first few cells in a Python 3 notebook might read:
```python
import os
# make sure pyspark tells workers to use python3 not 2 if both are installed
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
import pyspark
conf = pyspark.SparkConf()
# point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos)
conf.setMaster("mesos://10.10.10.10:5050")
# point to spark binary package in HDFS or on local filesystem on all slave
# nodes (e.g., file:///opt/spark/spark-2.2.0-bin-hadoop2.7.tgz)
conf.set("spark.executor.uri", "hdfs://10.122.193.209/spark/spark-2.2.0-bin-hadoop2.7.tgz")
# set other options as desired
conf.set("spark.executor.memory", "8g")
conf.set("spark.core.connection.ack.wait.timeout", "1200")
# create the context
sc = pyspark.SparkContext(conf=conf)
# do something to prove it works
rdd = sc.parallelize(range(100000000))
rdd.sumApprox(3)
```
To use Python 2 in the notebook and on the workers, change the `PYSPARK_PYTHON` environment variable to point to the location of the Python 2.x interpreter binary. If you leave this environment variable unset, it defaults to `python`.
Of course, all of this can be hidden in an [IPython kernel startup script](http://ipython.org/ipython-doc/stable/development/config.html?highlight=startup#startup-files), but "explicit is better than implicit." :)
## Connecting to a Spark Cluster on Standalone Mode
Connection to Spark Cluster on Standalone Mode requires the following set of steps:
0. Verify that the docker image (check the Dockerfile) and the Spark Cluster which is being deployed, run the same version of Spark.
1. [Deploy Spark on Standalone Mode](http://spark.apache.org/docs/latest/spark-standalone.html).
2. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
* NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details.
3. The language specific instructions are almost same as mentioned above for Mesos, only the master url would now be something like spark://10.10.10.10:7077

9
docs/contributing.md Normal file
View File

@@ -0,0 +1,9 @@
# Contributing
## Package Updates
## New Packages
## Tests
## Community Stacks

View File

@@ -1,20 +1,41 @@
.. docker-stacks documentation master file, created by Jupyter Docker Stacks
sphinx-quickstart on Fri Dec 29 20:32:10 2017. =====================
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to docker-stacks's documentation! Jupyter Docker Stacks are a set of ready-to-run Docker images containing Jupyter applications and interactive computing tools. You can use a stack image to start a personal Jupyter Notebook server in a local Docker container, to run JupyterLab servers for a team using JupyterHub, to write your own project Dockerfile, and so on.
=========================================
**Table of Contents**
.. toctree:: .. toctree::
:maxdepth: 2 :maxdepth: 1
:caption: Contents:
using
features
contributing
Quick Start
-----------
Indices and tables The examples below may help you get started if you have Docker installed, know which Docker image you want to use, and want to launch a single Jupyter Notebook server in a container. The other pages in this documentation describe additional uses and features in detail.::
==================
* :ref:`genindex` # Run a Jupyter Notebook server in a Docker container started
* :ref:`modindex` # from the jupyter/scipy-notebook image built from Git commit 27ba573.
* :ref:`search` # All files saved in the container are lost when the notebook server exits.
# -ti: pseudo-TTY+STDIN open, so the logs appear in the terminal
# -rm: remove the container on exit
# -p: publish the notebook port 8888 as port 8888 on the host
docker run -ti --rm -p 8888:8888 jupyter/scipy-notebook:27ba573
# Run a Jupyter Notebook server in a Docker container started from the
# jupyter/r-notebook image built from Git commit cf1a3aa.
# All files written to ~/work in the container are saved to the
# current working on the host and persist even when the notebook server
# exits.
docker run -ti --rm -p 8888:8888 -v "$PWD":/home/jovyan/work jupyter/r-notebook:cf1a3aa
# Run a Jupyter Notebook server in a background Docker container started
# from the latest jupyter/all-spark-notebook image available on the local
# machine or Docker Cloud. All files saved in the container are lost
# when the container is destroyed.
# -d: detach, run container in background.
# -P: Publish all exposed ports to random ports
docker run -d -P jupyter/all-spark-notebook:latest

69
docs/using.md Normal file
View File

@@ -0,0 +1,69 @@
# Users Guide
Using one of the Jupyter Docker Stacks requires two choices:
1. Which Docker image you wish to use
2. How you wish to start Docker containers from that image
This section provides details about the available images and runtimes to inform your choices.
## Selecting an Image
### Core Stacks
The Jupyter team maintains a set of Docker image definitions in the https://github.com/jupyter/docker-stacks GitHub repository. The following table describes these images, and links to their source on GitHub and their builds on Docker Cloud.
|Name |Description|GitHub |Image Tags|
|----------------------------|-----------|-----------|----------|
|jupyter/base-notebook ||||
|jupyter/minimal-notebook ||||
|jupyter/r-notebook ||||
|jupyter/scipy-notebook ||||
|jupyter/datascience-notebook||||
|jupyter/tensorflow-notebook ||||
|jupyter/pyspark-notebook ||||
|jupyter/all-spark-notebook ||||
|----------------------------|-|-|-|
#### Image Relationships
The following diagram depicts the build dependencies between the core images (aka the `FROM` statement in their Dockerfiles). Any image lower in the tree inherits
[![Image inheritance diagram](internal/inherit-diagram.svg)](http://interactive.blockdiag.com/?compression=deflate&src=eJyFzTEPgjAQhuHdX9Gws5sQjGzujsaYKxzmQrlr2msMGv-71K0srO_3XGud9NNA8DSfgzESCFlBSdi0xkvQAKTNugw4QnL6GIU10hvX-Zh7Z24OLLq2SjaxpvP10lX35vCf6pOxELFmUbQiUz4oQhYzMc3gCrRt2cWe_FKosmSjyFHC6OS1AwdQWCtyj7sfh523_BI9hKlQ25YdOFdv5fcH0kiEMA)
#### Versioning
[Click here for a commented build history of each image, with references to tag/SHA values.](https://github.com/jupyter/docker-stacks/wiki/Docker-build-history)
The following are quick-links to READMEs about each image and their Docker image tags on Docker Cloud:
* base-notebook: [README](https://github.com/jupyter/docker-stacks/tree/master/base-notebook), [SHA list](https://hub.docker.com/r/jupyter/base-notebook/tags/)
* minimal-notebook: [README](https://github.com/jupyter/docker-stacks/tree/master/minimal-notebook), [SHA list](https://hub.docker.com/r/jupyter/minimal-notebook/tags/)
* scipy-notebook: [README](https://github.com/jupyter/docker-stacks/tree/master/scipy-notebook), [SHA list](https://hub.docker.com/r/jupyter/scipy-notebook/tags/)
* r-notebook: [README](https://github.com/jupyter/docker-stacks/tree/master/r-notebook), [SHA list](https://hub.docker.com/r/jupyter/r-notebook/tags/)
* tensorflow-notebook: [README](https://github.com/jupyter/docker-stacks/tree/master/tensorflow-notebook), [SHA list](https://hub.docker.com/r/jupyter/tensorflow-notebook/tags/)
* datascience-notebook: [README](https://github.com/jupyter/docker-stacks/tree/master/datascience-notebook), [SHA list](https://hub.docker.com/r/jupyter/datascience-notebook/tags/)
* pyspark-notebook: [README](https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook), [SHA list](https://hub.docker.com/r/jupyter/pyspark-notebook/tags/)
* all-spark-notebook: [README](https://github.com/jupyter/docker-stacks/tree/master/all-spark-notebook), [SHA list](https://hub.docker.com/r/jupyter/all-spark-notebook/tags/)
The latest tag in each Docker Hub repository tracks the master branch HEAD reference on GitHub. This is a moving target and will make backward-incompatible changes regularly.
Any 12-character image tag on Docker Hub refers to a git commit SHA here on GitHub. See the Docker build history wiki page for a table of build details.
Stack contents (e.g., new library versions) will be updated upon request via PRs against this project.
Users looking for reproducibility or stability should always refer to specific git SHA tagged images in their work, not latest.
For legacy reasons, there are two additional tags named 3.2 and 4.0 on Docker Hub which point to images prior to our versioning scheme switch.
If you haven't already, pin your image to a tag, e.g. FROM jupyter/scipy-notebook:7c45ec67c8e7. latest is a moving target which can change in backward-incompatible ways as packages and operating systems are updated.
## Community Stacks
The Jupyter community maintains additional
## Running a Container
### Using the Docker Command Line
### Using JupyterHub
Every notebook stack is compatible with JupyterHub 0.5 or higher. When running with JupyterHub, you must override the Docker run command to point to the `start-singleuser.sh script, which starts a single-user instance of the Notebook server. See each stack's README for instructions on running with JupyterHub.
### Using Binder