diff --git a/all-spark-notebook/README.md b/all-spark-notebook/README.md
index f59011ff..8c808275 100644
--- a/all-spark-notebook/README.md
+++ b/all-spark-notebook/README.md
@@ -191,6 +191,15 @@ println(sc.master)
 val rdd = sc.parallelize(0 to 99999999)
 rdd.sum()
 ```
+## Connecting to a Spark Cluster in Standalone Mode
+
+Connecting to a Spark cluster in standalone mode requires the following steps:
+
+0. Verify that the Docker image (check the Dockerfile) and the Spark cluster being deployed run the same version of Spark.
+1. [Deploy Spark in standalone mode](http://spark.apache.org/docs/latest/spark-standalone.html).
+2. Run the Docker container with `--net=host` in a location that is network-addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
+    * NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details.
+3. The language-specific instructions are almost the same as the Mesos instructions above; the only difference is the master URL, which now takes a form like `spark://10.10.10.10:7077`.
 
 ## Notebook Options
 
diff --git a/pyspark-notebook/README.md b/pyspark-notebook/README.md
index 315fe801..de2108bb 100644
--- a/pyspark-notebook/README.md
+++ b/pyspark-notebook/README.md
@@ -82,6 +82,16 @@ To use Python 2 in the notebook and on the workers, change the `PYSPARK_PYTHON`
 
 Of course, all of this can be hidden in an [IPython kernel startup script](http://ipython.org/ipython-doc/stable/development/config.html?highlight=startup#startup-files), but "explicit is better than implicit." :)
 
+## Connecting to a Spark Cluster in Standalone Mode
+
+Connecting to a Spark cluster in standalone mode requires the following steps:
+
+0. Verify that the Docker image (check the Dockerfile) and the Spark cluster being deployed run the same version of Spark.
+1. [Deploy Spark in standalone mode](http://spark.apache.org/docs/latest/spark-standalone.html).
+2. Run the Docker container with `--net=host` in a location that is network-addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
+    * NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details.
+3. The language-specific instructions are almost the same as the Mesos instructions above; the only difference is the master URL, which now takes a form like `spark://10.10.10.10:7077`.
+
 ## Notebook Options
 
 You can pass [Jupyter command line options](http://jupyter.readthedocs.org/en/latest/config.html#command-line-arguments) through the [`start-notebook.sh` command](https://github.com/jupyter/docker-stacks/blob/master/minimal-notebook/start-notebook.sh#L15) when launching the container. For example, to set the base URL of the notebook server you might do the following:
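
Step 3 in both hunks defers to the Mesos examples earlier in each README rather than showing the connection code. For reference, a minimal Python sketch of what that step looks like against a standalone master follows; it is an illustration, not text from the diff. The master URL `spark://10.10.10.10:7077` is the placeholder from step 3, and the app name is arbitrary.

```python
# Minimal sketch of step 3 for the pyspark-notebook image: the same pattern
# as the Mesos example in the README, with a standalone master URL swapped
# in place of a mesos://... URL.
import pyspark

conf = pyspark.SparkConf()
conf.setMaster("spark://10.10.10.10:7077")  # placeholder address from step 3
conf.setAppName("pyspark-notebook-test")    # arbitrary app name
sc = pyspark.SparkContext(conf=conf)

# Quick smoke test: sum integers on the cluster.
rdd = sc.parallelize(range(1000))
print(rdd.sum())  # 499500
```

The same change applies to the Scala example shown in the all-spark-notebook hunk: the code is unchanged except for the master URL handed to the Spark context.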