Added instructions for connecting to Spark in standalone mode.

2025-10-14 21:42:57 +00:00 · 2016-03-07 14:57:44 +02:00
parent 4a4937f334
commit 4597488bc8
2 changed files with 19 additions and 0 deletions
--- a/pyspark-notebook/README.md
+++ b/pyspark-notebook/README.md
@@ -82,6 +82,16 @@ To use Python 2 in the notebook and on the workers, change the `PYSPARK_PYTHON`

 Of course, all of this can be hidden in an [IPython kernel startup script](http://ipython.org/ipython-doc/stable/development/config.html?highlight=startup#startup-files), but "explicit is better than implicit." :)

+## Connecting to a Spark Cluster in Standalone Mode
+
+Connecting to a Spark cluster in standalone mode requires the following steps:
+
+0. Verify that the Docker image (check the Dockerfile) and the Spark cluster being deployed run the same version of Spark.
+1. [Deploy Spark in Standalone Mode](http://spark.apache.org/docs/latest/spark-standalone.html).
+2. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).)
+    * NOTE: When using `--net=host`, you must also use the flags `--pid=host -e TINI_SUBREAPER=true`. See https://github.com/jupyter/docker-stacks/issues/64 for details.
+3. The language-specific instructions are almost the same as those given above for Mesos; only the master URL changes, to something like `spark://10.10.10.10:7077`.
+
 ## Notebook Options

 You can pass [Jupyter command line options](http://jupyter.readthedocs.org/en/latest/config.html#command-line-arguments) through the [`start-notebook.sh` command](https://github.com/jupyter/docker-stacks/blob/master/minimal-notebook/start-notebook.sh#L15) when launching the container. For example, to set the base URL of the notebook server you might do the following: