From e268ad01b3d3f4d7dd35adcd2730bf633c3122d6 Mon Sep 17 00:00:00 2001
From: mariusvniekerk
Date: Fri, 6 Jan 2017 16:42:42 -0800
Subject: [PATCH] Add spylon-kernel to all-spark

---
 all-spark-notebook/Dockerfile |  4 ++++
 all-spark-notebook/README.md  | 16 +++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/all-spark-notebook/Dockerfile b/all-spark-notebook/Dockerfile
index 47d09589..570cba52 100644
--- a/all-spark-notebook/Dockerfile
+++ b/all-spark-notebook/Dockerfile
@@ -31,3 +31,7 @@ RUN conda config --add channels r && \
 # Apache Toree kernel
 RUN pip --no-cache-dir install https://dist.apache.org/repos/dist/dev/incubator/toree/0.2.0/snapshots/dev1/toree-pip/toree-0.2.0.dev1.tar.gz
 RUN jupyter toree install --sys-prefix
+
+# Spylon-kernel
+RUN pip --no-cache-dir install metakernel spylon findspark spylon-kernel
+RUN python -m spylon_kernel install --sys-prefix
diff --git a/all-spark-notebook/README.md b/all-spark-notebook/README.md
index 8f4d4a4f..6c68d2e3 100644
--- a/all-spark-notebook/README.md
+++ b/all-spark-notebook/README.md
@@ -8,11 +8,12 @@
 * Jupyter Notebook 4.3.x
 * Conda Python 3.x and Python 2.7.x environments
 * Conda R 3.3.x environment
-* Scala 2.10.x
+* Scala 2.11.x
 * pyspark, pandas, matplotlib, scipy, seaborn, scikit-learn pre-installed for Python
 * ggplot2, rcurl preinstalled for R
 * Spark 2.0.2 with Hadoop 2.7 for use in local mode or to connect to a cluster of Spark workers
 * Mesos client 0.25 binary that can communicate with a Mesos master
+* spylon-kernel
 * Unprivileged user `jovyan` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/jovyan` and `/opt/conda`
 * [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](../base-notebook/start-notebook.sh) as the default command
 * A [start-singleuser.sh](../base-notebook/start-singleuser.sh) script useful for running a single-user instance of the Notebook server, as required by JupyterHub
@@ -81,6 +82,19 @@ val rdd = sc.parallelize(0 to 999)
 rdd.takeSample(false, 5)
 ```
 
+### In a spylon-kernel - Scala Notebook
+
+0. Run the container as shown above.
+1. Open a spylon-kernel notebook.
+2. Lazily instantiate the `SparkContext` by running any cell without magics.
+
+For example:
+
+```
+val rdd = sc.parallelize(0 to 999)
+rdd.takeSample(false, 5)
+```
+
 ## Connecting to a Spark Cluster on Mesos
 
 This configuration allows your compute cluster to scale with your data.
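
Since the README addition notes that spylon-kernel instantiates its SparkContext lazily, any Spark configuration must be supplied before the first Scala cell runs. Below is a minimal sketch of how that looks, assuming spylon-kernel's documented `%%init_spark` cell magic and its `launcher` configuration object; the exact attribute names are illustrative and may vary between versions.

```
%%init_spark
# Assumed spylon-kernel API: this configuration cell must run before
# any Scala cell forces the lazy SparkContext to start.
launcher.master = "local[2]"
launcher.conf.spark.app.name = "spylon-kernel-demo"
```

Running any plain Scala cell afterwards (such as the `rdd.takeSample` example above) starts the context with these settings and makes `sc` available.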