From 8341ed5268735fd7bcd307aa8e028c01d514ad71 Mon Sep 17 00:00:00 2001
From: Peter Parente
Date: Mon, 11 Apr 2016 14:07:59 -0400
Subject: [PATCH] Update README for current Apache Toree

Never got updated when Toree started getting pip installed

(c) Copyright IBM Corp. 2016
---
 all-spark-notebook/README.md | 44 +++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/all-spark-notebook/README.md b/all-spark-notebook/README.md
index 5b70d8fa..d750c175 100644
--- a/all-spark-notebook/README.md
+++ b/all-spark-notebook/README.md
@@ -67,10 +67,10 @@ df <- createDataFrame(sqlContext, iris)
 head(filter(df, df$Petal_Width > 0.2))
 ```
 
-### In a Scala Notebook
+### In an Apache Toree (Scala) Notebook
 
 0. Run the container as shown above.
-1. Open a Scala notebook.
+1. Open an Apache Toree (Scala) notebook.
 2. Use the pre-configured `SparkContext` in variable `sc`.
 
 For example:
 
@@ -157,34 +157,42 @@ df <- createDataFrame(sqlContext, iris)
 head(filter(df, df$Petal_Width > 0.2))
 ```
 
-### In a Scala Notebook
+### In an Apache Toree (Scala) Notebook
 
 0. Open a terminal via *New -> Terminal* in the notebook interface.
-1. Add information about your cluster to the Scala kernel spec file in `~/.ipython/kernels/scala/kernel.json`. (See below.)
-2. Open a Scala notebook.
+1. Add information about your cluster to the Apache Toree kernel spec file in `~/.local/share/jupyter/kernels/apache_toree/kernel.json`. (See below.)
+2. Open an Apache Toree (Scala) notebook.
 3. Use the pre-configured `SparkContext` in variable `sc`.
 
-The Scala kernel automatically creates a `SparkContext` when it starts based on configuration information from its command line arguments and environments. Therefore, you must add it to the Scala kernel spec file. You cannot, at present, configure it yourself within a notebook.
+The Apache Toree kernel automatically creates a `SparkContext` when it starts based on configuration information from its command line arguments and environment variables. Therefore, you must add your cluster configuration to the Toree kernel spec file. You cannot, at present, configure it yourself within a notebook.
 
 For instance, a kernel spec file with information about a Mesos master, Spark binary location in HDFS, and an executor option appears here:
 
 ```
 {
-    "display_name": "Scala 2.10.4",
-    "language": "scala",
-    "argv": [
-        "/opt/sparkkernel/bin/sparkkernel",
-        "--profile",
-        "{connection_file}",
-        "--master=mesos://10.10.10.10:5050"
-    ],
-    "env": {
-        "SPARK_CONFIGURATION": "spark.executor.memory=8g,spark.executor.uri=hdfs://10.10.10.10/spark/spark-1.6.0-bin-hadoop2.6.tgz"
-    }
+    "codemirror_mode": "scala",
+    "display_name": "Apache_Toree",
+    "language_info": {
+        "name": "scala"
+    },
+    "argv": [
+        "/home/jovyan/.local/share/jupyter/kernels/apache_toree/bin/run.sh",
+        "--profile",
+        "{connection_file}"
+    ],
+    "env": {
+        "CAPTURE_STANDARD_ERR": "true",
+        "SPARK_HOME": "/usr/local/spark",
+        "SEND_EMPTY_OUTPUT": "false",
+        "SPARK_OPTS": "--master=mesos://10.10.10.10:5050 --driver-java-options=-Xms1024M --driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info --spark.executor.memory=8g --spark.executor.uri=hdfs://10.10.10.10/spark/spark-1.6.0-bin-hadoop2.6.tgz",
+        "CAPTURE_STANDARD_OUT": "true",
+        "PYTHONPATH": "/usr/local/spark/python:/usr/local/spark/python/lib/py4j-0.9-src.zip",
+        "MAX_INTERPRETER_THREADS": "16"
+    }
 }
 ```
 
-Note that this is the same information expressed in a notebook in the Python case above. Once the kernel spec has your cluster information, you can test your cluster in a Scala notebook like so:
+Note that this is the same information expressed in a notebook in the Python case above. Once the kernel spec has your cluster information, you can test your cluster in an Apache Toree notebook like so:
 
 ```
 // should print the value of --master in the kernel spec
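 // a minimal sketch of such a test, assuming the kernel's pre-configured SparkContext is bound to `sc`
 println(sc.master)
 
 // illustrative follow-up: run a trivial job to confirm that executors on the cluster respond
 val rdd = sc.parallelize(1 to 1000)
 println(rdd.sum())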