Upgrade Spark to 2.2.0

Author: uodna
Date:   2017-07-18 23:31:44 +09:00
Parent: c1b0cf6bf4
Commit: c740fbb1ca
3 changed files with 13 additions and 13 deletions


@@ -11,7 +11,7 @@
* Scala 2.11.x
* pyspark, pandas, matplotlib, scipy, seaborn, scikit-learn pre-installed for Python
* ggplot2, rcurl preinstalled for R
- * Spark 2.1.1 with Hadoop 2.7 for use in local mode or to connect to a cluster of Spark workers
+ * Spark 2.2.0 with Hadoop 2.7 for use in local mode or to connect to a cluster of Spark workers
* Mesos client 1.2 binary that can communicate with a Mesos master
* spylon-kernel
* Unprivileged user `jovyan` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/jovyan` and `/opt/conda`
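The feature list above now advertises Spark 2.2.0. A quick way to confirm the bundled version after pulling the updated image is a short local-mode session; the sketch below is illustrative only and assumes the stock `jupyter/all-spark-notebook` container, where `pyspark` is already importable.

```python
# Minimal smoke test, run inside the container (sketch only).
import pyspark

sc = pyspark.SparkContext(master="local[*]", appName="version-check")
print(sc.version)                           # should report 2.2.0 after this upgrade
print(sc.parallelize(range(1000)).sum())    # 499500 confirms tasks actually execute
sc.stop()
```

Local mode needs no Mesos master, so it is the quickest way to check the upgrade before pointing the notebook at a cluster.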
@@ -124,8 +124,8 @@ conf = pyspark.SparkConf()
# point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos)
conf.setMaster("mesos://10.10.10.10:5050")
# point to spark binary package in HDFS or on local filesystem on all slave
- # nodes (e.g., file:///opt/spark/spark-2.1.1-bin-hadoop2.7.tgz)
- conf.set("spark.executor.uri", "hdfs://10.10.10.10/spark/spark-2.1.1-bin-hadoop2.7.tgz")
+ # nodes (e.g., file:///opt/spark/spark-2.2.0-bin-hadoop2.7.tgz)
+ conf.set("spark.executor.uri", "hdfs://10.10.10.10/spark/spark-2.2.0-bin-hadoop2.7.tgz")
# set other options as desired
conf.set("spark.executor.memory", "8g")
conf.set("spark.core.connection.ack.wait.timeout", "1200")
@@ -157,10 +157,10 @@ library(SparkR)
# point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos)
# as the first argument
# point to spark binary package in HDFS or on local filesystem on all slave
- # nodes (e.g., file:///opt/spark/spark-2.1.1-bin-hadoop2.7.tgz) in sparkEnvir
+ # nodes (e.g., file:///opt/spark/spark-2.2.0-bin-hadoop2.7.tgz) in sparkEnvir
# set other options in sparkEnvir
sc <- sparkR.session("mesos://10.10.10.10:5050", sparkEnvir=list(
spark.executor.uri="hdfs://10.10.10.10/spark/spark-2.1.1-bin-hadoop2.7.tgz",
spark.executor.uri="hdfs://10.10.10.10/spark/spark-2.2.0-bin-hadoop2.7.tgz",
spark.executor.memory="8g"
)
)
@@ -183,7 +183,7 @@ The Apache Toree kernel automatically creates a `SparkContext` when it starts ba
For instance, to pass information about a Mesos master, the Spark binary location in HDFS, and executor options, you could start the container like so:
`docker run -d -p 8888:8888 -e SPARK_OPTS='--master=mesos://10.10.10.10:5050 \
- --spark.executor.uri=hdfs://10.10.10.10/spark/spark-2.1.1-bin-hadoop2.7.tgz \
+ --spark.executor.uri=hdfs://10.10.10.10/spark/spark-2.2.0-bin-hadoop2.7.tgz \
--spark.executor.memory=8g' jupyter/all-spark-notebook`
Note that this is the same information expressed in a notebook in the Python case above. Once the kernel spec has your cluster information, you can test your cluster in an Apache Toree notebook like so:
@@ -309,7 +309,7 @@ c.DockerSpawner.container_image = 'jupyter/all-spark-notebook'
# Have the Spawner override the Docker run command
c.DockerSpawner.extra_create_kwargs.update({
- 'command': '/usr/local/bin/start-singleuser.sh'
+ 'command': '/usr/local/bin/start-singleuser.sh'
})
```
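For context, the fragment above lives inside a `jupyterhub_config.py`. The following is a hedged sketch of how the surrounding configuration typically fits together with the DockerSpawner options referenced in this README; the option names (`spawner_class`, `container_image`, `extra_create_kwargs`) follow the API of that era and may differ in newer DockerSpawner releases.

```python
# jupyterhub_config.py -- illustrative sketch only, not taken verbatim from the README.
c = get_config()

# Spawn each single-user server in a Docker container built from the all-spark image.
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.container_image = 'jupyter/all-spark-notebook'

# Have the Spawner override the Docker run command so the container starts
# the single-user notebook server script shipped with the image.
c.DockerSpawner.extra_create_kwargs.update({
    'command': '/usr/local/bin/start-singleuser.sh'
})
```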