* Add Scala version choice
* Add `; \ fi`
* Change checksum and remove default Scala version
* Remove RUN
* Add `{ }` and remove old code
* Remove 3 duplicated lines
* Add the commit as a comment
* Add back #Fix
* Rename the downloaded file to spark.tgz
* Fix doc
* Update specifics.md
* New fix
* Fix wget
* Remove making a link to spark
* Set full path to /usr/local/spark
* Change /usr/local/spark to ${SPARK_HOME}
* Fix RUN with if
* Remove empty lines
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
Co-authored-by: Ayaz Salikhov <mathbunnyru@users.noreply.github.com>
* Remove scala
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove scala from web
* Remove scala from specifics
* Remove scala and spylon
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Spark installation is improved by sourcing `spark-config.sh` in the `before-notebook.d` hook that is run by `start.sh`. This automatically adds the right Py4J dependency version to the `PYTHONPATH`, so this variable no longer needs to be set at build time.
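Conceptually, the hook resolves the Py4J archive shipped with the installed Spark distribution at container start instead of at build time. Below is a rough Python equivalent of that lookup; the real logic lives in the shell script `spark-config.sh`, and the default path is an assumption based on the commit list above:

```python
import glob
import os

# Locate the Py4J zip bundled with the installed Spark distribution.
# /usr/local/spark is the SPARK_HOME used in the Dockerfile according to the commits above.
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")
py4j_zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))

# Prepend Spark's Python bindings and the matching Py4J archive to PYTHONPATH,
# so the value no longer has to be hard-coded at image build time.
entries = [os.path.join(spark_home, "python")] + py4j_zips
current = os.environ.get("PYTHONPATH", "")
os.environ["PYTHONPATH"] = os.pathsep.join(entries + ([current] if current else []))
print(os.environ["PYTHONPATH"])
```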
The documentation describing the installation of a custom Spark version has been modified to remove this step, and updated to install the latest `2.x` Spark version.
`test_pyspark` has been fixed (it always passed before this change, even when it should not have).
Allow building the `pyspark-notebook` image with an alternative Spark version (a build sketch follows the list below).
- Define arguments for Spark installation
- Add a note in "Image Specifics" explaining how to build an image with an alternative Spark version
- Remove Toree documentation from "Image Specifics" since its support has been dropped in #1115
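A hypothetical sketch of such a build; the build-argument names (`spark_version`, `hadoop_version`, `spark_checksum`) and the version values are assumptions inferred from the description above, so check the `pyspark-notebook` Dockerfile for the actual ones:

```python
import subprocess

# Build the image with an alternative Spark version.
# Build-argument names and values are assumptions, not taken from the Dockerfile itself.
subprocess.run(
    [
        "docker", "build", "--rm", "--force-rm",
        "-t", "my-pyspark-notebook:spark-2.4.7",  # hypothetical tag
        "--build-arg", "spark_version=2.4.7",     # hypothetical version
        "--build-arg", "hadoop_version=2.7",      # hypothetical version
        # A matching spark_checksum build argument is most likely required as well.
        "./pyspark-notebook",
    ],
    check=True,
)
```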
* Tests added for all kernels
* Same examples as provided in the documentation (`specifics.md`)
* Used the same use case for all examples: the sum of the first 100 whole numbers (see the PySpark sketch after this list)
Note: I have not automatically tested `local_sparklyr.ipynb` since, by default, it creates the `metastore_db` directory and the `derby.log` file in the working directory. Since I mount it read-only (`RO`), it does not work, and I am still struggling to set another location...
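A minimal PySpark sketch of that shared use case, along the lines of the local example in `specifics.md` (the app name is an arbitrary placeholder):

```python
from pyspark.sql import SparkSession

# Create a local SparkSession inside the pyspark-notebook container.
spark = SparkSession.builder.appName("local-sum").master("local").getOrCreate()

# Shared use case: sum of the first 100 whole numbers (0 + 1 + ... + 100 = 5050).
rdd = spark.sparkContext.parallelize(range(100 + 1))
print(rdd.sum())  # 5050

spark.stop()
```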
Some changes to the Spark documentation for the local and standalone use cases with the available drivers (a standalone-connection sketch follows the list below):
* Simplify some of the examples (removing options, etc.)
* Use the same code as much as possible in each example for consistency (only the R examples were kept different from the others)
* Add Sparklyr as an option for R
* Add some notes about prerequisites (the workers need the same Python version as the notebook, and must have R installed)
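For the standalone use case, a PySpark connection sketch reusing the same computation; the master URL and hostname are placeholders rather than values taken from the documentation:

```python
from pyspark.sql import SparkSession

# Connect to a hypothetical standalone cluster; replace the master URL with a real one.
# Prerequisite noted above: the workers need the same Python version as the notebook.
spark = (
    SparkSession.builder.appName("standalone-sum")
    .master("spark://spark-master:7077")  # placeholder host:port
    .getOrCreate()
)

# Same use case as the other examples: sum of the first 100 whole numbers.
rdd = spark.sparkContext.parallelize(range(100 + 1))
print(rdd.sum())  # 5050

spark.stop()
```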