Improve Spark installation

The Spark installation is improved by sourcing `spark-config.sh` in the `before-notebook.d` hook that is run by `start.sh`. This automatically adds the right Py4J dependency version to the `PYTHONPATH`, so this variable no longer needs to be set at build time.
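
For illustration, here is a minimal Python sketch of the path the hook derives (the real hook is a shell script sourced at container startup; the `/usr/local/spark` default and the variable names below are assumptions, not the actual file contents):

```python
# Illustrative sketch only: the real spark-config.sh is a shell script.
import glob
import os

# Assumed default location of the Spark installation in the image.
spark_home = os.environ.get("SPARK_HOME", "/usr/local/spark")

# Locate whatever Py4J archive ships with this Spark build, instead of
# hard-coding its version into PYTHONPATH at image build time.
py4j_zips = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))

# Entries the hook would prepend to PYTHONPATH.
entries = [os.path.join(spark_home, "python"), *py4j_zips]
print(os.pathsep.join(entries))
```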

The documentation describing the installation of a custom Spark version has been modified to remove this step, and updated to install the latest `2.x` Spark version.

`test_pyspark` fixed (it always passed before): the `-c` argument was the quoted string `'"import pyspark"'`, which Python evaluates as a harmless string-literal expression instead of actually importing the module.
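
To see why the old assertion could never fail, compare the two invocations outside the test harness (a standalone sketch, not part of this commit):

```python
# With the extra quotes, Python evaluates the string literal
# "import pyspark" (a no-op expression) and exits 0 regardless of
# whether PySpark is installed; without them, the import really runs.
import subprocess
import sys

broken = subprocess.run([sys.executable, "-c", '"import pyspark"'])
print(broken.returncode)  # always 0, even without pyspark

fixed = subprocess.run([sys.executable, "-c", "import pyspark"])
print(fixed.returncode)   # non-zero when pyspark cannot be imported
```
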
Author: romainx
Date: 2020-11-24 20:40:06 +01:00
Parent: e7f4ca6495
Commit: 1dca39182b
3 changed files with 17 additions and 39 deletions


@@ -12,19 +12,19 @@ def test_spark_shell(container):
         tty=True,
         command=['start.sh', 'bash', '-c', 'spark-shell <<< "1+1"']
     )
-    c.wait(timeout=30)
+    c.wait(timeout=60)
     logs = c.logs(stdout=True).decode('utf-8')
     LOGGER.debug(logs)
-    assert 'res0: Int = 2' in logs
+    assert 'res0: Int = 2' in logs, "spark-shell does not work"


 def test_pyspark(container):
     """PySpark should be in the Python path"""
     c = container.run(
         tty=True,
-        command=['start.sh', 'python', '-c', '"import pyspark"']
+        command=['start.sh', 'python', '-c', 'import pyspark']
     )
     rv = c.wait(timeout=30)
-    assert rv == 0 or rv["StatusCode"] == 0
+    assert rv == 0 or rv["StatusCode"] == 0, "pyspark not in PYTHONPATH"
     logs = c.logs(stdout=True).decode('utf-8')
     LOGGER.debug(logs)