Python version mismatch in Jupyter + Spark instance

Resolution: Resolved

You may encounter the following error message when running a Spark instance with a custom kernel in the Jupyter + Spark app:

25/04/25 10:49:01 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (10.6.7.6 executor 22): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/apps/spack/0.21/ascend/linux-rhel9-zen2/spark/gcc/11.4.1/3.5.1-lbffccn/python/lib/pyspark.zip/pyspark/worker.py", line 1100, in main
    raise PySparkRuntimeError(
pyspark.errors.exceptions.base.PySparkRuntimeError: [PYTHON_VERSION_MISMATCH] Python in worker has different version (3, 12) than that in driver 3.9, PySpark cannot run with different minor versions.
Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

This error indicates a mismatch between the Python version used by the Jupyter kernel (the Spark driver) and the Python version used by the Spark workers. In this case, the workers were running Python 3.12 while the kernel was running Python 3.9. PySpark requires that the driver and the workers use the same minor version of Python.
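
You can confirm which Python the driver is using by printing the kernel's version at the top of the notebook. Nothing here is specific to Spark; sys is the standard library module:

import sys

# The Jupyter kernel acts as the Spark driver, so this is the
# driver-side Python version and executable path.
print(sys.version_info[:3])
print(sys.executable)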

Workaround

You can tell PySpark to use a specific Python executable for both the driver and the executors by setting the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables at the beginning of the notebook, before the SparkSession is created:

import os, sys
# Ensure that PySpark workers use the Python defined in the kernel.
# This must run before the SparkSession (or SparkContext) is created.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
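
To verify the fix, you can create the SparkSession after setting the variables and ask an executor for its Python version. The snippet below is a minimal sketch: it assumes the notebook builds its own SparkSession, and the "version-check" app name is illustrative. If a session already exists, restart the kernel first so the environment variables take effect.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("version-check").getOrCreate()

# Run a trivial task on one executor and return its Python version;
# it should now match the driver's minor version.
worker_version = (
    spark.sparkContext
         .parallelize([0], numSlices=1)
         .map(lambda _: __import__("sys").version_info[:3])
         .first()
)
print("Executor Python:", worker_version)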

Alternatively, the Jupyter + Spark app now lets you choose a specific Python version when launching a session. If you need a version that is not already available, please contact OSC Help for assistance.

Available Python versions on each cluster

  • Pitzer: 3.10, 3.12
  • Ascend: 3.10, 3.12
  • Cardinal: 3.12