Python version mismatch in Jupyter + Spark instance

Resolution: Resolved
Affected Software: Spark

You may encounter the following error message when running a Spark instance using a custom kernel in the Jupyter + Spark app:

25/04/25 10:49:01 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (10.6.7.6 executor 22): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/apps/spack/0.21/ascend/linux-rhel9-zen2/spark/gcc/11.4.1/3.5.1-lbffccn/python/lib/pyspark.zip/pyspark/worker.py", line 1100, in main
    raise PySparkRuntimeError(
pyspark.errors.exceptions.base.PySparkRuntimeError: [PYTHON_VERSION_MISMATCH] Python in worker has different version (3, 12) than that in driver 3.9, PySpark cannot run with different minor versions.
Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

This error indicates a mismatch between the Python version used by the Jupyter kernel (the Spark driver) and the Python version used by the Spark workers. In this case, the workers were running Python 3.12 while the kernel was running Python 3.9. PySpark requires that the driver and the workers use the same minor version of Python.
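To confirm which interpreter the driver side is actually using, you can check from the kernel itself; a minimal sketch:

import sys

# The Python running this Jupyter kernel is the Spark driver's Python;
# its minor version must match the (3, 12) the workers report above.
print(sys.executable)
print(sys.version_info[:2])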

To resolve this issue, create the conda environment behind your custom kernel with Python 3.12 so that it matches the workers, and make sure the environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON point at that interpreter when you create the Spark session with the SparkSession module.
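The sketch below shows one way to line the two sides up. It assumes the kernel comes from a conda environment created with a matching interpreter, for example conda create -n pyspark312 python=3.12 (the environment name is illustrative), and that the environment lives on a filesystem the workers can see:

import os
import sys

from pyspark.sql import SparkSession

# Point both driver and workers at the interpreter running this kernel,
# which is assumed to be the conda environment's Python 3.12.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

spark = (
    SparkSession.builder
    .appName("jupyter-spark-example")  # illustrative app name
    .getOrCreate()
)

print(spark.version)         # e.g. 3.5.1
print(sys.version_info[:2])  # should report (3, 12)

If the session starts without the PYTHON_VERSION_MISMATCH error, the driver and workers agree on the interpreter.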

Affected version: 3.5.1