You may encounter the following warning message when running a Spark instance using the default PySpark kernel in a Jupyter + Spark application:
WARN SparkSession: Using an existing Spark session; only runtime SQL configurations will have an impact.
This warning occurs because, in the Jupyter + Spark setup, the default PySpark kernel sets the environment variable PYTHONSTARTUP=${SPARK_HOME}/python/pyspark/shell.py. This startup script initializes a default Spark session when the kernel starts. Therefore, when you later create a Spark session using the following code:
spark = SparkSession.builder.appName("my-app").getOrCreate()
the getOrCreate() method detects the existing Spark session and reuses it. The warning simply informs you that a Spark session is already active, and any new configuration options passed through the builder will not override the existing session's configuration, except for runtime SQL configurations.
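For example, the following sketch (run in the default PySpark kernel, where a session already exists) illustrates the behavior; the app name, spark.executor.memory value, and spark.sql.shuffle.partitions value are purely illustrative:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("my-app")                             # ignored: static setting of the existing session
    .config("spark.executor.memory", "4g")         # ignored: static setting of the existing session
    .config("spark.sql.shuffle.partitions", "64")  # applied: runtime SQL configuration
    .getOrCreate()                                 # reuses the session created by shell.py
)

print(spark.sparkContext.appName)                      # still the startup app name (e.g. "PySparkShell")
print(spark.conf.get("spark.sql.shuffle.partitions"))  # "64" -- the runtime SQL config did take effect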
Workaround
To suppress this warning message, add a cell with the command spark.stop(). This terminates the automatically created Spark session, allowing you to start a new one with your own configuration when needed.
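A minimal sketch of the workaround, assuming you then want a session with your own settings (the app name and spark.executor.memory value are illustrative):

from pyspark.sql import SparkSession

spark.stop()  # stop the session created automatically by shell.py

spark = (
    SparkSession.builder
    .appName("my-app")
    .config("spark.executor.memory", "4g")  # now applied, since no session exists yet
    .getOrCreate()                          # creates a brand-new session
)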
If you are using a custom kernel based on a Conda environment or a Python virtual environment (one that does not include the PySpark startup script), this warning should not appear.