Spark-submit yarn-client mode hangs even though spark task completed (pyspark 3.4.1)

2 min read 04-10-2024

Spark-Submit in Yarn-Client Mode Hangs: A Common PySpark 3.4.1 Issue and Its Solution

Understanding the Problem

You've submitted a PySpark job using spark-submit in yarn-client mode, and while your task successfully completes, the process hangs indefinitely, preventing you from interacting with the terminal. This behavior can be frustrating, especially when you need to quickly move on to other tasks. This article delves into the root cause of this issue and provides a simple solution.

The Scenario and Original Code

Let's imagine you have a simple PySpark script named my_script.py:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MySparkJob").getOrCreate()

# Your Spark logic here
# ...

spark.stop()

You submit this script to your Yarn cluster using:

spark-submit --master yarn --deploy-mode client my_script.py

The job runs and completes successfully. However, your terminal remains in a hung state, unresponsive to any commands.

The Root Cause: Client Mode and Driver Execution

In yarn-client mode, the driver (your Python process) runs on the machine where you ran spark-submit, while the executors run on the YARN cluster. Calling spark.stop() shuts down the SparkSession and releases the executors, but it does not force the driver process itself to exit. If anything in the driver is still alive after spark.stop() returns, most commonly a non-daemon thread started by your own code or by a library it imports, the Python interpreter waits for it indefinitely and the terminal appears to hang.
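A frequent culprit is a leftover non-daemon thread: the Python interpreter will not exit while one is still running. To check for this, you can list the live threads right after spark.stop() returns (a diagnostic sketch using only the standard library; the thread names you see will depend on your job):

```python
import threading

def live_threads():
    """Return (name, daemon) for every live thread. Any non-daemon
    thread still alive after spark.stop() will block interpreter exit."""
    return [(t.name, t.daemon) for t in threading.enumerate()]

# In your script, call this right after spark.stop():
for name, daemon in live_threads():
    print(f"{name}: daemon={daemon}")
```

Any entry other than MainThread with daemon=False is a candidate for why the process refuses to exit.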

The Solution: Interrupt or Exit Explicitly

The quickest way to regain your terminal is to interrupt the driver by pressing Ctrl+C. For a lasting fix, make sure nothing keeps the driver alive: shut down any thread pools your code or its libraries start, and if a stray non-daemon thread still blocks exit, end the script explicitly with sys.exit(0) after spark.stop(), or os._exit(0) as a last resort (note that os._exit skips cleanup handlers such as atexit hooks).
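Beyond Ctrl+C, the script can also end itself: os._exit() terminates the Python process even when non-daemon threads are still alive. Here is a minimal sketch of that behavior in plain Python (no Spark required; in a real job you would call os._exit(0) right after spark.stop(), and only as a last resort, since it skips cleanup handlers):

```python
import subprocess
import sys
import textwrap

# Child script: a non-daemon thread would normally keep the process
# alive for 1000 seconds, but os._exit(0) ends it immediately, just
# as it would end a hung PySpark driver right after spark.stop().
child = textwrap.dedent("""
    import os, threading, time
    threading.Thread(target=time.sleep, args=(1000,)).start()  # non-daemon blocker
    os._exit(0)  # exits at once despite the live thread
""")

result = subprocess.run([sys.executable, "-c", child], timeout=30)
print("child exit code:", result.returncode)
```

The child process returns immediately with exit code 0 even though its blocker thread never finished.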

Additional Considerations:

  • Log Analysis: Inspect the application logs for errors or warnings that could be contributing to the hang. On YARN, fetch them with yarn logs -applicationId <appId> (the application ID is printed when the job starts); the $SPARK_HOME/work/ directory is used by Spark's standalone mode, not YARN.
  • Alternative Deploy Mode: yarn-cluster: With --deploy-mode cluster the driver runs inside the YARN cluster instead of on your machine, so a lingering driver can no longer tie up your terminal. Setting spark.yarn.submit.waitAppCompletion=false additionally makes spark-submit return as soon as the application is accepted, rather than polling until it finishes.
  • Explicit Exit: Spark has no configuration setting that force-terminates a hung client-mode driver. If you encounter this issue frequently, make the script exit explicitly: shut down any thread pools you create (e.g., pool.shutdown()), then call sys.exit(0) after spark.stop(), or os._exit(0) as a last resort.
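The cluster-mode alternative above can be sketched as follows (a submission-command sketch, assuming your cluster runs YARN; the waitAppCompletion line is optional and only affects how long spark-submit itself waits):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.submit.waitAppCompletion=false \
  my_script.py
```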

Conclusion

While seemingly perplexing, the spark-submit hang in yarn-client mode usually comes down to something, typically a non-daemon thread, keeping the driver process alive after the SparkSession has shut down. Interrupting the driver gets your terminal back immediately; shutting those threads down or exiting the script explicitly prevents the hang altogether.

Remember to explore the yarn-cluster deployment mode for potentially smoother execution and consider fine-tuning your Spark configurations for optimal performance.