Configure Spark Resources
Here is an example of a job submission for a Spark app to a Kubernetes cluster. In this example, the cluster is provisioned with 3 executors, each with 4 CPU cores and 3G of memory, for a total of 12 CPU cores and 9G of executor memory; the driver adds another 2G.
spark-submit \
--driver-memory 2G \ (1)
--class <ClassName of the Spark application to launch> \ (2)
--conf spark.executor.memory=3G \ (3)
--conf spark.executor.cores=4 \ (4)
--conf spark.kubernetes.executor.limit.cores=4 \ (5)
--conf spark.executor.instances=3 \ (6)
{file} (7)
Where:
1. --driver-memory 2G is the amount of memory allocated to the driver process of the Spark app.
2. <ClassName of the Spark application to launch> must be replaced with the class name of your Spark app.
3. spark.executor.memory is the amount of memory allocated to each executor in the Spark app (used as both the request and the limit).
4. spark.executor.cores is the number of CPU cores allocated to each executor in the Spark app.
5. spark.kubernetes.executor.limit.cores is the hard limit on CPU cores that each executor in the Spark app can use on Kubernetes.
6. spark.executor.instances is the number of executor instances to launch for the Spark app.
7. {file} must be replaced with the path to the JAR file containing your Spark app code.
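The same executor settings can also be set programmatically inside the application instead of on the spark-submit command line. The following is a minimal sketch in Scala, assuming a hypothetical application object named SparkResourcesExample; the configuration keys are the same ones described above.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SparkResourcesExample {
  def main(args: Array[String]): Unit = {
    // Same executor sizing as the spark-submit example above:
    // 3 executors x 4 cores and 3G each = 12 cores, 9G in total.
    val conf = new SparkConf()
      .set("spark.executor.memory", "3G")                 // memory per executor (request and limit)
      .set("spark.executor.cores", "4")                   // CPU cores per executor
      .set("spark.kubernetes.executor.limit.cores", "4")  // CPU limit per executor pod on Kubernetes
      .set("spark.executor.instances", "3")               // number of executors to launch

    val spark = SparkSession.builder()
      .appName("SparkResourcesExample")
      .config(conf)
      .getOrCreate()

    // ... application logic ...

    spark.stop()
  }
}

Note that --driver-memory is deliberately not set here: the driver JVM is already running by the time application code executes, so driver memory must still be passed on the command line or in spark-defaults.conf.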
For more information on performance tuning in Spark, how to detect performance issues, and best practices for avoiding slowdowns or bottlenecks in your workflow, read the following articles: