Configure Spark Resources
Using Spark on Saagie means running Spark jobs on Kubernetes. Here is an example of a job submission:
spark-submit \
  --driver-memory 2G \
  --class <ClassName of the Spark Application to launch> \
  --conf spark.executor.memory=3G \ (1)
  --conf spark.executor.cores=4 \ (2)
  --conf spark.kubernetes.executor.limit.cores=4 \ (3)
  --conf spark.executor.instances=3 \ (4)
  {file}
Where:
(1) spark.executor.memory is the amount of memory allocated to each executor (both request and limit).
(2) spark.executor.cores is the number of CPU cores requested for each executor.
(3) spark.kubernetes.executor.limit.cores is the CPU core limit for each executor.
(4) spark.executor.instances is the number of executors for the application.
In the example above, the application would be provisioned with 3 executors of 4 cores and 3 GB of memory each, that is, 12 CPU cores and 9 GB of memory in total for the executors, plus 2 GB of memory for the driver.
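To check what was actually provisioned, you can inspect the executor pods directly. The sketch below assumes you have kubectl access to the namespace in which the job runs; <namespace> and <pod-name> are placeholders to replace with your own values. Spark on Kubernetes labels its executor pods with spark-role=executor.

  # List the executor pods of the running application
  # (Spark on Kubernetes labels them with spark-role=executor).
  kubectl get pods -l spark-role=executor -n <namespace>

  # Inspect the CPU and memory requests/limits of one executor pod;
  # <pod-name> is one of the names returned by the previous command.
  kubectl describe pod <pod-name> -n <namespace>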
For more information on performance tuning in Spark, how to detect performance issues, and best practices for avoiding slowdowns or bottlenecks in your workflow, read the following articles: