Package Your Spark Scala Code With the Assembly Plugin

To run Spark Scala jobs on Saagie, you must package your code with the sbt Assembly plugin, which gathers all your project dependencies into a single fat JAR file.

  1. Install the Assembly plugin by adding the following line to the project/assembly.sbt file:

    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
  2. Add the following lines to the build.sbt file to configure the plugin:

    import sbt.Keys._
    // Exclude the Scala library from the fat JAR; Saagie's Spark runtime already provides it.
    assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

    Example of the build.sbt file:
    import sbt.Keys._
    assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
    
    name := "my-spark-application"
    version := "0.1"
    scalaVersion := "2.11.12"
    val SPARK_VERSION = "2.4.0"
    
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % SPARK_VERSION % "provided",
      "org.apache.spark" %% "spark-sql" % SPARK_VERSION % "provided"
    )
    
    assemblyMergeStrategy in assembly := {
      // Discard META-INF metadata (manifests, index files, and so on)
      // from dependency JARs to avoid merge conflicts.
      case PathList("META-INF", _*) => MergeStrategy.discard
      // Concatenate application configuration files instead of keeping only one.
      case "conf/application.conf" => MergeStrategy.concat
      // For any other duplicate file, keep the first occurrence.
      case _ => MergeStrategy.first
    }
    
    // Skip tests when assembling the fat JAR, and run tests sequentially.
    test in assembly := {}
    parallelExecution in Test := false
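
For reference, here is a minimal sketch of a Spark application that this build could package. The object name, application name, and data are illustrative only:

    import org.apache.spark.sql.SparkSession

    object MySparkApplication {
      def main(args: Array[String]): Unit = {
        // The Spark runtime is provided by the platform at execution time,
        // which is why spark-core and spark-sql are marked as "provided".
        val spark = SparkSession.builder()
          .appName("my-spark-application")
          .getOrCreate()

        import spark.implicits._

        // Illustrative computation: build a small DataFrame and display it.
        val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")
        df.show()

        spark.stop()
      }
    }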
Because the Spark and Scala dependencies are already present on Saagie, declare them as "provided" in your build.sbt file so that they are excluded from the fat JAR, keeping it lightweight.
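
Once the configuration is in place, build the fat JAR by running the assembly task from the project root. With the name, version, and Scala version used above, sbt-assembly's default naming should produce a JAR at the path shown below:

    sbt assembly
    # Expected output (default naming): target/scala-2.11/my-spark-application-assembly-0.1.jar

This is the file to deploy with your Spark Scala job on Saagie.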