Package Your Spark Scala Code With the Assembly Plugin
- Install the plugin in your project by adding the following line to the `project/assembly.sbt` file:

  ```scala
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
  ```
- Configure the plugin by adding the following lines to the `build.sbt` file:

  ```scala
  import sbt.Keys._

  assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
  ```

  This configuration ensures that the created fat JAR includes only your project's dependencies, not the Scala standard library. A sketch of where these two files live in the project is shown right after this list.
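Assuming the standard sbt project layout (the directory name `my-spark-application` below simply mirrors the example project name), the two files edited above sit here:

```
my-spark-application/
├── build.sbt             <- assembly configuration, dependencies, merge strategy
├── project/
│   └── assembly.sbt      <- addSbtPlugin(...) line
└── src/
    └── main/
        └── scala/        <- your Spark application code
```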
Example of the `build.sbt` file:

```scala
import sbt.Keys._

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

name := "my-spark-application"
version := "0.1"
scalaVersion := "2.11.12"

val SPARK_VERSION = "2.4.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % SPARK_VERSION % "provided", // (1)
  "org.apache.spark" %% "spark-sql"  % SPARK_VERSION % "provided"  // (1)
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) =>
    xs map { _.toLowerCase } match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
        MergeStrategy.discard
      case _ => MergeStrategy.discard
    }
  case "conf/application.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}

test in assembly := {}
parallelExecution in Test := false
```
Where:

1. Because the Spark Scala dependencies already exist in Saagie, declare them as "provided" in your `build.sbt` file to avoid producing an unnecessarily heavy JAR file.
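To make the example concrete, here is a minimal sketch of an application entry point that such a build could package. The object name `MySparkApplication` and the line-count logic are illustrative assumptions, not part of the original example; only the Spark dependencies declared above are assumed.

```scala
// Hypothetical entry point, e.g. src/main/scala/MySparkApplication.scala.
// Spark is declared as "provided", so these classes are supplied by the platform
// at runtime and are not bundled into the fat JAR.
import org.apache.spark.sql.SparkSession

object MySparkApplication {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("my-spark-application")
      .getOrCreate()

    // Trivial job: count the lines of a text file passed as the first argument.
    val lineCount = spark.read.textFile(args(0)).count()
    println(s"Line count: $lineCount")

    spark.stop()
  }
}
```

Running `sbt assembly` then builds the fat JAR, which sbt-assembly typically writes to `target/scala-2.11/` under a name such as `my-spark-application-assembly-0.1.jar`, ready to be uploaded to Saagie.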