Package Your Spark Scala Code With the Assembly Plugin
- Install the Assembly plugin by adding the following line to the `project/assembly.sbt` file:

  ```scala
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")
  ```
- Add the following lines to the `build.sbt` file to configure it:

  ```scala
  import sbt.Keys._

  assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)
  ```
Example of the `build.sbt` file:

```scala
import sbt.Keys._

assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false)

name := "my-spark-application"
version := "0.1"
scalaVersion := "2.11.12"

val SPARK_VERSION = "2.4.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % SPARK_VERSION % "provided",
  "org.apache.spark" %% "spark-sql" % SPARK_VERSION % "provided"
)

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) =>
    xs map { _.toLowerCase } match {
      case ("manifest.mf" :: Nil) | ("index.list" :: Nil) | ("dependencies" :: Nil) =>
        MergeStrategy.discard
      case _ => MergeStrategy.discard
    }
  case "conf/application.conf" => MergeStrategy.concat
  case _                       => MergeStrategy.first
}

test in assembly := {}
parallelExecution in Test := false
```
Because the Spark Scala dependencies are already present on Saagie, mark them as `"provided"` in your `build.sbt` file to keep the assembled `.jar` file from becoming unnecessarily heavy.
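
For reference, a minimal application that such a build could package might look like the sketch below. The object name, app name, and job logic are illustrative assumptions, not part of this guide; only the use of `SparkSession` from the `spark-sql` dependency is implied by the `build.sbt` above.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point for the assembled jar; name and logic are illustrative.
object MySparkApplication {
  def main(args: Array[String]): Unit = {
    // Master URL and deployment settings are expected to come from spark-submit
    // (or the Saagie platform), so nothing is hard-coded here.
    val spark = SparkSession.builder()
      .appName("my-spark-application")
      .getOrCreate()

    import spark.implicits._

    // Trivial job: build a small in-memory DataFrame and count its rows.
    val count = Seq(1, 2, 3).toDF("value").count()
    println(s"Row count: $count")

    spark.stop()
  }
}
```

Because `spark-core` and `spark-sql` are marked `"provided"`, they are available on the cluster classpath at run time but excluded from the `.jar` produced by `sbt assembly`.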