Read and Write Tables From Hive With Spark Scala
- Install the following sbt (Scala Build Tool) dependencies:

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided"
Because the Spark Scala dependencies already exist on Saagie, they are marked as "provided" so they are not bundled into the JAR file, which keeps it light. A minimal build.sbt built around these two lines is sketched below.
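For context, here is what a complete build.sbt could look like. It is only a sketch: the project name, version, and Scala version (2.11.12, the version Spark 2.4.0 is built against by default) are assumptions, not part of the original instructions.

// build.sbt -- minimal sketch; Scala 2.11.12 is assumed to match Spark 2.4.0.
name := "example-spark-scala-read-and-write-from-hive"
version := "1.0"
scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided"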
- Create your Spark session by running the following lines of code:
// Initialize a Spark session with Hive support enabled and configure it
// to connect to a Hive metastore.
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .appName("example-spark-scala-read-and-write-from-hive")
  .config("hive.metastore.warehouse.dir", params.hiveHost + "user/hive/warehouse")
  .enableHiveSupport()
  .getOrCreate()
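To verify that the session can actually reach the metastore, a quick sanity check (a sketch, not part of the original example) is to list the databases Hive knows about:

// On a working setup this should print at least the "default" database.
sparkSession.sql("SHOW DATABASES").show()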
- You can now write and read tables from Hive by running the following lines of code:
// Write tables
// Case class matching the table schema; in compiled code, define it at
// top level rather than inside a method so that toDF() can derive an encoder.
case class HelloWorld(message: String)

// Create a DataFrame with 1 partition.
import sparkSession.implicits._
import org.apache.spark.sql.SaveMode
val df = Seq(HelloWorld("helloworld")).toDF().coalesce(1)

// Write the DataFrame as a Hive table.
import sparkSession.sql
sql("DROP TABLE IF EXISTS helloworld")
sql("CREATE TABLE helloworld (message STRING)")
df.write.mode(SaveMode.Overwrite).saveAsTable("helloworld")
// Log the successful write.
logger.info("Writing hive table : OK")

// Read tables
// Read the Hive table back into a Spark DataFrame.
val dfHive = sql("SELECT * FROM helloworld")
logger.info("Reading hive table : OK")
// show() prints to stdout and returns Unit, so call it directly
// instead of passing its result to the logger.
dfHive.show()
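If you later need to add rows instead of replacing the table, a short sketch (assuming the same sparkSession and HelloWorld case class as above) could use SaveMode.Append and read the result back through the DataFrame API:

// Append a new row to the existing Hive table instead of overwriting it.
val dfMore = Seq(HelloWorld("hello again")).toDF().coalesce(1)
dfMore.write.mode(SaveMode.Append).saveAsTable("helloworld")

// sparkSession.table reads a Hive table without writing a SQL string.
val dfTable = sparkSession.table("helloworld")
dfTable.show()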