Read and Write Tables From Hive With Spark Scala

This guide shows how to read and write Hive tables from a Spark application written in Scala.

  1. Add the following SBT dependencies (spark-hive is required by enableHiveSupport()):

    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0" % "provided"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided"
    libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.4.0" % "provided"

  2. Create your Spark session by running the following lines of code:

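    import org.apache.spark.sql.SparkSession

    // To create your Spark session with Hive support.
    // params is assumed to be an application-level configuration object defined elsewhere.
    val sparkSession = SparkSession.builder()
      .appName("example-spark-scala-read-and-write-from-hive")
      // spark.sql.warehouse.dir replaces hive.metastore.warehouse.dir, deprecated since Spark 2.0.
      .config("spark.sql.warehouse.dir", params.hiveHost + "user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()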
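
Before reading or writing, it can help to confirm that the session really uses the Hive catalog. The following is a minimal sketch, not part of the original steps: the object name `HiveSupportCheck`, the helper `hasHiveCatalog`, and the `local[*]` master are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession

object HiveSupportCheck {

  // Returns true when the session's catalog is backed by Hive.
  // Falls back to "in-memory" if the setting is not present.
  def hasHiveCatalog(spark: SparkSession): Boolean =
    spark.conf.get("spark.sql.catalogImplementation", "in-memory") == "hive"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-support-check")
      .master("local[*]") // assumption: local run; omit when launching with spark-submit
      .enableHiveSupport()
      .getOrCreate()

    try {
      println(s"Hive catalog enabled: ${hasHiveCatalog(spark)}")
      println(s"Warehouse directory: ${spark.conf.get("spark.sql.warehouse.dir")}")
      // Lists the tables visible in the current database.
      spark.catalog.listTables().show(truncate = false)
    } finally spark.stop()
  }
}
```

If `hasHiveCatalog` returns false, the session was most likely built without `enableHiveSupport()`, or an existing non-Hive session was reused by `getOrCreate()`.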
  3. You can now read and write tables from Hive by running the following lines of code:

    import org.apache.spark.sql.SaveMode
    import sparkSession.implicits._
    import sparkSession.sql

    // ======= To read tables.
    // To read a Hive table into a Spark DataFrame (the table must already exist).
    val dfHive = sql("SELECT * FROM helloworld")
    logger.info("Reading hive table : OK")
    // show() prints to stdout and returns Unit, so it cannot be passed to logger.info.
    dfHive.show()

    // ====== To create a DataFrame with 1 partition.
    // HelloWorld is assumed to be a top-level case class: case class HelloWorld(message: String)
    val df = Seq(HelloWorld("helloworld")).toDF().coalesce(1)

    // ======= To write tables.
    // To write the DataFrame as a Hive table.
    sql("DROP TABLE IF EXISTS helloworld")
    sql("CREATE TABLE helloworld (message STRING)")
    df.write.mode(SaveMode.Overwrite).saveAsTable("helloworld")
    logger.info("Writing hive table : OK")
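
The steps above can be combined into one small application. This is a sketch under stated assumptions rather than a definitive implementation: the object name `HiveReadWriteExample`, the helpers `writeTable` and `readTable`, the `HelloWorld` case class definition, and the `local[*]` master are all introduced here for illustration. It writes the table before reading it so that the read does not depend on a table that already exists.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical row type for the example table. It is declared at the top
// level so that Spark can derive an encoder for toDF().
case class HelloWorld(message: String)

object HiveReadWriteExample {

  // Writes the DataFrame as the named table, replacing any existing table.
  def writeTable(df: DataFrame, table: String): Unit =
    df.write.mode(SaveMode.Overwrite).saveAsTable(table)

  // Reads the whole table back into a DataFrame.
  def readTable(spark: SparkSession, table: String): DataFrame =
    spark.table(table)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("example-spark-scala-read-and-write-from-hive")
      .master("local[*]") // assumption: local run; omit when launching with spark-submit
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    try {
      // One-row DataFrame in a single partition.
      val df = Seq(HelloWorld("helloworld")).toDF().coalesce(1)

      writeTable(df, "helloworld")
      val dfHive = readTable(spark, "helloworld")
      dfHive.show(truncate = false)
    } finally {
      spark.stop()
    }
  }
}
```

Because the dependencies are declared as "provided", running this directly from SBT needs the Spark jars on the classpath; submitting the packaged jar with spark-submit avoids that. When run locally without a configured metastore, Spark creates an embedded Derby metastore and a warehouse directory in the working directory.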