Create Hive Dynamic Tables

You can use an R script to automatically create raw Hive tables from an HDFS directory.

The script first creates the database if it does not already exist. It then goes through every sub-folder of the directory and creates a raw Hive table from the .gz file in each sub-folder.

The table name is the same as the sub-folder name.
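The naming rule described above can be sketched in a few lines of shell. This is only an illustration of the behavior, not the logic of Create_Table.R itself: the database name and the sub-folder paths are hypothetical sample values, and the per-table lines are comments rather than real CREATE TABLE statements, because the column definitions depend on the content of each file.

```shell
#!/bin/sh
# Sketch only: mirrors the naming rule described above,
# not the actual logic of Create_Table.R.
# NAME_BDD and SUBFOLDERS are hypothetical sample values.
NAME_BDD="sales_raw"
SUBFOLDERS="/data/raw/customers /data/raw/orders"

# The database is created only if it does not exist yet.
PLAN="CREATE DATABASE IF NOT EXISTS ${NAME_BDD};"

# Each sub-folder yields one table named after the sub-folder.
for dir in $SUBFOLDERS; do
  table=$(basename "$dir")
  PLAN="${PLAN}
-- table ${NAME_BDD}.${table} <- .gz file in ${dir}"
done

# Print the plan: one database statement, then one line per table.
printf '%s\n' "$PLAN"
```

With these sample values, the plan lists one table for the customers sub-folder and one for the orders sub-folder, both in the sales_raw database.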

To automatically create raw Hive tables from an HDFS directory:

  1. Download the Create_Table_Hive.tar file from our GitHub repository.

  2. In Saagie, create a job and upload the Create_Table_Hive.tar file as a package.

  3. Enter the following command as the command line that starts the job:

    Rscript Create_Table.R "http://IP_HDFS:PORT_HDFS/webhdfs/v1" "jdbc:hive2://IP_HIVE:PORT_HIVE/;ssl=false" "USER_HDFS" "PWD_HDFS" "NAME_BDD" "PATH_DIRECTORY" "SEPARATOR_FILE" "QUOTE_FILE"

    Where:

    • IP_HDFS is the IP address of HDFS.

    • PORT_HDFS is the HDFS port.

    • IP_HIVE is the IP address of Hive.

    • PORT_HIVE is the Hive port.

    • USER_HDFS is the HDFS user.

    • PWD_HDFS is the HDFS password.

    • NAME_BDD is the name of the database.

    • PATH_DIRECTORY is the path of the HDFS directory that contains the sub-folders.

    • SEPARATOR_FILE is the field separator used in the files.

    • QUOTE_FILE is the quote character used in the files.
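As an illustration of how the placeholders fit together, the sketch below assembles the eight arguments from shell variables and prints them for review. Every value is a hypothetical example: the IP addresses, user, password, database name, path, separator, and quote character are invented, and the two ports are only common defaults (50070 for WebHDFS on Hadoop 2.x, 10000 for HiveServer2) that may not match your platform. The exact form in which Create_Table.R expects the quote character is also an assumption.

```shell
#!/bin/sh
# All values below are hypothetical placeholders; replace them with your own.
IP_HDFS="10.0.0.10"
PORT_HDFS="50070"        # common WebHDFS port on Hadoop 2.x; yours may differ
IP_HIVE="10.0.0.20"
PORT_HIVE="10000"        # common HiveServer2 port; yours may differ
USER_HDFS="hdfs_user"
PWD_HDFS="change_me"
NAME_BDD="sales_raw"
PATH_DIRECTORY="/data/raw"
SEPARATOR_FILE=";"
QUOTE_FILE='"'           # assumption: the quote character is passed as-is

# Build the two URLs in the shape shown in the command above.
WEBHDFS_URL="http://${IP_HDFS}:${PORT_HDFS}/webhdfs/v1"
HIVE_URL="jdbc:hive2://${IP_HIVE}:${PORT_HIVE}/;ssl=false"

# Collect the eight arguments in the order the command expects.
set -- "$WEBHDFS_URL" "$HIVE_URL" "$USER_HDFS" "$PWD_HDFS" \
       "$NAME_BDD" "$PATH_DIRECTORY" "$SEPARATOR_FILE" "$QUOTE_FILE"

# Print the arguments instead of running the script, so they can be checked.
echo "Rscript Create_Table.R would receive $# arguments:"
printf '  [%s]\n' "$@"

# Uncomment to run for real once the values are correct:
# Rscript Create_Table.R "$@"
```

Keeping each value in its own variable makes it easier to spot a swapped argument, since the script receives its arguments by position.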