Import Data From a MySQL, Oracle, PostgreSQL, or SQL Server Database
- Download the table data. Use the example that matches your database:
MySQL:

# Specify the JDBC driver to connect to the MySQL database.
driver=mysql
# Specify the IP or DNS address of the MySQL database server.
ip=127.0.0.1
# Specify the port number on which the MySQL database server is listening.
port=3306
# Specify the username used to log in to the MySQL database.
username=myuser (1)
# Specify the password used to log in to the MySQL database.
password=mypwd (1)
# Specify the name of the MySQL database from which the data will be imported.
database=mydb (1)
# Specify the name of the table in the MySQL database from which the data will be imported.
table=mytable (1)
# Specify the destination folder in HDFS where the data will be stored.
hdfsdest=/user/hdfs/$table (1)
# Import data from the specified table in the MySQL database into HDFS.
sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
  --target-dir $hdfsdest \
  --num-mappers 1 \
  --table $table
Where:
1 These fields must be replaced with the correct values according to your database and HDFS setup.

Oracle:

# Specify the JDBC driver to connect to the Oracle database.
driver=oracle:thin
# Specify the IP or DNS address of the Oracle database server.
ip=127.0.0.1
# Specify the port number on which the Oracle database server is listening.
port=1521
# Specify the username used to log in to the Oracle database.
username=myuser (1)
# Specify the password used to log in to the Oracle database.
password=mypwd (1)
# Specify the name of the Oracle database from which the data will be imported.
database=mydb (1)
# Specify the name of the table in the Oracle database from which the data will be imported.
table=mytable (1)
# Specify the destination folder in HDFS where the data will be stored.
hdfsdest=/user/hdfs/$table (1)
# Import data from the specified table in the Oracle database into HDFS.
sqoop import --connect jdbc:$driver:@//$ip:$port/$database --username $username --password $password \
  --target-dir $hdfsdest \
  --num-mappers 1 \
  --table $table
Where:
1 These fields must be replaced with the correct values according to your database and HDFS setup.

PostgreSQL:

# Specify the JDBC driver to connect to the PostgreSQL database.
driver=postgresql
# Specify the IP or DNS address of the PostgreSQL database server.
ip=127.0.0.1
# Specify the port number on which the PostgreSQL database server is listening.
port=5432
# Specify the username used to log in to the PostgreSQL database.
username=myuser (1)
# Specify the password used to log in to the PostgreSQL database.
password=mypwd (1)
# Specify the name of the PostgreSQL database from which the data will be imported.
database=mydb (1)
# Specify the name of the table in the PostgreSQL database from which the data will be imported.
table=mytable (1)
# Specify the destination folder in HDFS where the data will be stored.
hdfsdest=/user/hdfs/$table (1)
# Import data from the specified table in the PostgreSQL database into HDFS.
sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
  --target-dir $hdfsdest \
  --num-mappers 1 \
  --table $table
Where:
1 These fields must be replaced with the correct values according to your database and HDFS setup.

SQL Server:

# Specify the JDBC driver to connect to the SQL Server database.
driver=sqlserver
# Specify the IP or DNS address of the SQL Server database server.
ip=127.0.0.1
# Specify the port number on which the SQL Server database server is listening.
port=1433
# Specify the username used to log in to the SQL Server database.
username=myuser (1)
# Specify the password used to log in to the SQL Server database.
password=mypwd (1)
# Specify the name of the SQL Server database from which the data will be imported.
database=mydb (1)
# Specify the name of the table in the SQL Server database from which the data will be imported.
table=mytable (1)
# Specify the destination folder in HDFS where the data will be stored.
hdfsdest=/user/hdfs/$table (1)
# Import data from the specified table in the SQL Server database into HDFS.
sqoop import --connect "jdbc:$driver://$ip:$port;DatabaseName=$database" --username $username --password $password \
  --target-dir $hdfsdest \
  --num-mappers 1 \
  --table $table
Where:
1 These fields must be replaced with the correct values according to your database and HDFS setup.
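To quickly check that the import worked, you can list and inspect the files written to HDFS. This is only a convenience check; the part-m-* file names assume Sqoop's default output layout, and $hdfsdest is the variable defined in the blocks above:

# List the files created in the HDFS destination folder.
hdfs dfs -ls $hdfsdest
# Print the first lines of the imported data.
hdfs dfs -cat $hdfsdest/part-m-* | head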
- OPTIONAL: You can add the following options to your command (a combined example is sketched after the list):
--verbose
    Shows more detailed logs.

--delete-target-dir
    Deletes the HDFS target directory before the import.

--query "SELECT * FROM $table WHERE \$CONDITIONS"
    Imports the result of a query instead of a whole table. The --table option must be removed from the command line.

--num-mappers n
    Splits the import into n parallel data flows, where n must be replaced with a number (typically the number of data nodes). This option must be used with the --split-by 'column' option, which specifies the column used to split the data.

--as-textfile
    Imports data in text mode.

--null-non-string '\\N'
    The string to be written for a null value in non-string columns. Used for Impala and Hive.

--null-string '\\N'
    The string to be written for a null value in string columns. Used for Impala and Hive.

--hive-import
    Creates a Hive and Impala table with the imported data.

--hive-table $table
    Specifies the name of the Hive and Impala table.

--hive-overwrite
    Overwrites the data of the table. If this option is not present, the data is appended.

--hive-database db
    Specifies the name of the Hive and Impala database, where db must be replaced with the name of your database.

--hive-drop-import-delims
    Removes \n, \r, and \01 from string fields when importing into Hive.

-D oracle.sessionTimeZone=Europe/Paris
    For Oracle databases only. Sets the time zone used for your date format.
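For reference, here is a sketch that combines several of these options in a single MySQL import. The variable values, the id split column, and the Hive database name are placeholders to adapt to your own setup:

# Placeholder values; replace them according to your database and cluster.
driver=mysql
ip=127.0.0.1
port=3306
username=myuser
password=mypwd
database=mydb
table=mytable
hdfsdest=/user/hdfs/$table

# Import the result of a query instead of a whole table (note the mandatory \$CONDITIONS token),
# split the work across 4 mappers on the hypothetical id column, and load the result into a
# Hive/Impala table in the default Hive database, overwriting any previous data.
sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
  --query "SELECT * FROM $table WHERE \$CONDITIONS" \
  --split-by id \
  --num-mappers 4 \
  --target-dir $hdfsdest \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --hive-import \
  --hive-database default \
  --hive-table $table \
  --hive-overwrite \
  --hive-drop-import-delims \
  --verbose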