Import Data From a MySQL, Oracle, PostgreSQL, or SQL Server Database
Download the table data with one of the following commands, depending on your database type.
MySQL:

```sh
# Driver
driver=mysql
# IP or DNS
ip=127.0.0.1
# Port
port=3306
# User
username=myuser
# Password
password=mypwd
# Database
database=mydb
# Table
table=mytable
# Folder in HDFS
hdfsdest=/user/hdfs/$table

# To import tables
sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
    --target-dir $hdfsdest \
    --num-mappers 1 \
    --table $table
```
Oracle:

```sh
# Driver
driver=oracle:thin
# IP or DNS
ip=127.0.0.1
# Port
port=1521
# User
username=myuser
# Password
password=mypwd
# Database
database=mydb
# Table
table=mytable
# Folder in HDFS
hdfsdest=/user/hdfs/$table

# To import tables
sqoop import --connect jdbc:$driver:@//$ip:$port/$database --username $username --password $password \
    --target-dir $hdfsdest \
    --num-mappers 1 \
    --table $table
```
PostgreSQL:

```sh
# Driver
driver=postgresql
# IP or DNS
ip=127.0.0.1
# Port
port=5432
# User
username=myuser
# Password
password=mypwd
# Database
database=mydb
# Table
table=mytable
# Folder in HDFS
hdfsdest=/user/hdfs/$table

# To import tables
sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
    --target-dir $hdfsdest \
    --num-mappers 1 \
    --table $table
```
SQL Server:

```sh
# Driver
driver=sqlserver
# IP or DNS
ip=127.0.0.1
# Port
port=1433
# User
username=myuser
# Password
password=mypwd
# Database
database=mydb
# Table
table=mytable
# Folder in HDFS
hdfsdest=/user/hdfs/$table

# To import tables
sqoop import --connect "jdbc:$driver://$ip:$port;DatabaseName=$database" --username $username --password $password \
    --target-dir $hdfsdest \
    --num-mappers 1 \
    --table $table
```
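The four commands above differ only in the JDBC connection string: Oracle uses the `jdbc:oracle:thin:@//host:port/service` form, and SQL Server passes the database as `;DatabaseName=`. As a minimal sketch (the `build_jdbc_url` helper is illustrative, not part of Sqoop), that difference can be factored out like this:

```shell
#!/bin/sh
# Illustrative helper (an assumption, not a Sqoop feature): builds the
# JDBC URL in the format each driver expects.
build_jdbc_url() {
    driver=$1; ip=$2; port=$3; database=$4
    case $driver in
        oracle:thin) echo "jdbc:$driver:@//$ip:$port/$database" ;;
        sqlserver)   echo "jdbc:$driver://$ip:$port;DatabaseName=$database" ;;
        *)           echo "jdbc:$driver://$ip:$port/$database" ;;  # mysql, postgresql
    esac
}

build_jdbc_url mysql 127.0.0.1 3306 mydb
build_jdbc_url oracle:thin 127.0.0.1 1521 mydb
build_jdbc_url sqlserver 127.0.0.1 1433 mydb
```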
You can add the following options to your import command:
| Option | Description |
|---|---|
| `--verbose` | Shows more logs. |
| `--delete-target-dir` | Deletes the HDFS directory before import. |
| `--query "SELECT * FROM $table WHERE \$CONDITIONS"` | Imports data from a query instead of a table. The `--table` option must be removed from the command line. |
| `--num-mappers n` | Splits the import into `n` parallel data flows, where `n` must be replaced with a number (typically the number of data nodes). This option must be used with the `--split-by 'column'` option, which specifies the column to split on. |
| `--as-textfile` | Imports data in text mode. |
| `--null-non-string '\\N'` | The string to be written for a null value in non-string columns. Used for Impala and Hive. |
| `--null-string '\\N'` | The string to be written for a null value in string columns. Used for Impala and Hive. |
| `--hive-import` | Creates a Hive and Impala table with the imported data. |
| `--hive-table $table` | Specifies the name of the Hive and Impala table. |
| `--hive-overwrite` | Overwrites the data of the table. If this option is not present, the imported data is appended to the existing data. |
| `--hive-database db` | Specifies the name of the Hive and Impala database, where `db` must be replaced with the name of your database. |
| `--hive-drop-import-delims` | Removes `\n`, `\r`, and `\01` from string fields when importing into Hive. |
| `-D oracle.sessionTimeZone=Europe/Paris` | For Oracle databases only. Sets the time zone used for your date format. |