Import Data From a MySQL, Oracle, PostgreSQL, or SQL Server Database

You can import data from various databases, including MySQL, Oracle, PostgreSQL, and SQL Server.

Make sure that you have sufficient rights to access the database.
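For example, you can verify your access rights with Sqoop's list-tables tool before starting an import. The following is a minimal sketch using the MySQL example values from step 1; adjust the JDBC URL for your database type:

    # Check that the user can connect and list the tables of the database
    # (example values; replace them with your own).
    sqoop list-tables \
    --connect jdbc:mysql://127.0.0.1:3306/mydb \
    --username myuser --password mypwd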
  1. Download the table data by running the commands that correspond to your database type.

    • MySQL

    # Driver
    driver=mysql
    # IP or DNS
    ip=127.0.0.1
    # Port
    port=3306
    # User
    username=myuser
    # Password
    password=mypwd
    # Database
    database=mydb
    # Table
    table=mytable
    # Folder in HDFS
    hdfsdest=/user/hdfs/$table
    
    # To import tables
    sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
    --target-dir $hdfsdest \
    --num-mappers 1 \
    --table $table
    • Oracle

    # Driver
    driver=oracle:thin
    # IP or DNS
    ip=127.0.0.1
    # Port
    port=1521
    # User
    username=myuser
    # Password
    password=mypwd
    # Database
    database=mydb
    # Table
    table=mytable
    # Folder in HDFS
    hdfsdest=/user/hdfs/$table
    
    # To import tables
    sqoop import --connect jdbc:$driver:@//$ip:$port/$database --username $username --password $password \
    --target-dir $hdfsdest \
    --num-mappers 1 \
    --table $table
    • PostgreSQL

    # Driver
    driver=postgresql
    # IP or DNS
    ip=127.0.0.1
    # Port
    port=5432
    # User
    username=myuser
    # Password
    password=mypwd
    # Database
    database=mydb
    # Table
    table=mytable
    # Folder in HDFS
    hdfsdest=/user/hdfs/$table
    
    # To import tables
    sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
    --target-dir $hdfsdest \
    --num-mappers 1 \
    --table $table
    • SQL Server

    # Driver
    driver=sqlserver
    # IP or DNS
    ip=127.0.0.1
    # Port
    port=1433
    # User
    username=myuser
    # Password
    password=mypwd
    # Database
    database=mydb
    # Table
    table=mytable
    # Folder in HDFS
    hdfsdest=/user/hdfs/$table
    
    # To import tables
    sqoop import --connect "jdbc:$driver://$ip:$port;DatabaseName=$database" --username $username --password $password \
    --target-dir $hdfsdest \
    --num-mappers 1 \
    --table $table
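
    Once the import finishes, you can check that the files landed in HDFS. The following is a minimal sketch that reuses the $hdfsdest variable defined above; the file name part-m-00000 assumes Sqoop's default output naming:

    # List the imported files and preview the first lines of the first one.
    hdfs dfs -ls $hdfsdest
    hdfs dfs -cat $hdfsdest/part-m-00000 | head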

    You can add the following options to these commands; a combined example follows the list.

    • --verbose

      Prints more detailed logs.

    • --delete-target-dir

      Deletes the HDFS target directory before the import.

    • --query "SELECT * FROM $table WHERE \$CONDITIONS"

      Imports the result of a query instead of a table. The --table option must be removed from the command line.

    • --num-mappers n

      Splits the import into n parallel tasks, where n must be replaced with a number, typically the number of data nodes. This option must be used with the --split-by 'column' option, which specifies the column on which to split the data.

    • --as-textfile

      Imports the data in text mode.

    • --null-non-string '\\N'

      The string to be written for a null value in non-string columns. Used for Impala and Hive.

    • --null-string '\\N'

      The string to be written for a null value in string columns. Used for Impala and Hive.

    • --hive-import

      Creates a Hive and Impala table with the imported data.

    • --hive-table $table

      Specifies the name of the Hive and Impala table.

    • --hive-overwrite

      Overwrites the existing data of the table. If this option is not present, the imported data is appended.

    • --hive-database db

      Specifies the name of the Hive and Impala database, where db must be replaced with the name of your database.

    • --hive-drop-import-delims

      Removes \n, \r, and \01 characters from string fields when importing into Hive.

    • -D oracle.sessionTimeZone=Europe/Paris

      For Oracle databases only. Sets the session time zone used to interpret DATE and TIMESTAMP values.
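
    For example, several of these options can be combined in a single command. The following is a minimal sketch that reuses the MySQL variables from step 1 and imports the result of a query into a Hive table; the split column id is a placeholder to replace with a column of your table:

    # Import the result of a query into Hive, overwriting any previous data.
    # Note that --table is removed because --query is used.
    sqoop import --connect jdbc:$driver://$ip:$port/$database \
    --username $username --password $password \
    --verbose \
    --delete-target-dir \
    --target-dir $hdfsdest \
    --query "SELECT * FROM $table WHERE \$CONDITIONS" \
    --num-mappers 4 \
    --split-by 'id' \
    --null-string '\\N' \
    --null-non-string '\\N' \
    --hive-import \
    --hive-database $database \
    --hive-table $table \
    --hive-overwrite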