Import Data From Other Relational Database Management Systems (RDBMS)

You can import data from relational databases such as MySQL, Oracle, PostgreSQL, SQL Server, and others.

Sqoop uses JDBC (Java Database Connectivity) to connect to databases. Make sure you have the JDBC driver .jar file for your database.
  1. Check that your job can download the .jar file from HDFS:

    hdfs dfs -get /path/folder/my_JDBC_file.jar (1) (2)

    Where:

    1 /path/folder/ is the HDFS path to the folder that contains the .jar file.
    2 my_JDBC_file.jar is the name of your JDBC driver .jar file.
  2. Add the .jar file to the HADOOP_CLASSPATH environment variable:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:./my_JDBC_file.jar
  3. Run the following Sqoop script:

    # Driver
    driver=<rdms> (1)
    # IP or DNS
    ip=<database-ip> (2)
    # Port
    port=<port> (3)
    # User
    username=myuser
    # Password
    password=mypwd
    # Database
    database=mydb
    # Table
    table=mytable
    # Folder in HDFS
    hdfsdest=/user/hdfs/$table
    
    # To import tables
    sqoop import --connect "jdbc:$driver://$ip:$port/$database" --username "$username" --password "$password" \
    --target-dir "$hdfsdest" \
    --num-mappers 1 \
    --table "$table"

    Where:

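    1 <rdms> must be replaced with the JDBC name of the RDBMS you want to use (for example, mysql or postgresql).
    2 <database-ip> must be replaced with the IP address or DNS name of your database server.
    3 <port> must be replaced with the port number your database listens on.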
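
The connection string in step 3 has the form jdbc:<driver>://<host>:<port>/<database>. That layout fits MySQL and PostgreSQL, but some JDBC drivers expect a different one, for example SQL Server and the Oracle thin driver. The following is a minimal sketch of how the URL could be assembled per driver. The function name build_jdbc_url is hypothetical and not part of Sqoop, and the Oracle branch assumes you connect by service name, so check the URL format against your driver's documentation.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Sqoop): prints a JDBC URL for a few
# common drivers. Arguments: driver, host (IP or DNS name), port, database.
build_jdbc_url() {
  local driver="$1" host="$2" port="$3" database="$4"
  case "$driver" in
    mysql|postgresql)
      # Same layout as the script in step 3.
      printf 'jdbc:%s://%s:%s/%s\n' "$driver" "$host" "$port" "$database"
      ;;
    sqlserver)
      # SQL Server passes the database as a property after a semicolon.
      printf 'jdbc:sqlserver://%s:%s;databaseName=%s\n' "$host" "$port" "$database"
      ;;
    oracle)
      # Oracle thin driver; assumes the fourth argument is a service name.
      printf 'jdbc:oracle:thin:@//%s:%s/%s\n' "$host" "$port" "$database"
      ;;
    *)
      echo "unsupported driver: $driver" >&2
      return 1
      ;;
  esac
}
```

For example, build_jdbc_url sqlserver 10.0.0.5 1433 mydb prints jdbc:sqlserver://10.0.0.5:1433;databaseName=mydb. The result can be passed to the --connect option in double quotes, so that the shell does not interpret the semicolon.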