Import Data From Another Relational Database Management System (RDBMS)
- Check that your job can download the .jar file from HDFS:

    hdfs dfs -get /path/folder/my_JDBC_file.jar   (1) (2)

  Where:

  (1) /path/folder/ is the path to your folder.
  (2) my_JDBC_file.jar is the name of your JAR file.
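  To confirm that the driver is actually available before moving on, a quick check such as the sketch below can be used; it assumes the same placeholder path /path/folder/ and file name my_JDBC_file.jar as above.

    # Confirm the JAR exists in HDFS before downloading it.
    hdfs dfs -ls /path/folder/my_JDBC_file.jar

    # Download it into the current working directory and verify it arrived locally.
    hdfs dfs -get /path/folder/my_JDBC_file.jar .
    ls -lh ./my_JDBC_file.jar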
- Add the .jar file to the HADOOP_CLASSPATH environment variable:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:./my_JDBC_file.jar

  Adding the JAR file to the classpath is essential, as it enables Hadoop to find and use the classes contained in the JAR file at runtime.
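  As an illustration, with a MySQL database the export might look like the sketch below; the connector file name mysql-connector-java-8.0.30.jar is only an assumed example and must match the JAR downloaded in the previous step.

    # Hypothetical example with a MySQL connector JAR in the current directory.
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:./mysql-connector-java-8.0.30.jar

    # Print the variable to confirm the JAR is now on the classpath.
    echo $HADOOP_CLASSPATH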
- Run the following Sqoop script:

    # Specify the database driver used to connect to your RDBMS.
    driver=<rdms>                (1)
    # Specify the IP or DNS address of the database server.
    ip=<database-ip>             (2)
    # Specify the port number on which the database server is listening.
    port=<port>                  (3)
    # Specify the username used to log in to the database.
    username=myuser              (4)
    # Specify the password used to log in to the database.
    password=mypwd               (4)
    # Specify the name of the database from which the data will be imported.
    database=mydb                (4)
    # Specify the name of the table in the database from which the data will be imported.
    table=mytable                (4)
    # Specify the destination folder in HDFS where data will be stored.
    hdfsdest=/user/hdfs/$table   (4)

    # Import data from the specified table in the RDBMS into HDFS.
    sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
      --target-dir $hdfsdest \
      --num-mappers 1 \
      --table $table
  Where:

  (1) <rdms> must be replaced with the specific JDBC driver for the RDBMS you want to use.
  (2) <database-ip> must be replaced with the IP address of your database server.
  (3) <port> must be replaced with the port number on which the database server is listening.
  (4) The other fields must also be replaced with the correct values corresponding to your database and HDFS setup.
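  For illustration only, a filled-in version of the script for a hypothetical MySQL database might look as follows; every value (host, port, credentials, database, and table name) is an assumption to replace with your own. The last two commands check that the import produced files in the target directory.

    # Hypothetical example: import the "customers" table from a MySQL database.
    driver=mysql
    ip=db.example.com
    port=3306
    username=myuser
    password=mypwd
    database=sales
    table=customers
    hdfsdest=/user/hdfs/$table

    sqoop import --connect jdbc:$driver://$ip:$port/$database --username $username --password $password \
      --target-dir $hdfsdest \
      --num-mappers 1 \
      --table $table

    # With a single mapper, the imported data lands in one file named part-m-00000.
    hdfs dfs -ls $hdfsdest
    hdfs dfs -cat $hdfsdest/part-m-00000 | head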