Read and Write Files From HDFS With Java/Scala
Without a Kerberized Cluster
- Add the following Maven dependency to your pom.xml:
```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```
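The ${hadoop.version} placeholder is resolved from a Maven property. A minimal sketch of that property block, assuming a hypothetical Hadoop 2.6.0 target (set it to the version your cluster actually runs):

```xml
<properties>
  <!-- Hypothetical value; replace with the Hadoop version of your cluster. -->
  <hadoop.version>2.6.0</hadoop.version>
</properties>
```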
- Initialize the HDFS FileSystem object that will allow you to interact with HDFS by running the following lines of code:

```java
private static String HADOOP_CONF_DIR = System.getenv("HADOOP_CONF_DIR");

// ====== To initialize the HDFS file system object.
Configuration conf = new Configuration();
conf.addResource(new Path("file:///" + HADOOP_CONF_DIR + "/core-site.xml"));
conf.addResource(new Path("file:///" + HADOOP_CONF_DIR + "/hdfs-site.xml"));
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

// To set the HADOOP user.
System.setProperty("HADOOP_USER_NAME", "hdfs");
System.setProperty("hadoop.home.dir", "/");

// To get the HDFS file system.
FileSystem fs = FileSystem.get(conf);
```
The HDFS connection URL format must be hdfs://namenodedns:port, where 8020 is the default port. The connection URLs are already defined in the .xml configuration files.
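If you prefer not to rely on the .xml configuration files for the NameNode address, the FileSystem object can also be obtained from an explicit URI. A minimal sketch, where hdfs://namenodedns:8020 is a placeholder for your own NameNode address:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Replace the placeholder URI with your NameNode DNS name and port.
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("hdfs://namenodedns:8020"), conf);
```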
- Create a folder on HDFS if it does not already exist by running the following lines of code:
```java
// ==== To create a folder if it does not exist.
Path workingDir = fs.getWorkingDirectory();
Path newFolderPath = new Path(path); // (1)
if (!fs.exists(newFolderPath)) {
    // To create a new directory.
    fs.mkdirs(newFolderPath);
    logger.info("Path " + path + " created.");
}
```
Where: (1) path must be replaced with your folder path.
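As a quick sanity check (not part of the original snippet), you can list the directory with the same fs object; a minimal sketch, assuming org.apache.hadoop.fs.FileStatus is imported:

```java
// Illustrative check: list the contents of the newly created directory.
for (FileStatus status : fs.listStatus(newFolderPath)) {
    logger.info("Found entry: " + status.getPath());
}
```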
- You can now read and write files from HDFS by running the following lines of code:
```java
// ==== To write file.
logger.info("Begin Write file into hdfs");
// To create a path.
Path hdfswritepath = new Path(newFolderPath + "/" + fileName);
// To initialize output stream.
FSDataOutputStream outputStream = fs.create(hdfswritepath);
// Classic output stream usage.
outputStream.writeBytes(fileContent);
outputStream.close();
logger.info("End Write file into hdfs");

// ==== To read file.
logger.info("Read file from hdfs");
// To create a path.
Path hdfsreadpath = new Path(newFolderPath + "/" + fileName);
// To initialize input stream.
FSDataInputStream inputStream = fs.open(hdfsreadpath);
// Classic input stream usage.
String out = IOUtils.toString(inputStream, "UTF-8");
logger.info(out);
inputStream.close();
// Close the file system once all operations are done.
fs.close();
```
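For whole files that already exist on the local disk, the FileSystem API also provides copy helpers; a minimal sketch (the local and HDFS paths below are placeholders):

```java
// Copy a local file to HDFS, then copy it back (all paths are placeholders).
fs.copyFromLocalFile(new Path("/tmp/local-file.txt"), new Path(newFolderPath + "/copied-file.txt"));
fs.copyToLocalFile(new Path(newFolderPath + "/copied-file.txt"), new Path("/tmp/local-copy.txt"));
```

These calls must happen before fs.close().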
With a Kerberized Cluster
- Add the following Maven dependencies to your pom.xml:
```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.6.0-cdh5.16.1.1</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.6.0-cdh5.16.1.1</version>
</dependency>
```
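These CDH-suffixed versions are typically published on Cloudera's repository rather than Maven Central. A minimal sketch of the repository declaration (the URL is the one commonly documented by Cloudera; verify it for your CDH release):

```xml
<repositories>
  <repository>
    <!-- Commonly documented Cloudera repository; verify the URL for your CDH release. -->
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
```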
- Add the jaas.conf file under src/main/resources with the following content:

```
Main {
    com.sun.security.auth.module.Krb5LoginModule required client=TRUE;
};
```
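If you authenticate with a keytab rather than a username and password, the login module entry usually takes keytab options instead; a minimal sketch (the principal and keytab path are placeholders):

```
Main {
    com.sun.security.auth.module.Krb5LoginModule required
        useKeyTab=true
        keyTab="/path/to/user.keytab"
        principal="user@REALM"
        storeKey=true
        doNotPrompt=true;
};
```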
- Initialize your connection by creating a login context function with the following lines of code:
```java
private static final String JDBC_DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";
private static String username;
private static String password;
private static String HADOOP_CONF_DIR = System.getenv("HADOOP_CONF_DIR");

private static LoginContext kinit(String username, String password) throws LoginException {
    LoginContext lc = new LoginContext(Main.class.getSimpleName(), callbacks -> {
        for (Callback c : callbacks) {
            if (c instanceof NameCallback)
                ((NameCallback) c).setName(username);
            if (c instanceof PasswordCallback)
                ((PasswordCallback) c).setPassword(password.toCharArray());
        }
    });
    lc.login();
    return lc;
}
```
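Note that the first argument passed to LoginContext must match the entry name defined in jaas.conf (Main here, since the class is named Main). The snippet also assumes the following imports, listed here for completeness:

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;
```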
- Initialize the HDFS FileSystem object that will allow you to interact with HDFS by running the following lines of code:

```java
URL url = Main.class.getClassLoader().getResource("jaas.conf");
System.setProperty("java.security.auth.login.config", url.toExternalForm());

// ====== To initialize the HDFS file system object.
Configuration conf = new Configuration();
conf.addResource(new Path("file:///" + HADOOP_CONF_DIR + "/core-site.xml"));
conf.addResource(new Path("file:///" + HADOOP_CONF_DIR + "/hdfs-site.xml"));
UserGroupInformation.setConfiguration(conf);
LoginContext lc = kinit(username, password);
UserGroupInformation.loginUserFromSubject(lc.getSubject());

// To get the HDFS file system.
FileSystem fs = FileSystem.get(conf);
```
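If a keytab is available, the JAAS login can alternatively be replaced by Hadoop's own keytab login; a minimal sketch (the principal and keytab path are placeholders):

```java
// Alternative: authenticate directly from a keytab instead of the LoginContext above.
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab("user@REALM", "/path/to/user.keytab");
FileSystem fs = FileSystem.get(conf);
```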
- Create a folder on HDFS if it does not already exist by running the following lines of code:
```java
// ==== To create a folder if it does not exist.
Path workingDir = fs.getWorkingDirectory();
Path newFolderPath = new Path(path); // (1)
if (!fs.exists(newFolderPath)) {
    // To create a new directory.
    fs.mkdirs(newFolderPath);
    logger.info("Path " + path + " created.");
}
```
Where: (1) path must be replaced with your folder path.
- You can now read and write files from HDFS with Kerberos by running the following lines of code:
```java
// ==== To write file.
logger.info("Begin Write file into hdfs");
// To create a path.
Path hdfswritepath = new Path(newFolderPath + "/" + fileName);
// To initialize output stream.
FSDataOutputStream outputStream = fs.create(hdfswritepath);
// Classic output stream usage.
outputStream.writeBytes(fileContent);
outputStream.close();
logger.info("End Write file into hdfs");

// ==== To read file.
logger.info("Read file from hdfs");
// To create a path.
Path hdfsreadpath = new Path(newFolderPath + "/" + fileName);
// To initialize input stream.
FSDataInputStream inputStream = fs.open(hdfsreadpath);
// Classic input stream usage.
String out = IOUtils.toString(inputStream, "UTF-8");
logger.info(out);
inputStream.close();
// Close the file system once all operations are done.
fs.close();
```
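For long-running applications the Kerberos ticket eventually expires. When the login was performed from a keytab (as in the alternative sketch above), Hadoop can renew it before a new batch of HDFS calls; a minimal sketch:

```java
// Re-login from the keytab if the ticket-granting ticket is close to expiry (keytab-based logins only).
UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
```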