Read and Write Files From HDFS With Java/Scala

This article explains how to read and write files in HDFS from Java or Scala, both with and without Kerberos authentication.

Without a Kerberized Cluster

  1. Add the following Maven dependency to your pom.xml:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
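
    The ${hadoop.version} property is not set by this dependency; define it in your pom.xml to match your cluster's Hadoop version. A minimal sketch, using 3.3.6 as a placeholder version:

    <properties>
        <hadoop.version>3.3.6</hadoop.version>
    </properties>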
  2. Initialize the HDFS FileSystem object that will allow you to interact with HDFS by running the following lines of code:

    // Requires org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem
    // and org.apache.hadoop.fs.Path.
    private static String HADOOP_CONF_DIR = System.getenv("HADOOP_CONF_DIR");

    // ====== To initialize the HDFS file system object.
    Configuration conf = new Configuration();
    conf.addResource(new Path("file:///" + HADOOP_CONF_DIR + "/core-site.xml"));
    conf.addResource(new Path("file:///" + HADOOP_CONF_DIR + "/hdfs-site.xml"));
    conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
    conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());

    // To set the HADOOP user.
    System.setProperty("HADOOP_USER_NAME", "hdfs");
    System.setProperty("hadoop.home.dir", "/");

    // To get the HDFS file system.
    FileSystem fs = FileSystem.get(conf);
    The HDFS connection URL format is hdfs://<namenode-host>:<port>, where 8020 is the default port. The connection URL is already defined in the XML configuration files loaded above, so you do not need to set it in the code.
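
    If the configuration files are not available on the machine running your code, you can set the connection URL directly on the Configuration object instead. A minimal sketch, where namenode.example.com is a placeholder for your NameNode host:

    // Set the NameNode address explicitly instead of loading core-site.xml.
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
    FileSystem fs = FileSystem.get(conf);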
  3. Create the target HDFS directory if it does not already exist:

    // ==== To create a folder if it does not exist.
    Path newFolderPath = new Path(path); (1)
    if (!fs.exists(newFolderPath)) {
        // To create a new directory.
        fs.mkdirs(newFolderPath);
        logger.info("Path " + path + " created.");
    }

    Where:

    1 Replace path with the path of the folder you want to create.
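
    To verify the creation, you can list the contents of the new directory with FileSystem.listStatus. A short sketch (logger is assumed to exist, as in the snippets above; FileStatus comes from org.apache.hadoop.fs):

    // List the contents of the new directory.
    for (FileStatus status : fs.listStatus(newFolderPath)) {
        logger.info(status.getPath().toString());
    }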
  4. You can now read and write files from HDFS by running the following lines of code:

    Read files:

    // ==== To read file.
    logger.info("Read file from hdfs");

    // To create a path.
    Path hdfsreadpath = new Path(newFolderPath + "/" + fileName);

    // To initialize input stream.
    FSDataInputStream inputStream = fs.open(hdfsreadpath);

    // Classic input stream usage (IOUtils is org.apache.commons.io.IOUtils).
    String out = IOUtils.toString(inputStream, "UTF-8");
    logger.info(out);
    inputStream.close();
    // Close the file system only once you are done with all HDFS operations.
    fs.close();
    Write files:

    // ==== To write file.
    logger.info("Begin Write file into hdfs");

    // To create a path.
    Path hdfswritepath = new Path(newFolderPath + "/" + fileName);

    // To initialize output stream.
    FSDataOutputStream outputStream = fs.create(hdfswritepath);

    // Classic output stream usage.
    outputStream.writeBytes(fileContent);
    outputStream.close();
    logger.info("End Write file into hdfs");

With a Kerberized Cluster

  1. Add the following Maven dependencies to your pom.xml:

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.6.0-cdh5.16.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0-cdh5.16.1.1</version>
    </dependency>
  2. Add a jaas.conf file under src/main/resources with the following content:

    Main {
        com.sun.security.auth.module.Krb5LoginModule required client=TRUE;
    };
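
    If you authenticate with a keytab instead of a username and password, the JAAS entry can reference the keytab directly. A sketch, where the principal and keytab path are placeholders:

    Main {
        com.sun.security.auth.module.Krb5LoginModule required
            useKeyTab=true
            keyTab="/path/to/user.keytab"
            principal="user@EXAMPLE.COM"
            storeKey=true
            doNotPrompt=true;
    };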
  3. Create a login context helper that performs the Kerberos login (the programmatic equivalent of kinit):

    // Requires javax.security.auth.callback.* and javax.security.auth.login.*.
    private static String username;
    private static String password;
    private static String HADOOP_CONF_DIR = System.getenv("HADOOP_CONF_DIR");

    private static LoginContext kinit(String username, String password) throws LoginException {
        // The context name ("Main") must match the entry name declared in jaas.conf.
        LoginContext lc = new LoginContext(Main.class.getSimpleName(), callbacks -> {
            for (Callback c : callbacks) {
                if (c instanceof NameCallback)
                    ((NameCallback) c).setName(username);
                if (c instanceof PasswordCallback)
                    ((PasswordCallback) c).setPassword(password.toCharArray());
            }
        });
        lc.login();
        return lc;
    }
  4. Initialize the HDFS FileSystem object that will allow you to interact with HDFS by running the following lines of code:

    // Requires java.net.URL and org.apache.hadoop.security.UserGroupInformation.
    URL url = Main.class.getClassLoader().getResource("jaas.conf");
    System.setProperty("java.security.auth.login.config", url.toExternalForm());

    // ====== To initialize the HDFS file system object.
    Configuration conf = new Configuration();
    conf.addResource(new Path("file:///" + HADOOP_CONF_DIR + "/core-site.xml"));
    conf.addResource(new Path("file:///" + HADOOP_CONF_DIR + "/hdfs-site.xml"));

    UserGroupInformation.setConfiguration(conf);

    LoginContext lc = kinit(username, password);
    UserGroupInformation.loginUserFromSubject(lc.getSubject());

    // To get the HDFS file system.
    FileSystem fs = FileSystem.get(conf);
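
    As an alternative to the password-based JAAS flow above, UserGroupInformation can log in directly from a keytab, which suits unattended jobs. A sketch with a placeholder principal and keytab path:

    // Log in from a keytab instead of a password-based JAAS login.
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab");
    FileSystem fs = FileSystem.get(conf);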
  5. Create the target HDFS directory if it does not already exist:

    // ==== To create a folder if it does not exist.
    Path newFolderPath = new Path(path); (1)
    if (!fs.exists(newFolderPath)) {
        // To create a new directory.
        fs.mkdirs(newFolderPath);
        logger.info("Path " + path + " created.");
    }

    Where:

    1 Replace path with the path of the folder you want to create.
  6. You can now read and write files from HDFS with Kerberos by running the following lines of code:

    Read files:

    // ==== To read file.
    logger.info("Read file from hdfs");

    // To create a path.
    Path hdfsreadpath = new Path(newFolderPath + "/" + fileName);

    // To initialize input stream.
    FSDataInputStream inputStream = fs.open(hdfsreadpath);

    // Classic input stream usage (IOUtils is org.apache.commons.io.IOUtils).
    String out = IOUtils.toString(inputStream, "UTF-8");
    logger.info(out);
    inputStream.close();
    // Close the file system only once you are done with all HDFS operations.
    fs.close();
    Write files:

    // ==== To write file.
    logger.info("Begin Write file into hdfs");

    // To create a path.
    Path hdfswritepath = new Path(newFolderPath + "/" + fileName);

    // To initialize output stream.
    FSDataOutputStream outputStream = fs.create(hdfswritepath);

    // Classic output stream usage.
    outputStream.writeBytes(fileContent);
    outputStream.close();
    logger.info("End Write file into hdfs");