Read and Write Files With HDFS

How to read and write files in HDFS using the HDFS, WebHDFS, and HTTPFS protocols.

Before you begin

To execute the following examples, make sure that you have set the following environment variables:

HDFS
  • IP_HDFS: IP address or full name of namenode1.
  • PORT_HDFS: 8020.

WebHDFS
  • IP_WEBHDFS: IP address or full name of namenode1.
  • PORT_WEBHDFS: 50070 (50470 for a Kerberized cluster).

HTTPFS
  • IP_HTTPFS: IP address or full name of namenode1.
  • PORT_HTTPFS: 14000.

The ports listed are the default values.
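As a minimal sketch, the variables can be exported as follows. The hostname below is a placeholder; replace it with the address of your own NameNode.

```shell
# Example exports for the variables above.
# "namenode1.example.com" is a placeholder hostname, not a real cluster.
export IP_HDFS="namenode1.example.com"
export PORT_HDFS=8020

export IP_WEBHDFS="namenode1.example.com"
export PORT_WEBHDFS=50070   # Use 50470 for a Kerberized cluster.

export IP_HTTPFS="namenode1.example.com"
export PORT_HTTPFS=14000
```

Add these lines to your shell profile if you use them regularly.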

Read and Write Files With the HDFS Protocol

For more information, see the official HDFS protocol documentation.
Read Files

# To authenticate.
export HADOOP_USER_NAME="my_user"

# To get file.
hdfs dfs -get hdfs://$IP_HDFS:$PORT_HDFS/distant/path/my_distant_file my_local_file

Write Files

# To authenticate.
export HADOOP_USER_NAME="my_user"

# To place file.
hdfs dfs -put my_local_file hdfs://$IP_HDFS:$PORT_HDFS/distant/path/
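The hdfs:// URI passed to hdfs dfs -get and -put is simply the NameNode address and port joined with the remote path. A minimal sketch of building it from the environment variables (hostname and path are placeholders):

```shell
# Build the full HDFS URI from the environment variables.
# Hostname and path are placeholders for illustration only.
IP_HDFS="namenode1.example.com"
PORT_HDFS=8020
DISTANT_PATH="/distant/path/my_distant_file"

HDFS_URI="hdfs://$IP_HDFS:$PORT_HDFS$DISTANT_PATH"
echo "$HDFS_URI"
```

The resulting URI can be passed directly to hdfs dfs -get or hdfs dfs -put.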

Read and Write Files With the WebHDFS Protocol

For more information, see the official WebHDFS protocol documentation.
Read Files

# To get file.
curl -L -X GET "http://$IP_WEBHDFS:$PORT_WEBHDFS/webhdfs/v1/distant/path/my_distant_file?user.name=my_user&op=OPEN"

Write Files

  1. Request the NameNode to get the DataNode location by running the following lines of code:

    # To get location.
    RET=$(curl -XPUT --silent --include "http://$IP_WEBHDFS:$PORT_WEBHDFS/webhdfs/v1/distant/path/my_distant_file?user.name=my_user&op=CREATE" | grep 'Location' | cut -d" " -f2)
    echo $RET

    Where:

    • curl sends the HTTP PUT request.

    • grep retrieves only the value of Location.

    • cut retrieves only the second element.

    • echo displays the return.

  2. Put the file in the data node location by running the following lines of code:

    # To place file.
    curl -XPUT --include -T my_local_file "$RET"

    Where the variable $RET is the return of the first step.
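The grep and cut pipeline from step 1 can be sketched against a canned 307 redirect; the response text below is a hardcoded example of what a WebHDFS CREATE request typically returns, not live cluster output.

```shell
# Hardcoded example of the 307 redirect returned by a WebHDFS CREATE
# request. "datanode1" is a placeholder hostname.
RESPONSE='HTTP/1.1 307 Temporary Redirect
Location: http://datanode1:50075/webhdfs/v1/distant/path/my_distant_file?op=CREATE&user.name=my_user
Content-Length: 0'

# Same extraction as in step 1: keep the Location header line, then
# take its second space-separated field (the redirect URL).
RET=$(printf '%s\n' "$RESPONSE" | grep 'Location' | cut -d" " -f2)
echo "$RET"
```

Note that real HTTP headers end in CRLF, so in practice the extracted value may carry a trailing carriage return; piping through `tr -d '\r'` strips it.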

Read and Write Files With the WebHDFS Protocol With a Kerberized Cluster

When using the WebHDFS protocol with a Kerberized cluster, make sure you are using the correct port (50470). Then, run the following curl command after getting a valid ticket from a kinit command.

# With Kerberos, provided you have a valid Kerberos ticket obtained with kinit.
curl -k --negotiate -u : "https://nn1:50470/webhdfs/v1/?op=LISTSTATUS"

Read and Write Files With the HTTPFS Protocol

Read Files

# To get file.
curl -X GET "http://$IP_HTTPFS:$PORT_HTTPFS/webhdfs/v1/distant/path/my_distant_file?user.name=my_user&op=OPEN" --header "Content-Type:application/octet-stream" -o "my_local_file"

Write Files

# To place file.
curl -X PUT "http://$IP_HTTPFS:$PORT_HTTPFS/webhdfs/v1/distant/path/my_distant_file?user.name=my_user&op=CREATE&data=true" --header "Content-Type:application/octet-stream" -T "my_local_file"