Manually Upload Large Files to HDFS

Use the workaround below if you have trouble uploading files larger than a few gigabytes to HDFS via HUE.

To manually upload files larger than a few gigabytes to HDFS via HUE, you can:

  • Create a Zip archive of your files and split it into parts smaller than 1 GB.

    To create and split the archive, use 7-Zip on Windows or the zip and split commands on Linux.
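As a rough sketch of the split step (file names and sizes here are illustrative, and a placeholder file stands in for a real archive produced by zip or 7-Zip):

```shell
# Stand-in for a real Zip archive; in practice this file would come
# from zip (Linux) or 7-Zip (Windows).
head -c 25000000 /dev/urandom > data.zip

# Cut the archive into 10 MB parts: data.zip.00, data.zip.01, data.zip.02.
split -b 10M -d data.zip data.zip.
ls -la data.zip.*

# The parts reassemble byte-for-byte with cat, which is what the
# "hadoop fs -cat ... > file.zip" step later does on the cluster side.
cat data.zip.* > rebuilt.zip
cmp data.zip rebuilt.zip && echo "parts reassemble identically"
```

In real use you would split at just under 1 GB (for example, `split -b 1000M`) to stay below the upload limit.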
  • Upload the archive parts to HDFS via HUE.

  • Create and run a Sqoop job containing the following shell commands:

    mypath="/hdfspath/to/data/" # HDFS directory containing the uploaded archive parts
    myzip="name of my file"     # archive name, without the .zip extension
    
    # Make the target directory writable.
    hadoop fs -chmod 777 "$mypath"
    
    # Check that every part of the split archive is present.
    hadoop fs -ls "$mypath$myzip.zip".*
    
    # Reassemble the parts into a single local Zip archive and unzip it.
    hadoop fs -cat "$mypath$myzip.zip".* > file.zip
    ls -la
    unzip file.zip -d "$myzip"
    ls -la "$myzip/"
    
    # Push the unzipped directory back to HDFS; sed URL-encodes any spaces
    # in the directory name so that hadoop fs accepts the path.
    hadoop fs -put -f "$( echo "$myzip/" | sed 's/ /%20/g' )" "$mypath"

    Replace the values of the mypath and myzip variables with your own.
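The subtlest line is the final hadoop fs -put: the sed substitution replaces each space in the unzipped directory name with %20, since hadoop fs paths do not accept literal spaces. In isolation (the name below is hypothetical):

```shell
# Hypothetical directory name containing spaces.
myzip="name of my file"

# Replace each space with %20 before handing the path to hadoop fs.
encoded="$( echo "$myzip/" | sed 's/ /%20/g' )"
echo "$encoded"   # → name%20of%20my%20file/
```

If the archive name contains no spaces, the substitution is a no-op and `hadoop fs -put -f "$myzip/" "$mypath"` works as-is.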