Read and Write Files From Amazon S3 Bucket With R

How to read and write files from Amazon S3 Bucket with R using the arrow or aws.s3 package.

  1. Declare your environment variables in your Saagie project. This lets you modify them easily and keeps your credentials out of Git when your project is under version control.

    You can also declare your environment variables directly from your R code, but we do not recommend this solution.

    key <- 'BLKIUG450KFBB'
    secret <- 'oihKJFuhfuh/953oiof'
    region <- 'eu-west-3'
    Sys.setenv(AWS_ACCESS_KEY_ID = key, AWS_SECRET_ACCESS_KEY = secret, AWS_DEFAULT_REGION = region)
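
    If the variables declared in your Saagie project do not use the names expected by AWS clients, you can map them at the start of your script. The following is a minimal sketch assuming hypothetical project variable names AWS_S3_KEY_ID, AWS_S3_SECRET, and AWS_S3_REGION; replace them with the names you actually declared.

    # Hypothetical project variable names; adjust them to the variables declared in your project.
    Sys.setenv(AWS_ACCESS_KEY_ID = Sys.getenv('AWS_S3_KEY_ID'),
               AWS_SECRET_ACCESS_KEY = Sys.getenv('AWS_S3_SECRET'),
               AWS_DEFAULT_REGION = Sys.getenv('AWS_S3_REGION'))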
  2. You can now read and write files from Amazon S3 Bucket using the arrow or aws.s3 package with the following lines of code:

    • Using arrow

    • Using aws.s3

    The arrow package is a library that can interact with Amazon S3 Bucket, reading and writing CSV and Parquet files either locally or directly in the bucket.

    library(arrow)
    
    # To get a bucket.
    bucket <- s3_bucket(bucket_name)
    # To create a path to the file.
    path <- bucket$path(object_name)
    
    # To write a CSV file from the created path.
    write_csv_arrow(iris, path)
    
    # To read the file from the path.
    iris2 <- read_csv_arrow(path)
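
    Since the arrow package also handles Parquet, a similar sketch works with write_parquet() and read_parquet(). The Parquet object name below is a hypothetical example; replace it with your own.

    # Hypothetical Parquet object name.
    parquet_path <- bucket$path('documentation-s3/doc-r/iris.parquet')

    # To write a Parquet file to the created path.
    write_parquet(iris, parquet_path)

    # To read the Parquet file from the path.
    iris_parquet <- read_parquet(parquet_path)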

    The aws.s3 package is a library that can interact with Amazon S3 Bucket in several ways. It is slower than the arrow package, but it offers more features.

    library(aws.s3)
    library(data.table) # Required to read from and write to memory
    
    # To upload the object from memory to Amazon S3 Bucket.
    s3write_using(iris, FUN = fwrite, object = object_name, bucket = bucket_name)
    
    # To read the file from Amazon S3 Bucket into memory.
    iris3 <- s3read_using(FUN = fread, object = object_name, bucket = bucket_name)

    Where:

    • bucket_name must be declared, for example bucket_name <- 'saagie-service'.

    • object_name must be declared, for example object_name <- 'documentation-s3/doc-r/iris.csv'.
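
    Because the aws.s3 package also provides helpers for R's native serialization, you can store any R object as an RDS file with s3saveRDS() and read it back with s3readRDS(). This is a minimal sketch assuming a hypothetical object name 'documentation-s3/doc-r/iris.rds'.

    # Hypothetical RDS object name.
    rds_object <- 'documentation-s3/doc-r/iris.rds'

    # To save an R object as an RDS file in Amazon S3 Bucket.
    s3saveRDS(iris, object = rds_object, bucket = bucket_name)

    # To read the RDS file from Amazon S3 Bucket back into memory.
    iris_rds <- s3readRDS(object = rds_object, bucket = bucket_name)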

    Additional Methods
    # List the available buckets.
    bucketlist()
    
    # List the files in the bucket.
    get_bucket(bucket_name)
    
    ##### Uploading a file from the disk #####
    # To write the file to the disk.
    write.csv(iris, 'iris.csv', row.names = FALSE)
    # To upload the file from the disk to Amazon S3 Bucket.
    put_object(file = 'iris.csv', object = object_name, bucket = bucket_name)
    
    # Another way to read the file from Amazon S3 Bucket into memory.
    iris4 <- data.table::fread(rawToChar(get_object(object = object_name, bucket = bucket_name)))
    
    # To read the file via the disk.
    # Write the raw object to the disk, then read it back. No additional library is needed.
    writeBin(get_object(object = object_name, bucket = bucket_name, as = 'raw'), con = 'iris5.csv')
    iris5 <- read.csv('iris5.csv')
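
    If you prefer to download the file directly to the disk, the aws.s3 package also offers save_object(). This is a short sketch; the local file name is arbitrary.

    # To download the file from Amazon S3 Bucket straight to the disk, then read it.
    save_object(object = object_name, bucket = bucket_name, file = 'iris6.csv')
    iris6 <- read.csv('iris6.csv')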