Read and Write Files From Amazon S3 Bucket With Python

This article explains how to read and write files in an Amazon S3 bucket with Python, using the pandas package.

  1. Install the s3fs package, then import the required libraries:

    import pandas as pd
    import s3fs
    import os

    Since version 0.20.0, pandas uses s3fs to manage S3 connections.
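
    Because pandas delegates `s3://` URLs to s3fs, the same read and write calls work unchanged on local paths, which makes the workflow easy to try before touching a bucket. A minimal sketch using a local file only (the file name is an arbitrary example, and no AWS access is required):

    ```python
    import pandas as pd

    # Write and read a small CSV locally; with s3fs installed, the same
    # calls accept an 's3://bucket/key.csv' path instead of a local one.
    df = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})
    df.to_csv("passengers.csv", index=False)

    df_back = pd.read_csv("passengers.csv")
    print(df_back.shape)  # (2, 2)
    ```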
  2. Declare your environment variables in your Saagie project. This makes them easy to modify and keeps your credentials out of Git when your project is under version control.

    You can also declare your environment variables directly from your Python code, but we do not recommend this solution.

    # Placeholder credential values; define the real ones as environment
    # variables in your Saagie project, not in the .py file.
    key = 'BLKIUG450KFBB'
    secret = 'oihKJFuhfuh/953oiof'
    region = 'eu-west-3'
    
    # Configure the credentials that s3fs will pick up.
    os.environ['AWS_ACCESS_KEY_ID'] = key
    os.environ['AWS_SECRET_ACCESS_KEY'] = secret
    os.environ['AWS_DEFAULT_REGION'] = region
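
    If you prefer not to mutate `os.environ`, recent pandas versions (1.2+) accept a `storage_options` dictionary on each call, which is passed through to s3fs. A sketch of that alternative; the helper function name is ours, and it assumes the standard AWS variable names are set in your Saagie project:

    ```python
    import os


    def s3_storage_options():
        """Build per-call S3 credentials from environment variables.

        Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set
        (for example, in your Saagie project).
        """
        return {
            "key": os.environ.get("AWS_ACCESS_KEY_ID"),
            "secret": os.environ.get("AWS_SECRET_ACCESS_KEY"),
        }


    # Usage (requires real credentials and bucket access):
    # df = pd.read_csv("s3://bucket-name/path/to/file/titanic.csv",
    #                  storage_options=s3_storage_options())
    ```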
  3. You can now read and write files in your Amazon S3 bucket:

    # The following assumes your credentials are already configured
    # as environment variables.
    
    # File parameters
    s3_file = 's3://bucket-name/path/to/file/titanic.csv'
    
    # Load an example dataset to have something to write.
    df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')
    
    ##### To write the file #####
    df.to_csv(s3_file, index=False)
    
    ##### To read the file #####
    df_s3 = pd.read_csv(s3_file)
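
As a sanity check on step 3, you can verify that a write/read round trip preserves your data. The helper below is a sketch (the function name is ours, not part of pandas); it is shown with a local path, but the same call works with an `s3://` URI once s3fs is installed and credentials are configured:

```python
import pandas as pd


def roundtrip_check(df, path):
    """Write df to `path` as CSV and read it back.

    Returns True when the round trip preserves the data. Works for
    local paths and, with s3fs and credentials set up, s3:// URIs.
    """
    df.to_csv(path, index=False)
    return pd.read_csv(path).equals(df.reset_index(drop=True))


# Example with a local file:
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(roundtrip_check(df, "roundtrip.csv"))  # True
```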