Read and Write Files From an Amazon S3 Bucket With Python
- Install the `s3fs` package, then import the following libraries:

  ```python
  import pandas as pd
  import s3fs
  import os
  ```

  Since version 0.20.0, pandas uses `s3fs` to manage S3 connections.

- Declare your environment variables in your Saagie project. This makes them easy to modify and keeps your credentials out of Git when your project is under version control.

  You can also declare your environment variables directly from your Python code, but we do not recommend this solution.

  ```python
  # Environment variables must be defined outside the .py file, in your Saagie project.
  # Credential values
  key = 'BLKIUG450KFBB'
  secret = 'oihKJFuhfuh/953oiof'
  region = 'eu-west-3'

  # To configure credentials.
  os.environ['AWS_ACCESS_KEY_ID'] = key
  os.environ['AWS_SECRET_ACCESS_KEY'] = secret
  os.environ['AWS_DEFAULT_REGION'] = region
  ```
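A connection failure caused by a missing variable can be hard to diagnose, so it can help to check that all three variables are set before connecting. The helper below is a small sketch, not part of the original snippet; the variable names match those configured above.

```python
import os

# The three variables the configuration above relies on.
REQUIRED_VARS = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION')

def missing_aws_vars(environ=os.environ):
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# Example with a fake environment in which only the access key is set.
missing = missing_aws_vars({'AWS_ACCESS_KEY_ID': 'BLKIUG450KFBB'})
# missing == ['AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION']
```

Calling `missing_aws_vars()` with no argument checks the real process environment, so you can fail fast with a clear message instead of an opaque S3 error later.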
- You can now read and write files in your Amazon S3 bucket by running the following lines of code:

  ```python
  # Go here if you have already configured your environment variables.
  # File parameters
  s3_file = 's3://bucket-name/path/to/file/titanic.csv'

  # To import an example file.
  df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

  ##### To write file #####
  df.to_csv(s3_file)

  ##### To read file #####
  df_s3 = pd.read_csv(s3_file)
  ```
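If you prefer not to rely on process-wide environment variables, pandas (since version 1.2) also accepts credentials per call through the `storage_options` parameter, which it forwards to `s3fs`. This is a minimal sketch; the bucket path is a placeholder, and the credentials are read from the same variables as above:

```python
import os
import pandas as pd

# Placeholder path -- replace with your own bucket and key.
s3_file = 's3://bucket-name/path/to/file/titanic.csv'

# Credentials passed explicitly per call instead of via os.environ;
# pandas forwards this dict to s3fs.
storage_options = {
    'key': os.environ.get('AWS_ACCESS_KEY_ID'),
    'secret': os.environ.get('AWS_SECRET_ACCESS_KEY'),
}

# Uncomment once the path and credentials point at a real bucket:
# df.to_csv(s3_file, storage_options=storage_options)
# df_s3 = pd.read_csv(s3_file, storage_options=storage_options)
```

Passing credentials this way keeps them scoped to the individual read or write call, which is useful when one job talks to several buckets with different credentials.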