S3
Connect AI Squared to S3
This page describes how to add AWS S3 as a source.
AI Squared lets you pull data from CSV and Parquet files stored in an Amazon S3 bucket and push them to downstream destinations. To get started, you need an S3 bucket and AWS credentials.
Connector Configuration and Credentials Guide
Prerequisites
Before proceeding, ensure you have the necessary information based on how you plan to authenticate to AWS. The two types of authentication we support are:
- IAM User with access id and secret access key.
- IAM Role with ARN configured with an external ID so that AI Square can connect to your S3 bucket.
Additional info you will need regardless of authentication type will be:
- Region
- Bucket name
- The type of file we are working with (CSV or Parquet)
- Path to the CSV or Parquet files
Setting Up AWS Requirements
Steps to Retrieve or Create an IAM Role User credentials
Steps to Retrieve or Create an IAM Role User credentials
Sign In
Log in to your AWS account at AWS Management Console.
Users
Navigate to the the Users. This can be found in the left navigation under “Access Management” -> “Users”.
Access/Secret Key
Once inside the Users page, Select the User you would like to authenticate with. If there are no users to select, create one and make sure to give it the required permissions to read from S3 buckets. If you haven’t created an access key pair before, click on “Create access key” to generate a new one. Make sure to copy the Secret Access Key as they are shown only once. After selecting the user, go to Security Credentials tab and in it you should be able to see the Access keys for that user.
Steps to Retrieve or Create an IAM Role ARN
Steps to Retrieve or Create an IAM Role ARN
Sign In
Log in to your AWS account at AWS Management Console.
External ID
The ARN is going to need an external ID which will be required during the configuration of the S3 source connector. The external ID will allow us to reach out to you S3 buckets and read data from it. You can generate an external Id using this GUID generator tool. Learn more about AWS STS external ID.
Roles
Navigate to the the Roles. This can be found in the left navigation under “Access Management” -> “Roles”.
Create or Select an existing role
Select an existing role to edit or create a new one by clicking on “Create Role”.
ARN Premissions Policy
The “Permissions Policy” should look something like this:
ARN Trust Relationship
The “Trust Relationship” should look something like this:
Step 2: Locate AWS S3 Configuration Details
Now you should be in the AWS and have found your credentials. Now we will navigate to the S3 service to find the necessary configuration details:
-
IAM User Access Key and Secret Access Key or IAM Role ARN and External ID:
- This has been gathered from the previous step.
-
Bucket:
- Once inside of the AWS S3 console you should be able to see the list of buckets available, if not go ahead and create a bucket by clicking on the “Create bucket” button.
-
Region:
- In the same list showing the buckets, there’s a region assotiated with it.
-
Path:
- The path where the file you wish to read from. This field is optional and can be left blank.
-
File type:
- The files within the path that was selected should help determine the file type.
Step 3: Configure S3 Connector in Your Application
Now that you have gathered all the necessary details enter the following information:
- Region: The AWS region where your S3 bucket resources are located.
- Access Key ID: Your AWS IAM user’s Access Key ID.
- Secret Access Key: The corresponding Secret Access Key.
- Bucket: The name of the bucket you want to use.
- Path: The path directory where the files are located at.
- File type: The type of file (csv, parquet).
Step 4: Test the S3 Connection
After configuring the connector in your application:
- Save the configuration settings.
- Test the connection to S3 from your application to ensure everything is set up correctly.
- Run a test query or check the connection status to verify successful connectivity.
Your S3 connector is now configured and ready to query data from your S3 data catalog.
Building a Model Query
The S3 source connector is powered by DuckDB S3 api support. This allows us to use SQL queries to describe and/or fetch data from an S3 bucket, for example:
From the example, we can notice some details that are required in order to perform the query:
- FROM command:
's3://my-bucket/path/to/file/file.parquet'
You need to provide a value in the same format as the example. - Bucket:
my-bucket
In that format you will need to provide the bucket name. The bucket name needs to be the same one provided when configuring the S3 source connector. - Path:
/path/to/file
In that format you will need to provide the path to the file. The path needs to be the same one provided when configuring the S3 source connector. - File name and type:
file.parquet
In that format you will need to provide the file name and type at the end of the path. The file type needs to be the same one provided when configuring the S3 source connector.
Supported sync modes
Mode | Supported (Yes/No/Coming soon) |
---|---|
Incremental sync | YES |
Full refresh | YES |