S3 - AI Squared

Connect AI Squared to S3

This page describes how to add AWS S3 as a source. AI Squared lets you pull data from CSV and Parquet files stored in an Amazon S3 bucket and push it to downstream destinations. To get started, you need an S3 bucket and AWS credentials.

Connector Configuration and Credentials Guide

Prerequisites

Before proceeding, gather the information required by the authentication method you plan to use. AI Squared supports two types of authentication:

IAM User with an access key ID and a secret access key.
IAM Role with an ARN configured with an external ID so that AI Squared can connect to your S3 bucket.

Regardless of the authentication type, you will also need:

Region
Bucket name
The type of file we are working with (CSV or Parquet)
Path to the CSV or Parquet files

Setting Up AWS Requirements

Steps to Retrieve or Create IAM User Credentials

Steps to Retrieve or Create an IAM Role ARN

External ID

The role ARN needs an external ID, which you will provide when configuring the S3 source connector. The external ID allows AI Squared to reach your S3 bucket and read data from it. You can generate an external ID using this GUID generator tool. Learn more about AWS STS external ID.
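If you would rather generate the external ID locally than use an online tool, the following is a minimal sketch. It assumes any random GUID is acceptable as the external ID; the function name `generate_external_id` is illustrative and not part of AI Squared or AWS.

```python
import uuid


def generate_external_id() -> str:
    """Return a random version-4 UUID string to use as the external ID."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    # Copy the printed value into both the role's trust relationship
    # and the S3 source connector configuration.
    print(generate_external_id())
```

Use the same value in the trust relationship and in the connector configuration; a mismatch would cause the role assumption to be denied.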

Roles

Navigate to Roles. This can be found in the left navigation under “Access Management” -> “Roles”.

Create or Select an existing role

Select an existing role to edit or create a new one by clicking on “Create Role”.

ARN Permissions Policy

The “Permissions Policy” should look something like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::{your-bucket-name}",
                "arn:aws:s3:::{your-bucket-name}/*"
            ]
        }
    ]
}
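To avoid hand-editing the bucket name in two places, the policy can be generated. This is a sketch only: `build_permissions_policy` is a hypothetical helper, and `my-bucket` is a placeholder bucket name.

```python
import json


def build_permissions_policy(bucket_name: str) -> dict:
    """Build the read-only permissions policy shown above for one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:ListBucket",
                ],
                # ListBucket applies to the bucket itself; the Get* actions
                # apply to the objects inside it, hence the two resources.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # "my-bucket" is a placeholder; replace it with your bucket name.
    print(json.dumps(build_permissions_policy("my-bucket"), indent=4))
```

The printed JSON can be pasted into the policy editor in the IAM console.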

ARN Trust Relationship

The “Trust Relationship” should look something like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "{iam-user-principal-arn}"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "{generated-external-id}"
                }
            }
        }
    ]
}
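The trust relationship can be generated the same way. In this sketch, `build_trust_policy` is a hypothetical helper, and the principal ARN and external ID in the usage section are placeholders, not real values.

```python
import json


def build_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Build the trust relationship shown above for one principal."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                # The role can only be assumed when the caller presents
                # this exact external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Both arguments are placeholders for illustration.
    policy = build_trust_policy(
        "arn:aws:iam::123456789012:user/example-user",
        "00000000-0000-4000-8000-000000000000",
    )
    print(json.dumps(policy, indent=4))
```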

Step 2: Locate AWS S3 Configuration Details

You should now be signed in to AWS and have your credentials. Next, navigate to the S3 service to find the remaining configuration details:

IAM User Access Key and Secret Access Key or IAM Role ARN and External ID:
- These were gathered in the previous step.
Bucket:
- In the AWS S3 console you should see the list of available buckets. If you have none, create one by clicking the “Create bucket” button.
Region:
- The bucket list shows the region associated with each bucket.
Path:
- The path to the file you wish to read from. This field is optional and can be left blank.
File type:
- The files in the selected path determine the file type (CSV or Parquet).

Step 3: Configure S3 Connector in Your Application

Now that you have gathered all the necessary details, enter the following information:

Region: The AWS region where your S3 bucket resources are located.
Access Key ID: Your AWS IAM user’s Access Key ID.
Secret Access Key: The corresponding Secret Access Key.
Bucket: The name of the bucket you want to use.
Path: The directory path where the files are located.
File type: The type of file (csv, parquet).
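Before entering these values, it can help to check them for obvious gaps. The sketch below models the fields for IAM user authentication; the class name `S3SourceConfig` and its attribute names are assumptions made for illustration and do not reflect AI Squared's internal field names.

```python
from dataclasses import dataclass
from typing import List

SUPPORTED_FILE_TYPES = ("csv", "parquet")


@dataclass(frozen=True)
class S3SourceConfig:
    """Hypothetical container for the connector fields listed above."""

    region: str
    access_key_id: str
    secret_access_key: str
    bucket: str
    file_type: str
    path: str = ""  # the path is optional and may be left blank

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the values look complete."""
        problems = []
        for name in ("region", "access_key_id", "secret_access_key", "bucket"):
            if not getattr(self, name).strip():
                problems.append(f"{name} is required")
        if self.file_type.lower() not in SUPPORTED_FILE_TYPES:
            problems.append("file_type must be csv or parquet")
        return problems
```

This only checks that values are present and that the file type is one of the two supported formats; it does not contact AWS.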

Step 4: Test the S3 Connection

After configuring the connector in your application:

Save the configuration settings.
Test the connection to S3 from your application to ensure everything is set up correctly.
Run a test query or check the connection status to verify successful connectivity.
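Independently of the connector, you can sanity-check the credentials yourself. The sketch below is not how AI Squared tests the connection; it assumes you pass in a boto3 S3 client (or any object exposing the same `head_bucket` and `list_objects_v2` methods), and `check_s3_access` is a hypothetical helper name.

```python
def check_s3_access(s3_client, bucket: str, prefix: str = "") -> int:
    """Return how many objects (at most 10) were listed under the prefix.

    Any exception raised by the client (for example when the bucket does not
    exist or access is denied) is left to propagate to the caller.
    """
    # Fails if the bucket is missing or the credentials cannot reach it.
    s3_client.head_bucket(Bucket=bucket)
    # Exercises the s3:ListBucket permission granted by the policy.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=10)
    return response.get("KeyCount", 0)


# Example usage (requires the boto3 package and real credentials):
#   import boto3
#   client = boto3.client("s3", region_name="us-east-1")
#   print(check_s3_access(client, "my-bucket", "path/to/file/"))
```

A return value of 0 means the bucket is reachable but no objects were found under that prefix, which usually points to a wrong path rather than a credentials problem.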

Your S3 connector is now configured and ready to query data from your S3 bucket.

Building a Model Query

The S3 source connector is powered by DuckDB's S3 API support. This lets you use SQL queries to describe and/or fetch data from an S3 bucket, for example:

SELECT * FROM 's3://my-bucket/path/to/file/file.parquet';

The example shows the details required to build the query:

FROM clause: 's3://my-bucket/path/to/file/file.parquet' Provide a value in the same format as the example.
Bucket: my-bucket The bucket name. It must match the bucket provided when configuring the S3 source connector.
Path: /path/to/file The path to the file. It must match the path provided when configuring the S3 source connector.
File name and type: file.parquet The file name and extension at the end of the path. The file type must match the one provided when configuring the S3 source connector.
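The parts described above can be assembled programmatically. This is a minimal sketch; `build_model_query` is a hypothetical helper, and it only builds the query string without running it.

```python
def build_model_query(bucket: str, path: str, file_name: str) -> str:
    """Assemble a SELECT statement over a single file in S3."""
    # Strip stray slashes so the pieces join into one clean URI.
    parts = [bucket.strip("/")]
    if path.strip("/"):
        parts.append(path.strip("/"))
    parts.append(file_name.strip("/"))
    uri = "s3://" + "/".join(parts)
    # Double any single quotes so the URI stays a valid SQL string literal.
    escaped = uri.replace("'", "''")
    return f"SELECT * FROM '{escaped}';"


if __name__ == "__main__":
    print(build_model_query("my-bucket", "/path/to/file", "file.parquet"))
```

Passing an empty path produces a URI that points at a file in the bucket root, matching the case where the optional path field was left blank.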

Supported sync modes

Mode	Supported (Yes/No/Coming soon)
Incremental sync	YES
Full refresh	YES
