Connect AI Squared to S3

This page describes how to add AWS S3 as a source.

AI Squared lets you pull data from CSV and Parquet files stored in an Amazon S3 bucket and push it to downstream destinations. To get started, you need an S3 bucket and AWS credentials.

Connector Configuration and Credentials Guide

Prerequisites

Before proceeding, ensure you have the necessary information based on how you plan to authenticate to AWS. The two types of authentication we support are:

  • IAM User with an Access Key ID and Secret Access Key.
  • IAM Role identified by its ARN and configured with an External ID so that AI Squared can connect to your S3 bucket.
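
For the IAM Role option, the role normally needs a trust policy that allows the connecting party to assume it only when the agreed External ID is supplied. The following is a minimal sketch of such a policy, built as a Python dictionary so it can be printed as JSON. The helper name build_trust_policy, the principal ARN, and the External ID value are placeholders for illustration, not values published by AI Squared; use the principal and External ID given to you for your workspace.

```python
import json


def build_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Return an IAM trust policy document.

    The policy lets `principal_arn` call sts:AssumeRole on the role,
    but only when the request carries the matching external ID.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Both arguments are hypothetical placeholders.
    policy = build_trust_policy(
        "arn:aws:iam::111122223333:root",
        "example-external-id",
    )
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the role's trust relationship in the IAM console. The role also needs a separate permissions policy granting read access to the bucket, which is not shown here.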

Regardless of authentication type, you will also need:

  • Region
  • Bucket name
  • File type (CSV or Parquet)
  • Path to the CSV or Parquet files

Setting Up AWS Requirements

Step 2: Locate AWS S3 Configuration Details

You should now be signed in to AWS and have your credentials at hand. Next, navigate to the S3 service to find the remaining configuration details:

  1. IAM User Access Key and Secret Access Key or IAM Role ARN and External ID:

    • You gathered these in the previous step.
  2. Bucket:

    • In the AWS S3 console, you should see the list of available buckets. If you do not have one yet, create a bucket by clicking the “Create bucket” button.
  3. Region:

    • The same list shows the region associated with each bucket.
  4. Path:

    • The path to the file or files you want to read. This field is optional and can be left blank.
  5. File type:

    • The files in the selected path determine the file type (CSV or Parquet).

Step 3: Configure S3 Connector in Your Application

Now that you have gathered all the necessary details, enter the following information:

  • Region: The AWS region where your S3 bucket resources are located.
  • Access Key ID: Your AWS IAM user’s Access Key ID.
  • Secret Access Key: The corresponding Secret Access Key.
  • Bucket: The name of the bucket you want to use.
  • Path: The directory path where the files are located.
  • File type: The type of file (csv, parquet).
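
Before entering these values, it can help to check them for obvious gaps. The following is a rough sketch of such a check, not the validation AI Squared itself performs. The class name S3SourceConfig and its field names are hypothetical and simply mirror the list above; Path is treated as optional, as noted in Step 2.

```python
from dataclasses import dataclass
from typing import List

# File types named in this guide.
SUPPORTED_FILE_TYPES = ("csv", "parquet")


@dataclass
class S3SourceConfig:
    """Hypothetical container mirroring the connector form fields."""

    region: str
    access_key_id: str
    secret_access_key: str
    bucket: str
    file_type: str
    path: str = ""  # optional, may be left blank

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means no issues found."""
        errors = []
        for name in ("region", "access_key_id", "secret_access_key", "bucket"):
            if not getattr(self, name).strip():
                errors.append(f"{name} is required")
        if self.file_type.lower() not in SUPPORTED_FILE_TYPES:
            errors.append("file_type must be csv or parquet")
        return errors


if __name__ == "__main__":
    # All values below are placeholders.
    config = S3SourceConfig(
        region="us-east-1",
        access_key_id="AKIAEXAMPLE",
        secret_access_key="example-secret",
        bucket="my-bucket",
        file_type="parquet",
        path="path/to/file",
    )
    print(config.validate())
```

This check only catches missing or unsupported values. Whether the credentials actually grant access to the bucket is confirmed by the connection test in the next step.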

Step 4: Test the S3 Connection

After configuring the connector in your application:

  1. Save the configuration settings.
  2. Test the connection to S3 from your application to ensure everything is set up correctly.
  3. Run a test query or check the connection status to verify successful connectivity.

Your S3 connector is now configured and ready to query data from your S3 bucket.

Building a Model Query

The S3 source connector is powered by DuckDB's S3 API support. This lets you use SQL queries to describe or fetch data from an S3 bucket. For example:

SELECT * FROM 's3://my-bucket/path/to/file/file.parquet';

The example shows the details the query requires:

  • FROM clause: 's3://my-bucket/path/to/file/file.parquet'. Provide a value in the same format as the example.
  • Bucket: my-bucket. The bucket name must match the one provided when configuring the S3 source connector.
  • Path: /path/to/file. The path to the file must match the one provided when configuring the S3 source connector.
  • File name and type: file.parquet. The file name and extension go at the end of the path, and the file type must match the one provided when configuring the S3 source connector.
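
The way these parts combine into a query can be sketched in a few lines of Python. The helper names build_s3_uri, build_select, and build_describe are hypothetical and exist only for this illustration; they assemble the query text and do not connect to S3 or DuckDB.

```python
from typing import Optional


def build_s3_uri(bucket: str, path: str, file_name: str) -> str:
    """Join bucket, optional path, and file name into an s3:// URI."""
    parts = [bucket.strip("/")]
    if path.strip("/"):  # the path may be blank
        parts.append(path.strip("/"))
    parts.append(file_name.strip("/"))
    return "s3://" + "/".join(parts)


def _check(uri: str) -> str:
    """Reject single quotes so the URI cannot break out of the SQL literal."""
    if "'" in uri:
        raise ValueError("URI must not contain single quotes")
    return uri


def build_select(uri: str, limit: Optional[int] = None) -> str:
    """Build a query that fetches rows from the file."""
    query = f"SELECT * FROM '{_check(uri)}'"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query + ";"


def build_describe(uri: str) -> str:
    """Build a query that describes the file's columns instead of fetching rows."""
    return f"DESCRIBE SELECT * FROM '{_check(uri)}';"


if __name__ == "__main__":
    uri = build_s3_uri("my-bucket", "/path/to/file", "file.parquet")
    print(build_select(uri))
    print(build_describe(uri, ))
```

With the values from the example above, build_select produces the same query shown earlier, and build_describe produces a variant that can be used to inspect the column names and types before writing the full model query.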

Supported sync modes

Mode              Supported (Yes/No/Coming soon)
Incremental sync  Yes
Full refresh      Yes
