Introduction

Multiwoven protocol defines a set of interfaces for building connectors. Connectors can be implemented independent of our server application, this protocol allows developers to create connectors without requiring in-depth knowledge of our core platform.

Concepts

Source - A source in business data storage typically refers to data warehouses like Snowflake, AWS Redshift and Google BigQuery, as well as databases.

Destination - A destination is a tool or third party service where source data is sent and utilised, often by end-users. It includes CRM systems, ad platforms, marketing automation, and support tools.

Stream - A Stream defines the structure and metadata of a resource, such as a database table, REST API resource, or data stream, outlining how users can interact with it using query or request.

Fields

FieldDescription
nameA string representing the name of the stream.
action (optional)Defines the action associated with the stream, e.g., “create”, “update”, or “delete”.
json_schemaA hash representing the JSON schema of the stream.
supported_sync_modes (optional)An array of supported synchronization modes for the stream.
source_defined_cursor (optional)A boolean indicating whether the source has defined a cursor for the stream.
default_cursor_field (optional)An array of strings representing the default cursor field(s) for the stream.
source_defined_primary_key (optional)An array of arrays of strings representing the source-defined primary key(s) for the stream.
namespace (optional)A string representing the namespace of the stream.
url (optional)A string representing the URL of the API stream.
request_method (optional)A string representing the request method (e.g., “GET”, “POST”) for the API stream.
batch_supportA boolean indicating whether the stream supports batching.
batch_sizeAn integer representing the batch size for the stream.
request_rate_limitAn integer value, specifying the maximum number of requests that can be made to the user data API within a given time limit unit.
request_rate_limit_unitA string value indicating the unit of time for the rate limit.
request_rate_concurrencyAn integer value which limits the number of concurrent requests.

Catalog - A Catalog is a collection of Streams detailing the data within a data store represented by a Source/Destination eg: Catalog = Schema, Streams = List[Tables] Fields

FieldDescription
streamsAn array of Streams detailing the data within the data store. This encapsulates various data streams available for synchronization or processing, each potentially with its own schema, sync modes, and other configurations.
request_rate_limitAn integer value, specifying the maximum number of requests that can be made to the user data API within a given time limit unit. This serves to prevent overloading the system by limiting the rate at which requests can be made.
request_rate_limit_unitA string value indicating the unit of time for the rate limit, such as “minute” or “second”. This defines the time window in which the request_rate_limit applies.
request_rate_concurrencyAn integer value which limits the number of concurrent requests that can be made. This is used to control the load on the system by restricting how many requests can be processed at the same time.
schema_mode A string value that identifies the schema handling mode for the connector. Supported values include static, dynamic, and schemaless. This parameter is crucial for determining how the connector handles data schema.

Rate limit specified in catalog will applied to stream if there is no stream specific rate limit is defined.

Model - Models specify the data to be extracted from a source Fields

  • name (optional): A string representing the name of the model.
  • query: A string representing the query used to extract data from the source.
  • query_type: A type representing the type of query used by the model.
  • primary_key: A string representing the primary key of the model.

Sync - A Sync sets the rules for data transfer from a chosen source to a destination

Fields

  • source: The source connector from which data is transferred.
  • destination: The destination connector where data is transferred.
  • model: The model specifying the data to be transferred.
  • stream: The stream defining the structure and metadata of the data to be transferred.
  • sync_mode: The synchronization mode determining how data is transferred.
  • cursor_field (optional): The field used as a cursor for incremental data transfer.
  • destination_sync_mode: The synchronization mode at the destination.

Interfaces

The output of each method in the interface is encapsulated in an MultiwovenMessage, serving as an envelope for the message’s return value. These are omitted in interface explanations for sake of simplicity.

Common

  1. connector_spec() -> ConnectorSpecification

Description - connector_spec returns information about how the connector can be configured

Input - None

Output - ConnectorSpecification -One of the main pieces of information the specification shares is what information is needed to configure an Actor.

  • documentation_url:
    URL providing information about the connector.

  • stream_type:
    The type of stream supported by the connector. Possible values include:

    • static: The connector catalog is static.
    • dynamic: The connector catalog is dynamic, which can be either schemaless or with a schema.
    • user_defined: The connector catalog is defined by the user.
  • connector_query_type:
    The type of query supported by the connector. Possible values include:

    • raw_sql: The connector is SQL-based.
    • soql: Specifically for Salesforce.
    • ai_ml: Specific for AI model source connectors.
  • connection_specification:
    The properties required to connect to the source or destination.

  • sync_mode:
    The synchronization modes supported by the connector.

  1. meta_data() -> Hash

Description - meta_data returns information about how the connector can be shown in the multiwoven ui eg: icon, labels etc.

Input - None

Output - Hash. Sample hash can be found here

  1. check_connection(connection_config) -> ConnectionStatus

Description: The check_connection method verifies if a given configuration allows successful connection and access to necessary resources for a source/destination, such as confirming Snowflake database connectivity with provided credentials. It returns a success response if successful or a failure response with an error message in case of issues like incorrect passwords

Input - Hash

Output - ConnectionStatus

  1. discover(connection_config) -> Catalog

Description: The discover method identifies and outlines the data structure in a source/destination. Eg: Given a valid configuration for a Snowflake source, the discover method returns a list of accessible tables, formatted as streams.

Input - Hash

Output - Catalog

Source

Source implements the following interface methods including the common methods.

connector_spec() -> ConnectorSpecification
meta_data() -> Hash
check_connection(connection_config) -> ConnectionStatus
discover(connection_config)  -> Catalog
read(SyncConfig) ->Array[RecordMessage]
  1. read(SyncConfig) ->Array[RecordMessage]

Description -The read method extracts data from a data store and outputs it as RecordMessages.

Input - SyncConfig

Output - List[RecordMessage]

Destination

Destination implements the following interface methods including the common methods.

connector_spec() -> ConnectorSpecification
meta_data() -> Hash
check_connection(connection_config) -> ConnectionStatus
discover(connection_config)  -> Catalog
write(SyncConfig,Array[records]) -> TrackingMessage
  1. write(SyncConfig,Array[records]) -> TrackingMessage

Description -The write method loads data to destinations.

Input - Array[Record]

Output - TrackingMessage

Note: Complete multiwoven protocol models can be found here

Acknowledgements

We’ve been significantly influenced by the Airbyte protocol, and their design choices greatly accelerated our project’s development.