Multiwoven Protocol
Introduction
Multiwoven protocol defines a set of interfaces for building connectors. Connectors can be implemented independent of our server application, this protocol allows developers to create connectors without requiring in-depth knowledge of our core platform.
Concepts
Source - A source in business data storage typically refers to data warehouses like Snowflake, AWS Redshift and Google BigQuery, as well as databases.
Destination - A destination is a tool or third party service where source data is sent and utilised, often by end-users. It includes CRM systems, ad platforms, marketing automation, and support tools.
Stream - A Stream defines the structure and metadata of a resource, such as a database table, REST API resource, or data stream, outlining how users can interact with it using query or request.
Fields
Field | Description |
---|---|
name | A string representing the name of the stream. |
action (optional) | Defines the action associated with the stream, e.g., “create”, “update”, or “delete”. |
json_schema | A hash representing the JSON schema of the stream. |
supported_sync_modes (optional) | An array of supported synchronization modes for the stream. |
source_defined_cursor (optional) | A boolean indicating whether the source has defined a cursor for the stream. |
default_cursor_field (optional) | An array of strings representing the default cursor field(s) for the stream. |
source_defined_primary_key (optional) | An array of arrays of strings representing the source-defined primary key(s) for the stream. |
namespace (optional) | A string representing the namespace of the stream. |
url (optional) | A string representing the URL of the API stream. |
request_method (optional) | A string representing the request method (e.g., “GET”, “POST”) for the API stream. |
batch_support | A boolean indicating whether the stream supports batching. |
batch_size | An integer representing the batch size for the stream. |
request_rate_limit | An integer value, specifying the maximum number of requests that can be made to the user data API within a given time limit unit. |
request_rate_limit_unit | A string value indicating the unit of time for the rate limit. |
request_rate_concurrency | An integer value which limits the number of concurrent requests. |
Catalog - A Catalog is a collection of Streams detailing the data within a data store represented by a Source/Destination eg: Catalog = Schema, Streams = List[Tables] Fields
Field | Description |
---|---|
streams | An array of Streams detailing the data within the data store. This encapsulates various data streams available for synchronization or processing, each potentially with its own schema, sync modes, and other configurations. |
request_rate_limit | An integer value, specifying the maximum number of requests that can be made to the user data API within a given time limit unit. This serves to prevent overloading the system by limiting the rate at which requests can be made. |
request_rate_limit_unit | A string value indicating the unit of time for the rate limit, such as “minute” or “second”. This defines the time window in which the request_rate_limit applies. |
request_rate_concurrency | An integer value which limits the number of concurrent requests that can be made. This is used to control the load on the system by restricting how many requests can be processed at the same time. |
schema_mode | A string value that identifies the schema handling mode for the connector. Supported values include static, dynamic, and schemaless. This parameter is crucial for determining how the connector handles data schema. |
Rate limit specified in catalog will applied to stream if there is no stream specific rate limit is defined.
Model - Models specify the data to be extracted from a source Fields
name
(optional): A string representing the name of the model.query
: A string representing the query used to extract data from the source.query_type
: A type representing the type of query used by the model.primary_key
: A string representing the primary key of the model.
Sync - A Sync sets the rules for data transfer from a chosen source to a destination
Fields
source
: The source connector from which data is transferred.destination
: The destination connector where data is transferred.model
: The model specifying the data to be transferred.stream
: The stream defining the structure and metadata of the data to be transferred.sync_mode
: The synchronization mode determining how data is transferred.cursor_field
(optional): The field used as a cursor for incremental data transfer.destination_sync_mode
: The synchronization mode at the destination.
Interfaces
The output of each method in the interface is encapsulated in an MultiwovenMessage, serving as an envelope for the message’s return value. These are omitted in interface explanations for sake of simplicity.
Common
connector_spec() -> ConnectorSpecification
Description - connector_spec returns information about how the connector can be configured
Input - None
Output - ConnectorSpecification -One of the main pieces of information the specification shares is what information is needed to configure an Actor.
-
documentation_url
:
URL providing information about the connector. -
stream_type
:
The type of stream supported by the connector. Possible values include:static
: The connector catalog is static.dynamic
: The connector catalog is dynamic, which can be either schemaless or with a schema.user_defined
: The connector catalog is defined by the user.
-
connector_query_type
:
The type of query supported by the connector. Possible values include:raw_sql
: The connector is SQL-based.soql
: Specifically for Salesforce.ai_ml
: Specific for AI model source connectors.
-
connection_specification
:
The properties required to connect to the source or destination. -
sync_mode
:
The synchronization modes supported by the connector.
meta_data() -> Hash
Description - meta_data returns information about how the connector can be shown in the multiwoven ui eg: icon, labels etc.
Input - None
Output - Hash
. Sample hash can be found here
check_connection(connection_config) -> ConnectionStatus
Description: The check_connection method verifies if a given configuration allows successful connection and access to necessary resources for a source/destination, such as confirming Snowflake database connectivity with provided credentials. It returns a success response if successful or a failure response with an error message in case of issues like incorrect passwords
Input - Hash
Output - ConnectionStatus
discover(connection_config) -> Catalog
Description: The discover method identifies and outlines the data structure in a source/destination. Eg: Given a valid configuration for a Snowflake source, the discover method returns a list of accessible tables, formatted as streams.
Input - Hash
Output - Catalog
Source
Source implements the following interface methods including the common methods.
read(SyncConfig) ->Array[RecordMessage]
Description -The read method extracts data from a data store and outputs it as RecordMessages.
Input - SyncConfig
Output - List[RecordMessage]
Destination
Destination implements the following interface methods including the common methods.
write(SyncConfig,Array[records]) -> TrackingMessage
Description -The write method loads data to destinations.
Input - Array[Record]
Output - TrackingMessage
Note: Complete multiwoven protocol models can be found here
Acknowledgements
We’ve been significantly influenced by the Airbyte protocol, and their design choices greatly accelerated our project’s development.
Was this page helpful?