Overview

Default Incremental Sync fetches all records from the source system and transfers only the new or updated ones to the destination. However, to optimize data transfer and reduce the number of duplicate fetches from the source, we implemented Incremental Sync with Cursor Field for those sources that support cursor fields

Cursor Field

A Cursor Field must be clearly defined within the dataset schema. It is identified based on its suitability for comparison and tracking changes over time.

  • It serves as a marker to identify modified or added records since the previous sync.
  • It facilitates efficient data retrieval by enabling the source to resume from where it left off during the last sync.

Note: Currently, only date fields are supported as Cursor Fields.

Sync Run 1

During the first sync run with the cursor field ‘UpdatedAt’, suppose we have the following data: cursor field UpdatedAt value is 2024-04-20 10:00:00

NamePlanUpdated At
Charles Beaumontfree2024-04-20 10:00:00
Eleanor Villiersfree2024-04-20 11:00:00

During this sync run, both Charles Beaumont’s and Eleanor Villiers’ records would meet the criteria since they both have an ‘UpdatedAt’ timestamp equal to ‘2024-04-20 10:00:00’ or later. So, during the first sync run, both records would indeed be considered and fetched.

Query
SELECT * FROM source_table
WHERE updated_at >= '2024-04-20 10:00:00';

Sync Run 2

Now cursor field UpdatedAt value is 2024-04-20 11:00:00 Suppose after some time, there are further updates in the source data:

NamePlanUpdated At
Charles Beaumontfree2024-04-20 10:00:00
Eleanor Villierspaid2024-04-21 10:00:00

During the second sync run with the same cursor field, only the records for Eleanor Villiers with ‘Updated At’ timestamp after the last sync would be fetched, ensuring minimal data transfer.

Query
SELECT * FROM source_table
WHERE updated_at >= '2024-04-20 11:00:00';

Sync Run 3

If there are additional updates in the source data: Now cursor field UpdatedAt value is 2024-04-21 10:00:00

NamePlanUpdated At
Charles Beaumontpaid2024-04-22 08:00:00
Eleanor Villierspro2024-04-22 09:00:00

During the third sync run with the same cursor field, only the records for Charles Beaumont and Eleanor Villiers with ‘Updated At’ timestamp after the last sync would be fetched, continuing the process of minimal data transfer.

Query
SELECT * FROM source_table
WHERE updated_at >= '2024-04-21 10:00:00 ';

Handling Ambiguity and Inclusive Cursors

When syncing data incrementally, we ensure at least one delivery. Limited cursor field granularity may cause sources to resend previously sent data. For example, if a cursor only tracks dates, distinguishing new from old data on the same day becomes unclear.

Scenario

Imagine sales transactions with a cursor field transaction_date. If we sync on April 1st and later sync on the same day, distinguishing new transactions becomes ambiguous. To mitigate this, we guarantee at least one delivery, allowing sources to resend data as needed.

Known Limitations

Modifications to underlying records without updating the cursor field may result in updated records not being picked up by the Incremental sync as expected.

Edit or remove of cursor field can mess up tracking data changes, causing issues and data loss. So Don’t change or remove the cursor field to keep sync smooth and reliable.

Was this page helpful?