Overview
Default Incremental Sync fetches all records from the source system and transfers only the new or updated ones to the destination. However, to optimize data transfer and reduce the number of duplicate fetches from the source, we implemented Incremental Sync with Cursor Field for sources that support cursor fields.
Cursor Field
A Cursor Field must be clearly defined within the dataset schema. It is chosen for its suitability for comparison and for tracking changes over time.
- It serves as a marker to identify modified or added records since the previous sync.
- It facilitates efficient data retrieval by enabling the source to resume from where it left off during the last sync.
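As an illustration of the two points above, the following is a minimal in-memory sketch, not the product's implementation. The function `incremental_fetch`, the record layout, and the inclusive `>=` comparison are assumptions made for the example.

```python
from datetime import datetime


def incremental_fetch(records, cursor_field, last_cursor):
    """Return records at or after last_cursor, plus the advanced cursor value."""
    if last_cursor is None:
        # First run: there is no saved position to resume from.
        changed = list(records)
    else:
        # Assumed inclusive comparison: keep rows at or after the saved cursor.
        changed = [r for r in records if r[cursor_field] >= last_cursor]
    # Advance to the largest value seen; keep the old cursor if nothing matched.
    new_cursor = max((r[cursor_field] for r in changed), default=last_cursor)
    return changed, new_cursor


records = [
    {"name": "Charles Beaumont", "plan": "free", "updated_at": datetime(2024, 4, 20, 10, 0)},
    {"name": "Eleanor Villiers", "plan": "paid", "updated_at": datetime(2024, 4, 21, 10, 0)},
]

# Resume from a cursor saved by a previous run.
changed, cursor = incremental_fetch(records, "updated_at", datetime(2024, 4, 20, 11, 0))
print([r["name"] for r in changed], cursor)
```

Only the record modified after the saved cursor is returned, and the cursor moves forward to that record's timestamp, which becomes the starting point for the next run.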
Sync Run 1
During the first sync run with the cursor field ‘UpdatedAt’, suppose the cursor field UpdatedAt value is 2024-04-20 10:00:00 and we have the following data:
Name | Plan | Updated At |
---|---|---|
Charles Beaumont | free | 2024-04-20 10:00:00 |
Eleanor Villiers | free | 2024-04-20 11:00:00 |
Query
Sync Run 2
Now the cursor field UpdatedAt value is 2024-04-20 11:00:00. Suppose that, after some time, there are further updates in the source data:
Name | Plan | Updated At |
---|---|---|
Charles Beaumont | free | 2024-04-20 10:00:00 |
Eleanor Villiers | paid | 2024-04-21 10:00:00 |
Query
Sync Run 3
Now the cursor field UpdatedAt value is 2024-04-21 10:00:00. Suppose there are additional updates in the source data:
Name | Plan | Updated At |
---|---|---|
Charles Beaumont | paid | 2024-04-22 08:00:00 |
Eleanor Villiers | pro | 2024-04-22 09:00:00 |
Query
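As a rough sketch of the kind of query each run might issue, the three sync runs can be replayed against an in-memory SQLite database. The table name `users`, the column names, the helpers `run_sync` and `upsert`, and the inclusive `>=` filter are all assumptions for illustration; the actual query depends on the source connector.

```python
import sqlite3

# Hypothetical query: fetch rows whose cursor field is at or after the saved value.
QUERY = (
    "SELECT name, plan, updated_at FROM users "
    "WHERE updated_at >= ? ORDER BY updated_at"
)


def run_sync(conn, cursor_value):
    """Fetch rows at or after the cursor and return them with the advanced cursor."""
    rows = conn.execute(QUERY, (cursor_value,)).fetchall()
    # Rows are ordered by updated_at, so the last row carries the new cursor.
    new_cursor = rows[-1][2] if rows else cursor_value
    return rows, new_cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY, plan TEXT, updated_at TEXT)")


def upsert(name, plan, updated_at):
    """Simulate a change in the source system."""
    conn.execute("INSERT OR REPLACE INTO users VALUES (?, ?, ?)", (name, plan, updated_at))


# Sync Run 1: cursor starts at 2024-04-20 10:00:00.
upsert("Charles Beaumont", "free", "2024-04-20 10:00:00")
upsert("Eleanor Villiers", "free", "2024-04-20 11:00:00")
run1, cursor = run_sync(conn, "2024-04-20 10:00:00")

# Sync Run 2: cursor is now 2024-04-20 11:00:00.
upsert("Eleanor Villiers", "paid", "2024-04-21 10:00:00")
run2, cursor = run_sync(conn, cursor)

# Sync Run 3: cursor is now 2024-04-21 10:00:00.
upsert("Charles Beaumont", "paid", "2024-04-22 08:00:00")
upsert("Eleanor Villiers", "pro", "2024-04-22 09:00:00")
run3, cursor = run_sync(conn, cursor)

print(len(run1), len(run2), len(run3), cursor)
```

Under these assumptions, the first run fetches both records, the second fetches only the updated Eleanor Villiers record, and the third fetches both updated records. Because the filter is inclusive, an unchanged row whose timestamp equals the saved cursor would be fetched again on the next run.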
Handling Ambiguity and Inclusive Cursors
When syncing data incrementally, we guarantee at-least-once delivery. Limited cursor field granularity may cause sources to resend previously sent data. For example, if a cursor only tracks dates, it is unclear which records from the same day are new and which have already been synced.
Scenario
Imagine sales transactions with a cursor field transaction_date. If we sync on April 1st and then sync again later the same day, it is ambiguous which transactions are new. To mitigate this, we guarantee at-least-once delivery, allowing sources to resend data as needed.
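The scenario can be sketched in a few lines. This is an illustration under stated assumptions, not the product's behavior: the `id` and `amount` fields, the year 2024, the helpers `fetch_since` and `deliver`, and the idea that the destination upserts by `id` to absorb resent rows are all hypothetical.

```python
from datetime import date


def fetch_since(transactions, cursor):
    """Inclusive fetch: return every transaction dated on or after the cursor."""
    return [t for t in transactions if t["transaction_date"] >= cursor]


def deliver(destination, batch):
    """Upsert by id so a resent transaction overwrites itself instead of duplicating."""
    for t in batch:
        destination[t["id"]] = t


source = [{"id": 1, "amount": 40, "transaction_date": date(2024, 4, 1)}]
destination = {}
cursor = date(2024, 4, 1)

# First sync on April 1st.
first = fetch_since(source, cursor)
deliver(destination, first)

# A new sale lands later the same day; the date-only cursor cannot tell it apart.
source.append({"id": 2, "amount": 75, "transaction_date": date(2024, 4, 1)})
second = fetch_since(source, cursor)
deliver(destination, second)

print(len(first), len(second), len(destination))
```

The second sync resends transaction 1 alongside the new transaction 2. With an exclusive comparison (`>`), transaction 2 would have been missed entirely, which is why resending is the safer trade-off when the cursor is coarse.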