WIP: perf(syncs): implement partial idempotent syncs interface #2309
The current behavior for a "data sync" of a "Syncable" (`Family`, `PlaidItem`, `Account`) is a "sync everything every time" model. This behavior was put in place early on to keep the data syncing process simple and predictable, and to avoid the burden of keeping state to determine when to start each successive data sync. This model has now reached its limit with the size of our user base and requires a move towards "partial data syncs". This PR implements logic to keep track of each syncable's last known "good data window" and uses that information to decide when to start subsequent sync operations.

Syncables are idempotent
Each `Syncable` record now has a date column called `data_synced_through`, which represents the latest date through which we have synced data. This allows for a simpler model to compute partial syncs:

- [...] (`data_synced_through`)
- [...] `data_synced_through` so the "sync cache" is invalidated starting from the date on which the modification affects the series of syncable balances and state

Syncs still track the sync window
While `sync_later` no longer accepts start/end date arguments, the `Sync` record still captures the `window_start_date` and `window_end_date` of each sync.

Furthermore, syncs can be triggered with `sync_later(clear_cache: true)` to perform a "full sync" that ignores and resets the `data_synced_through` column. These sorts of syncs should be used sparingly and only for data repair.

Repairing data
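The difference between a partial sync and a `clear_cache: true` full sync can be sketched roughly as follows. This is a minimal in-memory illustration, not the PR's implementation: `SyncableSketch`, `SyncRecord`, `earliest_data_date`, and the inline `sync` method are hypothetical names, and the sketch assumes the window starts at `data_synced_through` itself (inclusive) and that a missing `data_synced_through` means "sync everything".

```ruby
require "date"

# Hypothetical stand-in for the Sync record: it only captures the window.
SyncRecord = Struct.new(:window_start_date, :window_end_date, keyword_init: true)

# Hypothetical in-memory stand-in for a syncable (Family, PlaidItem, Account).
class SyncableSketch
  attr_reader :data_synced_through, :syncs

  def initialize(earliest_data_date:, data_synced_through: nil)
    @earliest_data_date = earliest_data_date     # assumed: oldest date we hold data for
    @data_synced_through = data_synced_through   # nil means nothing is cached yet
    @syncs = []
  end

  # Mirrors the shape of sync_later(clear_cache: true), but runs inline.
  def sync(today:, clear_cache: false)
    # A full sync ignores and resets the cached marker.
    @data_synced_through = nil if clear_cache

    # Assumption: a partial sync starts at data_synced_through (inclusive).
    window_start = @data_synced_through || @earliest_data_date
    record = SyncRecord.new(window_start_date: window_start, window_end_date: today)
    @syncs << record

    # ... balances for window_start..today would be recalculated here ...

    @data_synced_through = today
    record
  end
end

account = SyncableSketch.new(
  earliest_data_date: Date.new(2025, 1, 1),
  data_synced_through: Date.new(2025, 5, 15)
)

partial = account.sync(today: Date.new(2025, 5, 20))
puts partial.window_start_date # 2025-05-15

full = account.sync(today: Date.new(2025, 5, 20), clear_cache: true)
puts full.window_start_date    # 2025-01-01
```

In this sketch, running the same sync twice on the same day leaves `data_synced_through` unchanged, which is the property that makes repeated syncs safe.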
If a change in code logic affects the calculation of historical balances and requires a full re-sync of all syncables in the DB, the easiest way to force full re-syncs is a quick `scope.update_all(data_synced_through: nil)`, which forces each syncable to do a full sync the next time it syncs.

In the future, as sync logic becomes more stable, we may think about adding a `SYNC_VERSION` to handle this.

Handling sync direction
Depending on the type of account, we may sync data in either the forward or backward direction. For example, Plaid-connected accounts have a known "source of truth" current balance that we start from and work reverse-chronologically, while manual accounts start at a balance of 0 and work chronologically to the current date.
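The two directions can be illustrated with a small sketch. It assumes both directions cover the same date window and differ only in processing order; `sync_dates` and its parameters are hypothetical names, and the dates are illustrative.

```ruby
require "date"

# Hypothetical helper: returns the dates a sync would visit, in processing order.
# direction is :forward (e.g. manual accounts) or :reverse (e.g. Plaid accounts).
def sync_dates(data_synced_through:, today:, direction:)
  window = (data_synced_through..today).to_a # same window for both directions

  case direction
  when :forward then window          # oldest date first
  when :reverse then window.reverse  # current date first
  else raise ArgumentError, "unknown direction: #{direction.inspect}"
  end
end

synced_through = Date.new(2025, 5, 15)
today = Date.new(2025, 5, 20)

forward = sync_dates(data_synced_through: synced_through, today: today, direction: :forward)
reverse = sync_dates(data_synced_through: synced_through, today: today, direction: :reverse)

puts forward.first # 2025-05-15
puts reverse.first # 2025-05-20
```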
Regardless of sync direction, `data_synced_through` is a chronological indicator.

So if the current day is `2025-05-20` and `data_synced_through` is `2025-05-15`:

- Backward syncs start at `2025-05-20` and sync backwards to `2025-05-15`
- Forward syncs start at `2025-05-15` and sync forwards to `2025-05-20`

Pull request closed