perf: Add index to sync status #2337

Merged
iuri-gg merged 3 commits from add-index-to-sync-status into main 2025-06-09 22:18:52 +08:00
iuri-gg commented 2025-06-05 11:29:16 +08:00 (Migrated from github.com)

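Some sync DB queries can take a couple of seconds (see https://oss.skylight.io/app/applications/XDpPIXEX52oi/recent/6h/endpoints/TransactionsController%23index?responseType=html). This PR adds an index on the status column of syncs to speed up the lookup behind the syncing? check on PlaidItem.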

Query Plans

Before

EXPLAIN SELECT "syncs".* FROM "syncs" LEFT JOIN accounts a ON a.id = syncs.syncable_id AND syncs.syncable_type = 'Account' LEFT JOIN plaid_accounts pa ON pa.id = a.plaid_account_id WHERE (syncs.syncable_id = $1 OR pa.plaid_item_id = $2) AND (syncs.status IN ($3, $4)) AND (syncs.created_at > $5) [[nil, "6c101272-584b-49a2-8f36-436479aa1fae"], [nil, "85e89e18-85fc-4c04-b0a9-0f6e1019a916"], [nil, "pending"], [nil, "syncing"], [nil, "2025-06-05 03:10:16.460321"]]
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.29..23.19 rows=1 width=232)
   Join Filter: ((syncs.syncable_type)::text = 'Account'::text)
   Filter: ((syncs.syncable_id = '6c101272-584b-49a2-8f36-436479aa1fae'::uuid) OR (pa.plaid_item_id = '85e89e18-85fc-4c04-b0a9-0f6e1019a916'::uuid))
   ->  Seq Scan on syncs  (cost=0.00..14.65 rows=1 width=232)
         Filter: (((status)::text = ANY ('{pending,syncing}'::text[])) AND (created_at > '2025-06-05 03:10:16.460321'::timestamp without time zone))
   ->  Nested Loop Left Join  (cost=0.29..8.53 rows=1 width=32)
         ->  Index Scan using accounts_pkey on accounts a  (cost=0.14..8.16 rows=1 width=32)
               Index Cond: (id = syncs.syncable_id)
         ->  Index Scan using plaid_accounts_pkey on plaid_accounts pa  (cost=0.14..0.36 rows=1 width=32)
               Index Cond: (id = a.plaid_account_id)

After

EXPLAIN SELECT "syncs".* FROM "syncs" LEFT JOIN accounts a ON a.id = syncs.syncable_id AND syncs.syncable_type = 'Account' LEFT JOIN plaid_accounts pa ON pa.id = a.plaid_account_id WHERE (syncs.syncable_id = $1 OR pa.plaid_item_id = $2) AND (syncs.status IN ($3, $4)) AND (syncs.created_at > $5) [[nil, "b13c09ee-c139-4063-864c-f7f417e34200"], [nil, "f4247403-1cf4-4c6b-8dda-a62f29dd3727"], [nil, "pending"], [nil, "syncing"], [nil, "2025-06-05 03:13:27.973903"]]
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=4.46..19.83 rows=1 width=232)
   Join Filter: ((syncs.syncable_type)::text = 'Account'::text)
   Filter: ((syncs.syncable_id = 'b13c09ee-c139-4063-864c-f7f417e34200'::uuid) OR (pa.plaid_item_id = 'f4247403-1cf4-4c6b-8dda-a62f29dd3727'::uuid))
   ->  Bitmap Heap Scan on syncs  (cost=4.17..11.29 rows=1 width=232)
         Recheck Cond: ((status)::text = ANY ('{pending,syncing}'::text[]))
         Filter: (created_at > '2025-06-05 03:13:27.973903'::timestamp without time zone)
         ->  Bitmap Index Scan on index_syncs_on_status  (cost=0.00..4.17 rows=3 width=0)
               Index Cond: ((status)::text = ANY ('{pending,syncing}'::text[]))
   ->  Nested Loop Left Join  (cost=0.29..8.53 rows=1 width=32)
         ->  Index Scan using accounts_pkey on accounts a  (cost=0.14..8.16 rows=1 width=32)
               Index Cond: (id = syncs.syncable_id)
         ->  Index Scan using plaid_accounts_pkey on plaid_accounts pa  (cost=0.14..0.36 rows=1 width=32)
               Index Cond: (id = a.plaid_account_id)

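Bitmap Heap Scan is not ideal, but it should still beat a sequential scan: in the plans above, the estimated cost of reading syncs drops from 14.65 to 11.29, and the total plan cost from 23.19 to 19.83.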
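
For anyone who wants to see the same kind of plan change without a Postgres instance, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: SQLite stands in for PostgreSQL, the table is a cut-down, hypothetical version of syncs with made-up rows, and the query keeps only the status and created_at predicates from the plans above. SQLite reports plans differently from Postgres, so the exact wording of the output will not match the EXPLAIN text in this PR.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real PostgreSQL one.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified version of the syncs table (not the app's real schema).
conn.execute(
    "CREATE TABLE syncs ("
    "id INTEGER PRIMARY KEY, syncable_id TEXT, syncable_type TEXT, "
    "status TEXT, created_at TEXT)"
)

# Made-up data: many finished syncs plus two that are still in flight.
rows = [
    (f"acct-{i}", "Account", "completed", "2025-06-01 00:00:00")
    for i in range(1000)
]
rows += [
    ("acct-1", "Account", "pending", "2025-06-05 03:12:00"),
    ("acct-2", "Account", "syncing", "2025-06-05 03:13:00"),
]
conn.executemany(
    "INSERT INTO syncs (syncable_id, syncable_type, status, created_at) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# Only the status / created_at part of the query from this PR.
QUERY = (
    "SELECT * FROM syncs "
    "WHERE status IN ('pending', 'syncing') "
    "AND created_at > '2025-06-05 03:10:16'"
)


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Without an index, SQLite has to scan the whole table.
plan_before = plan(QUERY)

# Same index name as in the "After" plan above.
conn.execute("CREATE INDEX index_syncs_on_status ON syncs (status)")

# With the index, the status lookup can go through index_syncs_on_status.
plan_after = plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

As in the Postgres plan, the created_at condition is still applied as a filter on the rows the status index returns, since the index covers status only.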

zachgoll (Migrated from github.com) approved these changes 2025-06-09 22:03:56 +08:00
zachgoll (Migrated from github.com) left a comment

Good idea. Will merge when tests pass!
