Refactor transaction enrichment to support batch processing #1803

Merged
Shpigford merged 7 commits from data-enricher-optimization into main 2025-02-06 00:34:28 +08:00
Shpigford commented 2025-02-05 04:30:34 +08:00 (Migrated from github.com)
No description provided.
zachgoll (Migrated from github.com) reviewed 2025-02-05 05:00:18 +08:00
@@ -8,49 +8,61 @@ class Account::DataEnricher
end
zachgoll (Migrated from github.com) commented 2025-02-05 04:50:39 +08:00

Both merchant_id and category_id are stored on the Account::Transaction record, while enriched_at is stored on Account::Entry (delegated type).

Given we're primarily enriching Account::Transaction records (and not other entry types like Account::Valuation and Account::Trade), I think it probably makes sense to move the enriched_at field down to the Account::Transaction and only deal with those types of records in this process.

Both `merchant_id` and `category_id` are stored on the `Account::Transaction` record, while `enriched_at` is stored on `Account::Entry` (delegated type). Given we're primarily enriching `Account::Transaction` records (and not other entry types like `Account::Valuation` and `Account::Trade`), I think it probably makes sense to _move_ the `enriched_at` field down to the `Account::Transaction` and only deal with those types of records in this process.
zachgoll (Migrated from github.com) commented 2025-02-05 04:53:59 +08:00

Since this was originally written, we've added null validations to ensure name is present, so this can safely be removed now.

Since this was originally written, we've added null validations to ensure `name` is present, so this can safely be removed now.
zachgoll (Migrated from github.com) reviewed 2025-02-05 05:10:27 +08:00
@@ -8,49 +8,61 @@ class Account::DataEnricher
end
zachgoll (Migrated from github.com) commented 2025-02-05 05:10:27 +08:00

Actually, second guessing the idea of moving enriched_at to Account::Transaction. This enrichment is also modifying enriched_name on the Account::Entry model, so it probably makes sense to keep this as-is.

Will still need to update this query to read merchant_id and category_id off the Account::Transaction record though.

Actually, second guessing the idea of moving `enriched_at` to `Account::Transaction`. This enrichment is also modifying `enriched_name` on the `Account::Entry` model, so it probably makes sense to keep this as-is. Will still need to update this query to read `merchant_id` and `category_id` off the `Account::Transaction` record though.
Shpigford (Migrated from github.com) reviewed 2025-02-05 09:31:52 +08:00
@@ -8,49 +8,61 @@ class Account::DataEnricher
end
Shpigford (Migrated from github.com) commented 2025-02-05 09:31:52 +08:00

@zachgoll Is the entryable join necessary? AI seems to think so.

@zachgoll Is the entryable join necessary? AI seems to think so.
zachgoll (Migrated from github.com) reviewed 2025-02-05 22:07:47 +08:00
@@ -8,49 +8,61 @@ class Account::DataEnricher
end
zachgoll (Migrated from github.com) commented 2025-02-05 22:07:46 +08:00

The join is necessary, but joins(:entryable) is not possible since this is a polymorphic association and will throw an error. Here's what I would use:

account.entries.joins("JOIN account_transactions at ON at.id = account_entries.entryable_id AND account_entries.entryable_type = 'Account::Transaction'").where("account_entries.enriched_at IS NULL OR at.category_id IS NULL
 OR at.merchant_id IS NULL")
The join is necessary, but `joins(:entryable)` is not possible since this is a polymorphic association and will throw an error. Here's what I would use: ```rb account.entries.joins("JOIN account_transactions at ON at.id = account_entries.entryable_id AND account_entries.entryable_type = 'Account::Transaction'").where("account_entries.enriched_at IS NULL OR at.category_id IS NULL OR at.merchant_id IS NULL") ```
Shpigford commented 2025-02-05 23:23:04 +08:00 (Migrated from github.com)

@zachgoll let me know how this looks now.

@zachgoll let me know how this looks now.
zachgoll (Migrated from github.com) reviewed 2025-02-05 23:53:41 +08:00
zachgoll (Migrated from github.com) left a comment

Yep, functionality looks good now!

The last thing I'm thinking about here is whether we even need EnrichDataJob anymore. All that job is doing now is:

  1. Calculating the number of batches
  2. Enqueueing the batch jobs

I'm thinking we could get rid of that entirely (the job itself) and do this work directly in account.enrich_data.

So in total, we'd:

  • DELETE account.enrich_data_later method
  • DELETE EnrichDataJob
Yep, functionality looks good now! The last thing I'm thinking about here is whether we even need `EnrichDataJob` anymore. All that job is doing now is: 1. Calculating the number of batches 2. Enqueueing the batch jobs I'm thinking we could get rid of that entirely (the job itself) and do this work directly in `account.enrich_data`. So in total, we'd: - DELETE `account.enrich_data_later` method - DELETE `EnrichDataJob`
Shpigford commented 2025-02-06 00:26:38 +08:00 (Migrated from github.com)

@zachgoll Good call. Updated. We good to push to production?

@zachgoll Good call. Updated. We good to push to production?
zachgoll (Migrated from github.com) approved these changes 2025-02-06 00:33:36 +08:00
zachgoll (Migrated from github.com) left a comment

Yep, looks good!

Yep, looks good!
Sign in to join this conversation.