Clarify backend data pipeline naming concepts (importers, processors, materializers, calculators, and syncers) #2255

Merged
zachgoll merged 4 commits from zachgoll/clarify-sync-concepts into main 2025-05-18 04:37:16 +08:00
zachgoll commented 2025-05-18 01:41:07 +08:00 (Migrated from github.com)

The Maybe app relies heavily on data providers/aggregators for tasks including, but not limited to:

  • Fetching and caching (in DB) exchange rates and security prices
  • Fetching Plaid data, processing it, and storing it as normalized model data
  • "Syncing" various entities (Family, PlaidItem, Account)

As this domain has continued to expand, the concepts and naming have started to become blurry. This PR is purely a renaming exercise to make the backend "ETL" processes clearer and easier to follow.

At the time of writing, the list below represents the core concepts we're using, each with a standardized naming convention that expresses a specific intent:

  • Importers - an "importer" is any class that fetches external data and caches/saves it in our database with minimal transformation.
    • Example: ExchangeRate::Importer
    • Fetches raw data from a provider
    • Idempotent, with no domain logic or rules
  • Processors - a "processor" is responsible for reading raw provider data and transforming that data into our internal domain models
    • Example: None yet, but we will eventually have something like PlaidAccount::Processor to turn a Plaid account into an internal Account entity
    • Idempotent, with business rules
  • Materializers - a "materializer" takes domain inputs (e.g. Transaction, Trade) and uses them to build a "materialized view" (application-level, not DB-level)
    • Example: Holding::Materializer generates daily holdings based on Trade entries of an Account
    • Idempotent, with business rules
    • Never touches provider data directly
  • Calculators - a "calculator" applies a specific algorithm to generate data; typically for a materializer
    • Example: Balance::ForwardCalculator generates a series of account balances by applying entries chronologically
    • A set of "pure functions"; does not fetch provider data and does not persist data
  • Syncers - a "syncer" is a process that operates on a Syncable and generates a Sync row in the database
    • Is a high-level orchestrator of importers, processors, materializers, calculators, etc.
    • Idempotent, retryable, auditable
    • Distinct lifecycle
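To illustrate the importer/processor distinction, here is a minimal plain-Ruby sketch. It is not code from this PR: `FakeRateProvider`, `RateImporter`, `AccountProcessor`, the `Account` struct, the raw field names, and the Hash-based "tables" are all hypothetical stand-ins for a provider client and database tables. The importer stores provider rows as-is, keyed so that re-running it is idempotent; the processor reads a raw record and applies a business rule (mapping a provider account type to an internal kind) before upserting the result.

```ruby
require "date"

# Hypothetical provider client standing in for a real exchange-rate API.
class FakeRateProvider
  def fetch_rates(from:, to:, start_date:, end_date:)
    (start_date..end_date).map do |date|
      { "date" => date.iso8601, "from" => from, "to" => to, "rate" => "1.0850" }
    end
  end
end

# Importer: fetches external data and saves it with minimal transformation.
# No domain rules. Rows are keyed by (from, to, date), so re-running an import
# overwrites instead of duplicating (idempotent).
class RateImporter
  def initialize(provider:, store:)
    @provider = provider
    @store = store # a Hash standing in for a table with a unique index
  end

  def import(from:, to:, start_date:, end_date:)
    @provider.fetch_rates(from: from, to: to, start_date: start_date, end_date: end_date).each do |row|
      @store[[row["from"], row["to"], row["date"]]] = row # upsert the raw row as-is
    end
    @store.size
  end
end

# Hypothetical internal domain model produced by a processor.
Account = Struct.new(:external_id, :name, :kind, :balance, keyword_init: true)

# Processor: reads raw provider data and applies business rules to produce
# internal domain models. Upserting by external id keeps it idempotent.
class AccountProcessor
  KIND_MAP = { "depository" => :depository, "credit" => :credit_card }.freeze

  def initialize(accounts:)
    @accounts = accounts # a Hash standing in for the accounts table
  end

  def process(raw)
    account = Account.new(
      external_id: raw.fetch("account_id"),
      name: raw.fetch("name"),
      kind: KIND_MAP.fetch(raw.fetch("type"), :other), # business rule: type mapping
      balance: raw.fetch("balance")
    )
    @accounts[account.external_id] = account
  end
end
```

Running the same import twice leaves the same set of rows in the store, which is the property that makes importers safe to retry.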
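The calculator/materializer split can be sketched the same way. Again, this is a hypothetical illustration rather than Maybe's `Balance::ForwardCalculator` or `Holding::Materializer`: `Entry`, `ForwardBalanceCalculator`, `BalanceMaterializer`, and the Hash standing in for a balances table are assumptions. The calculator only computes a daily balance series by applying entries chronologically; the materializer feeds it domain inputs and persists the result.

```ruby
require "date"

# Hypothetical domain input (stands in for an entry/transaction record).
Entry = Struct.new(:date, :amount, keyword_init: true)

# Calculator: a pure function of its inputs. It neither fetches provider
# data nor persists anything; it just returns a [date, balance] series.
class ForwardBalanceCalculator
  def initialize(opening_balance:, entries:, start_date:, end_date:)
    @opening_balance = opening_balance
    @entries = entries
    @start_date = start_date
    @end_date = end_date
  end

  # Walks forward one day at a time, applying that day's entries.
  def calculate
    by_date = @entries.group_by(&:date)
    balance = @opening_balance
    (@start_date..@end_date).map do |date|
      balance += by_date.fetch(date, []).sum(&:amount)
      [date, balance]
    end
  end
end

# Materializer: builds an application-level "materialized view" from domain
# inputs (never provider data) and saves it. Replacing the account's rows
# wholesale means repeated runs produce the same result (idempotent).
class BalanceMaterializer
  def initialize(store:)
    @store = store # a Hash standing in for a balances table
  end

  def materialize(account_id:, opening_balance:, entries:, start_date:, end_date:)
    series = ForwardBalanceCalculator.new(
      opening_balance: opening_balance, entries: entries,
      start_date: start_date, end_date: end_date
    ).calculate
    @store[account_id] = series.to_h
  end
end
```

Because the calculator has no side effects, it can be tested with plain inputs and outputs, while the materializer owns the single write.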
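Finally, a syncer sketch, assuming a simplified lifecycle of `pending`, `syncing`, then `completed` or `failed`. `Syncer`, the `Sync` struct, its fields, and the array standing in for a syncs table are hypothetical and not the app's actual `Sync` model. Each attempt records its own row, which is what makes runs auditable, and a failed attempt can be retried by calling `perform` again because the orchestrated steps are assumed to be idempotent.

```ruby
# Hypothetical Sync record: one row per sync attempt gives an audit trail.
Sync = Struct.new(:syncable_id, :status, :error, :steps_run, keyword_init: true)

# Syncer: orchestrates importers, processors, materializers, etc. for one
# syncable and records the attempt's lifecycle in a Sync row.
class Syncer
  def initialize(syncable_id:, steps:, sync_log:)
    @syncable_id = syncable_id
    @steps = steps       # ordered [name, callable] pairs, each assumed idempotent
    @sync_log = sync_log # an Array standing in for the syncs table
  end

  def perform
    sync = Sync.new(syncable_id: @syncable_id, status: :pending, error: nil, steps_run: [])
    @sync_log << sync
    sync.status = :syncing
    begin
      @steps.each do |name, step|
        step.call
        sync.steps_run << name
      end
      sync.status = :completed
    rescue StandardError => e
      # Record the failure rather than losing it; calling perform again
      # retries and creates a new Sync row.
      sync.status = :failed
      sync.error = e.message
    end
    sync
  end
end
```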