Implement swappable provider infrastructure #561
Reference in New Issue
Block a user
Delete Branch "jose/providers"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This is a reference PR, superseded by https://github.com/maybe-finance/maybe/pull/574
It is meant as an exploration to validate how well a proposed solution might scale.
It might also serve as reference when we decide to implement providers other than Synth.
The superseding PR should be preferred with regards to the small differences between this PR and that one.
Original PR message:
This PR serves as a proposal for a generic data provider system that allows for swapping interchangeable providers.
It shouldn't be merged as-is; as it implements merchant and real estate valuation providers in an effort to show how this solution scales.
Now, on to the PR...
Mental model
Providers are implemented as a
Provider::<Name>class. The following providers exist in this PR:Provider::Base,Provider::Synth,Provider::Zillow,Provider::Null.Provider::Baseshould never be instantiated. It is meant to hold the (currently small) amount of shared concepts between providers.Provider::Nullis a default provider for use when Maybe doesn't have an opinion on which provider you should use and the user hasn't configured a provider. It's designed to blow up and communicate the service degradation to the user. I think this is better than implementing e.g.Provider::Local, since I don't think it's viable to mimic provider functionality offline.Each provider class should be thought of as a world separate from Maybe. This allows them to evolve with the external platform without having to worry about how that affects our app. On that same note, I don't recommend sharing much code between providers. Such that they can evolve on their own timeline without affecting our app's internals.
Http calls made to external servers live inside these clases. Domain-agnostic objects are built and returned when the http calls conclude.
Domain-aware objects (i.e. the regular Maybe models) use these provider classes, interpret their responses, and turn them into Maybe domain objects like
Transaction,Valuation, orMerchant.How it works
The chosen providers and their API keys are configured as environment variables, which are read from
data_providers.yml. Implementing users only need to change their env vars to change providers.Models which need provided data include the
Providableconcern — which knows whichProvider::<Name>class to instantiate for each concept depending ondata_providers.yml.Finally,
Provider::<Name>classes are instantiated and called wherever needed (class methods, instance methods, scripts, etc) — and their response objects are used and interpreted as necessary.Enforcing a contract between providers
As we add more providers, it becomes hard to test all possible combinations. If a common provider's interface drifts away from a less common provider's — it can be weeks until we get a report from users saying their setups stopped working.
This is why this PR implements interface testing. By including an interface test in
Provider::<Name>Test, you can guarantee that the provider returns an object of the expected shape. If it stops doing so, the test will fail — alerting the developer that their changes either need to be backwards-compatible, or they need to make the change for all providers of the same concept.Before we merge
The first step is deciding whether this is the path we want to go down. If it is, the next step is reverting the illustrative aspects of this PR — the bits that have to do with merchants and real estate valuations. Finally, I'd suggest merging the remaining code in a new PR, post-review. EDIT: This is what we've decided to do.
This is ready for specific feedback.
@@ -0,0 +1,4 @@class MerchantAdded so tests don't fail. This would be an ActiveRecord in practice.
@@ -0,0 +3,4 @@class Provider::NullTest < ActiveSupport::TestCaseinclude ExchangeRateProviderInterfaceTestinclude MerchantDataProviderInterfaceTestinclude RealEstateValuationsProviderInterfaceTestAs much as the dynamically typed nature of ruby gains us, this is one area where type safety would've come in handy. These tests are a good middle ground I think.
I usually wouldn't implement them. But the project being open-source means many hands with varying degrees of familiarity will be touching this system. And I don't think providers will be added often after an initial flurry. So it's easy to forget things.
Haven't given any thought to how rate limiting comes into play. I think
Provided::<Concept>classes can be aware of how many requests have been performed viaProvidableand modify their actions accordingly.@@ -0,0 +3,4 @@class Provider::NullTest < ActiveSupport::TestCaseinclude ExchangeRateProviderInterfaceTestinclude MerchantDataProviderInterfaceTestinclude RealEstateValuationsProviderInterfaceTestHave you played around much with the (newish) type systems in Ruby?
Hmm. If we unroll this -
So all this... looks like it evaluates to
@provider = Providers::Synth.new?Update:
I would recommend something like:
You can then eliminate a few layers of abstraction.
edit (complete comment rewrite): goddamnit nerd sniped myself on this :)
Okay. TL;DR, I think your approach here is overcomplicated things. Seems like you've got a few goals/needs/etc:
Here's some quick drafts:
Right? One place with a very clear link between configuration value and results. Doesn't worry about how each provider is implemented, all it does it choose which one is in use.
Right? The provider is fully and solely responsible for knowing how to instantiate itself... but we can also override that behavior, if we want to, so we can have different instances if we find ourselves wanting to. We can also pull out anything common to every request; this is just one of many ways to do that (and, honestly, one I'd like to explore further).
Now, I'm also getting overly fancy with the
yield, and that might not work out, but it's fun for now so eh :)For errors and retry logic, Faraday has a bunch of things you can use :)
So now we get extremely concise API wrappers:
But, this on it's own doesn't accomplish your #2. We can add a bit:
So now we've got parts. One part that knows how to get the data from Synth, and one part that knows how to turn the data we get from Synth into the data we want to use, and one that sticks them together. Plus, each part is testable independent of each other part, and, we've got a couple different levels we can
PS -
...is valid code; it's the "pass-through" language feature :)Final note - I really don't mean to be "here's exactly how to do it", I mean it much more like "here's some inspiration for some very simple ways to approach this". The
yieldmight not work out at all, andto_exchange_ratestuff might get wordy and overly boiler-plate. But, I think all this avoids locking future-you into any decisions but the critical ones (how it gets used). On the implementation side of things, all of this should be really flexible and easy to change.@@ -4,0 +15,4 @@config.filter_sensitive_data("<EXCHANGE_RATES_API_KEY>") { ENV["EXCHANGE_RATES_API_KEY"] }config.filter_sensitive_data("<MERCHANT_DATA_API_KEY>") { ENV["MERCHANT_DATA_API_KEY"] }config.filter_sensitive_data("<REAL_ESTATE_VALUATIONS_API_KEY>") { ENV["REAL_ESTATE_VALUATIONS_API_KEY"] }endVCR is amazing :) I've got a couple of utility functions you might find handy, let me get back to you tomorrow...
@@ -0,0 +7,4 @@key: <%= ENV.fetch "MERCHANT_DATA_API_KEY", ENV["SYNTH_API_KEY"] %>reale_state_valuations:provider: <%= ENV.fetch "REAL_ESTATE_VALUATIONS_PROVIDER", "null" %>key: <%= ENV["REAL_ESTATE_VALUATIONS_API_KEY"] %>I think I'd recommend something like this, instead:
Mostly, this lets each provider define it's own instantiation, but still lets your user define which provider they want to use.
For example, I think it's Google Cloud whose library wants a JSON file of credentials, instead of an API key.
Like - hmm. Your abstraction isn't "all providers are the same", it's, "all providers can produce the same info". So you don't need the configuration of each provider to be identical.
@@ -0,0 +1,19 @@module RetryableFaraday has a pretty sophisticated retry package: https://github.com/lostisland/faraday-retry
@@ -0,0 +1,83 @@class Provider::Synthhmm I think you can have som fun like:
That eliminates the two outermost layers of abstraction only, which are the thinnest wrappers.
Those were implemented on purpose. I think it's better for the caller to only know the name of a method rather than the name of a module, a method, and a slightly magical symbol argument.
Personally not worried about those two thin wrappers. They're meant to provide a purposeful public API.
EDIT: to be clear, I mean
Providers.for(:exchange_rates)would have to implement some logic equivalent to the innermost layers. Even if not identical to the current proposal, they'd be equivalent.@@ -0,0 +7,4 @@key: <%= ENV.fetch "MERCHANT_DATA_API_KEY", ENV["SYNTH_API_KEY"] %>reale_state_valuations:provider: <%= ENV.fetch "REAL_ESTATE_VALUATIONS_PROVIDER", "null" %>key: <%= ENV["REAL_ESTATE_VALUATIONS_API_KEY"] %>We don't need the configuration of each provider to be identical. But it's rather nice that it is.
I can see some value in adding the provider names as keys here, instead of generic "CONCEPT_PROVIDER_API_KEY". Kind of annoying that the example .env file would get bigger with each provider.
I originally had even removed SYNTH_API_KEY, but left it in as a fallback because CI would break otherwise. I think if we want to remove the synth one we'd have to add the others as secrets to GH cc @zachgoll up to you!
@@ -0,0 +1,19 @@module RetryableAh didn't know about this. I could go either way. This is a pretty simple implementation.
Unless that gem can effectively handle rate limiting out of the box the way we need it. In that case I'd probably go with the gem.
@@ -0,0 +1,83 @@class Provider::SynthYeah I've done this kinda thing in HTTP adapters before. It's dry-er. But I always ended up regretting it. It's easier to come back to all the request details being colocated.
@@ -0,0 +1,83 @@class Provider::SynthUntil they get out of hand that is. This kind of cleanup is always a small effort away if needed.
Thanks for your comments @rangerscience! Responded to some of them.
This proposal prioritizes clear extension points, colocation of behavior, and minimal configuration.
I can appreciate the desire for less layers. But, as presented, I think that would be to the detriment of the colocation and clear delineation of responsibilities.
Less code is nice. But a clear mental model is nicer.
Debatable whether this mental model is actually clear. It is to me. But happy to have the maintainers disagree. All good!
Finally, I don't think what you're proposing paints a full picture. It doesn't consider implications of instantiating providers for more than one concept. And the obviation of the
Provided::<Concept>classes will lead to each provider implementing the same logic to instantiate models outside their domain. There might be something to not needing to pass the API key to the provider class, though.@@ -0,0 +3,4 @@class Provider::NullTest < ActiveSupport::TestCaseinclude ExchangeRateProviderInterfaceTestinclude MerchantDataProviderInterfaceTestinclude RealEstateValuationsProviderInterfaceTestI haven't! Considered those for this. But I wouldn't recommend we adopt them for the whole codebase. Seems worse to use them for a subset of the app than to implement these tests which are a simple
includeaway.@zachgoll one thing we could do to avoid premature abstraction (which depending on internal conversations I think this PR might do) is remove everything swapping-related and just always instantiate
Provider::SynthwithinProvidable. I'd also remove the interface tests in that case. Would be left with a pretty lean PR and we can keep this as reference for when we do want to add other providers.@josefarias thanks for taking a pass at this! I've left a few comments and overall thoughts below.
Overall thoughts
I'm good with the overall direction here. I like how this optimizes for clear public APIs and easy provider configuration for self-hosting. Obviously lots of evolutions to go through with this as we fill in the implementation, but at the very least, this forces us to think proactively about when and how we're introducing new providers and implements a clear boundary between the Maybe domain and external world.
RE: Premature abstraction
While I agree it's a bit smelly, I do think this is a rare case where we already know the importance/necessity of data providers from Maybe 1.0 which makes this one worth thinking through earlier than later. And if we end up evolving or completely changing this down the road, totally fine with that too.
That said, I do really like your idea of instantiating
Provider::Synthdirectly withinProvidableand leaving this PR as the "guide" for future extension. I think we could:ProvidableUltimately, I think what you have here is a good long term direction that we'll be able to make even clearer as we start implementing real use-cases.
Would you agree with that approach?
@@ -1,42 +1,22 @@class ExchangeRate < ApplicationRecordinclude ProvidedOpen to opinions on this, but I wonder if we should be exposing both a
find_rateandfind_rate_or_fetchso that clients can "opt-in" to certain fallback behavior.I've been going back and forth on this, but starting to lean towards treating provider data as a "Write only" concern. In other words, methods like
Money.exchange_toshould either find something in the database or returnniland we leave the responsibility of writing exchange rates solely to the account sync process.In other words, the app works without any provider data and we know that operations like
Money.exchange_toare "free" (of both side-effects and API usage).Any thoughts/opinions on that?
Yeah I think we're fine to revisit rate-limiting later down the road.
I'd probably lean towards optimizing for a clearer public API with a few extra layers of abstraction at the
Providablelayer as @josefarias has implemented here.Providableshouldn't change often and even with a large number of providers I think the path to extension is pretty clear and trivial.I see for each of the
Provided::classes, we're implementing a singlefetchmethod. As we think about extending these to fetch several types of data, are you thinking these classes will adopt additional public methods? Or are you thinking for each new type of data, we introduce a brand newProvided::class with a singlefetch?For example, I'm guessing if we wanted to implement a method to fetch a historical series of exchange rates from the provider, we'd just throw a
fetch_historical(from:, to:, date_range:)directly in this class?@@ -0,0 +7,4 @@key: <%= ENV.fetch "MERCHANT_DATA_API_KEY", ENV["SYNTH_API_KEY"] %>reale_state_valuations:provider: <%= ENV.fetch "REAL_ESTATE_VALUATIONS_PROVIDER", "null" %>key: <%= ENV["REAL_ESTATE_VALUATIONS_API_KEY"] %>I'd probably vote for explicitness over brevity. I think a longer
.env.examplefile is totally fine as long as we have good defaults.I think the app should work without any of these environment variables, and then as the self-hoster adds in providers, the app functionality is enhanced.
We'll have several flows within the app UI that alert the user that different providers are missing and explains how it is affecting their experience. For example, they might see an alert that says:
@@ -0,0 +1,19 @@module RetryableI think this is simple enough to start. Would rather try our own implementation first and resort to gems only if 100% necessary.
@@ -0,0 +1,19 @@module RetryableThe less you do, the less you have to maintain :)
@@ -0,0 +7,4 @@key: <%= ENV.fetch "MERCHANT_DATA_API_KEY", ENV["SYNTH_API_KEY"] %>reale_state_valuations:provider: <%= ENV.fetch "REAL_ESTATE_VALUATIONS_PROVIDER", "null" %>key: <%= ENV["REAL_ESTATE_VALUATIONS_API_KEY"] %>Right. So, don't lock yourself into requiring them to be identical - then you can have them identical anyway, but the ones that don't fit aren't forced to conform.
@@ -0,0 +1,83 @@class Provider::SynthFair! I would at least reduce the boiler plate with:
Just makes it a smidge easier to modify (and read, I'd say).
@@ -0,0 +7,4 @@key: <%= ENV.fetch "MERCHANT_DATA_API_KEY", ENV["SYNTH_API_KEY"] %>reale_state_valuations:provider: <%= ENV.fetch "REAL_ESTATE_VALUATIONS_PROVIDER", "null" %>key: <%= ENV["REAL_ESTATE_VALUATIONS_API_KEY"] %>I don't think we need to solve for this right now. Any non-standard behavior can be addressed in the concrete provider initializer. Also not sure data providers would require much more than an API key usually — I don't anticipate Google Cloud or similar being a data provider in this capacity.
Might push up a commit where the provider classes know about their provider-specific env vars. Would certainly be nice to be able to instantiate them without further config. Will revisit after work.
Heh :) I mean, I nerd sniped myself at 3:00 AM, so yeah, doesn't paint a full picture :)
I think the most critical bit I'm adding is pulling apart the code that chooses which provider gets used, from the code that defines those providers.
I don't think I follow? If you have more concepts, you just have more
Providers.conceptinitializers. If you have more than one provider for a given concept, you just wire them into those initializers; and, if you've got common code, pull it out into (as you propose)Provider::Conceptbase classes (or any of the other ways to handle that in Ruby). There just isn't any common to pull out yet.I'd toss in: What happens when you have a provider that provides a unique mix of multiple concepts? Like, if Zillow does both exchange rates and housing prices, while Wall Street Bets does exchange rates and stock prices?
Don't need it yet, and you add it when you've got common code to start pulling out. If you want to do something resembling interface definitions, I'd do it in test code rather than inheritance stuff - Ruby's a duck-typed language, so it... "fits" a bit better to write up tests that say "in order to be an X, it has to pass these tests", than to define and interface you inherit from.
Yah! I mean, for most things that's what you want, but, how many times have you run across an API that tries to get weird with it (looking at you, GCP, with your credentials file)?
We're on the same page. I meant your proposal didn't include that code where you add more concepts, ultimately calling for an abstraction. I just went directly to that abstraction because this PR is about sketching out a multi-concept, multi-provider, scaled, solution. That's what was asked for per the learnings in Maybe 1.0. This is not meant to be merged as-is — as articulated in the PR description.
I think that's very reasonably addressed in this PR. Synth, Null, and Zillow provide a mix of multiple concepts. Some shared, some not.
Agreed. It's done in test code in this PR.
@zachgoll thanks for the comments! I'll look into those after work. That plan sounds perfect. I'll push up some final changes and post some notes for future consideration — will then close this and implement a pared down, only-what-we-need-right-now version in a separate PR as described in our comments above. Which I think is what @rangerscience is advocating for.
Pleasure discussing all this with y'all, and thanks for my first real experience participating in open source :)
I might chuck in a link to these PRs in code comments, so that it's easy for someone to find all this additional context :)
You know? Come to think of it, I think this abstraction might be anemic.
I think we can just call the provider and instantiate a record from within the model. Would probably live within its own concern.
This would happen inside
ExchangeRate.fetch_from_providerandExchangeRate.fetch_historical_from_providermethods.I'll spike on it tonight and see where it gets us. Will probably move the retryable stuff to the provider instead.
@@ -1,42 +1,22 @@class ExchangeRate < ApplicationRecordinclude ProvidedYes. I think that makes perfect sense. Definitely smelly that we'd write to the database when fetching something. And if done in an HTTP GET request, that's even worse! Great callout.
@@ -0,0 +7,4 @@key: <%= ENV.fetch "MERCHANT_DATA_API_KEY", ENV["SYNTH_API_KEY"] %>reale_state_valuations:provider: <%= ENV.fetch "REAL_ESTATE_VALUATIONS_PROVIDER", "null" %>key: <%= ENV["REAL_ESTATE_VALUATIONS_API_KEY"] %>ef7ced2e4c@@ -0,0 +1,83 @@class Provider::SynthMoved some things around in
c49782ebfaand these classes ended up arguably messier.I think that's fine. It's all self contained and easily refactored as patterns emerge within each provider.
Gut tells me to avoid sharing too much code between providers. Yes, they're all providers in the realm of Maybe. But they're each their own thing and have their own lifecycles. Probably best not to tie them together?
c49782ebfaI've moved some things around. Left a summary in the commit message but re-sharing here for convenience:
Will update the PR description tomorrow to match.
Ran out of time today to cherry pick just the parts we need. I'll do that tomorrow. Of course, the Maybe team should feel free to do it themselves if they're in a hurry.
Gotta say I really like where we're at with this. And I like the approach we took: take the learnings from a previously live product, validate the approach by scaling it artificially, implement a pared down version of the validated approach. Let's hope it pans out in the long run as well!
Closing before cherry-picking the immediately relevant bits in a separate PR.
This can be used as reference when we introduce new concepts and providers. Judgement should be used considering this PR was based on assumptions about how non-exchange rate concepts would work.
@josefarias awesome, I agree that this is at a good spot. The mental model is pretty clear and intuitive to me so hopefully that holds true for future contributors to the project as well and gives us a good launch point. Will go check out the new PR when it's up!
New PR is up https://github.com/maybe-finance/maybe/pull/574 @zachgoll
Pull request closed