-
Notifications
You must be signed in to change notification settings - Fork 149
Added Feast example #222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Feast example #222
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Will need some rationale for why this is better for the reader of this example in the README.
Otherwise I was thinking (which is more work) we could do something like:
Change this:
@on_demand_feature_view(
sources=[driver_stats_fresh_fv, input_request], # relies on fresh version of FV
schema=[
Field(name="conv_rate_plus_val1", dtype=Float64),
Field(name="conv_rate_plus_val2", dtype=Float64),
],
)
def transformed_conv_rate_fresh(inputs: pd.DataFrame) -> pd.DataFrame:
df = pd.DataFrame()
df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
return df
to this:
# ... I'm being lazy here, but writing the `FeatureView` `*Source` as functions?
@extract_columns("conv_rate_plus_val1", "conv_rate_plus_val2")
def transformed_conv_rate_fresh(conv_rate: pd.Series, val_to_add: pd.Series, val_to_add_2: pd.Series) -> pd.DataFrame:
df = pd.DataFrame()
df["conv_rate_plus_val1"] = conv_rate + val_to_add
df["conv_rate_plus_val2"] = conv_rate + val_to_add_2
return df
and have an adapter or something that essentially does what you're doing.
But yeah I haven't looked into the feast spec to realize how hard this could be. But yeah the weirdness of the API that does an implicit join -- e.g. transformed_conv_rate_fresh
in the original takes in ony a dataframe and hides the join of the two sources -- might make what I'm proposing infeasible.
So if anything I'm saying what you have works (and is a valid way to use Hamilton), but I also wonder if we could reduce the boilerplate in general.
|
||
final_vars = [ | ||
"apply" | ||
# "driver_activity_v1_fs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want people to be able to switch to the other sets of values by uncommenting them. I can making "uncommentable" from outside main()
or define a small click CLI. In the notebook, it will be different cells
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, make a comment here then?
examples/feast/simple_feature_store/feature_repo/store_definitions.py
Outdated
Show resolved
Hide resolved
Otherwise, what I think what most people are doing now, is that they use Hamilton to create a table(s) of features, that is then pushed to feast, and manually creating the feast definitions. So to round out this example, it would just mean adding a diagram pointing out using Hamilton for that feature transform process and then pointing to some examples in this repo for that. Does that make sense? |
Yup that makes sense. My motivation for V1 was to create a Hamilton + Feast feature repository that maps 1-to-1 with the demo from Feast. It serves the goal of "improving Feast with Hamilton". Then, I was thinking of having a 3rd feature repository where I show a fuller example with the feature preprocessing steps as you described. This would be for the purpose of "integrate/extend Feast with Hamilton". I realize that having a strong Feast integration has compounding benefit for our story about orchestration systems. It would be a huge value add to have your Airflow/Metaflow/Prefect/etc. task have Hamilton take care of feature loading |
Yep that works. We should just make sure the README makes the use of Hamilton here pretty clear :) |
examples/feast/integration_feature_store/feature_repo/store_definitions.py
Show resolved
Hide resolved
examples/feast/integration_feature_store/feature_repo/store_definitions.py
Outdated
Show resolved
Hide resolved
examples/feast/integration_feature_store/feature_repo/feature_transformations.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simple_feature_store and integration_feature_store should also have README.md files with anything relevant that wasn't already mentioned in the root README.md (it can reference to it).
Otherwise we should have you demo this perhaps?
|
||
final_vars = [ | ||
"apply" | ||
# "driver_activity_v1_fs", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, make a comment here then?
Essentially, there is:
default_feature_store
which contains an autogenerated "quickstart" repo from Feast when you callfeast init
simple_feature_store
reimplements all of the functionalities, but with Hamilton. The autogenerated DAG here make a huuuuge difference for the dev experience. Just by the DAG structure you can notice all of the things that need to be manually changed when you would edit the codebase.In
store_definitions.py
, instead of using the Feast objects, I build an Hamilton DAG that defines these objects. At the end of the file, I explicit all the objects I want to collect with a@subdag
. This object is read by apply() in store_operations.pyIn
store_operations.py
, I essentially move all the methods from theFeatureStore
class to functions.The only friction point remaining is that the Feast CLI is built to collect objects of specific type in a Python file. However,
store_definitions.py
doesn't instantiate these objects for the CLI. I don't see a clean way to create these variables without manually writing function calls...