Skip to content

Added Feast example #222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 1, 2023
Merged

Added Feast example #222

merged 1 commit into from
Aug 1, 2023

Conversation

zilto
Copy link
Contributor

@zilto zilto commented Jul 14, 2023

Essentially, there is:

  • default_feature_store which contains an autogenerated "quickstart" repo from Feast when you call feast init
  • simple_feature_store reimplements all of the functionalities, but with Hamilton. The autogenerated DAG here make a huuuuge difference for the dev experience. Just by the DAG structure you can notice all of the things that need to be manually changed when you would edit the codebase.

In store_definitions.py, instead of using the Feast objects, I build an Hamilton DAG that defines these objects. At the end of the file, I explicit all the objects I want to collect with a @subdag. This object is read by apply() in store_operations.py

In store_operations.py, I essentially move all the methods from the FeatureStore class to functions.

The only friction point remaining is that the Feast CLI is built to collect objects of specific type in a Python file. However, store_definitions.py doesn't instantiate these objects for the CLI. I don't see a clean way to create these variables without manually writing function calls...

Copy link
Contributor

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Will need some rationale for why this is better for the reader of this example in the README.

Otherwise I was thinking (which is more work) we could do something like:

Change this:

@on_demand_feature_view(
    sources=[driver_stats_fresh_fv, input_request],  # relies on fresh version of FV
    schema=[
        Field(name="conv_rate_plus_val1", dtype=Float64),
        Field(name="conv_rate_plus_val2", dtype=Float64),
    ],
)
def transformed_conv_rate_fresh(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = inputs["conv_rate"] + inputs["val_to_add"]
    df["conv_rate_plus_val2"] = inputs["conv_rate"] + inputs["val_to_add_2"]
    return df

to this:

# ... I'm being lazy here, but writing the `FeatureView` `*Source` as functions?

@extract_columns("conv_rate_plus_val1", "conv_rate_plus_val2")
def transformed_conv_rate_fresh(conv_rate: pd.Series, val_to_add: pd.Series, val_to_add_2: pd.Series) -> pd.DataFrame:
    df = pd.DataFrame()
    df["conv_rate_plus_val1"] = conv_rate + val_to_add
    df["conv_rate_plus_val2"] = conv_rate + val_to_add_2
    return df

and have an adapter or something that essentially does what you're doing.

But yeah I haven't looked into the feast spec to realize how hard this could be. But yeah the weirdness of the API that does an implicit join -- e.g. transformed_conv_rate_fresh in the original takes in ony a dataframe and hides the join of the two sources -- might make what I'm proposing infeasible.

So if anything I'm saying what you have works (and is a valid way to use Hamilton), but I also wonder if we could reduce the boilerplate in general.


final_vars = [
"apply"
# "driver_activity_v1_fs",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want people to be able to switch to the other sets of values by uncommenting them. I can making "uncommentable" from outside main() or define a small click CLI. In the notebook, it will be different cells

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, make a comment here then?

@skrawcz
Copy link
Contributor

skrawcz commented Jul 16, 2023

Otherwise, what I think what most people are doing now, is that they use Hamilton to create a table(s) of features, that is then pushed to feast, and manually creating the feast definitions. So to round out this example, it would just mean adding a diagram pointing out using Hamilton for that feature transform process and then pointing to some examples in this repo for that. Does that make sense?

@zilto
Copy link
Contributor Author

zilto commented Jul 16, 2023

Otherwise, what I think what most people are doing now, is that they use Hamilton to create a table(s) of features, that is then pushed to feast, and manually creating the feast definitions. So to round out this example, it would just mean adding a diagram pointing out using Hamilton for that feature transform process and then pointing to some examples in this repo for that. Does that make sense?

Yup that makes sense. My motivation for V1 was to create a Hamilton + Feast feature repository that maps 1-to-1 with the demo from Feast. It serves the goal of "improving Feast with Hamilton". Then, I was thinking of having a 3rd feature repository where I show a fuller example with the feature preprocessing steps as you described. This would be for the purpose of "integrate/extend Feast with Hamilton".

I realize that having a strong Feast integration has compounding benefit for our story about orchestration systems. It would be a huge value add to have your Airflow/Metaflow/Prefect/etc. task have Hamilton take care of feature loading

@skrawcz
Copy link
Contributor

skrawcz commented Jul 17, 2023

Otherwise, what I think what most people are doing now, is that they use Hamilton to create a table(s) of features, that is then pushed to feast, and manually creating the feast definitions. So to round out this example, it would just mean adding a diagram pointing out using Hamilton for that feature transform process and then pointing to some examples in this repo for that. Does that make sense?

Yup that makes sense. My motivation for V1 was to create a Hamilton + Feast feature repository that maps 1-to-1 with the demo from Feast. It serves the goal of "improving Feast with Hamilton". Then, I was thinking of having a 3rd feature repository where I show a fuller example with the feature preprocessing steps as you described. This would be for the purpose of "integrate/extend Feast with Hamilton".

I realize that having a strong Feast integration has compounding benefit for our story about orchestration systems. It would be a huge value add to have your Airflow/Metaflow/Prefect/etc. task have Hamilton take care of feature loading

Yep that works. We should just make sure the README makes the use of Hamilton here pretty clear :)

Copy link
Contributor

@skrawcz skrawcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simple_feature_store and integration_feature_store should also have README.md files with anything relevant that wasn't already mentioned in the root README.md (it can reference to it).

Otherwise we should have you demo this perhaps?


final_vars = [
"apply"
# "driver_activity_v1_fs",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, make a comment here then?

@zilto zilto marked this pull request as ready for review July 31, 2023 17:32
@skrawcz skrawcz merged commit a480d54 into main Aug 1, 2023
@skrawcz skrawcz deleted the example/feast branch August 1, 2023 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants