-
Notifications
You must be signed in to change notification settings - Fork 70
Add user-provided documentation to metadata #973
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
55c03ef
to
ac6d2c2
Compare
ac6d2c2
to
0fee623
Compare
0fee623
to
6738d60
Compare
Programmatic access to feature documentation is useful for integrating with systems aimed at ML explainability (e.g. [SHAP](https://shap.readthedocs.io/en/latest/)) and feature discovery (e.g. feature catalogs). | ||
|
||
Currently, it's possible to inspect Chronon definitions to determine _how_ a feature was computed, which is a type of documentation. However, there are other aspects of ML feature development that are not | ||
captured and documented within the feature definition itself, for example: context, domain-specific knowledge, assumptions, caveats. This CHIP aims to fill those gaps with user-provided documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a valid motivation!
```thrift | ||
struct Aggregation { | ||
... | ||
xx: optional MetaData metaData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metaData
has more fields that are irrelevant to the description itself.
And the original thought was to keep params non related to feature calculation in the metadata, therefore, when it coms to versioning check etc, we don't have to maintain a potential increasing list of fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metaData has more fields that are irrelevant to the description itself.
Yeah, this is something I noticed too, I added some comments about it in the section about alternatives. I think this could be solved with a separate, more lightweight and specific metadata object (perhaps FeatureMetadata
?). I'm not opposed to this.
And the original thought was to keep params non related to feature calculation in the metadata
Sorry, I don't fully understand -- description is not related to feature calculation, so it would belong in metadata, no? Or do you mean that more generally you'd prefer not adding fields to the metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
description is not related to feature calculation, so it would belong in metadata
Yes, correct.
So we can think of a way to add the feature description to the existing metadata, rather than adding a new metadata field or feature metadata in the aggregation itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean something like how custom tags are handled? Basically keeping the descriptions in Join
metadata through the custom JSON:
def Derivation(name, expression, description):
derivation = ttypes.Derivation(name, expression)
if description:
derivation.description = description
return derivation
def Join(...):
...
derivation_descriptions = {}
for derivation in derivations:
if derivation.description:
derivation_descriptions[derivation.name] = derivation.description
if derivation_descriptions:
custom_json['derivation_descriptions'] = derivation_descriptions
...
I would still prefer keeping the metadata closer to the object it refers to (perhaps with a more lightweight metadata object), but I wouldn't be too opposed to this alternative approach if you like that direction better. Or let me know if you meant something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had a discussion with @hzding621 and @camweston-stripe today. Let's reuse the metaData
struct actually to use it in the aggregation itself.
During the versioning check for join job, let's unset the meta data field before calculating the semantic hash so it won't affect the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for the input! Could you point me to the versioning check that you're referring to? Do you mean this validator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary
Adds CHIP to support user-provided documentation.
Why / Goal
See the CHIP.
Test Plan
Checklist
Reviewers