Commit 55c03ef

CHIP for user-provided documentation.
1 parent 3e138e8 commit 55c03ef

proposals/CHIP-3.md (63 additions, 0 deletions)

# CHIP-3: User-provided documentation for feature definitions

https://github.com/airbnb/chronon/issues/<TODO>

## Motivation

Programmatic access to feature documentation is useful for integrating with systems aimed at ML explainability (e.g. [SHAP](https://shap.readthedocs.io/en/latest/)) and feature discovery (e.g. feature catalogs).

Currently, it's possible to inspect Chronon definitions to determine _how_ a feature is computed, which is itself a form of documentation. However, other aspects of ML feature development are not captured within the feature definition, for example context, domain-specific knowledge, assumptions, and caveats. This CHIP aims to fill those gaps with user-provided documentation.

## Proposed Change

There are 3 main changes in this proposal:

1. Add a `description` field to `MetaData` in the Thrift API.

```thrift
struct MetaData {
    ...
    xx: optional string description
}
```

2. Add a `metaData` field to `Aggregation` and `Derivation` in the Thrift API.

```thrift
struct Aggregation {
    ...
    xx: optional MetaData metaData
}

struct Derivation {
    ...
    xx: optional MetaData metaData
}
```

3. Update Chronon's Python API to accept an optional `description` parameter for the following objects: `Join`, `GroupBy`, `ExternalSource`, `ContextualSource`, `StagingQuery`, `Aggregation`, `Derivation`. When present, it will be passed through to the enclosed `MetaData`. Example implementation for `Derivation` (the simplest one):

```python
from typing import Optional
import ai.chronon.api.ttypes as ttypes

def Derivation(name: str, expression: str, description: Optional[str] = None) -> ttypes.Derivation:
    ...
    # Attach MetaData only when a description is provided; keyword arguments avoid relying on field order.
    metadata = ttypes.MetaData(description=description) if description else None
    return ttypes.Derivation(name=name, expression=expression, metaData=metadata)
```
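
To make the pass-through behavior concrete, here is a minimal, self-contained sketch. `MetaData` and `DerivationStruct` are hypothetical dataclass stand-ins for the Thrift-generated `ttypes` classes, reduced to the fields this proposal touches; the feature names and description text are also made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the Thrift-generated ttypes.MetaData and
# ttypes.Derivation classes (assumption: only the relevant fields are modeled).
@dataclass
class MetaData:
    description: Optional[str] = None


@dataclass
class DerivationStruct:
    name: str
    expression: str
    metaData: Optional[MetaData] = None


def Derivation(name: str, expression: str, description: Optional[str] = None) -> DerivationStruct:
    # Only attach MetaData when a non-empty description is given, so
    # derivations without a description leave metaData unset.
    metadata = MetaData(description=description) if description else None
    return DerivationStruct(name=name, expression=expression, metaData=metadata)


documented = Derivation(
    name="txn_amount_ratio",
    expression="txn_amount_sum_7d / txn_amount_sum_30d",
    description="Share of 30-day spend that occurred in the last 7 days.",
)
undocumented = Derivation(name="txn_count", expression="txn_count_7d")
```

In this sketch an empty string is treated the same as a missing description, which mirrors the `if description else None` check in the example above.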

## New or Changed Public Interfaces

The Thrift API will change. However, all changes to the definitions are additive; no existing fields will be touched.

Existing implementations that happen to pass `description` via `kwargs` will see an effect (Chronon object diffs), since those arbitrary params currently get thrown into `MetaData.customJson`. However, this is not a public API contract, and the change is not expected to affect feature computation.
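
The object diff described here can be illustrated with a small sketch. Both helper functions are hypothetical simplifications rather than Chronon's actual code; they assume only what this section states, namely that unrecognized keyword arguments end up serialized in `customJson`.

```python
import json
from typing import Any, Dict, Optional


def metadata_before(**kwargs: Any) -> Dict[str, Optional[str]]:
    # Simplified current behavior: every extra keyword argument,
    # including `description`, is serialized into customJson.
    return {"description": None, "customJson": json.dumps(kwargs, sort_keys=True)}


def metadata_after(description: Optional[str] = None, **kwargs: Any) -> Dict[str, Optional[str]]:
    # Simplified proposed behavior: `description` is a named parameter,
    # so it no longer falls through into customJson.
    return {"description": description, "customJson": json.dumps(kwargs, sort_keys=True)}


# Hypothetical call site that already passes `description` as a kwarg.
before = metadata_before(description="7-day purchase count", team="growth")
after = metadata_after(description="7-day purchase count", team="growth")
```

The two results differ only in where the description is stored, which is the kind of metadata-only diff this section anticipates.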

## Rejected Alternatives

- Support a `description` parameter at the Python API level without changes to the Thrift definitions. In this implementation, the descriptions would be collected and bubbled up to the top-level object (e.g. `Join` or `GroupBy`), similar to how `tags` are handled ([code](https://github.com/airbnb/chronon/blob/3e138e86d9922a6742709adc69b9b6ccbd18852c/api/py/ai/chronon/group_by.py#L529)).
  - Pros
    - Consistency with the implementation for `tags`.
    - No changes to the public Thrift API.
  - Cons
    - Obscures support for feature documentation, since data in `customJson` looks ad hoc and is generally less discoverable.
    - Would require some sort of mapping from objects to descriptions in the top-level metadata, e.g. mapping a description to its corresponding derivation, perhaps through output columns. This could be brittle and would significantly increase the size of the data in `customJson`.
    - Bubbling up parameters via dynamic and undocumented fields (as is done with `tags`, which are not part of the Thrift definition for `Aggregation`) is less maintainable, since things may easily break if code is moved around without handling those fields correctly.

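For comparison, the rejected alternative might look roughly like the sketch below. The `descriptions` key, the use of output column names as lookup keys, and the helper name `bubble_up_descriptions` are all assumptions made for illustration; none of them come from Chronon's code.

```python
import json
from typing import Dict, List, Optional, Tuple


def bubble_up_descriptions(derivations: List[Tuple[str, str, Optional[str]]]) -> str:
    # Each derivation is (output_column, expression, description).
    # Collect the non-empty descriptions into a single mapping that would
    # be stored in the top-level object's customJson.
    mapping: Dict[str, str] = {
        column: description
        for column, _expression, description in derivations
        if description
    }
    return json.dumps({"descriptions": mapping}, sort_keys=True)


custom_json = bubble_up_descriptions([
    ("txn_ratio", "txn_sum_7d / txn_sum_30d", "Share of 30-day spend in the last 7 days."),
    ("txn_count", "txn_count_7d", None),
])

# If the output column is later renamed without updating the mapping,
# the description can no longer be found under the new name.
lookup = json.loads(custom_json)["descriptions"]
renamed_column = "txn_spend_ratio"
orphaned = renamed_column not in lookup
```

Because the description lives apart from the object it documents, keeping the two in sync depends entirely on the lookup key staying stable, which is the brittleness listed in the cons above.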