feat: add @config.default #1213

Open
@zilto

This is the result of discussing with a user on backwards compatibility of a dataflow.

Currently, @config offers four options:

  • @config.when(key="foo"): selects this implementation when the config value for key equals "foo"
  • @config.when_not(key="foo"): selects this implementation when the config value for key does not equal "foo"
  • @config.when_in(key=["foo", "bar"]): selects this implementation when the config value for key is in the list
  • @config.when_not_in(key=["foo", "bar"]): selects this implementation when the config value for key is not in the list

This covers a lot of cases, but there's no way to specify a default.
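To make the semantics of the four conditions concrete, here is a minimal sketch in plain Python that mimics them as predicates over a config dict. It does not use Hamilton; the helper names below are hypothetical stand-ins, and treating a missing key as `None` (via `dict.get`) is an assumption chosen to match the behavior described in this issue.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]
Predicate = Callable[[Config], bool]


# Hypothetical stand-ins for the decorators; NOT Hamilton's implementation.
def when(**kv: Any) -> Predicate:
    # True when every listed key equals the given value
    return lambda cfg: all(cfg.get(k) == v for k, v in kv.items())


def when_not(**kv: Any) -> Predicate:
    # True when every listed key differs from the given value
    return lambda cfg: all(cfg.get(k) != v for k, v in kv.items())


def when_in(**kv: List[Any]) -> Predicate:
    # True when every listed key has a value inside its list
    return lambda cfg: all(cfg.get(k) in vs for k, vs in kv.items())


def when_not_in(**kv: List[Any]) -> Predicate:
    # True when every listed key has a value outside its list
    return lambda cfg: all(cfg.get(k) not in vs for k, vs in kv.items())


empty: Config = {}  # assumption: models a Driver built without `.with_config()`
print(when(version="2")(empty))      # False
print(when_not(version="2")(empty))  # True
```

Under these assumptions, only the negated conditions (`when_not`, `when_not_in`) hold for an empty config, which is why they end up being used as a stand-in for a default in the examples below.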

Example 1

Here's a simple illustration of limitations for backwards compatibility.

This is version1

# dataflow.py
def foo() -> int:
    return 1

# run.py
import dataflow
from hamilton import driver

dr = driver.Builder().with_modules(dataflow).build()
dr.execute(["foo"])

Example 2

Now I'm adding a version2 and I want to have version1 as my default.

Problem

If you use @config.when(version="1") and @config.when(version="2"), downstream drivers can break: when .with_config() is not set, neither condition matches and there is no node foo.

# dataflow.py
from hamilton.function_modifiers import config

@config.when(version="1")
def foo__v1() -> int:
    return 1

@config.when(version="2")
def foo__v2() -> int:
    return 2

# run.py
import dataflow
from hamilton import driver

# breaks because `.with_config()` didn't set `version="1"` or `version="2"`
dr = driver.Builder().with_modules(dataflow).build()
dr.execute(["foo"])

Solution

The best solution is to annotate foo__v1() with @config.when_not(version="2") so that it catches all other configurations (including empty ones, i.e., when .with_config() is not present).

# dataflow.py
from hamilton.function_modifiers import config

@config.when_not(version="2")
def foo__v1() -> int:
    return 1

@config.when(version="2")
def foo__v2() -> int:
    return 2

# run.py
import dataflow
from hamilton import driver

dr = driver.Builder().with_modules(dataflow).build()
dr.execute(["foo"])

Example 3

Now, I'm adding a third implementation, version3.

Problem

If I keep my previous code and add @config.when(version="3"), it will never be hit. This is because the already existing when_not(version="2") will catch this configuration.

# dataflow.py
from hamilton.function_modifiers import config

@config.when_not(version="2")
def foo__v1() -> int:
    return 1

@config.when(version="2")
def foo__v2() -> int:
    return 2

@config.when(version="3")
def foo__v3() -> int:
    return 3

# run.py
import dataflow
from hamilton import driver

# no error is raised, but `v1` is actually used
dr = driver.Builder().with_config({"version": "3"}).with_modules(dataflow).build()
dr.execute(["foo"])
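The overlap behind this problem can be checked without Hamilton. The sketch below simply evaluates the two conditions by hand for `version="3"`; it assumes each condition is a plain (in)equality test on the config value and says nothing about how Hamilton itself chooses between two matching implementations.

```python
config = {"version": "3"}

# Condition attached to foo__v1: when_not(version="2")
v1_selected = config.get("version") != "2"

# Condition attached to foo__v3: when(version="3")
v3_selected = config.get("version") == "3"

# Both conditions hold at once, so the two implementations are not mutually exclusive
print(v1_selected, v3_selected)  # True True
```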

Solution

The user has to modify the decorator for foo__v1() and set it to when_not_in(version=["2", "3"]) to catch all other configurations.

The next problem is that whenever an implementation is added, you need to remember to add it to this list; otherwise, foo__v1() will silently catch the new version="4".

# dataflow.py
from hamilton.function_modifiers import config

@config.when_not_in(version=["2", "3"])
def foo__v1() -> int:
    return 1

@config.when(version="2")
def foo__v2() -> int:
    return 2

@config.when(version="3")
def foo__v3() -> int:
    return 3

# run.py
import dataflow
from hamilton import driver

dr = driver.Builder().with_modules(dataflow).build()
dr.execute(["foo"])

Consequences

The main issue is backwards compatibility. When refactoring from a single implementation to two implementations, users have to carefully use .when() and .when_not() in conjunction; otherwise, they will break Drivers that don't have a config. Then, when moving from 2 to 3+ implementations, they have to use when_not_in() and manually manage a list. It is also not obvious from the code that when_not_in() means "default implementation".

Currently, using .when(version="1") and .when(version="2") implicitly creates a pattern of raising an error on invalid configurations (e.g., version=-1), because the node foo would be missing, which will likely break a key path. If the missing node doesn't break any path and no error is raised, then it didn't matter whether the config was correct or not.

This relates to a broader task of defining the space of valid configurations.
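One way to make the space of valid configurations explicit, instead of relying on a missing node to surface the error, is to check the config before building the Driver. The sketch below is only an illustration: `validate_config` and `VALID_VERSIONS` are hypothetical names, not part of Hamilton, and the set of accepted values is an assumption based on the examples above.

```python
from typing import Any, Dict, Optional, Set

# Assumption: `None` stands for "no version configured" and is allowed
VALID_VERSIONS: Set[Optional[str]] = {None, "1", "2", "3"}


def validate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the config unchanged, or raise if `version` is not a known value."""
    version = config.get("version")
    if version not in VALID_VERSIONS:
        raise ValueError(f"Unknown version {version!r} in config")
    return config


validate_config({})                # accepted: no version set
validate_config({"version": "2"})  # accepted: known version

try:
    validate_config({"version": -1})
except ValueError as e:
    print(e)  # Unknown version -1 in config
```

The validated dict could then be passed to `.with_config()`, which keeps the error on invalid values explicit even when a default implementation exists.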

Solution

We should have a @config.default to ensure a node foo is always present in the DAG. Its name is also easy to understand. When you're moving from 1 implementation to 2+, you get a clear design decision: do I want a config.when for both v1 and v2, or a default and v2?

Using @config.default would mean "select this implementation if no other config is resolved". This condition needs to be resolved last, and you can't have two nodes of the same name with @config.default.

# dataflow.py
from hamilton.function_modifiers import config

@config.default
def foo__v1() -> int:
    return 1

@config.when(version="2")
def foo__v2() -> int:
    return 2

@config.when(version="3")
def foo__v3() -> int:
    return 3

# run.py
import dataflow
from hamilton import driver

# passing no config means `default` is used
dr = driver.Builder().with_modules(dataflow).build()
dr.execute(["foo"])
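The resolution rule proposed above ("select this implementation if no other config is resolved", resolved last, at most one default per node name) could be sketched as follows. This is a plain-Python illustration of the intended semantics, not Hamilton's implementation: `resolve` and the candidate tuples are hypothetical, and a predicate of `None` is used here to mark the default.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Config = Dict[str, Any]
Predicate = Callable[[Config], bool]
# (predicate, implementation); a predicate of None marks the default (assumption)
Candidate = Tuple[Optional[Predicate], Callable[[], int]]


def resolve(candidates: List[Candidate], config: Config) -> Callable[[], int]:
    """Pick the implementation for one node name; the default is checked last."""
    defaults = [fn for pred, fn in candidates if pred is None]
    if len(defaults) > 1:
        raise ValueError("at most one default implementation per node name")
    matches = [fn for pred, fn in candidates if pred is not None and pred(config)]
    if len(matches) > 1:
        raise ValueError("more than one implementation matches this config")
    if matches:
        return matches[0]
    if defaults:
        return defaults[0]
    raise KeyError("no implementation resolved and no default defined")


def foo__v1() -> int:
    return 1


def foo__v2() -> int:
    return 2


def foo__v3() -> int:
    return 3


candidates: List[Candidate] = [
    (None, foo__v1),  # plays the role of @config.default
    (lambda cfg: cfg.get("version") == "2", foo__v2),
    (lambda cfg: cfg.get("version") == "3", foo__v3),
]

print(resolve(candidates, {})())                # 1
print(resolve(candidates, {"version": "3"})())  # 3
```

Note that in this sketch an unknown value such as `version="4"` also falls through to the default, so a default trades the implicit error on invalid configurations for backwards compatibility.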

Labels: enhancement (New feature or request)
