Commit aff97bf

doc: complete KEP for gsoc proposal(Project 6).
Signed-off-by: Electronic-Waste <[email protected]>
1 parent ef09f05 commit aff97bf

1 file changed

+52
-2
lines changed

docs/proposals/push-based-metrics-collection.md

Lines changed: 52 additions & 2 deletions
@@ -22,6 +22,56 @@ Fig.1 Architecture of the new design
2222

2323
## API
2424

25-
### New parameter in Python SDK function `tune`
25+
### New Parameter in Python SDK Function `tune`
2626

27-
27+
We decided to add a `metrics_collection_mechanism` parameter to the `tune` function in the Python SDK.
28+
29+
```Python
def tune(
    self,
    name: str,
    objective: Callable,
    parameters: Dict[str, Any],
    base_image: str = constants.BASE_IMAGE_TENSORFLOW,
    namespace: Optional[str] = None,
    env_per_trial: Optional[Union[Dict[str, str], List[Union[client.V1EnvVar, client.V1EnvFromSource]]]] = None,
    algorithm_name: str = "random",
    algorithm_settings: Union[dict, List[models.V1beta1AlgorithmSetting], None] = None,
    objective_metric_name: str = None,
    additional_metric_names: List[str] = [],
    objective_type: str = "maximize",
    objective_goal: float = None,
    max_trial_count: int = None,
    parallel_trial_count: int = None,
    max_failed_trial_count: int = None,
    resources_per_trial: Union[dict, client.V1ResourceRequirements, None] = None,
    retain_trials: bool = False,
    packages_to_install: List[str] = None,
    pip_index_url: str = "https://pypi.org/simple",
    metrics_collection_mechanism: str = "pull",  # The newly added parameter
):
    ...  # Body omitted; only the signature is shown here.
```
54+
55+
## Implementation
56+
57+
### Add New Parameter in `tune`
58+
59+
As mentioned above, we decided to add `metrics_collection_mechanism` to the `tune` function in the Python SDK. When push-based metrics collection is selected, `tune` also needs to make the following changes:
60+
61+
1. Disable injection: set `katib.kubeflow.org/metrics-collector-injection` to `disabled` when push-based metrics collection is selected, so that the metrics collector sidecar container is not injected.
62+
63+
2. Configure the metrics collection method: set `spec.metricsCollectorSpec.collector.kind`, which specifies how metrics are collected, to `NoneCollector`.
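
The two steps above can be sketched as a small helper that `tune` might call. This is only an illustration under stated assumptions: the helper name `push_mode_overrides` is hypothetical, `"push"` is assumed to be the value that selects push-based collection (the proposal only shows the `"pull"` default), and the proposal does not say which object carries the injection setting, so it is returned separately. The field spelling `metricsCollectorSpec` and the kind value should be checked against the Katib Experiment API before use.

```python
from typing import Any, Dict

# Key named in the proposal; where it is attached is not specified there.
INJECTION_KEY = "katib.kubeflow.org/metrics-collector-injection"


def push_mode_overrides(metrics_collection_mechanism: str = "pull") -> Dict[str, Any]:
    """Return the settings the two steps would apply (hypothetical helper)."""
    # Assumption: only "pull" and "push" are accepted values.
    if metrics_collection_mechanism not in ("pull", "push"):
        raise ValueError(
            f"unsupported metrics_collection_mechanism: {metrics_collection_mechanism!r}"
        )
    if metrics_collection_mechanism == "pull":
        # Pull mode keeps the existing sidecar-based behaviour: nothing to override.
        return {}
    return {
        # Step 1: disable injection of the metrics collector sidecar.
        "injection": {INJECTION_KEY: "disabled"},
        # Step 2: set the collector kind so no collector is used.
        "spec": {"metricsCollectorSpec": {"collector": {"kind": "NoneCollector"}}},
    }
```

Returning plain dictionaries keeps the sketch independent of the Katib and Kubernetes client models; a real implementation would set the corresponding fields on those model objects instead.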
64+
65+
### Code Injection in Webhook
66+
67+
We decided to implement a code-replacement function in the Experiment Mutating Webhook. When `spec.metricsCollectorSpec.collector.kind` is set to `NoneCollector`, this function recognizes the metrics output lines (e.g. `print`, `log.Info`, etc.) and replaces them with the push-based metrics collection code discussed in the next section. We prefer this to offering users a `katib_client.push`-like interface, because users cannot express such a call in a YAML file.
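
To make the idea of replacing output lines concrete, here is a deliberately naive sketch. It is not the webhook implementation (which the proposal does not show): it only handles Python `print(...)` statements that occupy a whole line, and the names `replace_metric_prints` and `push_metrics` are hypothetical.

```python
import re

# Matches a line consisting solely of a print(...) call, keeping its indentation.
_PRINT_LINE = re.compile(r"^(?P<indent>\s*)print\((?P<args>.*)\)\s*$")


def replace_metric_prints(source: str, push_fn: str = "push_metrics") -> str:
    """Rewrite whole-line print(...) calls into calls to `push_fn` (hypothetical)."""
    rewritten = []
    for line in source.splitlines():
        match = _PRINT_LINE.match(line)
        if match:
            # Keep indentation and arguments; swap only the function name.
            rewritten.append(f"{match.group('indent')}{push_fn}({match.group('args')})")
        else:
            rewritten.append(line)
    return "\n".join(rewritten)
```

A line-based regular expression cannot tell metric output from ordinary logging and breaks on multi-line calls, so a real implementation would likely need proper parsing and a rule for which lines count as metrics.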
68+
69+
### Push-based Metrics Collection Code
70+
71+
The push-based metrics collection code is a function that makes a gRPC call to the persistent API to store training metrics. It will be injected into the container args by the Experiment Mutating Webhook and then called inside the Trial Worker Pod to push metrics to Katib DB.
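
A minimal sketch of what such a push function could look like is shown below. Everything in it is an assumption made for illustration: `push_metrics`, `MetricLog`, and the `report` callable are hypothetical names, and `report` stands in for the gRPC call, whose actual stub and message types are not given in this proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass
class MetricLog:
    """One reported metric value (hypothetical shape)."""
    time_stamp: str
    name: str
    value: str


def push_metrics(
    trial_name: str,
    metrics: Dict[str, float],
    report: Callable[[str, List[MetricLog]], None],
) -> List[MetricLog]:
    """Build metric logs for a trial and hand them to `report`.

    `report` is a placeholder for the gRPC call that stores the metrics.
    """
    now = datetime.now(timezone.utc).isoformat()
    logs = [MetricLog(now, name, str(value)) for name, value in metrics.items()]
    report(trial_name, logs)
    return logs
```

Passing the transport in as a callable keeps the sketch runnable without a Katib deployment; inside a Trial Worker Pod the callable would wrap the real gRPC client.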
72+
73+
### Collection of Final Metrics
74+
75+
In push mode, the final metrics of the worker pods should be pushed directly to Katib DB.
76+
77+
\#WIP
