Commit 1a2599d

chore: update doc after the review in 10th, June.
Signed-off-by: Electronic-Waste <[email protected]>
1 parent 6969dd7 commit 1a2599d

1 file changed

+21
-8
lines changed

docs/proposals/push-based-metrics-collection.md

Lines changed: 21 additions & 8 deletions
@@ -79,21 +79,18 @@ def tune(
7979
8080
You can use `curl` to verify that Katib DB Manager is reachable: `curl <db-manager-address>`.
8181
82-
[!!!] Trial name should always be passed into Katib Trials as env variable `KATIB_TRIAL_NAME`.
82+
[!!!] The Trial's namespace and name should always be passed into Katib Trials as the env variables `KATIB_TRIAL_NAMESPACE` and `KATIB_TRIAL_NAME`.
8383
8484
Args:
8585
metrics: Dict of metrics pushed to Katib DB.
8686
For example, `metrics = {"loss": 0.01, "accuracy": 0.99}`.
87-
namespace: Namespace for the trial metrics.
8887
db_manager_address: Address for the Katib DB Manager in this format: `ip-address:port`.
8988
9089
Raises:
9190
RuntimeError: Unable to push Trial metrics to Katib DB.
9291
"""
9392
def report_metrics(
94-
self,
9593
metrics: Dict[str, Any],
96-
namespace: Optional[str] = None,
9794
db_manager_address: str = constants.DEFAULT_DB_MANAGER_ADDRESS,
9895
)
9996
```
@@ -110,8 +107,7 @@ def objective(parameters):
110107
# Calculate objective function.
111108
result = 4 * int(parameters["a"]) - float(parameters["b"]) ** 2
112109
# Push metrics to Katib DB.
113-
katib_client = katib.KatibClient(namespace="kubeflow")
114-
katib_client.report_metrics({"result": result})
110+
katib.report_metrics({"result": result})
115111

116112
# Step 2. Create HyperParameter search space.
117113
parameters = {
@@ -148,9 +144,15 @@ As is mentioned above, we decided to add `metrics_collector_config` to the tune
148144

149145
3. Rename metrics collector from `None` to `Push`: It's not correct to call push-based metrics collection `None`. We should modify related code to rename it.
150146

147+
4. Write env variables into the trial spec: set `KATIB_TRIAL_NAMESPACE` and `KATIB_TRIAL_NAME` so that the `report_metrics` function can dial the DB Manager.
148+
151149
### New Interface `report_metrics` in Python SDK
152150

153-
We decide to implement this function to push metrics directly to Katib DB with the help of grpc. Trial name should always be passed into Katib Trials (and then into this function) as env variable `KATIB_TRIAL_NAME`. Steps:
151+
We decided to implement this function to push metrics directly to Katib DB over gRPC. The Trial's namespace and name should always be passed into Katib Trials (and then into this function) as the env variables `KATIB_TRIAL_NAMESPACE` and `KATIB_TRIAL_NAME`.
152+
153+
Also, the function should be implemented as a **global function** because it is called in the user container.
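
As a rough illustration of the env-variable contract described above, the following sketch shows how a global function could read the Trial identity and collect the metrics before sending them. The helper name `build_observation_payload` and the plain-dict layout are hypothetical and not part of the Katib SDK. The actual `report_metrics` is expected to wrap the metrics into `katib_api_pb2.ReportObservationLogRequest` and send it over gRPC, which this sketch deliberately leaves out.

```python
import os
from datetime import datetime, timezone
from typing import Any, Dict


def build_observation_payload(metrics: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: gather the Trial identity and metrics into a plain dict.

    This is not the Katib SDK implementation. It only models the env-variable
    lookup that a global `report_metrics` would need inside the user container.
    """
    namespace = os.environ.get("KATIB_TRIAL_NAMESPACE")
    name = os.environ.get("KATIB_TRIAL_NAME")
    if not namespace or not name:
        # Mirror the documented failure mode: metrics cannot be pushed
        # without knowing which Trial they belong to.
        raise RuntimeError(
            "KATIB_TRIAL_NAMESPACE and KATIB_TRIAL_NAME must be set in the Trial container"
        )

    timestamp = datetime.now(timezone.utc).isoformat()
    return {
        "trial_namespace": namespace,
        "trial_name": name,
        # One log entry per metric; values are stringified for transport.
        "metric_logs": [
            {"time_stamp": timestamp, "name": key, "value": str(value)}
            for key, value in metrics.items()
        ],
    }
```

Because the function takes no client object and reads everything it needs from the environment, user code can call it directly from the objective function without constructing a `KatibClient`.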
154+
155+
Steps:
154156

155157
1. Wrap metrics into `katib_api_pb2.ReportObservationLogRequest`:
156158

@@ -172,8 +174,19 @@ if jobStatus.Condition == trialutil.JobSucceeded && instance.Status.Observation
172174
return errMetricsNotReported
173175
}
174176
```
177+
1. Distinguish pull-based and push-based metrics collection
178+
179+
We decided to add an if-else statement to the code above to distinguish pull-based and push-based metrics collection. In push-based collection, the trial does not need to be requeued. Instead, we'll insert an unavailable value into Katib DB.
175180

176-
We decide to add a if-else statement to distinguish pull-based and push-based metrics collection. In the push-based collection, the trial does not need to be requeued. Instead, we'll insert a unavailable value to Katib DB and change the status of trial to `MetricsUnavailable`
181+
2. Update the status of trial to `MetricsUnavailable`
182+
183+
In the current implementation of pull-based metrics collection, trials are requeued when the metrics collector finds that `.Status.Observation` is empty. However, this is not compatible with push-based metrics collection, because metrics that were never pushed will not be reported in a later round of reconciliation. So, we need to update the trial's status in the function `UpdateTrialStatusCondition`, alongside the existing handling for pull-based metrics collection.
184+
185+
```Golang
186+
else if instance.Spec.MetricCollector.Collector.Kind == "Push" && instance.Status.Observation == nil {
187+
... // Update the status of this trial to `MetricsUnavailable` and output the reason.
188+
}
189+
```
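
The branching described in the two steps above can be modelled in a few lines. The sketch below is a language-neutral illustration written in Python, not the controller code, which is written in Go. The function name `next_trial_action` and the returned action strings are hypothetical.

```python
from typing import Any, Optional


def next_trial_action(
    collector_kind: str,
    job_succeeded: bool,
    observation: Optional[Any],
) -> str:
    """Hypothetical model of the reconcile decision for a finished trial."""
    # Nothing special to do unless the job succeeded without any observation.
    if not job_succeeded or observation is None and False:
        return "continue"
    if observation is not None:
        return "continue"
    if collector_kind == "Push":
        # Push-based: a requeue cannot recover metrics the user never pushed,
        # so the trial is marked `MetricsUnavailable` instead.
        return "mark-metrics-unavailable"
    # Pull-based: requeue and wait for the metrics collector to report.
    return "requeue"
```

The point of the split is that a requeue only helps when something else (the pull-based collector) may still deliver the metrics. With push-based collection, the user container has already exited, so retrying would never terminate.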
177190

178191
### Collection of Final Metrics
179192
