@@ -148,9 +144,15 @@ As is mentioned above, we decided to add `metrics_collector_config` to the tune
 3. Rename metrics collector from `None` to `Push`: It's not correct to call push-based metrics collection `None`. We should modify related code to rename it.
 
+4. Write env variables into trial spec: set `KATIB_TRIAL_NAMESPACE` and `KATIB_TRIAL_NAME` so that the `report_metrics` function can dial the DB manager.
+
 ### New Interface `report_metrics` in Python SDK
 
-We decide to implement this function to push metrics directly to Katib DB with the help of grpc. Trial name should always be passed into Katib Trials (and then into this function) as env variable `KATIB_TRIAL_NAME`. Steps:
+We decide to implement this function to push metrics directly to Katib DB with the help of gRPC. The Trial's namespace and name should always be passed into Katib Trials (and then into this function) as the env variables `KATIB_TRIAL_NAMESPACE` and `KATIB_TRIAL_NAME`.
+
+Also, the function is supposed to be implemented as a **global function** because it is called in the user container.
+
+Steps:
 
 1. Wrap metrics into `katib_api_pb2.ReportObservationLogRequest`:
 
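As a rough illustration of the flow described above (read the Trial's namespace and name from the environment, then wrap the metrics into a request), here is a minimal sketch. It is not the SDK's code: `MetricLog`, `ReportRequest`, and `build_report_request` are hypothetical stand-ins for the protobuf messages, their field names are assumptions, and the gRPC call to the DB manager is left out so the example runs without Katib installed.

```python
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Mapping, Optional


@dataclass
class MetricLog:
    """Hypothetical stand-in for one metric entry (not the real protobuf type)."""
    time_stamp: str
    name: str
    value: str


@dataclass
class ReportRequest:
    """Hypothetical stand-in for `katib_api_pb2.ReportObservationLogRequest`."""
    trial_name: str
    namespace: str
    metric_logs: List[MetricLog] = field(default_factory=list)


def build_report_request(
    metrics: Mapping[str, float],
    env: Optional[Mapping[str, str]] = None,
    timestamp: Optional[str] = None,
) -> ReportRequest:
    """Wrap user metrics into a request, identifying the Trial via env variables."""
    env = os.environ if env is None else env
    try:
        namespace = env["KATIB_TRIAL_NAMESPACE"]
        name = env["KATIB_TRIAL_NAME"]
    except KeyError as exc:
        # Outside a Katib Trial the variables are absent, so fail loudly.
        raise RuntimeError(f"{exc.args[0]} is not set; not running in a Trial?") from exc

    ts = timestamp or datetime.now(timezone.utc).isoformat()
    # Values are stored as strings in this sketch.
    logs = [MetricLog(ts, key, str(value)) for key, value in metrics.items()]
    return ReportRequest(trial_name=name, namespace=namespace, metric_logs=logs)
```

Because the Trial identity comes from the environment rather than from arguments, a module-level function like this can be called from any point in the user's training script.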
@@ -172,8 +174,19 @@ if jobStatus.Condition == trialutil.JobSucceeded && instance.Status.Observation
 	return errMetricsNotReported
 }
 ```
+1. Distinguish pull-based and push-based metrics collection
+
+We decide to add an if-else statement in the code above to distinguish pull-based and push-based metrics collection. In push-based collection, the trial does not need to be requeued. Instead, we'll insert an unavailable value into Katib DB.
 
-We decide to add a if-else statement to distinguish pull-based and push-based metrics collection. In the push-based collection, the trial does not need to be requeued. Instead, we'll insert a unavailable value to Katib DB and change the status of trial to `MetricsUnavailable`
+2. Update the status of trial to `MetricsUnavailable`
+
+In the current implementation of pull-based metrics collection, trials will be re-queued when the metrics collector finds that `.Status.Observation` is empty. However, this is not compatible with push-based metrics collection, because metrics that were never pushed won't be reported in the next round of reconciliation either. So, we need to update the trial's status in the function `UpdateTrialStatusCondition` in a way that accommodates the pull-based metrics collection.
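The branching described in this hunk can be modelled in a few lines. The controller itself is written in Go, so the following is only a language-neutral sketch of the decision, written in Python; `CollectorKind`, `Action`, and `decide` are hypothetical names and do not appear in the controller code.

```python
from enum import Enum


class CollectorKind(Enum):
    PULL = "Pull"
    PUSH = "Push"


class Action(Enum):
    PROCEED = "proceed"
    REQUEUE = "requeue"
    MARK_METRICS_UNAVAILABLE = "mark_metrics_unavailable"


def decide(job_succeeded: bool, observation_reported: bool, kind: CollectorKind) -> Action:
    """Choose what the reconciler does for a trial, per the rules in the text."""
    if not job_succeeded or observation_reported:
        # Nothing special to do: the job is still running or metrics exist.
        return Action.PROCEED
    if kind is CollectorKind.PUSH:
        # Requeueing cannot help: metrics that were never pushed will not
        # appear later, so record an unavailable value and update the status.
        return Action.MARK_METRICS_UNAVAILABLE
    # Pull-based: the collector may still deliver metrics, so try again.
    return Action.REQUEUE
```

The only case where the two modes differ is a succeeded job with an empty observation, which is the case the if-else statement is meant to split.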