@@ -46,6 +46,103 @@ various AutoML algorithms.
46
46
alt="Katib Overview"
47
47
class="mt-3 mb-3">
48
48
49
+ This diagram shows how Katib performs hyperparameter tuning:
50
+
51
+ <img src="/docs/components/katib/images/katib-architecture.drawio.svg"
52
+ alt="Katib Overview"
53
+ class="mt-3 mb-3">
54
+
55
+ First, write the ML training code that Katib evaluates in every Trial
56
+ with different hyperparameters. Then, using the Katib Python SDK, set the objective, search
57
+ space, search algorithm, and Trial resources, and create the Katib Experiment.
58
+
59
+ Follow the [quickstart guide](/docs/components/katib/hyperparameter/#quickstart-with-katib-sdk)
60
+ to create your first Katib Experiment.
61
+
62
+ Katib implements the following Custom Resource Definitions (CRDs) to tune hyperparameters:
63
+
64
+ ### Experiment
65
+
66
+ An _Experiment_ is a single tuning run, also called an optimization run.
67
+
68
+ You specify configuration settings to define the Experiment. The following are
69
+ the main configurations:
70
+
71
+ - **Objective**: What you want to optimize. This is the objective metric, also
72
+ called the target variable. A common metric is the model's accuracy
73
+ in the validation pass of the training job (_validation-accuracy_). You also
74
+ specify whether you want the hyperparameter tuning job to _maximize_ or
75
+ _minimize_ the metric.
76
+
77
+ - **Search space**: The set of all possible hyperparameter values that the
78
+ hyperparameter tuning job should consider for optimization, and the
79
+ constraints for each hyperparameter. Other names for search space include
80
+ _feasible set_ and _solution space_. For example, you may provide the
81
+ names of the hyperparameters that you want to optimize. For each
82
+ hyperparameter, you may provide a _minimum_ and _maximum_ value or a _list_
83
+ of allowable values.
84
+
85
+ - **Search algorithm**: The algorithm to use when searching for the optimal
86
+ hyperparameter values.
87
+
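The three settings above can be pictured as plain data. The following is a minimal illustrative sketch in pure Python, not the Katib API: the names `Objective`, `Range`, `Choices`, and `is_feasible` are hypothetical and exist only to show how an objective, a search space, and a feasibility check relate to each other.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Objective:
    """Hypothetical stand-in for the Experiment objective."""
    metric_name: str  # for example "validation-accuracy"
    goal_type: str    # "maximize" or "minimize"


@dataclass
class Range:
    """A hyperparameter constrained by a minimum and a maximum value."""
    minimum: float
    maximum: float

    def contains(self, value) -> bool:
        return self.minimum <= value <= self.maximum


@dataclass
class Choices:
    """A hyperparameter constrained to a list of allowable values."""
    values: List[object]

    def contains(self, value) -> bool:
        return value in self.values


SearchSpace = Dict[str, Union[Range, Choices]]


def is_feasible(space: SearchSpace, assignment: Dict[str, object]) -> bool:
    """Return True if the assignment names every hyperparameter in the
    search space and each value satisfies its constraint."""
    if set(assignment) != set(space):
        return False
    return all(space[name].contains(value) for name, value in assignment.items())


# Example values are assumptions chosen for illustration only.
objective = Objective(metric_name="validation-accuracy", goal_type="maximize")
search_space = {
    "lr": Range(minimum=0.01, maximum=0.1),
    "optimizer": Choices(values=["sgd", "adam"]),
}
```

In this sketch, an assignment such as `{"lr": 0.05, "optimizer": "adam"}` lies inside the search space, while one with `lr` set to `0.5` does not.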
88
+ A Katib Experiment is defined as a
89
+ [Kubernetes CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
90
+
91
+ For details of how to define your Experiment, follow the guide to [running an
92
+ experiment](/docs/components/katib/experiment/).
93
+
94
+ ### Suggestion
95
+
96
+ A _Suggestion_ is a set of hyperparameter values that the hyperparameter
97
+ tuning process has proposed. Katib creates a Trial to evaluate the suggested
98
+ set of values.
99
+
100
+ A Katib Suggestion is defined as a
101
+ [Kubernetes CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
102
+
103
+ ### Trial
104
+
105
+ A _Trial_ is one iteration of the hyperparameter tuning process. A Trial
106
+ corresponds to one worker job instance with a list of parameter assignments.
107
+ The list of parameter assignments corresponds to a Suggestion.
108
+
109
+ Each Experiment runs several Trials. The Experiment runs the Trials until it
110
+ reaches either the objective or the configured maximum number of Trials.
111
+
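The relationship between an Experiment, its Suggestions, and its Trials can be sketched as a simple loop. This is an illustrative pure-Python model under stated assumptions, not Katib's implementation: it uses random search, assumes the objective is maximized, and the names `suggest`, `run_experiment`, and `toy_worker` are hypothetical.

```python
import random


def suggest(space, rng):
    """Random search: propose one value per hyperparameter.

    A (low, high) tuple is treated as a numeric range and a list as a
    set of allowable values.
    """
    assignment = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            assignment[name] = rng.uniform(low, high)
        else:
            assignment[name] = rng.choice(spec)
    return assignment


def run_experiment(space, worker, goal, max_trials, seed=0):
    """Run Trials until the metric reaches the goal or max_trials is hit.

    Assumes the objective metric is maximized. Returns the best
    (assignment, metric) pair and the number of Trials that ran.
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(max_trials):
        assignment = suggest(space, rng)   # the Suggestion
        metric = worker(assignment)        # the worker job evaluates the Trial
        trials.append((assignment, metric))
        if metric >= goal:                 # objective reached, stop early
            break
    best = max(trials, key=lambda trial: trial[1])
    return best, len(trials)


def toy_worker(assignment):
    """Stand-in for a training job: the metric peaks at lr = 0.05."""
    return 1.0 - abs(assignment["lr"] - 0.05) * 10
```

With `space = {"lr": (0.01, 0.1)}`, the toy metric never exceeds 1.0, so a goal of 2.0 makes the loop run all `max_trials` Trials, whereas any goal the first Trial already meets stops the loop after one Trial.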
112
+ A Katib Trial is defined as a
113
+ [Kubernetes CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
114
+
115
+ ### Worker job
116
+
117
+ The _worker job_ is the process that runs to evaluate a Trial and calculate
118
+ its objective value.
119
+
120
+ The worker job can be any type of Kubernetes resource or
121
+ [Kubernetes CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/).
122
+ Follow the
123
+ [Trial template guide](/docs/components/katib/trial-template/#custom-resource)
124
+ to learn how to use your own Kubernetes resource in Katib.
125
+
126
+ Katib provides upstream examples for the following worker job types:
127
+
128
+ - [Kubernetes `Job`](https://kubernetes.io/docs/concepts/workloads/controllers/job/)
129
+
130
+ - [Kubeflow `TFJob`](/docs/components/training/tftraining/)
131
+
132
+ - [Kubeflow `PyTorchJob`](/docs/components/training/pytorch/)
133
+
134
+ - [Kubeflow `MXJob`](/docs/components/training/mxnet)
135
+
136
+ - [Kubeflow `XGBoostJob`](/docs/components/training/xgboost)
137
+
138
+ - [Kubeflow `MPIJob`](/docs/components/training/mpi)
139
+
140
+ - [Tekton `Pipelines`](https://github.com/kubeflow/katib/tree/master/examples/v1beta1/tekton)
141
+
142
+ - [Argo `Workflows`](https://github.com/kubeflow/katib/tree/master/examples/v1beta1/argo)
143
+
144
+ By offering the above worker job types, Katib supports multiple ML frameworks.
145
+
49
146
## Hyperparameters and hyperparameter tuning
50
147
51
148
_Hyperparameters_ are the variables that control the model training process.
@@ -82,12 +179,12 @@ layers, and the optimizer):
82
179
_(To run the example that produced this graph, follow the [getting-started
83
180
guide](/docs/components/katib/hyperparameter/).)_
84
181
85
- Katib runs several training jobs (known as _ trials _ ) within each
86
- hyperparameter tuning job (_ experiment _ ). Each trial tests a different set of
87
- hyperparameter configurations. At the end of the experiment , Katib outputs
182
+ Katib runs several training jobs (known as _Trials_) within each
183
+ hyperparameter tuning job (_Experiment_). Each Trial tests a different set of
184
+ hyperparameter configurations. At the end of the Experiment, Katib outputs
88
185
the optimized values for the hyperparameters.
89
186
90
- You can improve your hyperparameter tunning experiments by using
187
+ You can improve your hyperparameter tuning Experiments by using
91
188
[early stopping](https://en.wikipedia.org/wiki/Early_stopping) techniques.
92
189
Follow the [early stopping guide](/docs/components/katib/early-stopping/)
93
190
for the details.
@@ -125,7 +222,7 @@ part of the form for submitting a NAS job from the Katib UI:
125
222
126
223
You can use the following interfaces to interact with Katib:
127
224
128
- - A web UI that you can use to submit experiments and to monitor your results.
225
+ - A web UI that you can use to submit Experiments and to monitor your results.
129
226
Check the [getting-started
130
227
guide](/docs/components/katib/hyperparameter/#katib-ui)
131
228
for information on how to access the UI.
@@ -145,92 +242,6 @@ You can use the following interfaces to interact with Katib:
145
242
146
243
- Katib Python SDK. Check the [Katib Python SDK documentation on GitHub](https://github.com/kubeflow/katib/tree/master/sdk/python/v1beta1).
147
244
148
- ## Katib concepts
149
-
150
- This section describes the terms used in Katib.
151
-
152
- ### Experiment
153
-
154
- An _ experiment_ is a single tuning run, also called an optimization run.
155
-
156
- You specify configuration settings to define the experiment. The following are
157
- the main configurations:
158
-
159
- - ** Objective** : What you want to optimize. This is the objective metric, also
160
- called the target variable. A common metric is the model's accuracy
161
- in the validation pass of the training job (_ validation-accuracy_ ). You also
162
- specify whether you want the hyperparameter tuning job to _ maximize_ or
163
- _ minimize_ the metric.
164
-
165
- - ** Search space** : The set of all possible hyperparameter values that the
166
- hyperparameter tuning job should consider for optimization, and the
167
- constraints for each hyperparameter. Other names for search space include
168
- _ feasible set_ and _ solution space_ . For example, you may provide the
169
- names of the hyperparameters that you want to optimize. For each
170
- hyperparameter, you may provide a _ minimum_ and _ maximum_ value or a _ list_
171
- of allowable values.
172
-
173
- - ** Search algorithm** : The algorithm to use when searching for the optimal
174
- hyperparameter values.
175
-
176
- Katib experiment is defined as a
177
- [ Kubernetes CRD] ( https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/ ) .
178
-
179
- For details of how to define your experiment, follow the guide to [ running an
180
- experiment] ( /docs/components/katib/experiment/ ) .
181
-
182
- ### Suggestion
183
-
184
- A _ suggestion_ is a set of hyperparameter values that the hyperparameter
185
- tuning process has proposed. Katib creates a trial to evaluate the suggested
186
- set of values.
187
-
188
- Katib suggestion is defined as a
189
- [ Kubernetes CRD] ( https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/ ) .
190
-
191
- ### Trial
192
-
193
- A _ trial_ is one iteration of the hyperparameter tuning process. A trial
194
- corresponds to one worker job instance with a list of parameter assignments.
195
- The list of parameter assignments corresponds to a suggestion.
196
-
197
- Each experiment runs several trials. The experiment runs the trials until it
198
- reaches either the objective or the configured maximum number of trials.
199
-
200
- Katib trial is defined as a
201
- [ Kubernetes CRD] ( https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/ ) .
202
-
203
- ### Worker job
204
-
205
- The _ worker job_ is the process that runs to evaluate a trial and calculate
206
- its objective value.
207
-
208
- The worker job can be any type of Kubernetes resource or
209
- [ Kubernetes CRD] ( https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/ ) .
210
- Follow the
211
- [ trial template guide] ( /docs/components/katib/trial-template/#custom-resource )
212
- to check how to support your own Kubernetes resource in Katib.
213
-
214
- Katib has these CRD examples in upstream:
215
-
216
- - [ Kubernetes ` Job ` ] ( https://kubernetes.io/docs/concepts/workloads/controllers/job/ )
217
-
218
- - [ Kubeflow ` TFJob ` ] ( /docs/components/training/tftraining/ )
219
-
220
- - [ Kubeflow ` PyTorchJob ` ] ( /docs/components/training/pytorch/ )
221
-
222
- - [ Kubeflow ` MXJob ` ] ( /docs/components/training/mxnet )
223
-
224
- - [ Kubeflow ` XGBoostJob ` ] ( /docs/components/training/xgboost )
225
-
226
- - [ Kubeflow ` MPIJob ` ] ( /docs/components/training/mpi )
227
-
228
- - [ Tekton ` Pipelines ` ] ( https://github.com/kubeflow/katib/tree/master/examples/v1beta1/tekton )
229
-
230
- - [ Argo ` Workflows ` ] ( https://github.com/kubeflow/katib/tree/master/examples/v1beta1/argo )
231
-
232
- By offering the above worker job types, Katib supports multiple ML frameworks.
233
-
234
245
## Next steps
235
246
236
247
Follow the [getting-started guide](/docs/components/katib/hyperparameter/)