Commit 8177767

sarahmaddox authored and k8s-ci-robot committed
Expanded docs for the Metadata component (kubeflow#1061)
* WIP Expanded docs for Metadata component.
* Added more about metadata.
* Addressed review comments.
1 parent 5b6caa1 commit 8177767

6 files changed

+162
-22
lines changed


content/docs/components/misc/metadata.md

+162-22
@@ -4,43 +4,183 @@ description = "Tracking and managing metadata of machine learning workflows in K
44
weight = 5
55
+++
66

7-
The goal of the [Metadata](https://github.com/kubeflow/metadata) project is to help Kubeflow users understand and manage their machine learning workflows by tracking and managing the metadata of workflows.
7+
The goal of the [Metadata](https://github.com/kubeflow/metadata) project is to
8+
help Kubeflow users understand and manage their machine learning (ML) workflows
9+
by tracking and managing the metadata that the workflows produce.
810

9-
## Installation
11+
In this context, _metadata_ means information about executions (runs), models,
12+
datasets, and other artifacts. _Artifacts_ are the files and objects that form
13+
the inputs and outputs of the components in your ML workflow.
1014

11-
The Metadata component is installed by default for Kubeflow versions >= 0.6.1.
15+
{{% alert title="Alpha version" color="warning" %}}
16+
This is an <b>alpha</b> release of the Metadata API. The next version of Kubeflow
17+
will introduce breaking changes. The development team is interested in any
18+
feedback you have while using the Metadata component, and in particular your
19+
feedback on any gaps in the functionality that the component offers.
20+
{{% /alert %}}
1221

13-
If you want to install the latest version of the Metadata component or install it as an application in your Kubernetes cluster, you can follow these steps:
22+
## Installing the Metadata component
1423

15-
1. Download the Kubeflow manifests repository.
16-
```
17-
git clone https://github.com/kubeflow/manifests
18-
```
24+
Kubeflow v0.6.1 and later versions install the Metadata component by default.
25+
You can skip this section if you are running Kubeflow v0.6.1 or later.
1926

20-
2. Run the following commands in the manifest repository to deploy services of the Metadata component.
21-
```
22-
cd manifests/metadata/base
23-
kustomize build . | kubectl apply -n kubeflow -f -
24-
```
27+
If you want to install the latest version of the Metadata component or to
28+
install the component as an application in your Kubernetes cluster, follow these
29+
steps:
30+
31+
1. Download the Kubeflow manifests repository:
2532

26-
## Python Library
33+
```
34+
git clone https://github.com/kubeflow/manifests
35+
```
2736
28-
The Metadata project publishes a [Python library](https://github.com/kubeflow/metadata/tree/master/sdk/python#python-client) for logging metadata.
37+
2. Run the following commands to deploy the services of the Metadata component:
38+
39+
```
40+
cd manifests/metadata/base
41+
kustomize build . | kubectl apply -n kubeflow -f -
42+
```
43+
44+
## Using the Metadata SDK to record metadata
45+
46+
The Metadata project publishes a
47+
[Python library (SDK)](https://github.com/kubeflow/metadata/tree/master/sdk/python#python-client)
48+
that you can use to log (record) your metadata.
49+
50+
Run the following command to install the Metadata SDK:
2951
30-
You can install it via the following command:
3152
```
3253
pip install kfmd
3354
```
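After running the install command, you may want to confirm that the package is visible to your Python environment. The following is a minimal sketch that uses only the Python standard library (`importlib.metadata`, Python 3.8 or later). The helper name `installed_version` is hypothetical and is not part of the Metadata SDK.

```python
from importlib import metadata as importlib_metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed.

    Hypothetical helper for illustration; not part of the Metadata SDK.
    """
    try:
        return importlib_metadata.version(package_name)
    except importlib_metadata.PackageNotFoundError:
        return None


# "kfmd" is the package name used in the pip command above.
version = installed_version("kfmd")
if version is None:
    print("kfmd is not installed in this environment")
else:
    print(f"kfmd {version} is installed")
```

If the check reports that the package is missing, make sure that `pip` and the Python interpreter you are running belong to the same environment.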
3455
35-
To help you describe your ML workflows, the Python library has [predefined types](https://github.com/kubeflow/metadata/tree/master/schema) to capture models, datasets, evaluation metrics, and executions.
56+
<a id="demo-notebook"></a>
57+
### Try the Metadata SDK in a sample Jupyter notebook
58+
59+
You can find an example of how to use the Metadata SDK in this
60+
[`demo` notebook](https://github.com/kubeflow/metadata/blob/master/sdk/python/demo.ipynb).
61+
62+
To run the notebook in your Kubeflow cluster:
63+
64+
1. Follow the guide to
65+
[setting up your Jupyter notebooks in Kubeflow](/docs/notebooks/setup/).
66+
1. Go to the [`demo` notebook on
67+
GitHub](https://github.com/kubeflow/metadata/blob/master/sdk/python/demo.ipynb).
68+
1. Download the notebook code by opening the **Raw** view of the file, then
69+
right-clicking on the content and saving the file locally as `demo.ipynb`.
70+
1. Go back to your Jupyter notebook server in the Kubeflow UI. (If you've
71+
moved away from the notebooks section in Kubeflow, click
72+
**Notebook Servers** in the left-hand navigation panel to get back there.)
73+
1. In the Jupyter notebook UI, click **Upload** and follow the prompts to upload
74+
the `demo.ipynb` notebook.
75+
1. Click the notebook name (`demo.ipynb`) to open the notebook in your Kubeflow
76+
cluster.
77+
1. Run the steps in the notebook to install and use the Metadata SDK.
78+
79+
When you have finished running through the steps in the `demo.ipynb` notebook,
80+
you can view the resulting metadata on the Kubeflow UI:
81+
82+
1. Click **Artifact Store** in the left-hand navigation panel on the Kubeflow
83+
UI.
84+
1. On the **Artifacts** screen you should see the following items:
85+
86+
* A **model** metadata item with the name **MNIST**.
87+
* A **metrics** metadata item with the name **MNIST-evaluation**.
88+
* A **dataset** metadata item with the name **mytable-dump**.
89+
90+
You can click the name of each item to view the details. See the section
91+
below about the [Metadata UI](#metadata-ui) for more details.
92+
93+
### Learn more about the Metadata SDK
94+
95+
The Metadata SDK includes the following
96+
[predefined types](https://github.com/kubeflow/metadata/tree/master/schema)
97+
that you can use to describe your ML workflows:
98+
99+
* [`data_set.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/artifacts/data_set.json)
100+
to capture metadata for a dataset that forms the input into or the output of
101+
a component in your workflow.
102+
* [`execution.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/execution.json)
103+
to capture metadata for an execution (run) of your ML workflow.
104+
* [`metrics.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/artifacts/metrics.json)
105+
to capture metadata for the metrics used to evaluate an ML model.
106+
* [`model.json`](https://github.com/kubeflow/metadata/blob/master/schema/alpha/artifacts/model.json)
107+
to capture metadata for an ML model that your workflow produces.
36108
37-
You can find an example of how to use the logging API in this [notebook](https://github.com/kubeflow/metadata/blob/master/sdk/python/demo.ipynb).
109+
<a id="metadata-ui"></a>
110+
## Tracking artifacts on the Metadata UI
38111
39-
## Backend
112+
You can view a list of logged artifacts and the details of each individual
113+
artifact in the **Artifact Store** on the Kubeflow UI.
40114
41-
The backend uses [ML-Metadata](https://github.com/google/ml-metadata/blob/master/g3doc/get_started.md) to manage all the metadata and relations. It exposes a [REST API](/docs/reference/metadata/v1alpha1/kubeflow-metadata-api-spec/).
115+
1. Go to Kubeflow in your browser. (If you haven't yet opened the
116+
Kubeflow UI, find out how to [access the
117+
Kubeflow UIs](https://www.kubeflow.org/docs/other-guides/accessing-uis/).)
118+
1. Click **Artifact Store** in the left-hand navigation panel:
119+
<img src="/docs/images/metadata-ui-option.png"
120+
alt="Metadata UI"
121+
class="mt-3 mb-3 border border-info rounded">
42122
43-
## UI
123+
1. The **Artifacts** screen opens and displays a list of items for all the
124+
metadata events that your workflows have logged. You can click the name of
125+
each item to view the details.
126+
127+
The following examples show the items that appear when you run the
128+
`demo.ipynb` notebook described [above](#demo-notebook):
129+
130+
<img src="/docs/images/metadata-artifacts-list.png"
131+
alt="A list of metadata items"
132+
class="mt-3 mb-3 border border-info rounded">
133+
134+
* Example of **model** metadata with the name "MNIST":
135+
136+
<img src="/docs/images/metadata-model.png"
137+
alt="Model metadata for an example MNIST model"
138+
class="mt-3 mb-3 border border-info rounded">
139+
140+
* Example of **metrics** metadata with the name "MNIST-evaluation":
141+
142+
<img src="/docs/images/metadata-metrics.png"
143+
alt="Metrics metadata for an evaluation of an MNIST model"
144+
class="mt-3 mb-3 border border-info rounded">
145+
146+
* Example of **dataset** metadata with the name "mytable-dump":
147+
148+
<img src="/docs/images/metadata-dataset.png"
149+
alt="Dataset metadata"
150+
class="mt-3 mb-3 border border-info rounded">
151+
152+
153+
154+
## Backend and REST API
155+
156+
The Kubeflow metadata backend uses [ML Metadata
157+
(MLMD)](https://github.com/google/ml-metadata/blob/master/g3doc/get_started.md)
158+
to manage the metadata and relationships.
159+
160+
The backend exposes a
161+
[REST API](/docs/reference/metadata/v1alpha1/kubeflow-metadata-api-spec/).
162+
163+
You can add your own metadata types so that you can log metadata for custom
164+
artifacts. To add a custom type, send a REST API request to the
165+
[`artifact_types` endpoint](/docs/reference/metadata/v1alpha1/kubeflow-metadata-api-spec/#operation--api-v1alpha1-artifact_types-post).
166+
167+
For example, the following request registers an artifact type with
168+
_name_ `myorg/mytype/v1` and three _properties_:
169+
170+
* `f1` (string)
171+
* `f2` (integer)
172+
* `f3` (double)
173+
174+
```
175+
curl -X POST http://localhost:8080/api/v1alpha1/artifact_types \
176+
--header "Content-Type: application/json" -d \
177+
'{"name":"myorg/mytype/v1","properties":{"f1":"STRING", "f2":"INT", "f3": "DOUBLE"}}'
178+
```
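The same request can be sketched in Python using only the standard library. This example builds the request without sending it, so you can inspect it first. It assumes the same host and port as the `curl` command above (`http://localhost:8080`), and the helper name `build_artifact_type_request` is hypothetical rather than part of the Metadata SDK.

```python
import json
import urllib.request

# Assumption: the Metadata REST API is reachable at the same address
# that the curl example uses.
BASE_URL = "http://localhost:8080"


def build_artifact_type_request(name, properties, base_url=BASE_URL):
    """Build (but do not send) a POST request that registers an artifact type.

    Hypothetical helper for illustration; not part of the Metadata SDK.
    """
    body = json.dumps({"name": name, "properties": properties}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1alpha1/artifact_types",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_artifact_type_request(
    "myorg/mytype/v1",
    {"f1": "STRING", "f2": "INT", "f3": "DOUBLE"},
)

# To send the request, uncomment the following lines:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to check the URL and JSON body before anything reaches the backend.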
44179
45-
You can view a list of logged artifacts and the details of each individual artifact via the _Artifact Store_ on [Kubeflow UIs](https://www.kubeflow.org/docs/other-guides/accessing-uis/).
180+
## Next steps
46181
182+
Run the
183+
[xgboost-synthetic notebook](https://github.com/kubeflow/examples/tree/master/xgboost_synthetic)
184+
to build, train, and deploy an XGBoost model using Kubeflow Fairing and Kubeflow
185+
Pipelines with synthetic data. Examine the metadata output after running
186+
through the steps in the notebook.