Commit beefae9

OfficePop authored and k8s-ci-robot committed
PR for Issue 795 (outdated Pipelines SDK guide) (kubeflow#971)
* Update pipelines-tutorial.md
* Add files via upload
* Update kubeflow-current-version.html
* Delete kubeflow-current-version.html
  Unnecessary, kf-stable-tag already serves this purpose.
* Update pipelines-tutorial.md
* Update pipelines-tutorial.md
* Update pipelines-tutorial.md
* Update build-component.md
* Update build-component.md
* Update build-component.md
* Update build-component.md
* Update build-component.md
* Update build-component.md
1 parent 2f5356c commit beefae9

File tree: 1 file changed, +41 −34 lines changed


content/docs/pipelines/sdk/build-component.md

````diff
@@ -60,57 +60,64 @@ local file, such as `/output.txt`. In the Python class that defines your
 pipeline (see [below](#define-pipeline)) you can
 specify how to map the content of local files to component outputs.
 
-## Create a Python class for your component
+## Create a Python function to wrap your component
 
-Define a Python class to describe the interactions with the Docker container
+Define a Python function to describe the interactions with the Docker container
 image that contains your pipeline component. For example, the following
-Python class describes a component that trains an XGBoost model:
+Python function describes a component that trains an XGBoost model:
 
 ```python
-class TrainerOp(dsl.ContainerOp):
-
-  def __init__(self, name, project, region, cluster_name, train_data, eval_data,
-               target, analysis, workers, rounds, output, is_classification=True):
+def dataproc_train_op(
+    project,
+    region,
+    cluster_name,
+    train_data,
+    eval_data,
+    target,
+    analysis,
+    workers,
+    rounds,
+    output,
+    is_classification=True
+):
     if is_classification:
      config='gs://ml-pipeline-playground/trainconfcla.json'
     else:
      config='gs://ml-pipeline-playground/trainconfreg.json'
 
-    super(TrainerOp, self).__init__(
-        name=name,
-        image='gcr.io/ml-pipeline/ml-pipeline-dataproc-train:7775692adf28d6f79098e76e839986c9ee55dd61',
-        arguments=[
-          '--project', project,
-          '--region', region,
-          '--cluster', cluster_name,
-          '--train', train_data,
-          '--eval', eval_data,
-          '--analysis', analysis,
-          '--target', target,
-          '--package', 'gs://ml-pipeline-playground/xgboost4j-example-0.8-SNAPSHOT-jar-with-dependencies.jar',
-          '--workers', workers,
-          '--rounds', rounds,
-          '--conf', config,
-          '--output', output,
-        ],
-        file_outputs={'output': '/output.txt'})
+    return dsl.ContainerOp(
+        name='Dataproc - Train XGBoost model',
+        image='gcr.io/ml-pipeline/ml-pipeline-dataproc-train:ac833a084b32324b56ca56e9109e05cde02816a4',
+        arguments=[
+            '--project', project,
+            '--region', region,
+            '--cluster', cluster_name,
+            '--train', train_data,
+            '--eval', eval_data,
+            '--analysis', analysis,
+            '--target', target,
+            '--package', 'gs://ml-pipeline-playground/xgboost4j-example-0.8-SNAPSHOT-jar-with-dependencies.jar',
+            '--workers', workers,
+            '--rounds', rounds,
+            '--conf', config,
+            '--output', output,
+        ],
+        file_outputs={
+            'output': '/output.txt',
+        }
+    )
 
 ```
 
-The above class is an extract from the
+The function must return a `dsl.ContainerOp`. The function above is an extract from the
 [XGBoost Spark pipeline sample](https://github.com/kubeflow/pipelines/blob/master/samples/xgboost-spark/xgboost-training-cm.py).
 
 Note:
 
 * Each component must inherit from
   [`dsl.ContainerOp`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_container_op.py).
-* In the `init` arguments, you can include Python native types (such as `str`
-  and `int`) and
-  [`dsl.PipelineParam`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_pipeline_param.py)
-  types. Each `dsl.PipelineParam` represents a parameter whose value is usually
-  only known at run time. The parameter can be one for which the user provides
-  a value at pipeline run time, or it can be an output from an upstream
-  component.
+* Values in the `arguments` list used by the `dsl.ContainerOp` constructor above must be either Python scalar types (such as `str` and `int`) or [`dsl.PipelineParam`](https://github.com/kubeflow/pipelines/blob/master/sdk/python/kfp/dsl/_pipeline_param.py) types. Each `dsl.PipelineParam` represents a parameter whose value is usually only known at run time. The value is
+  either provided by the user at pipeline run time or received as an output from an upstream component.
 * Although the value of each `dsl.PipelineParam` is only available at run time,
   you can still use the parameters inline in the `arguments` by using `%s`
   variable substitution. At run time the argument contains the value of the
````
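The trailing context of this hunk mentions `%s` variable substitution for `dsl.PipelineParam` values. As a minimal sketch of that technique, assuming the `kfp.dsl` API of this SDK generation (the `echo_path_op` helper, component name, and image are hypothetical, chosen only for illustration):

```python
import kfp.dsl as dsl


def echo_path_op(train_data):
    # `train_data` may be a dsl.PipelineParam. Its concrete value is known
    # only at run time, but '%s' % train_data embeds a placeholder string
    # that the pipeline system replaces with the actual value at run time.
    return dsl.ContainerOp(
        name='Echo training data path',  # hypothetical component
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "training data: %s"' % train_data],
    )
```

The second hunk of the diff updates the corresponding usage snippet: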
````diff
@@ -121,7 +128,7 @@ Note:
   component. To reference the output in code:
 
   ```python
-  op = TrainerOp(...)
+  op = dataproc_train_op(...)
   op.outputs['label']
   ```
 
````
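For context, here is a minimal sketch of how a wrapper function such as `dataproc_train_op` is wired into a pipeline and compiled, assuming the function from the diff above is defined in the same module. The pipeline name, parameter defaults, downstream consumer, and output filename are assumptions for illustration, not part of this commit:

```python
import kfp.dsl as dsl
import kfp.compiler as compiler


@dsl.pipeline(
    name='XGBoost training',  # hypothetical pipeline name
    description='Sketch wiring the dataproc_train_op wrapper into a pipeline.'
)
def xgb_train_pipeline(
        project='my-gcp-project',  # placeholder defaults; each parameter
        region='us-central1',      # becomes a dsl.PipelineParam at compile time
        cluster_name='xgb-cluster',
        train_data='gs://my-bucket/train.csv',
        eval_data='gs://my-bucket/eval.csv',
        target='label',
        analysis='gs://my-bucket/analysis',
        workers='2',
        rounds='200',
        output='gs://my-bucket/output'):
    train = dataproc_train_op(project, region, cluster_name, train_data,
                              eval_data, target, analysis, workers, rounds,
                              output)
    # Downstream steps consume the declared file output by key, e.g.:
    #   predict = my_predict_op(model=train.outputs['output'])  # hypothetical


# Compile to an archive that can be uploaded through the Kubeflow Pipelines UI.
compiler.Compiler().compile(xgb_train_pipeline, 'xgb_train_pipeline.tar.gz')
```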
