In order to create pipeline steps and eventually construct a SageMaker pipeline, you provide parameters within a Python script or notebook. The SageMaker Python SDK creates a pipeline definition by translating these parameters into SageMaker job attributes. Some of these attributes, when changed, cause the step to re-run (see `Caching Pipeline Steps <https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html>`__ for a detailed list). Therefore, if you update an SDK parameter that is used to create such an attribute, the step will re-run. See the following discussion for examples of this in commonly used step types in Pipelines.
The following example creates a processing step:
The following parameters from the example cause additional training step iterations when you change them:
- :code:`entry_point`: The entry point file is included in the training job’s `InputDataConfig Channel <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ array. A unique hash is created from the file (and any other dependencies), and then the file is uploaded to S3 with the hash included in the path. When a different entry point file is used, a new hash is created and the S3 path for that `InputDataConfig Channel <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ object changes, initiating a new step run. For examples of what the S3 paths look like, see the **S3 Artifact Folder Structure** section.
- :code:`inputs`: The inputs are also included in the training job’s `InputDataConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__. Local inputs are uploaded to S3. If the S3 path changes, a new training job is initiated. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
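As an illustration of the hashing behavior described above, the following sketch (plain Python, not the SDK's actual implementation; the key prefix and helper name are hypothetical) shows how embedding a content hash in the S3 key makes any edit to the entry point file change the upload path, and therefore the step's cache key:

```python
import hashlib

def artifact_s3_key(file_bytes: bytes, filename: str) -> str:
    # Hypothetical helper: derive an S3 key that embeds a hash of the
    # file's contents, mimicking how a changed file yields a new path.
    digest = hashlib.sha256(file_bytes).hexdigest()[:16]
    return f"pipeline-code/{digest}/{filename}"

key_v1 = artifact_s3_key(b"print('train v1')", "train.py")
key_v2 = artifact_s3_key(b"print('train v2')", "train.py")

# Same filename, different contents -> different S3 keys, so the
# training step's InputDataConfig changes and the step re-runs.
assert key_v1 != key_v2
```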
The following parameters from the example cause additional tuning (or training) step iterations when you change them:
- :code:`image_uri`: The :code:`image_uri` parameter defines the image used for training, and is used directly in the `AlgorithmSpecification <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html>`__ attribute of the training job(s) that are created from the tuning job.
- :code:`hyperparameters`: All of the hyperparameters passed in the :code:`xgb_train.set_hyperparameters()` method are used directly in the `StaticHyperParameters <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html>`__ attribute for the tuning job.
- The following parameters are all included in the `HyperParameterTuningJobConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html>`__ and if any one of them changes, a new tuning job is initiated:

  - :code:`hyperparameter_ranges`
  - :code:`objective_metric_name`
  - :code:`max_jobs`
  - :code:`max_parallel_jobs`
  - :code:`strategy`
  - :code:`objective_type`
- :code:`inputs`: The inputs are included in any training job’s `InputDataConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ that is created from the tuning job. Local inputs are uploaded to S3. If the S3 path changes, a new tuning job is initiated. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
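To make the cache-key idea above concrete, here is a small sketch (plain Python, not the SDK's internal logic; the field values are hypothetical) that hashes the fields feeding the tuning job configuration, so changing any one of them produces a different key:

```python
import hashlib
import json

def tuning_cache_key(config: dict) -> str:
    # Hypothetical sketch: serialize the tuning-job fields canonically
    # and hash them; any changed field yields a different key.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

base_config = {
    "objective_metric_name": "validation:rmse",
    "max_jobs": 10,
    "max_parallel_jobs": 2,
    "strategy": "Bayesian",
    "objective_type": "Minimize",
}
# Changing a single field (here max_jobs) changes the key,
# which is why a new tuning job is initiated.
changed_config = dict(base_config, max_jobs=20)

assert tuning_cache_key(base_config) != tuning_cache_key(changed_config)
```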
The following example creates a transform step:
.. code-block:: python

    from sagemaker.transformer import Transformer
    from sagemaker.inputs import TransformInput
    from sagemaker.workflow.steps import TransformStep
The following parameters from the example cause additional batch transform step iterations when you change them:
- :code:`model_name`: The name of the SageMaker model being used for the transform job.
- :code:`env`: Environment variables to be set for use during the transform job.
- :code:`batch_data`: The input data will be included in the transform job’s `TransformInput <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TransformInput.html>`__ field. If the S3 path changes, a new transform job is initiated.
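The effect of these parameters can be sketched as a simple comparison of the request fields a transform step is built from (an illustration only; the model name, environment values, and S3 URI below are hypothetical):

```python
def transform_request(model_name: str, env: dict, batch_data: str) -> dict:
    # Hypothetical sketch of the fields a transform job request is
    # assembled from; the step is effectively cached on these values.
    return {
        "ModelName": model_name,
        "Environment": env,
        "TransformInput": {"S3Uri": batch_data},
    }

cached = transform_request("my-model", {"STAGE": "test"}, "s3://my-bucket/batch/")
updated = transform_request("my-model", {"STAGE": "prod"}, "s3://my-bucket/batch/")

# Any difference in these fields means the cached step result cannot
# be reused and a new transform job runs.
assert cached != updated
```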
The following example creates an automl step:
.. code-block:: python

    from sagemaker.workflow.pipeline_context import PipelineSession
    from sagemaker.workflow.automl_step import AutoMLStep
The following parameters from the example cause additional automl step iterations when you change them:
- :code:`target_attribute_name`: The name of the target variable in supervised learning.
- :code:`mode`: The method that the AutoML job uses to train the model: either :code:`AUTO`, :code:`ENSEMBLING`, or :code:`HYPERPARAMETER_TUNING`.
- :code:`inputs`: The inputs passed to the :code:`auto_ml.fit()` method are included in the automl job’s `InputDataConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLChannel.html>`__. If the included S3 path(s) change, a new automl job is initiated.
The following example creates an EMR step:
.. code-block:: python

    from sagemaker.workflow.emr_step import EMRStep, EMRStepConfig

    emr_config = EMRStepConfig(
        jar="jar-location", # required, path to jar file used
        args=["--verbose", "--force"], # optional list of arguments to pass to the jar
        main_class="com.my.Main1", # optional main class, this can be omitted if jar above has a manifest
        properties=[ # optional list of Java properties that are set when the step runs
            {"Key": "mapred.tasktracker.map.tasks.maximum", "Value": "2"}
        ]
    )

    step_emr = EMRStep(
        name="EMRSampleStep", # required
        cluster_id="j-1ABCDEFG2HIJK", # include cluster_id to use a running cluster
        step_config=emr_config, # required
        display_name="My EMR Step",
        description="Pipeline step to execute EMR job"
    )
The following parameters from the example cause additional EMR step iterations when you change them:
- :code:`cluster_id`: The ID of a running cluster to use for the EMR job.
- :code:`emr_config`: Configuration regarding the code that will run on the EMR cluster during the job.
.. note::
    A :code:`cluster_config` parameter may also be passed into :code:`EMRStep` in order to spin up a new cluster. This parameter will also trigger additional step iterations if changed.