Commit 1f09c08

documentation: update pipelines step caching examples to include more steps (#5121)

Authored by Brock Wade
Co-authored-by: Brock Wade <[email protected]>
1 parent 28e07cf commit 1f09c08

File tree

1 file changed: +213 / -1 lines changed

doc/amazon_sagemaker_model_building_pipeline.rst

@@ -930,7 +930,7 @@ Caching is supported for the following step types:

- :class:`sagemaker.workflow.clarify_check_step.ClarifyCheckStep`
- :class:`sagemaker.workflow.emr_step.EMRStep`

In order to create pipeline steps and eventually construct a SageMaker pipeline, you provide parameters within a Python script or notebook. The SageMaker Python SDK creates a pipeline definition by translating these parameters into SageMaker job attributes. Some of these attributes, when changed, cause the step to re-run (see `Caching Pipeline Steps <https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html>`__ for a detailed list). Therefore, if you update an SDK parameter that is used to create such an attribute, the step reruns. See the following discussion for examples of this in commonly used step types in Pipelines.
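As a mental model (not the SDK's or the service's actual implementation), you can think of a cached step result as being keyed by the job attributes the SDK generates from your parameters. The sketch below is purely illustrative — the helper name :code:`step_cache_key` and the hashing scheme are hypothetical; SageMaker performs cache matching server-side:

```python
import hashlib
import json

def step_cache_key(step_type: str, job_attributes: dict) -> str:
    # Hypothetical sketch: serialize the generated job attributes
    # deterministically and hash them. This only models why changing
    # any cache-relevant attribute produces a fresh step run.
    canonical = json.dumps(job_attributes, sort_keys=True)
    return hashlib.sha256(f"{step_type}:{canonical}".encode()).hexdigest()

key_a = step_cache_key("Training", {"ImageUri": "image:1", "InstanceType": "ml.m5.xlarge"})
key_b = step_cache_key("Training", {"ImageUri": "image:1", "InstanceType": "ml.m5.2xlarge"})
assert key_a != key_b  # a changed job attribute means a cache miss, so the step reruns
```

Attributes that do not feed into the generated job request (for example, a step's display name) do not affect this matching, which is why only the parameters called out in the sections below trigger new step runs.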
The following example creates a processing step:
@@ -1055,6 +1055,218 @@ The following parameters from the example cause additional training step iterations when you change them:

- :code:`entry_point`: The entry point file is included in the training job’s `InputDataConfig Channel <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ array. A unique hash is created from the file (and any other dependencies), and then the file is uploaded to S3 with the hash included in the path. When a different entry point file is used, a new hash is created and the S3 path for that `InputDataConfig Channel <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ object changes, initiating a new step run. For examples of what the S3 paths look like, see the **S3 Artifact Folder Structure** section.
- :code:`inputs`: The inputs are also included in the training job’s `InputDataConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__. Local inputs are uploaded to S3. If the S3 path changes, a new training job is initiated. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
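The content-hash-in-path mechanism described above can be sketched in a few lines. This is illustrative only — the function name, the choice of SHA-256, and the path layout are assumptions, and the SDK's actual folder layout is documented in the **S3 Artifact Folder Structure** section:

```python
import hashlib

def code_artifact_uri(bucket: str, prefix: str, entry_point_bytes: bytes) -> str:
    # Illustrative sketch: a content hash of the entry point (and any
    # dependencies) is embedded in the upload path, so editing the file
    # yields a different S3 URI in the InputDataConfig Channel.
    code_hash = hashlib.sha256(entry_point_bytes).hexdigest()
    return f"s3://{bucket}/{prefix}/{code_hash}/sourcedir.tar.gz"

uri_v1 = code_artifact_uri("my-bucket", "my-pipeline/code", b"print('v1')")
uri_v2 = code_artifact_uri("my-bucket", "my-pipeline/code", b"print('v2')")
assert uri_v1 != uri_v2  # new hash -> new S3 path -> the step reruns
```

Because the hash is derived from file contents, re-running a pipeline with an unchanged entry point reproduces the same S3 path, which is what allows the cached step result to be reused.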

The following example creates a tuning step:

.. code-block:: python

    from sagemaker.workflow.steps import TuningStep
    from sagemaker.tuner import ContinuousParameter, HyperparameterTuner
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    model_path = f"s3://{default_bucket}/{base_job_prefix}/AbaloneTrain"

    xgb_train = Estimator(
        image_uri=image_uri,
        instance_type=training_instance_type,
        instance_count=1,
        output_path=model_path,
        base_job_name=f"{base_job_prefix}/abalone-train",
        sagemaker_session=pipeline_session,
        role=role,
    )

    xgb_train.set_hyperparameters(
        eval_metric="rmse",
        objective="reg:squarederror",  # define the objective metric for the training job
        num_round=50,
        max_depth=5,
        eta=0.2,
        gamma=4,
        min_child_weight=6,
        subsample=0.7,
        silent=0,
    )

    objective_metric_name = "validation:rmse"

    hyperparameter_ranges = {
        "alpha": ContinuousParameter(0.01, 10, scaling_type="Logarithmic"),
        "lambda": ContinuousParameter(0.01, 10, scaling_type="Logarithmic"),
    }

    tuner = HyperparameterTuner(
        xgb_train,
        objective_metric_name,
        hyperparameter_ranges,
        max_jobs=3,
        max_parallel_jobs=3,
        strategy="Random",
        objective_type="Minimize",
    )

    hpo_args = tuner.fit(
        inputs={
            "train": TrainingInput(
                s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
                content_type="text/csv",
            ),
            "validation": TrainingInput(
                s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                    "validation"
                ].S3Output.S3Uri,
                content_type="text/csv",
            ),
        }
    )

    step_tuning = TuningStep(
        name="HPTuning",
        step_args=hpo_args,
        cache_config=cache_config,
    )
The following parameters from the example cause additional tuning (or training) step iterations when you change them:

- :code:`image_uri`: The :code:`image_uri` parameter defines the image used for training, and is used directly in the `AlgorithmSpecification <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html>`__ attribute of the training job(s) that are created from the tuning job.
- :code:`hyperparameters`: All of the hyperparameters passed in the :code:`xgb_train.set_hyperparameters()` method are used directly in the `StaticHyperParameters <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTrainingJobDefinition.html>`__ attribute for the tuning job.
- The following parameters are all included in the `HyperParameterTuningJobConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobConfig.html>`__, and if any one of them changes, a new tuning job is initiated:

  - :code:`hyperparameter_ranges`
  - :code:`objective_metric_name`
  - :code:`max_jobs`
  - :code:`max_parallel_jobs`
  - :code:`strategy`
  - :code:`objective_type`

- :code:`inputs`: The inputs are included in the `InputDataConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ of any training job created from the tuning job. Local inputs are uploaded to S3. If the S3 path changes, a new tuning job is initiated. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
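One way to reason about the list above is to diff the generated tuning configuration between two pipeline definitions. The helper below is hypothetical (the field names follow the shape of the ``HyperParameterTuningJobConfig`` API; the comparison itself is just an illustration of how any changed field invalidates the cached result):

```python
def changed_tuning_fields(old_config: dict, new_config: dict) -> set:
    # Hypothetical helper: report which top-level fields of a
    # HyperParameterTuningJobConfig-style dict differ between two
    # definitions; a non-empty result means the cached tuning
    # result cannot be reused.
    keys = old_config.keys() | new_config.keys()
    return {key for key in keys if old_config.get(key) != new_config.get(key)}

old = {
    "Strategy": "Random",
    "HyperParameterTuningJobObjective": {"Type": "Minimize", "MetricName": "validation:rmse"},
    "ResourceLimits": {"MaxNumberOfTrainingJobs": 3, "MaxParallelTrainingJobs": 3},
}
new = dict(old, Strategy="Bayesian")  # e.g. switching strategy="Random" to "Bayesian"
assert changed_tuning_fields(old, new) == {"Strategy"}
```

Identical definitions yield an empty set, which corresponds to the case where the cached tuning step result is reused.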
The following example creates a transform step:

.. code-block:: python

    import sagemaker
    from sagemaker.transformer import Transformer
    from sagemaker.inputs import TransformInput
    from sagemaker.workflow.parameters import ParameterString
    from sagemaker.workflow.steps import TransformStep

    base_uri = f"s3://{default_bucket}/abalone"
    batch_data_uri = sagemaker.s3.S3Uploader.upload(
        local_path=local_path,
        desired_s3_uri=base_uri,
    )

    batch_data = ParameterString(
        name="BatchData",
        default_value=batch_data_uri,
    )

    transformer = Transformer(
        model_name=step_create_model.properties.ModelName,
        instance_type="ml.m5.xlarge",
        instance_count=1,
        output_path=f"s3://{default_bucket}/AbaloneTransform",
        env={
            "class": "Transformer"
        },
    )

    step_transform = TransformStep(
        name="AbaloneTransform",
        step_args=transformer.transform(
            data=batch_data,
            data_type="S3Prefix",
        ),
    )
The following parameters from the example cause additional batch transform step iterations when you change them:

- :code:`model_name`: The name of the SageMaker model being used for the transform job.
- :code:`env`: The environment variables set for use during the transform job.
- :code:`batch_data`: The input data is included in the transform job’s `TransformInput <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TransformInput.html>`__ field. If the S3 path changes, a new transform job is initiated.
The following example creates an AutoML step:

.. code-block:: python

    from sagemaker.automl.automl import AutoML, AutoMLInput
    from sagemaker.workflow.pipeline_context import PipelineSession
    from sagemaker.workflow.automl_step import AutoMLStep

    pipeline_session = PipelineSession()

    auto_ml = AutoML(...,
        role=role,
        target_attribute_name="my_target_attribute_name",
        mode="ENSEMBLING",
        sagemaker_session=pipeline_session)

    input_training = AutoMLInput(
        inputs="s3://amzn-s3-demo-bucket/my-training-data",
        target_attribute_name="my_target_attribute_name",
        channel_type="training",
    )
    input_validation = AutoMLInput(
        inputs="s3://amzn-s3-demo-bucket/my-validation-data",
        target_attribute_name="my_target_attribute_name",
        channel_type="validation",
    )

    step_args = auto_ml.fit(
        inputs=[input_training, input_validation]
    )

    step_automl = AutoMLStep(
        name="AutoMLStep",
        step_args=step_args,
    )

    best_model = step_automl.get_best_auto_ml_model(role=<role>)
The following parameters from the example cause additional AutoML step iterations when you change them:

- :code:`target_attribute_name`: The name of the target variable in supervised learning.
- :code:`mode`: The method that the AutoML job uses to train the model: either :code:`AUTO`, :code:`ENSEMBLING`, or :code:`HYPERPARAMETER_TUNING`.
- :code:`inputs`: The inputs passed to the :code:`auto_ml.fit()` method are included in the AutoML job’s `InputDataConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLChannel.html>`__. If the included S3 path(s) change, a new AutoML job is initiated.
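The mapping from SDK-level inputs to job attributes can be sketched for the AutoML case as well. The helper below is hypothetical — it only illustrates how each channel's values could land in an ``InputDataConfig`` entry (the field names follow the shape of the ``AutoMLChannel`` API), and thus why a changed S3 URI changes the generated request:

```python
def to_input_data_config(channels):
    # Hypothetical sketch: map AutoMLInput-style channel dicts into
    # AutoMLChannel-shaped InputDataConfig entries. A changed S3Uri
    # changes the generated request, so the step reruns.
    return [
        {
            "ChannelType": channel["channel_type"],
            "TargetAttributeName": channel["target_attribute_name"],
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": channel["inputs"]}},
        }
        for channel in channels
    ]

config = to_input_data_config([
    {"channel_type": "training", "target_attribute_name": "my_target_attribute_name",
     "inputs": "s3://amzn-s3-demo-bucket/my-training-data"},
])
assert config[0]["ChannelType"] == "training"
```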
The following example creates an EMR step:

.. code-block:: python

    from sagemaker.workflow.emr_step import EMRStep, EMRStepConfig

    emr_config = EMRStepConfig(
        jar="jar-location",  # required, path to jar file used
        args=["--verbose", "--force"],  # optional list of arguments to pass to the jar
        main_class="com.my.Main1",  # optional main class, can be omitted if the jar above has a manifest
        properties=[  # optional list of Java properties that are set when the step runs
            {
                "key": "mapred.tasktracker.map.tasks.maximum",
                "value": "2"
            },
            {
                "key": "mapreduce.map.sort.spill.percent",
                "value": "0.90"
            },
            {
                "key": "mapreduce.tasktracker.reduce.tasks.maximum",
                "value": "5"
            }
        ]
    )

    step_emr = EMRStep(
        name="EMRSampleStep",  # required
        cluster_id="j-1ABCDEFG2HIJK",  # include cluster_id to use a running cluster
        step_config=emr_config,  # required
        display_name="My EMR Step",
        description="Pipeline step to execute EMR job",
    )
The following parameters from the example cause additional EMR step iterations when you change them:

- :code:`cluster_id`: The ID of a running cluster to leverage for the EMR job.
- :code:`emr_config`: Configuration regarding the code that runs on the EMR cluster during the job.

.. note::
   A :code:`cluster_config` parameter may also be passed into :code:`EMRStep` in order to spin up a new cluster. This parameter also triggers additional step iterations if changed.
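Because :code:`cluster_id` reuses a running cluster while :code:`cluster_config` describes a new one to spin up, the two parameters are alternatives. The sketch below is a hypothetical validation helper — it assumes exactly one of the two should be supplied, which may differ from the SDK's own validation rules:

```python
def validate_emr_cluster_args(cluster_id=None, cluster_config=None):
    # Hypothetical sketch: an EMR step either reuses a running cluster
    # (cluster_id) or provisions a new one (cluster_config). This assumes
    # exactly one of the two must be supplied.
    if (cluster_id is None) == (cluster_config is None):
        raise ValueError("Provide exactly one of cluster_id or cluster_config")
    return "existing" if cluster_id is not None else "new"

assert validate_emr_cluster_args(cluster_id="j-1ABCDEFG2HIJK") == "existing"
```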
S3 Artifact Folder Structure
----------------------------
