AWS Jar Distribution #373
Conversation
Walkthrough

This pull request updates the Zipline quickstart and artifact management scripts. The AWS quickstart Python script now calls a renamed shell script. New GCP quickstart scripts have been added, with the Python script invoking a corresponding shell script. The build-and-upload artifacts script has been refactored to include command-line options, separate upload functions for AWS and GCP, and updated customer ID handling.
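As a rough illustration of the new entry points (a hypothetical invocation sketch; the `--all`/`--gcp`/`--aws` flags come from the review excerpts below, and the optional positional customer id is discussed in the comments):

```sh
# Build artifacts and upload to every configured customer id, both clouds
./distribution/build_and_upload_artifacts.sh --all

# Build and upload AWS artifacts only, restricted to one customer id (placeholder)
./distribution/build_and_upload_artifacts.sh --aws some_customer_id
```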
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant U as User
    participant P as GCP_Python
    participant S as GCP_Shell
    U->>P: Execute script
    P->>Temp: Create temporary dir
    P->>S: Call shell script with temp dir
    S->>Repo: Clone repository
    S->>Env: Setup virtual environment
    S->>Zip: Run Zipline commands
    S-->>P: Return status
    P-->>U: End execution
```
```mermaid
sequenceDiagram
    participant U as User
    participant A as BuildScript
    participant AWS as AWS_Uploader
    participant GCP as GCP_Uploader
    U->>A: Run with CLI flags
    A->>A: Parse arguments
    alt Build both clouds
        A->>AWS: Upload AWS artifacts
        A->>GCP: Upload GCP artifacts
    else Build single cloud
        A->>AWS: Upload AWS artifacts
        %% or
        A->>GCP: Upload GCP artifacts
    end
    A-->>U: Return completion status
```
Actionable comments posted: 8
🧹 Nitpick comments (4)
distribution/run_zipline_quickstart_aws.sh (4)
38-38: Avoid masking return values.

```diff
-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+pwd_dir=$(pwd)
+export PYTHONPATH="${PYTHONPATH}:${pwd_dir}"
```

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
63-66: Consider consolidating job execution logic.

The job execution stages follow a repetitive pattern. Consider creating a helper function, as sketched below.

```diff
+function run_emr_job() {
+  local mode=$1
+  local conf=$2
+  local extra_args=$3
+  local output_file="tmp_${mode}.out"
+
+  echo -e "${GREEN}<<<<<.....................................$mode.....................................>>>>>\033[0m"
+  touch "$output_file"
+  zipline run --mode "$mode" --conf "$conf" $extra_args --emr 2>&1 | tee /dev/tty "$output_file"
+  local job_id=$(cat "$output_file" | grep "$EMR_SUBMITTER_ID_STR" | cut -d " " -f5)
+  check_emr_job_state "$job_id"
+}
```

Also applies to: 69-72, 76-79, 82-85, 89-96
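With a helper like the one sketched above, each stage could collapse to a single call (a hypothetical sketch; the mode names and conf paths are placeholders, not taken from the script):

```sh
# Hypothetical call sites replacing the repeated per-stage blocks
run_emr_job backfill "compiled/some_conf" ""
run_emr_job upload "compiled/some_conf" "--extra-flag value"
```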
47-47: Add timeout to EMR wait command.

The wait command could potentially run indefinitely.

```diff
-aws emr wait step-complete --cluster-id zipline-canary-emr --step-id $STEP_ID --no-cli-auto-prompt
+timeout 3600 aws emr wait step-complete --cluster-id zipline-canary-emr --step-id $STEP_ID --no-cli-auto-prompt || {
+  echo "EMR job timed out after 1 hour"
+  exit 1
+}
```
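One refinement worth noting: GNU `timeout` exits with status 124 when the deadline fires, so the `||` branch above also triggers on ordinary wait failures. A sketch that tells the two apart (not part of the suggested diff):

```sh
timeout 3600 aws emr wait step-complete --cluster-id zipline-canary-emr --step-id "$STEP_ID" --no-cli-auto-prompt
status=$?
if [ "$status" -eq 124 ]; then
  echo "EMR job timed out after 1 hour"
  exit 1
elif [ "$status" -ne 0 ]; then
  echo "aws emr wait failed with status $status"
  exit 1
fi
```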
15-17: Add error handling for table deletion.

Tables might not exist on first run.

```diff
-aws glue delete-table --database-name data --name quickstart_purchases_v1_test
-aws glue delete-table --database-name data --name quickstart_purchases_v1_test_upload
+aws glue delete-table --database-name data --name quickstart_purchases_v1_test 2>/dev/null || true
+aws glue delete-table --database-name data --name quickstart_purchases_v1_test_upload 2>/dev/null || true
```
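If the cleanup list grows, the guarded deletes can also be folded into a loop (a minimal sketch over the two tables above; `|| true` keeps a `set -e` script from aborting when a table is absent):

```sh
for table in quickstart_purchases_v1_test quickstart_purchases_v1_test_upload; do
  # Ignore "table not found" errors so a first run does not fail
  aws glue delete-table --database-name data --name "$table" 2>/dev/null || true
done
```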
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (6)

- distribution/build_and_upload_gcp_and_aws_artifacts.sh (1 hunks)
- distribution/build_and_upload_gcp_artifacts.sh (0 hunks)
- distribution/run_zipline_quickstart_aws.py (1 hunks)
- distribution/run_zipline_quickstart_aws.sh (1 hunks)
- distribution/run_zipline_quickstart_gcp.py (1 hunks)
- distribution/run_zipline_quickstart_gcp.sh (1 hunks)
💤 Files with no reviewable changes (1)
- distribution/build_and_upload_gcp_artifacts.sh
✅ Files skipped from review due to trivial changes (1)
- distribution/run_zipline_quickstart_gcp.sh
🧰 Additional context used
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart_aws.sh
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 21-21: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (1)
distribution/run_zipline_quickstart_gcp.py (1)
1-20: LGTM! Clean implementation that mirrors AWS quickstart structure.
```sh
WORKING_DIR=$1
cd $WORKING_DIR
```
Add error handling for cd command.
```diff
-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
```
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
```sh
function upload_to_gcp() {
  customer_ids_to_upload=("$@")
  echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
  select yn in "Yes" "No"; do
    case $yn in
      Yes )
        set -euxo pipefail
        for element in "${customer_ids_to_upload[@]}"
        do
          ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
          gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH";
          gcloud storage cp "$CLOUD_GCP_SUBMITTER_JAR" "$ELEMENT_JAR_PATH";
          gcloud storage cp "$SERVICE_JAR" "$ELEMENT_JAR_PATH"
          gcloud storage cp "$EXPECTED_ZIPLINE_WHEEL" "$ELEMENT_JAR_PATH"
          gcloud storage cp "$FLINK_JAR" "$ELEMENT_JAR_PATH"
        done
        echo "Succeeded"
        break;;
      No ) break;;
    esac
  done
}
```
🛠️ Refactor suggestion
Add error handling for cloud CLI commands.
Upload functions should verify cloud CLI tools (gcloud, aws) are installed and configured.
```diff
 function upload_to_gcp() {
+  if ! command -v gcloud &> /dev/null; then
+    echo "gcloud CLI not found"
+    exit 1
+  fi
   customer_ids_to_upload=("$@")
   ...
 }
 function upload_to_aws() {
+  if ! command -v aws &> /dev/null; then
+    echo "aws CLI not found"
+    exit 1
+  fi
   customer_ids_to_upload=("$@")
   ...
 }
```
Also applies to: 171-192
```sh
thrift --gen py -out api/py/ api/thrift/common.thrift
thrift --gen py -out api/py/ api/thrift/api.thrift
thrift --gen py -out api/py/ api/thrift/observability.thrift
VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"') pip wheel api/py
```
Fix command concatenation.
The VERSION and pip commands are incorrectly concatenated.
```diff
-VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"') pip wheel api/py
+VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"')
+pip wheel api/py
```
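For context on why the one-liner misbehaves: `VAR=value cmd` puts `VAR` only into `cmd`'s environment, so the shell running the script never sees it. A minimal demonstration:

```sh
# The prefix form exports FOO only to the child process
FOO=bar sh -c 'echo "child sees: $FOO"'   # prints: child sees: bar
echo "parent sees: ${FOO:-<unset>}"       # prints: parent sees: <unset>
```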
```sh
    exit 1
fi

if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
```
Fix Python version comparison.
Missing `$` in variable references.

```diff
-if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
```
```sh
# check if $1 (single customer id mode) has been set
if [ -z "$1" ]; then
    if [ "$BUILD_AWS" = true ]; then
      upload_to_aws "${AWS_CUSTOMER_IDS[@]}"
    fi
    if [ "$BUILD_GCP" = true ]; then
      upload_to_gcp "${GCP_CUSTOMER_IDS[@]}"
    fi
else
    if [ "$BUILD_AWS" = true ]; then
      upload_to_aws "$1"
    fi
    if [ "$BUILD_GCP" = true ]; then
      upload_to_gcp "$1"
    fi
fi
```
🛠️ Refactor suggestion
Add input validation for customer ID.
When a single customer ID is provided, validate it exists in the respective arrays.
```diff
 if [ -z "$1" ]; then
   ...
 else
+  # Validate customer ID
+  if [ "$BUILD_AWS" = true ] && [[ ! " ${AWS_CUSTOMER_IDS[@]} " =~ " $1 " ]]; then
+    echo "Invalid AWS customer ID: $1"
+    exit 1
+  fi
+  if [ "$BUILD_GCP" = true ] && [[ ! " ${GCP_CUSTOMER_IDS[@]} " =~ " $1 " ]]; then
+    echo "Invalid GCP customer ID: $1"
+    exit 1
+  fi
   if [ "$BUILD_AWS" = true ]; then
     upload_to_aws "$1"
   fi
   ...
 fi
```
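The space-padded regex test used above is a common bash idiom for array membership; a standalone sketch (array contents are made up) shows the mechanics. Note it assumes the ids themselves contain no spaces:

```sh
ids=(alpha beta gamma)
candidate="beta"
# Pad both sides with spaces so "bet" does not match "beta"
if [[ " ${ids[*]} " =~ " ${candidate} " ]]; then
  echo "known id: ${candidate}"
else
  echo "unknown id: ${candidate}" >&2
fi
```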
Actionable comments posted: 4
🧹 Nitpick comments (2)
distribution/build_and_upload_artifacts.sh (2)
18-43: Initialize flags and validate input.

Add flag initialization and mutual exclusivity check.

```diff
+# Initialize flags
+BUILD_GCP=false
+BUILD_AWS=false
+
 while [[ $# -gt 0 ]]; do
   case $1 in
     --all)
       BUILD_GCP=true
       BUILD_AWS=true
       shift
       ;;
     --gcp)
+      [[ "$BUILD_AWS" == true ]] && { echo "Error: Cannot combine --gcp with other options"; exit 1; }
       BUILD_GCP=true
       shift
       ;;
     --aws)
+      [[ "$BUILD_GCP" == true ]] && { echo "Error: Cannot combine --aws with other options"; exit 1; }
       BUILD_AWS=true
       shift
       ;;
```
102-136: Reduce duplication in JAR verification.

Extract JAR verification into a function.

```diff
+function verify_jar() {
+  local jar_path="$1"
+  if [ ! -f "$jar_path" ]; then
+    echo "$jar_path not found"
+    exit 1
+  fi
+}
+
 if [ "$BUILD_AWS" = true ]; then
   bazel build //cloud_aws:cloud_aws_lib_deploy.jar
   bazel build //cloud_aws:cloud_aws_submitter_deploy.jar
   CLOUD_AWS_JAR="$CHRONON_ROOT_DIR/bazel-bin/cloud_aws/cloud_aws_lib_deploy.jar"
   CLOUD_AWS_SUBMITTER_JAR="$CHRONON_ROOT_DIR/bazel-bin/cloud_aws/cloud_aws_submitter_deploy.jar"
-  if [ ! -f "$CLOUD_AWS_JAR" ]; then
-    echo "$CLOUD_AWS_JAR not found"
-    exit 1
-  fi
-  if [ ! -f "$CLOUD_AWS_SUBMITTER_JAR" ]; then
-    echo "$CLOUD_AWS_SUBMITTER_JAR not found"
-    exit 1
-  fi
+  verify_jar "$CLOUD_AWS_JAR"
+  verify_jar "$CLOUD_AWS_SUBMITTER_JAR"
```
```sh
function upload_to_gcp() {
  customer_ids_to_upload=("$@")
  echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
  select yn in "Yes" "No"; do
    case $yn in
      Yes )
        set -euxo pipefail
        for element in "${customer_ids_to_upload[@]}"
        do
          ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
          gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH";
          gcloud storage cp "$CLOUD_GCP_SUBMITTER_JAR" "$ELEMENT_JAR_PATH";
          gcloud storage cp "$SERVICE_JAR" "$ELEMENT_JAR_PATH"
          gcloud storage cp "$EXPECTED_ZIPLINE_WHEEL" "$ELEMENT_JAR_PATH"
          gcloud storage cp "$FLINK_JAR" "$ELEMENT_JAR_PATH"
        done
        echo "Succeeded"
        break;;
      No ) break;;
    esac
  done
}
```
🛠️ Refactor suggestion
Add error handling and upload verification.
Upload functions need better error handling.
```diff
 function upload_to_gcp() {
   customer_ids_to_upload=("$@")
   echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
   select yn in "Yes" "No"; do
     case $yn in
       Yes )
         set -euxo pipefail
+        local failed_uploads=()
         for element in "${customer_ids_to_upload[@]}"
         do
           ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
-          gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH";
+          if ! gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH"; then
+            failed_uploads+=("$CLOUD_GCP_JAR to $ELEMENT_JAR_PATH")
+          fi
+          # Verify upload
+          if ! gcloud storage stat "$ELEMENT_JAR_PATH/$(basename $CLOUD_GCP_JAR)" >/dev/null 2>&1; then
+            failed_uploads+=("Failed to verify $CLOUD_GCP_JAR in $ELEMENT_JAR_PATH")
+          fi
           # ... Similar changes for other uploads
         done
+        if [ ${#failed_uploads[@]} -ne 0 ]; then
+          echo "Failed uploads:"
+          printf '%s\n' "${failed_uploads[@]}"
+          exit 1
+        fi
         echo "Succeeded"
```
Also applies to: 171-192
```sh
VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"') pip wheel api/py
EXPECTED_ZIPLINE_WHEEL="zipline_ai-0.1.0.dev0-py3-none-any.whl"
```
🛠️ Refactor suggestion
Use captured version in wheel name.
VERSION is captured but not used in wheel name check.
```diff
-EXPECTED_ZIPLINE_WHEEL="zipline_ai-0.1.0.dev0-py3-none-any.whl"
+EXPECTED_ZIPLINE_WHEEL="zipline_ai-${VERSION}-py3-none-any.whl"
```
```sh
if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
  echo "Failed minor version of $MINOR_PYTHON_VERSION. Expecting python version of at least $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION.$EXPECTED_MINIMUM_MINOR_PYTHON_VERSION to build wheel. Your version is $(python --version)"
  exit 1
fi
```
Fix Python version comparison.
Missing $ before variables causes incorrect version check.
```diff
-if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
```
```sh
# check if $1 (single customer id mode) has been set
if [ -z "$1" ]; then
    if [ "$BUILD_AWS" = true ]; then
      upload_to_aws "${AWS_CUSTOMER_IDS[@]}"
    fi
    if [ "$BUILD_GCP" = true ]; then
      upload_to_gcp "${GCP_CUSTOMER_IDS[@]}"
    fi
else
    if [ "$BUILD_AWS" = true ]; then
      upload_to_aws "$1"
    fi
    if [ "$BUILD_GCP" = true ]; then
      upload_to_gcp "$1"
    fi
fi
```
Fix customer ID handling.
$1 check is incorrect as it contains the last processed argument.
```diff
-# check if $1 (single customer id mode) has been set
-if [ -z "$1" ]; then
+# Store customer ID before processing flags
+CUSTOMER_ID="$1"
+
+if [ -z "$CUSTOMER_ID" ]; then
   if [ "$BUILD_AWS" = true ]; then
     upload_to_aws "${AWS_CUSTOMER_IDS[@]}"
   fi
   if [ "$BUILD_GCP" = true ]; then
     upload_to_gcp "${GCP_CUSTOMER_IDS[@]}"
   fi
 else
   if [ "$BUILD_AWS" = true ]; then
-    upload_to_aws "$1"
+    upload_to_aws "$CUSTOMER_ID"
   fi
   if [ "$BUILD_GCP" = true ]; then
-    upload_to_gcp "$1"
+    upload_to_gcp "$CUSTOMER_ID"
   fi
 fi
```
```sh
    fi
}

EMR_SUBMITTER_ID_STR="EMR submitter job id"
```
Not sure if this is going to work; we may not get the equivalent log output that we see in Dataproc. We should use Livy, and we'll get a job id in the response payload when we actually submit the job.
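For reference, a rough sketch of the Livy route (endpoint URL, jar path, and class name are placeholders; Livy's `POST /batches` returns a JSON payload whose `id` field identifies the submitted job):

```sh
# Hypothetical Livy batch submission; assumes jq and a reachable Livy endpoint
LIVY_URL="http://livy.example.com:8998"   # placeholder
response=$(curl -s -X POST "$LIVY_URL/batches" \
  -H 'Content-Type: application/json' \
  -d '{"file": "s3://bucket/jars/job.jar", "className": "com.example.Main"}')
# The job id comes back in the response payload
batch_id=$(echo "$response" | jq -r '.id')
echo "submitted Livy batch: $batch_id"
```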
```diff
@@ -8,9 +8,9 @@ def main():
     print(f"Created temporary directory: {temp_dir}")

     quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
```
how feasible would it be for us to just consolidate the bash and the python into just python? could we replicate most of the logic here with python subprocess where needed?
@david-zlai Did you try this? Does this seem reasonable?
Actionable comments posted: 0
♻️ Duplicate comments (5)
distribution/build_and_upload_artifacts.sh (5)
66-69: ⚠️ Potential issue — Fix Python version comparison.

Missing $ before variables causes incorrect version check.

```diff
-if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
```
75-76: ⚠️ Potential issue — Use captured version in wheel name.

VERSION is captured but not used in wheel name check.

```diff
-EXPECTED_ZIPLINE_WHEEL="zipline_ai-0.1.0.dev0-py3-none-any.whl"
+EXPECTED_ZIPLINE_WHEEL="zipline_ai-${VERSION}-py3-none-any.whl"
```
145-169: ⚠️ Potential issue — Add error handling and upload verification for GCP uploads.

Upload function needs better error handling.

```diff
 function upload_to_gcp() {
   gcloud config set storage/parallel_composite_upload_enabled False
   customer_ids_to_upload=("$@")
   echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
   select yn in "Yes" "No"; do
     case $yn in
       Yes )
         set -euxo pipefail
+        local failed_uploads=()
         for element in "${customer_ids_to_upload[@]}"
         do
           ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
-          gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH" --custom-metadata="zipline_user=$USER,updated_date=$(date)"
+          if ! gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH" --custom-metadata="zipline_user=$USER,updated_date=$(date)"; then
+            failed_uploads+=("$CLOUD_GCP_JAR to $ELEMENT_JAR_PATH")
+          fi
+          # Verify upload
+          if ! gcloud storage stat "$ELEMENT_JAR_PATH/$(basename $CLOUD_GCP_JAR)" >/dev/null 2>&1; then
+            failed_uploads+=("Failed to verify $CLOUD_GCP_JAR in $ELEMENT_JAR_PATH")
+          fi
           # Similar changes for other uploads
         done
+        if [ ${#failed_uploads[@]} -ne 0 ]; then
+          echo "Failed uploads:"
+          printf '%s\n' "${failed_uploads[@]}"
+          exit 1
+        fi
```
174-195: ⚠️ Potential issue — Add error handling and upload verification for AWS uploads.

Upload function needs better error handling.

```diff
 function upload_to_aws() {
   customer_ids_to_upload=("$@")
   echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
   select yn in "Yes" "No"; do
     case $yn in
       Yes )
         set -euxo pipefail
+        local failed_uploads=()
         for element in "${customer_ids_to_upload[@]}"
         do
           ELEMENT_JAR_PATH=s3://zipline-artifacts-$element/jars
-          aws s3 cp "$CLOUD_AWS_JAR" "$ELEMENT_JAR_PATH"
+          if ! aws s3 cp "$CLOUD_AWS_JAR" "$ELEMENT_JAR_PATH"; then
+            failed_uploads+=("$CLOUD_AWS_JAR to $ELEMENT_JAR_PATH")
+          fi
+          # Verify upload
+          if ! aws s3 ls "$ELEMENT_JAR_PATH/$(basename $CLOUD_AWS_JAR)" >/dev/null 2>&1; then
+            failed_uploads+=("Failed to verify $CLOUD_AWS_JAR in $ELEMENT_JAR_PATH")
+          fi
           # Similar changes for other uploads
         done
+        if [ ${#failed_uploads[@]} -ne 0 ]; then
+          echo "Failed uploads:"
+          printf '%s\n' "${failed_uploads[@]}"
+          exit 1
+        fi
```
197-212: ⚠️ Potential issue — Fix customer ID handling.

$1 check is incorrect as it contains the last processed argument.

```diff
-# check if $1 (single customer id mode) has been set
-if [ -z "$1" ]; then
+# Store customer ID before processing flags
+CUSTOMER_ID="$1"
+
+if [ -z "$CUSTOMER_ID" ]; then
   if [ "$BUILD_AWS" = true ]; then
     upload_to_aws "${AWS_CUSTOMER_IDS[@]}"
   fi
   if [ "$BUILD_GCP" = true ]; then
     upload_to_gcp "${GCP_CUSTOMER_IDS[@]}"
   fi
 else
   if [ "$BUILD_AWS" = true ]; then
-    upload_to_aws "$1"
+    upload_to_aws "$CUSTOMER_ID"
   fi
   if [ "$BUILD_GCP" = true ]; then
-    upload_to_gcp "$1"
+    upload_to_gcp "$CUSTOMER_ID"
   fi
 fi
```
🧹 Nitpick comments (1)
distribution/build_and_upload_artifacts.sh (1)
18-43: Initialize BUILD_GCP and BUILD_AWS variables.

Add explicit initialization before the while loop.

```diff
+BUILD_GCP=false
+BUILD_AWS=false
 while [[ $# -gt 0 ]]; do
```
Actionable comments posted: 0
♻️ Duplicate comments (4)
distribution/build_and_upload_artifacts.sh (4)
66-69: ⚠️ Potential issue — Fix Python version comparison. (Same as the earlier comment.)

75-76: ⚠️ Potential issue — Use captured version in wheel name. (Same as the earlier comment.)

145-169: ⚠️ Potential issue — Add error handling and upload verification. (Same as the earlier comment; also applies to: 174-195.)

197-212: ⚠️ Potential issue — Fix customer ID handling. (Same as the earlier comment.)
🧹 Nitpick comments (2)
distribution/build_and_upload_artifacts.sh (2)
18-43: Initialize BUILD_GCP and BUILD_AWS variables.

Initialize variables before the while loop to prevent unbound variable errors.

```diff
+BUILD_GCP=false
+BUILD_AWS=false
+
 while [[ $# -gt 0 ]]; do
```
72-74: Add error handling for Thrift generation.

Check if Thrift commands succeed.

```diff
+set -e
 thrift --gen py -out api/py/ api/thrift/common.thrift
 thrift --gen py -out api/py/ api/thrift/api.thrift
 thrift --gen py -out api/py/ api/thrift/observability.thrift
+set +e
```
Actionable comments posted: 0
♻️ Duplicate comments (2)
distribution/run_zipline_quickstart_aws.sh (2)
7-8: ⚠️ Potential issue — Ensure proper directory change handling.

Quote the variable and exit if the directory change fails.

```diff
-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
```

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
20-21: ⚠️ Potential issue — Add error handling after cloning configs.

Quote the directory name and check the cd command's success.

```diff
-cd cananry-confs
+cd "cananry-confs" || exit 1
```

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 21-21: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🧹 Nitpick comments (3)
distribution/run_zipline_quickstart_aws.sh (3)
10-11: Remove or use unused variable RED.

If not used, consider removing it to avoid confusion.

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
38-38: Separate PYTHONPATH assignment and export.

Avoid masking return values by splitting the command.

```diff
-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+PYTHONPATH="${PYTHONPATH}:$(pwd)"
+export PYTHONPATH
```

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
47-50: Quote variables in AWS commands.

Wrap $STEP_ID in quotes to handle any spaces or special characters.

```diff
-aws emr wait step-complete --cluster-id zipline-canary-emr --step-id $STEP_ID --no-cli-auto-prompt
+aws emr wait step-complete --cluster-id zipline-canary-emr --step-id "$STEP_ID" --no-cli-auto-prompt

-JOB_STATE=$(aws emr describe-step --cluster-id zipline-canary-emr --step-id $STEP_ID --no-cli-auto-prompt --output text | grep "State")
+JOB_STATE=$(aws emr describe-step --cluster-id zipline-canary-emr --step-id "$STEP_ID" --no-cli-auto-prompt --output text | grep "State")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)

- distribution/run_zipline_quickstart_aws.py (1 hunks)
- distribution/run_zipline_quickstart_aws.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- distribution/run_zipline_quickstart_aws.py
🧰 Additional context used
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart_aws.sh
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 21-21: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
## Summary

Adds AWS build and push commands to the distribution script.

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Introduced an automated quickstart process for GCP deployments.
  - Enhanced the build and upload tool with flexible command-line options, supporting artifact creation for both AWS and GCP environments.
  - Added a new script for running the Zipline quickstart on GCP.
- **Refactor**
  - Updated the AWS quickstart process to ensure consistent execution.
This reverts commit a3e8793.
Reverts #373. Passing in options to push to only one customer is broken.

## Summary by CodeRabbit

- **Refactor**
  - Streamlined the deployment process to automatically build and upload artifacts exclusively to Google Cloud Platform.
  - Removed configuration options and handling for an alternative cloud provider, resulting in a simpler, more focused workflow.