AWS Jar Distribution #373
Conversation
Walkthrough

This pull request updates the Zipline quickstart and artifact management scripts. The AWS quickstart Python script now calls a renamed shell script. New GCP quickstart scripts have been added, with the Python script invoking a corresponding shell script. The build-and-upload artifacts script has been refactored to include command-line options, separate upload functions for AWS and GCP, and updated customer ID handling.
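As a rough illustration of the new entry points (a hypothetical invocation sketch; the `--all`/`--gcp`/`--aws` flags come from the review excerpts below, and the optional positional customer id is discussed in the comments):

```sh
# Build artifacts and upload to every configured customer id, both clouds
./distribution/build_and_upload_artifacts.sh --all

# Build and upload AWS artifacts only, restricted to one customer id (placeholder)
./distribution/build_and_upload_artifacts.sh --aws some_customer_id
```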
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant U as User
    participant P as GCP_Python
    participant S as GCP_Shell
    U->>P: Execute script
    P->>Temp: Create temporary dir
    P->>S: Call shell script with temp dir
    S->>Repo: Clone repository
    S->>Env: Setup virtual environment
    S->>Zip: Run Zipline commands
    S-->>P: Return status
    P-->>U: End execution
```
```mermaid
sequenceDiagram
    participant U as User
    participant A as BuildScript
    participant AWS as AWS_Uploader
    participant GCP as GCP_Uploader
    U->>A: Run with CLI flags
    A->>A: Parse arguments
    alt Build both clouds
        A->>AWS: Upload AWS artifacts
        A->>GCP: Upload GCP artifacts
    else Build single cloud
        A->>AWS: Upload AWS artifacts
        %% or
        A->>GCP: Upload GCP artifacts
    end
    A-->>U: Return completion status
```
Actionable comments posted: 8
🧹 Nitpick comments (4)
distribution/run_zipline_quickstart_aws.sh (4)
38-38: Avoid masking return values.

```diff
-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+pwd_dir=$(pwd)
+export PYTHONPATH="${PYTHONPATH}:${pwd_dir}"
```

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
63-66: Consider consolidating job execution logic.

The job execution stages follow a repetitive pattern. Consider creating a helper function, as sketched below.

```diff
+function run_emr_job() {
+  local mode=$1
+  local conf=$2
+  local extra_args=$3
+  local output_file="tmp_${mode}.out"
+
+  echo -e "${GREEN}<<<<<.....................................$mode.....................................>>>>>\033[0m"
+  touch "$output_file"
+  zipline run --mode "$mode" --conf "$conf" $extra_args --emr 2>&1 | tee /dev/tty "$output_file"
+  local job_id=$(cat "$output_file" | grep "$EMR_SUBMITTER_ID_STR" | cut -d " " -f5)
+  check_emr_job_state "$job_id"
+}
```

Also applies to: 69-72, 76-79, 82-85, 89-96
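With a helper like the one sketched above, each stage could collapse to a single call (a hypothetical sketch; the mode names and conf paths are placeholders, not taken from the script):

```sh
# Hypothetical call sites replacing the repeated per-stage blocks
run_emr_job backfill "compiled/some_conf" ""
run_emr_job upload "compiled/some_conf" "--extra-flag value"
```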
47-47: Add timeout to EMR wait command.

The wait command could potentially run indefinitely.

```diff
-aws emr wait step-complete --cluster-id zipline-canary-emr --step-id $STEP_ID --no-cli-auto-prompt
+timeout 3600 aws emr wait step-complete --cluster-id zipline-canary-emr --step-id $STEP_ID --no-cli-auto-prompt || {
+  echo "EMR job timed out after 1 hour"
+  exit 1
+}
```
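One refinement worth noting: GNU `timeout` exits with status 124 when the deadline fires, so the `||` branch above also triggers on ordinary wait failures. A sketch that tells the two apart (not part of the suggested diff):

```sh
timeout 3600 aws emr wait step-complete --cluster-id zipline-canary-emr --step-id "$STEP_ID" --no-cli-auto-prompt
status=$?
if [ "$status" -eq 124 ]; then
  echo "EMR job timed out after 1 hour"
  exit 1
elif [ "$status" -ne 0 ]; then
  echo "aws emr wait failed with status $status"
  exit 1
fi
```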
15-17: Add error handling for table deletion.

Tables might not exist on first run.

```diff
-aws glue delete-table --database-name data --name quickstart_purchases_v1_test
-aws glue delete-table --database-name data --name quickstart_purchases_v1_test_upload
+aws glue delete-table --database-name data --name quickstart_purchases_v1_test 2>/dev/null || true
+aws glue delete-table --database-name data --name quickstart_purchases_v1_test_upload 2>/dev/null || true
```
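If the cleanup list grows, the guarded deletes can also be folded into a loop (a minimal sketch over the two tables above; `|| true` keeps a `set -e` script from aborting when a table is absent):

```sh
for table in quickstart_purchases_v1_test quickstart_purchases_v1_test_upload; do
  # Ignore "table not found" errors so a first run does not fail
  aws glue delete-table --database-name data --name "$table" 2>/dev/null || true
done
```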
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (6)

- distribution/build_and_upload_gcp_and_aws_artifacts.sh (1 hunks)
- distribution/build_and_upload_gcp_artifacts.sh (0 hunks)
- distribution/run_zipline_quickstart_aws.py (1 hunks)
- distribution/run_zipline_quickstart_aws.sh (1 hunks)
- distribution/run_zipline_quickstart_gcp.py (1 hunks)
- distribution/run_zipline_quickstart_gcp.sh (1 hunks)
💤 Files with no reviewable changes (1)
- distribution/build_and_upload_gcp_artifacts.sh
✅ Files skipped from review due to trivial changes (1)
- distribution/run_zipline_quickstart_gcp.sh
🧰 Additional context used
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart_aws.sh
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 21-21: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
🔇 Additional comments (1)
distribution/run_zipline_quickstart_gcp.py (1)
1-20: LGTM! Clean implementation that mirrors AWS quickstart structure.
```sh
WORKING_DIR=$1
cd $WORKING_DIR
```
Add error handling for cd command.
```diff
-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
```
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
```sh
function upload_to_gcp() {
  customer_ids_to_upload=("$@")
  echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
  select yn in "Yes" "No"; do
    case $yn in
      Yes )
        set -euxo pipefail
        for element in "${customer_ids_to_upload[@]}"
        do
          ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
          gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH";
          gcloud storage cp "$CLOUD_GCP_SUBMITTER_JAR" "$ELEMENT_JAR_PATH";
          gcloud storage cp "$SERVICE_JAR" "$ELEMENT_JAR_PATH"
          gcloud storage cp "$EXPECTED_ZIPLINE_WHEEL" "$ELEMENT_JAR_PATH"
          gcloud storage cp "$FLINK_JAR" "$ELEMENT_JAR_PATH"
        done
        echo "Succeeded"
        break;;
      No ) break;;
    esac
  done
}
```
🛠️ Refactor suggestion
Add error handling for cloud CLI commands.
Upload functions should verify cloud CLI tools (gcloud, aws) are installed and configured.
```diff
 function upload_to_gcp() {
+  if ! command -v gcloud &> /dev/null; then
+    echo "gcloud CLI not found"
+    exit 1
+  fi
   customer_ids_to_upload=("$@")
   ...
 }
 function upload_to_aws() {
+  if ! command -v aws &> /dev/null; then
+    echo "aws CLI not found"
+    exit 1
+  fi
   customer_ids_to_upload=("$@")
   ...
 }
```
Also applies to: 171-192
```sh
thrift --gen py -out api/py/ api/thrift/common.thrift
thrift --gen py -out api/py/ api/thrift/api.thrift
thrift --gen py -out api/py/ api/thrift/observability.thrift
VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"') pip wheel api/py
```
Fix command concatenation.
The VERSION and pip commands are incorrectly concatenated.
```diff
-VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"') pip wheel api/py
+VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"')
+pip wheel api/py
```
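For context on why the one-liner misbehaves: `VAR=value cmd` puts `VAR` only into `cmd`'s environment, so the shell running the script never sees it. A minimal demonstration:

```sh
# The prefix form exports FOO only to the child process
FOO=bar sh -c 'echo "child sees: $FOO"'   # prints: child sees: bar
echo "parent sees: ${FOO:-<unset>}"       # prints: parent sees: <unset>
```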
```sh
    exit 1
fi

if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
```
Fix Python version comparison.
Missing `$` in variable references.

```diff
-if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
```
```sh
# check if $1 (single customer id mode) has been set
if [ -z "$1" ]; then
    if [ "$BUILD_AWS" = true ]; then
      upload_to_aws "${AWS_CUSTOMER_IDS[@]}"
    fi
    if [ "$BUILD_GCP" = true ]; then
      upload_to_gcp "${GCP_CUSTOMER_IDS[@]}"
    fi
else
    if [ "$BUILD_AWS" = true ]; then
      upload_to_aws "$1"
    fi
    if [ "$BUILD_GCP" = true ]; then
      upload_to_gcp "$1"
    fi
fi
```
🛠️ Refactor suggestion
Add input validation for customer ID.
When a single customer ID is provided, validate it exists in the respective arrays.
```diff
 if [ -z "$1" ]; then
   ...
 else
+  # Validate customer ID
+  if [ "$BUILD_AWS" = true ] && [[ ! " ${AWS_CUSTOMER_IDS[@]} " =~ " $1 " ]]; then
+    echo "Invalid AWS customer ID: $1"
+    exit 1
+  fi
+  if [ "$BUILD_GCP" = true ] && [[ ! " ${GCP_CUSTOMER_IDS[@]} " =~ " $1 " ]]; then
+    echo "Invalid GCP customer ID: $1"
+    exit 1
+  fi
   if [ "$BUILD_AWS" = true ]; then
     upload_to_aws "$1"
   fi
   ...
 fi
```
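The space-padded regex test used above is a common bash idiom for array membership; a standalone sketch (array contents are made up) shows the mechanics. Note it assumes the ids themselves contain no spaces:

```sh
ids=(alpha beta gamma)
candidate="beta"
# Pad both sides with spaces so "bet" does not match "beta"
if [[ " ${ids[*]} " =~ " ${candidate} " ]]; then
  echo "known id: ${candidate}"
else
  echo "unknown id: ${candidate}" >&2
fi
```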
Actionable comments posted: 4
🧹 Nitpick comments (2)
distribution/build_and_upload_artifacts.sh (2)
18-43: Initialize flags and validate input.

Add flag initialization and mutual exclusivity check.

```diff
+# Initialize flags
+BUILD_GCP=false
+BUILD_AWS=false
+
 while [[ $# -gt 0 ]]; do
   case $1 in
     --all)
       BUILD_GCP=true
       BUILD_AWS=true
       shift
       ;;
     --gcp)
+      [[ "$BUILD_AWS" == true ]] && { echo "Error: Cannot combine --gcp with other options"; exit 1; }
       BUILD_GCP=true
       shift
       ;;
     --aws)
+      [[ "$BUILD_GCP" == true ]] && { echo "Error: Cannot combine --aws with other options"; exit 1; }
       BUILD_AWS=true
       shift
       ;;
```
102-136: Reduce duplication in JAR verification.

Extract JAR verification into a function.

```diff
+function verify_jar() {
+  local jar_path="$1"
+  if [ ! -f "$jar_path" ]; then
+    echo "$jar_path not found"
+    exit 1
+  fi
+}
+
 if [ "$BUILD_AWS" = true ]; then
   bazel build //cloud_aws:cloud_aws_lib_deploy.jar
   bazel build //cloud_aws:cloud_aws_submitter_deploy.jar
   CLOUD_AWS_JAR="$CHRONON_ROOT_DIR/bazel-bin/cloud_aws/cloud_aws_lib_deploy.jar"
   CLOUD_AWS_SUBMITTER_JAR="$CHRONON_ROOT_DIR/bazel-bin/cloud_aws/cloud_aws_submitter_deploy.jar"
-  if [ ! -f "$CLOUD_AWS_JAR" ]; then
-    echo "$CLOUD_AWS_JAR not found"
-    exit 1
-  fi
-  if [ ! -f "$CLOUD_AWS_SUBMITTER_JAR" ]; then
-    echo "$CLOUD_AWS_SUBMITTER_JAR not found"
-    exit 1
-  fi
+  verify_jar "$CLOUD_AWS_JAR"
+  verify_jar "$CLOUD_AWS_SUBMITTER_JAR"
```
```sh
function upload_to_gcp() {
  customer_ids_to_upload=("$@")
  echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
  select yn in "Yes" "No"; do
    case $yn in
      Yes )
        set -euxo pipefail
        for element in "${customer_ids_to_upload[@]}"
        do
          ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
          gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH";
          gcloud storage cp "$CLOUD_GCP_SUBMITTER_JAR" "$ELEMENT_JAR_PATH";
          gcloud storage cp "$SERVICE_JAR" "$ELEMENT_JAR_PATH"
          gcloud storage cp "$EXPECTED_ZIPLINE_WHEEL" "$ELEMENT_JAR_PATH"
          gcloud storage cp "$FLINK_JAR" "$ELEMENT_JAR_PATH"
        done
        echo "Succeeded"
        break;;
      No ) break;;
    esac
  done
}
```
🛠️ Refactor suggestion
Add error handling and upload verification.
Upload functions need better error handling.
```diff
 function upload_to_gcp() {
   customer_ids_to_upload=("$@")
   echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
   select yn in "Yes" "No"; do
     case $yn in
       Yes )
         set -euxo pipefail
+        local failed_uploads=()
         for element in "${customer_ids_to_upload[@]}"
         do
           ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
-          gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH";
+          if ! gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH"; then
+            failed_uploads+=("$CLOUD_GCP_JAR to $ELEMENT_JAR_PATH")
+          fi
+          # Verify upload
+          if ! gcloud storage stat "$ELEMENT_JAR_PATH/$(basename $CLOUD_GCP_JAR)" >/dev/null 2>&1; then
+            failed_uploads+=("Failed to verify $CLOUD_GCP_JAR in $ELEMENT_JAR_PATH")
+          fi
           # ... Similar changes for other uploads
         done
+        if [ ${#failed_uploads[@]} -ne 0 ]; then
+          echo "Failed uploads:"
+          printf '%s\n' "${failed_uploads[@]}"
+          exit 1
+        fi
         echo "Succeeded"
```
Also applies to: 171-192
```sh
VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"') pip wheel api/py
EXPECTED_ZIPLINE_WHEEL="zipline_ai-0.1.0.dev0-py3-none-any.whl"
```
🛠️ Refactor suggestion
Use captured version in wheel name.
VERSION is captured but not used in wheel name check.
```diff
-EXPECTED_ZIPLINE_WHEEL="zipline_ai-0.1.0.dev0-py3-none-any.whl"
+EXPECTED_ZIPLINE_WHEEL="zipline_ai-${VERSION}-py3-none-any.whl"
```
```sh
if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
  echo "Failed minor version of $MINOR_PYTHON_VERSION. Expecting python version of at least $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION.$EXPECTED_MINIMUM_MINOR_PYTHON_VERSION to build wheel. Your version is $(python --version)"
  exit 1
fi
```
Fix Python version comparison.
Missing $ before variables causes incorrect version check.
```diff
-if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
```
```sh
# check if $1 (single customer id mode) has been set
if [ -z "$1" ]; then
    if [ "$BUILD_AWS" = true ]; then
      upload_to_aws "${AWS_CUSTOMER_IDS[@]}"
    fi
    if [ "$BUILD_GCP" = true ]; then
      upload_to_gcp "${GCP_CUSTOMER_IDS[@]}"
    fi
else
    if [ "$BUILD_AWS" = true ]; then
      upload_to_aws "$1"
    fi
    if [ "$BUILD_GCP" = true ]; then
      upload_to_gcp "$1"
    fi
fi
```
Fix customer ID handling.
$1 check is incorrect as it contains the last processed argument.
```diff
-# check if $1 (single customer id mode) has been set
-if [ -z "$1" ]; then
+# Store customer ID before processing flags
+CUSTOMER_ID="$1"
+
+if [ -z "$CUSTOMER_ID" ]; then
   if [ "$BUILD_AWS" = true ]; then
     upload_to_aws "${AWS_CUSTOMER_IDS[@]}"
   fi
   if [ "$BUILD_GCP" = true ]; then
     upload_to_gcp "${GCP_CUSTOMER_IDS[@]}"
   fi
 else
   if [ "$BUILD_AWS" = true ]; then
-    upload_to_aws "$1"
+    upload_to_aws "$CUSTOMER_ID"
   fi
   if [ "$BUILD_GCP" = true ]; then
-    upload_to_gcp "$1"
+    upload_to_gcp "$CUSTOMER_ID"
   fi
 fi
```
```sh
    fi
}

EMR_SUBMITTER_ID_STR="EMR submitter job id"
```
Not sure if this is going to work; we may not get the equivalent log output that we see in Dataproc. We should use Livy, and we'll get a job id in the response payload when we actually submit the job.
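For reference, a rough sketch of the Livy route (endpoint URL, jar path, and class name are placeholders; Livy's `POST /batches` returns a JSON payload whose `id` field identifies the submitted job):

```sh
# Hypothetical Livy batch submission; assumes jq and a reachable Livy endpoint
LIVY_URL="http://livy.example.com:8998"   # placeholder
response=$(curl -s -X POST "$LIVY_URL/batches" \
  -H 'Content-Type: application/json' \
  -d '{"file": "s3://bucket/jars/job.jar", "className": "com.example.Main"}')
# The job id comes back in the response payload
batch_id=$(echo "$response" | jq -r '.id')
echo "submitted Livy batch: $batch_id"
```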
```diff
@@ -8,9 +8,9 @@ def main():
     print(f"Created temporary directory: {temp_dir}")

     quickstart_sh = os.path.join(os.path.dirname(os.path.realpath(__file__))
```
how feasible would it be for us to just consolidate the bash and the python into just python? could we replicate most of the logic here with python subprocess where needed?
@david-zlai Did you try this? Does this seem reasonable?
Actionable comments posted: 0
♻️ Duplicate comments (5)
distribution/build_and_upload_artifacts.sh (5)
66-69: ⚠️ Potential issue — Fix Python version comparison.

Missing $ before variables causes incorrect version check.

```diff
-if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
+if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
```
75-76: ⚠️ Potential issue — Use captured version in wheel name.

VERSION is captured but not used in wheel name check.

```diff
-EXPECTED_ZIPLINE_WHEEL="zipline_ai-0.1.0.dev0-py3-none-any.whl"
+EXPECTED_ZIPLINE_WHEEL="zipline_ai-${VERSION}-py3-none-any.whl"
```
145-169: ⚠️ Potential issue — Add error handling and upload verification for GCP uploads.

Upload function needs better error handling.

```diff
 function upload_to_gcp() {
   gcloud config set storage/parallel_composite_upload_enabled False
   customer_ids_to_upload=("$@")
   echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
   select yn in "Yes" "No"; do
     case $yn in
       Yes )
         set -euxo pipefail
+        local failed_uploads=()
         for element in "${customer_ids_to_upload[@]}"
         do
           ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
-          gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH" --custom-metadata="zipline_user=$USER,updated_date=$(date)"
+          if ! gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH" --custom-metadata="zipline_user=$USER,updated_date=$(date)"; then
+            failed_uploads+=("$CLOUD_GCP_JAR to $ELEMENT_JAR_PATH")
+          fi
+          # Verify upload
+          if ! gcloud storage stat "$ELEMENT_JAR_PATH/$(basename $CLOUD_GCP_JAR)" >/dev/null 2>&1; then
+            failed_uploads+=("Failed to verify $CLOUD_GCP_JAR in $ELEMENT_JAR_PATH")
+          fi
           # Similar changes for other uploads
         done
+        if [ ${#failed_uploads[@]} -ne 0 ]; then
+          echo "Failed uploads:"
+          printf '%s\n' "${failed_uploads[@]}"
+          exit 1
+        fi
```
174-195: ⚠️ Potential issue — Add error handling and upload verification for AWS uploads.

Upload function needs better error handling.

```diff
 function upload_to_aws() {
   customer_ids_to_upload=("$@")
   echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}"
   select yn in "Yes" "No"; do
     case $yn in
       Yes )
         set -euxo pipefail
+        local failed_uploads=()
         for element in "${customer_ids_to_upload[@]}"
         do
           ELEMENT_JAR_PATH=s3://zipline-artifacts-$element/jars
-          aws s3 cp "$CLOUD_AWS_JAR" "$ELEMENT_JAR_PATH"
+          if ! aws s3 cp "$CLOUD_AWS_JAR" "$ELEMENT_JAR_PATH"; then
+            failed_uploads+=("$CLOUD_AWS_JAR to $ELEMENT_JAR_PATH")
+          fi
+          # Verify upload
+          if ! aws s3 ls "$ELEMENT_JAR_PATH/$(basename $CLOUD_AWS_JAR)" >/dev/null 2>&1; then
+            failed_uploads+=("Failed to verify $CLOUD_AWS_JAR in $ELEMENT_JAR_PATH")
+          fi
           # Similar changes for other uploads
         done
+        if [ ${#failed_uploads[@]} -ne 0 ]; then
+          echo "Failed uploads:"
+          printf '%s\n' "${failed_uploads[@]}"
+          exit 1
+        fi
```
197-212: ⚠️ Potential issue — Fix customer ID handling.

$1 check is incorrect as it contains the last processed argument.

```diff
-# check if $1 (single customer id mode) has been set
-if [ -z "$1" ]; then
+# Store customer ID before processing flags
+CUSTOMER_ID="$1"
+
+if [ -z "$CUSTOMER_ID" ]; then
   if [ "$BUILD_AWS" = true ]; then
     upload_to_aws "${AWS_CUSTOMER_IDS[@]}"
   fi
   if [ "$BUILD_GCP" = true ]; then
     upload_to_gcp "${GCP_CUSTOMER_IDS[@]}"
   fi
 else
   if [ "$BUILD_AWS" = true ]; then
-    upload_to_aws "$1"
+    upload_to_aws "$CUSTOMER_ID"
   fi
   if [ "$BUILD_GCP" = true ]; then
-    upload_to_gcp "$1"
+    upload_to_gcp "$CUSTOMER_ID"
   fi
 fi
```
🧹 Nitpick comments (1)
distribution/build_and_upload_artifacts.sh (1)
18-43: Initialize BUILD_GCP and BUILD_AWS variables.

Add explicit initialization before the while loop.

```diff
+BUILD_GCP=false
+BUILD_AWS=false
 while [[ $# -gt 0 ]]; do
```
Actionable comments posted: 0
♻️ Duplicate comments (4)
distribution/build_and_upload_artifacts.sh (4)
66-69: ⚠️ Potential issue — Fix Python version comparison. (Same as the earlier comment.)

75-76: ⚠️ Potential issue — Use captured version in wheel name. (Same as the earlier comment.)

145-169: ⚠️ Potential issue — Add error handling and upload verification. (Same as the earlier comment; also applies to: 174-195.)

197-212: ⚠️ Potential issue — Fix customer ID handling. (Same as the earlier comment.)
🧹 Nitpick comments (2)
distribution/build_and_upload_artifacts.sh (2)
18-43: Initialize BUILD_GCP and BUILD_AWS variables.

Initialize variables before the while loop to prevent unbound variable errors.

```diff
+BUILD_GCP=false
+BUILD_AWS=false
+
 while [[ $# -gt 0 ]]; do
```
72-74: Add error handling for Thrift generation.

Check if Thrift commands succeed.

```diff
+set -e
 thrift --gen py -out api/py/ api/thrift/common.thrift
 thrift --gen py -out api/py/ api/thrift/api.thrift
 thrift --gen py -out api/py/ api/thrift/observability.thrift
+set +e
```
Actionable comments posted: 0
♻️ Duplicate comments (2)
distribution/run_zipline_quickstart_aws.sh (2)
7-8: ⚠️ Potential issue — Ensure proper directory change handling.

Quote the variable and exit if the directory change fails.

```diff
-cd $WORKING_DIR
+cd "$WORKING_DIR" || exit 1
```

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
20-21: ⚠️ Potential issue — Add error handling after cloning configs.

Quote the directory name and check the cd command's success.

```diff
-cd cananry-confs
+cd "cananry-confs" || exit 1
```

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 21-21: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
🧹 Nitpick comments (3)
distribution/run_zipline_quickstart_aws.sh (3)
10-11: Remove or use unused variable RED.

If not used, consider removing it to avoid confusion.

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
38-38: Separate PYTHONPATH assignment and export.

Avoid masking return values by splitting the command.

```diff
-export PYTHONPATH="${PYTHONPATH}:$(pwd)"
+PYTHONPATH="${PYTHONPATH}:$(pwd)"
+export PYTHONPATH
```

🧰 Tools

🪛 Shellcheck (0.10.0)

[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
47-50: Quote variables in AWS commands.

Wrap $STEP_ID in quotes to handle any spaces or special characters.

```diff
-aws emr wait step-complete --cluster-id zipline-canary-emr --step-id $STEP_ID --no-cli-auto-prompt
+aws emr wait step-complete --cluster-id zipline-canary-emr --step-id "$STEP_ID" --no-cli-auto-prompt

-JOB_STATE=$(aws emr describe-step --cluster-id zipline-canary-emr --step-id $STEP_ID --no-cli-auto-prompt --output text | grep "State")
+JOB_STATE=$(aws emr describe-step --cluster-id zipline-canary-emr --step-id "$STEP_ID" --no-cli-auto-prompt --output text | grep "State")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)

- distribution/run_zipline_quickstart_aws.py (1 hunks)
- distribution/run_zipline_quickstart_aws.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- distribution/run_zipline_quickstart_aws.py
🧰 Additional context used
🪛 Shellcheck (0.10.0)
distribution/run_zipline_quickstart_aws.sh
[warning] 8-8: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 11-11: RED appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 21-21: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[warning] 38-38: Declare and assign separately to avoid masking return values.
(SC2155)
## Summary

Adds AWS build and push commands to the distribution script.

## Checklist

- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit

- **New Features**
  - Introduced an automated quickstart process for GCP deployments.
  - Enhanced the build and upload tool with flexible command-line options, supporting artifact creation for both AWS and GCP environments.
  - Added a new script for running the Zipline quickstart on GCP.
- **Refactor**
  - Updated the AWS quickstart process to ensure consistent execution.
This reverts commit a3e8793.
Reverts #373. Passing in options to push to only one customer is broken.

## Summary by CodeRabbit

- **Refactor**
  - Streamlined the deployment process to automatically build and upload artifacts exclusively to Google Cloud Platform.
  - Removed configuration options and handling for an alternative cloud provider, resulting in a simpler, more focused workflow.