Add AWS to Artifact Distribution Script #421

Merged Feb 24, 2025 (19 commits). Showing changes from 14 commits.

Commits
8527bce
Adding AWS version of distribution scripts
chewy-zlai Feb 12, 2025
5176468
use --output text for describe-step
chewy-zlai Feb 12, 2025
b488a16
Merge branch 'main' of https://github.com/zipline-ai/chronon into che…
chewy-zlai Feb 13, 2025
5923037
renme script more concisely
chewy-zlai Feb 13, 2025
da1376b
Add custom metadata to track who has uploaded and when for artifacts …
chewy-zlai Feb 13, 2025
e0709d5
Add custom metadata to track who has uploaded and when for artifacts …
chewy-zlai Feb 13, 2025
bc2ff7f
Merge branch 'main' of https://github.com/zipline-ai/chronon into che…
chewy-zlai Feb 14, 2025
8ddc766
merge in https://github.com/zipline-ai/chronon/pull/385
chewy-zlai Feb 14, 2025
fbd51f0
Clean up some dataproc references from aws quickstart
chewy-zlai Feb 14, 2025
81cea73
split change to upload artifacts from quickstart scripts
chewy-zlai Feb 20, 2025
4bf5a2a
Merge branch 'main' of https://github.com/zipline-ai/chronon into che…
chewy-zlai Feb 20, 2025
9242caa
Merge branch 'main' of https://github.com/zipline-ai/chronon into che…
chewy-zlai Feb 21, 2025
66f4b55
Give the option of specifying customer ids to upload to
chewy-zlai Feb 21, 2025
b72f025
Remove cloud_aws_submitter_deploy.jar from the script as it doesn't e…
chewy-zlai Feb 21, 2025
2402905
Fix python version check
chewy-zlai Feb 21, 2025
9e7ab67
Remove reference to CLOUD_AWS_SUBMITTER_JAR
chewy-zlai Feb 24, 2025
e3091a0
Push to directory, not file named jars
chewy-zlai Feb 24, 2025
83bde59
Use array length of input values instead of checking first value
chewy-zlai Feb 24, 2025
d69efd8
Merge branch 'main' of https://github.com/zipline-ai/chronon into che…
chewy-zlai Feb 24, 2025
249 changes: 249 additions & 0 deletions distribution/build_and_upload_artifacts.sh
@@ -0,0 +1,249 @@
#!/bin/bash

function print_usage() {
echo "Usage: $0 [OPTIONS]"
echo "Options:"
echo " --all Build and upload all artifacts (GCP and AWS)"
echo " --gcp Build and upload only GCP artifacts"
echo " --aws Build and upload only AWS artifacts"
echo " --aws_customer_ids <customer_id> Specify AWS customer IDs to upload artifacts to."
echo " --gcp_customer_ids <customer_id> Specify GCP customer IDs to upload artifacts to."
echo " -h, --help Show this help message"
}

# No arguments provided
if [ $# -eq 0 ]; then
print_usage
exit 1
fi

while [[ $# -gt 0 ]]; do
case $1 in
--all)
BUILD_GCP=true
BUILD_AWS=true
shift
;;
--gcp)
BUILD_GCP=true
shift
;;
--aws)
BUILD_AWS=true
shift
;;
-h|--help)
print_usage
exit 0
;;
--aws_customer_ids)
if [[ -z $2 ]]; then
echo "Error: --aws_customer_ids requires a value"
print_usage
exit 1
fi
INPUT_AWS_CUSTOMER_IDS=("$2")
shift 2
;;
--gcp_customer_ids)
if [[ -z $2 ]]; then
echo "Error: --gcp_customer_ids requires a value"
print_usage
exit 1
fi
INPUT_GCP_CUSTOMER_IDS=("$2")
shift 2
;;
*)
echo "Unknown option: $1"
print_usage
exit 1
;;
esac
done
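
Both `--aws_customer_ids` and `--gcp_customer_ids` store `$2` as a one-element array, so each flag carries a single ID per invocation. If a comma-separated list were wanted (an assumption; the PR does not specify a format), the value could be split roughly like this. `split_customer_ids` and `PARSED_IDS` are hypothetical names, not part of the script.

```shell
#!/bin/bash
# Hypothetical helper: splits a comma-separated string such as "canary,etsy"
# into the global array PARSED_IDS. Not part of the PR.
split_customer_ids() {
  # The IFS assignment applies only to this read command.
  IFS=',' read -r -a PARSED_IDS <<< "$1"
}

split_customer_ids "canary,etsy"
echo "${#PARSED_IDS[@]} ids: ${PARSED_IDS[*]}"
```

A single ID without commas still yields a one-element array, so existing invocations would keep working under this scheme.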


if [[ -n $(git diff HEAD) ]]; then
echo "Error: You have uncommitted changes. Please commit and push them to git so we can track them."
exit 1
fi

# Get current branch name
local_branch=$(git rev-parse --abbrev-ref HEAD)

# Fetch latest from remote
git fetch origin "$local_branch"

# Check that local matches remote (any content difference, ahead or behind, fails)
if [[ -n $(git diff HEAD..origin/$local_branch) ]]; then
echo "Error: Your branch is not in sync with remote"
echo "Please push your local changes and sync your local branch $local_branch with remote"
exit 1
fi

set -e

SCRIPT_DIRECTORY=$(dirname -- "$(realpath -- "$0")")
CHRONON_ROOT_DIR=$(dirname "$SCRIPT_DIRECTORY")

echo "Working in $CHRONON_ROOT_DIR"
cd "$CHRONON_ROOT_DIR"

echo "Building wheel"
#Check python version >= 3.9
MAJOR_PYTHON_VERSION=$(python --version | cut -d " " -f2 | cut -d "." -f 1)
MINOR_PYTHON_VERSION=$(python --version | cut -d " " -f2 | cut -d "." -f 2)

EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION=3
EXPECTED_MINIMUM_MINOR_PYTHON_VERSION=9

if [[ $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION -gt $MAJOR_PYTHON_VERSION ]] ; then
echo "Failed major version of $MAJOR_PYTHON_VERSION. Expecting python version of at least $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION.$EXPECTED_MINIMUM_MINOR_PYTHON_VERSION to build wheel. Your version is $(python --version)"
exit 1
fi

if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
echo "Failed minor version of $MINOR_PYTHON_VERSION. Expecting python version of at least $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION.$EXPECTED_MINIMUM_MINOR_PYTHON_VERSION to build wheel. Your version is $(python --version)"
exit 1
fi

🛠️ Refactor suggestion

⚠️ Potential issue

Python version check bug.
Missing '$' in the minor version test (lines 105-108); use $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION and $MINOR_PYTHON_VERSION.

-if [[ EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt MINOR_PYTHON_VERSION ]] ; then
-    echo "Failed minor version of $MINOR_PYTHON_VERSION. Expecting python version of at least $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION.$EXPECTED_MINIMUM_MINOR_PYTHON_VERSION to build wheel. Your version is $(python --version)"
-    exit 1
-fi
+if [[ $EXPECTED_MINIMUM_MINOR_PYTHON_VERSION -gt $MINOR_PYTHON_VERSION ]] ; then
+    echo "Failed minor version of $MINOR_PYTHON_VERSION. Expecting python version of at least $EXPECTED_MINIMUM_MAJOR_PYTHON_VERSION.$EXPECTED_MINIMUM_MINOR_PYTHON_VERSION to build wheel. Your version is $(python --version)"
+    exit 1
+fi
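
Separately from the `$` question, testing major and minor independently rejects any version whose minor number is below 9 even when the major number is higher (a hypothetical 4.0, for instance). A minimal sketch of a combined comparison follows; `version_at_least` is a hypothetical helper, not something the PR defines.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when version "$1" (e.g. "3.11.4") is at least
# "$2.$3". Major is compared first; minor only matters when majors are equal.
version_at_least() {
  local version=$1 want_major=$2 want_minor=$3
  local major minor
  major=${version%%.*}   # text before the first dot
  minor=${version#*.}    # drop "major."
  minor=${minor%%.*}     # drop ".patch" if present
  if (( major > want_major )); then return 0; fi
  if (( major < want_major )); then return 1; fi
  (( minor >= want_minor ))
}

# Possible use in the script (assumes `python --version` prints "Python X.Y.Z"):
#   if ! version_at_least "$(python --version | cut -d " " -f2)" 3 9; then
#     echo "Expecting python version of at least 3.9 to build wheel."
#     exit 1
#   fi
```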



thrift --gen py -out api/py/ api/thrift/common.thrift
thrift --gen py -out api/py/ api/thrift/api.thrift
thrift --gen py -out api/py/ api/thrift/observability.thrift
VERSION=$(cat version.sbt | cut -d " " -f3 | tr -d '"') pip wheel api/py
EXPECTED_ZIPLINE_WHEEL="zipline_ai-0.1.0.dev0-py3-none-any.whl"
if [ ! -f "$EXPECTED_ZIPLINE_WHEEL" ]; then
echo "$EXPECTED_ZIPLINE_WHEEL not found"
exit 1
fi

echo "Building jars"

bazel build //flink:flink_assembly_deploy.jar
bazel build //service:service_assembly_deploy.jar

FLINK_JAR="$CHRONON_ROOT_DIR/bazel-bin/flink/flink_assembly_deploy.jar"
SERVICE_JAR="$CHRONON_ROOT_DIR/bazel-bin/service/service_assembly_deploy.jar"

if [ ! -f "$SERVICE_JAR" ]; then
echo "$SERVICE_JAR not found"
exit 1
fi

if [ ! -f "$FLINK_JAR" ]; then
echo "$FLINK_JAR not found"
exit 1
fi



if [ "$BUILD_AWS" = true ]; then
bazel build //cloud_aws:cloud_aws_lib_deploy.jar

CLOUD_AWS_JAR="$CHRONON_ROOT_DIR/bazel-bin/cloud_aws/cloud_aws_lib_deploy.jar"

if [ ! -f "$CLOUD_AWS_JAR" ]; then
echo "$CLOUD_AWS_JAR not found"
exit 1
fi
fi
if [ "$BUILD_GCP" = true ]; then
bazel build //cloud_gcp:cloud_gcp_lib_deploy.jar
bazel build //cloud_gcp:cloud_gcp_submitter_deploy.jar

CLOUD_GCP_JAR="$CHRONON_ROOT_DIR/bazel-bin/cloud_gcp/cloud_gcp_lib_deploy.jar"
CLOUD_GCP_SUBMITTER_JAR="$CHRONON_ROOT_DIR/bazel-bin/cloud_gcp/cloud_gcp_submitter_deploy.jar"

if [ ! -f "$CLOUD_GCP_JAR" ]; then
echo "$CLOUD_GCP_JAR not found"
exit 1
fi

if [ ! -f "$CLOUD_GCP_SUBMITTER_JAR" ]; then
echo "$CLOUD_GCP_SUBMITTER_JAR not found"
exit 1
fi

fi




# all customer ids
GCP_CUSTOMER_IDS=("canary" "etsy")

# Takes in array of customer ids
function upload_to_gcp() {
# Disabling this so that we can set the custom metadata on these jars
gcloud config set storage/parallel_composite_upload_enabled False
customer_ids_to_upload=("$@")
echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}?"
select yn in "Yes" "No"; do
case $yn in
Yes )
set -euxo pipefail
for element in "${customer_ids_to_upload[@]}"
do
ELEMENT_JAR_PATH=gs://zipline-artifacts-$element/jars
gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH" --custom-metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
gcloud storage cp "$CLOUD_GCP_SUBMITTER_JAR" "$ELEMENT_JAR_PATH" --custom-metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
gcloud storage cp "$SERVICE_JAR" "$ELEMENT_JAR_PATH" --custom-metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
gcloud storage cp "$EXPECTED_ZIPLINE_WHEEL" "$ELEMENT_JAR_PATH" --custom-metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
gcloud storage cp "$FLINK_JAR" "$ELEMENT_JAR_PATH" --custom-metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
done
echo "Succeeded"
break;;
No ) break;;
esac
done
gcloud config set storage/parallel_composite_upload_enabled True
}
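
Each upload line above rebuilds the same metadata string, re-running `date` and `git` every time, so the five artifacts of one run can carry slightly different timestamps. One possible tidy-up is to build the string once per run, sketched below. `build_upload_metadata` is a hypothetical helper, and the UTC ISO-8601 timestamp is an assumption rather than the format the PR uses.

```shell
#!/bin/bash
# Hypothetical helper: joins the four tracked fields into the comma-separated
# key=value form the upload commands receive. Not part of the PR.
build_upload_metadata() {
  local user=$1 updated_date=$2 commit=$3 branch=$4
  printf 'zipline_user=%s,updated_date=%s,commit=%s,branch=%s' \
    "$user" "$updated_date" "$commit" "$branch"
}

# Possible use, computed once before the upload loop:
#   METADATA=$(build_upload_metadata "$USER" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
#     "$(git rev-parse HEAD)" "$(git rev-parse --abbrev-ref HEAD)")
#   gcloud storage cp "$CLOUD_GCP_JAR" "$ELEMENT_JAR_PATH" --custom-metadata="$METADATA"

build_upload_metadata alice 2025-02-24T00:00:00Z abc123 main
echo
```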

AWS_CUSTOMER_IDS=("canary")

# Takes in array of customer ids
function upload_to_aws() {
customer_ids_to_upload=("$@")
echo "Are you sure you want to upload to these customer ids: ${customer_ids_to_upload[*]}?"
select yn in "Yes" "No"; do
case $yn in
Yes )
set -euxo pipefail
for element in "${customer_ids_to_upload[@]}"
do
ELEMENT_JAR_PATH=s3://zipline-artifacts-$element/jars
aws s3 cp "$CLOUD_AWS_JAR" "$ELEMENT_JAR_PATH" --metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
aws s3 cp "$CLOUD_AWS_SUBMITTER_JAR" "$ELEMENT_JAR_PATH" --metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
aws s3 cp "$SERVICE_JAR" "$ELEMENT_JAR_PATH" --metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
aws s3 cp "$EXPECTED_ZIPLINE_WHEEL" "$ELEMENT_JAR_PATH" --metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
aws s3 cp "$FLINK_JAR" "$ELEMENT_JAR_PATH" --metadata="zipline_user=$USER,updated_date=$(date),commit=$(git rev-parse HEAD),branch=$(git rev-parse --abbrev-ref HEAD)"
done
echo "Succeeded"
break;;

⚠️ Potential issue

AWS upload function uses an undefined CLOUD_AWS_SUBMITTER_JAR.
Ensure the AWS submitter jar is built and its path is set (see AWS build block above).

No ) break;;
esac
done
}
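
The undefined `CLOUD_AWS_SUBMITTER_JAR` flagged in the review would only surface at upload time. One way to catch this class of problem earlier is to validate every artifact variable before the first copy, as in the rough sketch below. `require_artifacts` is a hypothetical helper, not part of the PR.

```shell
#!/bin/bash
# Hypothetical helper: takes variable NAMES and fails if any of them is
# unset, empty, or does not point at an existing regular file.
require_artifacts() {
  local name path
  for name in "$@"; do
    path=${!name:-}   # indirect expansion; empty when the variable is unset
    if [ -z "$path" ]; then
      echo "Error: $name is not set" >&2
      return 1
    fi
    if [ ! -f "$path" ]; then
      echo "Error: $name points at a missing file: $path" >&2
      return 1
    fi
  done
}

# Possible use at the top of upload_to_aws:
#   require_artifacts CLOUD_AWS_JAR SERVICE_JAR FLINK_JAR EXPECTED_ZIPLINE_WHEEL || exit 1
```

Passing names rather than values lets the error message say which variable was wrong, which is the information missing when an empty path reaches the copy command.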


if [ "$BUILD_AWS" = true ]; then
if [ -z "$INPUT_AWS_CUSTOMER_IDS" ]; then
echo "No customer ids provided for AWS. Using default: ${AWS_CUSTOMER_IDS[*]}"
else
AWS_CUSTOMER_IDS=("${INPUT_AWS_CUSTOMER_IDS[@]}")
fi
upload_to_aws "${AWS_CUSTOMER_IDS[@]}"
fi

🛠️ Refactor suggestion

AWS customer IDs check: use array length.
Instead of [ -z "$INPUT_AWS_CUSTOMER_IDS" ], consider using:

-    if [ -z "$INPUT_AWS_CUSTOMER_IDS" ]; then
+    if [ ${#INPUT_AWS_CUSTOMER_IDS[@]} -eq 0 ]; then
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 231-231: Expanding an array without an index only gives the first element.

(SC2128)
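
To illustrate the SC2128 warning: `"$ARRAY"` expands to the first element only, so a `-z` test inspects that one element rather than the array's size. In this script the two forms likely agree, because the argument parser rejects empty values, but the length form states the intent directly. The array names in the sketch below are made up for the demonstration.

```shell
#!/bin/bash
# Demonstration arrays (hypothetical, not from the script).
no_ids=()
one_id=("canary")
blank_first=("" "etsy")   # first element empty, array itself non-empty

# "$blank_first" is the first element only, so -z calls this array "empty".
first_only_says_empty=false
[ -z "$blank_first" ] && first_only_says_empty=true

echo "length of no_ids: ${#no_ids[@]}"
echo "length of one_id: ${#one_id[@]}"
echo "length of blank_first: ${#blank_first[@]}"
echo "-z on blank_first reports empty: $first_only_says_empty"
```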

if [ "$BUILD_GCP" = true ]; then
if [ -z "$INPUT_GCP_CUSTOMER_IDS" ]; then
echo "No customer ids provided for GCP. Using default: ${GCP_CUSTOMER_IDS[*]}"
else
GCP_CUSTOMER_IDS=("${INPUT_GCP_CUSTOMER_IDS[@]}")
fi
upload_to_gcp "${GCP_CUSTOMER_IDS[@]}"
fi

🛠️ Refactor suggestion

GCP customer IDs check: use array length.
Replace [ -z "$INPUT_GCP_CUSTOMER_IDS" ] with:

-    if [ -z "$INPUT_GCP_CUSTOMER_IDS" ]; then
+    if [ ${#INPUT_GCP_CUSTOMER_IDS[@]} -eq 0 ]; then
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 239-239: Expanding an array without an index only gives the first element.

(SC2128)



# Cleanup wheel stuff
rm ./*.whl
129 changes: 0 additions & 129 deletions distribution/build_and_upload_gcp_artifacts.sh

This file was deleted.