Upgrade Quickstart And Push to Cloud #116

Merged: 16 commits, Dec 14, 2024
171 changes: 171 additions & 0 deletions .github/workflows/push_to_canary.yaml
@@ -0,0 +1,171 @@
name: Push To Canary

on:
push:
branches:
- 'main'

concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true

env:
AWS_ACCOUNT_ID: ${{secrets.AWS_ACCOUNT_ID}}
AWS_QUICKSTART_REPOSITORY: zipline-ai/canary-quickstart
AWS_REGION: ${{secrets.AWS_REGION}}
GCP_PROJECT_ID: canary-443022
GAR_QUICKSTART_REPOSITORY: us-west1-docker.pkg.dev/canary-443022/canary-images/quickstart
GCP_REGION: us-central1

jobs:
push_to_cloud:
runs-on: ubuntu-latest

permissions:
id-token: write
contents: read

steps:
- uses: actions/checkout@v4

- name: Set up QEMU
uses: docker/setup-qemu-action@v3

- name: Setup JDK
uses: actions/setup-java@v4
with:
distribution: corretto
java-version: 11

- name: Install Thrift
env:
THRIFT_VERSION: 0.21.0
run: |
sudo apt-get install automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config && \
curl -LSs https://archive.apache.org/dist/thrift/${{env.THRIFT_VERSION}}/thrift-${{env.THRIFT_VERSION}}.tar.gz -o thrift-${{env.THRIFT_VERSION}}.tar.gz && \
tar -xzf thrift-${{env.THRIFT_VERSION}}.tar.gz && \
cd thrift-${{env.THRIFT_VERSION}} && \
sudo ./configure --without-python --without-cpp --without-nodejs --without-java --disable-debug --disable-tests --disable-libs && \
sudo make && \
sudo make install && \
cd .. && \
sudo rm -rf thrift-${{env.THRIFT_VERSION}} thrift-${{env.THRIFT_VERSION}}.tar.gz
Comment on lines +43 to +51

⚠️ Potential issue

Enhance security and reliability of Thrift installation

The current Thrift installation process has several areas for improvement:

  1. No checksum verification for the downloaded archive
  2. No error handling for failed steps
  3. `./configure` and `make` run under sudo, although only `make install` needs root

Apply this diff to improve the installation process:

  run: |
+   set -eo pipefail
+   THRIFT_SHA256="757d70a0855c59a3d42e6d3878bf73f18d9f9fca7f48cd44ce8b47d6d5b793e0"  # SHA for version 0.21.0
    sudo apt-get install automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config && \
    curl -LSs https://archive.apache.org/dist/thrift/${{env.THRIFT_VERSION}}/thrift-${{env.THRIFT_VERSION}}.tar.gz -o thrift-${{env.THRIFT_VERSION}}.tar.gz && \
+   echo "$THRIFT_SHA256 thrift-${{env.THRIFT_VERSION}}.tar.gz" | sha256sum -c - && \
    tar -xzf thrift-${{env.THRIFT_VERSION}}.tar.gz && \
    cd thrift-${{env.THRIFT_VERSION}} && \
-   sudo ./configure --without-python --without-cpp --without-nodejs --without-java --disable-debug --disable-tests --disable-libs && \
-   sudo make && \
-   sudo make install && \
+   ./configure --prefix=/usr/local --without-python --without-cpp --without-nodejs --without-java --disable-debug --disable-tests --disable-libs && \
+   make && \
+   sudo make install && \
    cd .. && \
    sudo rm -rf thrift-${{env.THRIFT_VERSION}} thrift-${{env.THRIFT_VERSION}}.tar.gz
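
The checksum step suggested above can also be written as a small helper that compares digests directly, which avoids depending on the exact line layout that `sha256sum -c` expects. This is a minimal sketch, not part of the PR: `verify_sha256` is a hypothetical name, and the expected digest is assumed to come from the checksum Apache publishes for the release.

```shell
# verify_sha256 FILE EXPECTED_HEX  (hypothetical helper, not in the PR)
# Succeeds only if FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() (
  file="$1"
  expected="$2"
  # sha256sum prints "<digest>  <file>"; keep only the digest field.
  actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $file: got '$actual'" >&2
    exit 1
  fi
)
```

In the workflow this would sit between the `curl` and `tar` commands, for example `verify_sha256 "thrift-$THRIFT_VERSION.tar.gz" "$THRIFT_SHA256" && tar -xzf ...`.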



- name: Build SBT Project
id: sbt-assembly
run: |
sbt clean && sbt assembly

- name: Build AWS Quickstart Image
id: build-aws-app
shell: bash
env:
USER: root
SPARK_SUBMIT_PATH: spark-submit
PYTHONPATH: /srv/chronon
SPARK_VERSION: 3.1.1
⚠️ Potential issue

Update Spark version to match Dockerfile

The Spark version in the build steps (3.1.1) doesn't match the version in the Dockerfile (3.5.1).

Update the Spark version in both build steps:

- SPARK_VERSION: 3.1.1
+ SPARK_VERSION: 3.5.1

Also applies to: 89-89
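
A version mismatch like this one could be caught mechanically before the image is built. The sketch below is an illustration only: `dockerfile_default` and `check_spark_version` are hypothetical helpers, and they assume the Dockerfile keeps the `ENV NAME=${NAME:-"value"}` form shown in this PR. Separately, a step-level `env` value only reaches the image build if it is passed with `--build-arg` and declared as `ARG` in the Dockerfile; whether that happens is not visible in the hunks shown here.

```shell
# dockerfile_default VAR DOCKERFILE  (hypothetical helper)
# Prints the default from a line of the form: ENV VAR=${VAR:-"value"}
dockerfile_default() (
  var="$1"
  dockerfile="$2"
  line="$(grep -m 1 "^ENV $var=" "$dockerfile")" || exit 1
  value=${line#*:-}    # drop everything up to and including ":-"
  value=${value%\}}    # drop the closing brace
  value=${value#\"}    # strip the leading quote
  value=${value%\"}    # strip the trailing quote
  printf '%s\n' "$value"
)

# check_spark_version WORKFLOW_VERSION DOCKERFILE  (hypothetical helper)
# Succeeds only if the workflow's version equals the Dockerfile default.
check_spark_version() {
  [ "$1" = "$(dockerfile_default SPARK_VERSION "$2")" ]
}
```

A workflow step could then run `check_spark_version "$SPARK_VERSION" ./Dockerfile` and fail early when the two drift apart.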

JOB_MODE: local[*]
PARALLELISM: 2
EXECUTOR_MEMORY: 2G
EXECUTOR_CORES: 4
DRIVER_MEMORY: 1G
CHRONON_LOG_TABLE: default.chronon_log_table
CHRONON_ONLINE_CLASS: ai.chronon.integrations.aws.AwsApiImpl
AWS_DEFAULT_REGION: ${{env.AWS_REGION}}
DYNAMO_ENDPOINT: https://dynamodb.${{env.AWS_REGION}}.amazonaws.com
JAVA_OPTS: "-Xms1g -Xmx1g"
CLOUD_AWS_JAR: /app/cli/cloud_aws.jar
run:
docker build "." -f "./Dockerfile" -t "aws-quickstart-image:latest"

- name: Build GCP Quickstart Image
id: build-gcp-app
shell: bash
env:
USER: root
SPARK_SUBMIT_PATH: spark-submit
PYTHONPATH: /srv/chronon
SPARK_VERSION: 3.1.1
JOB_MODE: local[*]
PARALLELISM: 2
EXECUTOR_MEMORY: 2G
EXECUTOR_CORES: 4
DRIVER_MEMORY: 1G
CHRONON_LOG_TABLE: default.chronon_log_table
CHRONON_ONLINE_CLASS: ai.chronon.integrations.cloud_gcp.GcpApiImpl
GCP_DEFAULT_REGION: ${{env.GCP_REGION}}
BIGTABLE_ENDPOINT: https://${{env.GCP_REGION}}-bigtable.googleapis.com
JAVA_OPTS: "-Xms1g -Xmx1g"
CLOUD_GCP_JAR: /app/cli/cloud_gcp.jar
run:
docker build "." -f "./Dockerfile" -t "gcp-quickstart-image:latest"

- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::${{env.AWS_ACCOUNT_ID}}:role/github-canary-updater
aws-region: ${{env.AWS_REGION}}


- name: Login to Amazon ECR
id: login-ecr
uses: aws-actions/amazon-ecr-login@v2
with:
registries: ${{env.AWS_ACCOUNT_ID}}

- name: Tag, and push quickstart image to Amazon ECR
env:
ECR_REPOSITORY: ${{steps.login-ecr.outputs.registry}}/${{env.AWS_QUICKSTART_REPOSITORY}}
IMAGE_TAG: main
shell: bash
run: |
set -eo pipefail
docker tag "aws-quickstart-image:latest" "${{env.ECR_REPOSITORY}}:$IMAGE_TAG"
docker push "${{env.ECR_REPOSITORY}}:$IMAGE_TAG" || {
echo "Failed to push canary tag"
exit 1
}
docker tag "${{env.ECR_REPOSITORY}}:$IMAGE_TAG" "${{env.ECR_REPOSITORY}}:${{github.sha}}"
docker push "${{env.ECR_REPOSITORY}}:${{github.sha}}" || {
echo "Failed to push sha tag"
exit 1
}
echo "IMAGE $IMAGE_TAG is pushed to ${{env.ECR_REPOSITORY}}"
echo "image_tag=$IMAGE_TAG"
echo "full_image=${{env.ECR_REPOSITORY}}:$IMAGE_TAG"
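
The last two `echo` lines above print `key=value` pairs to the job log. If they are meant to be step outputs that later steps can read (an assumption, since the PR does not say), GitHub Actions expects them to be appended to the file named by `$GITHUB_OUTPUT`. A minimal sketch, with `set_output` as a hypothetical helper name:

```shell
# set_output NAME VALUE  (hypothetical helper, not in the PR)
# Appends NAME=VALUE to the file GitHub Actions reads step outputs from.
set_output() {
  printf '%s=%s\n' "$1" "$2" >> "$GITHUB_OUTPUT"
}
```

Inside the step this would be called as `set_output image_tag "$IMAGE_TAG"`, and the step would need an `id` so that later steps can refer to `steps.<id>.outputs.image_tag`.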

- name: Configure GCP Credentials
uses: google-github-actions/auth@v2
with:
project_id: ${{env.GCP_PROJECT_ID}}
workload_identity_provider: projects/703996152583/locations/global/workloadIdentityPools/github-actions/providers/github
service_account: [email protected]
Comment on lines +141 to +142
⚠️ Potential issue

Remove hardcoded service account details

The workload identity provider and service account email are hardcoded. These should be moved to repository secrets.

Replace hardcoded values with secrets:

-          workload_identity_provider: projects/703996152583/locations/global/workloadIdentityPools/github-actions/providers/github
-          service_account: [email protected]
+          workload_identity_provider: ${{secrets.GCP_WORKLOAD_IDENTITY_PROVIDER}}
+          service_account: ${{secrets.GCP_SERVICE_ACCOUNT}}
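
If the values move to secrets as suggested, note that an unset secret expands to an empty string, so a missing one surfaces later as an authentication error. The sketch below shows one way to fail fast in a preceding step. `require_env` is a hypothetical helper, and it assumes the two secrets are exposed to that step as environment variables with the same names.

```shell
# require_env NAME...  (hypothetical helper, not in the PR)
# Fails with a message if any named variable is unset or empty.
require_env() (
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "required variable $name is empty" >&2
      exit 1
    fi
  done
)
```

A guard step would call `require_env GCP_WORKLOAD_IDENTITY_PROVIDER GCP_SERVICE_ACCOUNT` before the `google-github-actions/auth` step runs.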


- name: Set up Google Cloud SDK
uses: google-github-actions/setup-gcloud@v2

- name: Google Cloud Docker Auth
shell: bash
run: |-
gcloud auth configure-docker ${{env.GCP_REGION}}-docker.pkg.dev --quiet

- name: Push Quickstart to Artifact Registry
shell: bash
env:
IMAGE_TAG: main
run: |
set -eo pipefail
docker tag "gcp-quickstart-image:latest" "${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
docker push "${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" || {
echo "Failed to push canary tag"
exit 1
}
docker tag "${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" "${{env.GAR_QUICKSTART_REPOSITORY}}:${{github.sha}}"
docker push "${{env.GAR_QUICKSTART_REPOSITORY}}:${{github.sha}}" || {
echo "Failed to push sha tag"
exit 1
}
echo "IMAGE $IMAGE_TAG is pushed to ${{env.GAR_QUICKSTART_REPOSITORY}}"
echo "image_tag=$IMAGE_TAG"
echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
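
One detail in the workflow above: Docker auth is configured for `${{env.GCP_REGION}}-docker.pkg.dev`, which resolves to `us-central1-docker.pkg.dev` with `GCP_REGION: us-central1`, while `GAR_QUICKSTART_REPOSITORY` points at `us-west1-docker.pkg.dev`. If that difference is unintended, one way to keep the two from drifting apart is to derive the registry host from the repository path. This is a sketch only, and `registry_host` is a hypothetical helper name.

```shell
# registry_host IMAGE_REPOSITORY  (hypothetical helper, not in the PR)
# Prints the registry host, i.e. everything before the first "/".
registry_host() {
  printf '%s\n' "${1%%/*}"
}
```

The auth step could then run `gcloud auth configure-docker "$(registry_host "$GAR_QUICKSTART_REPOSITORY")" --quiet`.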
39 changes: 31 additions & 8 deletions Dockerfile
@@ -1,9 +1,11 @@
# Start from a Debian base image
- FROM openjdk:8-jre-slim
+ FROM openjdk:17-jdk-slim

# Set this manually before building the image, requires a local build of the jar

- ENV CHRONON_JAR_PATH=spark/target-embedded/scala-2.12/your_build.jar
+ ENV CHRONON_JAR_PATH=spark/target/scala-2.12/spark-assembly-0.1.0-SNAPSHOT.jar
+ ENV CLOUD_AWS_JAR_PATH=cloud_aws/target/scala-2.12/cloud_aws-assembly-0.1.0-SNAPSHOT.jar
+ ENV CLOUD_GCP_JAR_PATH=cloud_gcp/target/scala-2.12/cloud_gcp-assembly-0.1.0-SNAPSHOT.jar

# Update package lists and install necessary tools
RUN apt-get update && apt-get install -y \
@@ -16,8 +18,8 @@ RUN apt-get update && apt-get install -y \
procps \
python3-pip

- ENV THRIFT_VERSION 0.13.0
- ENV SCALA_VERSION 2.12.12
+ ENV THRIFT_VERSION 0.21.0
+ ENV SCALA_VERSION 2.12.18

# Install thrift
RUN curl -sSL "http://archive.apache.org/dist/thrift/$THRIFT_VERSION/thrift-$THRIFT_VERSION.tar.gz" -o thrift.tar.gz \
@@ -43,8 +45,8 @@ ENV PATH=${PATH}:${SCALA_HOME}/bin
# Optional env variables
ENV SPARK_HOME=${SPARK_HOME:-"/opt/spark"}
ENV HADOOP_HOME=${HADOOP_HOME:-"/opt/hadoop"}
- ENV SPARK_VERSION=${SPARK_VERSION:-"3.1.1"}
- ENV HADOOP_VERSION=${HADOOP_VERSION:-"3.2"}
+ ENV SPARK_VERSION=${SPARK_VERSION:-"3.5.1"}
+ ENV HADOOP_VERSION=${HADOOP_VERSION:-"3"}
RUN mkdir -p ${HADOOP_HOME} && mkdir -p ${SPARK_HOME}
RUN mkdir -p /opt/spark/spark-events
WORKDIR ${SPARK_HOME}
@@ -54,12 +56,10 @@ RUN curl https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SP
&& tar xvzf spark.tgz --directory /opt/spark --strip-components 1 \
&& rm -rf spark.tgz


# Install python deps
COPY quickstart/requirements.txt .
RUN pip3 install -r requirements.txt


ENV PATH="/opt/spark/sbin:/opt/spark/bin:${PATH}"
ENV SPARK_HOME="/opt/spark"

@@ -76,9 +76,32 @@ WORKDIR ${SPARK_HOME}
WORKDIR /srv/chronon

ENV DRIVER_JAR_PATH="/srv/spark/spark_embedded.jar"
ENV CLOUD_AWS_JAR=${CLOUD_AWS_JAR:-"/srv/cloud_aws/cloud_aws.jar"}
ENV CLOUD_GCP_JAR=${CLOUD_GCP_JAR:-"/srv/cloud_gcp/cloud_gcp.jar"}

COPY api/py/test/sample ./
COPY quickstart/mongo-online-impl /srv/onlineImpl
COPY $CHRONON_JAR_PATH "$DRIVER_JAR_PATH"
COPY $CLOUD_AWS_JAR_PATH "$CLOUD_AWS_JAR"
COPY $CLOUD_GCP_JAR_PATH "$CLOUD_GCP_JAR"

ENV CHRONON_DRIVER_JAR="$DRIVER_JAR_PATH"

ENV SPARK_SUBMIT_OPTS="\
-XX:MaxMetaspaceSize=1024m \
--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED \
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED \
--add-opens=java.base/java.io=ALL-UNNAMED \
--add-opens=java.base/java.net=ALL-UNNAMED \
--add-opens=java.base/java.nio=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED \
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED \
--add-opens=java.base/sun.security.action=ALL-UNNAMED \
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED \
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED"

CMD tail -f /dev/null
🛠️ Refactor suggestion

Replace tail command with proper entrypoint

Using tail -f /dev/null to keep the container running is a development pattern. Consider:

-CMD tail -f /dev/null
+COPY entrypoint.sh /
+RUN chmod +x /entrypoint.sh
+ENTRYPOINT ["/entrypoint.sh"]

Create an entrypoint.sh that properly initializes the service and handles signals.
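
As a rough idea of what such a script could look like, here is a minimal sketch. It is not part of the PR: the file contents, the `main` function, and the absence of any initialization are all assumptions. The key point is `exec`, which replaces the shell with the requested command so that the command runs as PID 1 and receives `SIGTERM` from `docker stop` directly.

```shell
#!/bin/sh
# Hypothetical entrypoint.sh (illustrative only, not part of the PR).
set -e

main() {
  # One-time initialization would go here (assumption: none is needed yet).
  echo "entrypoint: starting: $*" >&2
  # exec replaces this shell, so signals reach the command itself.
  if [ "$#" -gt 0 ]; then
    exec "$@"
  fi
}

main "$@"
```

With `ENTRYPOINT ["/entrypoint.sh"]` in the Dockerfile, the command to run is supplied through `CMD` or as arguments to `docker run`.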


4 changes: 1 addition & 3 deletions docker-compose.yml
@@ -1,6 +1,4 @@
# Quickstart Docker containers to run chronon commands with MongoDB as the KV Store.
- version: '3.8'
-
services:

mongodb:
@@ -44,7 +42,7 @@ services:
- USER=root
- SPARK_SUBMIT_PATH=spark-submit
- PYTHONPATH=/srv/chronon
-   - SPARK_VERSION=3.1.1
+   - SPARK_VERSION=3.5.1
- JOB_MODE=local[*]
- PARALLELISM=2
- EXECUTOR_MEMORY=2G