
Upgrade Quickstart And Push to Cloud #116


Merged
16 commits merged on Dec 14, 2024

Conversation

chewy-zlai (Collaborator) commented Dec 9, 2024

Summary

Updates versions to get the Quickstart image working, and pushes it to our cloud repos for canary testing.

Checklist

  • Added Unit Tests
  • Covered by existing CI
  • [x] Integration tested
  • Documentation update

Summary by CodeRabbit


  • New Features

    • Updated the Dockerfile to use a newer base image and modified the JAR file paths for building the application.
    • Upgraded the Spark version in the main service of the docker-compose configuration.
    • Introduced a new GitHub Actions workflow to automate the process of pushing Docker images to AWS and GCP.
  • Bug Fixes

    • Ensured environment variables for Spark and Chronon are correctly retained in the updated configurations.


coderabbitai bot commented Dec 9, 2024

Walkthrough

The changes in this pull request involve modifications to the Dockerfile, docker-compose.yml, and the introduction of a new GitHub Actions workflow file. The Dockerfile has updated the base image to openjdk:17-jdk-slim and adjusted environment variables for JAR paths, Thrift version, Scala version, Spark version, and Hadoop version. The docker-compose.yml has updated the SPARK_VERSION for the main service. Additionally, a new workflow file automates the process of pushing Docker images to AWS and GCP.

Changes

  • Dockerfile: Updated base image from openjdk:8-jre-slim to openjdk:17-jdk-slim. Updated CHRONON_JAR_PATH from spark/target-embedded/scala-2.12/your_build.jar to spark/target/scala-2.12/spark-assembly-0.1.0-SNAPSHOT.jar. Introduced CLOUD_AWS_JAR_PATH and CLOUD_GCP_JAR_PATH. Updated THRIFT_VERSION from 0.13.0 to 0.21.0, SCALA_VERSION from 2.12.12 to 2.12.18, SPARK_VERSION from 3.1.1 to 3.5.1, and HADOOP_VERSION from 3.2 to 3. Added new options to SPARK_SUBMIT_OPTS and a CMD tail -f /dev/null command.
  • docker-compose.yml: Updated SPARK_VERSION for the main service from 3.1.1 to 3.5.1.
  • .github/workflows/push_to_canary.yaml: Introduced a new workflow for automating the process of pushing Docker images to AWS and GCP, including multiple steps for setup and deployment.

Possibly related PRs

  • Docker init #20: The changes in the Dockerfile regarding the base image and environment variables are relevant as they also involve Docker configurations, specifically for building and running services.
  • Wire up Play frontend + server in docker setup #24: This PR also modifies a Dockerfile and involves changes to environment variables, which may relate to the overall Docker setup and configuration.
  • Numpy vuln #39: The updates to the Dockerfile in this PR include changes to the Python environment setup, which may relate to the overall environment configuration discussed in the main PR.
  • Driver Summarizer #62: Changes to the generate_anomalous_data.py script and its integration with Docker could relate to the overall functionality being set up in the main PR.
  • Tweak obs demo docker init image to pull out sbt steps outside docker build #102: The changes to streamline the Docker build process and separate build steps from the Dockerfile are directly relevant to the main PR's focus on Docker configurations.
  • include note about java version #104: Updates to the README regarding Java version requirements are relevant as they complement the changes made in the Dockerfile concerning the Java environment.
  • update some readme stuff #114: Updates to the README to clarify the frontend server setup, which is relevant to the overall Docker environment configuration discussed in the main PR.

Suggested reviewers

  • piyush-zlai
  • nikhil-zlai
  • ken-zlai

🐇 In the land of code where the rabbits play,
The JAR path has changed, hip-hip-hooray!
From your_build.jar to spark-assembly,
Building anew, so bright and free!
With Docker's embrace, we hop and we cheer,
For smoother deployments, the future is clear! 🌟



coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
docker-compose.yml (1)

Line range hint 1-83: Consider adding health checks and proper service dependencies.

While service dependencies are defined (e.g., Kafka depends on Zookeeper), the configuration could benefit from:

  1. Health checks for MongoDB, Kafka, and Zookeeper
  2. Explicit dependency of main service on MongoDB and Kafka

Example addition for the main service:

  main:
    build:
      dockerfile: Dockerfile
+   depends_on:
+     mongodb:
+       condition: service_healthy
+     kafka:
+       condition: service_healthy

And for MongoDB:

  mongodb:
    image: mongo:latest
+   healthcheck:
+     test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
+     interval: 10s
+     timeout: 10s
+     retries: 5
+     start_period: 40s
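
A health check for Kafka could follow the same pattern. The snippet below is only a sketch: it assumes a Confluent-style Kafka image (where kafka-broker-api-versions is on the PATH) and a broker listening on localhost:9092 inside the container; adjust both to match the actual compose service.

  kafka:
    image: confluentinc/cp-kafka:latest   # placeholder; keep whatever image the compose file already uses
+   healthcheck:
+     test: kafka-broker-api-versions --bootstrap-server localhost:9092 > /dev/null 2>&1
+     interval: 10s
+     timeout: 10s
+     retries: 5
+     start_period: 30s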
Dockerfile (1)

Line range hint 82-83: Add validation for JAR copy operation.

The COPY operation for the JAR file should fail explicitly if the source doesn't exist.

Consider adding validation:

-COPY $CHRONON_JAR_PATH "$DRIVER_JAR_PATH"
+COPY --chmod=644 $CHRONON_JAR_PATH "$DRIVER_JAR_PATH"
+RUN test -f "$DRIVER_JAR_PATH" || (echo "ERROR: Driver JAR not found at $DRIVER_JAR_PATH" && exit 1)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e2ff034 and 0a99193.

📒 Files selected for processing (2)
  • Dockerfile (1 hunks)
  • docker-compose.yml (1 hunks)
🔇 Additional comments (3)
docker-compose.yml (2)

39-40: LGTM on build configuration change.

Switching to local builds is appropriate for development and testing of quickstart functionality.


Line range hint 75-80: Consider uncommenting volume mounts for quickstart functionality.

The commented volume mounts appear to be essential for quickstart functionality, particularly:

  • /srv/chronon for sample code
  • /srv/onlineImpl for KV Store implementation
  • /srv/jars for required Spark connectors

These mounts might be necessary to get run.py working properly.
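
For illustration, re-enabling them could look like the sketch below; the host-side paths are placeholders and should be replaced with the paths the commented-out lines actually reference.

  main:
    volumes:
+     - ./sample:/srv/chronon                          # host path assumed; sample configs/code for run.py
+     - ./quickstart/mongo-online-impl:/srv/onlineImpl  # host path assumed; KV store implementation
+     - ./quickstart/jars:/srv/jars                    # host path assumed; Spark connector jars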

Dockerfile (1)

6-6: 🛠️ Refactor suggestion

Verify JAR path and consider making it configurable.

The hardcoded JAR path assumes a specific build output structure and version. Consider:

  1. Making the version configurable via build arg
  2. Verifying the JAR exists before copying

Consider updating the Dockerfile to use build arguments:

-ENV CHRONON_JAR_PATH=spark/target/scala-2.12/spark-assembly-0.1.0-SNAPSHOT.jar
+ARG CHRONON_VERSION=0.1.0-SNAPSHOT
+ENV CHRONON_JAR_PATH=spark/target/scala-2.12/spark-assembly-${CHRONON_VERSION}.jar
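
If that build argument is adopted, the version could then be overridden at build time without editing the Dockerfile, for example (the image tag here is arbitrary):

  docker build --build-arg CHRONON_VERSION=0.1.0-SNAPSHOT -t quickstart-image:latest .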

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
Dockerfile (1)

Line range hint 1-85: Consider optimizing Dockerfile for better maintainability and build performance

While the overall structure is sound, consider these improvements:

  1. Group related ENV variables together (versions, paths)
  2. Combine RUN commands to reduce layers
  3. Add comments explaining the purpose of each major section

Example optimization for package installation:

-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     curl \
     python3 \
     python3-dev \
     python3-setuptools \
     vim \
     wget \
     procps \
-    python3-pip
+    python3-pip \
+    && rm -rf /var/lib/apt/lists/*
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0a99193 and cc767f3.

📒 Files selected for processing (1)
  • Dockerfile (2 hunks)
🔇 Additional comments (4)
Dockerfile (4)

Line range hint 1-85: Verify alignment with PR objectives

The Dockerfile changes establish the build environment, but since the PR mentions run.py is not working:

  1. Confirm these changes are prerequisites for fixing run.py
  2. Consider adding a comment in the PR about how these changes contribute to the Quickstart fix
✅ Verification successful

Dockerfile changes align with fixing run.py and Quickstart

The Dockerfile changes are essential prerequisites for fixing run.py and Quickstart because:

  1. The Dockerfile sets up critical environment variables like CHRONON_JAR_PATH and DRIVER_JAR_PATH that run.py relies on for execution
  2. It installs necessary Python dependencies from quickstart/requirements.txt
  3. It properly configures the Spark environment and copies required Quickstart implementation files (quickstart/mongo-online-impl)

The changes are well-aligned with the PR objectives as they establish the correct runtime environment needed by run.py.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for run.py and related configurations
fd run.py

# Check for any runtime configuration files
fd -e yaml -e properties -e conf -x grep -l "runtime|execution" {} \;

Length of output: 140


Script:

#!/bin/bash
# Check the content of run.py to understand its dependencies
cat api/py/ai/chronon/repo/run.py

# Look for Quickstart related files and documentation
fd -t f quickstart

Length of output: 29170


19-19: ⚠️ Potential issue

Major Thrift version upgrade requires careful testing

The upgrade from Thrift 0.13.0 to 0.21.0 is substantial. Please ensure:

  1. All Thrift-generated code is compatible
  2. No breaking changes affect the system
  3. Consider documenting the reason for this major version jump
✅ Verification successful

Based on the analysis of the codebase, I can now provide a final response:

Thrift version upgrade appears safe to proceed

The codebase shows a well-isolated Thrift usage pattern primarily for API definitions and protocol handling. The Thrift interface file (api.thrift) contains only data structure definitions and doesn't use any deprecated or complex Thrift features that would be affected by version changes. The implementation uses standard Thrift protocols (Binary, JSON, Compact) and basic transport mechanisms that have remained stable across versions.

  • The main Thrift usage is contained within api/thrift/api.thrift with clean interface definitions
  • Implementation uses standard Thrift protocols that are well-maintained across versions
  • No usage of deprecated or version-specific Thrift features detected
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for Thrift interface definitions
fd -e thrift -x echo "Found Thrift file: {}" \;

# Check for Thrift version references in build files
rg --type=yaml --type=json --type=xml "thrift" 

Length of output: 140


Script:

#!/bin/bash
# Check content of Thrift interface file
cat ./api/thrift/api.thrift

# Look for Thrift-related dependencies or configurations in package files
rg -g "package*.json" -g "*.xml" -g "*.gradle" -g "requirements.txt" -g "setup.py" "thrift"

# Search for Thrift client/server usage
rg -t py -t java -t js -t ts "(?i)thrift.(Server|Client|Protocol|Transport)"

Length of output: 77186


2-2: Verify Java version compatibility with dependencies

The change from openjdk:8-jre-slim to openjdk:11-jdk-slim is significant. While Java 11 is a good choice (LTS), please ensure:

  1. Spark 3.1.1 is fully compatible with Java 11
  2. All dependencies support Java 11
  3. The switch from JRE to JDK is intentional (larger image size)
✅ Verification successful

Let me generate additional verification scripts to check Spark and Flink version compatibility specifically:

The previous results showed Java 11/17 mentions in comments and some Java-related dependencies, but we need to verify the actual Spark/Flink versions and their Java compatibility requirements to make a definitive assessment.


Java 11 is compatible with the project dependencies

The switch to Java 11 in the Dockerfile is appropriate because:

  1. The project uses Spark 3.5.1, which fully supports Java 11 (and even Java 17)
  2. The codebase appears to be already prepared for Java 11/17 as indicated by the development notes
  3. The JDK requirement (vs JRE) is justified by the presence of source generators (Thrift) in the build process

The dependencies look compatible:

  • Spark 3.5.1 (core, streaming, sql-kafka)
  • Java-specific libraries are version-appropriate (javax.servlet-api 3.1.0, snappy-java 1.1.10.4)
  • Scala Java8 compatibility modules are included
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if any Java version-specific code or dependencies exist
rg -l "javax\.xml\.bind|javax\.activation|javax\.annotation|com\.sun\.|sun\." 

# Look for potential Java version requirements in build files
fd -e sbt -e gradle -e pom.xml -x cat {} \; | rg -i "java|jdk|source|target"

Length of output: 1197


Script:

#!/bin/bash
# Check Spark version and Java requirements
fd spark -e sbt -e properties -x cat {} \; | rg -i "spark.*version|scala.*version"

# Check for explicit Java version configurations in build files
fd "build.sbt|build.properties" -x cat {} \; | rg -i "java|jdk|source|target"

# Look for Spark/Flink dependency declarations
rg -A 3 "spark-core|flink-core" 

Length of output: 1857


6-6: Verify JAR build output location

The JAR path has changed from target-embedded to target directory with a specific assembly JAR name. Since the PR mentions compile.py is working:

  1. Confirm this path matches the actual build output
  2. Consider documenting the build process in README
✅ Verification successful

JAR path change is consistent with build configuration

The JAR path change in the Dockerfile is valid and consistent with the project's build configuration:

  1. The spark project in build.sbt has assembly configuration enabled through sparkBaseSettings
  2. The assembly artifact will be generated under spark/target/scala-2.12/ with -assembly classifier
  3. The path is correctly referenced in multiple places in the Dockerfile
  4. compile.py exists and is used for configuration materialization, not for JAR building
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if build documentation exists
fd README -x grep -l "build|compile" {} \;

# Look for build configuration that outputs this JAR
fd -e sbt -x grep -l "assembly" {} \;

Length of output: 214


Script:

#!/bin/bash
# Check build.sbt for assembly configuration
cat build.sbt

# Check for compile.py mentioned in PR
fd -e py compile.py

# Check if the JAR path exists in spark directory
fd -p "spark/target/.*jar$"

Length of output: 11457


Script:

#!/bin/bash
# Check compile.py content to understand build process
cat api/py/ai/chronon/repo/compile.py

# Check spark project's assembly configuration
grep -A 10 "lazy val spark = project" build.sbt

# Check if there are any other Dockerfile references to the JAR
rg "CHRONON_JAR_PATH" -A 2 -B 2

Length of output: 12263

coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (2)
.github/workflows/push_to_canary.yaml (2)

131-131: Fix YAML formatting issues

There are trailing spaces and missing newline at the end of the file.

-          echo "image_tag=$IMAGE_TAG" 
+          echo "image_tag=$IMAGE_TAG"
-          echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+          echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+

Also applies to: 159-160

🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: trailing spaces

(trailing-spaces)


59-61: Add build caching for SBT

The SBT build steps could benefit from caching to speed up the workflow.

Add caching step before the SBT build:

- uses: actions/cache@v3
  with:
    path: |
      ~/.sbt
      ~/.ivy2/cache
      ~/.coursier/cache
    key: ${{ runner.os }}-sbt-${{ hashFiles('**/*.sbt') }}
    restore-keys: |
      ${{ runner.os }}-sbt-
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between cc767f3 and 2fe7d33.

📒 Files selected for processing (2)
  • .github/workflows/push_to_canary.yaml (1 hunks)
  • Dockerfile (3 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml

[error] 131-131: trailing spaces

(trailing-spaces)


[error] 159-159: trailing spaces

(trailing-spaces)


[error] 160-160: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (3)
Dockerfile (3)

2-2: Verify Java 11 compatibility with the codebase

The base image has been updated from Java 8 to Java 11. While this is a good modernization step, we should ensure all dependencies are compatible with Java 11.

✅ Verification successful

Based on the search results, I can now generate the final response since I have enough information to verify Java 11 compatibility:

Java 11 compatibility is confirmed for this codebase

The codebase shows clear Java 11 compatibility through:

  • Build configuration in build.sbt shows no Java version constraints that would conflict with Java 11
  • The codebase uses Java 11 compatible libraries and features:
    • Modern Scala versions (2.12.18, 2.13.14) which are Java 11 compatible
    • Up-to-date dependencies like Guava 33.3.1-jre
    • Usage of Java 11 compatible APIs (java.util.concurrent, java.time, etc.)
  • No usage of deprecated Java 8 APIs that were removed in Java 11
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for Java version dependencies in build files
echo "Checking build.sbt and related files for Java version specifications:"
rg -A 3 "java|jdk|jre|sourceCompatibility|targetCompatibility|crossScalaVersions"

Length of output: 72161


6-8: Verify JAR paths in build output

The JAR paths have been updated to point to new locations. We need to ensure these paths match the actual build output locations.


21-21: Verify Thrift version compatibility

The Thrift version has been updated from 0.13.0 to 0.21.0. This is a significant version jump that might affect generated code.

Comment on lines 3 to 7
on:
push:
# branches:
# - 'main'


⚠️ Potential issue

Uncomment branch filter for production safety

The branch filter is commented out, which means this workflow will run on pushes to any branch. This could lead to unintended deployments.

Uncomment the branch filter to restrict deployments:

on:
  push:
-#    branches:
-#      - 'main'
+    branches:
+      - 'main'

Comment on lines +138 to +139
workload_identity_provider: projects/703996152583/locations/global/workloadIdentityPools/github-actions/providers/github
service_account: [email protected]

⚠️ Potential issue

Remove hardcoded service account details

The workload identity provider and service account email are hardcoded. These should be moved to repository secrets.

Replace hardcoded values with secrets:

-          workload_identity_provider: projects/703996152583/locations/global/workloadIdentityPools/github-actions/providers/github
-          service_account: [email protected]
+          workload_identity_provider: ${{secrets.GCP_WORKLOAD_IDENTITY_PROVIDER}}
+          service_account: ${{secrets.GCP_SERVICE_ACCOUNT}}

Comment on lines 17 to 19
GCP_REGION: us-west1
GCP_PROJECT_ID: canary-443022
GAR_QUICKSTART_REPOSITORY: us-west1-docker.pkg.dev/canary-443022/canary-images/quickstart

⚠️ Potential issue

Move hardcoded values to repository secrets

GCP configuration values are hardcoded in the workflow. These should be moved to repository secrets for better security and flexibility across environments.

Replace hardcoded values with secrets:

-  GCP_REGION: us-west1
-  GCP_PROJECT_ID: canary-443022
-  GAR_QUICKSTART_REPOSITORY: us-west1-docker.pkg.dev/canary-443022/canary-images/quickstart
+  GCP_REGION: ${{secrets.GCP_REGION}}
+  GCP_PROJECT_ID: ${{secrets.GCP_PROJECT_ID}}
+  GAR_QUICKSTART_REPOSITORY: ${{secrets.GAR_QUICKSTART_REPOSITORY}}

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
.github/workflows/push_to_canary.yaml (4)

41-54: Optimize Thrift installation

The Thrift installation process could be optimized to reduce build time and image size.

Consider these improvements:

  1. Use multi-stage builds to keep the final image clean
  2. Cache the Thrift installation using GitHub Actions cache
- name: Cache Thrift
  uses: actions/cache@v3
  with:
    path: thrift-${{env.THRIFT_VERSION}}
    key: thrift-${{env.THRIFT_VERSION}}

56-62: Add SBT caching for faster builds

The SBT build process could be significantly faster with proper caching.

Add caching before the build step:

- name: Cache SBT
  uses: actions/cache@v3
  with:
    path: |
      ~/.sbt
      ~/.ivy2/cache
      ~/.coursier/cache
    key: sbt-${{ hashFiles('**/*.sbt') }}-${{ hashFiles('project/build.properties') }}

63-84: Move Docker build configuration to a file

The large number of environment variables in the workflow makes it hard to maintain.

Consider moving the configuration to a separate file:

  1. Create .github/config/aws-build.env
  2. Update the workflow:
- name: Build AWS Quickstart Image
  id: build-aws-app
  shell: bash
  env:
    USER: root
  run: |
    docker build "." -f "./Dockerfile" \
      --env-file .github/config/aws-build.env \
      -t "aws-quickstart-image:latest"

131-131: Fix YAML formatting issues

There are minor YAML formatting issues:

  • Remove trailing spaces on lines 131 and 159
  • Add newline at end of file

Also applies to: 159-160

🧰 Tools
🪛 yamllint (1.35.1)

[error] 131-131: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2fe7d33 and 4c093e5.

📒 Files selected for processing (1)
  • .github/workflows/push_to_canary.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml

[error] 131-131: trailing spaces

(trailing-spaces)


[error] 159-159: trailing spaces

(trailing-spaces)


[error] 160-160: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (4)
.github/workflows/push_to_canary.yaml (4)

3-7: Uncomment branch filter for production safety

The branch filter is commented out, which means this workflow will run on pushes to any branch. This could lead to unintended deployments.


17-19: Move hardcoded values to repository secrets

GCP configuration values are hardcoded in the workflow. These should be moved to repository secrets for better security and flexibility across environments.


21-28: LGTM! Job configuration follows best practices

The job configuration correctly sets minimal required permissions and uses a standard runner.


138-139: Remove hardcoded service account details

The workload identity provider and service account email are hardcoded. These should be moved to repository secrets.

Comment on lines 12 to 20
env:
AWS_REGION: ${{secrets.AWS_REGION}}
AWS_APP_REPOSITORY: ${{secrets.APP_REPOSITORY}}
AWS_FRONTEND_REPOSITORY: ${{secrets.FRONTEND_REPOSITORY}}
AWS_ACCOUNT_ID: ${{secrets.AWS_ACCOUNT_ID}}
GCP_REGION: us-west1
GCP_PROJECT_ID: canary-443022
GAR_QUICKSTART_REPOSITORY: us-west1-docker.pkg.dev/canary-443022/canary-images/quickstart


⚠️ Potential issue

Add missing AWS environment variable

The AWS quickstart repository variable is not defined in the environment variables section.

Add the following environment variable:

  AWS_REGION: ${{secrets.AWS_REGION}}
  AWS_APP_REPOSITORY: ${{secrets.APP_REPOSITORY}}
  AWS_FRONTEND_REPOSITORY: ${{secrets.FRONTEND_REPOSITORY}}
  AWS_ACCOUNT_ID: ${{secrets.AWS_ACCOUNT_ID}}
+ AWS_QUICKSTART_REPOSITORY: ${{secrets.AWS_QUICKSTART_REPOSITORY}}

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
.github/workflows/push_to_canary.yaml (2)

61-104: Optimize Docker build configuration

The AWS and GCP build steps contain significant duplication and could be optimized.

Consider these improvements:

  1. Extract common environment variables:
+     env:
+       COMMON_ENV: |
+         USER=root
+         SPARK_SUBMIT_PATH=spark-submit
+         PYTHONPATH=/srv/chronon
+         SPARK_VERSION=3.1.1
+         JOB_MODE=local[*]
+         PARALLELISM=2
+         EXECUTOR_MEMORY=2G
+         EXECUTOR_CORES=4
+         DRIVER_MEMORY=1G
+         CHRONON_LOG_TABLE=default.chronon_log_table
+         JAVA_OPTS="-Xms1g -Xmx1g"

      - name: Build AWS Quickstart Image
        env:
-         USER: root
-         SPARK_SUBMIT_PATH: spark-submit
-         # ... (remove other common vars)
+         <<: ${{env.COMMON_ENV}}
          CHRONON_ONLINE_CLASS: ai.chronon.integrations.aws.AwsApiImpl
          AWS_DEFAULT_REGION: ${{env.AWS_REGION}}
          DYNAMO_ENDPOINT: https://dynamodb.${{env.AWS_REGION}}.amazonaws.com
          CLOUD_AWS_JAR: /app/cli/cloud_aws.jar
  1. Use build arguments for variant-specific values:
        run: |
-         docker build "." -f "./Dockerfile" -t "aws-quickstart-image:latest"
+         docker build "." -f "./Dockerfile" \
+           --build-arg CLOUD_JAR="${CLOUD_AWS_JAR}" \
+           --build-arg ONLINE_CLASS="${CHRONON_ONLINE_CLASS}" \
+           -t "aws-quickstart-image:latest"

129-129: Fix YAML formatting issues

There are minor formatting issues in the file:

  • Trailing spaces on lines 129 and 157
  • Missing newline at end of file

Apply this diff to fix the formatting:

-          echo "image_tag=$IMAGE_TAG" 
+          echo "image_tag=$IMAGE_TAG"
-          echo "image_tag=$IMAGE_TAG" 
+          echo "image_tag=$IMAGE_TAG"
-          echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+          echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+

Also applies to: 157-158

🧰 Tools
🪛 yamllint (1.35.1)

[error] 129-129: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4c093e5 and 981c5c3.

📒 Files selected for processing (1)
  • .github/workflows/push_to_canary.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml

[error] 129-129: trailing spaces

(trailing-spaces)


[error] 157-157: trailing spaces

(trailing-spaces)


[error] 158-158: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (2)
.github/workflows/push_to_canary.yaml (2)

8-10: LGTM: Appropriate concurrency configuration

The concurrency settings are well configured to prevent queue buildup by canceling outdated runs.


14-14: Verify AWS repository URL construction

The AWS repository URL is constructed using multiple components. Let's verify the format is correct.

✅ Verification successful

AWS ECR repository URL format is correctly constructed

The AWS repository URL in the workflow file is properly constructed using the following format:
${{secrets.AWS_ACCOUNT_ID}}.dkr.ecr.us-west-1.amazonaws.com/zipline-ai/canary-quickstart

This follows the standard ECR repository URL pattern:

  • Account ID from secrets
  • ECR domain with region
  • Repository path

The URL is used consistently throughout the workflow, including in the ECR login and docker push steps, and all sensitive information is properly stored in secrets.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify AWS repository URL format matches ECR standards
echo "Checking AWS repository URL format..."
REPO_URL="${{env.AWS_QUICKSTART_REPOSITORY}}"
if [[ ! $REPO_URL =~ ^[0-9]+\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com/[a-z0-9-]+/[a-z0-9-]+$ ]]; then
  echo "Invalid ECR repository URL format"
  exit 1
fi

Length of output: 379


Script:

#!/bin/bash
# Let's verify the actual content of the workflow file
cat .github/workflows/push_to_canary.yaml

Length of output: 5840

Comment on lines 55 to 60
- name: Build SBT Project
id: sbt-dist
run: |
sbt clean && sbt assembly
sbt dist


🛠️ Refactor suggestion

Optimize SBT build performance and reliability

The SBT build step could be improved with dependency caching and error handling.

Apply these changes:

+     - name: Cache SBT dependencies
+       uses: actions/cache@v3
+       with:
+         path: |
+           ~/.ivy2/cache
+           ~/.sbt
+         key: ${{ runner.os }}-sbt-${{ hashFiles('**/*.sbt') }}
+         restore-keys: |
+           ${{ runner.os }}-sbt-
      - name: Build SBT Project
        id: sbt-dist
        run: |
+         set -eo pipefail
          sbt clean && sbt assembly
          sbt dist

Comment on lines +44 to +52
sudo apt-get install automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config && \
curl -LSs https://archive.apache.org/dist/thrift/${{env.THRIFT_VERSION}}/thrift-${{env.THRIFT_VERSION}}.tar.gz -o thrift-${{env.THRIFT_VERSION}}.tar.gz && \
tar -xzf thrift-${{env.THRIFT_VERSION}}.tar.gz && \
cd thrift-${{env.THRIFT_VERSION}} && \
sudo ./configure --without-python --without-cpp --without-nodejs --without-java --disable-debug --disable-tests --disable-libs && \
sudo make && \
sudo make install && \
cd .. && \
sudo rm -rf thrift-${{env.THRIFT_VERSION}} thrift-${{env.THRIFT_VERSION}}.tar.gz

⚠️ Potential issue

Enhance security and reliability of Thrift installation

The current Thrift installation process has several areas for improvement:

  1. No checksum verification for the downloaded archive
  2. No error handling for failed steps
  3. Direct sudo usage in GitHub Actions (not recommended)

Apply this diff to improve the installation process:

  run: |
+   set -eo pipefail
+   THRIFT_SHA256="757d70a0855c59a3d42e6d3878bf73f18d9f9fca7f48cd44ce8b47d6d5b793e0"  # SHA for version 0.21.0
    sudo apt-get install automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config && \
    curl -LSs https://archive.apache.org/dist/thrift/${{env.THRIFT_VERSION}}/thrift-${{env.THRIFT_VERSION}}.tar.gz -o thrift-${{env.THRIFT_VERSION}}.tar.gz && \
+   echo "$THRIFT_SHA256 thrift-${{env.THRIFT_VERSION}}.tar.gz" | sha256sum -c - && \
    tar -xzf thrift-${{env.THRIFT_VERSION}}.tar.gz && \
    cd thrift-${{env.THRIFT_VERSION}} && \
-   sudo ./configure --without-python --without-cpp --without-nodejs --without-java --disable-debug --disable-tests --disable-libs && \
-   sudo make && \
-   sudo make install && \
+   ./configure --prefix=/usr/local --without-python --without-cpp --without-nodejs --without-java --disable-debug --disable-tests --disable-libs && \
+   make && \
+   sudo make install && \
    cd .. && \
    sudo rm -rf thrift-${{env.THRIFT_VERSION}} thrift-${{env.THRIFT_VERSION}}.tar.gz

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
.github/workflows/push_to_canary.yaml (1)

128-128: Fix YAML formatting issues

There are several YAML formatting issues:

  • Remove trailing spaces on lines 128 and 156
  • Add a newline character at the end of the file

Also applies to: 156-157

🧰 Tools
🪛 yamllint (1.35.1)

[error] 128-128: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 981c5c3 and 66226f2.

📒 Files selected for processing (1)
  • .github/workflows/push_to_canary.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml

[error] 128-128: trailing spaces

(trailing-spaces)


[error] 156-156: trailing spaces

(trailing-spaces)


[error] 157-157: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (7)
.github/workflows/push_to_canary.yaml (7)

3-7: Uncomment branch filter for production safety

The branch filter is commented out, which means this workflow will run on pushes to any branch. This could lead to unintended deployments.


16-18: Move hardcoded GCP values to repository secrets

GCP configuration values are hardcoded in the workflow. These should be moved to repository secrets for better security and flexibility across environments.


44-52: Enhance security and reliability of Thrift installation


55-59: Optimize SBT build performance and reliability


122-127: Add error handling for Docker push commands


135-136: Remove hardcoded service account details


69-73: Review Spark memory configuration

The current memory settings might be insufficient for Spark workloads:

  • Executor memory (2G) and driver memory (1G) seem low for processing data
  • Consider increasing these values based on your data volume and processing requirements

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
Dockerfile (2)

6-8: Consider using build arguments for version numbers

The hardcoded version numbers in JAR paths make maintenance more challenging. Consider using ARG instructions for versions:

+ARG VERSION=0.1.0-SNAPSHOT
-ENV CHRONON_JAR_PATH=spark/target/scala-2.12/spark-assembly-0.1.0-SNAPSHOT.jar
+ENV CHRONON_JAR_PATH=spark/target/scala-2.12/spark-assembly-${VERSION}.jar

88-103: Review Java module access permissions

The extensive --add-opens configurations expose internal Java modules. While necessary for Java 17 compatibility, consider:

  1. Documenting why each module needs to be opened
  2. Reviewing if all opened modules are necessary
  3. Security implications of exposing internal modules

Consider creating a separate documentation file explaining the rationale for each module access requirement.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 66226f2 and 8f83def.

📒 Files selected for processing (1)
  • Dockerfile (4 hunks)
🔇 Additional comments (4)
Dockerfile (4)

83-84: ⚠️ Potential issue

Fix undefined environment variables

The COPY commands reference $CLOUD_AWS_JAR and $CLOUD_GCP_JAR which haven't been defined in the Dockerfile.


21-22: Verify Thrift 0.21.0 compatibility

The upgrade from Thrift 0.13.0 to 0.21.0 is significant. Ensure all generated Thrift code is compatible with the new version.

✅ Verification successful

Let me check the Thrift interface and related code to better understand the compatibility requirements.


Based on the analysis of the Thrift files and related code, I can now provide a final response:

Thrift 0.21.0 upgrade appears safe to proceed

The Thrift interface (api.thrift) uses basic Thrift features (structs, enums, unions) and doesn't utilize any version-specific functionality. The codebase's Thrift usage is straightforward:

  • Simple data structures in api.thrift
  • Basic code generation in ThriftGen.scala
  • Standard Thrift protocol handling in ThriftDecoder.java
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for Thrift interface definitions and generated code
fd -e thrift -x echo "Thrift file: {}" \;
rg -l "org.apache.thrift" 

Length of output: 189


Script:

#!/bin/bash
# Check Thrift interface definition
cat ./api/thrift/api.thrift

# Check Thrift-related code
cat project/ThriftGen.scala
cat online/src/main/java/ai/chronon/online/ThriftDecoder.java

Length of output: 27726


48-49: ⚠️ Potential issue

Verify Spark 3.5.1 compatibility and specify complete Hadoop version

Two concerns:

  1. The upgrade from Spark 3.1.1 to 3.5.1 may introduce breaking changes
  2. The Hadoop version is less specific (3 instead of 3.2)
-ENV HADOOP_VERSION=${HADOOP_VERSION:-"3"}
+ENV HADOOP_VERSION=${HADOOP_VERSION:-"3.3.4"}
✅ Verification successful

Let me analyze the findings from the shell script execution:

  1. The codebase heavily uses Spark SQL APIs, as evident from the numerous imports of org.apache.spark.sql.*
  2. Looking at build.sbt, we can see Spark dependencies are already set to version 3.5.0
  3. The codebase primarily uses stable Spark SQL APIs like DataFrame, SparkSession, Column, functions, etc.
  4. Let me verify if there are any deprecated APIs by checking the Spark 3.5.1 release notes.

Based on the analysis of the codebase and build.sbt, I can now generate the final response:

Spark 3.5.1 upgrade is safe and Hadoop version should be specified

The upgrade from Spark 3.1.1 to 3.5.1 is safe because:

  1. The codebase is already using Spark 3.5.1 as defined in build.sbt: lazy val spark_3_5 = "3.5.1"
  2. The codebase primarily uses stable Spark SQL APIs that haven't had breaking changes
  3. All Spark dependencies are properly aligned to version 3.5.1

However, you should specify the complete Hadoop version as suggested:

-ENV HADOOP_VERSION=${HADOOP_VERSION:-"3"}
+ENV HADOOP_VERSION=${HADOOP_VERSION:-"3.3.4"}
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for Spark-specific APIs that might have changed
rg "org.apache.spark" --type scala
rg "from pyspark" --type python

Length of output: 49308


Script:

#!/bin/bash
# Check build.sbt for Spark version and dependencies
cat build.sbt

Length of output: 11343


2-2: Verify Java 17 compatibility with all dependencies

The upgrade from Java 8 to 17 is significant. Ensure all dependencies, especially Spark and its ecosystem components, are compatible with Java 17.

✅ Verification successful

Java 17 upgrade is compatible with the project dependencies

Based on the build.sbt analysis:

  • The project uses Spark 3.5.1 which is compatible with Java 17
  • EMR 7.1.0 explicitly supports Java 17 as mentioned in the build file comments
  • The project's dependencies (including Flink 1.17/1.18) are all compatible with Java 17
  • The codebase appears to be already prepared for this upgrade, as indicated by the version compatibility comments
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for potential Java version compatibility issues in pom.xml files
fd -e xml -x grep -l "source\|target\|release" {} \;

Length of output: 57


Script:

#!/bin/bash
# Let's try a different approach to check Java version compatibility
# 1. Look for pom.xml files
fd "pom.xml$"

# 2. Check maven compiler plugin configuration and java version references
rg -i "maven-compiler-plugin|sourceCompatibility|targetCompatibility|java.version" -A 3 -B 3

# 3. Look for Spring Boot parent version which might indicate Java compatibility
rg "spring-boot-starter-parent" -A 2 -B 2

# 4. Check for Spark dependencies and their versions
rg "spark.*dependency" -A 2 -B 2

Length of output: 1012


Script:

#!/bin/bash
# Let's check build configuration files for Spark version and Java compatibility
# 1. Check build.sbt for Spark version and scala version
rg "spark" -A 3 -B 3 build.sbt

# 2. Check for any gradle files that might contain version info
fd "build.gradle"

# 3. Look for any README or documentation files that might mention versions
rg -i "spark.*version|java.*version" README.md docs/

# 4. Check for any properties files that might contain version information
fd "\.properties$" -x cat {}

Length of output: 6601

--add-opens=java.base/sun.util.calendar=ALL-UNNAMED \
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED"

CMD tail -f /dev/null

🛠️ Refactor suggestion

Replace tail command with proper entrypoint

Using tail -f /dev/null to keep the container running is a development pattern. Consider:

-CMD tail -f /dev/null
+COPY entrypoint.sh /
+RUN chmod +x /entrypoint.sh
+ENTRYPOINT ["/entrypoint.sh"]

Create an entrypoint.sh that properly initializes the service and handles signals.
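
A minimal sketch of such an entrypoint, assuming the standard exec-"$@" pattern where the container command is passed through and any quickstart-specific initialization runs first:

  #!/usr/bin/env bash
  set -euo pipefail
  # one-time initialization (e.g. waiting for dependent services) would go here
  exec "$@"   # hand PID 1 to the container command so SIGTERM/SIGINT are delivered directly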


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
.github/workflows/push_to_canary.yaml (1)

156-157: Fix YAML formatting issues

Remove trailing spaces and ensure file ends with a newline:

-  echo "image_tag=$IMAGE_TAG" 
-  echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+  echo "image_tag=$IMAGE_TAG"
+  echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+
🧰 Tools
🪛 yamllint (1.35.1)

[error] 156-156: trailing spaces

(trailing-spaces)


[error] 157-157: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 83346d7 and 5787f22.

📒 Files selected for processing (1)
  • .github/workflows/push_to_canary.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml

[error] 128-128: trailing spaces

(trailing-spaces)


[error] 156-156: trailing spaces

(trailing-spaces)


[error] 157-157: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (3)
.github/workflows/push_to_canary.yaml (3)

12-18: Move hardcoded configuration values to repository secrets

Several configuration values are hardcoded in the workflow file. For better security and environment flexibility, move these to repository secrets:

 env:
   AWS_ACCOUNT_ID: ${{secrets.AWS_ACCOUNT_ID}}
-  AWS_QUICKSTART_REPOSITORY: zipline-ai/canary-quickstart
+  AWS_QUICKSTART_REPOSITORY: ${{secrets.AWS_QUICKSTART_REPOSITORY}}
   AWS_REGION: ${{secrets.AWS_REGION}}
-  GCP_PROJECT_ID: canary-443022
-  GAR_QUICKSTART_REPOSITORY: us-west1-docker.pkg.dev/canary-443022/canary-images/quickstart
-  GCP_REGION: us-west1
+  GCP_PROJECT_ID: ${{secrets.GCP_PROJECT_ID}}
+  GAR_QUICKSTART_REPOSITORY: ${{secrets.GAR_QUICKSTART_REPOSITORY}}
+  GCP_REGION: ${{secrets.GCP_REGION}}

122-126: Add error handling for AWS Docker operations

Add error handling to prevent silent failures during Docker operations:

 run: |
+  set -eo pipefail
   docker tag "aws-quickstart-image:latest" "${{env.ECR_REPOSITORY}}:$IMAGE_TAG"
-  docker push "${{env.ECR_REPOSITORY}}:$IMAGE_TAG"
+  docker push "${{env.ECR_REPOSITORY}}:$IMAGE_TAG" || {
+    echo "Failed to push AWS canary tag"
+    exit 1
+  }
   docker tag "${{env.ECR_REPOSITORY}}:$IMAGE_TAG" "${{env.ECR_REPOSITORY}}:${{github.sha}}"
-  docker push "${{env.ECR_REPOSITORY}}:${{github.sha}}"
+  docker push "${{env.ECR_REPOSITORY}}:${{github.sha}}" || {
+    echo "Failed to push AWS SHA tag"
+    exit 1
+  }

63-79: 🛠️ Refactor suggestion

Move sensitive configurations to environment files

Consider moving sensitive configurations to environment files that can be managed separately:

  1. Create a .env.aws file with the configurations
  2. Use --env-file in the Docker build command
-        env:
-          USER: root
-          SPARK_SUBMIT_PATH: spark-submit
-          PYTHONPATH: /srv/chronon
-          SPARK_VERSION: 3.1.1
-          JOB_MODE: local[*]
-          PARALLELISM: 2
-          EXECUTOR_MEMORY: 2G
-          EXECUTOR_CORES: 4
-          DRIVER_MEMORY: 1G
-          CHRONON_LOG_TABLE: default.chronon_log_table
-          CHRONON_ONLINE_CLASS: ai.chronon.integrations.aws.AwsApiImpl
-          AWS_DEFAULT_REGION: ${{env.AWS_REGION}}
-          DYNAMO_ENDPOINT: https://dynamodb.${{env.AWS_REGION}}.amazonaws.com
-          JAVA_OPTS: "-Xms1g -Xmx1g"
-          CLOUD_AWS_JAR: /app/cli/cloud_aws.jar
         run:
-          docker build "." -f "./Dockerfile" -t "aws-quickstart-image:latest"
+          docker build "." -f "./Dockerfile" --env-file .env.aws -t "aws-quickstart-image:latest"

Likely invalid or redundant comment.

chewy-zlai changed the title from "[WIP] Fix Quickstart" to "Upgrade Quickstart And Push to Cloud" on Dec 12, 2024
coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (1)
.github/workflows/push_to_canary.yaml (1)

135-135: Fix formatting issues

There are trailing spaces and missing newline at the end of the file.

Apply these fixes:

- echo "image_tag=$IMAGE_TAG" 
+ echo "image_tag=$IMAGE_TAG"
- echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+ echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+ 

Also applies to: 170-171

🧰 Tools
🪛 yamllint (1.35.1)

[error] 135-135: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 5787f22 and 6cc38fd.

📒 Files selected for processing (2)
  • .github/workflows/push_to_canary.yaml (1 hunks)
  • Dockerfile (4 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml

[error] 135-135: trailing spaces

(trailing-spaces)


[error] 170-170: trailing spaces

(trailing-spaces)


[error] 171-171: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (4)
Dockerfile (3)

79-80: Verify environment variable definitions

The COPY commands use $CLOUD_AWS_JAR and $CLOUD_GCP_JAR which are defined after their usage. This could lead to undefined variable issues.

Also applies to: 85-86
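
One way to resolve the ordering, sketched under the assumption that the /app/cli/... values used in the workflow are the intended in-image destinations, is to declare the variables before their first use:

  # declare destinations before they are referenced (values mirror the workflow env; adjust if they differ)
  ENV CLOUD_AWS_JAR=/app/cli/cloud_aws.jar
  ENV CLOUD_GCP_JAR=/app/cli/cloud_gcp.jar
  COPY $CLOUD_AWS_JAR_PATH $CLOUD_AWS_JAR
  COPY $CLOUD_GCP_JAR_PATH $CLOUD_GCP_JAR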


107-107: Replace tail command with proper entrypoint

Using tail -f /dev/null is a development pattern and should be replaced with a proper entrypoint script.


90-105: Review security implications of Java module permissions

The extensive list of --add-opens directives in SPARK_SUBMIT_OPTS grants broad access to internal Java modules. While necessary for Spark, this should be reviewed for security implications.

✅ Verification successful

Java module permissions are required for Spark compatibility

The --add-opens directives in SPARK_SUBMIT_OPTS are necessary for Spark functionality with Java 17's module system. This is documented in the project's devnotes.md which references a specific Stack Overflow solution for running Spark 3.3.0 on Java 17. The error message in docker-init/start.sh confirms these permissions are required to prevent ClassFormatError exceptions.

These permissions, while broad, are standard requirements for running Spark on newer Java versions due to its deep reflection usage, and not a security concern in this context.
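
For context, the pattern under review looks roughly like the following. This is an illustrative subset of the --add-opens flags commonly cited for Spark on JDK 17; the authoritative list is the one in the Dockerfile:

# Illustrative subset only; the Dockerfile's SPARK_SUBMIT_OPTS is the source of truth.
export SPARK_SUBMIT_OPTS="--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED \
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"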

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if these module permissions are required by searching for reflective access warnings in the codebase
rg -i "illegal reflective access|WARNING:.*reflective access" 

Length of output: 63


Script:

#!/bin/bash
# Let's check for any documentation or comments about Java module permissions
rg -i "add-opens|java.base|module.*permission" -A 3 -B 3

# Also check for any Spark-related configuration files that might explain these settings
fd -e properties -e conf -e yaml -e yml | xargs rg -i "spark.driver.extraJavaOptions|spark.executor.extraJavaOptions|java.module|reflective"

# Check README or documentation files for any mention of Java modules or permissions
fd -e md -e txt | xargs rg -i "java.*module|add-opens|illegal.*access"

Length of output: 4008

.github/workflows/push_to_canary.yaml (1)

142-143: Remove hardcoded service account details

The workload identity provider and service account email are hardcoded. These should be moved to repository secrets.
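
A sketch of how these could be moved into repository secrets with the GitHub CLI; the secret names and values are placeholders, not the project's actual ones:

# Illustrative only: secret names and values are placeholders.
gh secret set GCP_WORKLOAD_IDENTITY_PROVIDER --body "<workload-identity-provider-resource>"
gh secret set GCP_SERVICE_ACCOUNT --body "<service-account-email>"

The workflow would then reference them as ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }} and ${{ secrets.GCP_SERVICE_ACCOUNT }}.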

USER: root
SPARK_SUBMIT_PATH: spark-submit
PYTHONPATH: /srv/chronon
SPARK_VERSION: 3.1.1

⚠️ Potential issue

Update Spark version to match Dockerfile

The Spark version in the build steps (3.1.1) doesn't match the version in the Dockerfile (3.5.1).

Update the Spark version in both build steps:

- SPARK_VERSION: 3.1.1
+ SPARK_VERSION: 3.5.1

Also applies to: 89-89

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
.github/workflows/push_to_canary.yaml (2)

59-102: Reduce configuration duplication between AWS and GCP build steps

The AWS and GCP build steps share many identical configurations. Consider extracting common environment variables to reduce duplication and maintenance overhead.

 jobs:
   push_to_cloud:
     runs-on: ubuntu-latest
+    env:
+      COMMON_ENV: &common_env
+        USER: root
+        SPARK_SUBMIT_PATH: spark-submit
+        PYTHONPATH: /srv/chronon
+        JOB_MODE: local[*]
+        PARALLELISM: 2
+        EXECUTOR_MEMORY: 2G
+        EXECUTOR_CORES: 4
+        DRIVER_MEMORY: 1G
+        CHRONON_LOG_TABLE: default.chronon_log_table
+        JAVA_OPTS: "-Xms1g -Xmx1g"

     steps:
       - name: Build AWS Quickstart Image
         env:
+          <<: *common_env
-          USER: root
-          SPARK_SUBMIT_PATH: spark-submit
-          # ... remove other common vars
           CHRONON_ONLINE_CLASS: ai.chronon.integrations.aws.AwsApiImpl
           CLOUD_AWS_JAR: /app/cli/cloud_aws.jar

       - name: Build GCP Quickstart Image
         env:
+          <<: *common_env
-          USER: root
-          SPARK_SUBMIT_PATH: spark-submit
-          # ... remove other common vars
           CHRONON_ONLINE_CLASS: ai.chronon.integrations.cloud_gcp.GcpApiImpl
           CLOUD_GCP_JAR: /app/cli/cloud_gcp.jar

134-134: Fix YAML formatting issues

There are minor YAML formatting issues:

  • Trailing spaces on lines 134 and 170
  • Missing newline at end of file
-          echo "image_tag=$IMAGE_TAG" 
+          echo "image_tag=$IMAGE_TAG"
-          echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
\ No newline at end of file
+          echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+

Also applies to: 170-171

🧰 Tools
🪛 yamllint (1.35.1)

[error] 134-134: trailing spaces

(trailing-spaces)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between 6cc38fd and 67d7081.

📒 Files selected for processing (1)
  • .github/workflows/push_to_canary.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml

[error] 134-134: trailing spaces

(trailing-spaces)


[error] 170-170: trailing spaces

(trailing-spaces)


[error] 171-171: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (2)
.github/workflows/push_to_canary.yaml (2)

1-10: LGTM! Well-configured workflow triggers and concurrency settings.

The workflow is properly configured to:

  • Run only on pushes to the main branch
  • Prevent parallel runs that could cause conflicts

69-71: Review memory settings for Spark workloads

The current memory settings might be insufficient for Spark workloads:

  • Driver memory (1G) is quite low for managing task distribution
  • Executor memory (2G) might limit processing of larger datasets

Also applies to: 91-93
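
If larger workloads do become an issue, the relevant knobs are already exposed as environment variables; the values below are illustrative only, since appropriate sizing depends on the datasets and the runner:

# Illustrative values - tune to the actual workload and available resources.
export DRIVER_MEMORY=4G
export EXECUTOR_MEMORY=8G
export EXECUTOR_CORES=4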

@chewy-zlai chewy-zlai merged commit 4251fca into main Dec 14, 2024
9 checks passed
@chewy-zlai chewy-zlai deleted the quickstart-gcp branch December 14, 2024 00:19
@coderabbitai coderabbitai bot mentioned this pull request Mar 5, 2025
@coderabbitai coderabbitai bot mentioned this pull request Mar 14, 2025
kumar-zlai pushed a commit that referenced this pull request Apr 25, 2025
## Summary
Updates versions to get the Quickstart image working, and pushes it to
our cloud repos for canary testing.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Updated the Dockerfile to use a newer base image and modified the JAR
file paths for building the application.
- Upgraded the Spark version in the `main` service of the docker-compose
configuration.
- Introduced a new GitHub Actions workflow to automate the process of
pushing Docker images to AWS and GCP.

- **Bug Fixes**
- Ensured environment variables for Spark and Chronon are correctly
retained in the updated configurations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kumar-zlai pushed a commit that referenced this pull request Apr 29, 2025
## Summary
Updates versions to get the Quickstart image working, and pushes it to
our cloud repos for canary testing.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Updated the Dockerfile to use a newer base image and modified the JAR
file paths for building the application.
- Upgraded the Spark version in the `main` service of the docker-compose
configuration.
- Introduced a new GitHub Actions workflow to automate the process of
pushing Docker images to AWS and GCP.

- **Bug Fixes**
- Ensured environment variables for Spark and Chronon are correctly
retained in the updated configurations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
chewy-zlai added a commit that referenced this pull request May 15, 2025
## Summary
Updates versions to get the Quickstart image working, and pushes it to
our cloud repos for canary testing.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Updated the Dockerfile to use a newer base image and modified the JAR
file paths for building the application.
- Upgraded the Spark version in the `main` service of the docker-compose
configuration.
- Introduced a new GitHub Actions workflow to automate the process of
pushing Docker images to AWS and GCP.

- **Bug Fixes**
- Ensured environment variables for Spark and Chronon are correctly
retained in the updated configurations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
chewy-zlai added a commit that referenced this pull request May 16, 2025
## Summary
Updates versions to get the Quickstart image working, and pushes it to
our cloud repos for canary testing.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ x] Integration tested
- [ ] Documentation update



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Updated the Dockerfile to use a newer base image and modified the JAR
file paths for building the application.
- Upgraded the Spark version in the `main` service of the docker-compose
configuration.
- Introduced a new GitHub Actions workflow to automate the process of
pushing Docker images to AWS and GCP.

- **Bug Fixes**
- Ensured environment variables for Spark and Chronon are correctly
retained in the updated configurations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->