Upgrade Quickstart And Push to Cloud #116
Conversation
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
docker-compose.yml (1)
Line range hint 1-83: Consider adding health checks and proper service dependencies.
While service dependencies are defined (e.g., Kafka depends on Zookeeper), the configuration could benefit from:
- Health checks for MongoDB, Kafka, and Zookeeper
- Explicit dependency of the main service on MongoDB and Kafka
Example addition for the main service:
main:
  build:
    dockerfile: Dockerfile
+  depends_on:
+    mongodb:
+      condition: service_healthy
+    kafka:
+      condition: service_healthy
And for MongoDB:
mongodb:
  image: mongo:latest
+  healthcheck:
+    test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
+    interval: 10s
+    timeout: 10s
+    retries: 5
+    start_period: 40s
Dockerfile (1)
Line range hint 82-83: Add validation for JAR copy operation.
The COPY operation for the JAR file should fail explicitly if the source doesn't exist.
Consider adding validation:
-COPY $CHRONON_JAR_PATH "$DRIVER_JAR_PATH"
+COPY --chmod=644 $CHRONON_JAR_PATH "$DRIVER_JAR_PATH"
+RUN test -f "$DRIVER_JAR_PATH" || (echo "ERROR: Driver JAR not found at $DRIVER_JAR_PATH" && exit 1)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
Dockerfile (1 hunks)
docker-compose.yml (1 hunks)
🔇 Additional comments (3)
docker-compose.yml (2)
39-40: LGTM on build configuration change.
Switching to local builds is appropriate for development and testing of quickstart functionality.
Line range hint 75-80: Consider uncommenting volume mounts for quickstart functionality.
The commented volume mounts appear to be essential for quickstart functionality, particularly:
- /srv/chronon for sample code
- /srv/onlineImpl for the KV Store implementation
- /srv/jars for required Spark connectors
These mounts might be necessary to get run.py working properly, as sketched below.
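A rough sketch of what re-enabling them could look like; the host-side paths on the left are assumptions about the repo layout, not values confirmed by this PR:

main:
  volumes:
    - ./chronon:/srv/chronon                           # sample code (host path assumed)
    - ./quickstart/mongo-online-impl:/srv/onlineImpl   # KV Store implementation (host path assumed)
    - ./quickstart/jars:/srv/jars                      # Spark connector JARs (host path assumed)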
Dockerfile (1)
6-6: 🛠️ Refactor suggestion
Verify JAR path and consider making it configurable.
The hardcoded JAR path assumes a specific build output structure and version. Consider:
- Making the version configurable via build arg
- Verifying the JAR exists before copying
Consider updating the Dockerfile to use build arguments:
-ENV CHRONON_JAR_PATH=spark/target/scala-2.12/spark-assembly-0.1.0-SNAPSHOT.jar
+ARG CHRONON_VERSION=0.1.0-SNAPSHOT
+ENV CHRONON_JAR_PATH=spark/target/scala-2.12/spark-assembly-${CHRONON_VERSION}.jar
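Hypothetical usage at build time (the image tag is illustrative):

docker build --build-arg CHRONON_VERSION=0.1.0-SNAPSHOT -t chronon-quickstart .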
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
Dockerfile (1)
Line range hint 1-85: Consider optimizing Dockerfile for better maintainability and build performance.
While the overall structure is sound, consider these improvements:
- Group related ENV variables together (versions, paths)
- Combine RUN commands to reduce layers
- Add comments explaining the purpose of each major section
Example optimization for package installation:
-RUN apt-get update && apt-get install -y \
+RUN apt-get update && apt-get install -y --no-install-recommends \
     curl \
     python3 \
     python3-dev \
     python3-setuptools \
     vim \
     wget \
     procps \
-    python3-pip
+    python3-pip \
+    && rm -rf /var/lib/apt/lists/*
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
Dockerfile (2 hunks)
🔇 Additional comments (4)
Dockerfile (4)
Line range hint 1-85: Verify alignment with PR objectives
The Dockerfile changes establish the build environment, but since the PR mentions run.py is not working:
- Confirm these changes are prerequisites for fixing run.py
- Consider adding a comment in the PR about how these changes contribute to the Quickstart fix
✅ Verification successful
Dockerfile changes align with fixing run.py and Quickstart
The Dockerfile changes are essential prerequisites for fixing run.py and Quickstart because:
- The Dockerfile sets up critical environment variables like CHRONON_JAR_PATH and DRIVER_JAR_PATH that run.py relies on for execution
- It installs necessary Python dependencies from quickstart/requirements.txt
- It properly configures the Spark environment and copies required Quickstart implementation files (quickstart/mongo-online-impl)
The changes are well-aligned with the PR objectives as they establish the correct runtime environment needed by run.py.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Look for run.py and related configurations
fd run.py
# Check for any runtime configuration files
fd -e yaml -e properties -e conf -x grep -l "runtime|execution" {} \;
Length of output: 140
Script:
#!/bin/bash
# Check the content of run.py to understand its dependencies
cat api/py/ai/chronon/repo/run.py
# Look for Quickstart related files and documentation
fd -t f quickstart
Length of output: 29170
19-19: Major Thrift version upgrade requires careful testing
The upgrade from Thrift 0.13.0 to 0.21.0 is substantial. Please ensure:
- All Thrift-generated code is compatible
- No breaking changes affect the system
- Consider documenting the reason for this major version jump
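As a lightweight smoke test, assuming the 0.21.0 compiler is on PATH and the interface file lives at api/thrift/api.thrift (the verification below confirms that path), one could regenerate bindings into a scratch directory and confirm the compiler accepts the existing IDL:

thrift --version                 # expect: Thrift version 0.21.0
mkdir -p /tmp/thrift-gen         # scratch output directory
thrift --gen java -out /tmp/thrift-gen api/thrift/api.thrift && echo "codegen OK"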
✅ Verification successful
Based on the analysis of the codebase, I can now provide a final response:
Thrift version upgrade appears safe to proceed
The codebase shows a well-isolated Thrift usage pattern primarily for API definitions and protocol handling. The Thrift interface file (api.thrift) contains only data structure definitions and doesn't use any deprecated or complex Thrift features that would be affected by version changes. The implementation uses standard Thrift protocols (Binary, JSON, Compact) and basic transport mechanisms that have remained stable across versions.
- The main Thrift usage is contained within api/thrift/api.thrift with clean interface definitions
- Implementation uses standard Thrift protocols that are well-maintained across versions
- No usage of deprecated or version-specific Thrift features detected
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Look for Thrift interface definitions
fd -e thrift -x echo "Found Thrift file: {}" \;
# Check for Thrift version references in build files
rg --type=yaml --type=json --type=xml "thrift"
Length of output: 140
Script:
#!/bin/bash
# Check content of Thrift interface file
cat ./api/thrift/api.thrift
# Look for Thrift-related dependencies or configurations in package files
rg -g "package*.json" -g "*.xml" -g "*.gradle" -g "requirements.txt" -g "setup.py" "thrift"
# Search for Thrift client/server usage
rg -t py -t java -t js -t ts "(?i)thrift.(Server|Client|Protocol|Transport)"
Length of output: 77186
2-2: Verify Java version compatibility with dependencies
The change from openjdk:8-jre-slim to openjdk:11-jdk-slim is significant. While Java 11 is a good choice (LTS), please ensure:
- Spark 3.1.1 is fully compatible with Java 11
- All dependencies support Java 11
- The switch from JRE to JDK is intentional (larger image size)
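A quick local sanity check of the new base image, assuming Docker is available:

docker run --rm openjdk:11-jdk-slim java -version   # should report OpenJDK 11.x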
✅ Verification successful
Let me generate additional verification scripts to check Spark and Flink version compatibility specifically:
The previous results showed Java 11/17 mentions in comments and some Java-related dependencies, but we need to verify the actual Spark/Flink versions and their Java compatibility requirements to make a definitive assessment.
Java 11 is compatible with the project dependencies
The switch to Java 11 in the Dockerfile is appropriate because:
- The project uses Spark 3.5.1, which fully supports Java 11 (and even Java 17)
- The codebase appears to be already prepared for Java 11/17 as indicated by the development notes
- The JDK requirement (vs JRE) is justified by the presence of source generators (Thrift) in the build process
The dependencies look compatible:
- Spark 3.5.1 (core, streaming, sql-kafka)
- Java-specific libraries are version-appropriate (javax.servlet-api 3.1.0, snappy-java 1.1.10.4)
- Scala Java8 compatibility modules are included
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if any Java version-specific code or dependencies exist
rg -l "javax\.xml\.bind|javax\.activation|javax\.annotation|com\.sun\.|sun\."
# Look for potential Java version requirements in build files
fd -e sbt -e gradle -e pom.xml -x cat {} \; | rg -i "java|jdk|source|target"
Length of output: 1197
Script:
#!/bin/bash
# Check Spark version and Java requirements
fd spark -e sbt -e properties -x cat {} \; | rg -i "spark.*version|scala.*version"
# Check for explicit Java version configurations in build files
fd "build.sbt|build.properties" -x cat {} \; | rg -i "java|jdk|source|target"
# Look for Spark/Flink dependency declarations
rg -A 3 "spark-core|flink-core"
Length of output: 1857
6-6: Verify JAR build output location
The JAR path has changed from target-embedded to the target directory, with a specific assembly JAR name. Since the PR mentions compile.py is working:
- Confirm this path matches the actual build output
- Consider documenting the build process in README
✅ Verification successful
JAR path change is consistent with build configuration
The JAR path change in the Dockerfile is valid and consistent with the project's build configuration:
- The spark project in build.sbt has assembly configuration enabled through sparkBaseSettings
- The assembly artifact will be generated under spark/target/scala-2.12/ with the -assembly classifier
- The path is correctly referenced in multiple places in the Dockerfile
- compile.py exists and is used for configuration materialization, not for JAR building
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if build documentation exists
fd README -x grep -l "build|compile" {} \;
# Look for build configuration that outputs this JAR
fd -e sbt -x grep -l "assembly" {} \;
Length of output: 214
Script:
#!/bin/bash
# Check build.sbt for assembly configuration
cat build.sbt
# Check for compile.py mentioned in PR
fd -e py compile.py
# Check if the JAR path exists in spark directory
fd -p "spark/target/.*jar$"
Length of output: 11457
Script:
#!/bin/bash
# Check compile.py content to understand build process
cat api/py/ai/chronon/repo/compile.py
# Check spark project's assembly configuration
grep -A 10 "lazy val spark = project" build.sbt
# Check if there are any other Dockerfile references to the JAR
rg "CHRONON_JAR_PATH" -A 2 -B 2
Length of output: 12263
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
.github/workflows/push_to_canary.yaml (2)
131-131: Fix YAML formatting issues
There are trailing spaces and a missing newline at the end of the file.
-          echo "image_tag=$IMAGE_TAG"
+          echo "image_tag=$IMAGE_TAG"
-          echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+          echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG"
+
Also applies to: 159-160
🧰 Tools
🪛 yamllint (1.35.1)
[error] 131-131: trailing spaces
(trailing-spaces)
59-61: Add build caching for SBT
The SBT build steps could benefit from caching to speed up the workflow.
Add caching step before the SBT build:
- uses: actions/cache@v3
  with:
    path: |
      ~/.sbt
      ~/.ivy2/cache
      ~/.coursier/cache
    key: ${{ runner.os }}-sbt-${{ hashFiles('**/*.sbt') }}
    restore-keys: |
      ${{ runner.os }}-sbt-
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
.github/workflows/push_to_canary.yaml (1 hunks)
Dockerfile (3 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml
[error] 131-131: trailing spaces
(trailing-spaces)
[error] 159-159: trailing spaces
(trailing-spaces)
[error] 160-160: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (3)
Dockerfile (3)
2-2: Verify Java 11 compatibility with the codebase
The base image has been updated from Java 8 to Java 11. While this is a good modernization step, we should ensure all dependencies are compatible with Java 11.
✅ Verification successful
Based on the search results, I can now generate the final response since I have enough information to verify Java 11 compatibility:
Java 11 compatibility is confirmed for this codebase
The codebase shows clear Java 11 compatibility through:
- Build configuration in build.sbt shows no Java version constraints that would conflict with Java 11
- The codebase uses Java 11 compatible libraries and features:
- Modern Scala versions (2.12.18, 2.13.14) which are Java 11 compatible
- Up-to-date dependencies like Guava 33.3.1-jre
- Usage of Java 11 compatible APIs (java.util.concurrent, java.time, etc.)
- No usage of deprecated Java 8 APIs that were removed in Java 11
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for Java version dependencies in build files
echo "Checking build.sbt and related files for Java version specifications:"
rg -A 3 "java|jdk|jre|sourceCompatibility|targetCompatibility|crossScalaVersions"
Length of output: 72161
6-8: Verify JAR paths in build output
The JAR paths have been updated to point to new locations. We need to ensure these paths match the actual build output locations.
21-21: Verify Thrift version compatibility
The Thrift version has been updated from 0.13.0 to 0.21.0. This is a significant version jump that might affect generated code.
on:
  push:
#    branches:
#      - 'main'
Uncomment branch filter for production safety
The branch filter is commented out, which means this workflow will run on pushes to any branch. This could lead to unintended deployments.
Uncomment the branch filter to restrict deployments:
on:
  push:
-#  branches:
-#    - 'main'
+  branches:
+    - 'main'
workload_identity_provider: projects/703996152583/locations/global/workloadIdentityPools/github-actions/providers/github
service_account: [email protected]
Remove hardcoded service account details
The workload identity provider and service account email are hardcoded. These should be moved to repository secrets.
Replace hardcoded values with secrets:
- workload_identity_provider: projects/703996152583/locations/global/workloadIdentityPools/github-actions/providers/github
- service_account: [email protected]
+ workload_identity_provider: ${{secrets.GCP_WORKLOAD_IDENTITY_PROVIDER}}
+ service_account: ${{secrets.GCP_SERVICE_ACCOUNT}}
GCP_REGION: us-west1
GCP_PROJECT_ID: canary-443022
GAR_QUICKSTART_REPOSITORY: us-west1-docker.pkg.dev/canary-443022/canary-images/quickstart
Move hardcoded values to repository secrets
GCP configuration values are hardcoded in the workflow. These should be moved to repository secrets for better security and flexibility across environments.
Replace hardcoded values with secrets:
- GCP_REGION: us-west1
- GCP_PROJECT_ID: canary-443022
- GAR_QUICKSTART_REPOSITORY: us-west1-docker.pkg.dev/canary-443022/canary-images/quickstart
+ GCP_REGION: ${{secrets.GCP_REGION}}
+ GCP_PROJECT_ID: ${{secrets.GCP_PROJECT_ID}}
+ GAR_QUICKSTART_REPOSITORY: ${{secrets.GAR_QUICKSTART_REPOSITORY}}
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (4)
.github/workflows/push_to_canary.yaml (4)
41-54: Optimize Thrift installation
The Thrift installation process could be optimized to reduce build time and image size.
Consider these improvements:
- Use multi-stage builds to keep the final image clean
- Cache the Thrift installation using GitHub Actions cache
- name: Cache Thrift
  uses: actions/cache@v3
  with:
    path: thrift-${{env.THRIFT_VERSION}}
    key: thrift-${{env.THRIFT_VERSION}}
56-62: Add SBT caching for faster builds
The SBT build process could be significantly faster with proper caching.
Add caching before the build step:
- name: Cache SBT
  uses: actions/cache@v3
  with:
    path: |
      ~/.sbt
      ~/.ivy2/cache
      ~/.coursier/cache
    key: sbt-${{ hashFiles('**/*.sbt') }}-${{ hashFiles('project/build.properties') }}
63-84: Move Docker build configuration to a file
The large number of environment variables in the workflow makes it hard to maintain.
Consider moving the configuration to a separate file:
- Create .github/config/aws-build.env
- Update the workflow:
- name: Build AWS Quickstart Image
  id: build-aws-app
  shell: bash
  env:
    USER: root
  run: |
    docker build "." -f "./Dockerfile" \
      --env-file .github/config/aws-build.env \
      -t "aws-quickstart-image:latest"
131-131: Fix YAML formatting issues
There are minor YAML formatting issues:
- Remove trailing spaces on lines 131 and 159
- Add newline at end of file
Also applies to: 159-160
🧰 Tools
🪛 yamllint (1.35.1)
[error] 131-131: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
.github/workflows/push_to_canary.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml
[error] 131-131: trailing spaces
(trailing-spaces)
[error] 159-159: trailing spaces
(trailing-spaces)
[error] 160-160: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (4)
.github/workflows/push_to_canary.yaml (4)
3-7: Uncomment branch filter for production safety
The branch filter is commented out, which means this workflow will run on pushes to any branch. This could lead to unintended deployments.
17-19: Move hardcoded values to repository secrets
GCP configuration values are hardcoded in the workflow. These should be moved to repository secrets for better security and flexibility across environments.
21-28: LGTM! Job configuration follows best practices
The job configuration correctly sets minimal required permissions and uses a standard runner.
138-139: Remove hardcoded service account details
The workload identity provider and service account email are hardcoded. These should be moved to repository secrets.
env:
  AWS_REGION: ${{secrets.AWS_REGION}}
  AWS_APP_REPOSITORY: ${{secrets.APP_REPOSITORY}}
  AWS_FRONTEND_REPOSITORY: ${{secrets.FRONTEND_REPOSITORY}}
  AWS_ACCOUNT_ID: ${{secrets.AWS_ACCOUNT_ID}}
  GCP_REGION: us-west1
  GCP_PROJECT_ID: canary-443022
  GAR_QUICKSTART_REPOSITORY: us-west1-docker.pkg.dev/canary-443022/canary-images/quickstart
Add missing AWS environment variable
The AWS quickstart repository variable is not defined in the environment variables section.
Add the following environment variable:
AWS_REGION: ${{secrets.AWS_REGION}}
AWS_APP_REPOSITORY: ${{secrets.APP_REPOSITORY}}
AWS_FRONTEND_REPOSITORY: ${{secrets.FRONTEND_REPOSITORY}}
AWS_ACCOUNT_ID: ${{secrets.AWS_ACCOUNT_ID}}
+ AWS_QUICKSTART_REPOSITORY: ${{secrets.AWS_QUICKSTART_REPOSITORY}}
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
.github/workflows/push_to_canary.yaml (2)
61-104: Optimize Docker build configuration
The AWS and GCP build steps contain significant duplication and could be optimized.
Consider these improvements:
- Extract common environment variables:
+ env:
+   COMMON_ENV: |
+     USER=root
+     SPARK_SUBMIT_PATH=spark-submit
+     PYTHONPATH=/srv/chronon
+     SPARK_VERSION=3.1.1
+     JOB_MODE=local[*]
+     PARALLELISM=2
+     EXECUTOR_MEMORY=2G
+     EXECUTOR_CORES=4
+     DRIVER_MEMORY=1G
+     CHRONON_LOG_TABLE=default.chronon_log_table
+     JAVA_OPTS="-Xms1g -Xmx1g"
  - name: Build AWS Quickstart Image
    env:
-     USER: root
-     SPARK_SUBMIT_PATH: spark-submit
-     # ... (remove other common vars)
+     <<: ${{env.COMMON_ENV}}
      CHRONON_ONLINE_CLASS: ai.chronon.integrations.aws.AwsApiImpl
      AWS_DEFAULT_REGION: ${{env.AWS_REGION}}
      DYNAMO_ENDPOINT: https://dynamodb.${{env.AWS_REGION}}.amazonaws.com
      CLOUD_AWS_JAR: /app/cli/cloud_aws.jar
- Use build arguments for variant-specific values:
  run: |
-   docker build "." -f "./Dockerfile" -t "aws-quickstart-image:latest"
+   docker build "." -f "./Dockerfile" \
+     --build-arg CLOUD_JAR="${CLOUD_AWS_JAR}" \
+     --build-arg ONLINE_CLASS="${CHRONON_ONLINE_CLASS}" \
+     -t "aws-quickstart-image:latest"
129-129: Fix YAML formatting issues
There are minor formatting issues in the file:
- Trailing spaces on lines 129 and 157
- Missing newline at end of file
Apply this diff to fix the formatting:
- echo "image_tag=$IMAGE_TAG" + echo "image_tag=$IMAGE_TAG" - echo "image_tag=$IMAGE_TAG" + echo "image_tag=$IMAGE_TAG" - echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" + echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" +Also applies to: 157-158
🧰 Tools
🪛 yamllint (1.35.1)
[error] 129-129: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
.github/workflows/push_to_canary.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml
[error] 129-129: trailing spaces
(trailing-spaces)
[error] 157-157: trailing spaces
(trailing-spaces)
[error] 158-158: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (2)
.github/workflows/push_to_canary.yaml (2)
8-10: LGTM: Appropriate concurrency configuration
The concurrency settings are well configured to prevent queue buildup by canceling outdated runs.
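For reference, a concurrency block of the shape described usually looks like the sketch below (the group expression is illustrative, not copied from the PR):

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true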
14-14: Verify AWS repository URL construction
The AWS repository URL is constructed using multiple components. Let's verify the format is correct.
✅ Verification successful
AWS ECR repository URL format is correctly constructed
The AWS repository URL in the workflow file is properly constructed using the following format:
${{secrets.AWS_ACCOUNT_ID}}.dkr.ecr.us-west-1.amazonaws.com/zipline-ai/canary-quickstart
This follows the standard ECR repository URL pattern:
- Account ID from secrets
- ECR domain with region
- Repository path
The URL is used consistently throughout the workflow, including in the ECR login and docker push steps, and all sensitive information is properly stored in secrets.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify AWS repository URL format matches ECR standards
echo "Checking AWS repository URL format..."
REPO_URL="${{env.AWS_QUICKSTART_REPOSITORY}}"
if [[ ! $REPO_URL =~ ^[0-9]+\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com/[a-z0-9-]+/[a-z0-9-]+$ ]]; then
echo "Invalid ECR repository URL format"
exit 1
fi
Length of output: 379
Script:
#!/bin/bash
# Let's verify the actual content of the workflow file
cat .github/workflows/push_to_canary.yaml
Length of output: 5840
- name: Build SBT Project
  id: sbt-dist
  run: |
    sbt clean && sbt assembly
    sbt dist
🛠️ Refactor suggestion
Optimize SBT build performance and reliability
The SBT build step could be improved with dependency caching and error handling.
Apply these changes:
+  - name: Cache SBT dependencies
+    uses: actions/cache@v3
+    with:
+      path: |
+        ~/.ivy2/cache
+        ~/.sbt
+      key: ${{ runner.os }}-sbt-${{ hashFiles('**/*.sbt') }}
+      restore-keys: |
+        ${{ runner.os }}-sbt-
   - name: Build SBT Project
     id: sbt-dist
     run: |
+      set -eo pipefail
       sbt clean && sbt assembly
       sbt dist
sudo apt-get install automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config && \
curl -LSs https://archive.apache.org/dist/thrift/${{env.THRIFT_VERSION}}/thrift-${{env.THRIFT_VERSION}}.tar.gz -o thrift-${{env.THRIFT_VERSION}}.tar.gz && \
tar -xzf thrift-${{env.THRIFT_VERSION}}.tar.gz && \
cd thrift-${{env.THRIFT_VERSION}} && \
sudo ./configure --without-python --without-cpp --without-nodejs --without-java --disable-debug --disable-tests --disable-libs && \
sudo make && \
sudo make install && \
cd .. && \
sudo rm -rf thrift-${{env.THRIFT_VERSION}} thrift-${{env.THRIFT_VERSION}}.tar.gz
Enhance security and reliability of Thrift installation
The current Thrift installation process has several areas for improvement:
- No checksum verification for the downloaded archive
- No error handling for failed steps
- Direct sudo usage in GitHub Actions (not recommended)
Apply this diff to improve the installation process:
run: |
+ set -eo pipefail
+ THRIFT_SHA256="757d70a0855c59a3d42e6d3878bf73f18d9f9fca7f48cd44ce8b47d6d5b793e0" # SHA for version 0.21.0
sudo apt-get install automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config && \
curl -LSs https://archive.apache.org/dist/thrift/${{env.THRIFT_VERSION}}/thrift-${{env.THRIFT_VERSION}}.tar.gz -o thrift-${{env.THRIFT_VERSION}}.tar.gz && \
+ echo "$THRIFT_SHA256 thrift-${{env.THRIFT_VERSION}}.tar.gz" | sha256sum -c - && \
tar -xzf thrift-${{env.THRIFT_VERSION}}.tar.gz && \
cd thrift-${{env.THRIFT_VERSION}} && \
- sudo ./configure --without-python --without-cpp --without-nodejs --without-java --disable-debug --disable-tests --disable-libs && \
- sudo make && \
- sudo make install && \
+ ./configure --prefix=/usr/local --without-python --without-cpp --without-nodejs --without-java --disable-debug --disable-tests --disable-libs && \
+ make && \
+ sudo make install && \
cd .. && \
sudo rm -rf thrift-${{env.THRIFT_VERSION}} thrift-${{env.THRIFT_VERSION}}.tar.gz
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
.github/workflows/push_to_canary.yaml (1)
128-128: Fix YAML formatting issues
There are several YAML formatting issues:
- Remove trailing spaces on lines 128 and 156
- Add a newline character at the end of the file
Also applies to: 156-157
🧰 Tools
🪛 yamllint (1.35.1)
[error] 128-128: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
.github/workflows/push_to_canary.yaml
(1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml
[error] 128-128: trailing spaces
(trailing-spaces)
[error] 156-156: trailing spaces
(trailing-spaces)
[error] 157-157: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (7)
.github/workflows/push_to_canary.yaml (7)
3-7: Uncomment branch filter for production safety
The branch filter is commented out, which means this workflow will run on pushes to any branch. This could lead to unintended deployments.
16-18: Move hardcoded GCP values to repository secrets
GCP configuration values are hardcoded in the workflow. These should be moved to repository secrets for better security and flexibility across environments.
44-52: Enhance security and reliability of Thrift installation
55-59: Optimize SBT build performance and reliability
122-127: Add error handling for Docker push commands
135-136: Remove hardcoded service account details
69-73: Review Spark memory configuration
The current memory settings might be insufficient for Spark workloads:
- Executor memory (2G) and driver memory (1G) seem low for processing data
- Consider increasing these values based on your data volume and processing requirements
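For illustration only, a bump might look like the following; suitable values depend on runner RAM and data volume, and these numbers are assumptions rather than recommendations from the PR:

EXECUTOR_MEMORY: 4G
DRIVER_MEMORY: 2G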
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
Dockerfile (2)
6-8: Consider using build arguments for version numbers
The hardcoded version numbers in JAR paths make maintenance more challenging. Consider using ARG instructions for versions:
+ARG VERSION=0.1.0-SNAPSHOT
-ENV CHRONON_JAR_PATH=spark/target/scala-2.12/spark-assembly-0.1.0-SNAPSHOT.jar
+ENV CHRONON_JAR_PATH=spark/target/scala-2.12/spark-assembly-${VERSION}.jar
88-103: Review Java module access permissions
The extensive --add-opens configurations expose internal Java modules. While necessary for Java 17 compatibility, consider:
- Documenting why each module needs to be opened
- Reviewing if all opened modules are necessary
- Security implications of exposing internal modules
Consider creating a separate documentation file explaining the rationale for each module access requirement.
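One hedged sketch of such documentation: keep the flag list in a sourced shell fragment with the rationale inline. The per-flag reasons below are the commonly cited Spark-on-modern-JDK causes and should be verified against this project's actual failures:

# Build SPARK_SUBMIT_OPTS incrementally so each flag carries its rationale.
SPARK_SUBMIT_OPTS=""
# Spark's date/time internals reflect into sun.util.calendar (assumed rationale)
SPARK_SUBMIT_OPTS+=" --add-opens=java.base/sun.util.calendar=ALL-UNNAMED"
# Kerberos-backed Hadoop authentication reflects into sun.security.krb5 (assumed rationale)
SPARK_SUBMIT_OPTS+=" --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED"
export SPARK_SUBMIT_OPTS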
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
Dockerfile (4 hunks)
🔇 Additional comments (4)
Dockerfile (4)
83-84: Fix undefined environment variables
The COPY commands reference $CLOUD_AWS_JAR and $CLOUD_GCP_JAR which haven't been defined in the Dockerfile.
21-22: Verify Thrift 0.21.0 compatibility
The upgrade from Thrift 0.13.0 to 0.21.0 is significant. Ensure all generated Thrift code is compatible with the new version.
✅ Verification successful
Let me check the Thrift interface and related code to better understand the compatibility requirements.
Based on the analysis of the Thrift files and related code, I can now provide a final response:
Thrift 0.21.0 upgrade appears safe to proceed
The Thrift interface (api.thrift) uses basic Thrift features (structs, enums, unions) and doesn't utilize any version-specific functionality. The codebase's Thrift usage is straightforward:
- Simple data structures in api.thrift
- Basic code generation in ThriftGen.scala
- Standard Thrift protocol handling in ThriftDecoder.java
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Look for Thrift interface definitions and generated code
fd -e thrift -x echo "Thrift file: {}" \;
rg -l "org.apache.thrift"
Length of output: 189
Script:
#!/bin/bash
# Check Thrift interface definition
cat ./api/thrift/api.thrift
# Check Thrift-related code
cat project/ThriftGen.scala
cat online/src/main/java/ai/chronon/online/ThriftDecoder.java
Length of output: 27726
48-49: Verify Spark 3.5.1 compatibility and specify complete Hadoop version
Two concerns:
- The upgrade from Spark 3.1.1 to 3.5.1 may introduce breaking changes
- The Hadoop version is less specific (3 instead of 3.2)
-ENV HADOOP_VERSION=${HADOOP_VERSION:-"3"}
+ENV HADOOP_VERSION=${HADOOP_VERSION:-"3.3.4"}
✅ Verification successful
Let me analyze the findings from the shell script execution:
- The codebase heavily uses Spark SQL APIs, as evident from the numerous imports of org.apache.spark.sql.*
- Looking at build.sbt, we can see Spark dependencies are already set to version 3.5.0
- Let me verify if there are any deprecated APIs by checking the Spark 3.5.1 release notes.
Based on the analysis of the codebase and build.sbt, I can now generate the final response:
Spark 3.5.1 upgrade is safe and Hadoop version should be specified
The upgrade from Spark 3.1.1 to 3.5.1 is safe because:
- The codebase is already using Spark 3.5.1 as defined in build.sbt:
lazy val spark_3_5 = "3.5.1"
- The codebase primarily uses stable Spark SQL APIs that haven't had breaking changes
- All Spark dependencies are properly aligned to version 3.5.1
However, you should specify the complete Hadoop version as suggested:
-ENV HADOOP_VERSION=${HADOOP_VERSION:-"3"}
+ENV HADOOP_VERSION=${HADOOP_VERSION:-"3.3.4"}
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for Spark-specific APIs that might have changed
rg "org.apache.spark" --type scala
rg "from pyspark" --type python
Length of output: 49308
Script:
#!/bin/bash
# Check build.sbt for Spark version and dependencies
cat build.sbt
Length of output: 11343
2-2: Verify Java 17 compatibility with all dependencies
The upgrade from Java 8 to 17 is significant. Ensure all dependencies, especially Spark and its ecosystem components, are compatible with Java 17.
✅ Verification successful
Java 17 upgrade is compatible with the project dependencies
Based on the build.sbt analysis:
- The project uses Spark 3.5.1 which is compatible with Java 17
- EMR 7.1.0 explicitly supports Java 17 as mentioned in the build file comments
- The project's dependencies (including Flink 1.17/1.18) are all compatible with Java 17
- The codebase appears to be already prepared for this upgrade, as indicated by the version compatibility comments
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for potential Java version compatibility issues in pom.xml files
fd -e xml -x grep -l "source\|target\|release" {} \;
Length of output: 57
Script:
#!/bin/bash
# Let's try a different approach to check Java version compatibility
# 1. Look for pom.xml files
fd "pom.xml$"
# 2. Check maven compiler plugin configuration and java version references
rg -i "maven-compiler-plugin|sourceCompatibility|targetCompatibility|java.version" -A 3 -B 3
# 3. Look for Spring Boot parent version which might indicate Java compatibility
rg "spring-boot-starter-parent" -A 2 -B 2
# 4. Check for Spark dependencies and their versions
rg "spark.*dependency" -A 2 -B 2
Length of output: 1012
Script:
#!/bin/bash
# Let's check build configuration files for Spark version and Java compatibility
# 1. Check build.sbt for Spark version and scala version
rg "spark" -A 3 -B 3 build.sbt
# 2. Check for any gradle files that might contain version info
fd "build.gradle"
# 3. Look for any README or documentation files that might mention versions
rg -i "spark.*version|java.*version" README.md docs/
# 4. Check for any properties files that might contain version information
fd "\.properties$" -x cat {}
Length of output: 6601
    --add-opens=java.base/sun.util.calendar=ALL-UNNAMED \
    --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED"

CMD tail -f /dev/null
🛠️ Refactor suggestion
Replace tail command with proper entrypoint
Using tail -f /dev/null
to keep the container running is a development pattern. Consider:
-CMD tail -f /dev/null
+COPY entrypoint.sh /
+RUN chmod +x /entrypoint.sh
+ENTRYPOINT ["/entrypoint.sh"]
Create an entrypoint.sh that properly initializes the service and handles signals; see the sketch after the note below.
Committable suggestion skipped: line range outside the PR's diff.
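A minimal sketch of such an entrypoint, assuming the service command is supplied via CMD; the initialization step is a placeholder:

#!/usr/bin/env bash
# Fail fast on errors, unset variables, and pipeline failures.
set -euo pipefail

# One-time initialization (e.g., waiting for dependencies) would go here.

# exec replaces the shell so the service runs as PID 1 and receives
# SIGTERM/SIGINT directly, enabling clean shutdown on `docker stop`.
exec "$@"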
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
.github/workflows/push_to_canary.yaml (1)
156-157: Fix YAML formatting issues
Remove trailing spaces and ensure the file ends with a newline:
- echo "image_tag=$IMAGE_TAG" - echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" + echo "image_tag=$IMAGE_TAG" + echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" +🧰 Tools
🪛 yamllint (1.35.1)
[error] 156-156: trailing spaces
(trailing-spaces)
[error] 157-157: no new line character at the end of file
(new-line-at-end-of-file)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
.github/workflows/push_to_canary.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml
[error] 128-128: trailing spaces
(trailing-spaces)
[error] 156-156: trailing spaces
(trailing-spaces)
[error] 157-157: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (3)
.github/workflows/push_to_canary.yaml (3)
12-18: Move hardcoded configuration values to repository secrets
Several configuration values are hardcoded in the workflow file. For better security and environment flexibility, move these to repository secrets:
env:
AWS_ACCOUNT_ID: ${{secrets.AWS_ACCOUNT_ID}}
- AWS_QUICKSTART_REPOSITORY: zipline-ai/canary-quickstart
+ AWS_QUICKSTART_REPOSITORY: ${{secrets.AWS_QUICKSTART_REPOSITORY}}
AWS_REGION: ${{secrets.AWS_REGION}}
- GCP_PROJECT_ID: canary-443022
- GAR_QUICKSTART_REPOSITORY: us-west1-docker.pkg.dev/canary-443022/canary-images/quickstart
- GCP_REGION: us-west1
+ GCP_PROJECT_ID: ${{secrets.GCP_PROJECT_ID}}
+ GAR_QUICKSTART_REPOSITORY: ${{secrets.GAR_QUICKSTART_REPOSITORY}}
+ GCP_REGION: ${{secrets.GCP_REGION}}
122-126: Add error handling for AWS Docker operations
Add error handling to prevent silent failures during Docker operations:
run: |
+ set -eo pipefail
docker tag "aws-quickstart-image:latest" "${{env.ECR_REPOSITORY}}:$IMAGE_TAG"
- docker push "${{env.ECR_REPOSITORY}}:$IMAGE_TAG"
+ docker push "${{env.ECR_REPOSITORY}}:$IMAGE_TAG" || {
+ echo "Failed to push AWS canary tag"
+ exit 1
+ }
docker tag "${{env.ECR_REPOSITORY}}:$IMAGE_TAG" "${{env.ECR_REPOSITORY}}:${{github.sha}}"
- docker push "${{env.ECR_REPOSITORY}}:${{github.sha}}"
+ docker push "${{env.ECR_REPOSITORY}}:${{github.sha}}" || {
+ echo "Failed to push AWS SHA tag"
+ exit 1
+ }
63-79: 🛠️ Refactor suggestion
Move sensitive configurations to environment files
Consider moving sensitive configurations to environment files that can be managed separately:
- Create a .env.aws file with the configurations
- Use --env-file in the Docker build command
- env:
- USER: root
- SPARK_SUBMIT_PATH: spark-submit
- PYTHONPATH: /srv/chronon
- SPARK_VERSION: 3.1.1
- JOB_MODE: local[*]
- PARALLELISM: 2
- EXECUTOR_MEMORY: 2G
- EXECUTOR_CORES: 4
- DRIVER_MEMORY: 1G
- CHRONON_LOG_TABLE: default.chronon_log_table
- CHRONON_ONLINE_CLASS: ai.chronon.integrations.aws.AwsApiImpl
- AWS_DEFAULT_REGION: ${{env.AWS_REGION}}
- DYNAMO_ENDPOINT: https://dynamodb.${{env.AWS_REGION}}.amazonaws.com
- JAVA_OPTS: "-Xms1g -Xmx1g"
- CLOUD_AWS_JAR: /app/cli/cloud_aws.jar
run:
- docker build "." -f "./Dockerfile" -t "aws-quickstart-image:latest"
+ docker build "." -f "./Dockerfile" --env-file .env.aws -t "aws-quickstart-image:latest"
Likely invalid or redundant comment.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (1)
.github/workflows/push_to_canary.yaml (1)
135-135: Fix formatting issues
There are trailing spaces and a missing newline at the end of the file.
Apply these fixes:
- echo "image_tag=$IMAGE_TAG" + echo "image_tag=$IMAGE_TAG" - echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" + echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" +Also applies to: 170-171
🧰 Tools
🪛 yamllint (1.35.1)
[error] 135-135: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (2)
.github/workflows/push_to_canary.yaml (1 hunks)
Dockerfile (4 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml
[error] 135-135: trailing spaces
(trailing-spaces)
[error] 170-170: trailing spaces
(trailing-spaces)
[error] 171-171: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (4)
Dockerfile (3)
79-80: Verify environment variable definitions
The COPY commands use $CLOUD_AWS_JAR and $CLOUD_GCP_JAR which are defined after their usage. This could lead to undefined variable issues.
Also applies to: 85-86
107-107: Replace tail command with proper entrypoint
Using tail -f /dev/null is a development pattern and should be replaced with a proper entrypoint script.
90-105: Review security implications of Java module permissions
The extensive list of --add-opens directives in SPARK_SUBMIT_OPTS grants broad access to internal Java modules. While necessary for Spark, this should be reviewed for security implications.
✅ Verification successful
Java module permissions are required for Spark compatibility
The --add-opens directives in SPARK_SUBMIT_OPTS are necessary for Spark functionality with Java 17's module system. This is documented in the project's devnotes.md, which references a specific Stack Overflow solution for running Spark 3.3.0 on Java 17. The error message in docker-init/start.sh confirms these permissions are required to prevent ClassFormatError exceptions.
These permissions, while broad, are standard requirements for running Spark on newer Java versions due to its deep reflection usage, and not a security concern in this context.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if these module permissions are required by searching for reflective access warnings in the codebase
rg -i "illegal reflective access|WARNING:.*reflective access"
Length of output: 63
Script:
#!/bin/bash
# Let's check for any documentation or comments about Java module permissions
rg -i "add-opens|java.base|module.*permission" -A 3 -B 3
# Also check for any Spark-related configuration files that might explain these settings
fd -e properties -e conf -e yaml -e yml | xargs rg -i "spark.driver.extraJavaOptions|spark.executor.extraJavaOptions|java.module|reflective"
# Check README or documentation files for any mention of Java modules or permissions
fd -e md -e txt | xargs rg -i "java.*module|add-opens|illegal.*access"
Length of output: 4008
.github/workflows/push_to_canary.yaml (1)
142-143: Remove hardcoded service account details
The workload identity provider and service account email are hardcoded. These should be moved to repository secrets.
USER: root
SPARK_SUBMIT_PATH: spark-submit
PYTHONPATH: /srv/chronon
SPARK_VERSION: 3.1.1
Update Spark version to match Dockerfile
The Spark version in the build steps (3.1.1) doesn't match the version in the Dockerfile (3.5.1).
Update the Spark version in both build steps:
- SPARK_VERSION: 3.1.1
+ SPARK_VERSION: 3.5.1
Also applies to: 89-89
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
.github/workflows/push_to_canary.yaml (2)
59-102: Reduce configuration duplication between AWS and GCP build steps
The AWS and GCP build steps share many identical configurations. Consider extracting common environment variables to reduce duplication and maintenance overhead.
jobs:
  push_to_cloud:
    runs-on: ubuntu-latest
+   env:
+     COMMON_ENV: &common_env
+       USER: root
+       SPARK_SUBMIT_PATH: spark-submit
+       PYTHONPATH: /srv/chronon
+       JOB_MODE: local[*]
+       PARALLELISM: 2
+       EXECUTOR_MEMORY: 2G
+       EXECUTOR_CORES: 4
+       DRIVER_MEMORY: 1G
+       CHRONON_LOG_TABLE: default.chronon_log_table
+       JAVA_OPTS: "-Xms1g -Xmx1g"
    steps:
      - name: Build AWS Quickstart Image
        env:
+         <<: *common_env
-         USER: root
-         SPARK_SUBMIT_PATH: spark-submit
-         # ... remove other common vars
          CHRONON_ONLINE_CLASS: ai.chronon.integrations.aws.AwsApiImpl
          CLOUD_AWS_JAR: /app/cli/cloud_aws.jar
      - name: Build GCP Quickstart Image
        env:
+         <<: *common_env
-         USER: root
-         SPARK_SUBMIT_PATH: spark-submit
-         # ... remove other common vars
          CHRONON_ONLINE_CLASS: ai.chronon.integrations.cloud_gcp.GcpApiImpl
          CLOUD_GCP_JAR: /app/cli/cloud_gcp.jar
134-134: Fix YAML formatting issues
There are minor YAML formatting issues:
- Trailing spaces on lines 134 and 170
- Missing newline at end of file
- echo "image_tag=$IMAGE_TAG" + echo "image_tag=$IMAGE_TAG" - echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" \ No newline at end of file + echo "full_image=${{env.GAR_QUICKSTART_REPOSITORY}}:$IMAGE_TAG" +Also applies to: 170-171
🧰 Tools
🪛 yamllint (1.35.1)
[error] 134-134: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (1)
.github/workflows/push_to_canary.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint (1.35.1)
.github/workflows/push_to_canary.yaml
[error] 134-134: trailing spaces
(trailing-spaces)
[error] 170-170: trailing spaces
(trailing-spaces)
[error] 171-171: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (2)
.github/workflows/push_to_canary.yaml (2)
1-10: LGTM! Well-configured workflow triggers and concurrency settings.
The workflow is properly configured to:
- Run only on pushes to the main branch
- Prevent parallel runs that could cause conflicts
69-71: Review memory settings for Spark workloads
The current memory settings might be insufficient for Spark workloads:
- Driver memory (1G) is quite low for managing task distribution
- Executor memory (2G) might limit processing of larger datasets
Also applies to: 91-93
## Summary
Updates versions to get the Quickstart image working, and pushes it to our cloud repos for canary testing.

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [x] Integration tested
- [ ] Documentation update

## Summary by CodeRabbit
- **New Features**
  - Updated the Dockerfile to use a newer base image and modified the JAR file paths for building the application.
  - Upgraded the Spark version in the `main` service of the docker-compose configuration.
  - Introduced a new GitHub Actions workflow to automate the process of pushing Docker images to AWS and GCP.
- **Bug Fixes**
  - Ensured environment variables for Spark and Chronon are correctly retained in the updated configurations.