
Commit 1808c79

Add support for running fetcher in docker & publishing image (#422)
## Summary

Add support to run the fetcher service in docker. Also add rails to publish to docker hub as a private image - [ziplineai/chronon-fetcher](https://hub.docker.com/repository/docker/ziplineai/chronon-fetcher)

I wasn't able to sort out logback / log4j2 logging as there's a lot of deps messing things up - Vert.x supports JUL configs and that is seemingly working so starting with that for now.

Tested with:
```
docker run -v ~/.config/gcloud/application_default_credentials.json:/gcp/credentials.json \
  -p 9000:9000 \
  -e "GCP_PROJECT_ID=canary-443022" \
  -e "GOOGLE_CLOUD_PROJECT=canary-443022" \
  -e "GCP_BIGTABLE_INSTANCE_ID=zipline-canary-instance" \
  -e "STATSD_HOST=127.0.0.1" \
  -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/credentials.json \
  ziplineai/chronon-fetcher
```
And then you can `curl http://localhost:9000/ping`

On Etsy side just need to swap out the project and bt instance id and then can curl the actual join:
```
curl -X POST http://localhost:9000/v1/fetch/join/search.ranking.v1_web_zipline_cdc_and_beacon_external -H 'Content-Type: application/json' -d '[{"listing_id":"632126370","shop_id":"53908089","shipping_profile_id":"235561688531"}]'
{"results":[{"status":"Success","entityKeys":{"listing_id":"632126370","shop_id":"53908089","shipping_profile_id":"235561688531"},"features":{...
```

## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [X] Integration tested
- [ ] Documentation update

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

- **New Features**
  - Added an automation script that streamlines the container image build and publication process with improved error handling.
  - Introduced a new container configuration that installs essential dependencies, sets environment variables, and incorporates a health check for enhanced reliability.
  - Implemented a robust logging setup that standardizes console and file outputs with log rotation.
  - Provided a startup script for the service that verifies required settings and applies platform-specific options for seamless execution.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
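The `/ping` check from the commit message can be wrapped in a small retry loop, which helps because the service needs some time to come up (the Dockerfile's health check allows a two-minute start period). This is a minimal sketch, assuming `curl` is installed on the host; the function name `wait_for_ping` and its default attempt count and delay are illustrative and not part of the commit.

```shell
#!/bin/bash
# Poll a health endpoint until it answers or the attempts run out.
# wait_for_ping, its arguments, and its defaults are hypothetical names for this sketch.
wait_for_ping() {
  local url attempts delay i
  url="${1:-http://localhost:9000/ping}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP error responses.
    if curl --fail --silent --output /dev/null "$url"; then
      echo "fetcher is up after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "fetcher did not answer at $url after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_ping` with no arguments probes `http://localhost:9000/ping`, matching the port mapping in the `docker run` command above.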
1 parent 4b5ade2 commit 1808c79

4 files changed: +178 -0 lines changed

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
#!/bin/bash


if [[ -n $(git diff HEAD) ]]; then
    echo "Error: You have uncommitted changes. Please commit and push them to git so we can track them."
    exit 1
fi

# Get current branch name
local_branch=$(git rev-parse --abbrev-ref HEAD)

# Fetch latest from remote
git fetch origin "$local_branch"

# Check if local differs from remote
if [[ -n $(git diff "HEAD..origin/$local_branch") ]]; then
    echo "Error: Your branch is not in sync with remote"
    echo "Please push your local changes and sync your local branch $local_branch with remote"
    exit 1
fi

set -e

SCRIPT_DIRECTORY=$(dirname -- "$(realpath -- "$0")")
CHRONON_ROOT_DIR=$(dirname "$SCRIPT_DIRECTORY")

echo "Working in $CHRONON_ROOT_DIR"
cd "$CHRONON_ROOT_DIR"

echo "Building jars"
bazel build //cloud_aws:cloud_aws_lib_deploy.jar  # target name inferred from the jar path copied below
bazel build //cloud_gcp:cloud_gcp_lib_deploy.jar
bazel build //service:service_assembly_deploy.jar

CLOUD_GCP_JAR="$CHRONON_ROOT_DIR/bazel-bin/cloud_gcp/cloud_gcp_lib_deploy.jar"
SERVICE_JAR="$CHRONON_ROOT_DIR/bazel-bin/service/service_assembly_deploy.jar"

if [ ! -f "$CLOUD_GCP_JAR" ]; then
    echo "$CLOUD_GCP_JAR not found"
    exit 1
fi

if [ ! -f "$SERVICE_JAR" ]; then
    echo "$SERVICE_JAR not found"
    exit 1
fi

# We copy to build output as the docker build can't access the bazel-bin (as it's a symlink)
echo "Copying jars to build_output"
mkdir -p build_output
cp bazel-bin/service/service_assembly_deploy.jar build_output/
cp bazel-bin/cloud_aws/cloud_aws_lib_deploy.jar build_output/
cp bazel-bin/cloud_gcp/cloud_gcp_lib_deploy.jar build_output/

echo "Kicking off a docker login"
docker login

docker buildx build \
    --platform linux/amd64,linux/arm64 \
    -f docker/fetcher/Dockerfile \
    -t "ziplineai/chronon-fetcher:$(git rev-parse --short HEAD)" \
    -t ziplineai/chronon-fetcher:latest \
    --push \
    .

# Clean up build output dir
rm -rf build_output
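The script repeats the same existence check for each jar. One way to keep those checks in step with the jars that are later copied is a single helper that takes any number of paths. This is only a sketch; `require_files` is a hypothetical name and does not appear in the commit.

```shell
#!/bin/bash
# Fail with a message naming the first path that is not a regular file.
# require_files is a hypothetical helper for this sketch.
require_files() {
  local path
  for path in "$@"; do
    if [ ! -f "$path" ]; then
      echo "$path not found" >&2
      return 1
    fi
  done
  return 0
}
```

In the publish script this could be called as `require_files "$CLOUD_GCP_JAR" "$SERVICE_JAR" || exit 1`, with further jar paths appended as needed.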

docker/fetcher/Dockerfile

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
# Start from a Debian base image
FROM openjdk:17-jdk-slim

# We expect jars to be copied to the build_output directory as docker can't read from bazel-bin as that's a symlink
# https://stackoverflow.com/questions/31881904/docker-follow-symlink-outside-context
ENV CLOUD_AWS_JAR_PATH=build_output/cloud_aws_lib_deploy.jar
ENV CLOUD_GCP_JAR_PATH=build_output/cloud_gcp_lib_deploy.jar
ENV FETCHER_SVC_JAR_PATH=build_output/service_assembly_deploy.jar
ENV FETCHER_LAUNCH_SCRIPT=docker/fetcher/start.sh
ENV GCP_ONLINE_CLASS=ai.chronon.integrations.cloud_gcp.GcpApiImpl
ENV AWS_ONLINE_CLASS=ai.chronon.integrations.aws.AwsApiImpl

# Update package lists and install necessary tools
RUN apt-get update && apt-get install -y \
    curl \
    python3 \
    python3-dev \
    python3-setuptools \
    vim \
    wget \
    procps \
    python3-pip

ENV SCALA_VERSION=2.12.18

RUN curl https://downloads.lightbend.com/scala/${SCALA_VERSION}/scala-${SCALA_VERSION}.deb -k -o scala.deb && \
    apt-get install -y ./scala.deb && \
    rm -rf scala.deb /var/lib/apt/lists/*

ENV SCALA_HOME="/usr/bin/scala"
ENV PATH=${PATH}:${SCALA_HOME}/bin

WORKDIR /srv/zipline

ENV CLOUD_AWS_JAR=${CLOUD_AWS_JAR:-"/srv/zipline/cloud_aws/cloud_aws.jar"}
ENV CLOUD_GCP_JAR=${CLOUD_GCP_JAR:-"/srv/zipline/cloud_gcp/cloud_gcp.jar"}
ENV FETCHER_JAR=${FETCHER_JAR:-"/srv/zipline/fetcher/service.jar"}
ENV LOG_PATH=${LOG_PATH:-"/srv/zipline/fetcher/logs"}

COPY $CLOUD_AWS_JAR_PATH "$CLOUD_AWS_JAR"
COPY $CLOUD_GCP_JAR_PATH "$CLOUD_GCP_JAR"
COPY $FETCHER_SVC_JAR_PATH "$FETCHER_JAR"
COPY $FETCHER_LAUNCH_SCRIPT /srv/fetcher/start.sh
COPY docker/fetcher/logging.properties /srv/zipline/fetcher/logging.properties

ENV FETCHER_PORT=9000

HEALTHCHECK --start-period=2m --retries=4 CMD curl --fail http://localhost:$FETCHER_PORT/ping || exit 1

RUN mkdir -p $LOG_PATH && \
    chmod 755 $LOG_PATH

CMD /srv/fetcher/start.sh
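The `${VAR:-default}` form in the `ENV` lines borrows the shell's default-value expansion: the default is used when the variable has no value yet. The same behaviour can be tried in plain shell. This sketch only illustrates the expansion; `resolve_fetcher_jar` is a hypothetical name, and in the Dockerfile the expansion happens at build time rather than in a function.

```shell
#!/bin/bash
# Print FETCHER_JAR if it has a value, otherwise the default path from the Dockerfile.
# resolve_fetcher_jar is a hypothetical helper for this sketch.
resolve_fetcher_jar() {
  echo "${FETCHER_JAR:-/srv/zipline/fetcher/service.jar}"
}
```

With `FETCHER_JAR` unset the function prints the default path; after `FETCHER_JAR=/opt/custom/service.jar` it prints that value instead.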

docker/fetcher/logging.properties

Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
# Handlers - both console and file
2+
handlers=java.util.logging.ConsoleHandler,java.util.logging.FileHandler
3+
4+
# Common formatter for both handlers
5+
java.util.logging.SimpleFormatter.format=%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS.%1$tL [%4$s] %2$s: %5$s%6$s%n
6+
7+
# Console handler configuration
8+
java.util.logging.ConsoleHandler.level=INFO
9+
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
10+
11+
# File handler configuration
12+
java.util.logging.FileHandler.pattern=/srv/zipline/fetcher/logs/zipline-fs.%g.log
13+
java.util.logging.FileHandler.limit=104857600
14+
java.util.logging.FileHandler.count=30
15+
java.util.logging.FileHandler.formatter=java.util.logging.SimpleFormatter
16+
java.util.logging.FileHandler.level=INFO
17+
18+
# Root logger configuration
19+
.level=INFO
20+
21+
# Package-specific levels
22+
io.vertx.level=INFO
23+
ai.chronon.level=INFO
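The file handler rotates through `count` files of roughly `limit` bytes each, so the two settings together bound the disk space the logs can take. A quick way to check that budget, as a sketch (the function name `max_log_mib` is hypothetical):

```shell
#!/bin/bash
# Approximate upper bound on rotated log size, in MiB.
# Arguments: per-file limit in bytes, number of files in the rotation.
# max_log_mib is a hypothetical helper for this sketch.
max_log_mib() {
  local limit_bytes count
  limit_bytes="$1"
  count="$2"
  echo $(( limit_bytes * count / 1024 / 1024 ))
}
```

With the values above, `max_log_mib 104857600 30` prints `3000`, i.e. 30 files of 100 MiB each, which is worth comparing against the container's available disk.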

docker/fetcher/start.sh

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
#!/bin/bash
2+
set -e
3+
4+
# Required environment variables
5+
required_vars=("FETCHER_JAR" "STATSD_HOST" "FETCHER_PORT")
6+
for var in "${required_vars[@]}"; do
7+
if [ -z "${!var}" ]; then
8+
echo "Error: Required environment variable $var is not set"
9+
exit 1
10+
fi
11+
done
12+
13+
if [[ $USE_AWS == true ]]; then
14+
ONLINE_JAR=$CLOUD_AWS_JAR
15+
ONLINE_CLASS=$AWS_ONLINE_CLASS
16+
else
17+
ONLINE_JAR=$CLOUD_GCP_JAR
18+
ONLINE_CLASS=$GCP_ONLINE_CLASS
19+
fi
20+
21+
JMX_OPTS="-XX:MaxMetaspaceSize=1g -XX:MaxRAMPercentage=70.0 -XX:MinRAMPercentage=70.0 -XX:InitialRAMPercentage=70.0 -XX:MaxHeapFreeRatio=100 -XX:MinHeapFreeRatio=0"
22+
23+
echo "Starting Fetcher service with online jar $ONLINE_JAR and online class $ONLINE_CLASS"
24+
25+
if ! java -Dvertx.logger-delegate-factory-class-name=io.vertx.core.logging.JULLogDelegateFactory \
26+
-Djava.util.logging.config.file=/srv/zipline/fetcher/logging.properties \
27+
-jar $FETCHER_JAR run ai.chronon.service.FetcherVerticle \
28+
$JMX_OPTS \
29+
-Dserver.port=$FETCHER_PORT \
30+
-Donline.jar=$ONLINE_JAR \
31+
-Dai.chronon.metrics.host=$STATSD_HOST \
32+
-Donline.class=$ONLINE_CLASS; then
33+
echo "Error: Fetcher service failed to start"
34+
exit 1
35+
fi
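The required-variable loop stops at the first missing name. A variant that reports every missing variable in one pass can save a few restart cycles when several are absent. This is a sketch only: `check_required_vars` is a hypothetical name, and it uses `eval` for the indirect lookup so that it also runs in shells without bash's `${!var}`.

```shell
#!/bin/bash
# Report every listed variable that is unset or empty; return 1 if any is missing.
# check_required_vars is a hypothetical helper for this sketch.
check_required_vars() {
  local name value missing
  missing=0
  for name in "$@"; do
    # Portable indirect lookup; in bash this could be written as "${!name}".
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Error: Required environment variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In `start.sh` it could replace the loop with `check_required_vars FETCHER_JAR STATSD_HOST FETCHER_PORT || exit 1`. Only trusted, literal variable names should be passed, since the names are evaluated.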
