Skip to content

Add Singer infra #58

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Aug 14, 2020
Merged

Add Singer infra #58

merged 23 commits into from
Aug 14, 2020

Conversation

sherifnada
Copy link
Contributor

@sherifnada sherifnada commented Aug 14, 2020

What

Allow running tests which use Singer binaries under the hood to verify we can run an entire workflow correctly.

Some complicating factors:

  1. Some Singer connectors depend on binaries installed on the host system e.g: tap-postgres requires pg_config and python3.7.
  2. Testing Singer taps will often also involve setting up other infra components e.g: a Postgres DB for tests

How

We split the server docker image into 2:

  • The base image: sets up the server's host environment i.e: installs all system dependencies and builds distribution TARs via Gradle. This image is used to run unit and integration tests.
  • The distribution image: the base image minus the source code

To build the server, we:

  1. build base image
  2. run the base image to run tests
  3. if tests are successful, build the distribution image.
    This functionality is now inside ./tools/app/build.sh

The base image is similar to the previous server image with a few notable changes:

  1. It consistently uses the openjdk docker image as the base throughout and manually installs/copies any needed dependencies like gradle to ensure that we're running the exact same OS across build, test, and prod. OpenJDK and the Gradle base used different Linux distros which tripped up Singer system dependencies.
  2. It installs Singer libraries in /usr/local/lib/singer

This PR demos using singer libraries in tests in the class TestSingerDiscoveryWorker. Currently the only test case runs discovery on a Postgres DB which is instantiated inside the test via TestContainer.

Recommended reading order

  1. ./tools/app/build.sh
  2. ./server_base.Dockerfile
  3. ./server_dist.Dockerfile
  4. ./tools/singer/setup_singer_env.buster.sh
  5. ./tools/singer/install_all_connectors.sh
  6. ./tools/singer/install_connectors.sh
  7. TestSingerDiscoveryWorker and BaseWorkerTestCase

There are definitely some more test cases that should be added to TestSingerDiscoveryWorker but I think that should be added in future PRs to keep this change focused on the infra pieces of this more than the test cases.

@@ -39,6 +39,10 @@ subprojects {

test {
useJUnitPlatform()
testLogging() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tests fail we want to see the reasons on the CLI

export SINGER_ROOT=$1

pip3 install psycopg2-binary
./tools/singer/install_connector.sh tap-postgres tap-postgres 0.1.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pinning versions forces docker to re-install if deps change

apt-get clean
apt-get update
apt-get -y install libpq-dev=11.7-0+deb10u1 \
python3.7=3.7.3-2+deb10u2 \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pinning deps for docker caching

RUN mkdir -p /usr/local/lib/singer
RUN ./tools/singer/setup_singer_env.buster.sh /usr/local/lib/singer

# Install Node. While the UI is not going to be served from this container, running UI tests is part of the build.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a little unsure why Node is building given this, but added this to pass for now

@sherifnada sherifnada requested review from michel-tricot, jrhizor and cgardens and removed request for michel-tricot August 14, 2020 17:49
@@ -24,23 +28,11 @@ protected Path getWorkspacePath() {

private void createTestWorkspace() {
try {
workspaceDirectory = Paths.get("/tmp/tests/dataline-" + UUID.randomUUID().toString());
workspaceDirectory =
Paths.get("/tmp/tests/dataline-" + UUID.randomUUID().toString().substring(0, 8));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Files.createTempDirectory("dataline");?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah didn't know about that one -- will take a look. TY for the tip.

# TODO: add data mount instead
RUN mkdir data

RUN cp /code/dataline-server/build/distributions/*.tar dataline-server.tar \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm probably just not understanding something here, but why don't we need to copy the distributions for each module (e.g. scheduler and worker)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought those are both running in the same binary? just thought this was sufficient. We both have the same intenion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now there's a single server build artifact

@@ -0,0 +1,19 @@
#!/usr/bin/env sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this script triggered? Is it part of the CI process?

Copy link
Contributor

@michel-tricot michel-tricot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good starting point but overall looks super hacky

import java.sql.SQLException;
import java.util.Map;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious do these test past on github?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup!

@@ -0,0 +1,15 @@
# Build final image
FROM dataline/server/base:latest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to break down the file? Can't you use the multi-stage feature (as we were before)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the way you split it you're adding a dependency that is not automatic. I can't do docker build server_dist.Dockerfile before doing the same with base

Copy link
Contributor Author

@sherifnada sherifnada Aug 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline:
A couple of reasons:

  1. If tests run during docker build, then all other test dependencies (e.g: postgres container) need to be available + we can't start docker containers from the tests themselves since we can't mount the docker daemon as a volume.
  2. The entrypoint for the container runs tests, and the second container runs the server itself.

We probably want to do something with Docker compose for integration tests in the future.

@sherifnada sherifnada merged commit 636c5dd into master Aug 14, 2020
@sherifnada sherifnada deleted the sherif/singer-infra branch August 17, 2020 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants