Skip to content

feat: add properties to set universe domain and endpoint in bigquery #3158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Sep 26, 2024
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,11 @@ public class GcpBigQueryAutoConfiguration {

private int threadPoolSize;

private String universeDomain;
private String jsonWriterEndpoint;

private String host;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling out the addition of this property since this is a deviation from our original design of exposing universe domain and endpoint. Apiary libraries such as Bigquery and Storage only allow for customizing host (which is essentially endpoint without the port) instead of endpoint so we are doing the same in the Spring Cloud GCP layer. For context: googleapis/sdk-platform-java#2329

@burkedavison @blakeli0


GcpBigQueryAutoConfiguration(
GcpBigQueryProperties gcpBigQueryProperties,
GcpProjectIdProvider projectIdProvider,
Expand All @@ -78,6 +83,10 @@ public class GcpBigQueryAutoConfiguration {
this.jsonWriterBatchSize = gcpBigQueryProperties.getJsonWriterBatchSize();

this.threadPoolSize = getThreadPoolSize(gcpBigQueryProperties.getThreadPoolSize());

this.universeDomain = gcpBigQueryProperties.getUniverseDomain();
this.jsonWriterEndpoint = gcpBigQueryProperties.getJsonWriterEndpoint();
this.host = gcpBigQueryProperties.getHost();
}

/**
Expand All @@ -97,25 +106,35 @@ private int getThreadPoolSize(int threadPoolSize) {
@Bean
@ConditionalOnMissingBean
public BigQuery bigQuery() throws IOException {
BigQueryOptions bigQueryOptions =
BigQueryOptions.Builder bigQueryOptionsBuilder =
BigQueryOptions.newBuilder()
.setProjectId(this.projectId)
.setCredentials(this.credentialsProvider.getCredentials())
.setHeaderProvider(new UserAgentHeaderProvider(GcpBigQueryAutoConfiguration.class))
.build();
return bigQueryOptions.getService();
.setHeaderProvider(new UserAgentHeaderProvider(GcpBigQueryAutoConfiguration.class));
if (this.universeDomain != null) {
bigQueryOptionsBuilder.setUniverseDomain(this.universeDomain);
}
if (this.host != null) {
bigQueryOptionsBuilder.setHost(this.host);
}
return bigQueryOptionsBuilder.build().getService();
}

@Bean
@ConditionalOnMissingBean
public BigQueryWriteClient bigQueryWriteClient() throws IOException {
BigQueryWriteSettings bigQueryWriteSettings =
BigQueryWriteSettings.Builder bigQueryWriteSettingsBuilder =
BigQueryWriteSettings.newBuilder()
.setCredentialsProvider(this.credentialsProvider)
.setQuotaProjectId(this.projectId)
.setHeaderProvider(new UserAgentHeaderProvider(GcpBigQueryAutoConfiguration.class))
.build();
return BigQueryWriteClient.create(bigQueryWriteSettings);
.setHeaderProvider(new UserAgentHeaderProvider(GcpBigQueryAutoConfiguration.class));
if (this.universeDomain != null) {
bigQueryWriteSettingsBuilder.setUniverseDomain(this.universeDomain);
}
if (this.jsonWriterEndpoint != null) {
bigQueryWriteSettingsBuilder.setEndpoint(this.jsonWriterEndpoint);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like jsonWriterEndpoint is only used for BigQueryWriteClient, which is actually part of bigquery-storage, not bigquery. Ideally I think we should have different auto-configuration for them, but since they are in one auto-configuration, can we either expose host or endpoint? And construct the other dynamically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, it's only used for BigqueryWriteClient (bigquery storage). We can expose endpoint and strip the port and reformat it to the pattern that DEFAULT_HOST currently has when setting it to BigqueryOptions.

However, the concern I have with this is that it would prevent us from making endpoint/host configurable on a per-service basis. In fact, looking at the code again, I'm wondering if it was a mistake to make universe domain shared between bigquery and bigquerystorage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we share anything else between Bigquery and Bigquerystorage client? Ideally they should be separate as they are two different services, but looks like we are treating them the same in spring-cloud-gcp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question - you're right, they do seem pretty interconnected in spring-cloud-gcp.
BigqueryTemplate wraps both Bigquery and BigQueryWriteClient and both services share the spring.cloud.gcp.bigquery.threadPoolSize property and spring.cloud.gcp.bigquery.datasetName property. Dataset name is used by the BigtableTemplate#writeJsonStream method which uses Bigquery to create a table before invoking the BigQuery Storage Write API.

spring.cloud.gcp.bigquery.jsonWriterBatchSize is the only property that is unique to bigquery storage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have two options

  1. Continue treating bigquery and bigquerystorage the same product. Then we only introduce universe-domain, and either host or endpoint.
  2. Start treating them two different product. Then we would introduce universe-domain and host for bigquery, universe-domain and endpoint for bigquerystorage.
    Any preferences? cc: @burkedavison

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, considering that both bigquery and bigquerystorage are wrapped as one product in spring-cloud-gcp, we will expose global endpoint and universe domain properties. The port will be omitted from the endpoint when set to the bigquery client.

}
return BigQueryWriteClient.create(bigQueryWriteSettingsBuilder.build());
}

@Bean
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,19 @@ public class GcpBigQueryProperties implements CredentialsSupplier {
/** The size of thread pool of ThreadPoolTaskScheduler used by GcpBigQueryAutoConfiguration */
private int threadPoolSize;

private String universeDomain;

/**
* Endpoint (formatted as `${service}.${universeDomain}:${port}`) that will be used by
* BigQueryJsonDataWriter.
*/
private String jsonWriterEndpoint;

/**
* Host (formatted as `https://{service}.{universeDomain}`) that will be used by BigQueryOptions.
*/
private String host;

public int getJsonWriterBatchSize() {
return jsonWriterBatchSize;
}
Expand Down Expand Up @@ -80,4 +93,29 @@ public String getDatasetName() {
public void setDatasetName(String datasetName) {
this.datasetName = datasetName;
}

public String getUniverseDomain() {
return universeDomain;
}

public void setUniverseDomain(String universeDomain) {
this.universeDomain = universeDomain;
}

public String getJsonWriterEndpoint() {
return jsonWriterEndpoint;
}

public void setJsonWriterEndpoint(String jsonWriterEndpoint) {
this.jsonWriterEndpoint = jsonWriterEndpoint;
}

public String getHost() {
return host;
}

public void setHost(String host) {
this.host = host;
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.storage.v1.BigQueryWriteClient;
import com.google.cloud.spring.autoconfigure.core.GcpContextAutoConfiguration;
import com.google.cloud.spring.bigquery.core.BigQueryTemplate;
import org.junit.jupiter.api.Test;
Expand Down Expand Up @@ -60,6 +61,107 @@ void testSettingBigQueryOptions() {
});
}

@Test
void testBigQuery_universeDomain() {
this.contextRunner
.withPropertyValues("spring.cloud.gcp.bigquery.universe-domain=myUniverseDomain")
.run(
ctx -> {
BigQueryOptions options = ctx.getBean(BigQuery.class).getOptions();
assertThat(options.getUniverseDomain()).isEqualTo("myUniverseDomain");
assertThat(options.getResolvedApiaryHost("bigquery"))
.isEqualTo("https://bigquery.myUniverseDomain/");
});
}

@Test
void testBigQuery_noUniverseDomainAndHostSet_useClientDefault() {
this.contextRunner.run(
ctx -> {
BigQueryOptions options = ctx.getBean(BigQuery.class).getOptions();
assertThat(options.getUniverseDomain()).isNull();
assertThat(options.getResolvedApiaryHost("bigquery"))
.isEqualTo("https://bigquery.googleapis.com/");
});
}

@Test
void testBigQuery_host() {
this.contextRunner
.withPropertyValues("spring.cloud.gcp.bigquery.host=bigquery.example.com")
.run(
ctx -> {
BigQueryOptions options = ctx.getBean(BigQuery.class).getOptions();
assertThat(options.getResolvedApiaryHost("bigquery"))
.isEqualTo("bigquery.example.com");
});
}

@Test
void testBigQuery_bothHostAndUniverseDomainSet() {
this.contextRunner
.withPropertyValues("spring.cloud.gcp.bigquery.host=bigquery.example.com")
.withPropertyValues("spring.cloud.gcp.bigquery.universe-domain=myUniverseDomain")
.run(
ctx -> {
BigQueryOptions options = ctx.getBean(BigQuery.class).getOptions();
assertThat(options.getResolvedApiaryHost("bigquery"))
.isEqualTo("bigquery.example.com");
assertThat(options.getUniverseDomain()).isEqualTo("myUniverseDomain");
});
}

@Test
void testBigQueryWrite_universeDomain() {
this.contextRunner
.withPropertyValues("spring.cloud.gcp.bigquery.universe-domain=myUniverseDomain")
.run(
ctx -> {
BigQueryWriteClient writeClient = ctx.getBean(BigQueryWriteClient.class);
assertThat(writeClient.getSettings().getUniverseDomain())
.isEqualTo("myUniverseDomain");
});
}

@Test
void testBigQueryWrite_endpoint() {
this.contextRunner
.withPropertyValues(
"spring.cloud.gcp.bigquery.jsonWriterEndpoint=bigquerystorage.example.com:123")
.run(
ctx -> {
BigQueryWriteClient client = ctx.getBean(BigQueryWriteClient.class);
assertThat(client.getSettings().getEndpoint())
.isEqualTo("bigquerystorage.example.com:123");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please verify that it's intended to have getEndpoint not return the URI scheme, but getResolvedApiaryHost does return the URI scheme.

LGTM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, returning endpoint with this pattern is intended for Bigquerystorage which will otherwise throw an exception if it doesn't follow a specific convention. For example, passing in bigquerystorage.example.com without the port will result in:

Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.google.cloud.bigquery.storage.v1.BigQueryWriteClient]: Factory method 'bigQueryWriteClient' threw exception with message: invalid endpoint, expecting "<host>:<port>"
	at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:178)
	at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:644)
	... 31 more
Caused by: java.lang.IllegalArgumentException: invalid endpoint, expecting "<host>:<port>"

On the other hand bigquery accepts either accepts user-provided endpoint as-is if one is provided or builds the host in the https://service.universeDomain format if only the universe domain is provided:
https://github.com/googleapis/sdk-platform-java/blob/1a988df22f7e3d15ce6b121bf26897c59ab468e4/java-core/google-cloud-core/src/main/java/com/google/cloud/ServiceOptions.java#L868 but doesn't contain an explicit check to verify this schema. And in this PR, we reformat the provided endpoint to follow the same pattern that getResolvedApiaryHost() defaults to.

});
}

@Test
void testBigQueryWrite_bothUniverseDomainAndEndpointSet() {
this.contextRunner
.withPropertyValues("spring.cloud.gcp.bigquery.universe-domain=myUniverseDomain")
.withPropertyValues(
"spring.cloud.gcp.bigquery.jsonWriterEndpoint=bigquerystorage.example.com:123")
.run(
ctx -> {
BigQueryWriteClient client = ctx.getBean(BigQueryWriteClient.class);
assertThat(client.getSettings().getUniverseDomain()).isEqualTo("myUniverseDomain");
assertThat(client.getSettings().getEndpoint())
.isEqualTo("bigquerystorage.example.com:123");
});
}

@Test
void testBigQueryWrite_noUniverseDomainOrEndpointSet_useClientDefault() {
this.contextRunner.run(
ctx -> {
BigQueryWriteClient client = ctx.getBean(BigQueryWriteClient.class);
assertThat(client.getSettings().getUniverseDomain()).isEqualTo("googleapis.com");
assertThat(client.getSettings().getEndpoint())
.isEqualTo("bigquerystorage.googleapis.com:443");
});
}

/** Spring Boot config for tests. */
@AutoConfigurationPackage
static class TestConfiguration {
Expand Down
Loading