Skip to content

Commit 2c8317b

Browse files
authored
Prepare for KCL .Net Release 4.0.0 (#251)
1 parent 4e7f798 commit 2c8317b

File tree

3 files changed

+204
-23
lines changed

3 files changed

+204
-23
lines changed

README.md

+34-1
Original file line numberDiff line numberDiff line change
@@ -150,6 +150,40 @@ all languages.
150150

151151
## Release Notes
152152

153+
### Release 4.0.0 (November 6, 2024)
154+
* New lease assignment / load balancing algorithm
155+
* KCL 3.x introduces a new lease assignment and load balancing algorithm. It assigns leases among workers based on worker utilization metrics and throughput on each lease, replacing the previous lease count-based lease assignment algorithm.
156+
* When KCL detects higher variance in CPU utilization among workers, it proactively reassigns leases from over-utilized workers to under-utilized workers for even load balancing. This ensures even CPU utilization across workers and removes the need to over-provision the stream processing compute hosts.
157+
* Optimized DynamoDB RCU usage
158+
* KCL 3.x optimizes DynamoDB read capacity unit (RCU) usage on the lease table by implementing a global secondary index with leaseOwner as the partition key. This index mirrors the leaseKey attribute from the base lease table, allowing workers to efficiently discover their assigned leases by querying the index instead of scanning the entire table.
159+
* This approach significantly reduces read operations compared to earlier KCL versions, where workers performed full table scans, resulting in higher RCU consumption.
160+
* Graceful lease handoff
161+
* KCL 3.x introduces a feature called "graceful lease handoff" to minimize data reprocessing during lease reassignments. Graceful lease handoff allows the current worker to complete checkpointing of processed records before transferring the lease to another worker. For graceful lease handoff, you should implement checkpointing logic within the existing `shutdownRequested()` method.
162+
* This feature is enabled by default in KCL 3.x, but you can turn off this feature by adjusting the configuration property `isGracefulLeaseHandoffEnabled`.
163+
* While this approach significantly reduces the probability of data reprocessing during lease transfers, it doesn't completely eliminate the possibility. To maintain data integrity and consistency, it's crucial to design your downstream consumer applications to be idempotent. This ensures that the application can handle potential duplicate record processing without adverse effects.
164+
* New DynamoDB metadata management artifacts
165+
* KCL 3.x introduces two new DynamoDB tables for improved lease management:
166+
* Worker metrics table: Records CPU utilization metrics from each worker. KCL uses these metrics for optimal lease assignments, balancing resource utilization across workers. If CPU utilization metric is not available, KCL assigns leases to balance the total sum of shard throughput per worker instead.
167+
* Coordinator state table: Stores internal state information for workers. Used to coordinate in-place migration from KCL 2.x to KCL 3.x and leader election among workers.
168+
* Follow this [documentation](https://docs.aws.amazon.com/streams/latest/dev/kcl-migration-from-2-3.html#kcl-migration-from-2-3-IAM-permissions) to add required IAM permissions for your KCL application.
169+
* Other improvements and changes
170+
* Dependency on the AWS SDK for Java 1.x has been fully removed.
171+
* The Glue Schema Registry integration functionality no longer depends on AWS SDK for Java 1.x. Previously, it required this as a transient dependency.
172+
* Multilangdaemon has been upgraded to use AWS SDK for Java 2.x. It no longer depends on AWS SDK for Java 1.x.
173+
* `idleTimeBetweenReadsInMillis` (PollingConfig) now has a minimum default value of 200.
174+
* This polling configuration property determines the [publishers](https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client/src/main/java/software/amazon/kinesis/retrieval/polling/PrefetchRecordsPublisher.java) wait time between GetRecords calls in both success and failure cases. Previously, setting this value below 200 caused unnecessary throttling. This is because Amazon Kinesis Data Streams supports up to five read transactions per second per shard for shared-throughput consumers.
175+
* Shard lifecycle management is improved to deal with edge cases around shard splits and merges to ensure records continue being processed as expected.
176+
* Migration
177+
* The programming interfaces of KCL 3.x remain identical with KCL 2.x for an easier migration. For detailed migration instructions, please refer to the [Migrate consumers from KCL 2.x to KCL 3.x](https://docs.aws.amazon.com/streams/latest/dev/kcl-migration-from-2-3.html) page in the Amazon Kinesis Data Streams developer guide.
178+
* Configuration properties
179+
* New configuration properties introduced in KCL 3.x are listed in this [doc](https://github.com/awslabs/amazon-kinesis-client/blob/master/docs/kcl-configurations.md#new-configurations-in-kcl-3x).
180+
* Deprecated configuration properties in KCL 3.x are listed in this [doc](https://github.com/awslabs/amazon-kinesis-client/blob/master/docs/kcl-configurations.md#discontinued-configuration-properties-in-kcl-3x). You need to keep the deprecated configuration properties during the migration from any previous KCL version to KCL 3.x.
181+
* Metrics
182+
* New CloudWatch metrics introduced in KCL 3.x are explained in the [Monitor the Kinesis Client Library with Amazon CloudWatch](https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-kcl.html) in the Amazon Kinesis Data Streams developer guide. The following operations are newly added in KCL 3.x:
183+
* `LeaseAssignmentManager`
184+
* `WorkerMetricStatsReporter`
185+
* `LeaseDiscovery`
186+
153187
### Release 3.0.1 (April 24, 2024)
154188
* Upgraded amazon-kinesis-client from 2.4.4 to 2.5.8
155189
* Upgraded netcoreapp from 5.0 to 6.0
@@ -224,7 +258,6 @@ all languages.
224258
[amazon-kinesis-ruby-github]: https://github.com/awslabs/amazon-kinesis-client-ruby
225259
[amazon-kinesis-nodejs-github]: https://github.com/awslabs/amazon-kinesis-client-nodejs
226260
[multi-lang-daemon]: https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/com/amazonaws/services/kinesis/multilang/package-info.java
227-
[DefaultAWSCredentialsProviderChain]: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html
228261
[kinesis-forum]: http://developer.amazonwebservices.com/connect/forum.jspa?forumID=169
229262
[master-zip]: https://github.com/awslabs/amazon-kinesis-client-net/archive/master.zip
230263
[aws-sdk-dotnet]: https://aws.amazon.com/sdk-for-net/

SampleConsumer/kcl.properties

+104-3
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,13 @@
11
# The script that abides by the multi-language protocol. This script will
22
# be executed by the MultiLangDaemon, which will communicate with this script
33
# over STDIN and STDOUT according to the multi-language protocol.
4+
# Ensure the path to the executable is correct: dotnet <path-to-your-dll>/SampleConsumer.dll
45
executableName = dotnet SampleConsumer.dll
56

7+
# The Stream arn: arn:aws:kinesis:<region>:<account id>:stream/<stream name>
8+
# Important: streamArn takes precedence over streamName if both are set
9+
streamArn = arn:aws:kinesis:us-east-5:000000000000:stream/kclnetsample
10+
611
# The name of an Amazon Kinesis stream to process.
712
streamName = kclnetsample
813

@@ -12,10 +17,12 @@ streamName = kclnetsample
1217
applicationName = DotNetKinesisSample
1318

1419
# Users can change the credentials provider the KCL will use to retrieve credentials.
15-
# The DefaultAWSCredentialsProviderChain checks several other providers, which is
20+
# Expected key name (case-sensitive):
21+
# AwsCredentialsProvider / AwsCredentialsProviderDynamoDB / AwsCredentialsProviderCloudWatch
22+
# The DefaultCredentialsProvider checks several other providers, which is
1623
# described here:
17-
# http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html
18-
AWSCredentialsProvider = DefaultAWSCredentialsProviderChain
24+
# https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/auth/credentials/DefaultCredentialsProvider.html
25+
AwsCredentialsProvider = DefaultCredentialsProvider
1926

2027
# Appended to the user agent of the KCL. Does not impact the functionality of the
2128
# KCL in any other way.
@@ -25,6 +32,11 @@ processingLanguage = C#
2532
# See http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetShardIterator.html#API_GetShardIterator_RequestSyntax
2633
initialPositionInStream = TRIM_HORIZON
2734

35+
# To specify an initial timestamp from which to start processing records, please specify timestamp value for 'initiatPositionInStreamExtended',
36+
# and uncomment below line with right timestamp value.
37+
# See more from 'Timestamp' under http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetShardIterator.html#API_GetShardIterator_RequestSyntax
38+
#initialPositionInStreamExtended = 1636609142
39+
2840
# The following properties are also available for configuring the KCL Worker that is created
2941
# by the MultiLangDaemon.
3042

@@ -81,3 +93,92 @@ regionName = us-east-1
8193
# active threads set to the provided value. If a non-positive integer or no
8294
# value is provided a CachedThreadPool is used.
8395
#maxActiveThreads = 0
96+
97+
################### KclV3 configurations ###################
98+
# NOTE : These are just test configurations to show how to customize
99+
# all possible KCLv3 configurations. They are not necessarily the best
100+
# default values to use for production.
101+
102+
# Coordinator config
103+
# Version the KCL needs to operate in. For more details check the KCLv3 migration
104+
# documentation. Default is CLIENT_VERSION_CONFIG_3X
105+
# clientVersionConfig =
106+
# Configurations to control how the CoordinatorState DDB table is created
107+
# Default name is applicationName-CoordinatorState in PAY_PER_REQUEST,
108+
# with PITR and deletion protection disabled and no tags
109+
# coordinatorStateTableName =
110+
# coordinatorStateBillingMode =
111+
# coordinatorStateReadCapacity =
112+
# coordinatorStateWriteCapacity =
113+
# coordinatorStatePointInTimeRecoveryEnabled =
114+
# coordinatorStateDeletionProtectionEnabled =
115+
# coordinatorStateTags =
116+
117+
# Graceful handoff config - tuning of the shutdown behavior during lease transfers
118+
# default values are 30000 and true respectively
119+
# gracefulLeaseHandoffTimeoutMillis =
120+
# isGracefulLeaseHandoffEnabled =
121+
122+
# WorkerMetricStats table config - control how the DDB table is created
123+
# Default name is applicationName-WorkerMetricStats in PAY_PER_REQUEST,
124+
# with PITR and deletion protection disabled and no tags
125+
# workerMetricsTableName =
126+
# workerMetricsBillingMode =
127+
# workerMetricsReadCapacity =
128+
# workerMetricsWriteCapacity =
129+
# workerMetricsPointInTimeRecoveryEnabled =
130+
# workerMetricsDeletionProtectionEnabled =
131+
# workerMetricsTags =
132+
133+
# WorkerUtilizationAwareAssignment config - tune the new KCLv3 Lease balancing algorithm
134+
#
135+
# frequency of capturing worker metrics in memory. Default is 1s
136+
# inMemoryWorkerMetricsCaptureFrequencyMillis =
137+
138+
# frequency of reporting worker metric stats to storage. Default is 30s
139+
# workerMetricsReporterFreqInMillis =
140+
141+
# No. of metricStats that are persisted in WorkerMetricStats ddb table, default is 10
142+
# noOfPersistedMetricsPerWorkerMetrics =
143+
144+
# Disable use of worker metrics to balance lease, default is false.
145+
# If it is true, the algorithm balances lease based on worker's processing throughput.
146+
# disableWorkerMetrics =
147+
148+
# Max throughput per host 10 MBps, to limit processing to the given value
149+
# Default is unlimited.
150+
# maxThroughputPerHostKBps =
151+
152+
# Dampen the load that is rebalanced during lease re-balancing, default is 60%
153+
# dampeningPercentage =
154+
155+
# Configures the allowed variance range for worker utilization. The upper
156+
# limit is calculated as average * (1 + reBalanceThresholdPercentage/100).
157+
# The lower limit is average * (1 - reBalanceThresholdPercentage/100). If
158+
# any worker's utilization falls outside this range, lease re-balancing is
159+
# triggered. The re-balancing algorithm aims to bring variance within the
160+
# specified range. It also avoids thrashing by ensuring the utilization of
161+
# the worker receiving the load after re-balancing doesn't exceed the fleet
162+
# average. This might cause no re-balancing action even the utilization is
163+
# out of the variance range. The default value is 10, representing +/-10%
164+
# variance from the average value.
165+
# reBalanceThresholdPercentage =
166+
167+
# Whether at-least one lease must be taken from a high utilization worker
168+
# during re-balancing when there is no lease assigned to that worker which has
169+
# throughput is less than or equal to the minimum throughput that needs to be
170+
# moved away from that worker to bring the worker back into the allowed variance.
171+
# Default is true.
172+
# allowThroughputOvershoot =
173+
174+
# Lease assignment is performed every failoverTimeMillis but re-balance will
175+
# be attempted only once in 5 times based on the below config. Default is 3.
176+
# varianceBalancingFrequency =
177+
178+
# Alpha value used for calculating exponential moving average of worker's metricStats.
179+
# workerMetricsEMAAlpha =
180+
# Duration after which workerMetricStats entry from WorkerMetricStats table will
181+
# be cleaned up.
182+
# Duration format examples: PT15M (15 mins) PT10H (10 hours) P2D (2 days)
183+
# Refer to Duration.parse javadocs for more details
184+
# staleWorkerMetricsEntryCleanupDuration =

pom.xml

+66-19
Original file line numberDiff line numberDiff line change
@@ -2,13 +2,12 @@
22
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
33
<modelVersion>4.0.0</modelVersion>
44
<properties>
5-
<kcl.version>2.5.8</kcl.version>
6-
<awssdk.version>2.19.16</awssdk.version>
7-
<aws-java-sdk.version>1.12.468</aws-java-sdk.version>
5+
<awssdk.version>2.25.64</awssdk.version>
6+
<kcl.version>3.0.0</kcl.version>
87
<netty.version>4.1.108.Final</netty.version>
98
<netty-reactive.version>2.0.6</netty-reactive.version>
10-
<fasterxml-jackson.version>2.14.2</fasterxml-jackson.version>
11-
<logback.version>1.3.12</logback.version>
9+
<fasterxml-jackson.version>2.13.5</fasterxml-jackson.version>
10+
<logback.version>1.3.14</logback.version>
1211
</properties>
1312
<dependencies>
1413
<dependency>
@@ -31,6 +30,18 @@
3130
<artifactId>dynamodb</artifactId>
3231
<version>${awssdk.version}</version>
3332
</dependency>
33+
<!-- https://mvnrepository.com/artifact/software.amazon.awssdk/dynamodb-enhanced -->
34+
<dependency>
35+
<groupId>software.amazon.awssdk</groupId>
36+
<artifactId>dynamodb-enhanced</artifactId>
37+
<version>${awssdk.version}</version>
38+
</dependency>
39+
<!-- https://mvnrepository.com/artifact/com.amazonaws/dynamodb-lock-client -->
40+
<dependency>
41+
<groupId>com.amazonaws</groupId>
42+
<artifactId>dynamodb-lock-client</artifactId>
43+
<version>1.3.0</version>
44+
</dependency>
3445
<dependency>
3546
<groupId>software.amazon.awssdk</groupId>
3647
<artifactId>cloudwatch</artifactId>
@@ -136,6 +147,41 @@
136147
<artifactId>apache-client</artifactId>
137148
<version>${awssdk.version}</version>
138149
</dependency>
150+
<dependency>
151+
<groupId>software.amazon.awssdk</groupId>
152+
<artifactId>arns</artifactId>
153+
<version>${awssdk.version}</version>
154+
</dependency>
155+
<dependency>
156+
<groupId>software.amazon.awssdk</groupId>
157+
<artifactId>http-auth-spi</artifactId>
158+
<version>${awssdk.version}</version>
159+
</dependency>
160+
<dependency>
161+
<groupId>software.amazon.awssdk</groupId>
162+
<artifactId>http-auth</artifactId>
163+
<version>${awssdk.version}</version>
164+
</dependency>
165+
<dependency>
166+
<groupId>software.amazon.awssdk</groupId>
167+
<artifactId>http-auth-aws</artifactId>
168+
<version>${awssdk.version}</version>
169+
</dependency>
170+
<dependency>
171+
<groupId>software.amazon.awssdk</groupId>
172+
<artifactId>checksums-spi</artifactId>
173+
<version>${awssdk.version}</version>
174+
</dependency>
175+
<dependency>
176+
<groupId>software.amazon.awssdk</groupId>
177+
<artifactId>checksums</artifactId>
178+
<version>${awssdk.version}</version>
179+
</dependency>
180+
<dependency>
181+
<groupId>software.amazon.awssdk</groupId>
182+
<artifactId>identity-spi</artifactId>
183+
<version>${awssdk.version}</version>
184+
</dependency>
139185
<dependency>
140186
<groupId>io.netty</groupId>
141187
<artifactId>netty-codec-http</artifactId>
@@ -219,7 +265,7 @@
219265
<dependency>
220266
<groupId>com.google.errorprone</groupId>
221267
<artifactId>error_prone_annotations</artifactId>
222-
<version>2.19.1</version>
268+
<version>2.7.1</version>
223269
</dependency>
224270
<dependency>
225271
<groupId>com.google.j2objc</groupId>
@@ -234,22 +280,22 @@
234280
<dependency>
235281
<groupId>com.google.protobuf</groupId>
236282
<artifactId>protobuf-java</artifactId>
237-
<version>3.23.0</version>
283+
<version>4.27.5</version>
238284
</dependency>
239285
<dependency>
240286
<groupId>org.apache.commons</groupId>
241287
<artifactId>commons-lang3</artifactId>
242-
<version>3.12.0</version>
288+
<version>3.14.0</version>
243289
</dependency>
244290
<dependency>
245291
<groupId>org.slf4j</groupId>
246292
<artifactId>slf4j-api</artifactId>
247-
<version>2.0.5</version>
293+
<version>2.0.13</version>
248294
</dependency>
249295
<dependency>
250296
<groupId>io.reactivex.rxjava3</groupId>
251297
<artifactId>rxjava</artifactId>
252-
<version>3.1.5</version>
298+
<version>3.1.8</version>
253299
</dependency>
254300
<dependency>
255301
<groupId>com.fasterxml.jackson.dataformat</groupId>
@@ -291,11 +337,6 @@
291337
<artifactId>httpcore</artifactId>
292338
<version>4.4.15</version>
293339
</dependency>
294-
<dependency>
295-
<groupId>com.amazonaws</groupId>
296-
<artifactId>aws-java-sdk-core</artifactId>
297-
<version>${aws-java-sdk.version}</version>
298-
</dependency>
299340
<dependency>
300341
<groupId>com.amazon.ion</groupId>
301342
<artifactId>ion-java</artifactId>
@@ -304,7 +345,13 @@
304345
<dependency>
305346
<groupId>software.amazon.glue</groupId>
306347
<artifactId>schema-registry-serde</artifactId>
307-
<version>1.1.13</version>
348+
<version>1.1.19</version>
349+
<exclusions>
350+
<exclusion>
351+
<groupId>com.amazonaws</groupId>
352+
<artifactId>aws-java-sdk-sts</artifactId>
353+
</exclusion>
354+
</exclusions>
308355
</dependency>
309356
<dependency>
310357
<groupId>joda-time</groupId>
@@ -329,12 +376,12 @@
329376
<dependency>
330377
<groupId>commons-io</groupId>
331378
<artifactId>commons-io</artifactId>
332-
<version>2.11.0</version>
379+
<version>2.16.1</version>
333380
</dependency>
334381
<dependency>
335382
<groupId>commons-logging</groupId>
336383
<artifactId>commons-logging</artifactId>
337-
<version>1.2</version>
384+
<version>1.1.3</version>
338385
</dependency>
339386
<dependency>
340387
<groupId>org.apache.commons</groupId>
@@ -352,4 +399,4 @@
352399
<version>3.2.2</version>
353400
</dependency>
354401
</dependencies>
355-
</project>
402+
</project>

0 commit comments

Comments
 (0)