[exporter/loadbalancing] feat(lb): Introduce the ability to load balance on composite keys in lb (#36567)
Right now, there's a problem at high throughput when using the load balancer
with the `service.name` resource attribute: the load balancers themselves
get slow. While it's possible to scale them vertically up to a point (e.g.
about 100k req/sec), as they slow down they start to back up traffic and
block on requests. Applications then can't write as many spans out, and
start dropping spans.
This commit seeks to address that by extending the load balancing
collector to allow creating a composite key from attributes that still
keeps the load balancing decision "consistent enough" to reduce
cardinality, while still spreading the load across ${N} collectors.
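To make the idea of a composite key concrete, here is a minimal Go sketch, assuming the key is simply the chosen attribute values joined with a separator and then hashed to one of N collectors. The helper names (`compositeKey`, `pickBackend`) and the `|` separator are hypothetical, and plain FNV-1a modulo N stands in for the consistent-hashing ring the exporter actually uses.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// compositeKey joins the values of the chosen routing attributes, in order,
// into a single string. The "|" separator is an illustrative assumption.
func compositeKey(attrs map[string]string, routingAttributes []string) string {
	parts := make([]string, 0, len(routingAttributes))
	for _, name := range routingAttributes {
		parts = append(parts, attrs[name])
	}
	return strings.Join(parts, "|")
}

// pickBackend maps a key to one of n backends using FNV-1a modulo n.
// This is a simplification; it is not the exporter's hash ring.
func pickBackend(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

func main() {
	routingAttributes := []string{"service.name", "span.name"}
	spans := []map[string]string{
		{"service.name": "checkout", "span.name": "GET /cart"},
		{"service.name": "checkout", "span.name": "POST /pay"},
		{"service.name": "checkout", "span.name": "GET /cart"},
	}
	for _, s := range spans {
		key := compositeKey(s, routingAttributes)
		fmt.Printf("%-25s -> collector %d\n", key, pickBackend(key, 3))
	}
}
```

Because the key includes `span.name`, spans from the single `checkout` service can land on different collectors, while every span with the same service and operation always lands on the same one.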
It doesn't make many assumptions about how operators will use this,
except that the underlying data (the spans) are unlikely to be complete
in all cases, so key generation is "best effort". This is a deviation
from the existing design, which hard-requires "service.name".
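A best-effort key generator might look like the following sketch. It assumes a simplified, hypothetical `span` type (not the collector's pdata model), a lookup order of the top-level `span.name`/`span.kind` properties, then resource attributes, then span attributes, and that missing attributes are skipped rather than treated as errors. Prefixing each value with its attribute name keeps keys unambiguous when some attributes are absent.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a hypothetical, simplified stand-in for a pdata span together with
// its resource; it is not the collector's real data model.
type span struct {
	Name          string
	Kind          string
	Attributes    map[string]string
	ResourceAttrs map[string]string
}

// lookup resolves one routing attribute. The order (top-level properties,
// then resource attributes, then span attributes) is an assumption.
func lookup(s span, name string) (string, bool) {
	switch name {
	case "span.name":
		return s.Name, s.Name != ""
	case "span.kind":
		return s.Kind, s.Kind != ""
	}
	if v, ok := s.ResourceAttrs[name]; ok {
		return v, true
	}
	v, ok := s.Attributes[name]
	return v, ok
}

// routingKey builds a composite key on a best-effort basis, silently
// skipping attributes that this span does not carry.
func routingKey(s span, routingAttributes []string) string {
	var b strings.Builder
	for _, name := range routingAttributes {
		if v, ok := lookup(s, name); ok {
			b.WriteString(name)
			b.WriteString("=")
			b.WriteString(v)
			b.WriteString(";")
		}
	}
	return b.String()
}

func main() {
	s := span{
		Name:          "GET /cart",
		Kind:          "Server",
		Attributes:    map[string]string{"http.route": "/cart"},
		ResourceAttrs: map[string]string{"service.name": "checkout"},
	}
	attrs := []string{"service.name", "span.kind", "http.route", "missing.attr"}
	fmt.Println(routingKey(s, attrs))
	// prints: service.name=checkout;span.kind=Server;http.route=/cart;
}
```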
== Design Notes
=== Contributor Skill
As a contributor, I'm very new to the OpenTelemetry Collector, and do
not anticipate contributing much beyond what is needed to tune the
collectors I am responsible for. Given this, the code may violate
certain assumptions that are otherwise "well known".
=== Required Knowledge
The biggest surprise in this code was that, despite accepting a slice,
the routingIdentifierFromTraces function assumes spans have been
processed with the batchpersignal.SplitTraces() function, which appears
to ensure that each "trace" contains only a single span (thus allowing
them to be multiplexed effectively). This allows the function to be
simplified quite substantially.
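The simplification can be illustrated with hypothetical, flattened stand-ins for the pdata types. Under the assumption described above (every payload reaching the router holds a single span), the router only has to read the first span. `splitPerSpan` is an illustrative helper that enforces that invariant explicitly; it is not `batchpersignal.SplitTraces`.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, flattened stand-ins for ptrace.Traces, ResourceSpans and Span.
type span struct{ Name string }

type resourceSpans struct {
	ServiceName string
	Spans       []span
}

type traces struct{ ResourceSpans []resourceSpans }

// splitPerSpan turns one batch into one payload per span, so downstream
// routing can rely on "one span per payload".
func splitPerSpan(td traces) []traces {
	var out []traces
	for _, rs := range td.ResourceSpans {
		for _, sp := range rs.Spans {
			out = append(out, traces{ResourceSpans: []resourceSpans{
				{ServiceName: rs.ServiceName, Spans: []span{sp}},
			}})
		}
	}
	return out
}

// identifierFromFirstSpan inspects only the first span. This is correct only
// when the payload has already been split as above.
func identifierFromFirstSpan(td traces) (string, error) {
	if len(td.ResourceSpans) == 0 || len(td.ResourceSpans[0].Spans) == 0 {
		return "", errors.New("no spans to route")
	}
	rs := td.ResourceSpans[0]
	return rs.ServiceName + "|" + rs.Spans[0].Name, nil
}

func main() {
	batch := traces{ResourceSpans: []resourceSpans{
		{ServiceName: "checkout", Spans: []span{{Name: "GET /cart"}, {Name: "POST /pay"}}},
		{ServiceName: "billing", Spans: []span{{Name: "charge"}}},
	}}
	for _, part := range splitPerSpan(batch) {
		id, err := identifierFromFirstSpan(part)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(id)
	}
}
```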
=== Use case
The primary use case I had in mind when writing this is calculating
metrics in the spanmetricsconnector component. Essentially, some services
drive far too much traffic for a single collector instance to handle, so
we need to multiplex their spans in a way that still allows each metric
to be calculated in a single place (limiting cardinality) while also
spreading the load across ${N} collectors.
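The property that matters for span metrics can be checked with a small simulation. This sketch assumes a hypothetical `spanRecord` type, a key made of service name plus span name, and FNV-1a modulo N as a stand-in for the exporter's real hashing; each simulated collector counts calls per key, roughly what a span-metrics component does.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// spanRecord is a hypothetical, minimal view of a span for this simulation.
type spanRecord struct{ Service, Name string }

// backendFor picks a collector index for a key (simplified hashing).
func backendFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// aggregate simulates n collectors, each counting calls per
// (service, span name) key, and returns the per-collector counts.
func aggregate(spans []spanRecord, n int) []map[string]int {
	collectors := make([]map[string]int, n)
	for i := range collectors {
		collectors[i] = map[string]int{}
	}
	for _, s := range spans {
		key := s.Service + "|" + s.Name
		collectors[backendFor(key, n)][key]++
	}
	return collectors
}

func main() {
	var spans []spanRecord
	for i := 0; i < 1000; i++ {
		// One busy service with 20 distinct operations.
		spans = append(spans, spanRecord{"checkout", fmt.Sprintf("op-%d", i%20)})
	}
	for i, counts := range aggregate(spans, 4) {
		fmt.Printf("collector %d holds %d distinct keys\n", i, len(counts))
	}
}
```

Every key ends up on exactly one collector, so its counts are complete there, while the operations of the single `checkout` service can be spread over several collectors.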
=== Traces only implementation
This commit addresses this only for traces, as I only care about traces.
The logic can likely be extended easily, however.
Fixes #35320
Fixes #33660
---------
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
Co-authored-by: Andrew Howden <[email protected]>
Co-authored-by: Antoine Toulme <[email protected]>
exporter/loadbalancingexporter/README.md (+3 -1)
@@ -114,11 +114,13 @@ Refer to [config.yaml](./testdata/config.yaml) for detailed examples on using th
 * This resolver currently returns a maximum of 100 hosts.
 * `TODO`: Feature request [29771](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/29771) aims to cover the pagination for this scenario
 * The `routing_key` property is used to specify how to route values (spans or metrics) to exporters based on different parameters. This functionality is currently enabled only for `trace` and `metric` pipeline types. It supports one of the following values:
-* `service`: Routes values based on their service name. This is useful when using processors like the span metrics, so all spans for each service are sent to consistent collector instances for metric collection. Otherwise, metrics for the same services are sent to different collectors, making aggregations inaccurate.
+* `service`: Routes values based on their service name. This is useful when using processors like the span metrics, so all spans for each service are sent to consistent collector instances for metric collection. Otherwise, metrics for the same services are sent to different collectors, making aggregations inaccurate. In addition to resource / span attributes, `span.kind`, `span.name` (the top level properties of a span) are also supported.
+* `attributes`: Routes based on values in the attributes of the traces. This is similar to service, but useful for situations in which a single service overwhelms any given instance of the collector, and should be split over multiple collectors.
 * `traceID`: Routes spans based on their `traceID`. Invalid for metrics.
 * `metric`: Routes metrics based on their metric name. Invalid for spans.
 * `streamID`: Routes metrics based on their datapoint streamID. That's the unique hash of all it's attributes, plus the attributes and identifying information of its resource, scope, and metric data
 * loadbalancing exporter supports set of standard [queuing, retry and timeout settings](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md), but they are disable by default to maintain compatibility
+* The `routing_attributes` property is used to list the attributes that should be used if the `routing_key` is `attributes`.
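The two README properties interact: `routing_attributes` only matters when `routing_key` is `attributes`, and, per the commit message, that mode is traces-only. A minimal validation sketch follows; the `routingConfig` struct, its Go field names, the `validate` function, and the assumption that `streamID` is metrics-only are all hypothetical, not the exporter's actual configuration code.

```go
package main

import (
	"errors"
	"fmt"
)

// routingConfig mirrors the two README properties; field names are hypothetical.
type routingConfig struct {
	RoutingKey        string   // routing_key
	RoutingAttributes []string // routing_attributes
}

// supportedSignals lists which pipeline types each routing_key value can serve,
// following the README and commit message ("attributes" is traces-only here;
// "streamID" being metrics-only is an assumption).
var supportedSignals = map[string][]string{
	"service":    {"traces", "metrics"},
	"attributes": {"traces"},
	"traceID":    {"traces"},
	"metric":     {"metrics"},
	"streamID":   {"metrics"},
}

// validate rejects unknown keys, keys used with an unsupported signal, and
// "attributes" routing without any routing_attributes.
func validate(cfg routingConfig, signal string) error {
	signals, ok := supportedSignals[cfg.RoutingKey]
	if !ok {
		return fmt.Errorf("unknown routing_key %q", cfg.RoutingKey)
	}
	allowed := false
	for _, s := range signals {
		if s == signal {
			allowed = true
		}
	}
	if !allowed {
		return fmt.Errorf("routing_key %q is not valid for %s", cfg.RoutingKey, signal)
	}
	if cfg.RoutingKey == "attributes" && len(cfg.RoutingAttributes) == 0 {
		return errors.New("routing_attributes must be set when routing_key is \"attributes\"")
	}
	return nil
}

func main() {
	cfg := routingConfig{RoutingKey: "attributes", RoutingAttributes: []string{"service.name", "span.name"}}
	fmt.Println(validate(cfg, "traces"))
	fmt.Println(validate(cfg, "metrics"))
}
```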