| 1 | +--- |
| 2 | +title: 'Observability at the Edge: New OTel features in Envoy and Istio' |
| 3 | +linkTitle: New OTel features in Envoy and Istio |
| 4 | +date: 2024-06-07 |
| 5 | +author: '[Joao Grassi](https://github.com/joaopgrassi) (Dynatrace)' |
| 6 | +issue: 4534 |
| 7 | +sig: OpenTelemetry Specification |
| 8 | +cSpell:ignore: bookinfo Grassi istioctl Joao productpage |
| 9 | +--- |
| 10 | + |
| 11 | +In the dynamic world of cloud-native and distributed applications, managing |
| 12 | +microservices effectively is critical. [Kubernetes](https://kubernetes.io/) has |
| 13 | +become the de facto standard for container orchestration, enabling seamless |
| 14 | +deployment, scaling, and management of containerized applications. |
| 15 | + |
| 16 | +The distributed nature of such systems, however, adds a layer of complexity in |
| 17 | +the form of networking for in-cluster communication. Two well-known projects, |
| 18 | +Envoy and Istio, have emerged as the foundation for the smooth management and |
| 19 | +operation of such complex environments. |
| 20 | + |
| 21 | +Together, these technologies empower organizations to build scalable, resilient, |
| 22 | +and secure distributed systems. |
| 23 | + |
| 24 | +[Istio](https://istio.io/) is a service mesh that orchestrates communication |
| 25 | +between microservices, providing features such as traffic management, security, |
| 26 | +and, of course, observability. Istio uses the Envoy proxy as its data plane. |
| 27 | +[Envoy](https://www.envoyproxy.io/) is a high-performance proxy designed for |
| 28 | +single applications and services, as well as a communication bus and "universal |
| 29 | +data plane" for service meshes. |
| 30 | + |
| 31 | +Both [Envoy](https://www.cncf.io/projects/envoy/) and |
| 32 | +[Istio](https://www.cncf.io/projects/istio/) are open source projects and part |
| 33 | +of the [Cloud Native Computing Foundation](https://www.cncf.io/). |
| 34 | + |
| 35 | +## Observability in Envoy and Istio |
| 36 | + |
| 37 | +The Envoy proxy deployed by the Istio service mesh is the perfect candidate to |
| 38 | +ensure incoming and outgoing requests are properly traced. This approach |
| 39 | +provides distributed traces of the entire service mesh, giving an overview of |
| 40 | +the communication between services, even when the applications themselves are |
| 41 | +not instrumented. |
| 42 | + |
| 43 | +> Note: At minimum, applications must be configured to propagate the |
| 44 | +> `traceparent` header. |
| 45 | + |
| 46 | +Envoy offers several |
| 47 | +[HTTP tracers](https://www.envoyproxy.io/docs/envoy/v1.29.4/api-v3/config/trace/trace) |
| 48 | +for tracing requests, including the |
| 49 | +[OpenTelemetry tracer](https://www.envoyproxy.io/docs/envoy/v1.29.4/api-v3/config/trace/v3/opentelemetry.proto). |
| 50 | +[Tracers](/docs/concepts/signals/traces/#tracer) can be configured either |
| 51 | +directly within Envoy (when using it as a standalone component) or for all Envoy |
| 52 | +instances by using Istio. |
| 53 | + |
| 54 | +Here is an example of how Istio and Envoy work together to trace requests: |
| 55 | + |
| 56 | + |
| 57 | + |
| 58 | +## New OTel tracing features in Envoy and Istio |
| 59 | + |
| 60 | +Although Envoy already had support for exporting OpenTelemetry traces using |
| 61 | +gRPC, it lacked support for exporting using HTTP. OpenTelemetry supports both |
| 62 | +protocols as first-class citizens. In addition, other areas such as providing |
| 63 | +resource attributes and configurable sampling decisions were lagging behind the |
| 64 | +stable portions of the OpenTelemetry specification. |
| 65 | + |
| 66 | +Starting from Envoy |
| 67 | +[1.29](https://www.envoyproxy.io/docs/envoy/latest/version_history/v1.29/v1.29) |
| 68 | +and Istio |
| 69 | +[1.22](https://istio.io/latest/news/releases/1.22.x/announcing-1.22/change-notes), |
| 70 | +users have access to the new features described below. |
| 71 | + |
| 72 | +### OTLP HTTP exporter |
| 73 | + |
| 74 | +The |
| 75 | +[OpenTelemetry tracer](https://www.envoyproxy.io/docs/envoy/v1.29.4/api-v3/config/trace/v3/opentelemetry.proto) |
| 76 | +in Envoy can now be configured to export OTLP traces using HTTP. This allows it |
| 77 | +to send telemetry to observability sinks using OTLP/HTTP, directly from Envoy |
| 78 | +proxies. |
| 79 | + |
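For a standalone Envoy, the OTLP/HTTP exporter is configured on the tracer itself. The fragment below is a minimal sketch of the `tracing` block of an HTTP connection manager, not a complete configuration. The collector URI and the `otel_collector` cluster name are placeholders chosen for illustration, and that cluster has to be defined elsewhere in the same Envoy configuration.

```yaml
# Fragment of an HTTP connection manager config (sketch, not a complete file).
tracing:
  provider:
    name: envoy.tracers.opentelemetry
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
      # Export spans over OTLP/HTTP instead of gRPC.
      http_service:
        http_uri:
          # Placeholder endpoint; point this at your own OTLP/HTTP receiver.
          uri: "http://otel-collector.example:4318/v1/traces"
          # Placeholder name; must match a cluster defined elsewhere in this config.
          cluster: otel_collector
          timeout: 0.250s
      service_name: envoy
```

Backends that require authentication usually need extra request headers as well; the `http_service` message has a `request_headers_to_add` field for that purpose.
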
| 80 | +### Resource detectors |
| 81 | + |
| 82 | +Envoy now ships with the |
| 83 | +[Environment Resource Detector](https://www.envoyproxy.io/docs/envoy/v1.29.4/api-v3/extensions/tracers/opentelemetry/resource_detectors/v3/environment_resource_detector.proto). |
| 84 | +This resource detector follows the |
| 85 | +[OTel specification](/docs/specs/otel/resource/sdk/#specifying-resource-information-via-an-environment-variable) |
| 86 | +and allows users to further enrich the spans produced by Envoy proxies. |
| 87 | + |
| 88 | +The [resource detector feature](https://github.com/envoyproxy/envoy/pull/29547) |
| 89 | +not only added the environment detector, but also made it possible for any other |
| 90 | +resource detector to be easily added with Envoy's built-in extensions feature. |
| 91 | + |
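As a rough sketch of how this looks in a standalone Envoy, the environment resource detector is listed under `resource_detectors` inside the OpenTelemetry tracer's `typed_config`. Only the detector entry is shown here; the exporter settings that a working tracer also needs are omitted.

```yaml
# Fragment of the OpenTelemetry tracer config (exporter settings omitted).
typed_config:
  "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
  resource_detectors:
    # Reads resource attributes from the OTEL_RESOURCE_ATTRIBUTES environment variable.
    - name: envoy.tracers.opentelemetry.resource_detectors.environment
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.tracers.opentelemetry.resource_detectors.v3.EnvironmentResourceDetectorConfig
```

With this in place, a value such as `OTEL_RESOURCE_ATTRIBUTES="host.name=abc-123"` in the proxy's environment is attached as a resource attribute to the spans that Envoy exports.
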
| 92 | +### Custom samplers |
| 93 | + |
| 94 | +Another exciting feature added to Envoy is the possibility of implementing and |
| 95 | +configuring custom samplers. Envoy follows the |
| 96 | +[OTel Sampler interface](/docs/specs/otel/trace/sdk/#sampler), which makes it |
| 97 | +easy for anyone to contribute their own samplers. |
| 98 | + |
| 99 | +Envoy ships with the |
| 100 | +[Always On Sampler](https://www.envoyproxy.io/docs/envoy/v1.29.4/api-v3/extensions/tracers/opentelemetry/samplers/v3/always_on_sampler.proto) |
| 101 | +which samples every span. This base implementation can serve as a reference |
| 102 | +for building smarter samplers. |
| 103 | + |
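In a standalone Envoy, a sampler is selected through the `sampler` field of the OpenTelemetry tracer's `typed_config`. The fragment below is a minimal sketch that selects the Always On Sampler; the exporter settings are omitted, and a custom sampler would be referenced the same way through its own extension name and config type.

```yaml
# Fragment of the OpenTelemetry tracer config (exporter settings omitted).
typed_config:
  "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
  sampler:
    # Built-in sampler that samples every span.
    name: envoy.tracers.opentelemetry.samplers.always_on
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.tracers.opentelemetry.samplers.v3.AlwaysOnSamplerConfig
```
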
| 104 | +## Demo |
| 105 | + |
| 106 | +It's time to see the new features in action! For this, we use the |
| 107 | +[Istio Bookinfo application](https://istio.io/latest/docs/examples/bookinfo/), |
| 108 | +and illustrate how to: |
| 109 | + |
| 110 | +- Deploy in Kubernetes, with Istio as service mesh |
| 111 | +- Export traces to [Jaeger](https://www.jaegertracing.io/) using HTTP |
| 112 | + |
| 113 | +### Install Jaeger |
| 114 | + |
| 115 | +First, start by installing the |
| 116 | +[Jaeger operator](https://www.jaegertracing.io/docs/1.57/operator/): |
| 117 | + |
| 118 | +```shell |
| 119 | +kubectl create namespace observability |
| 120 | +kubectl create -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.57.0/jaeger-operator.yaml -n observability |
| 121 | +``` |
| 122 | + |
| 123 | +Then deploy Jaeger `all-in-one`: |
| 124 | + |
| 125 | +```shell |
| 126 | +kubectl apply -f - <<EOF |
| 127 | +apiVersion: jaegertracing.io/v1 |
| 128 | +kind: Jaeger |
| 129 | +metadata: |
| 130 | +  name: simplest |
| 131 | +EOF |
| 132 | +``` |
| 133 | + |
| 134 | +### Install and configure Istio |
| 135 | + |
| 136 | +Next, install Istio using |
| 137 | +[`istioctl`](https://istio.io/latest/docs/setup/install/istioctl/): |
| 138 | + |
| 139 | +```shell |
| 140 | +cat <<EOF | istioctl install -y -f - |
| 141 | +apiVersion: install.istio.io/v1alpha1 |
| 142 | +kind: IstioOperator |
| 143 | +spec: |
| 144 | +  meshConfig: |
| 145 | +    enableTracing: true |
| 146 | +    extensionProviders: |
| 147 | +    - name: otel-tracing |
| 148 | +      opentelemetry: |
| 149 | +        port: 4318 |
| 150 | +        service: simplest-collector.default.svc.cluster.local |
| 151 | +        http: |
| 152 | +          path: "/v1/traces" |
| 153 | +          timeout: 5s |
| 154 | +        resource_detectors: |
| 155 | +          environment: {} |
| 156 | +EOF |
| 157 | +``` |
| 158 | + |
| 159 | +This installs Istio and registers an OpenTelemetry tracing provider named |
| 160 | +`otel-tracing`. The `http` block makes it export over OTLP/HTTP to the Jaeger |
| 161 | +collector on port 4318, and `resource_detectors` enables the environment |
| 162 | +resource detector. |
| 163 | + |
| 164 | +Next, we need to enable the tracer using Istio's |
| 165 | +[Telemetry API](https://istio.io/latest/docs/tasks/observability/telemetry/): |
| 166 | + |
| 167 | +```shell |
| 168 | +kubectl apply -f - <<EOF |
| 169 | +apiVersion: telemetry.istio.io/v1alpha1 |
| 170 | +kind: Telemetry |
| 171 | +metadata: |
| 172 | +  name: otel-demo |
| 173 | +spec: |
| 174 | +  tracing: |
| 175 | +  - providers: |
| 176 | +    - name: otel-tracing |
| 177 | +    randomSamplingPercentage: 100 |
| 178 | +EOF |
| 179 | +``` |
| 180 | + |
| 181 | +And finally, we configure the `OTEL_RESOURCE_ATTRIBUTES` environment variable |
| 182 | +for the Envoy proxies: |
| 183 | + |
| 184 | +```shell |
| 185 | +cat <<EOF | kubectl apply -f - |
| 186 | +apiVersion: networking.istio.io/v1beta1 |
| 187 | +kind: ProxyConfig |
| 188 | +metadata: |
| 189 | +  name: my-proxyconfig |
| 190 | +  namespace: istio-system |
| 191 | +spec: |
| 192 | +  concurrency: 0 |
| 193 | +  environmentVariables: |
| 194 | +    OTEL_RESOURCE_ATTRIBUTES: "host.name=abc-123" |
| 195 | +EOF |
| 196 | +``` |
| 197 | + |
| 198 | +### Deploy the application |
| 199 | + |
| 200 | +The final step is to deploy the bookinfo application |
| 201 | +([bookinfo.yaml](https://raw.githubusercontent.com/istio/istio/release-1.22/samples/bookinfo/platform/kube/bookinfo.yaml)): |
| 202 | + |
| 203 | +```shell |
| 204 | +kubectl label namespace default istio-injection=enabled |
| 205 | +kubectl apply -f bookinfo.yaml |
| 206 | +``` |
| 207 | + |
| 208 | +### Test it out |
| 209 | + |
| 210 | +To test your setup, make some requests to one of the services, for example: |
| 211 | + |
| 212 | +```shell |
| 213 | +kubectl exec "$(kubectl get pod -l app=ratings -o jsonpath='{.items[0].metadata.name}')" -c ratings -- curl -sS productpage:9080/productpage | grep -o "<title>.*</title>" |
| 214 | +``` |
| 215 | + |
| 216 | +Then you can check it out on the Jaeger UI -- you should see some traces! |
| 217 | + |
| 218 | + |
| 219 | + |
| 220 | +From the spans produced by Envoy you can see (in order): |
| 221 | + |
| 222 | +1. Outgoing (egress) call from the `ratings` service to the `productpage` |
| 223 | +   service. |
| 224 | +2. Incoming (ingress) call in the `productpage` service. |
| 225 | +3. The `host.name` resource attribute we set using the |
| 226 | +   `OTEL_RESOURCE_ATTRIBUTES` environment variable. This attribute was picked up |
| 227 | +   by the environment resource detector and added to all spans Envoy created. |
| 228 | + |
| 229 | +You can also see all the other downstream calls made, as all services have the |
| 230 | +Envoy sidecar injected by Istio. You have full observability of the calls |
| 231 | +between services, just by enabling the OTel tracer in Envoy! |
| 232 | + |
| 233 | +## Next steps and closing |
| 234 | + |
| 235 | +With the new features described in this post, users gain more flexibility in |
| 236 | +exporting their traces. They can enrich their data with resource attributes and |
| 237 | +establish the groundwork for more intelligent sampling techniques to be added in |
| 238 | +the future. |
| 239 | + |
| 240 | +The new features also unlock interesting use cases for other parties in the |
| 241 | +observability space, including cloud providers and observability vendors. With |
| 242 | +the resource detector and sampler APIs now available in Envoy, anyone can build |
| 243 | +support for custom samplers and detectors, enhancing the usefulness of the |
| 244 | +telemetry data generated by Envoy. |
| 245 | + |
| 246 | +Another exciting next step for Envoy and OpenTelemetry is the adoption of the |
| 247 | +now-stable |
| 248 | +[HTTP semantic conventions in Envoy](https://github.com/envoyproxy/envoy/issues/30821). |
| 249 | +This will align Envoy with the OTel SDKs that already produce spans following |
| 250 | +the stable HTTP semantic conventions. |
| 251 | + |
| 252 | +Collaborating with the Envoy and Istio community to bring more OTel features to |
| 253 | +these projects has been a great experience. The eagerness to adopt OpenTelemetry |
| 254 | +and the strong collaboration between OpenTelemetry and related CNCF projects, |
| 255 | +such as Istio and Envoy, help solidify OpenTelemetry's position as the de facto |
| 256 | +standard for observability. |