ADOT Collector/instrumentation not creating X-Ray spans on ECS Fargate, NodeJS app #946

Open
@AA-morganh

Hi, I'm having an issue with ADOT on ECS Fargate. I'm seeing CloudWatch logs, metrics, and Container Insights metrics, as well as some start-up `fs` spans in X-Ray, but none of my application spans reach X-Ray. My auto-instrumentation code is as follows:

```ts
/*instrumentation.ts*/
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-proto';
import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { AWSXRayPropagator } from '@opentelemetry/propagator-aws-xray';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { AWSXRayIdGenerator } from '@opentelemetry/id-generator-aws-xray';

if (!process.env.DISABLE_TELEMETRY) {
  // For troubleshooting, set the log level to DiagLogLevel.DEBUG
  diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.INFO);

  const traceExporter = process.env.OTLP_COLLECTOR_TRACE_URL
    ? new OTLPTraceExporter({
        url: process.env.OTLP_COLLECTOR_TRACE_URL,
      })
    : new OTLPTraceExporter({ url: 'http://127.0.0.1:4318/v1/traces' });

  const metricReader = new PeriodicExportingMetricReader({
    exporter: process.env.OTLP_COLLECTOR_METRICS_URL
      ? new OTLPMetricExporter({
          url: process.env.OTLP_COLLECTOR_METRICS_URL,
        })
      : new OTLPMetricExporter({ url: 'http://127.0.0.1:4318/v1/metrics' }),
  });

  const spanProcessor = new BatchSpanProcessor(traceExporter);

  const sdk = new NodeSDK({
    textMapPropagator: new AWSXRayPropagator(),
    traceExporter: traceExporter,
    metricReader: metricReader,
    spanProcessor: spanProcessor,
    idGenerator: new AWSXRayIdGenerator(),
    instrumentations: [getNodeAutoInstrumentations()],
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: 'MyService',
      [SemanticResourceAttributes.SERVICE_VERSION]: '1.0',
    }),
  });

  sdk.start();

  process.on('SIGTERM', () => {
    sdk
      .shutdown()
      .then(() => console.log('Tracing and Metrics terminated'))
      .catch((error) => console.log('Error terminating tracing and metrics', error))
      .finally(() => process.exit(0));
  });
}

export default {};
```
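The instrumentation passes `AWSXRayIdGenerator` to `NodeSDK` because X-Ray expects the first 8 hex characters (32 bits) of a trace ID to be the Unix epoch time in seconds, with the remaining 24 hex characters random. The sketch below only illustrates that layout using Node's built-in `crypto` module; the helper names are hypothetical and this is not the package's actual source.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper showing the X-Ray-compatible trace ID layout:
// 8 hex chars of epoch seconds followed by 24 random hex chars (32 total).
function generateXRayCompatibleTraceId(nowMs: number = Date.now()): string {
  // Epoch seconds as big-endian hex, zero-padded to 8 characters.
  const epochHex = Math.floor(nowMs / 1000).toString(16).padStart(8, "0");
  // 12 random bytes encode to 24 lowercase hex characters.
  const randomHex = randomBytes(12).toString("hex");
  return epochHex + randomHex;
}

// Span IDs carry no timestamp: 8 random bytes, i.e. 16 hex characters.
function generateSpanId(): string {
  return randomBytes(8).toString("hex");
}

const traceId = generateXRayCompatibleTraceId();
console.log(`trace ${traceId}, span ${generateSpanId()}`);
```

Because the timestamp lives in the ID itself, a backend can reject spans whose IDs were produced by a purely random generator, which is why the ID generator matters for X-Ray but not for most other tracing backends.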

My task definition looks like this (my CI replaces the `REPLACE_*` tokens):

```json
{
  "family": "myService",
  "containerDefinitions": [
    {
      "name": "myService",
      "image": "REPLACE_REPOSITORY_URI:REPLACE_IMAGE_TAG",
      "healthCheck": {
        "command": ["CMD-SHELL", "wget -q -S -O - localhost:8080/healthcheck"],
        "interval": 5,
        "retries": 10,
        "timeout": 3
      },
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/REPLACE_STAGE-myService",
          "awslogs-region": "REPLACE_AWS_REGION",
          "awslogs-stream-prefix": "ecs"
        },
        "secretOptions": []
      },
      "dependsOn": [
        {
          "containerName": "aws-otel-collector",
          "condition": "HEALTHY"
        }
      ],
      "environment": [
        { "name": "ACCOUNT_ID", "value": "REPLACE_AWS_ACCOUNT_ID" },
        { "name": "REGION", "value": "REPLACE_AWS_REGION" },
        { "name": "STAGE", "value": "REPLACE_STAGE" },
        { "name": "NO_COLOR", "value": "NO_COLOR" },
        { "name": "LatestSchema", "value": "REPLACE_LATEST_SCHEMA" },
        { "name": "JWT_SECRET", "value": "REPLACE_SECRET_ARN" }
      ]
    },
    {
      "name": "aws-otel-collector",
      "image": "REPLACE_AWS_ACCOUNT_ID.dkr.ecr.REPLACE_AWS_REGION.amazonaws.com/ecr-public/aws-observability/aws-otel-collector:latest",
      "essential": true,
      "command": [
        "--set=service.telemetry.logs.level=DEBUG",
        "--config=/etc/ecs/container-insights/otel-task-metrics-config.yaml"
      ],
      "user": "0:0",
      "healthCheck": {
        "command": ["/healthcheck"],
        "interval": 5,
        "retries": 10,
        "timeout": 3
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/REPLACE_STAGE-aws-otel-sidecar-collector",
          "awslogs-region": "REPLACE_AWS_REGION",
          "awslogs-stream-prefix": "ecs"
        },
        "secretOptions": []
      }
    },
    {
      "name": "aws-otel-emitter",
      "image": "REPLACE_AWS_ACCOUNT_ID.dkr.ecr.REPLACE_AWS_REGION.amazonaws.com/ecr-public/aws-otel-test/aws-otel-goxray-sample-app:latest",
      "essential": false,
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:5000 || exit 1"],
        "interval": 5,
        "retries": 10,
        "timeout": 3
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/REPLACE_STAGE-aws-otel-sidecar-emitter",
          "awslogs-region": "REPLACE_AWS_REGION",
          "awslogs-stream-prefix": "ecs"
        },
        "secretOptions": []
      }
    }
  ],
  "taskRoleArn": "REPLACE_TASK_ROLE_ARN",
  "executionRoleArn": "REPLACE_EXECUTION_ROLE_ARN",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512"
}
```
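The task definition sets neither `OTLP_COLLECTOR_TRACE_URL` nor `OTLP_COLLECTOR_METRICS_URL`, so the instrumentation falls back to `http://127.0.0.1:4318/v1/...`. With `"networkMode": "awsvpc"`, containers in the same task share a network namespace, so that address reaches the `aws-otel-collector` sidecar, assuming its config enables the OTLP HTTP receiver on port 4318. The sketch below is a hypothetical helper (`resolveOtlpUrl` is not part of any OpenTelemetry package) that mirrors the ternaries in `instrumentation.ts` and also flags overrides missing the `/v1/<signal>` path, since the exporters' `url` option is used as the full endpoint.

```typescript
import { URL } from "node:url";

// The two OTLP/HTTP signals the instrumentation exports.
type OtlpSignal = "traces" | "metrics";

interface ResolvedEndpoint {
  url: string;
  // True when the URL path ends in /v1/traces or /v1/metrics as appropriate.
  hasSignalPath: boolean;
}

// Hypothetical helper: same fallback logic as the original ternaries
// (unset or empty override -> sidecar on the task's shared localhost).
function resolveOtlpUrl(override: string | undefined, signal: OtlpSignal): ResolvedEndpoint {
  const url = override ? override : `http://127.0.0.1:4318/v1/${signal}`;
  // new URL() throws on a malformed override, surfacing typos at startup.
  const hasSignalPath = new URL(url).pathname.endsWith(`/v1/${signal}`);
  return { url, hasSignalPath };
}

const traces = resolveOtlpUrl(process.env.OTLP_COLLECTOR_TRACE_URL, "traces");
if (!traces.hasSignalPath) {
  console.warn(`OTLP trace URL ${traces.url} does not end in /v1/traces`);
}
```

In `instrumentation.ts` this could stand in for each ternary, e.g. `new OTLPTraceExporter({ url: resolveOtlpUrl(process.env.OTLP_COLLECTOR_TRACE_URL, "traces").url })`.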

I have verbose logging enabled on both the SDK and the collector, and nothing in the logs looks suspicious to me; the only symptom is that my expected automatic and manual spans never show up.

On a local docker-compose setup with a plain upstream OpenTelemetry Collector, my spans do reach a Grafana Tempo instance, so I think the instrumentation is largely set up correctly. Any guidance would be a huge help.
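One difference between the local Tempo setup and X-Ray is worth keeping in mind: Tempo stores any 128-bit trace ID, while X-Ray expects the first 32 bits to be a recent Unix timestamp, and the collector's `awsxray` exporter drops spans whose embedded timestamp falls outside an accepted window (newer collector releases may relax this check). The instrumentation above already sets `AWSXRayIdGenerator`, so this is only a way to sanity-check a trace ID pulled from the logs, not a diagnosis. The helper name and both limits below are assumptions modelled on the exporter's defaults, not values taken from this issue.

```typescript
// Assumed bounds, modelled on the collector's awsxray exporter defaults:
// about 28 days into the past and 5 minutes of clock skew into the future.
const MAX_AGE_SECONDS = 60 * 60 * 24 * 28;
const MAX_SKEW_SECONDS = 60 * 5;

/**
 * Hypothetical helper: converts a 32-hex-char OpenTelemetry trace ID into
 * X-Ray's "1-<8 hex epoch>-<24 hex random>" form. Returns null when the ID is
 * malformed or its embedded timestamp is outside the accepted window, i.e.
 * when an exporter enforcing these bounds would drop the span.
 */
function toXRayTraceId(
  otelTraceId: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): string | null {
  if (!/^[0-9a-f]{32}$/.test(otelTraceId)) return null;
  const epochHex = otelTraceId.slice(0, 8);
  const epoch = parseInt(epochHex, 16);
  const tooOld = epoch < nowSeconds - MAX_AGE_SECONDS;
  const tooNew = epoch > nowSeconds + MAX_SKEW_SECONDS;
  if (tooOld || tooNew) return null;
  return `1-${epochHex}-${otelTraceId.slice(8)}`;
}
```

A trace ID from a purely random generator, such as `0af7651916cd43dd8448eb211c80319c`, returns `null` here because its first 8 hex characters decode to a date in the 1970s, while an ID built with the current epoch prefix converts cleanly.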
