OpenTelemetry vs Datadog: Open Standard or Managed Platform?
Compare OpenTelemetry and Datadog across total cost of ownership, instrumentation, vendor lock-in, and architecture. TCO at 10, 50, and 200 services, OTel Collector pipeline config, hybrid approach, and a phased migration guide.

Two Philosophies for the Same Problem
Every engineering team eventually hits the same wall: the system is too complex to debug by reading logs on individual machines. You need observability -- metrics, traces, and logs correlated across services. At that point, you face a fundamental choice. OpenTelemetry (OTel) is a vendor-neutral instrumentation framework that gives you full control over your telemetry pipeline. Datadog is a fully managed observability platform that handles collection, storage, querying, and alerting under one roof. They are not direct competitors -- one is plumbing, the other is the entire house -- but the choice between them shapes your architecture, your costs, and your vendor independence for years.
This guide breaks down both approaches with real cost numbers, instrumentation comparisons, and a practical migration path. No marketing fluff -- just the tradeoffs as they play out in production.
What Are OpenTelemetry and Datadog?
Definition: OpenTelemetry is an open-source, vendor-neutral observability framework (CNCF project) that provides APIs, SDKs, and a Collector for generating, processing, and exporting telemetry data -- traces, metrics, and logs. It standardizes instrumentation but does not store or visualize data. Datadog is a commercial SaaS observability platform that provides its own agents, integrations, storage backend, dashboards, alerting, and APM -- a complete managed solution from instrumentation to incident response.
The key distinction: OTel handles data collection. Datadog handles collection and everything after it. You can use OTel to feed data into Datadog, use Datadog's own agent exclusively, or build a fully open-source stack with OTel plus Prometheus, Grafana, and Tempo. Understanding this layering is the first step toward making a sound decision.
Architecture Comparison
The architectural differences between these two approaches affect deployment, operations, and long-term flexibility.
| Aspect | OpenTelemetry + OSS Stack | Datadog |
|---|---|---|
| Instrumentation | OTel SDKs (vendor-neutral APIs) | dd-trace libraries (proprietary) or OTel SDKs |
| Collection | OTel Collector (self-managed) | Datadog Agent (self-managed or serverless) |
| Storage | Prometheus, Tempo, Loki, ClickHouse (self-managed or cloud) | Datadog-managed (fully hosted) |
| Visualization | Grafana (self-hosted or Grafana Cloud) | Datadog dashboards (built-in) |
| Alerting | Alertmanager, Grafana Alerting | Datadog Monitors (built-in) |
| Data format | OTLP (open standard) | Proprietary + OTLP ingestion support |
| Operational burden | High -- you run the infrastructure | Low -- Datadog manages it |
Total Cost of Ownership at Three Scales
Cost is where this decision gets concrete. I've modeled TCO at three scales based on real-world deployments, including infrastructure, licensing, and engineering time to operate the stack. These numbers assume a containerized environment on AWS with average telemetry volume per service.
10-Service Startup
| Cost Component | OTel + Grafana Cloud | Datadog Pro |
|---|---|---|
| Platform/licensing | $0 (free tier covers it) | ~$690/mo (23 hosts x $15 infra + APM) |
| Infrastructure (Collector, storage) | ~$150/mo (small Collector + Grafana free tier) | $0 (Datadog-managed) |
| Engineering time (setup + maintenance) | ~40 hours initial, 4 hrs/mo ongoing | ~8 hours initial, 1 hr/mo ongoing |
| Estimated monthly TCO | ~$400-600 | ~$700-900 |
At this scale, Datadog is competitive. The engineering time savings nearly offset the licensing cost, and you get a polished experience from day one. For a startup with limited ops capacity, Datadog often wins here.
50-Service Mid-Stage Company
| Cost Component | OTel + Grafana Stack | Datadog Pro |
|---|---|---|
| Platform/licensing | ~$800/mo (Grafana Cloud Pro) | ~$5,500/mo (hosts + APM + log ingestion) |
| Infrastructure | ~$600/mo (Collector cluster, storage) | $0 |
| Engineering time | ~80 hours initial, 12 hrs/mo ongoing | ~20 hours initial, 4 hrs/mo ongoing |
| Estimated monthly TCO | ~$2,500-3,500 | ~$6,000-8,000 |
At 50 services, OTel + Grafana starts pulling ahead significantly. The engineering overhead is real but manageable for a team that has a dedicated platform or SRE function. The cost delta funds a significant portion of an SRE salary.
200-Service Enterprise
| Cost Component | OTel + Grafana Stack | Datadog Enterprise |
|---|---|---|
| Platform/licensing | ~$4,000/mo (Grafana Cloud Advanced) | ~$50,000+/mo (hosts + APM + logs + custom metrics) |
| Infrastructure | ~$3,000/mo (HA Collector, object storage) | $0 |
| Engineering time | 1-2 dedicated SREs | 0.5 SRE for agent management |
| Estimated monthly TCO | ~$12,000-18,000 | ~$50,000-80,000 |
At enterprise scale, the gap is dramatic. Datadog's per-host pricing model compounds relentlessly. Custom metrics pricing alone can add five figures monthly. This is where large organizations either negotiate aggressively with Datadog or migrate to an OTel-based stack.
Instrumentation: OTel SDKs vs. dd-trace
Both approaches offer auto-instrumentation for common frameworks and manual instrumentation APIs for custom business logic. Here is how they compare in a Node.js application.
OpenTelemetry Instrumentation
```typescript
// tracing.ts -- loaded before application code
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  serviceName: 'payment-service',
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      // fs instrumentation is noisy; disable it
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
});

sdk.start();
```
Datadog dd-trace Instrumentation
```typescript
// tracing.ts -- loaded before application code
import tracer from 'dd-trace';

tracer.init({
  service: 'payment-service',
  env: 'production',
  version: '1.4.2',
  logInjection: true,
  runtimeMetrics: true,
  profiling: true,
});
```
The Datadog setup is undeniably simpler. Fewer packages, less configuration, and features like profiling and runtime metrics are built in. OTel requires more explicit configuration but gives you portability -- that same instrumentation code works with Jaeger, Tempo, Honeycomb, or any OTLP-compatible backend.
The OTel Collector: Your Telemetry Pipeline
The OpenTelemetry Collector is the architectural component that makes OTel powerful. It sits between your services and your backends, acting as a vendor-neutral telemetry router that can process, filter, sample, enrich, and fan out data to multiple destinations simultaneously.
```yaml
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1024
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: errors-always
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-requests
        type: latency
        latency: { threshold_ms: 2000 }
      - name: baseline
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
  attributes:
    actions:
      - key: deployment.environment
        value: production
        action: upsert
  batch:
    timeout: 5s
    send_batch_size: 2048

exporters:
  otlphttp/grafana:
    endpoint: https://otlp-gateway-prod-us-east.grafana.net/otlp
    headers:
      Authorization: "Basic ${env:GRAFANA_OTLP_TOKEN}"
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: datadoghq.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      # memory_limiter must run first, batch last
      processors: [memory_limiter, attributes, tail_sampling, batch]
      exporters: [otlphttp/grafana, datadog]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/grafana]
```
This configuration demonstrates the Collector's killer feature: multi-destination export. You can send traces to both Grafana Cloud and Datadog simultaneously, making migration incremental rather than all-or-nothing. The tail sampling processor keeps 100% of errors and slow traces while sampling 5% of routine traffic, drastically reducing storage costs.
Grafana Flexibility vs. Datadog Polish
On the visualization side, the tradeoff is customizability versus out-of-the-box experience.
| Feature | Grafana | Datadog |
|---|---|---|
| Dashboard building | Extremely flexible, any data source | Polished templates, guided setup |
| Data sources | 100+ plugins (Prometheus, Loki, Tempo, Postgres, etc.) | Datadog metrics/traces/logs only |
| Alerting | Multi-source, Alertmanager or Grafana-native | Integrated monitors with anomaly detection |
| Trace-to-log correlation | Manual config (Tempo + Loki linking) | Automatic, zero config |
| APM service map | Requires Tempo + service graph connector | Built-in, auto-generated |
| Learning curve | Steeper -- PromQL, LogQL, TraceQL | Lower -- unified query interface |
| Notebooks/collaboration | Basic annotations | Full notebooks, incident timelines |
Datadog's strength is correlation. Click a spike on a metric dashboard, pivot to traces for that time window, drill into a specific trace, jump to the associated logs -- all without leaving the platform. Grafana can do this too with Tempo, Loki, and Prometheus, but the linking requires configuration and the experience is less seamless. For teams that value speed-to-insight during incidents, Datadog's polish is real.
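That Tempo-to-Loki linking is a one-time provisioning step rather than per-dashboard work. Here is a minimal sketch of a Grafana datasource provisioning file using the `tracesToLogsV2` option -- the URLs, UIDs, and datasource names are assumptions for illustration:

```yaml
# grafana/provisioning/datasources/tempo.yaml (illustrative)
apiVersion: 1
datasources:
  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        datasourceUid: loki        # UID of your Loki datasource
        spanStartTimeShift: '-5m'  # widen the log search window around the span
        spanEndTimeShift: '5m'
        filterByTraceID: true      # only show logs carrying the trace ID
```

Once provisioned, the "logs for this span" button appears in the trace view -- close to Datadog's pivot, but you had to wire it up yourself.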
Vendor Lock-In: The Hidden Cost
Vendor lock-in is the argument most cited for OTel, and it deserves a nuanced discussion rather than hand-waving.
Datadog lock-in is real and multifaceted:
- Instrumentation lock-in: dd-trace libraries use proprietary span formats and tags. Migrating means re-instrumenting every service.
- Dashboard lock-in: Datadog dashboards, monitors, and SLOs are defined in Datadog's proprietary format. They cannot be exported to Grafana or any other tool.
- Custom metrics lock-in: DogStatsD metric naming conventions differ from Prometheus/OTel conventions. Migration requires renaming and re-alerting.
- Workflow lock-in: Incident management, runbooks, and on-call workflows built in Datadog must be rebuilt elsewhere.
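To make the custom-metrics bullet concrete: DogStatsD names are dot-delimited (`payment.checkout.duration`) while Prometheus/OTel names are snake_case with a unit suffix (`payment_checkout_duration_seconds`). A hypothetical helper for bulk-renaming during a migration might look like this:

```typescript
// Hypothetical migration helper: map a DogStatsD-style metric name
// to Prometheus naming conventions.
function toPrometheusName(ddName: string, unit?: string): string {
  // Dots become underscores; any other illegal characters are sanitized.
  const base = ddName.replace(/\./g, '_').replace(/[^a-zA-Z0-9_]/g, '_');
  // Prometheus convention appends the unit to the metric name.
  return unit ? `${base}_${unit}` : base;
}

toPrometheusName('payment.checkout.duration', 'seconds');
// → 'payment_checkout_duration_seconds'
```

The rename itself is mechanical; the expensive part is updating every alert and dashboard query that references the old names.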
OTel avoids instrumentation lock-in by design:
- OTLP is an open standard supported by every major backend.
- Switching from Tempo to Honeycomb means changing one exporter config in the Collector.
- Your application code never changes when you swap backends.
- Grafana dashboards can be version-controlled as JSON and migrated between instances.
That said, OTel does not eliminate all lock-in. If you build heavily on Grafana Cloud's specific features (Adaptive Metrics, for example), you carry some platform dependency. The difference is that the instrumentation layer -- the part that touches every service -- remains portable.
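To illustrate the exporter-swap claim, here is a hedged Collector fragment. The Honeycomb endpoint and header follow their published OTLP ingest settings; the `otlp/tempo` exporter name being replaced is illustrative:

```yaml
# Swapping trace backends is a Collector config change, not a code change.
exporters:
  otlp/honeycomb:
    endpoint: api.honeycomb.io:443
    headers:
      x-honeycomb-team: ${env:HONEYCOMB_API_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/honeycomb]  # was: [otlp/tempo] -- services untouched
```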
The Hybrid Approach: OTel Instrumentation with Datadog Backend
You don't have to choose one or the other at every layer. The most pragmatic approach for many teams is a hybrid: instrument with OpenTelemetry, send to Datadog.
```yaml
# Hybrid: OTel Collector sending to Datadog
exporters:
  datadog:
    api:
      key: ${env:DD_API_KEY}
      site: datadoghq.com
    traces:
      span_name_as_resource_name: true
    metrics:
      resource_attributes_as_tags: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [datadog]
```
This gives you Datadog's dashboards, APM, and alerting while keeping your instrumentation vendor-neutral. If you later decide to move off Datadog, you change the Collector's exporter -- not your application code. Datadog supports OTLP ingestion natively, so compatibility is solid.
Caveats of the hybrid approach:
- Some Datadog-specific features (Continuous Profiler, Error Tracking deep integration) work better with dd-trace.
- OTel metric naming conventions may not map perfectly to Datadog's expectations. Test your dashboards.
- You still pay Datadog's pricing -- the hybrid approach saves you from instrumentation lock-in, not from licensing costs.
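When a naming mismatch does break a dashboard, the Collector can rename metrics in flight rather than forcing a dashboard rewrite. A sketch using the `metricstransform` processor from the contrib distribution -- both metric names here are hypothetical:

```yaml
# Rename an OTel metric to the name an existing Datadog dashboard queries.
processors:
  metricstransform:
    transforms:
      - include: http.server.request.duration   # OTel semantic-convention name
        action: update
        new_name: trace.http.request.duration   # name the dashboard expects
```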
Migration Guide: Datadog to OTel + Grafana
If you're moving off Datadog, here is the phased approach that minimizes risk:
- Phase 1 -- Deploy the OTel Collector alongside the Datadog Agent. Configure it to receive OTLP and export to both Datadog and your target backend (e.g., Grafana Cloud). This lets you validate data parity without disrupting existing dashboards.
- Phase 2 -- Migrate instrumentation service by service. Replace dd-trace with OTel SDKs in non-critical services first. Verify traces and metrics appear correctly in both backends. Use feature flags to toggle between instrumentation libraries during the transition.
- Phase 3 -- Rebuild dashboards and alerts. Recreate your most critical Datadog dashboards in Grafana. Start with SLO dashboards and on-call views. This is the most time-consuming step -- budget 2-4 weeks for a 50-service deployment.
- Phase 4 -- Cut over and decommission. Once all services emit OTel telemetry and all critical dashboards exist in Grafana, remove the Datadog exporter from the Collector and cancel the contract. Keep Datadog read-only access for 30 days to handle any gaps.
Migration reality check: Plan for 3-6 months for a 50+ service deployment. The instrumentation swap is the easy part. Rebuilding institutional knowledge embedded in Datadog dashboards, monitors, and runbooks takes longer than anyone estimates. Do not underestimate phase 3.
Decision Framework
Use this framework to decide which approach fits your team:
| Choose | When |
|---|---|
| Datadog | Small team (fewer than 5 engineers), fewer than 20 services, no dedicated SRE, need observability fast, budget is not the primary constraint |
| OTel + Grafana | Platform/SRE team available, 30+ services, cost-sensitive, multi-cloud or hybrid environments, vendor independence is a strategic priority |
| Hybrid (OTel + Datadog) | Currently on Datadog and want to reduce future lock-in, planning eventual migration, need Datadog features today but want portable instrumentation |
Frequently Asked Questions
Can I use OpenTelemetry with Datadog?
Yes. Datadog natively supports OTLP ingestion for traces and metrics. You instrument with OTel SDKs, send data to the OTel Collector, and export to Datadog's OTLP endpoint. This gives you vendor-neutral instrumentation while using Datadog's platform. Some Datadog-specific features like Continuous Profiler work best with dd-trace, but core APM, dashboards, and alerting work well with OTel-sourced data.
Is OpenTelemetry really free?
The software is free and open source. The infrastructure to run it is not. You need compute for the OTel Collector (typically 2-4 vCPUs and 4-8 GB RAM for a mid-size deployment), a storage backend (Prometheus, Tempo, Loki -- either self-hosted or via Grafana Cloud), and engineering time to operate the pipeline. For small deployments, Grafana Cloud's free tier covers basic needs. At scale, the infrastructure and engineering costs are real but consistently lower than Datadog licensing.
What does Datadog cost for 100 hosts?
Datadog Pro list pricing for 100 hosts with Infrastructure Monitoring ($15/host), APM ($31/host), and log ingestion (estimated 100 GB/day at $0.10/GB) comes to roughly $4,900/month for those base components. Real-world bills typically land in the $15,000-20,000/month range once log indexing (billed per million indexed events), indexed spans, custom metrics, and Synthetics are added. Custom metric pricing ($0.05 per custom metric beyond the included per-host allotment) is the line item that surprises most teams. Negotiate annual contracts for 20-40% discounts on list price.
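Back-of-envelope, using the list prices quoted above (a sketch only -- real bills add log indexing, indexed spans, and custom metrics on top):

```typescript
// Rough monthly cost of the base Datadog components at list prices (illustrative).
function datadogBaseMonthly(hosts: number, logGBPerDay: number): number {
  const infra = hosts * 15;                  // Infrastructure Monitoring, $15/host
  const apm = hosts * 31;                    // APM, $31/host
  const logIngest = logGBPerDay * 30 * 0.1;  // log ingestion, $0.10/GB
  return infra + apm + logIngest;
}

datadogBaseMonthly(100, 100);
// → 4900 -- before log indexing, custom metrics, or Synthetics
```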
How does tail sampling in the OTel Collector reduce costs?
Tail sampling evaluates complete traces before deciding whether to store them. You configure policies to keep 100% of error traces and slow traces (which you always want for debugging) while sampling a small percentage (e.g., 5-10%) of successful, fast traces. This typically reduces trace storage volume by 80-95% with minimal loss of debugging capability. The OTel Collector's tail_sampling processor handles this natively. Datadog offers similar ingestion controls, but since you pay per indexed span, the savings mechanism differs.
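The arithmetic behind that reduction claim, as a quick sketch -- the traffic mix numbers are assumptions, not measurements:

```typescript
// Back-of-envelope tail-sampling retention: errors and slow traces kept 100%,
// everything else sampled at a baseline percentage.
function retainedTraces(
  total: number,
  errorRate: number,
  slowRate: number,
  baselinePct: number
): number {
  const alwaysKept = total * (errorRate + slowRate);
  const sampled = total * (1 - errorRate - slowRate) * (baselinePct / 100);
  return Math.round(alwaysKept + sampled);
}

// 1M traces/day, 1% errors, 2% slow, 5% baseline sampling:
retainedTraces(1_000_000, 0.01, 0.02, 5);
// → 78500 traces stored, a ~92% reduction
```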
How long does it take to migrate from Datadog to OpenTelemetry?
For a 10-service deployment, expect 4-6 weeks. For 50+ services, plan 3-6 months. The instrumentation swap (replacing dd-trace with OTel SDKs) is straightforward -- typically a day per service. The bottleneck is rebuilding dashboards, alerts, SLOs, and operational runbooks in the new stack. Parallel-run both systems during migration to validate data parity. The OTel Collector's multi-exporter capability makes this dual-write pattern easy.
Does Datadog support OpenTelemetry natively?
Datadog added native OTLP ingestion in 2023 and has steadily improved compatibility. The Datadog Agent can act as an OTLP receiver, and Datadog's backend maps OTel spans and metrics to its internal data model. However, some translations are imperfect -- OTel resource attributes may not map cleanly to Datadog tags, and metric naming conventions differ. Test your specific use cases. The Datadog exporter in the OTel Collector (contrib distribution) provides the best compatibility.
When should I avoid OpenTelemetry?
Avoid building an OTel-based stack if you have no platform engineering capacity, fewer than 10 services, or need production-ready observability within days rather than weeks. OTel's flexibility comes with operational complexity -- running the Collector at high availability, managing storage backends, configuring Grafana datasources, and troubleshooting pipeline issues all require engineering investment. If your team's strength is product development and you have budget for Datadog, the managed platform may be the right tradeoff.
Conclusion
OpenTelemetry and Datadog are not interchangeable alternatives -- they operate at different layers of the observability stack. OTel is an instrumentation standard and telemetry pipeline. Datadog is a complete managed platform. The right choice depends on your team size, service count, budget constraints, and how much operational complexity you're willing to absorb.
For most teams, the answer evolves over time. Start with Datadog if you need observability fast and have the budget. Instrument with OTel from day one if you can, using the hybrid approach to keep your options open. As you grow past 30-50 services, reassess -- the cost gap between Datadog and an OTel-based stack widens with every host you add, and that savings compounds month after month.
Written by
Abhishek Patel
Infrastructure engineer with 10+ years building production systems on AWS, GCP, and bare metal. Writes practical guides on cloud architecture, containers, networking, and Linux for developers who want to understand how things actually work under the hood.