Over the past few years, our platform engineering team at ChistaDATA has been rethinking how we approach observability infrastructure at scale. The conversation keeps circling back to the same frustration: most organizations are sitting on enormous volumes of telemetry data that they technically “own” but can’t fully access, can’t economically store long-term, and can’t feed into the analytical pipelines they actually need. The vendor lock-in isn’t at the instrumentation layer anymore — it’s buried deep in the storage layer, where data quietly accumulates in proprietary formats until a renewal conversation makes the true cost visible.
This post is our attempt to lay out the architecture we believe represents the practical next step for observability: a fully open, SQL-queryable telemetry stack built on ClickHouse, Apache Iceberg, OpenTelemetry, and S3 object storage. We’ve been stress-testing pieces of this in our own environment, and we think it’s time to share what we’ve learned, including where the rough edges still are.
Why “Open” Has to Mean More Than Open Instrumentation
When most engineers talk about vendor-neutral observability, they usually mean OpenTelemetry SDKs. And yes, the OTLP specification and the ecosystem of instrumentation libraries it spawned are a genuine step forward. You can instrument your Go service, your Python microservice, and your Java monolith with the same API, and in theory swap the backend without touching your application code. That’s real value.
But here’s where the conversation tends to stall: open instrumentation without open storage is like being handed a portable container that you’re not allowed to move. Your code artifacts are neutral, your pipeline might even be neutral, but your telemetry data ultimately ends up locked inside one vendor’s columnar store, indexed in a proprietary format, and accessible only through that vendor’s query interface. The moment you want to run a long-range correlation, train a predictive model on historical data, or simply keep more than 90 days of traces without paying per-GB egress fees, you hit a wall.
There are actually three distinct layers where openness matters in an observability stack:
Open instrumentation gives you portable code and vendor-neutral SDKs. You can point your OTLP exporter at any compatible endpoint without rewriting your application. This is the layer we’ve had the longest, and it works well.
Open pipelines mean your telemetry can flow through interoperable components — the OpenTelemetry Collector being the canonical example. You can fan out to multiple backends, run bake-offs between vendors, do filtering, sampling, and enrichment at the pipeline layer without vendor-specific agents. This layer has matured significantly in the last two years.
Open storage is the missing piece. If your telemetry lives in an open, queryable format that any tool can read — without egress fees, without API rate limits, without the permission of the original vendor — you gain something qualitatively different: the ability to use your operational data as a first-class analytical asset. You can query it with ClickHouse for subsecond aggregations, attach Apache Spark for batch ML jobs, feed it to an MCP server for agent-driven analysis, or export it to a downstream data warehouse. All from the same underlying files on S3.
The Economics of Telemetry at Scale Are Broken
Let’s be concrete about why this matters beyond architectural principle. At real production scale (we’re talking organizations handling millions of spans per minute and petabytes of logs per day), storage costs on block storage, whether EBS volumes or NVMe-backed instances, become genuinely painful. Per GB, that tier costs roughly 10–12x as much as S3 object storage, and the absolute difference grows with every day of retention you add.
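To see why the per-GB gap matters more as retention grows, here is a back-of-the-envelope model. Both prices are assumptions chosen for illustration: a hypothetical blended rate for replicated block storage, and a rate close to S3 Standard’s first-tier list price. Together they land inside the 10–12x range above. The model ignores compression, request charges, and query compute, so substitute your own numbers.

```python
# Assumed prices in USD per GB-month, for illustration only.
HOT_TIER_PRICE = 0.25  # hypothetical blended rate for replicated EBS/NVMe storage
S3_PRICE = 0.023       # roughly S3 Standard's first-tier list price; check your region


def steady_state_monthly_cost(
    daily_ingest_gb: float, retention_days: int, price_per_gb_month: float
) -> float:
    """Monthly storage bill once the retention window is full.

    At steady state the store holds daily_ingest_gb * retention_days of data,
    so cost grows linearly with retention and the per-GB gap scales with it.
    """
    return daily_ingest_gb * retention_days * price_per_gb_month


# Hypothetical workload: 1 TB of telemetry per day.
hot_90 = steady_state_monthly_cost(1_000, 90, HOT_TIER_PRICE)
hot_365 = steady_state_monthly_cost(1_000, 365, HOT_TIER_PRICE)
s3_365 = steady_state_monthly_cost(1_000, 365, S3_PRICE)
```

With these assumed prices, a full year of retention on S3 costs less per month than 90 days on the hot tier. On the hot tier alone, stretching retention from 90 to 365 days roughly quadruples the bill.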
The observability market solved this problem for a while by aggressive data sampling and short retention windows. But sampling trades accuracy for economics, and short retention means you lose the historical depth needed for capacity planning, SLA trend analysis, and post-incident forensics. The industry has been stuck in this tradeoff partly because it assumed telemetry data had to live in fast, expensive storage to be queryable in real time.
That assumption is no longer technically correct. Apache Parquet’s columnar encoding combined with Iceberg’s metadata layer makes it possible to run analytical queries directly against S3-resident data with performance characteristics that are acceptable for observability use cases, and ClickHouse’s native Iceberg support means you can query that data using the same SQL you’d use against a local MergeTree table. We use ClickHouse extensively at ChistaDATA and have documented several ClickHouse query optimization strategies that apply directly to this kind of telemetry workload.
How the Stack Actually Works
The architecture we’re converging on combines five components, each of which is independently open source and substitutable:
OpenTelemetry Collector sits at the ingestion edge. It accepts OTLP over gRPC or HTTP, handles batching (critical — you do not want to write millions of tiny Parquet files), applies any sampling or filtering rules, and routes data downstream. The batch processor is your friend here; tune it to produce reasonably sized batches that balance write latency against file count.
Apache Parquet is the on-disk format. Think of it as the columnar equivalent of what JSON is for row-oriented data — a widely readable, self-describing file format that stores all values for a given column contiguously. This makes aggregation queries dramatically faster because you only read the columns you actually need. Parquet files land on S3, not on block storage, which is where the economics start to shift in your favor.
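Here is a toy, pure-Python illustration of the layout difference. The same spans are stored row by row and then pivoted into one list per column. This is not Parquet, which adds row groups, encodings, and compression. It only shows the core idea: an aggregate over one field touches one column.

```python
from typing import Any, Dict, List


def to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column.

    Keys missing from a row become None so every column stays the same length.
    """
    keys: List[str] = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)
    return {key: [row.get(key) for row in rows] for key in keys}


spans = [
    {"service_name": "checkout", "duration_ms": 120, "status": "OK"},
    {"service_name": "checkout", "duration_ms": 340, "status": "ERROR"},
    {"service_name": "cart", "duration_ms": 45},
]

columns = to_columns(spans)
# An average over duration only needs the single duration column.
durations = columns["duration_ms"]
mean_duration = sum(durations) / len(durations)
```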
Apache Iceberg provides the metadata layer that turns a directory of Parquet files into a queryable table. Without Iceberg, querying S3 would mean scanning file listings and guessing schemas. With Iceberg, you get ACID-compliant snapshots, partition pruning, schema evolution, and time-travel queries. An Iceberg catalog — whether you run Apache Polaris, LakeKeeper, or a cloud-native equivalent — tracks exactly which files belong to which snapshot, so your query engine only reads what it needs. For teams already familiar with ClickHouse’s MergeTree engine, think of Iceberg as providing the equivalent of data-skipping indexes but for files on object storage.
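Iceberg manifests record per-file lower and upper bounds for columns, and query planners use them to skip files that cannot match a filter. The sketch below models that idea with a hypothetical `DataFile` record and made-up S3 paths. It does not use any real Iceberg client API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a manifest entry: a file path plus column bounds."""
    path: str
    min_ts: int  # lower bound of the timestamp column in this file (epoch seconds)
    max_ts: int  # upper bound of the timestamp column in this file


def prune(files: List[DataFile], start: int, end: int) -> List[DataFile]:
    """Keep files whose closed range [min_ts, max_ts] overlaps the query range [start, end)."""
    return [f for f in files if f.max_ts >= start and f.min_ts < end]


manifest = [
    DataFile("s3://telemetry/logs/data/00001.parquet", 0, 999),
    DataFile("s3://telemetry/logs/data/00002.parquet", 1000, 1999),
    DataFile("s3://telemetry/logs/data/00003.parquet", 2000, 2999),
]
selected = prune(manifest, 1500, 2100)
```

Only two of the three files would be opened. With thousands of files per day, this metadata check keeps a time-bounded query from reading the whole table.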
ClickHouse is the query engine. It speaks SQL, it has excellent support for reading Iceberg tables natively, and its columnar architecture means it can tear through billions of log rows in seconds when the query is structured correctly. For teams already running ClickHouse for analytics, adding observability data to the same cluster is operationally straightforward. We’ve written about building observability pipelines with ClickHouse before, and the patterns translate well to this lakehouse model.
S3 object storage closes the loop. All Parquet files live here. In a cloud environment this means AWS S3 or a compatible endpoint; on-premises teams can use MinIO or RustFS. The key insight is that S3 is not just a backup tier: it is the primary storage layer. ClickHouse reads from it directly via the Iceberg catalog, and because S3 capacity is effectively unbounded, you largely stop provisioning disk for retained telemetry.
Hot Data vs. Cold Data: The Hybrid Table Model
There’s one scenario this architecture doesn’t handle cleanly out of the box: sub-second query latency on data that arrived in the last few seconds. Object storage, even with Iceberg metadata, has higher latency than local NVMe. For incident response and live dashboards, that gap matters.
The solution — still maturing, but already viable — is a hybrid table approach that maintains a ClickHouse MergeTree partition for recent data (the last hour, the last day, whatever your SLA requires) and automatically tiers older data to Iceberg-backed S3. Queries span both tiers transparently; the user writes one SQL statement and gets results that include real-time data from MergeTree and historical data from Iceberg without knowing the difference.
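The routing idea behind a hybrid table fits in a few lines: split the requested time range at the hot-tier cutoff and send each piece to the tier that owns it. This is a hypothetical illustration of the concept, not how any particular engine implements hybrid tables. It assumes data moves between tiers exactly at the cutoff. A real implementation also has to avoid double-counting rows while they are being migrated.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TierRange:
    tier: str   # "cold" = Iceberg on S3, "hot" = local MergeTree
    start: int  # inclusive, epoch seconds
    end: int    # exclusive, epoch seconds


def split_by_tier(start: int, end: int, hot_cutoff: int) -> List[TierRange]:
    """Split the query range [start, end) at hot_cutoff.

    Rows older than hot_cutoff are assumed to live only in the cold tier and rows
    at or after it only in the hot tier; the caller queries each piece and
    combines the results.
    """
    if start >= end:
        return []
    ranges: List[TierRange] = []
    if start < hot_cutoff:
        ranges.append(TierRange("cold", start, min(end, hot_cutoff)))
    if end > hot_cutoff:
        ranges.append(TierRange("hot", max(start, hot_cutoff), end))
    return ranges
```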
This is architecturally similar to what we already do with ClickHouse TTL policies and tiered storage, which we’ve covered in our post on ClickHouse disk and memory management. The difference with hybrid tables is that the cold tier is queryable via SQL rather than just archived — you don’t lose query access when data ages out of hot storage.
The tradeoff is operational complexity: you’re now managing two storage backends with different performance profiles, and your write path needs to handle the periodic commit cycle that Iceberg’s catalog requires for new file registration. In practice, we’ve found that a 60-second commit interval is workable for most observability use cases, since you’re rarely alerting on data that’s under a minute old.
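Adding the batch flush timeout to the commit interval gives a rough upper bound on how stale the Iceberg tier can be. The helper below is a hypothetical back-of-the-envelope function. It assumes a stream quiet enough that the timeout, not the row limit, triggers the flush, and it ignores commit duration and query time.

```python
def worst_case_staleness_s(
    flush_timeout_s: float, commit_interval_s: float, write_latency_s: float = 0.0
) -> float:
    """Upper bound on how old a row can be before it is visible in the Iceberg tier.

    A row arriving just after a flush waits a full flush timeout; the file it lands
    in may then be written just after a commit and wait a full commit interval
    before the catalog registers it.
    """
    return flush_timeout_s + write_latency_s + commit_interval_s
```

Take a 30-second flush timeout and the 60-second commit interval above: a row can take up to about 90 seconds to appear in the Iceberg tier. That is why the most recent window belongs in the MergeTree hot tier.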
Apache Arrow and What Comes Next
One component that deserves more attention than it typically gets in observability discussions is Apache Arrow. Arrow defines an in-memory columnar format, an IPC format and the Arrow Flight protocol for moving that data between processes, and ADBC (Arrow Database Connectivity), a client API in the spirit of ODBC and JDBC that returns results as Arrow data. The relevance to this stack is twofold.
First, Arrow eliminates the column-to-row-and-back conversion overhead that makes ODBC and JDBC painful for analytical workloads. When ClickHouse returns query results in Arrow format to a downstream tool, for example through an Arrow-native interface such as ADBC, the data stays columnar throughout the entire transfer. That translates to meaningfully lower CPU overhead and higher throughput on large result sets.
Second, the OpenTelemetry Arrow project is building a bridge between the OTLP pipeline layer and the Arrow/Parquet/Iceberg storage layer. The current state allows OpenTelemetry Collectors to communicate with each other using Arrow’s compressed columnar format — particularly valuable for cross-region data transfer, where reducing wire size translates directly to lower transit costs. The roadmap includes the ability for the OTel Collector to write directly to Iceberg-formatted Parquet files, which would remove several intermediate components from the stack and make the architecture meaningfully simpler to operate.
When that capability lands, your observability pipeline could look like: application emits OTLP → OTel Collector batches and writes Arrow-encoded Parquet to S3 → Iceberg catalog commits the new snapshot → ClickHouse queries the data. That’s a lean four-component stack with no proprietary elements and no vendor-specific agents.
PromQL, SQL, and the Query Language Question
A fair objection to a ClickHouse-centric observability stack is the query language question. SRE teams are often heavily invested in PromQL — Grafana dashboards, alerting rules, runbooks. Switching to SQL isn’t free, and not every team will want to.
There are two honest responses here. First, ClickHouse’s PromQL compatibility is now in a functional beta state, meaning you can point a Prometheus-compatible client at ClickHouse and run PromQL queries against it. It’s not perfect, but for teams with existing PromQL investments it significantly lowers the switching cost. Second, the Iceberg storage layer is query-engine-agnostic. You can attach Apache Spark, DuckDB, Trino, or any other engine that supports the Iceberg spec to the same catalog and run whatever query language that engine speaks. The data doesn’t care — it’s just Parquet files.
This is actually one of the strongest architectural arguments for the open lakehouse model: it decouples the storage decision from the query tool decision. Your SRE team can keep writing PromQL through ClickHouse’s compatibility layer while your data engineering team runs analytical SQL in ClickHouse or batch jobs in Spark, and both are reading from the same underlying telemetry data on S3.
Operational Realities: What We’ve Learned Running This
A few lessons from actually operating this kind of stack rather than just designing it on paper.
Batching configuration is more important than most people realize. The OpenTelemetry Collector’s batch processor needs to be tuned per workload. Too small and you produce thousands of tiny Parquet files that make Iceberg’s file-listing operations slow. Too large and your commit latency increases. For a moderate-volume service emitting a few thousand spans per second, we’ve found batch sizes in the range of 5,000–10,000 rows with a 30–60 second flush interval produce well-sized files without significant query overhead.
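The batch processor flushes on whichever comes first: a row count (`send_batch_size`) or a timeout (`timeout`). That behavior can be sketched as a small class. The class is hypothetical and simplified. It has no `send_batch_max_size` splitting and no concurrency, and its injectable clock exists only to make it testable.

```python
import time
from typing import Any, Callable, List, Optional


class SizeOrTimeoutBatcher:
    """Buffer records and flush when either a row limit or a timeout is reached.

    Loosely mirrors the OTel Collector batch processor's send_batch_size and
    timeout settings; it is an illustration, not the Collector's actual code.
    """

    def __init__(
        self,
        max_rows: int,
        timeout_s: float,
        on_flush: Callable[[List[Any]], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_rows = max_rows
        self.timeout_s = timeout_s
        self.on_flush = on_flush
        self.clock = clock  # injectable so the behavior can be tested deterministically
        self.buffer: List[Any] = []
        self.oldest_at: Optional[float] = None  # arrival time of the oldest buffered record

    def add(self, record: Any) -> None:
        if not self.buffer:
            self.oldest_at = self.clock()
        self.buffer.append(record)
        if len(self.buffer) >= self.max_rows:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch once its oldest record is timeout_s old."""
        if self.buffer and self.oldest_at is not None and self.clock() - self.oldest_at >= self.timeout_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.oldest_at = None
            self.on_flush(batch)  # in a real pipeline this batch would feed a Parquet writer
```

Whichever threshold is reached first triggers the flush. On busy streams the row limit tends to dominate, and the timeout mostly bounds how long a quieter service’s partial batch waits.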
Schema consistency across services is a chronic pain point. OpenTelemetry’s semantic conventions help, but adoption is uneven in practice. Services that follow semantic conventions produce data that’s immediately queryable and joinable. Services that don’t tend to dump everything into catch-all attribute maps that require preprocessing before they’re analytically useful. This is a governance problem as much as a technical one, and it’s worth investing in tooling that validates telemetry quality at the pipeline layer before data hits storage.
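A pipeline-layer check can start very small. `service.name` is a required resource attribute in the OpenTelemetry semantic conventions. The key-format regex, however, is an assumed approximation of the conventions’ lowercase, dot-namespaced naming style, and `validate_span` is a hypothetical helper rather than part of any OpenTelemetry library.

```python
import re
from typing import Dict, List

# service.name is a required resource attribute in the OTel semantic conventions.
REQUIRED_RESOURCE_KEYS = {"service.name"}

# Assumed approximation of semconv naming: lowercase, dot-separated namespaces,
# snake_case components (for example "http.response.status_code").
KEY_PATTERN = re.compile(r"^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)*$")


def validate_span(resource: Dict[str, object], attributes: Dict[str, object]) -> List[str]:
    """Return human-readable problems; an empty list means the span passes."""
    problems: List[str] = []
    for key in sorted(REQUIRED_RESOURCE_KEYS - set(resource)):
        problems.append(f"missing resource attribute: {key}")
    for key in sorted(attributes):
        if not KEY_PATTERN.match(key):
            problems.append(f"non-conventional attribute key: {key}")
    return problems
```

In a real pipeline, a check like this would more likely count or tag offending spans than drop them, so you can measure convention adoption per service.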
ClickHouse’s MergeTree engine performance on log data is excellent when the sorting key is well-chosen. For typical observability workloads — filtering by service name, time range, trace ID, severity level — a sorting key of (service_name, timestamp) or (timestamp, service_name) depending on your query patterns gives you significant data-skipping benefits. We’ve covered the mechanics of ClickHouse indexing in depth in our guide to ClickHouse parts, partitions, and primary index design, and the same principles apply here.
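To make the data-skipping point concrete, here is a toy model of a sparse primary index in the style of ClickHouse’s. Rows sorted by `(service_name, timestamp)` are cut into fixed-size granules, and one mark stores each granule’s first key. A filter on `service_name` then only needs the granules whose marks bracket that service. The tiny granule size and the helper names are illustrative. Real ClickHouse marks also carry column file offsets, and the engine handles multi-column conditions more generally.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Row = Tuple[str, int]  # (service_name, timestamp)

GRANULE_ROWS = 4  # illustration only; ClickHouse's index_granularity defaults to 8192 rows


def build_marks(sorted_rows: List[Row]) -> List[Row]:
    """One mark per granule: the sorting-key value of the granule's first row."""
    return [sorted_rows[i] for i in range(0, len(sorted_rows), GRANULE_ROWS)]


def candidate_granules(marks: List[Row], service: str) -> List[int]:
    """Granules that may contain rows for `service`; all others can be skipped.

    Granule i can hold rows for `service` only if its own mark is <= `service`
    and the next granule's mark is >= `service`.
    """
    services = [mark[0] for mark in marks]
    first = max(bisect_left(services, service) - 1, 0)
    last = bisect_right(services, service)
    return list(range(first, last))


rows = sorted([
    ("api", 1), ("api", 2), ("api", 3), ("api", 4), ("api", 5),
    ("cart", 1), ("cart", 2), ("cart", 3), ("web", 1), ("web", 2),
])
marks = build_marks(rows)
```

Filtering on `"cart"` reads one granule out of three. The same sorted layout is why leading the key with `timestamp` instead would help time-range filters more than service filters.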
The Iceberg catalog itself needs operational attention. Snapshot accumulation, orphan file cleanup, and compaction are maintenance tasks that don’t happen automatically in all catalog implementations. Apache Polaris and LakeKeeper both have table maintenance APIs, but you need to build the scheduling around them. If you let compaction slide, query performance degrades as the number of small files grows.
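For teams that schedule compaction themselves, the core planning step is grouping small files into rewrite jobs near a target size. Iceberg’s own Spark procedures (`rewrite_data_files`, `expire_snapshots`, `remove_orphan_files`) do this work far more completely. The function below is only a hypothetical greedy planner. Its 64 MB small-file threshold is an assumption, and its 512 MB target is in line with Iceberg’s default target file size.

```python
from typing import List


def plan_compaction(
    file_sizes_mb: List[float],
    small_threshold_mb: float = 64.0,
    target_mb: float = 512.0,
) -> List[List[int]]:
    """Greedily group small files into rewrite jobs of roughly target_mb each.

    Returns groups of file indexes. Files at or above small_threshold_mb are left
    alone, and single-file groups are dropped because rewriting one file by itself
    does not reduce the file count.
    """
    small = sorted(
        (i for i, size in enumerate(file_sizes_mb) if size < small_threshold_mb),
        key=lambda i: file_sizes_mb[i],
        reverse=True,  # largest first, a common greedy heuristic
    )
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0.0
    for i in small:
        # Start a new group when adding this file would overshoot the target.
        if current and current_size + file_sizes_mb[i] > target_mb:
            groups.append(current)
            current, current_size = [], 0.0
        current.append(i)
        current_size += file_sizes_mb[i]
    if current:
        groups.append(current)
    return [group for group in groups if len(group) > 1]
```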
The Ecosystem Is Catching Up
One encouraging signal is that the open-source ecosystem around this architecture is developing quickly. eBPF-based instrumentation agents like those from Odigos and Coroot can capture telemetry from uninstrumented services at the kernel level and emit it in OTLP format, meaning you don’t need to modify application code to get coverage. This is particularly valuable for legacy services or third-party components where SDK-level instrumentation isn’t practical.
Pipeline management tooling is also improving. Managing a fleet of OpenTelemetry Collectors at scale — deploying config changes, monitoring collector health, handling back-pressure — is still rougher than it should be, but dedicated tools are emerging to address this. The OpAMP protocol defines a standard for remote collector management that several vendors and open-source projects are implementing.
On the visualization side, Grafana has first-class support for ClickHouse as a data source, which means existing Grafana users can point their dashboards at a ClickHouse Iceberg-backed table with minimal friction. For teams that want structured query-driven exploration rather than pre-built dashboards, MCP server integration allows AI agents to query ClickHouse directly — which turns out to be a surprisingly powerful pattern for ad hoc incident investigation, since the agents can write and iterate on SQL without a human needing to know the exact schema.
What Still Needs Work
We want to be clear-eyed about the current limitations. This is not a drop-in replacement for a mature SaaS observability platform today. There’s real operational overhead in managing the catalog, tuning the pipeline, and maintaining the ClickHouse cluster. Teams that don’t have engineering capacity to own their observability infrastructure will find a managed platform more practical regardless of the cost-per-GB math.
Real-time alerting on sub-second data is still a challenge. The Iceberg commit cycle introduces inherent latency. Hybrid tables partially address this, but they’re still in early stages. If you need to alert on a metric within two seconds of it being emitted, this architecture in its current state will require supplementary infrastructure for the hot path.
The tooling around telemetry data quality is underdeveloped. ClickHouse will happily ingest malformed or inconsistently structured data without complaint, and the downstream cost of that inconsistency shows up in analytics that require extensive preprocessing. Investing in pipeline-layer validation is time well spent, but it’s not turnkey today.
None of these are fundamental blockers. They’re gaps that the ecosystem is actively closing, and the trajectory is clear. For teams that are willing to be slightly ahead of the curve, the architectural investment made now will compound significantly as the tooling matures.
What This Means for Teams Running ClickHouse Today
If your organization is already running ClickHouse — whether on the ChistaDATA platform or self-managed — you’re in a strong position to adopt this architecture incrementally. The query patterns are familiar, the SQL skills transfer directly, and ClickHouse’s native Iceberg table engine means you can start reading Iceberg-formatted data without a major infrastructure change.
The practical starting point is usually to add Iceberg as a secondary cold tier for an existing log table, test that queries span both tiers correctly, and validate that the cost reduction justifies the added complexity for your specific workload. From there, you can extend coverage to traces and metrics, tune the batching and compaction parameters for your volume, and gradually build toward a unified observability store that handles all three signal types through one query interface.
We’ve worked with engineering teams across various scale points — from 50-person startups to large enterprises — and the observability cost conversation follows a consistent pattern. The organizations that get ahead of it are the ones that treat telemetry infrastructure as a data engineering problem, not just a monitoring checkbox. The open lakehouse architecture gives you the primitives to do exactly that.
If you’re evaluating this for your own environment or have questions about how to adapt these patterns to your ClickHouse deployment, reach out to our engineering team. We’re actively helping customers navigate this transition and are happy to share more specific guidance based on your workload profile.
Written by the ChistaDATA Platform Engineering Team. ChistaDATA Inc. is not affiliated with ClickHouse Inc. ClickHouse® is a registered trademark of ClickHouse, Inc.