The Five Principles of Customer-Aligned Pricing in OLAP Database Systems

Introduction

For any software company, the pursuit of customer lifetime value (LTV) is the ultimate north star. Database systems companies tend to have the highest LTVs amongst their software peer set, being the primordial system of record upon which all other software is built. Yet this natural advantage should not obscure the foundation stone upon which LTV rests: customer satisfaction. If customer satisfaction is high, no matter the category of software, customers will continue to use and pay for a company’s software & services. If it deteriorates beyond a point, even the natural advantages of a category may not prevent customer churn, which (when it accelerates at scale) almost always sounds the death knell for a once-iconic software company.

In our view, customer satisfaction has two primary levers, which may be expressed by the following two questions:

  • Will you deliver what I need, in the light of my ever-evolving needs, with a minimum quantum of effort at my end?
  • Will you charge me a fair price for your product or service, now & later?

If a software company is forever able to satisfy these two criteria, there is no logical reason for a customer to churn away from it. This implies that no matter the ACV, as long as the customer sticks with you (and you stay in business), LTV is theoretically infinite. We find that early in a company’s lifecycle, there is usually strong product-market fit, meaning the software company answers these two questions well for its ideal customer profile (ICP). At scale, as the pursuit of profit takes over, service degrades on one or both of these dimensions (more often than not on pricing). We discussed this in detail in the previous article: the cloud data warehouses, led by Snowflake, have evolved in a direction where the pricing model has become entirely unsustainable for the largest enterprises, and the signs of corporate greed are manifesting in four design choices that have the potential to bleed the enterprise dry:

  • The 20x Compute Markup
  • The Runaway Bill
  • The Unbundled Ecosystem
  • The Layers of Lock-In

Jeff Bezos famously said, “Your margin is my opportunity”. The same principle applies here. Unless there is significant long-term course correction by said $3B-revenue battleship (which is frankly quite hard to do when your stock price is a function of current quarterly revenue growth and profits), there is a notable chance of dislocation among the largest enterprises, the current and prospective customers of cloud data warehouse technologies, as they seek better price-performance. Note also that OLAP database systems are entering the maturity phase of the technology adoption curve, a period that is always marked by increasing cost-efficiency and optimal price-performance across the ecosystem.

In this article, we lay out what we believe are the five principles of customer-aligned pricing for OLAP database systems. We strive to live by these design choices ourselves, and believe with conviction that any software vendor adopting them for their niche has the best shot at achieving long-term customer satisfaction, thanks to a lasting convergence of interests. Read on.

(1) Bring Software to your Data

As entities in the steady state of their existence, with billions of dollars of revenue and a well-defined business model, enterprises already have large data infrastructure investments on their infrastructure platform of choice (be it their own datacenter, a virtual private cloud, or the public cloud) and with their software vendors of choice. Further, enterprises increasingly prefer a hybrid mode of operation, where historical and regulated datasets are stored on-premise, while the more recent cloud-native (and usually less mission-critical) workloads run in a VPC or the public cloud. Lastly, data tends to have a gravity of its own that compounds at scale, making it an operational nightmare (not to mention highly expensive) to move a few petabytes from one location to another.

This is why the current cloud data warehouse strategy, which offers customers only servers configured with OLAP DBMS software in the public cloud (or, at best, in a VPC), seems particularly poorly designed. It creates the need to move data into the public cloud in the name of modernisation (which over time only serves to increase data lock-in while entities in the stack extract their pounds of flesh). It significantly alters the landscape of existing enterprise data infrastructure investments, often nullifying cost efficiencies that enterprises may have achieved over many years on their own infrastructure. As an enterprise prospect recently told us on a call, “I’ve already spent 5 years negotiating with AWS to get the best possible S3, EC2, and EBS costs for my company. If I need to now move my data out to someone else’s cloud, I’m going to be spending a lot more money than I am today. If there was a way I could have a great managed service for OLAP software installed on my infra, I’d choose it in a heartbeat.” Further, these vendors offer analytics only for cloud-native workloads, which means enterprises need a separate vendor for their on-premise data workloads, increasing management complexity and adding to analytics costs.

Our worldview is starkly different: instead of taking data to where the software is, which is almost always a costly and resource-intensive affair, an OLAP DBMS vendor should take software to where the data is housed. Note also that software has much lower gravity than data: in the right hands it can be deployed anywhere, and it is flexible by definition. This simple design choice enables three strong outcomes for the enterprise, all aligned with their existing data strategy:

  1. There is no need to move data anywhere.
  2. There is virtually no need to alter existing storage & compute infrastructure investments.
  3. There is single-pane-of-glass data analytics across the org with a single OLAP vendor across all modes of deployment.

This is precisely what ClickHouse does. As open-source software, it can be provisioned anywhere data is housed, indeed across multiple existing modes of deployment simultaneously, creating a single pane of glass for OLAP in the enterprise. Our company, ChistaDATA Inc., specialises in bespoke production engineering on ClickHouse, fashioning custom ClickHouse builds and systems, taking the friction of deployment out of the customer’s hands, and subsequently providing fixed-price support & managed services on the deployment to guarantee uptime & performance.

(2) Predictable Infrastructure

As discussed in the previous post, the remarkable outcome achieved by the cloud data warehouse, democratised access to OLAP systems for all enterprise users, was achieved on the back of the democratised provisioning paradigm, where anyone with the appropriate access in the enterprise could spin up a virtual data warehouse billed by the minute in a usage-based billing model. Democratised access meant that not only could one spin up infrastructure, one could also run any SQL query, whether well authored (resource-efficient) or poorly authored (resource-intensive). Given that the majority of enterprise users aren’t well-versed in the nuances of SQL efficiency, this has increased the propensity of poorly written batch jobs chugging away for hours on newly provisioned infrastructure, creating a runaway analytics bill that is nigh impossible for the CXO suite or the data platform engineering team to control.

In our view, there is a simple way to achieve the same end of democratised OLAP access without the end result being a runaway bill: with centralised provisioning and foolproof access.

(2a) Centralised Provisioning

Enterprise data workloads within defined use cases are reasonably predictable, given that enterprises broadly operate in a steady state. Consequently, it is possible to predict with a fair degree of accuracy the storage and compute needed now, and their growth over the foreseeable future (say 6-12 months). This infrastructure can be centrally provisioned and, once provisioned, does not need to be reviewed for the next few quarters until there is a step change in workload. Small scale-ups and scale-downs can always be done internally for ad-hoc needs. The beauty of this model is that the operational cost of data analytics infrastructure becomes fixed, well-defined, and, most critically, entirely controlled by the DevOps & data platforms teams, who alone can modify the number of OLAP compute nodes or the storage in use at any point.
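As a concrete illustration, this kind of capacity planning often starts from the database’s own metadata. Below is a minimal sketch in ClickHouse SQL that sizes the current on-disk footprint per table from the standard system.parts table; the 30% annual growth factor is a hypothetical planning assumption, not a recommendation:

    -- Sketch: measure the active on-disk footprint per table as a
    -- starting point for central capacity planning.
    SELECT
        database,
        table,
        formatReadableSize(sum(bytes_on_disk))       AS disk_used_now,
        -- hypothetical assumption: ~30% growth over the next 12 months
        formatReadableSize(sum(bytes_on_disk) * 1.3) AS planned_12_months
    FROM system.parts
    WHERE active
    GROUP BY database, table
    ORDER BY sum(bytes_on_disk) DESC;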

This is how ChistaDATA Inc. engages with its enterprise customers. Key to our work is to define the precise infrastructure needed across modes of deployment, optimise that to the minimum with ClickHouse’s trademark resource efficiency while achieving the desired query performance, and help the enterprise deploy ClickHouse to this infrastructure. By returning control of infrastructure back to the stakeholders with the best viewpoint of cost control, we help our customers keep this pillar of OLAP cost in close check.

(2b) Foolproof Access

Once OLAP infrastructure is centrally provisioned, the data platforms teams can onboard any number of enterprise users to run SQL queries, preserving the desired democratisation of OLAP access. The nuance here is that every bad (i.e., resource-intensive) SQL query needs to be identified and corrected, in order to conserve resources and ensure optimal utilisation of the system.

This is where ChistaDATA’s ClickHouse offers a unique solution in the Asabru Proxy Server, a proxy layer above the ClickHouse database where any SQL written to the system first lands before execution. This layer offers two key features:

  • SQL blacklisting: Enterprises can configure the proxy server to prevent certain types of queries from running: for instance, a query without a LIMIT clause, or a SELECT *, can be blacklisted so that it never hits the database and wastes resources.
  • SQL optimisation: The proxy layer automatically rewrites resource-intensive SQL into resource-efficient SQL. It uses the EXPLAIN tool in ClickHouse to decode the query execution plan, and can prune irrelevant columns, push filter predicates down to the storage layer, optimise JOINs, and so on, limiting on the fly the resources consumed by a single query (see the sketch below).
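To make these two features concrete, here is a hedged sketch of the kind of transformation such a proxy layer aims for. The events table and its columns are invented for illustration, and the rewrite shown is not Asabru’s actual output; EXPLAIN itself is standard ClickHouse:

    -- A query as an enterprise user might write it: unbounded, all
    -- columns, with the timestamp wrapped in a function (which defeats
    -- the primary index). A proxy rule can blacklist this outright.
    SELECT * FROM events WHERE toDate(event_time) = '2024-06-01';

    -- The execution plan can be inspected with standard ClickHouse tooling:
    EXPLAIN indexes = 1
    SELECT * FROM events WHERE toDate(event_time) = '2024-06-01';

    -- The shape a rewrite aims for: prune irrelevant columns, push the
    -- filter onto the raw (indexable) column, and bound the result set.
    SELECT user_id, event_type
    FROM events
    WHERE event_time >= '2024-06-01' AND event_time < '2024-06-02'
    LIMIT 10000;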

The proxy layer works in conjunction with the ChistaDATA managed services team, which works closely with the enterprise teams to optimise frequent query patterns, create materialized views, denormalise schemas, implement custom caching systems, tune the Linux kernel and the ClickHouse server, and up-skill enterprise teams in ClickHouse SQL to limit the occurrence of resource-intensive queries.
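A common starting point for this optimisation work is ClickHouse’s built-in query log. The sketch below uses only standard system tables; the 7-day window and the limit of 20 are arbitrary choices:

    -- Surface the most expensive recurring query patterns; these are
    -- the best candidates for materialized views or schema changes.
    SELECT
        normalizeQuery(query)               AS pattern,
        count()                             AS runs,
        sum(query_duration_ms)              AS total_ms,
        formatReadableSize(sum(read_bytes)) AS data_read
    FROM system.query_log
    WHERE type = 'QueryFinish'
      AND event_time > now() - INTERVAL 7 DAY
    GROUP BY pattern
    ORDER BY total_ms DESC
    LIMIT 20;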

By combining centralised provisioning with democratised yet foolproof access, ChistaDATA’s ClickHouse eliminates the runaway analytics bill, co-architecting instead highly efficient, lean OLAP infrastructure that meets precisely the performance needs of the customer at the lowest possible cost outlay.

(3) Fixed-price Billing

In the last post, we predicted that the instrument of Snowflake’s glorious success, the usage-based database infrastructure billing model, is also the lever of its future disruption. We also explored how the usage-based billing model, while promising transparent pricing based on utilisation, is a core reason for the runaway analytics bill when combined with democratised provisioning & access. It is also leveraged to disguise predatory 20x+ compute markups behind innocuous-looking compute credits priced at a few dollars per machine per hour.

At ChistaDATA, we do away with the concept of usage-based billing entirely. We believe it is a great hack for the startup world with its fluctuating analytical workloads, but it is sub-optimal for stable, well-defined enterprise workloads. Our modus operandi is to work with enterprises to size their storage & compute requirements (and their growth trajectory) with precision, and to co-deploy centrally provisioned OLAP systems across datacenter and cloud with democratised, foolproof, organisation-wide access for SQL querying.

Our billing model is also in line with our aspiration for true transparency and predictability for our customers. It is a fixed-price billing model for two discrete services: (a) 24x7 enterprise-class support and (b) end-to-end managed services.

(3a) Flat-Price Support

At ChistaDATA Inc., our core value proposition is bespoke production engineering and 24x7 enterprise-class support for ClickHouse. For every new customer, we understand their precise workloads and requirements via a detailed performance audit, and fine-tune the ClickHouse server, along with the underlying Linux kernel, to their needs. We also perform schema and SQL engineering to achieve high performance, and architect a stable, resilient, fault-tolerant, secure, and highly scalable system replete with shards, replicas, DR, backups, and more. We then work with enterprises to deploy it on the infrastructure of their choice, start ingesting data, and test & scale the system in production. After the heavy lifting is done, we are available for any ClickHouse-related remote DBA and SRE support under a predefined service level agreement (SLA), jumping in to resolve tickets linked to issues with performance, scalability, and the like.

Our pricing model is based on the SLA. For our largest enterprise customers, our SLA is strict: the turnaround time (TAT) for Severity-1 incidents is within 15 minutes of a ticket being raised, while for Severity-2 incidents it is 1-3 hours. For our mid-market and startup customers, the SLA tends to be more relaxed, in accordance with their requirements.

In this pricing model, for a defined enterprise-class support SLA chosen by a ChistaDATA customer, the price is a flat annual rate, forever. We support any number of ClickHouse nodes or instances, and any number of tickets raised by the customer in a year, within the same flat price. Note that this price doesn’t change year-on-year, unless there is a change in SLA as requested by the customer.

(3b) Per-node Managed Services

In addition to the bespoke production engineering and 24x7 enterprise-class support model, we also offer a service in which we take end-to-end responsibility for ClickHouse uptime on the customer’s behalf.

In this end-to-end managed services model, our billing is a flat price per node. This price covers all ClickHouse-related nodes managed by us, from shards to replicas to Zookeeper nodes, and is agnostic to the size of the node. The centralised provisioning model plays a key role in making the infrastructure predictable, and with a flat per-node fee, estimating the entire operational outlay on ClickHouse is as simple as multiplying the number of managed nodes by the per-node fee.
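By way of illustration, the whole estimate fits in one line of arithmetic. The node count and fee below are hypothetical placeholders, not ChistaDATA’s actual price list:

    -- Hypothetical example: 16 ClickHouse nodes plus 3 Zookeeper nodes,
    -- at an illustrative flat fee per node per month.
    SELECT
        16 + 3                               AS managed_nodes,
        1000                                 AS monthly_fee_usd,  -- placeholder
        managed_nodes * monthly_fee_usd * 12 AS annual_outlay_usd;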

Further, we are entirely agnostic to the mode of deployment:

  • If the managed service is offered on-premise or in VPC, the customer effectively brings their own infrastructure, and can continue to deepen the cost efficiency they are enjoying with their cloud provider or datacenter.
  • If the managed service is offered on the public cloud, our fee includes the cost of cloud infrastructure. We are always happy to transparently share our gross margins with the customer, so they needn’t worry about draconian 20x compute markups!

This transparency and predictability are a unique design choice, fully aligned with enterprise interests and a key driver of customer satisfaction.

(4) Open Source Bundled Ecosystem

In the previous post we explored the economic burden created by the VC-funded point-vendor sprawl around the data warehouse. With ETL pipes, data quality monitoring, cataloguing, observability, and more each served by an independent company whose pricing is indexed to the CDW bill, customers indicate that the net spend on this ecosystem is as much as, if not more than, the core data warehousing bill! The tax collectors on your data are numerous, and growing in count every single year.

The open-source analytics infrastructure ecosystem offers a neat solution to this conundrum. At ChistaDATA Inc., for instance, we charge only for the enterprise-class support or managed services provided on ClickHouse. All of the software and tooling, be it Kafka, Debezium, Redpanda, Pulsar, et al. for data pipes; CDC (change data capture) for real-time OLTP data streaming or archiving into ClickHouse; observability for the ClickHouse cluster; business intelligence & dashboarding; the Anansi query profiler; or the Asabru proxy server, is either built into the product or offered at no cost via integrations with open-source tools (such as Superset for BI and Grafana for visualisation). Further, if a customer needs an integration with an open-source or closed-source software/tool to operate with minimal change to their workflows, we custom-build it in a matter of weeks as part of the migration service itself, at no additional cost. Lastly, the ClickHouse ecosystem, being of the community, for the community, and by the community, grows and expands constantly with contributions from ecosystem vendors and the community itself, embracing greater functionality within its fold every year.

This creates a compelling value proposition, where an entirely custom and bundled ecosystem of 100% open-source analytics software is available to customers at no cost beyond their primary investment in the database infrastructure.

(5) Resource-efficient Data Operations

ClickHouse offers a few key advantages over incumbent batch-processed OLAP technologies that make it not just the fastest, but also the most resource-efficient analytics database on the planet. We’ve covered these in detail here, and summarise them below:

  • Best-in-class data compression, with tens of compression algorithms and codecs over columnar storage, implies the industry’s lowest storage cost.
  • Columnar storage in addition to multiple layers of indexes & caching implies the lowest I/O cost of moving data from disk to RAM.
  • Vectorized query execution in addition to massively parallel processing across all shards and cores implies high hardware utilisation and consequently the lowest cost of query execution.
  • Innovations such as materialized views, 10+ unique storage engines for varying use cases, support for high concurrency, and efficient implementations of GROUP BY and JOINs imply the lowest compute cost per query for complex or frequent/recurring query operations (see the DDL sketch below).
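A brief sketch of what several of these levers look like in ordinary ClickHouse DDL; the events table and its columns are illustrative:

    -- Columnar table with per-column compression codecs; Delta + ZSTD
    -- on a timestamp column typically compresses extremely well.
    CREATE TABLE events
    (
        event_time  DateTime               CODEC(Delta, ZSTD),
        user_id     UInt64                 CODEC(ZSTD),
        event_type  LowCardinality(String)
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_time)
    ORDER BY (event_type, event_time);

    -- Materialized view: a recurring aggregation is computed once at
    -- insert time instead of being recomputed on every query.
    CREATE MATERIALIZED VIEW events_per_day
    ENGINE = SummingMergeTree
    ORDER BY (event_type, day)
    AS
    SELECT event_type, toDate(event_time) AS day, count() AS events
    FROM events
    GROUP BY event_type, day;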

The consequence of being one of the most resource-efficient database systems in the world is that ClickHouse can match the performance of incumbents such as Hadoop, Snowflake, Elastic, and Redshift with a fraction of the compute nodes and disk size (often as little as 40-60% of incumbent hardware requirements). The ChistaDATA managed services team works closely with the customer during the performance audit to find such avenues to optimise infrastructure and plan the most resource-efficient deployment.

Conclusion

Every foundational technology goes through an adoption cycle. As it approaches maturity, enterprises increasingly seek optimal price-performance. This may force incumbents to adapt to the needs of the time, or create new leaders who capitalise on displacement opportunities. We think the OLAP DBMS category has entered technological maturity, and we strongly believe that ClickHouse represents the envisioned end-state of superior price-performance, which will make it the choice of the enterprise for real-time analytical workloads, as well as for AI & ML workloads with similar low-latency, high-concurrency characteristics. At ChistaDATA Inc., our design choices embody the above five principles of customer-aligned OLAP DBMS pricing. We hope they will help us deliver the benefits of ClickHouse’s unmatched price-performance to the global enterprise.

We look forward to serving you with ClickHouse and the ChistaDATA Data Fabric. In the next set of posts, we will share more about how enterprises can quickly derive benefit from ChistaDATA’s set of products and services around ClickHouse, most notably our Data Fabric. Stay tuned.