Architecture & Design of ChistaDATA Cloud for ClickHouse – Part 2

The Engineering Marvel of ChistaDATA DBaaS

Introduction

In part 2 of this two-part series, we detail the architectural and design choices behind ChistaDATA Cloud for ClickHouse, an end-to-end managed cloud service for ClickHouse, the world’s fastest real-time analytics DBMS.

If you have not read it yet, start with: Architecture of ChistaDATA Cloud for ClickHouse – Part 1.

Technology Stack Overview

This section delves into our key design choices and the rationale behind each decision, highlighting how these decisions have shaped our infrastructure and services.

Kubernetes for Compute Infrastructure

Our decision to adopt Kubernetes as our compute infrastructure platform early on was significantly influenced by its array of powerful built-in features. Kubernetes stands out for its exceptional ability to dynamically scale applications in response to fluctuating demand, a feature crucial for maintaining efficiency and managing resources. Furthermore, it offers robust solutions for rescheduling and recovery, such as automatically rescheduling and restarting containers in the event of crashes, thereby ensuring high availability. The monitoring capabilities of Kubernetes are also noteworthy, especially with its liveness and readiness probes that provide comprehensive monitoring options.

Additionally, Kubernetes simplifies the complexities of service discovery and integrates seamlessly with load balancing solutions, enhancing the overall performance and reliability of services. The adoption of the Operator Pattern within Kubernetes allows for a high degree of automation in responding to various events within the cluster, thereby boosting operational efficiency. Another significant advantage of Kubernetes is its facilitation of smooth upgrades, both for applications and node/OS, ensuring that our systems remain up-to-date with minimal downtime. Lastly, its cloud agnosticism is a key feature that offers much-needed flexibility and prevents vendor lock-in, allowing us to choose the best tools and services without being restricted to a single provider.

Kubernetes ClickHouse Operator

To streamline the management of ClickHouse clusters on Kubernetes, we use Altinity’s ClickHouse Operator. This tool abstracts away the intricacies of cluster configuration and maintenance. It simplifies deployment, scaling, and updates while ensuring high availability through automatic failover and replication. Additionally, its integration with monitoring tools like Prometheus provides valuable insights into cluster performance.

Managed Kubernetes Services

Choosing managed Kubernetes services such as EKS (Elastic Kubernetes Service) in AWS and similar offerings in other cloud environments was a strategic decision for us. The primary appeal of these services lies in their ability to simplify the management of Kubernetes cluster infrastructure. By delegating the complex tasks associated with cluster management to these managed services, we are able to focus more on our core operations and less on the intricacies of Kubernetes. Additionally, this approach significantly accelerates our time to market, a crucial factor especially when operating with a lean team. The ability to quickly deploy and scale our applications without the overhead of extensive infrastructure management is invaluable in maintaining a competitive edge.

Network Isolation with AWS CNI

Our network strategy effectively leverages the AWS Container Network Interface (CNI), chosen for its seamless integration and robust capabilities within the AWS ecosystem. The key aspect of AWS CNI is its ability to directly plug Kubernetes pods into the AWS virtual network. This integration is crucial as it facilitates direct and efficient interactions with various AWS services, enhancing the overall performance and scalability of our applications. As a Container Network Interface solution, AWS CNI excels in managing network interfaces for Linux containers specifically in the AWS environment. This management is essential for ensuring that each container has the necessary network resources to perform optimally.

Moreover, AWS CNI stands out for supporting the high-performance networking features of AWS. These features are integral to maintaining optimal communication standards and robust security within our network. By utilizing these advanced networking capabilities, we ensure that our network infrastructure is not only highly efficient but also adheres to the highest standards of security and performance.

Deployment Automation

FluxCD is an open-source tool essential for the continuous delivery of applications to Kubernetes clusters. It leverages Git as the source of truth for infrastructure and application code, ensuring that the cluster’s state always matches the configurations in the Git repository. This approach greatly streamlines deployment, bolsters security, and facilitates easy tracking and rollback of changes.

Key features of FluxCD include its declarative configuration, aligning with Infrastructure as Code (IaC) principles. It continuously monitors the Git repository, automatically applying changes to the Kubernetes cluster, which streamlines the deployment process and maintains consistency. Version control in Git provides an audit trail and easy rollback options. FluxCD also supports automated synchronization with Git, multi-tenancy, and integrates well with other Kubernetes ecosystem tools. Its customizable and extensible design allows it to fit various workflows and environments, enhancing its utility in diverse deployment scenarios.
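The reconciliation idea described above can be illustrated with a small sketch. This is not FluxCD's implementation; it is a minimal model, assuming that manifests are plain dictionaries keyed by a hypothetical resource name and that resources missing from Git are pruned (an assumption made here for illustration). The function names `plan_reconcile` and `apply_plan` are invented for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource name -> spec, as declared in Git or observed in the cluster.
Manifests = Dict[str, dict]


def plan_reconcile(desired: Manifests, actual: Manifests) -> List[Tuple[str, str]]:
    """Return the (action, name) steps needed to make `actual` match `desired`."""
    steps: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != desired[name]:
            steps.append(("update", name))
    # Assumption: anything not declared in Git is removed (pruning enabled).
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


def apply_plan(desired: Manifests, actual: Manifests) -> Manifests:
    """Apply the plan to a copy of `actual` and return the converged state."""
    converged = dict(actual)
    for action, name in plan_reconcile(desired, actual):
        if action == "delete":
            del converged[name]
        else:
            converged[name] = desired[name]
    return converged
```

Running the plan repeatedly is safe: once the cluster state equals the Git state, the plan is empty, which is the property that keeps a continuously running reconciler from making needless changes.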

AWS Network Load Balancer (NLB)

At the core of our architecture is the AWS NLB, deployed for each service. It’s engineered to handle millions of requests per second while maintaining low latencies, making it ideal for managing volatile traffic patterns and high-throughput applications. Its automatic scaling capability is a game-changer, adeptly adjusting to traffic fluctuations to ensure consistent performance. Additionally, the NLB’s health checks contribute to the overall reliability of our services by directing traffic only to healthy instances.
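To make the health-check behavior concrete, here is a simplified sketch of how a load balancer might decide which targets receive traffic. It is an illustration only, not the NLB's actual algorithm: it assumes a target counts as healthy once its most recent checks have all passed, and the `healthy_threshold` parameter and the target addresses are hypothetical.

```python
from typing import Dict, List


def healthy_targets(history: Dict[str, List[bool]], healthy_threshold: int = 3) -> List[str]:
    """Return targets whose most recent `healthy_threshold` checks all passed."""
    return sorted(
        target
        for target, checks in history.items()
        # A target with too few recorded checks is not yet considered healthy.
        if len(checks) >= healthy_threshold and all(checks[-healthy_threshold:])
    )
```

Traffic would then be routed only to the addresses this function returns, so a target that fails a recent check drops out until it passes enough consecutive checks again.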

Integration with Istio

Istio, a service mesh, brings advanced traffic management capabilities to our setup. It allows for precise control over traffic through features like canary deployments, circuit breaking, and fault injection, enabling us to fine-tune our network responses dynamically. Security is another domain where Istio shines, offering strong features like mutual TLS for encrypted communication between services, coupled with fine-grained access policies. Moreover, Istio’s observability features provide detailed insights into traffic flow and performance metrics, which are invaluable for effective monitoring and troubleshooting. The use of NLB with Istio offers a high degree of flexibility and control over how traffic is managed and routed within our Kubernetes environment, aligning with our commitment to providing reliable and secure services.

Load Balancer and Asabru Proxy

In conjunction with NLB and Istio, we have the Asabru Proxy, developed by ChistaDATA. This high-performance SQL Proxy is designed to elevate the scalability and availability of database servers such as ClickHouse, PostgreSQL, and MySQL. Asabru allows for streamlined configuration and management of database connections, thereby enhancing database performance. Currently, it supports TCP/IP, HTTP/HTTPS, and TLS/SSL protocols for ClickHouse, with ongoing efforts to extend TLS/SSL support to MySQL and PostgreSQL.

This triad of AWS NLB, Istio, and Asabru Proxy forms the backbone of our robust network architecture. Each component brings unique strengths, from handling high traffic loads and ensuring security to enhancing database performance. Together, they create a network environment that is efficient, scalable, secure, and easy to manage.

Storage

In our quest to optimize data storage, we have developed a strategy that combines the strengths of both network-attached storage, particularly block storage, and object storage solutions.

Network-Attached Storage (Block Storage)

For our ClickHouse clusters, we chose block storage as the primary storage option. This covers several volume types, including HDD-backed, SSD-backed, and Provisioned IOPS volumes, such as those offered by AWS Elastic Block Store (EBS). A significant advantage of block storage is its flexibility in vertical scaling: we can grow storage capacity by adding new volumes or extending existing ones, so the storage layer keeps pace with increasing data demands. Storage policies can also be configured for a single volume or across multiple volumes to optimize performance and reliability.

Object Storage

Complementing our primary storage, object storage is our go-to solution for backup needs, utilizing the cost-effective and scalable nature of AWS S3 or GCP GCS. Object storage is renowned for its unlimited scalability, unmatched durability, and considerable cost advantages, particularly for storing large data volumes. Our methodical approach involves setting up a dedicated S3 bucket for each organization and further organizing the data through subpaths for different clusters within the same organization, enhancing data clarity and accessibility.
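The bucket-per-organization layout with per-cluster subpaths could be sketched as follows. The naming scheme is an assumption for illustration: the `example-backups` prefix, the identifier format, and the function `backup_location` are all hypothetical and not taken from the platform.

```python
import re
from typing import Tuple

# Assumed identifier format: lowercase letters, digits, and hyphens only.
_SAFE_ID = re.compile(r"[a-z0-9][a-z0-9-]*")


def backup_location(
    org_id: str,
    cluster_id: str,
    backup_name: str,
    bucket_prefix: str = "example-backups",  # hypothetical prefix
) -> Tuple[str, str]:
    """Return (bucket, object_key): one bucket per organization, one subpath per cluster."""
    for label, value in (("org_id", org_id), ("cluster_id", cluster_id), ("backup_name", backup_name)):
        # Rejecting unexpected characters keeps one tenant's key from escaping its own path.
        if not _SAFE_ID.fullmatch(value):
            raise ValueError(f"invalid {label}: {value!r}")
    bucket = f"{bucket_prefix}-{org_id}"
    key = f"{cluster_id}/{backup_name}"
    return bucket, key
```

Keeping the organization in the bucket name and the cluster in the key prefix means access policies can be scoped per organization at the bucket level, while listing by prefix retrieves the backups of a single cluster.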

We rigorously implement role separation and use separate service accounts to bolster our data security. These measures ensure that data access and management are tightly controlled and monitored, providing robust protection for our valuable data assets. This combination of innovative storage strategies and stringent security protocols demonstrates our commitment to managing and safeguarding data efficiently and securely.

Authentication and Authorization

Our authentication solution is powered by Keycloak, a mature open-source Identity and Access Management (IAM) solution backed by Red Hat, which we run in our self-hosted environment. Authorization is handled by our built-in mechanism, which allows access controls to be defined in a fine-grained and flexible manner. The platform's functionality is organized in a hierarchical, tree-based structure, and a user can be granted access to all or specific elements within each level.
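A tree-based permission check of this kind can be sketched in a few lines. This is not the platform's actual mechanism; it assumes that each function in the platform is addressed by a path from the root of the tree and that a grant on a node covers everything beneath it. The path segments and the function `is_allowed` are hypothetical.

```python
from typing import Iterable, Tuple

# A path from the root of the functionality tree, e.g. ("clusters", "analytics", "backups").
Path = Tuple[str, ...]


def is_allowed(grants: Iterable[Path], requested: Path) -> bool:
    """A grant on a node covers that node and every element below it."""
    # A grant matches when it is a prefix of the requested path.
    return any(requested[: len(grant)] == grant for grant in grants)
```

Under this assumption, granting `("billing",)` gives access to every billing function, while granting `("clusters", "analytics", "backups")` exposes only the backup functions of one cluster. An empty grant `()` would cover the whole tree, so it should be reserved for administrators.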

Other In-House Integrations

In our pursuit of excellence, we have seamlessly integrated several essential tools into our Database as a Service (DBaaS) platform, enhancing its capabilities and utility.

Benchmarking Suite – Empowering Performance Insights

Our Benchmarking Suite is thoughtfully configured with an Apache Superset dashboard, ensuring intuitive visual representations of critical performance metrics. As part of our commitment to the community, we are on the verge of open-sourcing this comprehensive end-to-end benchmarking toolkit. Its impending release promises to be a valuable asset for the wider community, simplifying rigorous benchmarking procedures while providing profound insights. This, in turn, will streamline the benchmarking process, reducing the need for manual intervention and making it more accessible to all.

ChistaDATA Anansi – A Profound Query Profiler for ClickHouse

In our relentless pursuit of excellence, our ChistaDATA Team has developed ‘ChistaDATA Anansi,’ a specialized query profiler tailored for ClickHouse and PostgreSQL. This powerful log analysis tool has been seamlessly integrated into our DBaaS offering and is also open source for the benefit of others. The reports generated by ChistaDATA Anansi offer a deep dive into various query characteristics, including execution time, memory usage, bytes read, and other fundamental details. This wealth of information empowers you to pinpoint bottlenecks and optimize your application’s performance, contributing to a more efficient and effective database experience.

Billing Model

ChistaDATA DBaaS is designed to streamline the customer experience by eliminating the need for any upfront investment in infrastructure. With ChistaDATA Cloud, customers can easily set up and manage their databases, all hosted on secure and reliable external infrastructure.

Understanding the varied needs of our clients, ChistaDATA Cloud offers a range of subscription plans tailored to meet specific requirements. In addition, the platform is versatile, supporting hosting on popular cloud infrastructures like AWS, GCP, and Azure, giving clients various database hosting options. Key features such as data replication, fail-over, and backup are integrated into each subscription plan, ensuring comprehensive and robust data management.

Our billing model for ChistaDATA DBaaS is transparent and user-friendly, with charges based on actual resource utilization. This includes the combined cost of cloud compute, block storage, object storage, and a service fee for each active server, with billing conducted monthly based on the previous month’s usage.
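As a worked illustration of this usage-based model, the sketch below adds up the four components named above. All rates and units are hypothetical placeholders, not ChistaDATA prices, and the choice of hours and GB-months as metering units is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Usage:
    """Metered usage for the previous month (units are assumptions)."""
    compute_hours: Decimal
    block_storage_gb_months: Decimal
    object_storage_gb_months: Decimal
    active_servers: int


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices, for illustration only."""
    compute_per_hour: Decimal
    block_per_gb_month: Decimal
    object_per_gb_month: Decimal
    service_fee_per_server: Decimal


def monthly_charge(usage: Usage, rates: Rates) -> Decimal:
    """Sum compute, block storage, object storage, and the per-server service fee."""
    total = (
        usage.compute_hours * rates.compute_per_hour
        + usage.block_storage_gb_months * rates.block_per_gb_month
        + usage.object_storage_gb_months * rates.object_per_gb_month
        + usage.active_servers * rates.service_fee_per_server
    )
    # Round to cents for invoicing.
    return total.quantize(Decimal("0.01"))
```

For example, with 720 compute hours at 0.50, 500 GB-months of block storage at 0.10, 1000 GB-months of object storage at 0.02, and three active servers at 100 each, the components are 360 + 50 + 20 + 300, for a monthly charge of 730.00. `Decimal` is used instead of floats to avoid rounding drift in monetary sums.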

This clear, flexible, and predictable billing model is especially advantageous for SaaS businesses, as it provides a straightforward understanding of costs. At ChistaDATA, we value this approach, reflecting our commitment to offering customer-centric services that are both transparent and easy to understand.

In-the-Pipeline Add-ons

We’re excited to share the upcoming features that our team is actively working on:

1.  Bring Your Own Kubernetes: Whether you’re operating within your on-premise infrastructure or utilizing a public/private cloud, we’re introducing ChistaDATA DBaaS Anywhere. This innovation offers a convenient avenue for cloud management of ChistaDATA DBaaS while empowering users to retain control over their data within their own cloud Virtual Private Clouds (VPCs) and private data centers. All the while, you can enjoy the benefits of running managed ClickHouse within your personally curated Kubernetes clusters. The ChistaDATA Cloud Manager UI simplifies the management of ChistaDATA DBaaS Anywhere environments, mirroring the experience of fully hosted ChistaDATA DBaaS environments. You’ll have the ability to oversee multiple environments from a single ChistaDATA DBaaS account and even mix and match different environment types. The best part? ClickHouse management remains consistent and user-friendly, regardless of the environment you’re in.

2.  Add ChistaDATA Cloud to Cloud Marketplace: We’re also excited to announce that you’ll soon find ChistaDATA DBaaS on the Cloud Marketplace. This addition will make our database-as-a-service solution more accessible and easier to integrate into your cloud ecosystem. Stay tuned for further updates!

In summary, we have delved deep into the inner workings of this successful DBaaS platform. Our engineering team has meticulously attended to every facet, from the foundational cluster initialization to the automated scaling, ensuring high availability with Kubernetes and multi-layered security measures. Additionally, we explored the evolution of the control and data planes, poised for further enhancements and new features.

We have also examined the seamless integration of DBaaS tools and the tech stack within our SaaS platform while shedding light on the functional and pricing modules. Looking ahead, we glimpse the exciting developments on the horizon. Our dedicated team continually strives to enhance the user experience and make it even better.

Conclusion

This concludes our deep dive into the architecture and design of our managed cloud service, ChistaDATA Cloud for ClickHouse. We hope it helped explain how the service delivers ClickHouse’s blazing-fast query performance to the enterprise.