Behind the Scenes: ChistaDATA DBaaS – PART I

Deep Dive into Architecture of ChistaDATA's ClickHouse DBaaS

In my recent collaboration with one of our esteemed channel partners, an intriguing insight emerged: there’s a growing curiosity among people about the intricacies of Database as a Service (DBaaS). They are keen to understand not just the visual aspect of DBaaS, but also the nuances of hosting, the cost factors involved, and the internal orchestration that makes it all tick. This surge in interest has inspired me to pen a detailed blog post, aiming to demystify these elements and provide a clearer picture of how ChistaDATA DBaaS operates in the modern tech landscape.

Introduction

ChistaDATA Cloud, a DBaaS offering from ChistaDATA Inc., is a managed service built on ClickHouse, one of the world’s most popular OLAP databases. It enables quick and easy database provisioning on hosted infrastructure, eliminating the need for upfront infrastructure investment. Customers can choose from various subscription plans and host their databases on leading cloud services such as AWS, GCP, or Azure. Key features such as data replication, fail-over, and backup are integral parts of the subscription.

The Architecture of the ChistaDATA Cloud

At ChistaDATA, we believe powerful analytics shouldn’t come at the cost of complexity. That’s why we crafted the ChistaDATA Cloud with a dual-pronged architectural approach, catering to both ease of use and enterprise-grade performance.

For proof-of-concept testing and development environments, we utilize a shared computing module. This streamlined approach allows developers to quickly spin up their projects without getting bogged down in infrastructure complexities. It’s the perfect stepping stone for exploring the power of ChistaDATA before scaling up.

When it comes to mission-critical deployments, we leverage the robust power of a shared-nothing architecture. This means each node in the cluster houses its own data, independent of the others. This bolsters fault tolerance by eliminating single points of failure and also optimizes resource utilization.

Benefits of Shared-Nothing:

  • Enhanced Fault Tolerance: Node failures are isolated, ensuring the system continues to function uninterrupted.
  • Seamless Scalability: Adding or removing nodes is straightforward, adapting to your evolving data needs.
  • Uninterrupted Upgrades: Individual nodes can be upgraded without impacting the entire system.
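
To make the shared-nothing idea concrete, here is a minimal Python sketch in which every key is owned by exactly one node, chosen by a stable hash. The node names, the `route` and `place` helpers, and the hashing scheme are assumptions made purely for illustration; they are not a description of how ChistaDATA Cloud actually distributes data.

```python
import hashlib

# Hypothetical node names; in a shared-nothing layout each node stores only its own rows.
NODES = ["node-a", "node-b", "node-c"]


def route(key: str, nodes: list[str]) -> str:
    """Pick the single node that owns `key` using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def place(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Group keys by owning node; no key is stored on two nodes."""
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    for key in keys:
        placement[route(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    layout = place([f"user-{i}" for i in range(10)], NODES)
    for node, owned in layout.items():
        print(node, owned)
```

Because no two nodes share a key, losing one node affects only the slice of data that node owns, which is the isolation property the list above describes.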

By embracing this dual approach, ChistaDATA Cloud offers the best of both worlds:

  • Developer-friendly: Quick and easy setup for exploration and testing.
  • Enterprise-ready: Secure, scalable, and reliable for high-performance analytics.

ChistaDATA Cloud Components

The diagram below shows the DBaaS portal’s high-level architecture and the supported data plane infrastructure.

DBaaS Portal: Consists of the UI component, API, and Control Plane.

Data Plane: The cloud infrastructure for deploying ClickHouse database instances.

  1. Platform: This is the user-facing layer, incorporating both the User Interface (UI) and Application Programming Interface (API). It empowers users to execute operations on the cloud, provides access to their ClickHouse services, and facilitates interaction with the data. The control plane manages the orchestration within the data plane.
  2. Data Plane: Serving as the infrastructure-facing component, the Data Plane is responsible for the management and orchestration of physical ClickHouse clusters. Its functions encompass resource allocation, provisioning, updates, scaling, load balancing, and isolating services for different tenants. Additionally, it handles backup and recovery, observability, and metering, which involves collecting usage data.

Our Control Plane is currently operational on AWS. However, we have a Data Plane across all major cloud providers, including Google Cloud and Microsoft Azure. The Data Plane is designed to encapsulate and abstract the cloud service provider (CSP) specific logic, thereby relieving the Control Plane from these complexities.

On the Data Plane, customer ClickHouse Clusters are segregated by namespace, a measure taken for compliance reasons. For large enterprise customers, containers are hosted on separate worker nodes, ensuring additional isolation and security.

Platform Overview

ClickHouse clusters managed through ChistaDATA Cloud are organized logically under the following hierarchy, with the organization at the top.

Organization – A customer or client.

Workspace – A logical grouping of database clusters.

Cluster – A reference to a single ClickHouse installation.
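
The hierarchy above can be pictured as a small nested data model. The following is a minimal sketch in Python; the class fields, the `cluster_paths` helper, and the sample names ("acme", "analytics", and so on) are hypothetical and are not taken from the ChistaDATA codebase.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str  # refers to one ClickHouse installation


@dataclass
class Workspace:
    name: str  # logical grouping of clusters
    clusters: list[Cluster] = field(default_factory=list)


@dataclass
class Organization:
    name: str  # the customer or client
    workspaces: list[Workspace] = field(default_factory=list)

    def cluster_paths(self) -> list[str]:
        """Return organization/workspace/cluster paths for every cluster."""
        return [
            f"{self.name}/{ws.name}/{cl.name}"
            for ws in self.workspaces
            for cl in ws.clusters
        ]


# Hypothetical sample data.
org = Organization(
    name="acme",
    workspaces=[
        Workspace("analytics", [Cluster("events-prod"), Cluster("events-dev")]),
        Workspace("reporting", [Cluster("finance")]),
    ],
)
print(org.cluster_paths())
```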

Our core backend systems are written in Golang, for its efficiency and performance. Complementing this, we utilize Python for certain specific tasks, such as running backup operations and monitoring jobs, leveraging its versatility and ease of use in these areas.

The architecture is highly decoupled: layers such as the control plane and data plane, and subsystems such as the metrics reporting system, can be scaled or deployed independently. This decoupled API architecture plays a pivotal role in our system, giving us both flexibility and efficiency. This combination of technologies and architecture keeps our system reliable and efficient while remaining scalable and adaptable to future enhancements.

When a client initiates an action involving the Data Plane, be it creating a new cluster or inquiring about the current status of a cluster, the process is set in motion through a call made from the Control Plane to the Data Plane API. This interaction exemplifies the seamless and efficient communication between the two planes, ensuring that client requests are promptly and accurately addressed.
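
The request path described above can be sketched in a few lines of Python. Everything here is an assumption for illustration: `FakeDataPlane` is an in-memory stand-in for the real Data Plane API (which would be reached over the network), and the method names, action names, and response fields are hypothetical.

```python
class FakeDataPlane:
    """In-memory stand-in for the Data Plane API (hypothetical interface)."""

    def __init__(self) -> None:
        self._status: dict[str, str] = {}

    def create_cluster(self, cluster_id: str) -> dict:
        # A real Data Plane would start provisioning infrastructure here.
        self._status[cluster_id] = "provisioning"
        return {"cluster_id": cluster_id, "status": "provisioning"}

    def get_status(self, cluster_id: str) -> dict:
        status = self._status.get(cluster_id, "unknown")
        return {"cluster_id": cluster_id, "status": status}


class ControlPlane:
    """Forwards client actions to the Data Plane; holds no provider-specific logic."""

    def __init__(self, data_plane: FakeDataPlane) -> None:
        self._data_plane = data_plane

    def handle(self, action: str, cluster_id: str) -> dict:
        if action == "create":
            return self._data_plane.create_cluster(cluster_id)
        if action == "status":
            return self._data_plane.get_status(cluster_id)
        raise ValueError(f"unsupported action: {action}")


control_plane = ControlPlane(FakeDataPlane())
print(control_plane.handle("create", "c1"))
print(control_plane.handle("status", "c1"))
```

The point of the sketch is the direction of the call: the Control Plane only knows the Data Plane's interface, so provider-specific details can stay behind it.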

In the reverse scenario, where events occurring within the Data Plane need to be communicated back to the Control Plane (for instance, notifications about a cluster being successfully provisioned, monitoring data events, or system alerts), these are transmitted using a callback mechanism. This method ensures a consistent and reliable flow of information from the Data Plane to the Control Plane, keeping the latter updated on the operational status and any critical events within the system. A real-time database management solution calls for an event-driven architecture, and this system of callbacks is currently being evolved into one.
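
As a rough picture of the callback direction, the sketch below registers handlers per event type and invokes them when the Data Plane reports an event. The class name, the event field names (`type`, `cluster_id`), and the event type `cluster_provisioned` are hypothetical; the text does not specify the actual callback payloads.

```python
from typing import Callable

Event = dict[str, str]
Handler = Callable[[Event], None]


class ControlPlaneCallbacks:
    """Receives Data Plane events and routes them to registered handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}
        self.cluster_status: dict[str, str] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        """Register a handler for one event type."""
        self._handlers.setdefault(event_type, []).append(handler)

    def receive(self, event: Event) -> int:
        """Invoke handlers for the event's type; return how many ran."""
        handlers = self._handlers.get(event["type"], [])
        for handler in handlers:
            handler(event)
        return len(handlers)


callbacks = ControlPlaneCallbacks()


def mark_ready(event: Event) -> None:
    # Update the Control Plane's view of the cluster once provisioning succeeds.
    callbacks.cluster_status[event["cluster_id"]] = "ready"


callbacks.on("cluster_provisioned", mark_ready)
callbacks.receive({"type": "cluster_provisioned", "cluster_id": "c1"})
print(callbacks.cluster_status)
```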

Data Plane APIs are standardized to keep them decoupled from the logical entities of the cloud UI and the platform. They abstract away the complexities of managing ClickHouse hosted within a Kubernetes ecosystem. Despite its complexity, this implementation remains entirely transparent to users, ensuring a seamless and intuitive experience. The API is tasked with a spectrum of essential functions, each designed to enhance the functionality of, and user interaction with, our ClickHouse service. These functions encompass:

  1. Controlling the ClickHouse Service: Empowering users to start, stop, and pause their ClickHouse service, offering effective operational management.
  2. Customizing Service Configuration: Allowing users to tailor the service to their specific requirements by modifying configuration settings.
  3. Managing Exposed Endpoints: Handling various types of endpoints, including HTTP and TCP, offering versatile ways for users to engage with their service.
  4. Configuring Endpoint Details: This involves setting up Fully Qualified Domain Names (FQDNs), ensuring not only accessibility but also standardization and identification.
  5. Enhancing Endpoint Security: Implementation of security measures like TLS to safeguard communication channels.
  6. Establishing and Managing Customer Database Accounts: Creation of primary database accounts for customers, along with features like password reset functionality.
  7. Providing Comprehensive Service Information: Offering detailed insights into the ClickHouse service, encompassing endpoint details such as FQDNs and ports.
  8. Delivering Real-time Status Updates: Keeping users informed about the current state of the ClickHouse service, whether it’s in provisioning, ready, running, paused, or experiencing degraded performance.
  9. Handling Data Backups and Restorations: Ensuring data integrity and availability through efficient backup and restoration processes.

Each of these functions plays a pivotal role in shaping the overall operation and user experience of the ChistaDATA Cloud service, underscoring the importance of the API within the Data Plane architecture.
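
To illustrate what service information and status might look like as data, here is a minimal Python sketch. The status names come from the list above; the class and field names, the `example.com` hostname, and the choice of ports are assumptions. Ports 8443 and 9440 are ClickHouse's conventional TLS ports for HTTP and native TCP, and the text does not say which ports ChistaDATA Cloud actually exposes.

```python
from dataclasses import dataclass, field
from enum import Enum


class ServiceStatus(Enum):
    PROVISIONING = "provisioning"
    READY = "ready"
    RUNNING = "running"
    PAUSED = "paused"
    DEGRADED = "degraded"


@dataclass(frozen=True)
class Endpoint:
    protocol: str  # "http" or "tcp"
    fqdn: str
    port: int
    tls: bool = True

    def address(self) -> str:
        return f"{self.fqdn}:{self.port}"


@dataclass
class ServiceInfo:
    cluster_id: str
    status: ServiceStatus
    endpoints: list[Endpoint] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Shape the service details the way an API response might carry them."""
        return {
            "cluster_id": self.cluster_id,
            "status": self.status.value,
            "endpoints": [
                {"protocol": e.protocol, "address": e.address(), "tls": e.tls}
                for e in self.endpoints
            ],
        }


# Hypothetical cluster and hostname.
info = ServiceInfo(
    cluster_id="c1",
    status=ServiceStatus.RUNNING,
    endpoints=[
        Endpoint("http", "c1.example.com", 8443),
        Endpoint("tcp", "c1.example.com", 9440),
    ],
)
print(info.to_dict())
```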

Functional Modules

Five Functional Modules Driving Unmatched Performance:

Scalability  – Our goal was to enable our product to effortlessly adapt to surges in user traffic while maintaining optimal service performance. Kubernetes emerged as the ideal solution, allowing us to scale our compute resources seamlessly. It guarantees the high availability of our applications through automatic failover and self-healing mechanisms. Moreover, Kubernetes offers the invaluable advantage of portability and effortless integration with various cloud services, including storage and network solutions, ensuring our product’s robust and resilient infrastructure.

Reliability – Data is paramount in today’s business landscape, and infrastructure services must remain uninterrupted. Recognizing this, we’ve engineered ChistaDATA Cloud for high availability, so that it recovers swiftly from internal component failures without affecting overall system reliability. Our cluster spans three availability zones for production services and two for development, ensuring resilience against zone-specific issues. Soon, we’ll expand to support multiple regions, isolating outages and safeguarding service continuity.
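
The value of spanning several availability zones can be shown with a small placement sketch. The zone names, replica names, replica counts, and the round-robin rule below are illustrative assumptions; only the zone counts (three for production, two for development) come from the text.

```python
def spread_replicas(replica_count: int, zones: list[str]) -> dict[str, str]:
    """Assign replicas to zones round-robin so no single zone holds them all."""
    if not zones:
        raise ValueError("at least one zone is required")
    return {
        f"replica-{i}": zones[i % len(zones)]
        for i in range(replica_count)
    }


def surviving_replicas(placement: dict[str, str], failed_zone: str) -> list[str]:
    """Replicas still available after an entire zone goes down."""
    return [replica for replica, zone in placement.items() if zone != failed_zone]


PRODUCTION_ZONES = ["zone-1", "zone-2", "zone-3"]  # three zones, per the text
DEVELOPMENT_ZONES = ["zone-1", "zone-2"]           # two zones, per the text

prod = spread_replicas(3, PRODUCTION_ZONES)
print(surviving_replicas(prod, "zone-2"))
```

With one replica per zone, a single-zone outage still leaves the remaining replicas reachable, which is the resilience the paragraph above refers to.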

Auto-Scaling – We’re elevating our system by implementing auto-scaling capabilities through AWS Auto Scaling Group, allowing us to adapt to diverse workload patterns effortlessly. By decoupling storage from computing resources, we’ve gained the agility to dynamically add or remove CPU and memory resources in response to the unique requirements of each workload. This level of adaptability guarantees that our system efficiently allocates resources, tailoring them to meet changing demands, ultimately optimizing performance and resource utilization.
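
As a rough illustration of adapting capacity to load, the sketch below applies a generic proportional rule: scale the current capacity by the ratio of observed to target utilization, then clamp to a minimum and maximum. This is an assumption chosen for illustration; it is not a description of how AWS Auto Scaling Group computes capacity internally, nor of ChistaDATA's actual policy, and the function name and parameters are hypothetical.

```python
import math


def desired_capacity(current: int, observed: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Proportional scaling: size capacity so utilization approaches the target."""
    if current < 1:
        raise ValueError("current capacity must be at least 1")
    if target <= 0:
        raise ValueError("target utilization must be positive")
    wanted = math.ceil(current * observed / target)
    # Clamp to the allowed range so a spike cannot scale without bound.
    return max(minimum, min(maximum, wanted))


# Hypothetical numbers: 4 units at 90% utilization, aiming for 60%.
print(desired_capacity(current=4, observed=90.0, target=60.0, minimum=2, maximum=10))
```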

Security and Compliance – To enhance security and compliance within our system, we’ve implemented a Single Sign-On (SSO) login, allowing seamless and secure access. We’ve also incorporated additional layers of protection through GitHub and Google authentication methods. Furthermore, we eagerly anticipate the SOC 2 compliance certification as we are currently in the final phase of obtaining it, which will further validate our commitment to maintaining robust security and compliance standards.

Monitoring and Alerting – We adhere to industry best practices, guided by our own ChistaDATA policy. Leveraging automation, we continuously monitor our system, offering real-time suggestions and comprehensive reports to ensure optimal performance. Additionally, we monitor usage patterns and have established alarm thresholds for critical and warning scenarios, empowering our user administrators to take timely and informed actions. Furthermore, in our cluster launch process, we’ve included an option for users to enable an exporter, facilitating indirect integration. This feature is particularly useful for customers with a Grafana server, as it allows them to pull and display their cluster data as a dashboard on Grafana and Superset without the need for direct integration. This gives users a straightforward and efficient way to monitor and manage their cluster metrics.
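
The warning and critical thresholds mentioned above can be sketched as a simple classification step. The metric names and threshold values below are hypothetical examples, not ChistaDATA's actual alerting policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> str:
    """Map a metric value to 'ok', 'warning', or 'critical'."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


# Hypothetical metrics and limits, for illustration only.
THRESHOLDS = {
    "disk_used_percent": Threshold(warning=75.0, critical=90.0),
    "memory_used_percent": Threshold(warning=80.0, critical=95.0),
}


def evaluate(metrics: dict[str, float]) -> dict[str, str]:
    """Classify every metric that has a configured threshold."""
    return {
        name: classify(value, THRESHOLDS[name])
        for name, value in metrics.items()
        if name in THRESHOLDS
    }


print(evaluate({"disk_used_percent": 92.0, "memory_used_percent": 81.0}))
```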

Performance Benchmarking

While slightly off-topic, I’d like to shed some light on the performance aspect of ChistaDATA DBaaS.

In our performance benchmarking efforts, we conducted tests on a substantial dataset of approximately 56GB, consisting of around 100 million rows, and the results were truly remarkable. Notable highlights include sub-second median query latency achieved across 40 complex grouping and sorting tasks on an XL instance. The query throughput soared to an impressive 46 million rows per second, all while maintaining a compression factor of about 0.24 on the dataset. For a more in-depth look at these findings, you can explore our comprehensive report here.
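
For readers who want to sanity-check the figures above, here is a short back-of-the-envelope calculation in Python. It rests on assumptions the text does not confirm: that the 56GB figure is the uncompressed size, that the compression factor means compressed size divided by original size, and that the throughput figure can be read as a raw scan rate. Under those assumptions the dataset would occupy roughly 13.4GB on disk and a pass over all rows would take a little over two seconds.

```python
# Figures quoted in the text.
DATASET_GB = 56.0
ROWS = 100_000_000
COMPRESSION_FACTOR = 0.24            # assumed: compressed size / original size
THROUGHPUT_ROWS_PER_SEC = 46_000_000


def compressed_size_gb(dataset_gb: float, factor: float) -> float:
    """Estimated on-disk size if `factor` is compressed/original."""
    return dataset_gb * factor


def full_scan_seconds(rows: int, rows_per_sec: int) -> float:
    """Time to pass over every row once at the quoted throughput."""
    return rows / rows_per_sec


def bytes_per_row(dataset_gb: float, rows: int) -> float:
    """Average row width, taking 1 GB as 10**9 bytes."""
    return dataset_gb * 1e9 / rows


print(f"compressed size: {compressed_size_gb(DATASET_GB, COMPRESSION_FACTOR):.2f} GB")
print(f"full scan time:  {full_scan_seconds(ROWS, THROUGHPUT_ROWS_PER_SEC):.2f} s")
print(f"avg row width:   {bytes_per_row(DATASET_GB, ROWS):.0f} bytes")
```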

Furthermore, an independent benchmark conducted by Difinative compared the performance of ChistaDATA ClickHouse to Google BigQuery. The results show ChistaDATA ClickHouse outperforming BigQuery across various aspects, particularly in batch ingests and complex analytical queries. You can delve deeper into this comparison in the article Google BigQuery v/s ChistaDATA ClickHouse.

For a more extensive analysis of ClickHouse’s performance in comparison to other leading databases, I invite you to visit ClickHouse’s official site at https://benchmark.clickhouse.com/.

Please continue your reading in Part 2: Behind the Scenes: ChistaDATA DBaaS – PART II