Leveraging ClickHouse to Build Real-time Credit Card Fraud Detection in Modern Banking

Introduction

Many credit card fraud analytics systems are moving from traditional OLAP platforms to ClickHouse-based real-time analytics, because traditional OLAP systems struggle to process and analyze large volumes of data in real time.

Limitations of traditional OLAP systems

These limitations include:

  1. Slow processing times: Traditional OLAP systems are designed to process large volumes of data, but they are not optimized for real-time processing. As a result, they may not analyze data fast enough to detect fraudulent transactions in real time.
  2. Limited scalability: Traditional OLAP systems may not scale to handle the growing volume of data generated by credit card transactions. This leads to slower processing and longer query execution times, making it difficult to detect fraud in real time.
  3. Inability to handle complex data: Credit card transactions generate complex data with multiple variables, such as location, transaction type, and time of day. Traditional OLAP systems may not be able to handle this complexity, leading to inaccurate or incomplete analysis.
  4. Cost: Traditional OLAP systems can be expensive to implement and maintain, making it difficult for smaller banks and credit card companies to invest in these systems.

ClickHouse addresses these limitations by providing a high-performance, scalable, and cost-effective analytics system that can process and analyze large volumes of complex data in real time. Because it is designed to run complex queries over high volumes of data with low latency, it is well-suited for real-time fraud detection and prevention.

  • Compact data storage – Each UInt8 value occupies exactly one byte, so ten billion UInt8 values consume exactly 10 GB uncompressed, with no per-value overhead. Storing data this compactly, even before compression, makes efficient use of the CPU and benefits both performance and resource management. ClickHouse is built to store data without wasted space.
  • CPU efficient – Whenever possible, ClickHouse operations are dispatched on arrays, rather than on individual values. This is called “vectorized query execution,” and it helps lower the cost of actual data processing.
  • Data compression – ClickHouse's two main general-purpose compression codecs are LZ4 and ZSTD. LZ4 is faster than ZSTD but achieves a lower compression ratio; ZSTD compresses better than, and runs faster than, traditional zlib, but is slower than LZ4. We recommend LZ4 when I/O is fast enough that decompression speed becomes the bottleneck. On very fast disk subsystems, you can also specify the NONE codec to disable compression. ZSTD is recommended when I/O is the bottleneck, for example in queries with large range scans.
  • Stores data on disk – Some column-oriented systems, such as SAP HANA and Google PowerDrill, can only work in RAM. ClickHouse stores data on disk, so datasets are not limited by available memory.
  • Massively Parallel Processing – ClickHouse can execute very large and complex SQL queries in parallel across CPU cores and servers, efficiently and cost-effectively.
  • Built for web-scale data analytics – ClickHouse supports sharding and distributed query processing, which makes it a popular columnar database for web-scale workloads. Each shard can be a group of replicas, providing reliability and fault tolerance.
  • Supports a primary key – ClickHouse supports a primary key, and the MergeTree engine sorts data incrementally by it so that queries over ranges of primary key values run quickly. New data can be added continuously in real time, and no locks are taken when data is inserted.
  • Built for statistical analysis and partial aggregation – ClickHouse provides aggregate functions for approximate calculation of the number of distinct values, medians, and quantiles. It can also run an aggregation for a limited number of random keys instead of all keys, and it can query a sample of the data to produce an approximate result while considerably reducing disk I/O.
  • Supports SQL – ClickHouse supports SQL. Subqueries are supported in the FROM, IN, and JOIN clauses, as are scalar subqueries. Dependent (correlated) subqueries are not supported.
  • Supports data replication – ClickHouse supports asynchronous multi-master and master-slave replication.
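To make the compact storage, sorted primary key, and range-scan ideas in the list above more concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not ClickHouse code, and it calls no ClickHouse API: the sample transactions, the column names (card_id, ts, amount_cents, is_flagged), and the helpers key_range and total_amount are hypothetical. Each column is stored in its own typed array, rows are kept sorted by a (card_id, ts) key so that a key range is located with binary search instead of a full scan, and only the one column the query needs is summed.

```python
from array import array
from bisect import bisect_left, bisect_right
from typing import Tuple

# Hypothetical transactions: (card_id, ts_seconds, amount_cents, is_flagged).
raw_rows = [
    (7, 100, 2_500, 0),
    (3, 105, 999, 0),
    (7, 130, 120_000, 1),
    (3, 90, 4_500, 0),
    (7, 160, 300, 0),
    (5, 110, 7_800, 1),
]

# Sort rows by the (card_id, ts) key, then store each column in its own typed array.
raw_rows.sort(key=lambda r: (r[0], r[1]))
card_id = array("L", (r[0] for r in raw_rows))
ts = array("L", (r[1] for r in raw_rows))
amount_cents = array("L", (r[2] for r in raw_rows))
is_flagged = array("B", (r[3] for r in raw_rows))  # one byte per value, like UInt8

# Composite keys in sorted order, used only for binary search.
keys = list(zip(card_id, ts))


def key_range(card: int, t_from: int, t_to: int) -> Tuple[int, int]:
    """Return row offsets [lo, hi) for one card with t_from <= ts <= t_to."""
    lo = bisect_left(keys, (card, t_from))
    hi = bisect_right(keys, (card, t_to))
    return lo, hi


def total_amount(card: int, t_from: int, t_to: int) -> int:
    """Sum only the amount column over the contiguous slice found by key_range."""
    lo, hi = key_range(card, t_from, t_to)
    return sum(amount_cents[lo:hi])


if __name__ == "__main__":
    print(total_amount(7, 100, 150))  # 122500
    # Uncompressed size of ten billion one-byte values, in decimal gigabytes.
    print(10_000_000_000 * is_flagged.itemsize / 1e9)  # 10.0
```

Running the script prints 122500 (the two card 7 transactions between ts 100 and 150) and then 10.0. A real engine works on compressed blocks and a sparse index over them rather than a Python list of keys; the sketch only shows why sorted data turns a range query into a small contiguous read.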

ClickHouse features for credit card fraud analytics

ClickHouse also provides a number of features that make it particularly useful for credit card fraud detection and analytics, including:

  1. Columnar storage: ClickHouse uses a columnar storage format optimized for analytical queries. Because a query reads only the columns it needs, execution is faster and I/O overhead is lower, making it possible to process and analyze large volumes of data in real time.
  2. Real-time query processing: ClickHouse is designed to answer queries with low latency, which makes it possible to detect and prevent fraudulent transactions in real time.
  3. Scalability: ClickHouse is highly scalable, making it well-suited for credit card fraud analytics systems that process large volumes of data.
  4. Cost-effectiveness: ClickHouse is open source and free to use, making it a cost-effective option for credit card companies and banks that want real-time analytics without incurring high costs.
  5. Real-time aggregation: ClickHouse supports real-time aggregation, so large volumes of data can be summarized quickly and efficiently. This is particularly important for credit card fraud analytics, where the ability to aggregate and analyze data in real time is critical.
  6. SQL support: ClickHouse supports SQL, which makes it easy for analysts and data scientists to work with the data. This is important for credit card fraud analytics, where the ability to quickly and easily analyze and visualize the data is critical.
  7. Data compression: ClickHouse supports data compression, which reduces the storage required for large volumes of data. This is important for credit card fraud analytics, where large volumes of data are generated and storage costs can quickly become prohibitive.
  8. High availability: ClickHouse supports replication for high availability, which keeps the system running and protects data when a server fails. This is important for credit card fraud analytics, where fraudulent transactions must be detected and prevented in real time.
  9. Security: ClickHouse provides security features, including role-based access control and data encryption, which help protect sensitive credit card data from unauthorized access.
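As an illustration of the kind of real-time aggregation described in the list above, the following sketch implements a simple velocity rule in plain Python: flag a card when it makes more than MAX_TXNS transactions within a trailing WINDOW_SECONDS window. The rule, its threshold values, the event format, and the function name velocity_alerts are hypothetical and are not ClickHouse features; in a ClickHouse deployment, a comparable per-card count would more likely be written as a SQL aggregation over the transactions table.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple

# Hypothetical rule parameters; real thresholds would come from fraud analysts.
WINDOW_SECONDS = 60
MAX_TXNS = 3

Event = Tuple[str, int]  # (card_id, unix_ts), assumed to arrive in timestamp order


def velocity_alerts(events: Iterable[Event]) -> List[Event]:
    """Return the events at which a card exceeds MAX_TXNS in the trailing window."""
    recent: Dict[str, Deque[int]] = defaultdict(deque)
    alerts: List[Event] = []
    for card, ts in events:
        window = recent[card]
        window.append(ts)
        # Keep only timestamps in the trailing window (ts - WINDOW_SECONDS, ts].
        while window and window[0] <= ts - WINDOW_SECONDS:
            window.popleft()
        if len(window) > MAX_TXNS:
            alerts.append((card, ts))
    return alerts


if __name__ == "__main__":
    stream = [
        ("card-A", 0), ("card-B", 5), ("card-A", 10),
        ("card-A", 20), ("card-A", 30), ("card-A", 100),
    ]
    print(velocity_alerts(stream))  # [('card-A', 30)]
```

With the sample stream, the fourth card-A transaction inside 60 seconds (at ts 30) triggers the only alert, and by ts 100 the earlier transactions have aged out of the window. A per-card deque keeps each update cheap, at the cost of holding recent timestamps in memory for every active card.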

Why ChistaDATA

Why do successful companies work with ChistaDATA for ClickHouse Consultative Support and Managed Services?

  • ChistaDATA provides full-stack ClickHouse optimization. We deliver elite-class Consultative Support (24/7) and Managed Services for both on-premises ClickHouse infrastructure and Serverless/Cloud/ClickHouse DBaaS operations.
  • ChistaDATA Server for ClickHouse (and all tools essential for DataOps at scale) will be Open Source (100% GPL forever) and free. We are committed to helping companies build Open Source ColumnStores for high-performance data analytics.
  • Global team available 24/7 for ClickHouse Consultative Support and Managed Services.
  • Our team has built and managed the DataOps infrastructure of some of the largest internet properties. We know the best practices for building optimal, scalable, highly reliable, and secure database infrastructure at scale.
  • Lean team culture: startup-friendly, and specialists in DevOps and automation for database maintenance operations.
  • Transparent pricing and no hidden charges – We have both fixed-price and flexible subscription plans.
  • Based in the San Francisco Bay Area, with global teams operating from 11 cities worldwide to deliver 24/7 Consultative Support and Managed Services for ClickHouse.

Conclusion

In summary, ClickHouse is a powerful real-time analytics system that is well-suited for credit card fraud analytics. It provides fast processing times, scalability, and the ability to handle complex data, while also being cost-effective and providing robust security features.

To learn more about how banking organizations can benefit from ClickHouse, consider reading the following article:

About Shiv Iyer
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv is currently the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker at open source software conferences globally.