Migrating Oracle Database to ClickHouse

In a previous article, we talked about migrating from OLTP databases to ClickHouse. This migration tooling will soon be available in the ChistaDATA DBaaS, so you can enjoy the advantages of the ClickHouse database with a single click. Today we’ll start by showing you how to migrate from an Oracle database to Kafka, and then we’ll cover migrating into the ClickHouse database. But first:

Why choose ChistaDATA DBaaS for migration?

  • Serverless header connector
    Simply authenticate your OLTP database and ClickHouse, and your ClickHouse data integration will automatically adapt to schema and API changes in the ChistaDATA DBaaS environment.
  • No more security compliance issues.
    To test your data pipeline without relying on third-party services, use ChistaDATA DBaaS’ open source edition. Your security team will love it.
  • Being open source, it is extensible.
    With ChistaDATA DBaaS you can easily customise the open source OLTP database connector to your specific needs. All connectors are freely available.
  • Monitoring & Alerts on your terms
    Sometimes delays happen. ChistaDATA logs everything and lets you know when problems occur. Get alerts the way you want.
  • Normalised schemas
    Users can choose raw data; analysts can choose normalised schemas. ChistaDATA DBaaS offers several options you can use.

Let’s demonstrate migrating from Oracle DB to ClickHouse

To get started, we need an Oracle database system that is up and running. One of the easiest ways to do this is to use Docker to deploy a container that runs the Oracle database. Oracle has made such containers available in its Oracle Container Registry, allowing anyone to run and test the database. First, we need to sign in to the Oracle Container Registry. After this step, we authenticate with the registry credentials using the following command:

docker login container-registry.oracle.com

The prerequisites

  • The Oracle JDBC driver is not included in the Debezium Kafka Connect image, so it must be downloaded manually and mounted into the image before using Debezium for Oracle. You can download the official Oracle Database 21c JDBC driver (ojdbc8.jar) from Oracle’s download page; the downloaded jar will be used in the following part to add the driver to the base image of Debezium’s Kafka Connect container.
  • Let’s also take a closer look at the setup before registering the connector with Kafka Connect. In this example, we will use the following configuration; save it next to ojdbc8.jar as oracle.json.

oracle.json:

{
  "name": "customers-connector",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "tasks.max": "1",
    "database.hostname": "desktop-oracle-1",
    "database.port": "1521",
    "database.user": "c##dbzuser",
    "database.password": "dbz",
    "database.dbname": "ORCLCDB",
    "database.pdb.name": "ORCLPDB1",
    "database.server.name": "server1",
    "table.include.list": "C##DBZUSER.CUSTOMERS",
    "topic.prefix": "dbz",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.customers"
  }
}
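Before registering this file, it can be sanity-checked locally. Below is a minimal Python sketch; the required-key list is taken from the fields used above, not from Debezium’s full configuration schema:

```python
import json

# Keys taken from the oracle.json above; illustrative, not Debezium's full schema.
REQUIRED = {
    "connector.class", "database.hostname", "database.port",
    "database.user", "database.password", "database.dbname",
    "table.include.list", "topic.prefix",
}

def check_connector_config(text):
    """Parse a Kafka Connect registration payload and report missing keys."""
    payload = json.loads(text)  # fail fast on malformed JSON
    config = payload.get("config", {})
    missing = sorted(REQUIRED - config.keys())
    return payload.get("name"), missing

name, missing = check_connector_config("""
{"name": "customers-connector",
 "config": {"connector.class": "io.debezium.connector.oracle.OracleConnector",
            "database.hostname": "desktop-oracle-1", "database.port": "1521",
            "database.user": "c##dbzuser", "database.password": "dbz",
            "database.dbname": "ORCLCDB",
            "table.include.list": "C##DBZUSER.CUSTOMERS",
            "topic.prefix": "dbz"}}
""")
print(name, missing)  # -> customers-connector []
```

A missing key is caught before the file ever reaches Kafka Connect, which is cheaper than debugging a failed REST call.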

Here is our docker-compose file; now we can bring up all the containers with docker-compose up -d:

version: "3.9"
services:
  oracle:
    image: container-registry.oracle.com/database/enterprise:21.3.0.0
    ports:
      - 1521:1521
    environment:
      - ORACLE_SID=ORCLCDB
      - ORACLE_PDB=ORCLPDB1
      - ORACLE_PWD=Welcome1
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
    ports:
      - 2181:2181
  kafka:
    image: confluentinc/cp-kafka:latest
    user: root
    environment:
      - KAFKA_BROKER_ID=1
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:9093
      - KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      - KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
      - KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1
    ports:
      - 9092:9092
      - 9093:9093
    depends_on: [zookeeper]
  kafka_connect:
    image: debezium/connect:latest
    ports:
      - 8083:8083
    environment:
      - BOOTSTRAP_SERVERS=kafka:9092
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=my_connect_configs
      - OFFSET_STORAGE_TOPIC=my_connect_offsets
      - STATUS_STORAGE_TOPIC=my_connect_statuses
    volumes:
      - ./ojdbc8.jar:/kafka/libs/ojdbc8.jar
      - ./oracle.json:/kafka/oracle.json
    depends_on: [zookeeper, kafka]

When a new container starts and no initial configuration or database exists, the database is configured and installed inside the container. Client applications can connect once the Oracle Database server signals that it is ready and the STATUS column of docker ps displays (healthy).

Configuring Oracle DB

Let’s connect to the Oracle database container through a terminal. We will use SQL*Plus, a client that lets us easily shut down and restart the database. So, in a new terminal, type:

docker exec -it desktop-oracle-1 sqlplus sys as sysdba
Enter password: Welcome1

To apply these settings, we’ll use the SQL*Plus session where we previously connected to the database as user SYS and run the following SQL statements.

ALTER SYSTEM SET db_recovery_file_dest_size = 10G;
ALTER SYSTEM SET db_recovery_file_dest = '/opt/oracle/oradata/ORCLCDB' scope=spfile;
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
ARCHIVE LOG LIST  -- verify that "Database log mode" now reads "Archive Mode"
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA; -- minimal supplemental logging is required for Debezium to work with LogMiner, chained rows, and varied storage structures

We are almost finished with the configuration; only the new-user setup remains.

docker exec -it desktop-oracle-1 /bin/bash

When using a multi-tenant architecture, such as the Oracle container registry image, we have to deal with two databases: ORCLCDB (the container or root database) and ORCLPDB1 (the pluggable database). All captured tables are created and stored in the PDB, but the connector may occasionally need to contact the root database to read certain system tables.

That’s why we need to create separate tablespaces:

sqlplus /nolog
CONNECT sys/Welcome1@ORCLCDB as sysdba;
CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
CONNECT sys/Welcome1@ORCLPDB1 as sysdba;
CREATE TABLESPACE logminer_tbs DATAFILE '/opt/oracle/oradata/ORCLCDB/ORCLPDB1/logminer_tbs.dbf' SIZE 25M REUSE AUTOEXTEND ON MAXSIZE UNLIMITED;
CONNECT sys/Welcome1@ORCLCDB as sysdba;
CREATE USER c##dbzuser IDENTIFIED BY dbz DEFAULT TABLESPACE LOGMINER_TBS QUOTA UNLIMITED ON LOGMINER_TBS CONTAINER=ALL;

The user account requires several permissions. At the time of publication, the following privileges were on the list:

GRANT CREATE SESSION TO c##dbzuser CONTAINER=ALL;
GRANT SET CONTAINER TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$DATABASE TO c##dbzuser CONTAINER=ALL;
GRANT FLASHBACK ANY TABLE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ANY TABLE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT_CATALOG_ROLE TO c##dbzuser CONTAINER=ALL;
GRANT EXECUTE_CATALOG_ROLE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ANY TRANSACTION TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ANY DICTIONARY TO c##dbzuser CONTAINER=ALL;
GRANT LOGMINING TO c##dbzuser CONTAINER=ALL;
GRANT CREATE TABLE TO c##dbzuser CONTAINER=ALL;
GRANT LOCK ANY TABLE TO c##dbzuser CONTAINER=ALL;
GRANT CREATE SEQUENCE TO c##dbzuser CONTAINER=ALL;
GRANT EXECUTE ON DBMS_LOGMNR TO c##dbzuser CONTAINER=ALL;
GRANT EXECUTE ON DBMS_LOGMNR_D TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOG TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOG_HISTORY TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_LOGS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_CONTENTS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGMNR_PARAMETERS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$LOGFILE TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVED_LOG TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$ARCHIVE_DEST_STATUS TO c##dbzuser CONTAINER=ALL;
GRANT SELECT ON V_$TRANSACTION TO c##dbzuser CONTAINER=ALL;

Creating the new user is done. Let’s open a new terminal and use SQL*Plus to connect to the database as the new user:

docker exec -it -e ORACLE_SID=ORCLPDB1 desktop-oracle-1 sqlplus c##dbzuser@ORCLPDB1
Enter password: dbz

And create a new table with some sample data:

CREATE TABLE customers (id number(9,0) primary key, name varchar2(40));
INSERT INTO customers VALUES (9000, 'MongoDB');
INSERT INTO customers VALUES (9001, 'OracleDB');
INSERT INTO customers VALUES (9002, 'PostgreSQL');
INSERT INTO customers VALUES (9003, 'MariaDB');
INSERT INTO customers VALUES (9004, 'MySQL');
INSERT INTO customers VALUES (9005, 'ClickHouse');
COMMIT;

ALTER TABLE customers ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

We’re done with the Oracle side; now we just need to deploy the connector through Kafka Connect. Connect to the Kafka Connect container:

docker exec -it desktop-kafka_connect-1 /bin/bash

The configuration shown earlier is stored in the file oracle.json in the /kafka directory. To register the connector with Kafka Connect, run the following curl command in that terminal:

curl -H 'Content-Type: application/json' kafka_connect:8083/connectors --data "@oracle.json"
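The same registration can also be scripted. Here is a minimal, standard-library-only Python sketch; the endpoint and file path mirror the compose setup above and are assumptions of this example, not part of the connector itself:

```python
import json
import urllib.request

def build_registration(url, config_path):
    """Build the POST request that registers a connector with Kafka Connect's REST API."""
    with open(config_path, "rb") as fh:
        body = fh.read()
    json.loads(body)  # fail fast on malformed JSON before hitting the API
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )

# Hypothetical host and path for illustration; inside the compose network the
# endpoint is http://kafka_connect:8083/connectors.
# req = build_registration("http://kafka_connect:8083/connectors", "/kafka/oracle.json")
# urllib.request.urlopen(req)  # a 201 response means the connector was created
```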

Using the Kafka console consumer tool to read the contents of the topic in the local terminal, we can verify that the data exists in Kafka:

docker exec -it desktop-kafka-1 /bin/bash

Let’s check the topics and topic messages:

/usr/bin/kafka-topics --list  --bootstrap-server kafka:9092
/usr/bin/kafka-console-consumer  --bootstrap-server kafka:9092  --topic dbz.C__DBZUSER.CUSTOMERS --from-beginning
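Each message on the topic is a Debezium change event whose payload carries op, before, and after fields. A small Python sketch of unpacking one (the envelope layout follows Debezium’s documented format, but this sample record itself is illustrative):

```python
import json

# A trimmed Debezium-style change event for the CUSTOMERS table.
event = json.loads("""
{"payload": {"op": "c",
             "before": null,
             "after": {"ID": 9005, "NAME": "ClickHouse"},
             "source": {"connector": "oracle", "table": "CUSTOMERS"}}}
""")

def row_from_event(evt):
    """Return the operation and the row state after the change
    ('after' is null for deletes)."""
    payload = evt["payload"]
    return payload["op"], payload["after"]

op, row = row_from_event(event)
print(op, row)  # -> c {'ID': 9005, 'NAME': 'ClickHouse'}
```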

After migrating our data to Kafka, we can use the ChistaDATA Python Connector to conclude the migration process. The ChistaDATA Python Connector, which handles the migration from Kafka to ClickHouse, is the most robust tool on the market in terms of restart resilience and supported data types. The connector keeps track of the most recent data sent to ClickHouse, so you can be certain that the migration will continue where it left off if your machine needs to be restarted.
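The resume behaviour described above boils down to offset checkpointing. Below is a conceptual Python sketch, not the connector’s actual implementation (the real tool persists its position durably):

```python
def apply_batch(messages, sink, checkpoint):
    """Apply Kafka messages to a sink, skipping anything at or below the
    last committed offset, and advance the checkpoint as rows land.
    `sink` is any list-like target; `checkpoint` holds the last applied offset."""
    for offset, row in messages:
        if offset <= checkpoint["last"]:
            continue  # already delivered before the restart
        sink.append(row)
        checkpoint["last"] = offset  # commit the position only after a successful write

# Simulate a restart: offsets up to 2 were applied before the crash.
checkpoint = {"last": 2}
sink = []
messages = [(1, "MariaDB"), (2, "MySQL"), (3, "ClickHouse")]
apply_batch(messages, sink, checkpoint)
print(sink, checkpoint["last"])  # -> ['ClickHouse'] 3
```

Because the checkpoint only advances after a write succeeds, replaying the same batch after a restart applies nothing twice.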

To use this connector, you can examine the Python connector container in the docker-compose file mentioned earlier and adapt it with the Oracle database schema information.

An important note

This Oracle DB image only works on the linux/amd64 platform; on other architectures it may not run. To check yours, type uname -m in a terminal.

Conclusion

Debezium is a fantastic tool for capturing row-level changes that occur in a database and streaming those changes to a broker of our choosing. Moreover, the ChistaDATA Python Connector is a powerful tool for applying messages to ClickHouse, as we have demonstrated. Migrating from MySQL/MariaDB/PostgreSQL/OracleDB/MongoDB to ChistaDATA DBaaS will also be possible very soon. Stay tuned.

About Ilkay
Ilkay has been administering databases, including NoSQL and RDBMS, for over 7 years. He is experienced in working with high-scale banking databases. He is currently working at ChistaDATA Inc. as a Database Administrator.