Why the shard section needs the internal_replication configuration parameter
With internal_replication = false, data inserted into a distributed table is written to all underlying replicas. This makes sense only if your instances are spread across multiple data centres with high latency, where using ZooKeeper to maintain replication can be challenging. Although this is the default setting, you should generally avoid it. Here's why:
Replicas are not guaranteed to be consistent and can drift apart over time. Data that you insert directly into an underlying table, such as a MergeTree table, is not copied to the other replicas. If you add a new replica, you have to work out how to load the existing data into it yourself. Finally, the distributed table writes data to every replica, and underlying replicated tables also replicate that data among themselves, so combining internal_replication = false with replicated tables can leave you with duplicate data.
internal_replication = true
The distributed table inserts data into only one of the underlying replicas; the remaining replicas receive the data through the replicated table's own replication. If you want good consistency, the underlying tables should all be replicated tables. When configuring replication in ClickHouse, it is strongly recommended to set internal_replication = true. Note that the default value of this parameter is false, so it has to be set explicitly.
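As a rough illustration, the shard section of a cluster definition with this setting might look like the sketch below. The cluster name example_cluster and the two host names are placeholders, not values from any real deployment, and the root element is an assumption: recent ClickHouse releases use clickhouse, while older ones use yandex.

```xml
<clickhouse>
    <remote_servers>
        <!-- "example_cluster" is a hypothetical cluster name -->
        <example_cluster>
            <shard>
                <!-- The distributed table writes to one replica only;
                     the replicated table copies the data to the others. -->
                <internal_replication>true</internal_replication>
                <replica>
                    <!-- placeholder host -->
                    <host>replica-1.example.internal</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <!-- placeholder host -->
                    <host>replica-2.example.internal</host>
                    <port>9000</port>
                </replica>
            </shard>
        </example_cluster>
    </remote_servers>
</clickhouse>
```

The parameter is set per shard, so a cluster with several shards would repeat the internal_replication element inside each shard section.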