Skip to main content
Skip to main content

Technical Reference

Setup details

User and role management

Consider not using the default user; instead, create a dedicated one to use it with this Fivetran destination only. The following commands, executed with the default user, will create a new fivetran_user with the required privileges.

CREATE USER fivetran_user IDENTIFIED BY '<password>'; -- use a secure password generator

GRANT CURRENT GRANTS ON *.* TO fivetran_user;

Additionally, you can revoke access to certain databases from the fivetran_user. For example, by executing the following statement, we restrict access to the default database:

REVOKE ALL ON default.* FROM fivetran_user;

You can execute these statements in the ClickHouse SQL console.

Advanced configuration

The ClickHouse Cloud destination supports an optional JSON configuration file for advanced use cases. This file allows you to fine-tune destination behavior by overriding the default settings that control batch sizes, parallelism, connection pools, and request timeouts.

Note

This configuration is entirely optional. If no file is uploaded, the destination uses sensible defaults that work well for most use cases.

The file must be valid JSON and conform to the schema described below.

If you need to modify the configuration after the initial setup, you can edit the destination configurations in the Fivetran dashboard and upload an updated file.

The configuration file has a top-level section:

{
  "destination_configurations": { ... }
}

Inside of it you can specify the following configurations that control the internal behavior of the ClickHouse destination connector itself. These configurations affect how the connector processes data before sending it to ClickHouse.

SettingTypeDefaultAllowed RangeDescription
write_batch_sizeinteger1000005,000 – 100,000Number of rows per batch for insert, update, and replace operations.
select_batch_sizeinteger1500200 – 1,500Number of rows per batch for SELECT queries used during updates.
mutation_batch_sizeinteger1500200 – 1,500Number of rows per batch for ALTER TABLE UPDATE mutations in history mode. Lower it if you are experiencing large SQL statements.
hard_delete_batch_sizeinteger1500200 – 1,500Number of rows per batch for hard delete operations in history mode. Lower it if you are experiencing large SQL statements.

All fields are optional. If a field is not specified, the default value is used. If a value is outside the allowed range, the destination will report an error during sync. Unknown fields are silently ignored (a warning is logged) and do not cause errors, which allows forward compatibility when new settings are added.

Example:

{
  "destination_configurations": {
    "write_batch_size": 50000,
    "select_batch_size": 200
  }
}

Type transformation mapping

The Fivetran ClickHouse destination maps Fivetran data types to ClickHouse types as follows:

Fivetran typeClickHouse type
BOOLEANBool
SHORTInt16
INTInt32
LONGInt64
BIGDECIMALDecimal(P, S)
FLOATFloat32
DOUBLEFloat64
LOCALDATEDate
LOCALDATETIMEDateTime
INSTANTDateTime64(9, 'UTC')
STRINGString
BINARYString *
XMLString *
JSONString *
Note

* BINARY, XML, and JSON are stored as String because ClickHouse's String type can represent an arbitrary set of bytes. The destination adds a column comment to indicate the original data type. The ClickHouse JSON data type is not used as it was marked as obsolete and never recommended for production usage.

Destination tables

The ClickHouse Cloud destination uses Replacing engine type of SharedMergeTree family (specifically, SharedReplacingMergeTree), versioned by the _fivetran_synced column.

Every column except primary (ordering) keys and Fivetran metadata columns is created as Nullable(T), where T is a ClickHouse Cloud type based on the data types mapping.

Every destination table includes the following metadata columns:

ColumnTypeDescription
_fivetran_syncedDateTime64(9, 'UTC')Timestamp when the record was synced by Fivetran. Used as the version column for SharedReplacingMergeTree.
_fivetran_deletedBoolSoft delete marker. Set to true when the source record is deleted.
_fivetran_idStringAuto-generated unique identifier. Only present when the source table has no primary keys.

Single primary key in the source table

For example, source table users has a primary key column id (INT) and a regular column name (STRING). The destination table will be defined as follows:

CREATE TABLE `users`
(
    `id`                Int32,
    `name`              Nullable(String),
    `_fivetran_synced`  DateTime64(9, 'UTC'),
    `_fivetran_deleted` Bool
) ENGINE = SharedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}', _fivetran_synced)
ORDER BY id
SETTINGS index_granularity = 8192

In this case, the id column is chosen as a table sorting key.

Multiple primary keys in the source table

If the source table has multiple primary keys, they are used in order of their appearance in the Fivetran source table definition.

For example, there is a source table items with primary key columns id (INT) and name (STRING), plus an additional regular column description (STRING). The destination table will be defined as follows:

CREATE TABLE `items`
(
    `id`                Int32,
    `name`              String,
    `description`       Nullable(String),
    `_fivetran_synced`  DateTime64(9, 'UTC'),
    `_fivetran_deleted` Bool
) ENGINE = SharedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}', _fivetran_synced)
ORDER BY (id, name)
SETTINGS index_granularity = 8192

In this case, id and name columns are chosen as table sorting keys.

No primary keys in the source table

If the source table has no primary keys, a unique identifier will be added by Fivetran as a _fivetran_id column. Consider an events table that only has the event (STRING) and timestamp (LOCALDATETIME) columns in the source. The destination table in that case is as follows:

CREATE TABLE events
(
    `event`             Nullable(String),
    `timestamp`         Nullable(DateTime),
    `_fivetran_id`      String,
    `_fivetran_synced`  DateTime64(9, 'UTC'),
    `_fivetran_deleted` Bool
) ENGINE = SharedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}', _fivetran_synced)
ORDER BY _fivetran_id
SETTINGS index_granularity = 8192

Since _fivetran_id is unique and there are no other primary key options, it is used as a table sorting key.

Selecting the latest version of the data without duplicates

SharedReplacingMergeTree performs background data deduplication only during merges at an unknown time. However, selecting the latest version of the data without duplicates ad-hoc is possible with the FINAL keyword and select_sequential_consistency setting:

SELECT *
FROM example FINAL
LIMIT 1000 
SETTINGS select_sequential_consistency = 1;

See also Duplicate records with ReplacingMergeTree in the troubleshooting guide.

Retries on network failures

The ClickHouse Cloud destination retries transient network errors using the exponential backoff algorithm. This is safe even when the destination inserts the data, as any potential duplicates are handled by the SharedReplacingMergeTree table engine, either during background merges, or when querying the data with SELECT FINAL.