How to avoid duplicates in a ClickHouse table?
I have created a table and inserted the same values multiple times to check for duplicates. I can see that the duplicates are being inserted. Is there a way to avoid duplicates in a ClickHouse table?
CREATE TABLE sample.tmp_api_logs ( id UInt32, EventDate Date) ENGINE = MergeTree(EventDate, id, (EventDate,id), 8192);
insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');
insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');
select * from sample.tmp_api_logs;
┌─id─┬──EventDate─┐
│  1 │ 2018-11-23 │
│  2 │ 2018-11-23 │
└────┴────────────┘
┌─id─┬──EventDate─┐
│  1 │ 2018-11-23 │
│  2 │ 2018-11-23 │
└────┴────────────┘
clickhouse
asked Nov 23 at 7:47
user3383468
2 Answers
Most likely ReplacingMergeTree is what you need, as long as duplicate records share the same primary key (more precisely, the same sorting key). You can also try the other MergeTree engines if you need different behavior when a duplicate record is encountered. Because duplicates are removed only during background merges, the FINAL keyword can be used in queries to ensure uniqueness.
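As a minimal sketch of the ReplacingMergeTree suggestion, the table from the question could be recreated along these lines. The table name `tmp_api_logs_dedup` and the monthly partitioning are assumptions made for illustration; they are not from the question.

```sql
-- Hypothetical table name; same columns as in the question.
CREATE TABLE sample.tmp_api_logs_dedup
(
    id UInt32,
    EventDate Date
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(EventDate)   -- assumed partitioning, adjust as needed
ORDER BY (EventDate, id);          -- rows with equal sorting key count as duplicates

-- Insert the same batch twice, as in the question.
INSERT INTO sample.tmp_api_logs_dedup VALUES (1, '2018-11-23'), (2, '2018-11-23');
INSERT INTO sample.tmp_api_logs_dedup VALUES (1, '2018-11-23'), (2, '2018-11-23');

-- A plain SELECT may still show duplicates until the parts are merged.
SELECT * FROM sample.tmp_api_logs_dedup;

-- FINAL collapses rows with the same sorting key at query time.
SELECT * FROM sample.tmp_api_logs_dedup FINAL;

-- Optionally force an unscheduled merge (can be expensive on large tables).
OPTIMIZE TABLE sample.tmp_api_logs_dedup FINAL;
```

Deduplication here is eventual: it happens when parts are merged, so queries that must never see duplicates need FINAL or their own aggregation.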
answered Nov 23 at 13:37
Amos
If the raw data does not contain duplicates and they can appear only during retries of INSERT INTO, there is a deduplication feature in ReplicatedMergeTree. To make it work, you should retry inserts of exactly the same batches of data (the same set of rows in the same order). You can use a different replica for these retries, and the data block will still be inserted only once, because block hashes are shared between replicas via ZooKeeper.
Otherwise, you should deduplicate the data externally before inserting it into ClickHouse, or clean up duplicates asynchronously with ReplacingMergeTree or ReplicatedReplacingMergeTree.
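The retry behavior described above could be tried out with a sketch like the following. It assumes a ZooKeeper-backed setup where the `{shard}` and `{replica}` macros are configured; the table name and the ZooKeeper path are hypothetical.

```sql
-- Hypothetical table name and ZooKeeper path; {shard} and {replica}
-- are macros that must be defined in the server configuration.
CREATE TABLE sample.tmp_api_logs_replicated
(
    id UInt32,
    EventDate Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/tmp_api_logs_replicated', '{replica}')
PARTITION BY toYYYYMM(EventDate)   -- assumed partitioning
ORDER BY (EventDate, id);

-- First attempt of a batch.
INSERT INTO sample.tmp_api_logs_replicated VALUES (1, '2018-11-23'), (2, '2018-11-23');

-- Retry of exactly the same batch (same rows, same order).
-- The block hash matches the one stored in ZooKeeper, so the block
-- should be skipped instead of being written a second time.
INSERT INTO sample.tmp_api_logs_replicated VALUES (1, '2018-11-23'), (2, '2018-11-23');

-- Check how many rows were actually stored.
SELECT count() FROM sample.tmp_api_logs_replicated;
```

This only protects against repeated identical blocks. A retry that changes the row order or splits the batch differently produces a different block hash and is inserted as new data.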
answered Dec 10 at 8:48
Ivan Blinkov