How to avoid duplicates in a ClickHouse table?

I have created a table and am inserting the same values multiple times to check how duplicates are handled. The duplicates are being inserted. Is there a way to avoid duplicates in a ClickHouse table?



CREATE TABLE sample.tmp_api_logs ( id UInt32, EventDate Date) ENGINE = MergeTree(EventDate, id, (EventDate,id), 8192);



insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');
insert into sample.tmp_api_logs values(1,'2018-11-23'),(2,'2018-11-23');



select * from sample.tmp_api_logs;
┌─id─┬──EventDate─┐
│ 1 │ 2018-11-23 │
│ 2 │ 2018-11-23 │
└────┴────────────┘
┌─id─┬──EventDate─┐
│ 1 │ 2018-11-23 │
│ 2 │ 2018-11-23 │
└────┴────────────┘

asked Nov 23 at 7:47 by user3383468

2 Answers

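Most likely ReplacingMergeTree is what you need, as long as duplicate records share the same primary key (strictly, ReplacingMergeTree treats rows with the same ORDER BY sorting key as duplicates). Duplicates are removed only when data parts are merged in the background, at a time ClickHouse chooses, so they can stay visible for a while. You can also try other engines of the MergeTree family if you need a different action when a duplicate record is encountered. The FINAL keyword can be used in queries to guarantee uniqueness.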
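A minimal sketch of this approach, assuming the table from the question is recreated under the hypothetical name sample.tmp_api_logs_dedup with the newer PARTITION BY / ORDER BY syntax. The legacy MergeTree(EventDate, ...) form partitions by month of EventDate, which toYYYYMM(EventDate) reproduces. Because deduplication only happens during merges (and only within a partition), the plain SELECT may still show both copies, while SELECT ... FINAL should not.

```sql
-- Hypothetical table name; same columns and sorting key as in the question.
CREATE TABLE sample.tmp_api_logs_dedup
(
    id        UInt32,
    EventDate Date
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, id);

-- The same two inserts as in the question; each one creates its own data part.
INSERT INTO sample.tmp_api_logs_dedup VALUES (1, '2018-11-23'), (2, '2018-11-23');
INSERT INTO sample.tmp_api_logs_dedup VALUES (1, '2018-11-23'), (2, '2018-11-23');

-- Before a background merge has run, this may still return 4 rows.
SELECT * FROM sample.tmp_api_logs_dedup;

-- FINAL collapses rows with the same (EventDate, id) at query time,
-- so this should return only 2 rows.
SELECT * FROM sample.tmp_api_logs_dedup FINAL;
```

FINAL does the merging work at query time, so it tends to be slower than a plain SELECT on large tables.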

answered Nov 23 at 13:37 by Amos

If the raw data does not contain duplicates and they can appear only when an INSERT INTO is retried, ReplicatedMergeTree has a built-in deduplication feature. For it to work, you must retry inserts with exactly the same batches of data (the same set of rows in the same order). You can even send the retry to a different replica and the data block will still be inserted only once, because block hashes are shared between replicas via ZooKeeper.
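A minimal sketch of the retry scenario, assuming a server with ZooKeeper (or ClickHouse Keeper) configured and the {shard} / {replica} macros defined; the table name and the ZooKeeper path are hypothetical. Insert deduplication is governed by the insert_deduplicate setting (enabled by default), and only the hashes of recent blocks are remembered (see the replicated_deduplication_window table setting), so a retry needs to arrive reasonably soon after the original insert.

```sql
-- Hypothetical replicated table; the path and macros are assumptions about your setup.
CREATE TABLE sample.tmp_api_logs_repl
(
    id        UInt32,
    EventDate Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/tmp_api_logs_repl', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, id);

-- Original attempt, e.g. the client timed out and cannot tell whether it succeeded.
INSERT INTO sample.tmp_api_logs_repl VALUES (1, '2018-11-23'), (2, '2018-11-23');

-- Retry with exactly the same rows in the same order: the block hash matches
-- the one already recorded, so the block is skipped instead of inserted again.
INSERT INTO sample.tmp_api_logs_repl VALUES (1, '2018-11-23'), (2, '2018-11-23');

-- With default settings this should return 2, not 4.
SELECT count() FROM sample.tmp_api_logs_repl;
```

Changing the row order or the batch boundaries produces a different block hash, which is why the retry has to resend the batch unchanged.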



Otherwise, you should deduplicate the data externally before inserting it into ClickHouse, or clean up duplicates asynchronously with ReplacingMergeTree or ReplicatedReplacingMergeTree.
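For duplicates that are already present in the raw data, a minimal sketch of the asynchronous cleanup with ReplicatedReplacingMergeTree, under the same assumptions as any replicated table (ZooKeeper configured, {shard} / {replica} macros defined); the table name and path are hypothetical. The two batches below differ, so block-level insert deduplication does not apply and the row with id = 2 is stored twice until a merge collapses it. OPTIMIZE TABLE ... FINAL forces that merge instead of waiting for a background one.

```sql
-- Hypothetical table; the path and macros are assumptions about your setup.
CREATE TABLE sample.tmp_api_logs_rrmt
(
    id        UInt32,
    EventDate Date
)
ENGINE = ReplicatedReplacingMergeTree('/clickhouse/tables/{shard}/tmp_api_logs_rrmt', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (EventDate, id);

-- Two different batches that overlap on (2, '2018-11-23').
INSERT INTO sample.tmp_api_logs_rrmt VALUES (1, '2018-11-23'), (2, '2018-11-23');
INSERT INTO sample.tmp_api_logs_rrmt VALUES (2, '2018-11-23'), (3, '2018-11-23');

-- Force a merge of each partition now; rows with the same sorting key collapse to one.
OPTIMIZE TABLE sample.tmp_api_logs_rrmt FINAL;

-- After the merge this should return 3 rows (ids 1, 2 and 3).
SELECT * FROM sample.tmp_api_logs_rrmt;
```

OPTIMIZE ... FINAL rewrites whole partitions, so it is usually reserved for maintenance or small tables rather than run after every insert; for reads that must not see duplicates before a merge, SELECT ... FINAL can be used instead.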

answered Dec 10 at 8:48 by Ivan Blinkov