Cassandra: Why do I not have to include all partition keys in query?

I am currently working with Cassandra.



A blog post I was reading says:




When issuing a CQL query, you must include all partition key columns,
at a minimum.
(https://shermandigital.com/blog/designing-a-cassandra-data-model/)




However, in my database it seems to be possible to query without including all partition key columns. Here is the table:



CREATE TABLE usertable (
personid text,
name text,
"timestamp" timestamp,
active boolean,
PRIMARY KEY ((personid, name), timestamp)
) WITH
CLUSTERING ORDER BY ("timestamp" DESC)
AND comment=''
AND read_repair_chance=0
AND dclocal_read_repair_chance=0.1
AND gc_grace_seconds=864000
AND bloom_filter_fp_chance=0.01
AND compaction={ 'class':'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold':'32',
'min_threshold':'4' }
AND compression={ 'chunk_length_in_kb':'64',
'class':'org.apache.cassandra.io.compress.LZ4Compressor' }
AND caching={ 'keys':'ALL',
'rows_per_partition':'NONE' }
AND default_time_to_live=0
AND id='23ff16b0-c400-11e8-55c7-2b453518a213'
AND min_index_interval=128
AND max_index_interval=2048
AND memtable_flush_period_in_ms=0
AND speculative_retry='99PERCENTILE';


So I can run this query:

select * from usertable where personid = 'ABC-02';

However, according to the blog post, I would have to include timestamp as well.



Can someone explain this?

Tags: database cassandra

asked Nov 28 '18 at 8:21 by kola32_

1 Answer

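In Cassandra, the partition key determines how data is spread across the cluster: Cassandra computes a hash of the partition key and uses it to determine where the data is located in the cluster. In your table, PRIMARY KEY ((personid, name), timestamp) makes (personid, name) the partition key and timestamp a clustering column, so the rule from the blog post applies to personid and name, not to timestamp.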
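To make the hashing step concrete, here is a minimal Python sketch under loudly simplified assumptions: a hypothetical three-node ring with made-up tokens, and MD5 over a joined string as the hash. Cassandra's default Murmur3Partitioner uses a different hash function, a signed 64-bit token range and its own serialization of composite keys, so only the idea carries over: the token is computed from all partition key columns together.

```python
import hashlib

# Hypothetical three-node ring: each node owns the token range that ends at
# its token (inclusive). Tokens are made-up values in [0, 2**64).
RING = [
    (2**62, "node-a"),
    (2**63, "node-b"),
    (3 * 2**62, "node-c"),
]


def token(partition_key: tuple) -> int:
    """Hash ALL partition key columns together into one 64-bit token.

    Simplification: MD5 over a NUL-joined string stands in for Cassandra's
    real partitioner hash and composite-key serialization.
    """
    raw = "\x00".join(partition_key).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big")


def owner(partition_key: tuple) -> str:
    """Walk the ring to the first node whose token is >= the key's token."""
    t = token(partition_key)
    for ring_token, node in RING:
        if t <= ring_token:
            return node
    return RING[0][1]  # past the last token: wrap around to the first node
```

With this model, owner(("ABC-02", "Alice")) names the node for one (personid, name) pair (the name value is made up). Hashing ("ABC-02",) alone yields an unrelated token, which is why a personid value by itself is not enough to locate a partition.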



One exception: if you use ALLOW FILTERING or a secondary index, Cassandra does not require you to include all partition key columns in the WHERE clause.
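A toy model of what this exception costs, assuming hypothetical sample data: with the full partition key the lookup touches a single partition, while restricting only personid forces a scan over every partition, which is roughly the work ALLOW FILTERING permits. Secondary indexes are not modeled here, and the "partitions touched" count is only an illustration, not a Cassandra metric.

```python
from typing import Dict, List, Tuple

# Toy storage model: partitions are keyed by the full partition key
# (personid, name); rows inside a partition are (timestamp, active) pairs kept
# newest first, mirroring CLUSTERING ORDER BY ("timestamp" DESC).
Row = Tuple[int, bool]
Partitions = Dict[Tuple[str, str], List[Row]]


def lookup(partitions: Partitions, personid: str, name: str) -> Tuple[List[Row], int]:
    """Full partition key: a single direct lookup, so one partition is touched."""
    return partitions.get((personid, name), []), 1


def filter_scan(partitions: Partitions, personid: str) -> Tuple[List[Row], int]:
    """Only personid restricted: every partition is read and filtered."""
    rows: List[Row] = []
    touched = 0
    for (pid, _name), part_rows in partitions.items():
        touched += 1
        if pid == personid:
            rows.extend(part_rows)
    return rows, touched


# Hypothetical sample data (names and timestamps are made up).
data: Partitions = {
    ("ABC-02", "Alice"): [(300, True), (100, False)],
    ("ABC-02", "Bob"): [(200, True)],
    ("XYZ-07", "Carol"): [(250, False)],
}
```

Here lookup(data, "ABC-02", "Alice") touches 1 partition, while filter_scan(data, "ABC-02") has to touch all 3 to collect the rows of both (ABC-02, Alice) and (ABC-02, Bob). On a real cluster that scan spans every node, which is why its cost grows with the data set.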



For further information, take a look at this blog post:




The purpose of a partition key is to split the data into partitions where an entire partition is stored on a single node in the cluster (with each node storing many partitions). When data is read or written from the cluster, a function called Partitioner is used to compute the hash value of the partition key. This hash value is used to determine the node/partition which contains that row. The clustering key is used further to search for a row within a given partition.

Select queries in Apache Cassandra look a lot like select queries from a relational database. However, they are significantly more restricted. The attributes allowed in ‘where’ clause of Cassandra query must include the full partition key and additional clauses may only reference the clustering key columns or a secondary index of the table being queried.

Requiring the partition key attributes in the ‘where’ helps Cassandra to maintain constant result-set retrieval time as the cluster is scaled-out by allowing Cassandra to determine the partition, and thus the node (and even data files on disk), that the query must be directed to.

If a query does not specify the values for all the columns from the primary key in the ‘where’ clause, Cassandra will not execute it and give the following warning:

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
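The rule quoted above can be written down as a small check. This is a hypothetical helper (check_where, InvalidRequest and the column lists are illustrative, not part of any Cassandra driver) that follows the simplified description in this answer: the full partition key must be restricted and other restrictions may only use clustering or indexed columns; a query that restricts only indexed columns is also accepted; anything else needs ALLOW FILTERING. Real Cassandra also checks clustering-column order and operators, and the exact error text varies by version and query shape.

```python
from typing import Iterable

# Error text as quoted above; the real server message can differ by version.
FILTERING_ERROR = (
    "Cannot execute this query as it might involve data filtering and thus "
    "may have unpredictable performance. If you want to execute this query "
    "despite the performance unpredictability, use ALLOW FILTERING"
)


class InvalidRequest(Exception):
    """Hypothetical stand-in for the server-side error (not a driver class)."""


def check_where(
    restricted: Iterable[str],
    partition_key: Iterable[str],
    clustering: Iterable[str],
    indexed: Iterable[str] = (),
    allow_filtering: bool = False,
) -> None:
    """Simplified model of the WHERE-clause rule; raises InvalidRequest if rejected."""
    cols = set(restricted)
    pk = set(partition_key)
    idx = set(indexed)
    # Rule 1: full partition key, extra restrictions only on clustering or indexed columns.
    if pk <= cols and (cols - pk) <= (set(clustering) | idx):
        return
    # Rule 2 (the exception above): every restricted column has a secondary index.
    if cols and cols <= idx:
        return
    # Otherwise the query is only accepted with ALLOW FILTERING.
    if not allow_filtering:
        raise InvalidRequest(FILTERING_ERROR)
```

For the table in the question, check_where(["personid"], ["personid", "name"], ["timestamp"]) raises, while the same call with allow_filtering=True, or with indexed=["personid"], returns without error.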




https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause

answered Nov 28 '18 at 8:46 by Emre Savcı