Cassandra: Why do I not have to include all partition keys in a query?
I am currently working with Cassandra.
A blog post I read says:
When issuing a CQL query, you must include all partition key columns,
at a minimum.
(https://shermandigital.com/blog/designing-a-cassandra-data-model/)
However, in my database it seems to be possible without including all partition keys. Here is the table:
CREATE TABLE usertable (
personid text,
name text,
"timestamp" timestamp,
active boolean,
PRIMARY KEY ((personid, name), timestamp)
) WITH
CLUSTERING ORDER BY ("timestamp" DESC)
AND comment=''
AND read_repair_chance=0
AND dclocal_read_repair_chance=0.1
AND gc_grace_seconds=864000
AND bloom_filter_fp_chance=0.01
AND compaction={ 'class':'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold':'32',
'min_threshold':'4' }
AND compression={ 'chunk_length_in_kb':'64',
'class':'org.apache.cassandra.io.compress.LZ4Compressor' }
AND caching={ 'keys':'ALL',
'rows_per_partition':'NONE' }
AND default_time_to_live=0
AND id='23ff16b0-c400-11e8-55c7-2b453518a213'
AND min_index_interval=128
AND max_index_interval=2048
AND memtable_flush_period_in_ms=0
AND speculative_retry='99PERCENTILE';
So I can run select * from usertable where personid = 'ABC-02';
even though, according to the blog post, I would have to include timestamp as well.
Can someone explain this?
database cassandra
asked Nov 28 '18 at 8:21
kola32_
1 Answer
In Cassandra, the partition key determines how data is spread around the cluster: Cassandra computes a hash of the partition key and uses it to determine where the data is located in the cluster.
The exceptions are ALLOW FILTERING and secondary indexes: with either of them, you are not required to include all partition key columns in the WHERE clause.
For further information, take a look at this blog post:
The purpose of a partition key is to split the data into partitions
where an entire partition is stored on a single node in the cluster
(with each node storing many partitions). When data is read or written
from the cluster, a function called Partitioner is used to compute the
hash value of the partition key. This hash value is used to determine
the node/partition which contains that row. The clustering key is used
further to search for a row within a given partition.
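For the usertable from the question, the hashing described above can be made visible with the CQL token() function. This is only a small sketch: it assumes a cqlsh session in the keyspace that holds usertable, and the LIMIT is arbitrary. Note that token() takes every partition key column, in declaration order, which is why a value for personid alone is not enough to locate a partition.

```cql
-- token() is computed over the whole composite partition key (personid, name),
-- so both columns have to be passed, in the order they were declared.
SELECT token(personid, name), personid, name, "timestamp"
FROM usertable
LIMIT 10;
```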
Select queries in Apache Cassandra look a lot like select queries from
a relational database. However, they are significantly more
restricted. The attributes allowed in ‘where’ clause of Cassandra
query must include the full partition key and additional clauses may
only reference the clustering key columns or a secondary index of the
table being queried.
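Applied to the table from the question, queries that follow these rules could look like the sketch below. The value 'Alice' and the timestamp literal are made up for illustration; only the table and column names come from the question.

```cql
-- Full partition key (personid AND name): Cassandra can compute the token
-- and route the request to the replicas that own this one partition.
SELECT * FROM usertable
WHERE personid = 'ABC-02' AND name = 'Alice';

-- Full partition key plus a range restriction on the clustering column.
SELECT * FROM usertable
WHERE personid = 'ABC-02' AND name = 'Alice'
  AND "timestamp" >= '2018-11-01 00:00:00+0000';
```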
Requiring the partition key attributes in the ‘where’ helps Cassandra
to maintain constant result-set retrieval time as the cluster is
scaled-out by allowing Cassandra to determine the partition, and thus
the node (and even data files on disk), that the query must be
directed to.
If a query does not specify the values for all the columns from the
primary key in the ‘where’ clause, Cassandra will not execute it and
give the following warning :
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot execute this query as it might involve data filtering
and thus may have unpredictable performance. If you want to execute
this query despite the performance unpredictability, use ALLOW
FILTERING"
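The two exceptions could look roughly like this for the table in the question. Treat it as a sketch under assumptions: the index name usertable_personid_idx is hypothetical, and whether filtering on a single partition key column is accepted depends on the Cassandra version (as far as I know it is supported from 3.10 onwards, while older versions reject the query with a different message). If the query from the question runs without error, one plausible explanation is that such an index already exists or that the client tool adds ALLOW FILTERING on its own.

```cql
-- Option 1: restrict only part of the partition key and accept a scan.
-- Cassandra may have to read many partitions on many nodes to answer this.
SELECT * FROM usertable
WHERE personid = 'ABC-02'
ALLOW FILTERING;

-- Option 2: a secondary index on one component of the composite partition key
-- (the index name is hypothetical). Once the index is built, the query can be
-- run without ALLOW FILTERING.
CREATE INDEX usertable_personid_idx ON usertable (personid);

SELECT * FROM usertable
WHERE personid = 'ABC-02';
```

Both options trade the predictable single-partition lookup for a potentially cluster-wide read, so they tend to fit small tables or occasional queries better than hot paths.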
https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/
https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
answered Nov 28 '18 at 8:46
Emre Savcı