Cassandra: Why do I not have to include all partition keys in query?

Currently, I am dealing with Cassandra.

A blog post I was reading says:

When issuing a CQL query, you must include all partition key columns,
at a minimum.
(https://shermandigital.com/blog/designing-a-cassandra-data-model/)

However, in my database it seems to be possible to query without including all partition keys. Here is the table:

CREATE TABLE usertable (
    personid text,
    name text,
    "timestamp" timestamp,
    active boolean,
    PRIMARY KEY ((personid, name), timestamp)
) WITH
  CLUSTERING ORDER BY ("timestamp" DESC)
  AND comment=''
  AND read_repair_chance=0
  AND dclocal_read_repair_chance=0.1
  AND gc_grace_seconds=864000
  AND bloom_filter_fp_chance=0.01
  AND compaction={ 'class':'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
                   'max_threshold':'32',
                   'min_threshold':'4' }
  AND compression={ 'chunk_length_in_kb':'64',
                    'class':'org.apache.cassandra.io.compress.LZ4Compressor' }
  AND caching={ 'keys':'ALL',
                'rows_per_partition':'NONE' }
  AND default_time_to_live=0
  AND id='23ff16b0-c400-11e8-55c7-2b453518a213'
  AND min_index_interval=128
  AND max_index_interval=2048
  AND memtable_flush_period_in_ms=0
  AND speculative_retry='99PERCENTILE';

So I can run select * from usertable where personid = 'ABC-02'; on this table. However, according to the blog post, I have to include timestamp as well.

Can someone explain this?

asked Nov 28 '18 at 8:21

kola32_

database cassandra

1 Answer

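In Cassandra, the partition key determines how data is spread around the cluster: Cassandra computes a hash of the partition key and uses it to determine where the data lives. In your table the partition key is (personid, name) and timestamp is the clustering column, so a query on personid alone does not cover the full partition key.

The exceptions are ALLOW FILTERING and secondary indexes: with either of them, you are not required to include all partition key columns in the where clause.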
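
To make the hashing point concrete, here is a minimal Python sketch. It is not Cassandra's actual code: toy_token and toy_owner are hypothetical helpers, MD5 merely stands in for the real partitioner (the default in current Cassandra versions is Murmur3Partitioner), the three-node cluster is made up, and the names 'Alice' and 'Bob' are invented sample values.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def toy_token(personid: str, name: str) -> int:
    """Hash the full composite partition key into one integer token."""
    # Both columns feed the hash; neither one alone determines the token.
    raw = f"{personid}\x00{name}".encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big")


def toy_owner(personid: str, name: str) -> str:
    """Pick a node from the token (real clusters use token ranges, not modulo)."""
    return NODES[toy_token(personid, name) % len(NODES)]


if __name__ == "__main__":
    # Same personid, different name: two unrelated tokens.
    for name in ("Alice", "Bob"):
        print(name, toy_token("ABC-02", name), toy_owner("ABC-02", name))
```

With only personid = 'ABC-02' there is no single token to compute, so matching rows could sit on any node. That is why such a query cannot be routed to one partition.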

For further information, take a look at the blog posts quoted and linked below:

The purpose of a partition key is to split the data into partitions
where an entire partition is stored on a single node in the cluster
(with each node storing many partitions). When data is read or written
from the cluster, a function called Partitioner is used to compute the
hash value of the partition key. This hash value is used to determine
the node/partition which contains that row. The clustering key is used
further to search for a row within a given partition.

Select queries in Apache Cassandra look a lot like select queries from
a relational database. However, they are significantly more
restricted. The attributes allowed in ‘where’ clause of Cassandra
query must include the full partition key and additional clauses may
only reference the clustering key columns or a secondary index of the
table being queried.

Requiring the partition key attributes in the ‘where’ helps Cassandra
to maintain constant result-set retrieval time as the cluster is
scaled-out by allowing Cassandra to determine the partition, and thus
the node (and even data files on disk), that the query must be
directed to.

If a query does not specify the values for all the columns from the
primary key in the ‘where’ clause, Cassandra will not execute it and
give the following warning :

InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot execute this query as it might involve data filtering
and thus may have unpredictable performance. If you want to execute
this query despite the performance unpredictability, use ALLOW
FILTERING"

https://www.instaclustr.com/apache-cassandra-scalability-allow-filtering-partition-keys/

https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause
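
As a rough illustration of the rule quoted above, the following Python sketch checks whether a where clause restricts every partition key column of the table from the question. It is a deliberate simplification under stated assumptions: the helper names are hypothetical, only equality restrictions are considered, and secondary indexes, IN restrictions and clustering column rules are ignored. Cassandra's real validation is more involved.

```python
# Partition key taken from PRIMARY KEY ((personid, name), timestamp)
PARTITION_KEY = ("personid", "name")


def missing_partition_columns(where_columns):
    """Return the partition key columns that the where clause does not restrict."""
    restricted = set(where_columns)
    return [col for col in PARTITION_KEY if col not in restricted]


def targets_single_partition(where_columns):
    """True when every partition key column is restricted."""
    return not missing_partition_columns(where_columns)


if __name__ == "__main__":
    # where personid = 'ABC-02'  -> name is missing
    print(missing_partition_columns(["personid"]))
    # where personid = ... and name = ...  -> full partition key
    print(targets_single_partition(["personid", "name"]))
```

For the query in the question, the sketch reports name as the missing column, not timestamp.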

answered Nov 28 '18 at 8:46

Emre Savcı
