Spark drivers predicate pushdown
As I understand it, when Spark reads via JDBC, applying a where clause to the resulting (lazy) DataFrame and then collecting pushes the where predicate down into the database query itself.
Does the same happen with, say, the Elasticsearch driver? Does it merge my query with the additional where predicates? If not, will the where-clause filtering happen on the JVM even if I use PySpark, for example?
apache-spark
1
There is no universal answer to your question. It depends on the source and its properties, the API version used to implement the source, the type of predicate, the external types used (if any), and the rest of the pipeline. With all these pieces in play, the only answer you can get is maybe, maybe not (of course, the same applies to the JDBC source).
– user6910411
Nov 22 at 13:45
Let me rephrase, then: 1) Can it ever happen with the Elasticsearch driver? 2) If not, will filtering always happen at least at the JVM level and never at the Python level?
– user976850
Nov 22 at 13:58
1
For the latter, the answer is always yes. For the former, it might, if the pushdown option is set to true, although I don't know what the practical limitations are, not counting standard things like caching and unexpected type conversions (you can check the Push-Down operations section of its docs to see which operations are theoretically supported).
– user6910411
Nov 22 at 14:04
2
In general, if you are interested in the ES source, I would recommend rewriting your question, though I am fairly confident there is a duplicate somewhere around. And if you want to be sure, it is easier to check the logs anyway :)
– user6910411
Nov 22 at 14:09
I concur with @user6910411. You ought to rewrite your question. Besides what has already been said in the comments, the rest remains quite broad. It feels like an XY problem to me.
– eliasah
Nov 22 at 15:59
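For what it's worth, one way to "check" as suggested in the comments is to look at the physical plan rather than guess. The sketch below is only an illustration under some assumptions: it assumes the source is a relation whose scan node lists a `PushedFilters: [...]` entry in the output of `DataFrame.explain()`, that the elasticsearch-hadoop connector is on the classpath under the format name `org.elasticsearch.spark.sql`, and that its option is called `pushdown`. The helper names (`pushed_filters`, `explain_text`, `es_filter_plan`) are made up for this example and are not part of Spark or the connector.

```python
import contextlib
import io


def pushed_filters(plan_text):
    """Return the text inside `PushedFilters: [...]`, or None if it is absent.

    Brackets are matched by depth so nested lists such as In(id, [1,2,3])
    are returned whole.
    """
    marker = "PushedFilters: ["
    start = plan_text.find(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(plan_text)):
        ch = plan_text[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return plan_text[begin:i]
    return None  # unbalanced, e.g. the plan text was truncated


def explain_text(df):
    """Capture what df.explain() prints, since it writes to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        df.explain()
    return buf.getvalue()


def es_filter_plan(spark, resource, condition):
    """Hypothetical helper: read an ES resource, filter it, report pushed filters.

    `resource` (index name) and `condition` (SQL string or Column) are
    supplied by the caller; the format and option names are assumptions
    about the elasticsearch-hadoop connector.
    """
    df = (
        spark.read.format("org.elasticsearch.spark.sql")
        .option("pushdown", "true")
        .load(resource)
        .where(condition)
    )
    return pushed_filters(explain_text(df))
```

An empty string or `None` from `pushed_filters` would suggest the predicate was not handed to the source, in which case the filter runs in Spark itself. A non-empty result only means the filter was offered to the source; whether the source applies all of it is still up to the connector.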
asked Nov 22 at 13:31
user976850