Spark drivers predicate pushdown

To my understanding, when using Spark to read via JDBC, applying a where clause to the resulting (lazy) DataFrame and then collecting will push the where predicate down into the database query itself.

Does the same happen with, say, the Elasticsearch driver? Does it merge my query with the additional where predicates? If not, will the where clause filtering still happen on the JVM even if I use PySpark, for example?
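
To make the distinction concrete, here is a minimal plain-Python sketch of what "pushing the predicate to the database query" means. It is not Spark: it uses the standard-library `sqlite3` module as a stand-in for a remote source, and the `events` table, its columns, and both function names are hypothetical.

```python
import sqlite3

# Hypothetical in-memory table standing in for a remote database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)", [(i, i * 10) for i in range(10)]
)


def read_with_pushdown(conn, min_score):
    """The predicate is part of the query, so the database does the filtering."""
    cur = conn.execute(
        "SELECT id, score FROM events WHERE score >= ? ORDER BY id", (min_score,)
    )
    return cur.fetchall()


def read_without_pushdown(conn, min_score):
    """Every row is fetched first, then filtered in the client process."""
    rows = conn.execute("SELECT id, score FROM events ORDER BY id").fetchall()
    return [row for row in rows if row[1] >= min_score]


pushed = read_with_pushdown(conn, 70)
local = read_without_pushdown(conn, 70)
print(pushed == local)  # same rows either way; only the amount transferred differs
```

Both calls return the same rows. The difference is where the filter runs and how much data leaves the source, which is what the question is asking about for each driver.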

apache-spark

asked Nov 22 at 13:31
user976850

  • There is no universal answer to your question. It depends on the source and its properties, the API version used to implement the source, the type of predicate, any external types used, and the rest of the pipeline. With all these variables, the only answer you can get is "maybe, maybe not" (and of course the same applies to the JDBC source).
    – user6910411
    Nov 22 at 13:45

  • Let me rephrase then: 1) Can it ever happen with the Elasticsearch driver? 2) If not, will filtering always happen at least at the JVM level and never at the Python level?
    – user976850
    Nov 22 at 13:58

  • For the latter, the answer is always yes. For the former, it might, if the pushdown option is set to true, although I don't know what the practical limitations are, apart from the standard caveats such as caching and unexpected type conversion (you can check the Push-Down operations section of its docs to see which operations are theoretically supported).
    – user6910411
    Nov 22 at 14:04

  • In general, if you are interested in the ES source, I would recommend rewriting your question, though I'm fairly confident there is a duplicate somewhere around. And if you want to be sure, it is easier to check the logs anyway :)
    – user6910411
    Nov 22 at 14:09

  • I concur with @user6910411. You ought to rewrite your question. Besides what has already been said in the comments, the rest remains quite broad. It feels like an XY problem to me.
    – eliasah
    Nov 22 at 15:59
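
The `pushdown` option and the "check it yourself" advice from the comments could look roughly like the following in PySpark. This is only a sketch: it assumes the elasticsearch-hadoop Spark SQL connector is on the classpath and that `spark` is an existing SparkSession. The function names `read_es` and `show_plan`, and any index name passed as `resource`, are hypothetical.

```python
def read_es(spark, resource, pushdown=True):
    """Load an Elasticsearch resource as a DataFrame via elasticsearch-hadoop.

    `spark` is an existing SparkSession; `resource` is a hypothetical
    index name such as "my-index".
    """
    return (
        spark.read.format("org.elasticsearch.spark.sql")
        # Connector option that controls translating Spark filters into ES queries.
        .option("pushdown", "true" if pushdown else "false")
        .load(resource)
    )


def show_plan(df, condition):
    """Print the extended query plan of a filtered DataFrame and return it."""
    filtered = df.where(condition)
    filtered.explain(True)  # logical and physical plans
    return filtered
```

For sources that accept filters, the physical plan typically lists the filters Spark handed to the source in the scan node. That shows what was offered to the source, not necessarily what it executed, so the connector's own logs remain the more direct confirmation.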