Spark Structured Streaming getting messages for last Kafka partition












I am using Spark Structured Streaming to read from a Kafka topic.

With a single-partition topic, the Spark Structured Streaming consumer can read the data without any problem.

But after I added partitions to the topic, the client shows messages from the last partition only. That is, with 4 partitions in the topic, if I push the numbers 1, 2, 3, 4, the client prints only 4 and none of the other values.

I am using the latest samples and binaries from the Spark Structured Streaming website.

    Dataset<Row> df = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
        .option("subscribe", "topic1")
        .load();

Am I missing anything?
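One way to narrow this down is to project Kafka's metadata columns (`partition`, `offset`) alongside the payload, so each printed record shows which partition it came from. A minimal sketch, assuming a local Spark session and that `topic1` already exists (the class name and broker addresses are placeholders):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;

public class KafkaPartitionCheck {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("KafkaPartitionCheck")
                .master("local[4]")   // several local threads available for the read
                .getOrCreate();

        Dataset<Row> df = spark
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
                .option("subscribe", "topic1")
                .option("startingOffsets", "earliest") // read everything, not only new records
                .load();

        // The Kafka source exposes partition/offset metadata columns next to key/value.
        StreamingQuery query = df
                .selectExpr("partition", "offset", "CAST(value AS STRING) AS value")
                .writeStream()
                .format("console")
                .start();

        query.awaitTermination();
    }
}
```

If records from all four partitions appear in the console output, the data is being distributed and the problem is on the consuming side; if only one partition number ever shows up, the producer is sending everything to a single partition.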










  • Pushing messages how? Including the key? How do you know those messages are even going to other partitions? Also, Spark will not automatically pick up new partitions until you restart the app

    – cricket_007
    Nov 25 '18 at 0:06













  • I am using the Kafka console producer to push data into the topic manually.

    – Ashish Nijai
    Nov 25 '18 at 5:10











  • Sure, but are you using --parse-keys=true? If not, how are you checking which partitions your messages are going into?

    – cricket_007
    Nov 25 '18 at 16:14
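For reference, the console producer takes keys via producer properties; a keyed send might look like this (broker address is an assumption):

```shell
# Send keyed messages: records with the same key hash to the same partition,
# while null-keyed records are spread across partitions by the default partitioner.
kafka-console-producer.sh \
  --broker-list localhost:9092 \
  --topic topic1 \
  --property parse.key=true \
  --property key.separator=:
# Then type e.g.:  k1:1  k2:2  k3:3  k4:4  (one record per line)
```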













  • I am unable to check any particular partition. I have 4 partitions. If I send 4 messages to the topic, the Spark consumer prints only the 4th message; in general it prints only every 4th message, i.e. the 4th, 8th, 12th, and so on.

    – Ashish Nijai
    Nov 25 '18 at 16:42













  • You can use GetOffsetShell of Kafka to list the latest offsets of each partition. That'll tell you if messages are being sent to any/all partitions... Otherwise, if you have only one Spark executor, then it'll only consume from one Kafka partition, so you'll need to have more

    – cricket_007
    Nov 25 '18 at 17:55
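The GetOffsetShell check mentioned above could look like this (broker address is an assumption); a non-zero latest offset on every partition confirms that messages are landing in all of them:

```shell
# List the latest offset of every partition of topic1 (--time -1 means "latest")
kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 \
  --topic topic1 \
  --time -1
# Output format: topic:partition:offset, one line per partition
```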


















java apache-spark apache-kafka spark-structured-streaming






edited Nov 26 '18 at 18:03 by cricket_007










asked Nov 24 '18 at 22:35 by Ashish Nijai













1 Answer
The issue was resolved by replacing kafka-clients-0.10.1.1.jar with kafka-clients-0.10.0.1.jar.

Found the reference here: Spark Structured Stream get messages from only one partition of Kafka
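For anyone hitting the same mismatch, pinning the client version in a Maven build would look roughly like this (coordinates inferred from the jar names above):

```xml
<!-- Force the kafka-clients version that works with this Spark Kafka source -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.0.1</version>
</dependency>
```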






answered Nov 27 '18 at 17:01 by Ashish Nijai





























